Fix response cross-contamination via request-response correlation UIDs#105
Fix response cross-contamination via request-response correlation UIDs#105dweill merged 2 commits intorevelrylabs:masterfrom
Conversation
The worker protocol previously had no way to correlate a response with its originating request. When a call timed out, the late response from Node.js would sit in the GenServer mailbox and get picked up by the next call's `receive`, returning completely wrong data. This was particularly bad under concurrent load (e.g. async ExUnit tests sharing a pool of workers) where timeouts could cascade and cause responses to be delivered to the wrong callers. The fix adds a unique ID (monotonic counter per worker) to each request. Node.js echoes the ID back in the response prefix. The worker's `get_response` now only accepts responses matching the current request's ID, discarding any stale responses from previously timed-out calls. Also fixes a latent bug in server.js where `fileExists()` (async) was not awaited, causing the `if` check to always evaluate as truthy.
| # Send a call that takes 300ms but times out after 50ms. | ||
| # After timeout, the worker is free but the Node.js response is still pending. | ||
| assert {:error, "Call timed out."} = | ||
| NodeJS.call("slow-async-echo", [9999, 300], | ||
| timeout: 50, | ||
| name: NodeJS.RaceTest | ||
| ) | ||
|
|
||
| # Immediately send a follow-up call that takes 500ms. | ||
| # Its `receive` window overlaps with the stale response arriving at ~300ms. | ||
| # Without UID correlation, `receive` would pick up the stale `9999` response. | ||
| assert {:ok, 42} = | ||
| NodeJS.call("slow-async-echo", [42, 500], | ||
| timeout: 5_000, | ||
| name: NodeJS.RaceTest | ||
| ) |
There was a problem hiding this comment.
Great test! Fails 100% of the time on the old version of the code. 👍
|
Just to make the stakes clear: This bug means it's possible for data processed on behalf of User A to time out, then be returned later when User B uses the Node worker (not even calling the same function). 😱 |
|
Hey @s3cur3 and @nemophrost, thank you for your contributions. We'll get some eyes on this PR ASAP |
| # Send a call that takes 300ms but times out after 50ms. | ||
| # After timeout, the worker is free but the Node.js response is still pending. | ||
| assert {:error, "Call timed out."} = | ||
| NodeJS.call("slow-async-echo", [9999, 300], | ||
| timeout: 50, | ||
| name: NodeJS.RaceTest | ||
| ) |
| // Try to resolve the module in the current path | ||
| const modulePathToTry = path.join(nodePath, modulePath) | ||
| if (fileExists(modulePathToTry)) { | ||
| if (await fileExists(modulePathToTry)) { |
|
@dweill or @grossvogel, should we report a CVE? It'd probably be good to get the word out to the folks using the old version. (Looks like roughly 100k downloads in the last year, but hard to say how many actual users that represents.) I'm happy to go through the bureaucratic side of that if you'd like. |
|
@s3cur3 sure go for it! |
|
@s3cur3 actually hold off. |
|
FWIW, I emailed cna@erlef.org after your first message, but I haven't heard back. |
|
Thank you, sorry about the confusion there. |
|
@dweill It would be great if you could add this affected package to the advisory. This way, the vulnerability will be correctly linked to the hex.pm package and scanners have a chance to pick it up.
The EEF CNA would be happy to support you for future vulnerabilities and we do offer a GitHub Advisory based workflow. That way we can ensure that vulnerabilities in our ecosystem have actionable metadata. https://cna.erlef.org/contact |
|
@maennchen done! |

The worker protocol previously had no way to correlate a response with its originating request. When a call timed out, the late response from Node.js would sit in the GenServer mailbox and get picked up by the next call's
receive, returning completely wrong data.This was particularly bad under concurrent load (e.g. async ExUnit tests sharing a pool of workers) where timeouts could cascade and cause responses to be delivered to the wrong callers.
The fix adds a unique ID (monotonic counter per worker) to each request. Node.js echoes the ID back in the response prefix. The worker's
get_responsenow only accepts responses matching the current request's ID, discarding any stale responses from previously timed-out calls.Also fixes a latent bug in server.js where
fileExists()(async) was not awaited, causing theifcheck to always evaluate as truthy.