-
-
Notifications
You must be signed in to change notification settings - Fork 34.7k
fix: eliminate TOCTOU Race Condition (Hackerone report #3407207) #61664
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
992ba41 to
98018fc
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #61664 +/- ##
==========================================
+ Coverage 88.86% 89.75% +0.88%
==========================================
Files 674 675 +1
Lines 204394 204628 +234
Branches 39188 39327 +139
==========================================
+ Hits 181634 183662 +2028
+ Misses 15014 13239 -1775
+ Partials 7746 7727 -19
🚀 New features to boost your workflow:
|
addaleax
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Please fix the first line of the commit message in line with commit message guidelines, if possible (e.g. worker: eliminate race condition in process.cwd()), and feel free to take as much context from the Problem/Solution sections of the PR description and add it to the commit message body itself.
Calling this a vulnerabillity in Node.js would go a bit far, I think. It's a race condition, but as mentioned in the comments here, if an application relies on the specific relative timing of process.chdir() in the main thread and process.cwd() in the worker thread, it should already have implemented its own synchronization logic for that.
In other words, if this does result in a vulnerability in an application that doesn't explicitly synchronize already, then that application will likely still be vulnerable after this fix.
98018fc to
d1e7867
Compare
Fixes a race condition in worker thread cwd caching where the counter is incremented before the directory change completes, allowing workers to cache stale directory values. In lib/internal/worker.js, the main thread's process.chdir() wrapper previously incremented the shared counter before calling the actual chdir(), creating a race window where workers could read the old directory but cache it with the new counter value. This caused subsequent cwd() calls to return incorrect paths until the next chdir(). This fix reorders the operations to change the directory first, then increment the counter, ensuring workers are only notified after the directory change completes. Before fix: 54.28% error rate (311/573 races) After fix: 0% error rate (0/832 races) Refs: https://hackerone.com/reports/3407207 Co-authored-by: Giulio Comi Co-authored-by: Caleb Everett Co-authored-by: Utku Yildirim
d1e7867 to
7b093bf
Compare
|
@addaleax , thanks fixed commit title and integrated context from the PR into commit message. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
Change is good, but the tests do need addressing.
This is a very large test burden for a very small behaviour. For me, this single test is taking around 10s to run, at 130% CPU usage.
As general points:
- There's no advantage to adding test files in TS rather than JS, unless it's Node.js's TS support itself that's being tested. There's no type validation here, it all just gets stripped, and it requires the additional burden of "transpiling" the test script to JS every time it's run.
- Adding mock proofs-of-concept to the test suite isn't useful; they can be illustrative in a PR description, but it's only the actual API behaviour that needs testing.
To test this change, you don't need to pummel workers to try and force race conditions. It's possible to wrap the internal chdir method (the one called as originalChdir() in internal/worker) to directly simulate a worker calling process.cwd() during a slow chdir syscall.
Something along the lines of the following would do the trick:
test/parallel/test-worker-cwd-race-condition.js
// Flags: --expose-internals --no-warnings
'use strict';
const common = require('../common');
const { internalBinding } = require('internal/test/binding');
const assert = require('assert');
const { Worker } = require('worker_threads');
const processBinding = internalBinding('process_methods');
const originalChdir = processBinding.chdir;
const cwdOriginal = process.cwd();
const i32 = new Int32Array(new SharedArrayBuffer(12));
processBinding.chdir = common.mustCall(function chdir(path) {
// Signal to the worker that we're inside the chdir call
Atomics.store(i32, 0, 1);
Atomics.notify(i32, 0);
// Pause the chdir call while the worker calls process.cwd(),
// to simulate a race condition
Atomics.wait(i32, 1, 0);
return originalChdir(path);
});
const worker = new Worker(`
const {
parentPort,
workerData: { i32 },
} = require('worker_threads');
// Wait until the main thread has entered the chdir call
Atomics.wait(i32, 0, 0);
const cwdDuringChdir = process.cwd();
// Signal the main thread to continue the chdir call
Atomics.store(i32, 1, 1);
Atomics.notify(i32, 1);
// Wait until the main thread has left the chdir call
Atomics.wait(i32, 2, 0);
const cwdAfterChdir = process.cwd();
parentPort.postMessage({ cwdDuringChdir, cwdAfterChdir });
`, {
eval: true,
workerData: { i32 },
});
worker.on('exit', common.mustCall());
worker.on('error', common.mustNotCall());
worker.on('message', common.mustCall(({ cwdDuringChdir, cwdAfterChdir }) => {
assert.strictEqual(cwdDuringChdir, cwdOriginal);
assert.strictEqual(cwdAfterChdir, process.cwd());
}));
process.chdir('..');
// Signal to the worker that the chdir call is completed
Atomics.store(i32, 2, 1);
Atomics.notify(i32, 2);There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Renegade334 , thank you for the feedback on the tests!
i have now committed (180906e) the new .js test file according to your great example
Replace the heavy .mts proof-of-concept test with a lightweight deterministic .js test as suggested by @Renegade334. The new test uses SharedArrayBuffer and Atomics to directly simulate a worker calling process.cwd() during a slow chdir syscall, rather than pummeling workers to force race conditions. This is faster, more reliable, and follows Node.js test conventions.
Renegade334
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fails on main as expected, with the post-race cwd() call returning a stale value.
|
i get @Renegade334 / @addaleax would you mind merging on my behalf, please? |
|
The new test is failing when run as a worker (i.e. e.g. https://ci.nodejs.org/job/node-test-commit-custom-suites-freestyle/45886/console The usual way we handle this is, e.g. node/test/parallel/test-process-chdir.js Lines 7 to 11 in 4dc0d20
|
|
@giulioAZ are you able to add the linked skip logic to the test? |
Summary
Fixes a Time-of-Check Time-of-Use race condition in worker thread process.cwd() caching where the counter is incremented before the directory change completes, allowing workers to cache stale directory values.
Description: The main thread's process.chdir() wrapper previously incremented the shared counter before calling the actual chdir(), creating a race window where workers could read the old directory but cache it with the new counter value. This caused subsequent cwd() calls to return incorrect paths until the next chdir().
Proof of Concept: included in the
test/parallel/test-worker-cwd-race-condition.jsand already reviewed and validated by Node.js security team here (https://hackerone.com/reports/3407207).Fix: this fix reorders the operations to change the directory first, then increment the counter, ensuring workers are only notified after the directory change completes.
CVSS 3.0: 3.6 (Low) - AV:L/AC:L/PR:L/UI:N/S:U/C:L/I:L/A:N
Problem
In
lib/internal/worker.js, the main thread'sprocess.chdir()wrapper increments the shared counter before calling the actualchdir():This creates a race window where:
Counter increments (workers see "cwd changed")
Worker calls cwd() during this window
Worker reads old directory but caches it with new counter
Directory actually changes
Worker's cache is now stale and persists until next chdir()
Race Window:
Solution
Swap the order - change directory first, then increment counter:
This ensures workers are only notified after the directory change completes, transforming the race into safe eventual consistency.
After Fix:
Impact
process.chdir()with worker threads, including:Proof of Concept
Tested with test/parallel/test-worker-cwd-race-condition.js on Node.js v25.0.0:
Before fix:
Test case: real process
cwd in worker thread reflected stale value
errors: 54.28% (311/573 races)
After fix:
Test case: fixed process
cwd in worker thread always had expected value
errors: 0% (0/832 races)
Performance Impact
Negligible: same atomic operations, just reordered:
Cache hit rate: Unchanged
Latency: Workers may cache fresh data slightly longer (safer)
Thread safety: Maintained via atomic operations
Related
Original implementation: commit 1d022e8 (April 14, 2019, PR #27224)
CWE-367: Time-of-Check Time-of-Use (TOCTOU) Race Condition
CVSS 3.0: 3.6 (Low) - AV:L/AC:L/PR:L/UI:N/S:U/C:L/I:L/A:N
HackerOne Report: https://hackerone.com/reports/3407207
Credits
Giulio Comi, Caleb Everett, Utku Yildirim