"Fatal error, unreachable code" seen intermittently on v20.10.0 #51555
Comments
We are seeing similar issues with Node.js 20 on the Sentry JavaScript SDK repository. I couldn't find a repro either. @joyeecheung Any suggestions?
Without a repro it's hard to tell what's going on here. From the stack trace it could be a V8 bug.
By the way, if a core dump is available, that would make it easier to get clues about what was happening.
Hello! I'm currently facing what seems to be the same issue in CI. This occurs on Node 20 (specifically 20.8.0, 20.10.0, and 20.11.0). Our build workers run on Amazon Linux 2023 and we build with 4 parallel processes with Nx. I did retrieve core dumps of these crashes; unfortunately I'm unable to provide them, but I can provide the stack traces from gdb. This crash:
Has the following stack trace in gdb:
And this crash:
With the following stack trace in gdb:
Occasionally the following error will occur as well:
However, I have been unable to collect a core dump of this error occurring. Interestingly, these errors typically coincide with Nx printing: Hopefully this can help with this issue. Thanks!
IIUC, this and the OP were both running into issues in a similar setup in actions/setup-node#887 (with GitHub Actions)? From the OP of that issue, I suspect the caching GitHub does could interact poorly with a userland dependency that makes use of vm.Script. It's also possible that such a dependency is monkey-patching the built-in CJS loader to compile code with a code cache, and for some reason the cache it uses mismatches the Node.js version in CI, which is why the OP of the other issue reported seeing wrong Node.js versions. AFAIK, one popular package that does this is https://www.npmjs.com/package/v8-compile-cache - does this package show up in your dependencies? If it does, does setting DISABLE_V8_COMPILE_CACHE=1 make the crash go away?
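For readers unfamiliar with the pattern, here is a minimal sketch of what such a package does, assuming an in-memory map in place of a real on-disk cache; it illustrates the general technique only, not v8-compile-cache's actual code:

```js
'use strict';
// Sketch only: the general "compile CJS modules through vm.Script with a code
// cache" pattern. The in-memory Map stands in for a persistent on-disk cache.
const vm = require('vm');
const Module = require('module');
const path = require('path');

const codeCache = new Map(); // hypothetical cache keyed by filename

Module.prototype._compile = function (content, filename) {
  // Wrap in the usual (exports, require, module, __filename, __dirname) function.
  const wrapped = Module.wrap(content);
  const script = new vm.Script(wrapped, {
    filename,
    cachedData: codeCache.get(filename),       // a stale or corrupted blob here is the suspected failure mode
    produceCachedData: !codeCache.has(filename),
  });
  if (script.cachedData) codeCache.set(filename, script.cachedData);
  const compiledWrapper = script.runInThisContext();
  return compiledWrapper.call(
    this.exports,
    this.exports,
    this.require.bind(this),
    this,
    filename,
    path.dirname(filename)
  );
};
```

If a persisted cachedData blob produced by one Node.js/V8 build is fed back into a different build (as can happen with CI caching), V8 is supposed to reject it and recompile, but the reports in this thread suggest a corrupted or mismatched blob can sometimes crash the process instead.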
Also, I am not sure if it still works on v20, but can you try using LLDB and https://github.com/nodejs/llnode to see if you can print the JS stack trace (the command is "v8 bt" with the plugin) from the core dump? The stack traces indicate that this crash comes from userland JS code, so that would help us pinpoint what JS code (likely a third-party package) is causing the crash.
From some quick searching, it seems pnpm is using v8-compile-cache: https://github.com/search?q=repo%3Apnpm%2Fpnpm%20v8-compile-cache&type=code And yarn seems to use it too, though I am not able to find where the code is: yarnpkg/berry#5987 I noticed that v8-compile-cache only invalidates the cache when the content of the file changes and doesn't seem to do anything if the V8/Node.js version mismatches the cache. That could be a source of bugs like this. If …
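As a hypothetical illustration of that invalidation gap (not the package's real key derivation), compare a cache key derived only from file contents with one that also folds in the runtime versions:

```js
'use strict';
// Hypothetical illustration of the invalidation concern described above.
const crypto = require('crypto');
const fs = require('fs');

function contentOnlyKey(filename) {
  // Keyed only by file contents: a blob written by one Node.js/V8 build still
  // looks "valid" after the runtime is swapped out underneath it (e.g. by CI caching).
  return crypto.createHash('sha256').update(fs.readFileSync(filename)).digest('hex');
}

function versionAwareKey(filename) {
  // Folding the runtime versions into the key invalidates stale blobs
  // automatically whenever the Node.js or V8 version changes.
  return crypto.createHash('sha256')
    .update(fs.readFileSync(filename))
    .update(process.version)       // e.g. 'v20.10.0'
    .update(process.versions.v8)   // e.g. '11.3.244.8-node.25'
    .digest('hex');
}

console.log(contentOnlyKey(__filename) !== versionAwareKey(__filename)); // true
```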
Thank you for investigating this!
I'm running into the same issue, but not with GitHub Actions. I'm running into this issue with an Atlassian Bamboo elastic build agent. This is occurring on a standard AWS EC2 instance running Amazon Linux. Additionally, I've tested multiple Node 20 versions. Interestingly, this crash does not seem to occur on Windows. We have a project within the same monorepo that is built on Windows. As a result, pnpm and Nx are used there as well.
I'll give this a try and report back
I will also give this a try and report back! Thank you!
I set … Also, it appears llnode still works with Node 20 :D. It also seems core dump 1 has the following stack trace:
core dump 2 has the following stack trace:
It looks like the last JS function is … I inspected the two parameters for this function:
For both core dumps, the values of these variables are the same. Let me know if anything else is required. Thanks!
We tried setting …
It works as a workaround in my project. The use case is that the project has lots of AWS Lambda functions written in TS; the intermittent crash happened when using … in parallel.
**Description**
* Add DISABLE_V8_COMPILE_CACHE flag to fix build failure

**Motivation**
* Seeing the error below while building the package with Node.js 20:
```
Error: Command failed: yarn list --prod --json
Fatal error in , line 0
Check failed: current == end_slot_index.
```
* Per the recommendation in nodejs/node#51555, setting this flag

**Testing Done**
* Dry-run building this package after setting this flag. The error no longer shows up.

**Backwards Compatibility Criteria (if any)**
* N/A
We've started to see this crash in our CI agents at Canva.
It's worth noting that this is the first time we've seen this crash: we run Node across most of our CI and don't see such crashes elsewhere; it has only appeared since we switched to hyper-parallelising on one machine. Note:
I'm still investigating to try to repro or narrow down the cause, but I'm posting because it does seem related.
I suspect node::contextify::ContextifyScript might be the root cause of this.
Update: it turns out our engineers were just treating it as a flake and retrying without telling us; it's occurring quite frequently across our runs. I don't have an exact number of how often, but it's enough that I'll have to figure out an auto-retry workaround for now. I haven't had any luck with a reproduction though 😢
The root cause was likely incorrect cache usage by a userland package, or a corrupted cache provided by a userland package. node::contextify::ContextifyScript is only the API that these packages tend to use to load code cache (typically coupled with monkey-patching the CJS loader), like v8-compile-cache does, which is why DISABLE_V8_COMPILE_CACHE=1 works for some people affected by that package (this is not a Node.js configuration, just a configuration of that npm package). But that's not the only package that does this, and if you have another package in your dependencies that does this, you'd need to find what it is and what workaround it has. I am not sure what else we can do in this repository, given that from the stack traces this seems to be mostly caused by bugs in third-party packages (Node.js internally only uses node::contextify::ContextifyScript in a few places that are unlikely to be related to this, e.g. in the REPL). By default CJS modules are not compiled in this path; it's usually used by third-party packages monkey-patching the CJS loader.
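For reference, here is a minimal sketch of the cachedData round-trip through vm.Script, which is the public API that node::contextify::ContextifyScript backs; the cache file path and source string are made up for illustration:

```js
'use strict';
// Sketch of the cachedData round-trip through vm.Script. Paths are illustrative.
const vm = require('vm');
const fs = require('fs');
const os = require('os');
const path = require('path');

const source = 'module.exports = 1 + 1;';
const cachePath = path.join(os.tmpdir(), 'example-code-cache.bin');

// Producer (e.g. an earlier run, possibly on a different Node.js/V8 build):
const producer = new vm.Script(source, { filename: 'example.js' });
fs.writeFileSync(cachePath, producer.createCachedData());

// Consumer: feed the stored blob back in. A version-mismatched blob should be
// rejected (cachedDataRejected === true) and the source recompiled from scratch;
// the crashes in this thread suggest some corrupted blobs slip past that validation.
const blob = fs.readFileSync(cachePath);
const consumer = new vm.Script(source, { filename: 'example.js', cachedData: blob });
console.log('cache rejected:', consumer.cachedDataRejected);
```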
We're generally very careful to avoid … I'll try …
There is a chance of triggering a V8 crash when running the `yarn list --prod --json` command. Related discussion: nodejs/node#51555 This issue does not always occur. When we build `vscode-reh-linux-arm64` in the Dockerfile, there is more than a 50% chance of failure. If this PR is merged, the code related to DISABLE_V8_COMPILE_CACHE in vscode can also be removed (https://github.com/search?q=repo%3Amicrosoft%2Fvscode%20DISABLE_V8_COMPILE_CACHE&type=code). Globally setting `DISABLE_V8_COMPILE_CACHE=1` will increase the entire build time by 3 to 5 minutes. Signed-off-by: Kevin Cui <[email protected]>
Update: It's been a few weeks and it looks like …
Had the same error on a CI pipeline today
|
I'm not sure if this is the same issue, though the failure message is the same. My stack trace is different and much shorter (and has no …).
I am able to reproduce this 100% with a pristine immich container, and so is someone else using a similar CPU and hypervisor OS. I have been able to capture 4 core files of ~240MB each. I am not able to work around it with …
The message indicates an … (see node/deps/v8/src/objects/objects.cc, line 1237, at 5ab3140)
|
I also encountered this issue frequently in NAPI-RS unit tests recently, as shown in this example: https://github.com/napi-rs/napi-rs/actions/runs/11315439974/job/31466666494?pr=2304 It only shows on … The output suggests it might happen in this test case: https://github.com/napi-rs/napi-rs/blob/napi%403.0.0-alpha.13/examples/napi/tests/worker-thread.spec.ts#L51. This test case creates 100 small buffers in …
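For context, here is a rough plain-Node.js approximation of that shape, assuming the buffers are created inside a worker thread as the linked spec's name suggests; the real test exercises NAPI-RS native bindings, which this sketch does not:

```js
'use strict';
// Rough approximation of "create many small buffers in a worker thread".
const { Worker, isMainThread, parentPort } = require('worker_threads');

if (isMainThread) {
  const worker = new Worker(__filename);
  worker.on('message', (count) => {
    console.log(`worker created ${count} buffers`);
    worker.terminate();
  });
} else {
  // Create 100 small buffers inside the worker thread.
  const buffers = [];
  for (let i = 0; i < 100; i++) {
    buffers.push(Buffer.alloc(16, i));
  }
  parentPort.postMessage(buffers.length);
}
```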
This commit aims to remove a flaky crash in CI builds caused by Node. We follow the solution proposed in nodejs/node#51555 and disable the V8 compile cache.
Version
v20.10.0
Platform
Linux codespaces-110a35 6.2.0-1018-azure #18~22.04.1-Ubuntu SMP Tue Nov 21 19:25:02 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Subsystem
No response
What steps will reproduce the bug?
Sorry, I don't have an easy repro.
How often does it reproduce? Is there a required condition?
I can't repro it reliably; it happens intermittently. In CI, I'm seeing it a few times per day.
What is the expected behavior? Why is that the expected behavior?
It shouldn't crash
What do you see instead?