Node 20.3 Crashes all the time when executed inside docker #48444

Closed
ronag opened this issue Jun 13, 2023 · 80 comments · Fixed by libuv/libuv#4059
Labels
libuv Issues and PRs related to the libuv dependency or the uv binding.

Comments

@ronag
Member

ronag commented Jun 13, 2023

  • Node 20.3.0 crashes on start in non-bullseye docker container
  • In bullseye container node-gyp-build (yarn add bufferutil) fails with Text file busy
  • In bullseye container + UV_USE_IO_URING=0 everything works
tniessen added the libuv label on Jun 13, 2023
ronag changed the title from "Node 20.3 breaks bufferutil build?" to "Node 20.3 breaks builds?" on Jun 14, 2023
@ronag
Member Author

ronag commented Jun 14, 2023

Another possible ref electron/rebuild#1085

@ronag
Member Author

ronag commented Jun 14, 2023

@nodejs/libuv

@bnoordhuis
Member

"Text file busy" means trying to write a shared object or binary that's already in use.

My hunch is that node-gyp has some race condition in reading/writing files that wasn't manifesting (much) when everything still went through the much slower thread pool, whereas io_uring is fast enough to make it much more visible.
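
For context, here is a minimal sketch of how "Text file busy" (ETXTBSY) can be provoked on Linux, assuming a kernel that refuses to execute a file while any process still holds it open for writing. This is not node-gyp's code path; the function name `etxtbsy_demo` and the choice of file descriptor 9 are made up for illustration.

```shell
# Illustration only: execute a script while a write descriptor to it is still open.
etxtbsy_demo() {
  script=$(mktemp) || return 1
  printf '#!/bin/sh\necho ok\n' > "$script"
  chmod +x "$script"

  exec 9>>"$script"    # keep the file open for writing (fd 9 is arbitrary)
  # On Linux this exec attempt is expected to fail with "Text file busy".
  "$script" 2>/dev/null && status=0 || status=$?
  echo "exit status while a writer is open: $status"

  exec 9>&-            # close the writer
  "$script"            # the same file now runs normally
  rm -f "$script"
}
```

If a writer closes the file only slightly later than the program that wants to execute it expects, the same error would appear intermittently, which fits the race-condition hunch.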

@ronag
Member Author

ronag commented Jun 14, 2023

Is there a way to disable io_uring using an environment variable as a temporary workaround when running node-gyp?

@bnoordhuis
Member

Yes, set UV_USE_IO_URING=0 in the environment. Use at your own risk: it is not a stable thing and will disappear again in the future.
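
As a sketch of the workaround, the variable can be scoped to a single command rather than exported for the whole shell session. The wrapper name `without_io_uring` is hypothetical, and `UV_USE_IO_URING` itself is described above as unstable.

```shell
# Hypothetical helper: run one command with io_uring disabled in libuv.
# `env` sets the variable only for the child process, so the calling shell
# is left unchanged.
without_io_uring() {
  env UV_USE_IO_URING=0 "$@"
}

# Usage (assumes yarn is installed):
#   without_io_uring yarn add bufferutil
```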

@ronag
Member Author

ronag commented Jun 15, 2023

test-srv3.hq.lan:~# docker run -it node:20.3.0 bash
root@6f6c66eb5077:/# UV_USE_IO_URING=0 yarn add bufferutil
node[8]: ../src/node_platform.cc:68:std::unique_ptr<long unsigned int> node::WorkerThreadsTaskRunner::DelayedTaskScheduler::Start(): Assertion `(0) == (uv_thread_create(t.get(), start_thread, this))' failed.
 1: 0xc8e4a0 node::Abort() [node]
 2: 0xc8e51e  [node]
 3: 0xd0a059 node::WorkerThreadsTaskRunner::WorkerThreadsTaskRunner(int) [node]
 4: 0xd0a17c node::NodePlatform::NodePlatform(int, v8::TracingController*, v8::PageAllocator*) [node]
 5: 0xc4bbc4 node::V8Platform::Initialize(int) [node]
 6: 0xc49408  [node]
 7: 0xc497db node::Start(int, char**) [node]
 8: 0x7ff94ec5818a  [/lib/x86_64-linux-gnu/libc.so.6]
 9: 0x7ff94ec58245 __libc_start_main [/lib/x86_64-linux-gnu/libc.so.6]
10: 0xba9ade _start [node]
Aborted

ronag changed the title from "Node 20.3 breaks builds?" to "Node 20.3 Crashes" on Jun 15, 2023
@ronag
Member Author

ronag commented Jun 15, 2023

This is actually worse than I thought. Node doesn't run at all with 20.3

@ronag
Member Author

ronag commented Jun 15, 2023

test-srv3.hq.lan:~# docker run -it node:20.3.0 node
node[1]: ../src/node_platform.cc:68:std::unique_ptr<long unsigned int> node::WorkerThreadsTaskRunner::DelayedTaskScheduler::Start(): Assertion `(0) == (uv_thread_create(t.get(), start_thread, this))' failed.
 1: 0xc8e4a0 node::Abort() [node]
 2: 0xc8e51e  [node]
 3: 0xd0a059 node::WorkerThreadsTaskRunner::WorkerThreadsTaskRunner(int) [node]
 4: 0xd0a17c node::NodePlatform::NodePlatform(int, v8::TracingController*, v8::PageAllocator*) [node]
 5: 0xc4bbc4 node::V8Platform::Initialize(int) [node]
 6: 0xc49408  [node]
 7: 0xc497db node::Start(int, char**) [node]
 8: 0x7f5c393be18a  [/lib/x86_64-linux-gnu/libc.so.6]
 9: 0x7f5c393be245 __libc_start_main [/lib/x86_64-linux-gnu/libc.so.6]
10: 0xba9ade _start [node]
test-srv3.hq.lan:~# docker run -it node:20.2.0 node
Welcome to Node.js v20.2.0.
Type ".help" for more information.
> 

@mcollina
Member

mcollina commented Jun 15, 2023

I can reproduce this, and this is quite critical.

@mcollina
Member

cc @nodejs/tsc for visibility

mcollina added the confirmed-bug label ("Issues with confirmed bugs.") on Jun 15, 2023
mcollina changed the title from "Node 20.3 Crashes" to "Node 20.3 Crashes all the time when executed inside docker" on Jun 15, 2023
@richardlau
Member

richardlau commented Jun 15, 2023

FWIW on two systems I have access to (a Red Hat owned RHEL 8 machine and test-digitalocean-ubuntu1804-docker-x64-1 from the Build infra) docker run -it node:20.3.0 node is fine:

root@test-digitalocean-ubuntu1804-docker-x64-1:~# docker run -it node:20.3.0 node
Unable to find image 'node:20.3.0' locally
20.3.0: Pulling from library/node
bba7bb10d5ba: Pull complete
ec2b820b8e87: Pull complete
284f2345db05: Pull complete
fea23129f080: Pull complete
9063cd8e3106: Pull complete
4b4424ee38d8: Pull complete
0b4eb4cbb822: Pull complete
43443b026dcf: Pull complete
Digest: sha256:fc738db1cbb81214be1719436605e9d7d84746e5eaf0629762aeba114aa0c28d
Status: Downloaded newer image for node:20.3.0
Welcome to Node.js v20.3.0.
Type ".help" for more information.
>

I can reproduce the assertion failure on an Ubuntu 16.04 host with node:20.3.0 but not with node:20.3.0-bullseye:

root@infra-digitalocean-ubuntu1604-x64-1:~# docker run -it node:20.3.0 node
Unable to find image 'node:20.3.0' locally
20.3.0: Pulling from library/node
bba7bb10d5ba: Pull complete
ec2b820b8e87: Pull complete
284f2345db05: Pull complete
fea23129f080: Pull complete
9063cd8e3106: Pull complete
4b4424ee38d8: Pull complete
0b4eb4cbb822: Pull complete
43443b026dcf: Pull complete
Digest: sha256:fc738db1cbb81214be1719436605e9d7d84746e5eaf0629762aeba114aa0c28d
Status: Downloaded newer image for node:20.3.0
node[1]: ../src/node_platform.cc:68:std::unique_ptr<long unsigned int> node::WorkerThreadsTaskRunner::DelayedTaskScheduler::Start(): Assertion `(0) == (uv_thread_create(t.get(), start_thread, this))' failed.
 1: 0xc8e4a0 node::Abort() [node]
 2: 0xc8e51e  [node]
 3: 0xd0a059 node::WorkerThreadsTaskRunner::WorkerThreadsTaskRunner(int) [node]
 4: 0xd0a17c node::NodePlatform::NodePlatform(int, v8::TracingController*, v8::PageAllocator*) [node]
 5: 0xc4bbc4 node::V8Platform::Initialize(int) [node]
 6: 0xc49408  [node]
 7: 0xc497db node::Start(int, char**) [node]
 8: 0x7f6e8486218a  [/lib/x86_64-linux-gnu/libc.so.6]
 9: 0x7f6e84862245 __libc_start_main [/lib/x86_64-linux-gnu/libc.so.6]
10: 0xba9ade _start [node]
root@infra-digitalocean-ubuntu1604-x64-1:~# docker run -it node:20.3.0-bullseye node
Unable to find image 'node:20.3.0-bullseye' locally
20.3.0-bullseye: Pulling from library/node
93c2d578e421: Already exists
c87e6f3487e1: Already exists
65b4d59f9aba: Already exists
d7edca23d42b: Already exists
25c206b29ffe: Already exists
599134452287: Pull complete
bd8a83c4c2aa: Pull complete
d11f4613ae42: Pull complete
Digest: sha256:ceb28814a32b676bf4f6607e036944adbdb6ba7005214134deb657500b26f0d0
Status: Downloaded newer image for node:20.3.0-bullseye
Welcome to Node.js v20.3.0.
Type ".help" for more information.
>

Our website build is actually broken when running apt update in the default Node.js LTS image based on Debian 12 (bookworm); we've switched to the Debian 11 (bullseye) based image for now: nodejs/build#3382

@targos
Member

targos commented Jun 15, 2023

FWIW I opened an issue about this in the docker-node repo: nodejs/docker-node#1918

TLDR: this is not a problem with Node.js itself, but with the default base OS used by the Docker image, which was upgraded for v20.3.0.

@ronag
Member Author

ronag commented Jun 15, 2023

bullseye works for me as well

@ronag
Member Author

ronag commented Jun 15, 2023

Now I also get the file busy error:

test-srv3.hq.lan:~# docker run -it node:20.3.0-bullseye bash
root@ed020dd3f80e:/# yarn add bufferutil
yarn add v1.22.19
info No lockfile found.
[1/4] Resolving packages...
[2/4] Fetching packages...
[3/4] Linking dependencies...
[4/4] Building fresh packages...
error /node_modules/bufferutil: Command failed.
Exit code: 126
Command: node-gyp-build
Arguments: 
Directory: /node_modules/bufferutil
Output:
/bin/sh: 1: node-gyp-build: Text file busy

EDIT: Works with UV_USE_IO_URING=0

@ronag
Member Author

ronag commented Jun 15, 2023

So to summarize:

  • Node 20.3.0 crashes on start in non-bullseye docker container
  • In bullseye container install fails with "node-gyp-build: Text file busy"
  • In bullseye container + UV_USE_IO_URING=0 everything works

@ronag
Member Author

ronag commented Jun 15, 2023

Should I split the uring problem into a separate issue?

@bnoordhuis
Member

Can someone post the result of strace -o trace.log -yy -f node app.js when it crashes with that uv_thread_create check? I expect to see a failing clone/clone2/clone3 system call but it'd be good to confirm.
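
Once such a log exists, a filter along these lines could narrow it down to the failing thread-creation calls. It is a sketch that assumes strace's usual `-o` line layout (`<pid> name(args) = -1 ERRNO (text)`, or `<pid> <... name resumed>...) = -1 ERRNO (text)` for interrupted calls); the function name is hypothetical.

```shell
# Hypothetical helper: list clone-family system calls that returned an error
# in an strace log file given as $1.
failed_clone_calls() {
  grep -E '(^|[^[:alnum:]_])clone[23]?(\(| resumed>).*= -1 E[A-Z]+' "$1"
}

# Usage:
#   strace -o trace.log -yy -f node app.js
#   failed_clone_calls trace.log
```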

@richardlau
Member

richardlau commented Jun 15, 2023

On the Ubuntu 16.04 infra machine I cannot run apt in the bookworm based node:20.3.0 or node:lts containers to install strace in them (it's not there by default).

Another data point: adding --security-opt=seccomp:unconfined makes this work on the Ubuntu 16.04 host:

root@infra-digitalocean-ubuntu1604-x64-1:~# docker run --security-opt=seccomp:unconfined -it node:20.3.0 node
Welcome to Node.js v20.3.0.
Type ".help" for more information.
>

@bnoordhuis
Member

Right, then I can predict with near 100% certainty what the problem is: docker doesn't know about the newish clone3 system call. Its seccomp filter rejects it with some bogus error and node consequently fails when it tries to start a new thread.

This docker seccomp problem is like clockwork: it pops up whenever a new system call starts to see broader use. It's quite possibly fixed in newer versions.
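
The reasoning can be restated as a small lookup. This is only a sketch under assumptions not stated in the thread: that newer glibc releases try clone3 first and retry with clone only when the call answers ENOSYS, and that an older default seccomp profile answers EPERM for system calls it does not know. The function name is hypothetical.

```shell
# Hypothetical helper: map the errno name that clone3 returned to its likely
# effect on thread creation.
clone3_outcome() {
  case "$1" in
    ENOSYS) echo "fallback to clone; thread creation can still succeed" ;;
    EPERM)  echo "no fallback; thread creation fails (likely a seccomp filter)" ;;
    *)      echo "unknown; inspect the strace output" ;;
  esac
}
```

Under these assumptions, an EPERM from clone3 would surface in Node.js as the failed `uv_thread_create` assertion shown earlier.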

@mcollina
Member

Updating docker to the latest version fixed it (v24.0.2) for me.

A few notes:

  • Ubuntu 22.04 LTS ships with Docker v20.x, which does not support this.
  • I did not test any version in between, and I couldn't quickly identify which Docker releases fixed it. From various comments in issues, it seems runc (a dependency of Docker) fixed it in v1.0.2.
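
To check a host against the versions mentioned above, a generic version comparison is enough. The sketch below assumes `sort -V` is available (GNU coreutils and BusyBox provide it); `version_ge` is a hypothetical helper, and 24.0.2 and 1.0.2 are only the versions reported in this thread, not confirmed minimums.

```shell
# Hypothetical helper: succeed when version $1 is greater than or equal to $2.
# Sorts both values in version order and checks that $2 comes first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage (assumes the docker CLI can reach a daemon):
#   version_ge "$(docker version --format '{{.Server.Version}}')" 24.0.2 \
#     && echo "at least the version reported working above"
```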

Here is what I think we should do:

  • document this error and the UV_USE_IO_URING=0 solution for v20
  • disable io_uring when we backport libuv in LTS lines

This seems a future-proof solution while keeping the current functionality available.

@bnoordhuis
Member

> document this error and the UV_USE_IO_URING=0 solution for v20

UV_USE_IO_URING is (intentionally) undocumented and going away again so don't do that.

@mcollina
Member

@bnoordhuis Would you just document this as "if you are hit by this bug, update docker"?

@ronag
Member Author

ronag commented Jun 16, 2023

I think there are two different things here. I'm not sure updating docker will help with the uring problem. Or does it? Please confirm.

@santigimeno
Member

santigimeno commented Jun 16, 2023

If I'm reading this correctly there are 2 separate issues here.

  • The crash in some instances. This seems to be directly related to a bug with docker and nothing to do with io_uring.
  • The Text file busy error, which might or might not be io_uring related but, at least, seems to be exacerbated by the use of io_uring.

I think we should try to understand better the 2nd issue before disabling it.

@kamalkech

I have the same problem. What is the correct solution?

@LeoK80

LeoK80 commented Apr 26, 2024

> i have the same, what is the correct solution ?

Try upgrading your container runtime (docker, containerd, etc.) to the latest version. If your package manager doesn't pick up anything newer, consider upgrading manually.

@kode54

kode54 commented May 18, 2024

Experiencing this with nodejs 22.0.0-1 from official Arch packages, on x86_64, as of kernel 6.9.1. (Also experienced it with -rc7 before that, so I downgraded to 6.8.9 at the time.)

@getong

getong commented May 18, 2024

archlinux 6.9.1, nodejs 22.0.0-1, same error

archlinux-github pushed a commit to archlinux/aur that referenced this issue May 18, 2024
node-gyp or node have a bug that prevents building with "text file busy"
if the kernel is too fast, so we have to disable IO_URING support. This
is clearly a hack and needs to be removed as soon as possible
nodejs/node#48444 is the necro bumped thread
originally from docker
@colddegree

Same problem

➜  ~ uname -a
Linux tuf 6.9.1-arch1-1 #1 SMP PREEMPT_DYNAMIC Fri, 17 May 2024 16:56:38 +0000 x86_64 GNU/Linux
➜  ~ pacman -Q | grep node        
node-gyp 10.1.0-2
nodejs 22.2.0-1
nodejs-nopt 7.2.0-1

@daniluk4000

Btw, we still use bullseye; it works pretty well.

@Vortetty

Vortetty commented May 21, 2024

Not to bring up an old issue again, but this appears to be a recurring bug. I can reproduce it consistently with kernel 6.9.1-arch1 and the node-gyp 10.1.0-2, nodejs 22.2.0-1, and nodejs-nopt 7.2.0-1 packages on Crystal Linux (basically Arch).

Setting UV_USE_IO_URING=0 fixes it, but I'd rather not have to use that if a fix can be found for newer kernels as well.

Given that this does fix it, it may also potentially be an issue with libuv-1.48.0-2.

mcollina reopened this on May 21, 2024
@mcollina
Member

cc @santigimeno can you take another look?

@richardlau
Member

> it is fixed with UV_USE_IO_URING=0, but i'd rather not have to use that if a fix can be found for newer kernels as well

This is supposed to be the default for Node.js (since the February security releases).

https://nodejs.org/docs/latest-v22.x/api/cli.html#uv_use_io_uringvalue

> io_uring is disabled by default due to security concerns.

@santigimeno
Member

santigimeno commented May 21, 2024

A patch has just been sent to the kernel fixing this:
https://lore.kernel.org/io-uring/[email protected]/T/#u

It should land in stable shortly:
https://lore.kernel.org/io-uring/[email protected]/T/#t

@santigimeno
Member

> This is supposed to be the default for Node.js (since the February security releases).
>
> https://nodejs.org/docs/latest-v22.x/api/cli.html#uv_use_io_uringvalue

I just tested with 22.2.0 installed from nvm and, as documented, io_uring is disabled there. Maybe there is a problem in the Arch Linux package?

@gengjiawen
Member

If you run into this on Ubuntu 16 or 18, my fix is to use Ubuntu >= 20.04. The issue actually comes from docker for me.

@kode54

kode54 commented May 22, 2024

> > This is supposed to be the default for Node.js (since the February security releases).
> > https://nodejs.org/docs/latest-v22.x/api/cli.html#uv_use_io_uringvalue
>
> I just tested with 22.2.0 installed from nvm and as documented, io_uring is disabled there. Maybe is there a problem in the arch linux package?

How is arch supposed to be disabling io_uring? It configures nodejs to use the system libuv, and builds its libuv with the default options.

@santigimeno
Member

> How is arch supposed to be disabling io_uring? It configures nodejs to use the system libuv, and builds its libuv with the default options.

That's likely the problem. For the security reasons mentioned above, Node.js patched its bundled libuv to disable io_uring by default in the following commits: 42e659c and 6d14352. Maybe the Arch packaging hasn't taken that into account?
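
A quick way to see which case applies on a given install is to ask the node binary how it was configured. This sketch assumes `process.config.variables.node_shared_libuv` is present in the build configuration, as it is in builds produced by Node.js's configure script; the function name is hypothetical.

```shell
# Hypothetical helper: succeed when node was built against the system libuv
# instead of its bundled copy.
node_uses_system_libuv() {
  [ "$(node -p 'process.config.variables.node_shared_libuv')" = "true" ]
}

# Usage:
#   node_uses_system_libuv && echo "system libuv: Node.js's io_uring default may not apply"
#   node -p 'process.versions.uv'   # libuv version in use
```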

@mcollina
Member

Thanks @santigimeno for the help debugging this.
This issue can indeed be closed.

@felixonmars
Contributor

felixonmars commented May 22, 2024

> > How is arch supposed to be disabling io_uring? It configures nodejs to use the system libuv, and builds its libuv with the default options.
>
> That's likely the problem. Due to the security reasons mentioned above node.js patched libuv to disable io_uring in the following commits: 42e659c and 6d14352. Maybe the arch packaging hasn't taken that into account?

I am the Arch packager and indeed I have missed this change.

Opened libuv/libuv#4416 to see if there is a better way forward.

aminvakil added a commit to aminvakil/aur that referenced this issue May 23, 2024
https://aur.archlinux.org/cgit/aur.git/commit/?h=thelounge&id=fd50c63

node-gyp or node have a bug that prevents building with "text file busy"
if the kernel is too fast, so we have to disable IO_URING support. This
is clearly a hack and needs to be removed as soon as possible
nodejs/node#48444 is the necro bumped thread
originally from docker
archlinux-github pushed a commit to archlinux/aur that referenced this issue May 23, 2024
https://aur.archlinux.org/cgit/aur.git/commit/?h=thelounge&id=fd50c63

node-gyp or node have a bug that prevents building with "text file busy"
if the kernel is too fast, so we have to disable IO_URING support. This
is clearly a hack and needs to be removed as soon as possible
nodejs/node#48444 is the necro bumped thread
originally from docker
archlinux-github pushed a commit to archlinux/aur that referenced this issue Jun 2, 2024