Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] npm silently fails when unable to write cache directory #4996

Closed
2 tasks done
Hi-Fi opened this issue Jun 8, 2022 · 7 comments · Fixed by #5197
Closed
2 tasks done

[BUG] npm silently fails when unable to write cache directory #4996

Hi-Fi opened this issue Jun 8, 2022 · 7 comments · Fixed by #5197
Assignees
Labels
Bug thing that needs fixing Priority 1 high priority issue Release 8.x work is associated with a specific npm 8 release

Comments

@Hi-Fi
Copy link

Hi-Fi commented Jun 8, 2022

Is there an existing issue for this?

  • I have searched the existing issues

This issue exists in the latest npm version

  • I am using the latest npm

Current Behavior

When running older official node image, npm works fine:

docker container run --rm -it -u nobody:nobody --entrypoint /bin/sh node:16.14-alpine -c "id && npm -v || echo Exit"
uid=65534(nobody) gid=65534(nobody)
8.5.0

With latest one, npm just silently exits:

docker container run --rm -it -u nobody:nobody --entrypoint /bin/sh node:16-alpine -c "id && npm -v || echo Exit"
uid=65534(nobody) gid=65534(nobody)

Exit

Custom image that updates NPM:

docker container run --rm -it -u nobody:nobody --entrypoint /bin/sh npm -c "id && npm -v || echo Exit"
uid=65534(nobody) gid=65534(nobody)

Exit
docker container run --rm -it -u node:node --entrypoint /bin/sh npm -c "id && npm -v || echo Exit"
uid=1000(node) gid=1000(node)
8.12.1

Expected Behavior

docker container run --rm -it -u nobody:nobody --entrypoint /bin/sh node:16-alpine -c "id && npm -v || echo Exit"
uid=65534(nobody) gid=65534(nobody)
8.11.0

Steps To Reproduce

With docker behavior steps can be used to reproduce. For newer version image creation is needed, as user can't be changed in shell to nobody.

Custom image created with Dockerfile:

FROM node:16-alpine

RUN npm i -g npm && npm -v

USER nobody

Environment

No response

@Hi-Fi Hi-Fi added Bug thing that needs fixing Needs Triage needs review for next steps Release 8.x work is associated with a specific npm 8 release labels Jun 8, 2022
@Hi-Fi Hi-Fi changed the title [BUG] npm silently fails with used nobody [BUG] npm silently fails with user nobody Jun 8, 2022
@patchon
Copy link

patchon commented Jun 8, 2022

The issue seems to be related to the creation of the "cache-directory".
Setting the npm_config_cache to a writable path (ie. /tmp/) works.

$ > npm --verbose ; echo $?

243

$ > npm_config_cache=/tmp/ npm --version ; echo $?
8.11.0
0

Above is from node:16-alpine-latest in an OpenShift environment (ie. where an arbitrary user id is used).

I traced it down to the following code,

# /usr/local/lib/node_modules/npm/lib/npm.js
243     // mkdir this separately since the logs dir can be set to
244     // a different location. an error here should be surfaced
245     // right away since it will error in cacache later
246     await this.time('npm:load:mkdirpcache', () =>
247       fs.mkdir(this.cache, { recursive: true, owner: 'inherit' }))

Unfortunately my nodejs skills stops here and I'm not really sure what the comment is referring to, nor why we don't see any output from any log.XX code. Adding a catch to above, makes the code work, however I'm not sure if that is the right approach, I'll leave that to someone else.

diff --git a/npm.js b/npm.js
index 2197f11..1b940be 100644
--- a/npm.js
+++ b/npm.js
@@ -244,7 +244,8 @@ class Npm extends EventEmitter {
     // a different location. an error here should be surfaced
     // right away since it will error in cacache later
     await this.time('npm:load:mkdirpcache', () =>
-      fs.mkdir(this.cache, { recursive: true, owner: 'inherit' }))
+      fs.mkdir(this.cache, { recursive: true, owner: 'inherit' })
+        .catch((e) => log.warn('cachedir', `could not create cache-dir: ${e}`)))

With the above patch you'll get something like,

$ > npm --version
npm WARN cachedir could not create cache-dir: Error: EACCES: permission denied, mkdir '/.npm'
npm WARN logfile could not create logs-dir: Error: EACCES: permission denied, mkdir '/.npm'
npm WARN logfile could not be created: Error: ENOENT: no such file or directory, open '/.npm/_logs/2022-06-08T14_58_12_496Z-debug-0.log'
8.11.0

At least that would have pointed us in the right direction from the start.

Sorry for the long post, here's a 🥔😄

¯\(ツ)

@FelipeEmerim
Copy link

We use k8s with readOnlyRootFilesystem: true. We added an emptyDir volume mounted on /tmp and set the npm_config_cache env var to /tmp/npm/. Because the volume is mounted with 777 permissions it solved our problem without downgrading the node image.

@soulchild
Copy link

Wow! This just broke all of our applications on k8s and it took me half a day to figure out what's going on because there was no output whatsoever (Note to future self: Check npm issues first). Will this be treated as a bug which will be fixed or are we going to have to patch each and every one of our applications? 🙈

@chris-addison
Copy link

This made us lose two days of dev time. Took over a day to narrow the issue down to npm. Changing the alpine image back to 3.15 fixed it for us.

@XhmikosR
Copy link
Contributor

Could this be related/the same as #4838?

@soulchild
Copy link

We use k8s with readOnlyRootFilesystem: true. We added an emptyDir volume mounted on /tmp and set the npm_config_cache env var to /tmp/npm/. Because the volume is mounted with 777 permissions it solved our problem without downgrading the node image.

I just realized we had securityContext.runAsUser and securityContext.runAsGroup set to an arbitrary user id in our k8s deployment. After changing this to uid/gid 1000, which is the node user that comes with the node:16-alpine base image, the problem went away.

MadLittleMods added a commit to matrix-org/matrix-viewer that referenced this issue Jun 15, 2022
Fix image not running in K8s (ROSA)

Trying to solve the container exiting with error code `243`. `npm` throws `243` for any command you try to run with it.

Here is how to get into the container while the entrypoint normally errors.
```
$ oc run -i --tty --image=ghcr.io/matrix-org/matrix-public-archive/matrix-public-archive:sha-65edaea1a9713bb2cc93fa98638327fdeff765cd mpa-test2 --port=3050 --restart=Never --env="DOMAIN=cluster" --command /bin/bash
$ npm start

$ echo $?
243
```

Why does the same image work in Docker?
```
$ docker run -it --entrypoint /bin/bash ghcr.io/matrix-org/matrix-public-archive/matrix-public-archive:sha-65edaea1a9713bb2cc93fa98638327fdeff765cd
$ npm start
# it starts ✅
```


## Root cause

Seems like it could be either of these issues. Both are `8.6.0`+ problems as well which lines up with the `npm` version available in each base image.

 - npm/cli#4769
 - npm/cli#4996
    - Specific comment about breaking stuff in k8s for other people, npm/cli#4996 (comment)
@fritzy fritzy changed the title [BUG] npm silently fails with user nobody [BUG] npm silently fails when unable to write cache directory Jun 30, 2022
@fritzy
Copy link
Contributor

fritzy commented Jun 30, 2022

It looks like the exit-handler is not properly reporting ENOENT errors when lib/npm.js fails to create the cache directory.

@fritzy fritzy added Priority 1 high priority issue and removed Needs Triage needs review for next steps labels Jun 30, 2022
@lukekarrys lukekarrys self-assigned this Jul 14, 2022
lukekarrys added a commit that referenced this issue Jul 20, 2022
This also changes all the log messages about not being able to create
initial directories and files to `log.verbose` since we know run those
commands on init. There are a lot of valid reasons why those might fail,
and we don't want to show a warning for them every time.

Fixes: #4769
Fixes: #4838
Fixes: #4996
lukekarrys added a commit that referenced this issue Jul 20, 2022
This also changes all the log messages about not being able to create
initial directories and files to `log.verbose` since we know run those
commands on init. There are a lot of valid reasons why those might fail,
and we don't want to show a warning for them every time.

Fixes: #4769
Fixes: #4838
Fixes: #4996
lukekarrys added a commit that referenced this issue Jul 20, 2022
This also changes all the log messages about not being able to create
initial directories and files to `log.verbose` since we know run those
commands on init. There are a lot of valid reasons why those might fail,
and we don't want to show a warning for them every time.

Fixes: #4769
Fixes: #4838
Fixes: #4996
lukekarrys added a commit that referenced this issue Jul 20, 2022
This also changes all the log messages about not being able to create
initial directories and files to `log.verbose` since we know run those
commands on init. There are a lot of valid reasons why those might fail,
and we don't want to show a warning for them every time.

Fixes: #4769
Fixes: #4838
Fixes: #4996
lukekarrys added a commit that referenced this issue Jul 20, 2022
This also changes all the log messages about not being able to create
initial directories and files to `log.verbose` since we know run those
commands on init. There are a lot of valid reasons why those might fail,
and we don't want to show a warning for them every time.

Fixes: #4769
Fixes: #4838
Fixes: #4996
lukekarrys added a commit that referenced this issue Jul 20, 2022
This also changes all the log messages about not being able to create
initial directories and files to `log.verbose` since we know run those
commands on init. There are a lot of valid reasons why those might fail,
and we don't want to show a warning for them every time.

Fixes: #4769
Fixes: #4838
Fixes: #4996
lukekarrys added a commit that referenced this issue Jul 20, 2022
This also changes all the log messages about not being able to create
initial directories and files to `log.verbose` since we know run those
commands on init. There are a lot of valid reasons why those might fail,
and we don't want to show a warning for them every time.

Fixes: #4769
Fixes: #4838
Fixes: #4996
lukekarrys added a commit that referenced this issue Jul 21, 2022
This also changes all the log messages about not being able to create
initial directories and files to `log.verbose` since we know run those
commands on init. There are a lot of valid reasons why those might fail,
and we don't want to show a warning for them every time.

Fixes: #4769
Fixes: #4838
Fixes: #4996
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug thing that needs fixing Priority 1 high priority issue Release 8.x work is associated with a specific npm 8 release
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants