Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] npm start suddenly started giving npm ERR! Error: EACCES: permission denied, mkdir '/home/myuser' as of yesterday (2023-11-30) #7044

Open
2 tasks done
Foosvald opened this issue Dec 1, 2023 · 37 comments
Assignees
Labels
Bug thing that needs fixing Priority 1 high priority issue Release 10.x

Comments

@Foosvald
Copy link

Foosvald commented Dec 1, 2023

Is there an existing issue for this?

  • I have searched the existing issues

This issue exists in the latest npm version

  • I am using the latest npm

Current Behavior

We do npm start in a Docker container with a user which doesn't have a home directory. We've been doing this for several years without problems, but as of yesterday (2023-11-30) all containers suddenly get npm ERR! Error: EACCES: permission denied, mkdir '/home/myuser'. Setting NPM_CONFIG_UPDATE_NOTIFIER=false solves this problem, but we've already deployed a lot of containers without the update notifier disabled which causes a lot of problems for us.

Did something change in the response from the npm servers or do you know why this problem suddenly started happening yesterday?

Expected Behavior

npm start should start the script and not exit due to being unable to create a home folder.

Steps To Reproduce

Given the following code as repro.sh:

#!/bin/bash

cat <<EOF > Dockerfile
FROM node:20.10.0-buster-slim as production
WORKDIR /usr/src/my

COPY package.json ./

RUN groupadd -g 999 myuser && \
    useradd -r -u 999 -g myuser myuser

USER myuser

WORKDIR /usr/src/my/backend

CMD [ "npm", "start", "--verbose" ]
EOF

cat <<EOF > package.json
{
    "name": "myservice",
    "version": "1.0.0",
    "private": true,
    "scripts": {
        "start": "sleep 3 && echo hello!"
    }
}
EOF

docker build -f Dockerfile -t myservice:tag1 .
docker run -e NPM_CONFIG_UPDATE_NOTIFIER=$NPM_CONFIG_UPDATE_NOTIFIER --rm -it myservice:tag1

./repro.sh or NPM_CONFIG_UPDATE_NOTIFIER=true ./repro.sh causes npm to exit with the following log:

[+] Building 0.1s (10/10) FINISHED                                                                                                                                                                                                                                                                                                                   docker:desktop-linux
 => [internal] load build definition from Dockerfile                                                                                                                                                                                                                                                                                                                 0.0s
 => => transferring dockerfile: 274B                                                                                                                                                                                                                                                                                                                                 0.0s
 => [internal] load .dockerignore                                                                                                                                                                                                                                                                                                                                    0.0s
 => => transferring context: 2B                                                                                                                                                                                                                                                                                                                                      0.0s
 => [internal] load metadata for docker.io/library/node:20.10.0-buster-slim                                                                                                                                                                                                                                                                                          0.0s
 => [1/5] FROM docker.io/library/node:20.10.0-buster-slim                                                                                                                                                                                                                                                                                                            0.0s
 => [internal] load build context                                                                                                                                                                                                                                                                                                                                    0.0s
 => => transferring context: 180B                                                                                                                                                                                                                                                                                                                                    0.0s
 => CACHED [2/5] WORKDIR /usr/src/my                                                                                                                                                                                                                                                                                                                                 0.0s
 => CACHED [3/5] COPY package.json ./                                                                                                                                                                                                                                                                                                                                0.0s
 => CACHED [4/5] RUN groupadd -g 999 myuser &&     useradd -r -u 999 -g myuser myuser                                                                                                                                                                                                                                                                                0.0s
 => CACHED [5/5] WORKDIR /usr/src/my/backend                                                                                                                                                                                                                                                                                                                         0.0s
 => exporting to image                                                                                                                                                                                                                                                                                                                                               0.0s
 => => exporting layers                                                                                                                                                                                                                                                                                                                                              0.0s
 => => writing image sha256:1c9ddb6a1864b4ad4e88b0c503ae3d645b55b8f184c45afdb27cc5e99d54bd92                                                                                                                                                                                                                                                                         0.0s
 => => naming to docker.io/library/myservice:tag1                                                                                                                                                                                                                                                                                                                    0.0s

What's Next?
  View a summary of image vulnerabilities and recommendations → docker scout quickview
npm verb cli /usr/local/bin/node /usr/local/bin/npm
npm info using [email protected]
npm info using [email protected]
npm verb cache could not create cache: Error: EACCES: permission denied, mkdir '/home/myuser'
npm verb logfile could not create logs-dir: Error: EACCES: permission denied, mkdir '/home/myuser'
npm verb title npm start
npm verb argv "start" "--loglevel" "verbose"
npm verb logfile logs-max:10 dir:/home/myuser/.npm/_logs/2023-12-01T12_42_57_202Z-
npm verb logfile could not be created: Error: ENOENT: no such file or directory, open '/home/myuser/.npm/_logs/2023-12-01T12_42_57_202Z-debug-0.log'
npm verb logfile no logfile created

> [email protected] start
> sleep 3 && echo hello!

npm verb stack Error: EACCES: permission denied, mkdir '/home/myuser'
npm verb cwd /usr/src/my/backend
npm verb Linux 6.4.16-linuxkit
npm verb node v20.10.0
npm verb npm  v10.2.3
npm ERR! code EACCES
npm ERR! syscall mkdir
npm ERR! path /home/myuser
npm ERR! errno -13
npm ERR! Error: EACCES: permission denied, mkdir '/home/myuser'
npm ERR!  [Error: EACCES: permission denied, mkdir '/home/myuser'] {
npm ERR!   errno: -13,
npm ERR!   code: 'EACCES',
npm ERR!   syscall: 'mkdir',
npm ERR!   path: '/home/myuser'
npm ERR! }
npm ERR!
npm ERR! The operation was rejected by your operating system.
npm ERR! It is likely you do not have the permissions to access this file as the current user
npm ERR!
npm ERR! If you believe this might be a permissions issue, please double-check the
npm ERR! permissions of the file and its containing directories, or try running
npm ERR! the command again as root/Administrator.
npm verb exit -13
npm verb unfinished npm timer command:start 1701434577260
npm verb unfinished npm timer command:run-script 1701434577267
npm verb code -13

npm ERR! Log files were not written due to an error writing to the directory: /home/myuser/.npm/_logs
npm ERR! You can rerun the command with `--loglevel=verbose` to see the logs in your terminal

But NPM_CONFIG_UPDATE_NOTIFIER=false ./repro.sh works:

[+] Building 0.1s (10/10) FINISHED                                                                                                                                                                                                                                                                                                                   docker:desktop-linux
 => [internal] load build definition from Dockerfile                                                                                                                                                                                                                                                                                                                 0.0s
 => => transferring dockerfile: 274B                                                                                                                                                                                                                                                                                                                                 0.0s
 => [internal] load .dockerignore                                                                                                                                                                                                                                                                                                                                    0.0s
 => => transferring context: 2B                                                                                                                                                                                                                                                                                                                                      0.0s
 => [internal] load metadata for docker.io/library/node:20.10.0-buster-slim                                                                                                                                                                                                                                                                                          0.0s
 => [1/5] FROM docker.io/library/node:20.10.0-buster-slim                                                                                                                                                                                                                                                                                                            0.0s
 => [internal] load build context                                                                                                                                                                                                                                                                                                                                    0.0s
 => => transferring context: 180B                                                                                                                                                                                                                                                                                                                                    0.0s
 => CACHED [2/5] WORKDIR /usr/src/my                                                                                                                                                                                                                                                                                                                                 0.0s
 => CACHED [3/5] COPY package.json ./                                                                                                                                                                                                                                                                                                                                0.0s
 => CACHED [4/5] RUN groupadd -g 999 myuser &&     useradd -r -u 999 -g myuser myuser                                                                                                                                                                                                                                                                                0.0s
 => CACHED [5/5] WORKDIR /usr/src/my/backend                                                                                                                                                                                                                                                                                                                         0.0s
 => exporting to image                                                                                                                                                                                                                                                                                                                                               0.0s
 => => exporting layers                                                                                                                                                                                                                                                                                                                                              0.0s
 => => writing image sha256:1c9ddb6a1864b4ad4e88b0c503ae3d645b55b8f184c45afdb27cc5e99d54bd92                                                                                                                                                                                                                                                                         0.0s
 => => naming to docker.io/library/myservice:tag1                                                                                                                                                                                                                                                                                                                    0.0s

What's Next?
  View a summary of image vulnerabilities and recommendations → docker scout quickview
npm verb cli /usr/local/bin/node /usr/local/bin/npm
npm info using [email protected]
npm info using [email protected]
npm verb cache could not create cache: Error: EACCES: permission denied, mkdir '/home/myuser'
npm verb logfile could not create logs-dir: Error: EACCES: permission denied, mkdir '/home/myuser'
npm verb title npm start
npm verb argv "start" "--loglevel" "verbose"
npm verb logfile logs-max:10 dir:/home/myuser/.npm/_logs/2023-12-01T12_43_50_048Z-
npm verb logfile could not be created: Error: ENOENT: no such file or directory, open '/home/myuser/.npm/_logs/2023-12-01T12_43_50_048Z-debug-0.log'
npm verb logfile no logfile created

> [email protected] start
> sleep 3 && echo hello!

hello!
npm verb exit 0
npm info ok

Environment

  • npm: 10.2.3
  • Node.js: v20.10.0
  • OS Name: Debian buster slim
  • System Model Name: Macbook Pro (but happens on Linux hosts as well)
  • npm config:
; node bin location = /usr/local/bin/node
; node version = v20.10.0
; npm local prefix = /usr/src/my
; npm version = 10.2.3
; cwd = /usr/src/my/backend
; HOME = /home/myuser
@Foosvald Foosvald added Bug thing that needs fixing Needs Triage needs review for next steps Release 10.x labels Dec 1, 2023
@moznion
Copy link

moznion commented Dec 1, 2023

I also encountered the same issue on December 1st at 00:00 UTC when the application restarted. The system had not upgraded the Node and npm versions, and there were no changes to the application code while running the application. I am also wondering about the reason for this issue.

@moznion
Copy link

moznion commented Dec 2, 2023

For your information, another workaround for this problem is to specify the cache directory in your .npmrc file like this:

cache=/path/to/accessible/.npm

@karlbohlmark
Copy link

For people who are running this in containers they can't easily rebuild and redeploy it also worked for me to override the dns entry for registry.npmjs.org (I tried setting it to 127.0.0.1 in /etc/hosts).

Please report this issue to npm through the npm website as well. If you log in to your npm account you can report an issue with the registry. Maybe someone will get through to someone who understands the problem.

@dglozic
Copy link

dglozic commented Dec 2, 2023

IBM products that use npm start have also started to experience this. It is unsettling that npm start that we thought will just use package.json is actually making an outbound request to the npm public servers. Images with a version of Node.js and npm that we didn't touch started having issues starting for this exact reason.

How do we lobby for the change to be modified or reverted (and to consider a npm start variant that does not make an outbound request)?

@karlbohlmark
Copy link

Yes it’s pretty scary that images that have been running fine for a year suddenly breaks. It should probably be easy for npm to fix this if it was possible to get their attention. We have already moved away from using npm in the docker images going forward but these old containers are running in a lot of places where we don’t have access to upgrade them.

@dglozic You can avoid the outgoing request if you set the config variable as mentioned in the ticket.

@dglozic
Copy link

dglozic commented Dec 2, 2023

I understand that (thanks!) but that requires changes to the images.

@karlbohlmark
Copy link

Since it is an environment variable it should be possible to change it without changing the image assuming you have access to relaunch the container. We did this in our kubernetes environments as a hot-fix yesterday.

@dglozic
Copy link

dglozic commented Dec 2, 2023

Ah sorry, was not paying attention, I was seeing changes to .npmrc and didn't see that an env var can do the same.

@yannizhang2019
Copy link

Can anyone from the community explain a bit on what is causing this problem? We are getting more and more reports from customers on this issue.

@texkelly
Copy link

texkelly commented Dec 5, 2023

I assume blocking all egress by default, and then defining only required exceptions (using network policies) would also work, as long as you don't add registry.npmjs.org to that exception list. Container images are generally assumed to be static, ideally they shouldn't be updating themselves over the internet at start time.

@lukekarrys lukekarrys self-assigned this Dec 5, 2023
@lukekarrys lukekarrys added Priority 0 will get attention right away and removed Needs Triage needs review for next steps labels Dec 5, 2023
@lukekarrys
Copy link
Contributor

I will be digging into why this has been happening. My goal is to triage and fix this week to be included in our next release.

@pgodowski
Copy link

Perhaps it would be good to re-iterate that it's the best practice that containers running in production shall run with the read-only filesystem (readOnlyRootFilesystem flag) and it should not be expected that the application middleware updates it own state. If there is a call-home, perhaps the information received shall be kept in memory?

I would assume the issue will be investigated and root cause explained as of WHY the issue started to be seen on Nov 30th/Dec 1st, but I'd suggest that there is a solid review executed against any other places where npm start writes to the local filesystem and under which conditions since it seems npm adopters were caught by surprise here.

@dglozic
Copy link

dglozic commented Dec 6, 2023

For deployments in k8s, images starting using NPM start need NPM to:

  1. only read local files in context (.npmrc, package.json)
  2. Not attempt to write anything to the file system
  3. Not attempt to connect back to the server

I don’t know which combination of cli options would yield such behaviour but it seems important for runtime image starts.

@willisweb
Copy link

We experienced the same issue on an older version of npm 8.5.0. We were using an AWS EC2 AMI which had started up fine many times previously then stopped working on 30th.

@lukekarrys
Copy link
Contributor

I'd suggest that there is a solid review executed against any other places where npm start writes to the local filesystem and under which conditions since it seems npm adopters were caught by surprise here.

This is one of the goals on my investigation, and is currently a goal of npm. There are other places where npm attempts to write to the filesystem or contact servers but are all designed to be best effort and will not cause any errors until necessary, such has attempting to download and install a package with no filesystem or network access.

I don't have any thorough conclusions yet, but so far I have found that an uncaught promise or stream error is bubbling up unexpectedly to our uncaught exception handler.

@lukekarrys
Copy link
Contributor

I don’t know which combination of cli options would yield such behaviour but it seems important for runtime image starts.

update-notifier=false

The update notifier is designed to run weekly and will attempt to contact the configured registry to see if npm can be updated. This behavior is designed to be resilient to network and filesystem access issues, but is currently the source of this bug.

This can be turned off with the following command which will set the config in a local .npmrc file:

npm config set update-notifier=false --location=project

logs-max=0

npm writes log files to the cache directory. This is also designed to be resilient to filesystem issues. If you want npm to not attempt to write these files you can set logs-max=0. This is only available in [email protected]. Older versions will need to rely on logs=/dev/null.

npm config set logs-max=0 --location=project

@dglozic
Copy link

dglozic commented Dec 6, 2023

Thanks @lukekarrys - that is useful!

@wraithgar
Copy link
Member

wraithgar commented Dec 6, 2023

Today's npm release (10.2.5) will be the one that has the logs-max fix. That PR landed awhile ago but we haven't released npm since before the Thanksgiving holiday,

ETA: it will also have the update notifier fix, but that only fixes folks who are not doing any registry fetch w/ npm.

@moznion
Copy link

moznion commented Dec 6, 2023

I appreciate your hard work. It seems that patch fixes the phenomenon, but anyway I would like to know why did this problem begin on 30 Nov or 1st Dec? As I mentioned above, there had been no changes of npm/node runtime on the user side.

@lukekarrys lukekarrys reopened this Dec 6, 2023
@lukekarrys
Copy link
Contributor

I'm going to leave this issue open as I continue to investigate deeper. We've landed a fix for this specific bug, but there are the issues of the uncaught exception as well as why this bug started appearing. I'll use this issue to report my findings.

@lukekarrys
Copy link
Contributor

We experienced the same issue on an older version of npm 8.5.0. We were using an AWS EC2 AMI which had started up fine many times previously then stopped working on 30th.

@willisweb which version of nodejs was this on?

@Foosvald
Copy link
Author

Foosvald commented Dec 7, 2023

We experienced the same issue on an older version of npm 8.5.0. We were using an AWS EC2 AMI which had started up fine many times previously then stopped working on 30th.

@willisweb which version of nodejs was this on?

@lukekarrys, I can answer for us (different company from @willisweb). Initially, we encountered this issue with npm 8.5.5 and Node.js version 16.15.0. These versions had been functioning properly for an extended period until about a week ago, when we began experiencing the reported problem. This prompted me to replicate the issue using the most recent version of npm, after which I created this bug report...

@wraithgar wraithgar added Priority 1 high priority issue and removed Priority 0 will get attention right away labels Dec 7, 2023
@wraithgar
Copy link
Member

Did those who saw the phenomenon at the beginning of December take the Thanksgiving holiday off? Had it been a week since you ran npm?

@AlistairGempf
Copy link

AlistairGempf commented Dec 7, 2023

We had an EC2 in production using npm in a systemd enabled service (so npm start was run without any sort of manual intervention), we saw it happen first sometime between 18:30 and 19:00 UTC on 30th of November on the EC2 that had been deployed earlier that day (around 11:30am), so npm had certainly run more recently than a week before that. The AMI we were using may have first run npm a week earlier, I don't know if I can get you that information.

And we're in the UK so have no concept of Thanksgiving ;)

@EdwardLuke
Copy link

EdwardLuke commented Dec 7, 2023

Thanks to everyone for the attention this has been given so far. It would be useful to know if this issue has been downgraded in priority because using the npm CLI in production in the ways described in this thread isn't an expected use-case for the tool? It would be useful to know since we can adjust our expectations accordingly.

@wraithgar
Copy link
Member

It was downgraded because yesterday's npm release fixed the reported issue. npm instances that don't run commands that fetch registry info will no longer crash.

It was left open because there are some other issues (such as how the error and ones similar to it manifested) that still need attention.

@wraithgar
Copy link
Member

Please note that it can take weeks for this new npm version to make it down into new docker images (the new version has to land in a new version of node, and then that new version of node has to make it to a docker image), so you may want to consider manually updating npm in your workflows to stay current.

@mosman-lvmh
Copy link

Hi

Since we don't know yet why the issue happened on Nov the 30rd, we do not know if and when it could happen again.

So for new application that will use npm start as containers entrypoint, the issue can raise again. I suppose this should not impact production environment directly

My opinion is that understanding the root cause of the issue is the priority, and then fixing it for sure.
Also maybe I missed it, but having a guideline somewhere in npm website on how to run nodejs application using npm in a container environnement with readonly root filesystem can be a good starting point as well.

My 2 cents, and thanks for the support !

@ybjwang
Copy link

ybjwang commented Dec 7, 2023

@wraithgar Will the fix also applied to the old such as NPM 8.5.X version? As we have many images using npm 8.5.0 and 8.5.5. Thanks!

@wraithgar
Copy link
Member

No this will only be fixed in npm 10 as of right now, specifically 10.2.5 or newer. For older versions you will want to disable the update notifier. npm 10 is the only supported version of npm as it's the only version of npm that ships with all currently supported versions of node. npm 9 does get occasional backports but they're not guaranteed, we are doing them to iron out the backport process for when we cut npm 11 not as a support contract. npm 8 hasn't had an update in 9 months.

We also can't fix an existing version of npm. Even if a new version of npm 8 were to be published 8.5.0 and 8.5.5 won't ever change.

@eamonnsullivan
Copy link

We also can't fix an existing version of npm. Even if a new version of npm 8 were to be published 8.5.0 and 8.5.5 won't ever change.

I just want to point out that this is what everyone thought, but it didn't turn out to be the case. Do we understand why v8 suddenly changed behaviour on Nov. 30?

@ljharb
Copy link
Contributor

ljharb commented Dec 8, 2023

@eamonnsullivan presumably it changed behavior at midnight UTC on December 1st, which was November 30th for many.

@willisweb
Copy link

@willisweb which version of nodejs was this on?

It was nodejs v16.14.2

@willisweb
Copy link

willisweb commented Dec 11, 2023

I would just like to make it clear, in case it wasn't above that for us running instances were unaffected, so they didn't poll and then fail, it was only new instances starting up from an AMI that was previously started multiple times without issue that were affected.

@moznion
Copy link

moznion commented Dec 19, 2023

@lukekarrys Sorry, I don't mean to rush you but are there any updates? Actually I would like to know the root cause of this issue.

@rondales

This comment has been minimized.

@meacoleman
Copy link

A teammate of mine just told me that she hasn't encountered this issue at all this week. Is it truly fixed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug thing that needs fixing Priority 1 high priority issue Release 10.x
Projects
None yet
Development

No branches or pull requests