-
Notifications
You must be signed in to change notification settings - Fork 240
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
RFC: reduce lifecycle script environment size
- Loading branch information
Showing
1 changed file
with
146 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,146 @@ | ||
# Reduce the environment variables provided to lifecycle scripts | ||
|
||
In versions of npm up to v6, the following items are all placed into the | ||
environment of scripts run for various lifecycle events (install, prepare, | ||
etc., as well as explicit scripts such as test and start). | ||
|
||
- `PATH` Configured to include `.../node_modules/.bin` for current and all | ||
parent `node_modules` directories. | ||
- `npm_package_*` for all `package.json` values in the current package for | ||
which the lifecycle event is running. | ||
- `npm_config_*` for all npm configuration values that do not start with a | ||
`_` character. | ||
- `npm_lifecycle_event` the current lifecycle event. | ||
- `npm_lifecycle_script` the command being run. | ||
- `npm_node_execpath` the path to the Node.js executable npm is using. | ||
- `npm_execpath` the path to the npm executable being run. | ||
|
||
The suggestion presented here is to remove (or vastly reduce) the | ||
`npm_config_*` and `npm_package_*` environment variables from the context | ||
of lifecycle scripts, and potentially also add new fields that may be more | ||
useful to more users. | ||
|
||
## Motivation | ||
|
||
Lifecycle scripts are run in many different contexts throughout the npm | ||
codebase. | ||
|
||
- Explicit scripts are run directly from the `lib/run-script.js` | ||
command implementation. | ||
- Build scripts are run from the context of the tree building logic, which | ||
is moving to a new implementation with `@npmcli/arborist` in v7. | ||
- Prepare scripts are run by `pacote` when it creates a tarball for | ||
publication or when it installs `git` dependencies. | ||
|
||
All of this necessitates passing around a single configuration object, | ||
which has some problems. | ||
|
||
1. It is tedious and error-prone, and has led to a more complicated | ||
codebase | ||
2. While we have not had security issues with it in the past, it runs the | ||
risk of exposing something sensitive in a context where it should not be | ||
exposed. | ||
3. It invites users to fork package behavior based on npm configuration, | ||
which should be a contract between the user and npm, and not between the | ||
user, npm, and the publisher. | ||
4. While the package.json data does not have as many of these problems, it | ||
is also largely unnecessary (and not widely used). The `package.json` | ||
file is readily available and easily parsed, and most scripts that would | ||
depend on package data simply read it directly. | ||
5. The environment is created anew for every script that's run. This could | ||
be optimized further, but as it currently stands, it's pretty | ||
inefficient. | ||
6. Lastly, exposing the full configuration and package.json makes the | ||
environment significantly larger, and can lead to problems on | ||
memory-constrained systems. | ||
|
||
The advantage of including `npm_config_*` values in the lifecycle | ||
environment is that npm commands run from within lifecycle events will have | ||
the same config values as the process that spawned them, since `env` values | ||
will override any other values except explicit command line flags. | ||
|
||
For example, a script named `release` may run tests, update the changelog, | ||
and then publishe the package. Running `npm run release --otp=123456` will | ||
put the two-factor auth one-time password into the `npm_config_otp` | ||
environment variable, so that the subsequent `npm publish` command will | ||
have the one-time password provided in the config. | ||
|
||
## Detailed Explanation | ||
|
||
1. Remove `npm_package*` values from the script lifecycle environment. | ||
2. Provide a new field, `npm_package_json`, with the path to the | ||
`package.json` file. | ||
3. Provide a new field, `npm_command`, with the canonical name of the | ||
command being run. | ||
3. Remove all `npm_config_*` values from the script lifecycle environment | ||
_except_: | ||
1. `npm_config_userconfig` | ||
2. `npm_config_globalconfig` | ||
3. Environment variables corresponding to any non-default config | ||
values. | ||
4. Add `npm_package_from`, `npm_package_resolved`, and | ||
`npm_package_integrity` for the package whose lifecycle event is | ||
running, if it's part of an install. (This addresses the needs of build | ||
tools, as discussed in | ||
[#38](https://github.com/npm/rfcs/pull/38#issuecomment-529182151).) | ||
5. `PATH` will continue to be provided as it currently is, so that scripts | ||
find their dependencies' executables first. | ||
|
||
This makes it easier to find and rely on package.json data, while ensuring | ||
that config defaults are maintained, without blowing up the size of the | ||
environment for lifecycle processes, or requiring access to the npm config | ||
subsystem in every npm CLI dependency. | ||
|
||
## Rationale and Alternatives | ||
|
||
Possible alternatives: | ||
|
||
### Just go ahead and pass around the whole config object like we do today | ||
|
||
This is not ideal for the reasons mentioned above, but also, it makes it | ||
virtually assured that Arborist remains tightly coupled to the npm cli. | ||
While _some_ degree of coupling is unavoidable, having to provide a valid | ||
npm config object would make this coupling much tighter than necessary. | ||
|
||
### Inversion of Control on the npm-lifecycle environment creation | ||
|
||
Rather than provide a config object matching a given interface, provide | ||
`npm-lifecycle` with a method that can build up and return the environment | ||
object. | ||
|
||
This approach would address the tight coupling between cli and arborist, | ||
but it doesn't address the other problems with having a giant config object | ||
dumped into the environment. | ||
|
||
## Implementation | ||
|
||
The npm CLI will set all non-default config flags in the environment so | ||
that scripts and sub-scripts will have them set in their configs by default | ||
at the env level. | ||
|
||
Replace `npm-lifecycle` with a lighter-weight approach, published as | ||
`@npmcli/run-script`. | ||
|
||
Instead of building the environment up from the config and package data, | ||
`@npmcli/run-script` will only set `npm_package_json` to the path to the | ||
package.json file for the package being run, take an object to define | ||
additional environment variables, and always inherit the environment in the | ||
child process being run. | ||
|
||
Because the npm CLI sets the relevant config fields, they'll be inherited | ||
to the child processes automatically. Arborist will use the environment | ||
option to pass in the `npm_package_from`, `npm_package_resolved`, and | ||
`npm_package_integrity` values. | ||
|
||
## Prior Art | ||
|
||
npm v6 and yarn both do roughly the same thing, though they have different | ||
config values. | ||
|
||
## Downsides | ||
|
||
Some modules today use `npm_config_argv`. These will have to be updated to | ||
use other means to get this information. | ||
|
||
Where the argv is being parsed in order to determine the command being run, | ||
the `npm_command` environ provides a safer approach. |