diff --git a/accepted/0000-no-package-json-_fields.md b/accepted/0000-no-package-json-_fields.md new file mode 100644 index 000000000..ec56ad927 --- /dev/null +++ b/accepted/0000-no-package-json-_fields.md @@ -0,0 +1,260 @@ +# Do Not Put Meta Information In `_`-Prefixed `package.json` Fields + +## Summary + +Currently, npm stashes various information in the `package.json` files that +it creates. + +We will stop putting metadata in `package.json` files in `node_modules`. +The fields that serve a purpose will be relocated to +`node_modules/.arborist-resolutions.json`. When this metadata does not +exist, npm will make a best effort to gather metadata from +`package-lock.json` or `yarn.lock`. + +## Motivation + +These fields are only relevant to npm, and as of version 6, npm relies on +the presence of these fields for most things it does with packages on disk. + +This creates problems when teams use both npm and yarn or pnpm, because +the fields are missing. + +In particular, `npm ls` does not function properly if these fields are not +set, treating packages without the `_fields` as if they were not present in +the tree at all. Running `npm install` will cause npm to behave in +confusing ways as well. + +Writing back to `package.json` adds work to the installer's process, which +`npm ci` attempts to streamline. However, the mechanism of doing this +means that `npm ci` and `npm install` write _different_ metadata to +installed `package.json` files, further adding to the confusion. + +All of this would be justifiable if these fields were necessary, or at +least provide some significant benefit. However, this is not the case. +Several of them are irrelevant or unnecessary, and rather than aid in +debugging, they merely add confusion. + +## Detailed Explanation + +The default in npm has traditionally been to write back the `package.json` +file stashing whatever metadata npm was already using for its own purpose. +However, some of them are no longer necessary, and can be removed with no +problem. Others need to find a new home. + +A real-life example: + +```json +{ + "_args": [ + [ + "which@1.3.1", + "/Users/isaacs/dev/js/tap" + ] + ], + "_from": "which@1.3.1", + "_id": "which@1.3.1", + "_inBundle": false, + "_integrity": "sha512-HxJdYWq1MTIQbJ3nw0cqssHoTNU267KlrDuGZ1WYlxDStUtKUhOaJmh112/TZmHxxUfuJqPXSOm7tDyas0OSIQ==", + "_location": "/which", + "_phantomChildren": {}, + "_requested": { + "type": "version", + "registry": true, + "raw": "which@1.3.1", + "name": "which", + "escapedName": "which", + "rawSpec": "1.3.1", + "saveSpec": null, + "fetchSpec": "1.3.1" + }, + "_requiredBy": [ + "/", + "/cross-spawn", + "/execa/cross-spawn", + "/istanbul-lib-processinfo/cross-spawn", + "/spawn-wrap" + ], + "_resolved": "https://registry.npmjs.org/which/-/which-1.3.1.tgz", + "_spec": "1.3.1", + "_where": "/Users/isaacs/dev/js/tap", + "...": "the rest of package.json" +} +``` + +Most of this information is duplicative of information that is stored in +`package-lock.json`, or generated by the installer when building the tree +and not relevant afterwards. + +Much of it can be omitted without giving up much, especially since it is +already stored elsewhere. + +### Fields + +- `_args` - the set of arguments to the install methods that resolved to + this dependency. (This is a 2-d array, and will have multiple entries if + a package is deduped more than once to a given location.) + + As far as I can tell, this is fine to remove, because it is not + load-bearing anywhere in the system. + +- `_inBundle` - Whether the package or any of its parents are a + `bundleDependency`. + + This information can be deduced from the package.json files on disk. + +- `_where` - Excessive. If you can find the `package.json` file to + read it, then you already know where it is. + + Furthermore, since this captures the actual path on disk when the package + is _installed_, it can unexpectedly leak misleading information + in the case of `bundleDependencies`. Installing `tap@14.5.0` will + result in a file at `node_modules/tap/node_modules/ink/package.json` + with a `_from` field of `[["ink@2.3.0","/Users/isaacs/dev/js/tap"]]`. + +- `_integrity` - The SRI integrity field for this package. This is + important for the `package-lock.json` to be able to ensure that we are + receiving the same bits as the lockfile expects. + +- `_shasum` - Legacy sha1 integrity check. Converted to a `sha1` SRI + integrity field if present. + +- `_requested` - The [npa](http://npm.im/npm-package-arg)-parsed specifier + that was requested. + + If a package is required multiple times, `_requested` is the earliest + request that led to it being fetched. + + This is load bearing throughout the system, but most of the cases where + it's being used will be modified or replaced in the Arborist refactor. + Most places where `node.package._requested` is being used, it would be + more appropriate to store the information on `node.requested` instead. + + Arborist Node objects have a `edgesIn` Set that captures all dependents + on a package, with the spec that requires them, so this need + is filled elsewhere. + + Conclusion: can be removed. + +- `_resolved` - The specifier that was ultimately resolved. For registry + dependencies, this is the tarball url that was ultimately fetched. For + git dependencies, this is the requested git url with `#` + attached to the end. Again, this metadata is important because we must + ensure that we get the same package tree between runs and on different + machines. + +- `_from` - Legacy form of `_requested`. Passed to npa to turn into a + parsed object. + +- `_spec` - Just an copy of `_requested.rawSpec`. Unnecessary. + +- `_id` - `${name}@${version}`. Unnecessary. + +While a `remote` or `git` dependency might satisfy a version dependency +(based on the same version number), but the reverse is not true. +Arguably, if we have a dependency on a git or remote dep, then a +non-git or non-remote dep is not valid. + +### Proposal + +Add a file at `node_modules/.arborist-resolutions.json` which stores the +only two pieces of metadata that npm really needs. This file has the +following format: + +```json +{ + "/package-name": { + "resolved": "${current _resolved field}", + "integrity": "${current _integrity field}" + }, + "/package-name/nested-dep": { + "resolved": "${url}", + "integrity": "${sri}" + }, + "..." +} +``` + +The key is the `Node.location` field. Metadata that would have been +stored in `_integrity` and `_resolved` fields in +`node_modules/a/node_modules/b/package.json` will be in the +`.arborist-resolutions.json` under the key `/a/b`. + +This information will _not_ be tracked for `bundleDependencies`, as they +are never fetched directly. + +If the `node_modules/.arborist-resolutions.json` file is not present, then +attempt to get the reso + +## Rationale and Alternatives + +This is closest to how pnpm stores its lock file in +`node_modules/.pnpm-lock.yaml`. However, rather than storing the entire +lock file in `node_modules`, npm will only store the information that +cannot be read from the reified package hierarchy if the package-lock is +missing. + +### Storing in the Cache + +Another potential option is to stash this information in npm's cache, like +Yarn does. Both the `~/.npm` and `node_modules` folders are to some extent +transient, but it would be a step away from letting the npm cache be +focused on providing a content-addressible cache fully managed by `cacache` +in a way that the rest of the system can mostly ignore. + +It does make a kind of intuitive sense to store integrity data in +the cache, and `cacache` does already map a request key to an integrity +guarantee. However, this information is more closely tied to the +installation in a given project tree. Ie, it's not just answering the +question "what is the integrity for the tarball at $url", but rather, "In +this project, when requesting ``, what url and integrity resolution +should I expect?" + +### Storing Many Files Instead of One + +I'm somewhat on the fence about this, frankly. + +Another option might be to create a folder at `node_modules/.arborist` +which can store anything we might find useful. + +A single file for resolutions and integrity does limit our capabilities for +parallel/concurrent installations; either the file gets corrupted (if +written non-atomically), or the last writer wins, or we have to explore +some complex merge logic. This seems like an edge case, and since +resolutions will likely be identical most of the time anyway, it's not +particularly compelling. + +One argument against writing out multiple +`node_modules/.arborist/package-metadata/some-package-resolution.json` +files is that it's slightly slower to write a lot of little files than a +single one, especially since it's yet another thing to shuffle around as +we move packages around in the tree. It's easier to just keep the +resolution and integrity on the Node object, and walk the tree in memory to +generate the JSON output. + +## Implementation + +- Arborist.loadActual and Arborist.loadVirtual will read + `node_modules/.arborist-resolutions.json` for expected `_integrity` and + `_resolved` values. +- If the resolutions file is not found, then loadVirtual will look to the + `package-lock.json` or `yarn.lock` file (since loadVirtual requires the + presence of a lock file anyway); loadActual will look in the lock file, + falling back to the `package.json` file in the realized package tree. +- The integrity and resolved values will be stored on the Node objects, but + _not_ written to `package.json`. +- If thet integrity or resolution of a previously installed package cannot + be ascertained, then the package-lock.json will not include this data. +- When installing from a lock file that does not include integrity or + resolution, accept whatever integrity value the registry provides, or + whatever file we download. +- Whenever we make a change to the `node_modules` tree on disk, write the + resolved and integrity values back to + `node_modules/.arborist-resolutions.json`, even if `--save` is false. + This doesn't lock the dependency in place; it only makes it _possible_ to + lock it properly later, if desired. + +## Prior Art + +`node_modules/.pnpm-lock.yaml` + +`~/Library/Caches/Yarn/v4/npm---/node_modules//.yarn-metadata.json`