-
Notifications
You must be signed in to change notification settings - Fork 240
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
rfc: Don't add metadata to package.json
- Loading branch information
Showing
1 changed file
with
291 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,291 @@ | ||
# Do Not Put Meta Information In `_`-Prefixed `package.json` Fields | ||
|
||
## Summary | ||
|
||
Currently, npm stashes various information in the `package.json` files that | ||
it creates. | ||
|
||
We will stop putting metadata in `package.json` files in `node_modules`. | ||
The fields that serve a purpose will be relocated to | ||
`node_modules/.arborist-resolutions.json`. When this metadata does not | ||
exist, npm will make a best effort to gather metadata from | ||
`package-lock.json` or `yarn.lock`. | ||
|
||
## Motivation | ||
|
||
These fields are only relevant to npm, and as of version 6, npm relies on | ||
the presence of these fields for most things it does with packages on disk. | ||
|
||
This creates problems when teams use both npm and yarn or pnpm, because | ||
the fields are missing. | ||
|
||
In particular, `npm ls` does not function properly if these fields are not | ||
set, treating packages without the `_fields` as if they were not present in | ||
the tree at all. Running `npm install` will cause npm to behave in | ||
confusing ways as well. | ||
|
||
Writing back to `package.json` adds work to the installer's process, which | ||
`npm ci` attempts to streamline. However, the mechanism of doing this | ||
means that `npm ci` and `npm install` write _different_ metadata to | ||
installed `package.json` files, further adding to the confusion. | ||
|
||
All of this would be justifiable if these fields were necessary, or at | ||
least provide some significant benefit. However, this is not the case. | ||
Several of them are irrelevant or unnecessary, and rather than aid in | ||
debugging, they merely add confusion. | ||
|
||
## Detailed Explanation | ||
|
||
The default in npm has traditionally been to write back the `package.json` | ||
file stashing whatever metadata npm was already using for its own purpose. | ||
However, some of them are no longer necessary, and can be removed with no | ||
problem. Others need to find a new home. | ||
|
||
A real-life example: | ||
|
||
```json | ||
{ | ||
"_args": [ | ||
[ | ||
"[email protected]", | ||
"/Users/isaacs/dev/js/tap" | ||
] | ||
], | ||
"_from": "[email protected]", | ||
"_id": "[email protected]", | ||
"_inBundle": false, | ||
"_integrity": "sha512-HxJdYWq1MTIQbJ3nw0cqssHoTNU267KlrDuGZ1WYlxDStUtKUhOaJmh112/TZmHxxUfuJqPXSOm7tDyas0OSIQ==", | ||
"_location": "/which", | ||
"_phantomChildren": {}, | ||
"_requested": { | ||
"type": "version", | ||
"registry": true, | ||
"raw": "[email protected]", | ||
"name": "which", | ||
"escapedName": "which", | ||
"rawSpec": "1.3.1", | ||
"saveSpec": null, | ||
"fetchSpec": "1.3.1" | ||
}, | ||
"_requiredBy": [ | ||
"/", | ||
"/cross-spawn", | ||
"/execa/cross-spawn", | ||
"/istanbul-lib-processinfo/cross-spawn", | ||
"/spawn-wrap" | ||
], | ||
"_resolved": "https://registry.npmjs.org/which/-/which-1.3.1.tgz", | ||
"_spec": "1.3.1", | ||
"_where": "/Users/isaacs/dev/js/tap", | ||
"...": "the rest of package.json" | ||
} | ||
``` | ||
|
||
Most of this information is duplicative of information that is stored in | ||
`package-lock.json`, or generated by the installer when building the tree | ||
and not relevant afterwards. | ||
|
||
Much of it can be omitted without giving up much, especially since it is | ||
already stored elsewhere. | ||
|
||
### Fields | ||
|
||
- `_args` - the set of arguments to the install methods that resolved to | ||
this dependency. (This is a 2-d array, and will have multiple entries if | ||
a package is deduped more than once to a given location.) | ||
|
||
As far as I can tell, this is fine to remove, because it is not | ||
load-bearing anywhere in the system. | ||
|
||
- `_inBundle` - Whether the package or any of its parents are a | ||
`bundleDependency`. | ||
|
||
This information can be deduced from the package.json files on disk. | ||
|
||
- `_where` - Excessive. If you can find the `package.json` file to | ||
read it, then you already know where it is. | ||
|
||
Furthermore, since this captures the actual path on disk when the package | ||
is _installed_, it can unexpectedly leak misleading information | ||
in the case of `bundleDependencies`. Installing `[email protected]` will | ||
result in a file at `node_modules/tap/node_modules/ink/package.json` | ||
with a `_from` field of `[["[email protected]","/Users/isaacs/dev/js/tap"]]`. | ||
|
||
- `_integrity` - The SRI integrity field for this package. This is | ||
important for the `package-lock.json` to be able to ensure that we are | ||
receiving the same bits as the lockfile expects. | ||
|
||
- `_shasum` - Legacy sha1 integrity check. Converted to a `sha1` SRI | ||
integrity field if present. | ||
|
||
- `_requested` - The [npa](http://npm.im/npm-package-arg)-parsed specifier | ||
that was requested. | ||
|
||
If a package is required multiple times, `_requested` is the earliest | ||
request that led to it being fetched. | ||
|
||
This is load bearing throughout the system, but most of the cases where | ||
it's being used will be modified or replaced in the Arborist refactor. | ||
Most places where `node.package._requested` is being used, it would be | ||
more appropriate to store the information on `node.requested` instead. | ||
|
||
Arborist Node objects have a `edgesIn` Set that captures all dependents | ||
on a package, with the spec that requires them, so this need | ||
is filled elsewhere. | ||
|
||
Conclusion: can be removed. | ||
|
||
- `_resolved` - The specifier that was ultimately resolved. For registry | ||
dependencies, this is the tarball url that was ultimately fetched. For | ||
git dependencies, this is the requested git url with `#<commit-hash>` | ||
attached to the end. Again, this metadata is important because we must | ||
ensure that we get the same package tree between runs and on different | ||
machines. | ||
|
||
- `_from` - Legacy form of `_requested`. Passed to npa to turn into a | ||
parsed object. | ||
|
||
- `_spec` - Just an copy of `_requested.rawSpec`. Unnecessary. | ||
|
||
- `_id` - `${name}@${version}`. Unnecessary. | ||
|
||
- `_requiredBy` - This is load bearing currently, to create the package | ||
trees shown in `npm ls`. However, Arborist adds a `node.edgesIn` field | ||
which lists all the dependencies on a given node along with their types. | ||
|
||
Still, having that information in-place is useful when attempting to | ||
determine why a given package in the file tree is being included. | ||
Using `npm ls`, this can be difficult to determine, since `npm ls` does | ||
not provide a facility for filtering by the _location_ of a node, only | ||
by its _name_. | ||
|
||
A subsequent RFC will discuss adding `npm ls <location>`, which will | ||
address this use case in a more user-friendly way than looking at | ||
implementation details directly. | ||
|
||
Conclusion: can be removed. | ||
|
||
- `_shrinkwrap` - This is a complete copy of the `npm-shrinkwrap.json` file | ||
if it exists in the package. Since it's a duplicate of data that already | ||
exists (in roughly the same location), it can be safely omitted. | ||
|
||
While a `remote` or `git` dependency might satisfy a version dependency | ||
(based on the same version number), but the reverse is not true. | ||
Arguably, if we have a dependency on a git or remote dep, then a | ||
non-git or non-remote dep is not valid. | ||
|
||
### Proposal | ||
|
||
Add a file at `node_modules/.arborist-resolutions.json` which stores the | ||
only two pieces of metadata that npm really needs. This file has the | ||
following format: | ||
|
||
```json | ||
{ | ||
"/package-name": { | ||
"resolved": "${current _resolved field}", | ||
"integrity": "${current _integrity field}" | ||
}, | ||
"/package-name/nested-dep": { | ||
"resolved": "${url}", | ||
"integrity": "${sri}" | ||
}, | ||
"..." | ||
} | ||
``` | ||
|
||
The key is the `Node.location` field. Metadata that would have been | ||
stored in `_integrity` and `_resolved` fields in | ||
`node_modules/a/node_modules/b/package.json` will be in the | ||
`.arborist-resolutions.json` under the key `/a/b`. | ||
|
||
This information will _not_ be tracked for `bundleDependencies`, as they | ||
are never fetched directly. | ||
|
||
If the `node_modules/.arborist-resolutions.json` file is not present, then | ||
attempt to get the reso | ||
|
||
## Rationale and Alternatives | ||
|
||
This is closest to how pnpm stores its lock file in | ||
`node_modules/.pnpm-lock.yaml`. However, rather than storing the entire | ||
lock file in `node_modules`, npm will store only the information that | ||
cannot be read from the reified `node_modules` package hierarchy. | ||
|
||
### Storing in the Cache | ||
|
||
Another potential option is to stash this information in npm's cache, like | ||
Yarn does. Both the `~/.npm` and `node_modules` folders are to some extent | ||
transient, but it would be a step away from letting the npm cache be | ||
focused on providing a content-addressible cache fully managed by `cacache` | ||
in a way that the rest of the system can mostly ignore. | ||
|
||
It does make a kind of intuitive sense to store integrity data in | ||
the cache, and `cacache` does already map a request key to an integrity | ||
guarantee. However, this information is more closely tied to the | ||
installation in a given project tree. Ie, it's not just answering the | ||
question "what is the integrity for the tarball at $url", but rather, "In | ||
this project, when requesting `<spec>`, what url and integrity resolution | ||
should I expect?" | ||
|
||
### Storing Many Files Instead of One | ||
|
||
Another option might be to create a folder at `node_modules/.arborist` | ||
which can store anything we might find useful. | ||
|
||
A single file for resolutions and integrity does limit our capabilities for | ||
parallel/concurrent installations; either the file gets corrupted (if | ||
written non-atomically), or the last writer wins, or we have to explore | ||
some complex merge logic. This seems like an edge case, and since | ||
resolutions will likely be identical most of the time anyway, it's not | ||
particularly compelling. | ||
|
||
One argument against writing out multiple | ||
`node_modules/.arborist/package-metadata/some-package-resolution.json` | ||
files is that it's slightly slower to write a lot of little files than a | ||
single one, especially since it's yet another thing to shuffle around as | ||
we move packages around in the tree. It's easier to just keep the | ||
resolution and integrity on the Node object, and walk the tree in memory to | ||
generate the JSON output. | ||
|
||
### Compatibility Requirements | ||
|
||
There is a high cost to breaking changes in `package.json` or | ||
`package-lock.json` semantics, because this affects colleagues on a project | ||
and, in the case of package.json, downstream installers. | ||
|
||
However, in this case, the only compatibility issue with removing \_fields | ||
from package.json would be if a user _downgraded_ their npm client to a | ||
version that relied on this metadata, which is far more rare. | ||
|
||
Improving compatibility with current yarn and pnpm versions is more | ||
valuable than maintaining compatibility (on this feature) with previous npm | ||
clients. | ||
|
||
## Implementation | ||
|
||
- Arborist.loadActual and Arborist.loadVirtual will read | ||
`node_modules/.arborist-resolutions.json` for expected `_integrity` and | ||
`_resolved` values. | ||
- If the resolutions file is not found, then loadVirtual will look to the | ||
`package-lock.json` or `yarn.lock` file (since loadVirtual requires the | ||
presence of a lock file anyway); loadActual will look in the lock file, | ||
falling back to the `package.json` file in the realized package tree. | ||
- The integrity and resolved values will be stored on the Node objects, but | ||
_not_ written to `package.json`. | ||
- If thet integrity or resolution of a previously installed package cannot | ||
be ascertained, then the package-lock.json will not include this data. | ||
- When installing from a lock file that does not include integrity or | ||
resolution, accept whatever integrity value the registry provides, or | ||
whatever file we download. | ||
- Whenever we make a change to the `node_modules` tree on disk, write the | ||
resolved and integrity values back to | ||
`node_modules/.arborist-resolutions.json`, even if `--save` is false. | ||
This doesn't lock the dependency in place; it only makes it _possible_ to | ||
lock it properly later, if desired. | ||
|
||
## Prior Art | ||
|
||
`node_modules/.pnpm-lock.yaml` | ||
|
||
`~/Library/Caches/Yarn/v4/npm-<package>-<version>-<hash>/node_modules/<package>/.yarn-metadata.json` |