RFC: reproducibility #22
In the [Node release process](https://github.com/nodejs/Release), there are generally two most recent even-numbered major versions of Node that are the most commonly used (corresponding to either one *Active LTS* and one *Current*, or two Active LTS, depending on the time of year), and one most recent odd-numbered major version. All of these versions are actively supported and may acquire new point-releases at any time.
Based on some initial experiments, after running `git gc`, a repository containing all versions (for one platform, e.g., darwin-x64) of a single major version of Node compresses to roughly 50MB; a repo consisting of all versions of 2 major versions of Node compresses to roughly 100-150MB; and a repo consisting of all versions of 3 major versions compresses to roughly 200MB.
Regarding platform packs, these numbers seem to indicate that space requirements increase faster than major versions are added (50MB for one, 50-75MB each for two, ~67MB each for three).
If that's the case, what do we gain by bundling multiple majors together? While it's easy to imagine a developer needing all three, it seems to me equally easy to imagine them not needing all three, in which case the ballooning space requirements are doubly punishing.
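For reference, the per-major figures in this comment follow from dividing each reported total by the number of majors it covers. A minimal sketch of that arithmetic, using only the rough totals quoted from the RFC (the dictionary layout and function name are illustrative, not part of the RFC):

```python
# Rough totals (MB) reported in the RFC after `git gc`, keyed by the number
# of Node major versions stored in one repository. Each value is (low, high).
reported_totals_mb = {
    1: (50, 50),
    2: (100, 150),
    3: (200, 200),
}


def per_major_mb(totals):
    """Return {majors: (low, high)}: average cost per major version, in MB."""
    return {
        majors: (low / majors, high / majors)
        for majors, (low, high) in totals.items()
    }


for majors, (low, high) in per_major_mb(reported_totals_mb).items():
    print(f"{majors} major(s): {low:.0f}-{high:.0f} MB each")
```

On these estimates the average cost per major does not drop as majors are added, which is the concern raised here. The inputs are rough, so the comparison is only indicative.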
...
If the lockfile pins `"typescript"` at version 3.0.3, then entering the project and running:
How does Notion know to create the `tsc` shim in this case? Perhaps I'm out of the loop, but I seem to recall that we're not looking at hooking `npm install` or `yarn add` anymore?
This suggests that during the Notion installation process, we could fetch a current **platform pack** with all of the two most recent even-numbered and one most recent odd-numbered versions of Node in a compressed git directory. (Less commonly-needed versions of Node could be provided in separate repositories, perhaps one major-version per repository.) This would take a bit of up-front setup time, but as a result, users would get the following benefits:
> a compressed git directory

Would this be an actual archive file, or will the repository be `git clone`d? Do we know which would typically be faster?
### Limited connectivity mode

We may eventually want to offer a mode that users can opt into to use smaller platform packs, for situations where limited connectivity makes it hard to download the default packs. This should not be the default, and the UX should not *encourage* this mode, because it’s too tempting to avoid up-front downloads and then end up with less compression, more aggregate downloads and disk usage, and generally slower behavior. Also, users might be confused into thinking that the initial download is wasteful and spread misinformation about disk usage. Generally, the bet is that a multi-version git repository will lead to lower disk usage, so we should not encourage the use of more minimal packs. But as long as the UX is framed in terms of the use case of *limited connectivity* (for example, `install.sh --limited-connectivity`), this should help avoid misconceptions.

Moreover, it should still be possible to upgrade a limited-connectivity install to a normal install, if the user finds themselves in better network conditions later on, without having to reinstall Notion from scratch.
It may be worth considering a downgrade as well, for users switching from WiFi to cellular-tethered or some such.
- Global package binaries (via `npm i -g` or `yarn global add`) are a convenient and popular deployment model for tools, not only project build tools like `babel` and `tsc`, for which `npx` is a popular solution, but also for use cases that aren’t associated with a pre-existing project, such as `surge`, `ember new`, or `svgo`.
- Global package binaries make projects sensitive to the global state of a user’s machine, making project builds brittle, unportable, and unreliable. This is such a pervasive problem that many developers recommend against the use of global installations entirely.

A separate but related motivation is the desire to be able to install user tools on a local machine one and not worry about “drift” over time; that is, a developer should be able to install, say, `surge` or `svgo` and not worry that those tools might stop working because of changes to the currently-installed version of Node.
"on a local machine and not worry"? I think there's an extra word in there
## Scenarios

It’s helpful to think about a few different kinds of scenarios that should support, and how reproducibility fits in.
"scenarios that Notion should support"?
It’s natural to question whether this implementation strategy is worth the effort: an alternative approach would be to allow projects to specify less precise version requirements (such as the ranges typically expressed in the `"engines"` field of `package.json`) and assume most differences will be benign. However, behavioral divergences between versions of Node do happen and are some of the trickiest bugs to nail down. Putting in extra work up front to ensure that these divergences *cannot happen, by construction* will eventually pay for itself when scaled across the Node ecosystem.

Another reasonable criticism is that pinning the Node version for tools in the user toolchain means that users will not automatically benefit from performance and security improvements in Node. There are a couple of reasons this is outweighed by the benefits of pinning. First, as users upgrade the tool version itself, they will automatically get re-pinned to the newer version of Node specified by newer versions of the tool’s `package.json`. Second, we should at least allow users the option to override the tool’s specified Node version. But by letting the tool choose its platform version by default, users have a stronger guarantee that the tool will work and continue to work consistently based on how it was tested.
This RFC is missing a discussion about installing user tools that do not have a pinned node version. Currently there are 0 tools that specify a `"toolchain"`, and even if Notion is wildly popular there will be a transition period where some tools have a pinned `node` version and some do not. Actually there may always be some tools that do not specify a `"toolchain"`, for whatever reason.

Notion should be able to handle this situation and preserve reproducibility. That could be using the version of node in the user's config, or if that is not specified, using the user's default version of node, and pinning that somehow. In these cases Notion could modify the installed tool's package.json to include a `"toolchain"` section if it is not already present.
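To make the suggested fallback order concrete, here is a rough sketch of how it could look. Everything in it is an assumption for illustration: the shape of the `"toolchain"` section (`{"node": "<version>"}`), the user config being a plain mapping with a `"node"` key, the function names, and the version strings. None of this is taken from the RFC or from Notion's actual implementation.

```python
import json


def resolve_node_version(tool_manifest, user_config, default_version):
    """Pick the Node version for a tool: tool pin, then user config, then default.

    Returns (version, source). The "toolchain" layout is hypothetical.
    """
    toolchain = tool_manifest.get("toolchain") or {}
    if "node" in toolchain:
        return toolchain["node"], "tool"
    if user_config.get("node"):
        return user_config["node"], "user-config"
    return default_version, "default"


def pin_toolchain(tool_manifest, user_config, default_version):
    """Return a copy of the manifest with the resolved Node version recorded."""
    version, source = resolve_node_version(tool_manifest, user_config, default_version)
    pinned = dict(tool_manifest)  # shallow copy; the input manifest is not mutated
    pinned["toolchain"] = {**(tool_manifest.get("toolchain") or {}), "node": version}
    return pinned, source


# A tool whose package.json has no "toolchain" section (illustrative values).
manifest = json.loads('{"name": "example-tool"}')
pinned, source = pin_toolchain(manifest, {"node": "10.9.0"}, "8.11.4")
print(json.dumps(pinned, indent=2))
print("resolved from:", source)
```

Recording the resolved version at install time is what would keep later runs reproducible even if the user's config or default changes afterwards. Whether that record belongs in the tool's `package.json` or somewhere Notion owns is left open here.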
I think what I need to do next is rip out the discussion of platform packs and git representations, which should be separately investigated as an optimization technique down the road. That way this RFC can stay focused on the core ideas of reproducibility, which are mostly orthogonal to that implementation approach.
Closing since this is superseded by #27.