Please support loading node_modules from an archive #27501

Closed
guilt opened this issue Apr 30, 2019 · 15 comments
Labels
feature request Issues that request new features to be added to Node.js. module Issues and PRs related to the module subsystem. stale

Comments

@guilt

guilt commented Apr 30, 2019

node_modules is the folder where Node typically picks up libraries for a runtime/application. Often, the number of files distributed with a stock Node.js distribution is insane, and loading them is very slow with random I/O on some flash/external storage. Assuming a decent amount of RAM is available, Node should be able to partially extract an archive to RAM and load the relevant JS files on demand. I would like this support in the stock distribution as well.

Is your feature request related to a problem? Please describe.
I have trouble loading some Node projects from a slower SD card on a Raspberry Pi. Buying a faster disk with a better queue is not an option: I could buy one for myself, but not for the many users out there.

Describe the solution you'd like
Please allow Node to alternatively use a node_modules.zip, as a start.

Describe alternatives you've considered

  • Delete many .js files in node_modules/lib
  • Write a bin2cpp based v8 eval (increases RAM usage if many modules unused)
  • Accept fate
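
The request above (keep archive contents in RAM and load JS files on demand) can be sketched roughly as follows. This is only an illustration under assumptions: `createMemoryLoader` is a hypothetical helper, the archive is assumed to be already unpacked into a `Map` of file name to source text (Node has no built-in zip container reader), and module resolution is reduced to exact-name lookup. It is not how Node's module loader works internally.

```javascript
'use strict';
const vm = require('node:vm');

// Hypothetical sketch: `sources` maps a module name to its source text,
// as if the entries had already been extracted from an archive into RAM.
function createMemoryLoader(sources) {
  const cache = new Map(); // name -> module object, filled on first load

  function load(name) {
    if (cache.has(name)) return cache.get(name).exports;

    const source = sources.get(name);
    if (source === undefined) {
      throw new Error(`Module not found in archive: ${name}`);
    }

    const module = { exports: {} };
    cache.set(name, module); // register early so cycles see partial exports

    // Compile lazily, only when the module is first requested.
    const wrapper = vm.runInThisContext(
      `(function (exports, require, module) {${source}\n})`,
      { filename: name }
    );
    wrapper.call(module.exports, module.exports, load, module);
    return module.exports;
  }

  return load;
}
```

Only modules that are actually requested get compiled; the cost is that every source in the map stays resident in memory.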
@devsnek
Member

devsnek commented Apr 30, 2019

maybe duplicate of #1278

@ChALkeR ChALkeR added feature request Issues that request new features to be added to Node.js. module Issues and PRs related to the module subsystem. labels Apr 30, 2019
@devsnek
Member

devsnek commented Apr 30, 2019

@guilt what kind of performance hit are we talking about here? can you get around this problem by reusing running node instances? node is sorta designed to be long-lived.

@bnoordhuis
Member

Is "slow random I/O" seek time? What latencies are we talking?

If seek time is your problem then reading from an archive helps only a little because you'll be jumping back and forth just as much. Compression/locality means you'll read fewer blocks in total but since those blocks need not be contiguous on disk, seek times will still be the limiting factor.

Sequentially reading and decompressing the archive to memory alleviates that problem but then you're forced to keep a lot of files around on the off chance that the application needs them.

If you sample a few node_modules directories, I think you'll find that > 50% of files will never be imported (think tests, fixtures, docs, etc.), making aggressive caching a poor trade-off.

You could evict such files from the cache LRU-style but then you're halfway to reinventing the kernel's disk cache.

Speaking of which, there are several FUSE zipfs file systems. Maybe you can experiment with those and check what performance improvements (if any) you get?
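
The estimate above that more than half of the files are never imported can be checked for a given project with a rough sketch like the one below. The helper names (`listFiles`, `loadedFilesUnder`, `unusedRatio`) are hypothetical. It assumes CommonJS only: `require.cache` does not track ES modules or files that packages read directly through fs, so the result is an upper bound on unused files.

```javascript
'use strict';
const fs = require('node:fs');
const path = require('node:path');

// Recursively list every regular file under `dir` (symlinks are skipped).
function listFiles(dir) {
  const out = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) out.push(...listFiles(full));
    else if (entry.isFile()) out.push(full);
  }
  return out;
}

// Files under `dir` that the CommonJS loader has loaded so far.
function loadedFilesUnder(dir) {
  const root = fs.realpathSync(dir) + path.sep;
  return Object.keys(require.cache).filter((file) => file.startsWith(root));
}

// Share of files on disk that were never loaded through require().
function unusedRatio(dir) {
  const total = listFiles(dir).length;
  if (total === 0) return 0;
  return 1 - loadedFilesUnder(dir).length / total;
}
```

Calling `unusedRatio(path.join(process.cwd(), 'node_modules'))` at the end of a representative run gives a first approximation for that application.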

@guilt
Author

guilt commented May 1, 2019

@devsnek agree. It affects scripts (toolchains) more than services. But if node was not meant for writing scripts, running NPM on cold boot may not be optimal, by that logic. I did not want to separate these workloads logically when filing this.

@bnoordhuis yes, it is equivalent to seek, except it has to do with the flash controllers as opposed to moving heads on an HDD.

There may be a combination of problems here:

  1. node_modules may store files that do not matter to the end user; this is a distribution issue.
  2. definitely random read / I/O controller related problems

FUSE introduces its own syscall overhead in addition to the archive seek overhead. I'm open to other suggestions for benchmarking too.
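
As one possible starting point for the benchmarking asked about above, here is a minimal sketch that times a single `require()` call. `timeRequire` is a hypothetical helper, not a Node API. Two assumptions matter: a second call for the same module hits `require.cache` and measures almost nothing, and a cold-storage number needs a fresh process with the OS page cache dropped beforehand.

```javascript
'use strict';
const { performance } = require('node:perf_hooks');

// Time one require() call in milliseconds.
// Pass a package name or an absolute path; relative ids would resolve
// against this helper's own file, not the caller's.
function timeRequire(id) {
  const start = performance.now();
  require(id);
  return performance.now() - start;
}
```

Running the same measurement against a plain directory, a FUSE mount, and a squashfs mount would make the three setups discussed here directly comparable.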

@bnoordhuis
Member

squashfs? It's a mainline kernel module so there's a pretty good chance your kernel comes with pre-built support for it; and if not, building it from source is quite easy. It supports gzip, lzo, xz and, with recent kernels, zstd.

@addaleax
Member

addaleax commented May 1, 2019

Fwiw, I think Electron already supports something along these lines? Maybe they have some input here? @codebytere @ryzokuken

@Fishrock123
Contributor

See also #11903, which was "backlogged". CC also @bmeck

@Antonius-S

Consider webpack; I tried it with some big projects and it seems to work.

@GrosSacASac
Contributor

I had a project some time ago that used a lot of npm scripts, each starting node and doing something. It also had a big npm script that ran all of them one by one. I refactored it so that independent tasks could run in parallel, and I also made sure that node itself was only started once. This combined effort reduced the time of the full build by 80%.
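
The refactoring described above could look roughly like this sketch, which runs independent tasks concurrently inside one Node process instead of starting node once per npm script. `runIndependentTasks` and the task names in the usage note are hypothetical, and this is not the commenter's actual code. Note that the tasks share one event loop, so only I/O-bound work overlaps; CPU-bound tasks would need worker threads or child processes.

```javascript
'use strict';

// Run independent tasks concurrently in the current process.
// `tasks` maps a task name to a function returning a value or a promise.
async function runIndependentTasks(tasks) {
  const names = Object.keys(tasks);
  // Start every task before awaiting any of them.
  const values = await Promise.all(names.map(async (name) => tasks[name]()));
  return Object.fromEntries(names.map((name, i) => [name, values[i]]));
}
```

Usage would be along the lines of `runIndependentTasks({ lint: runLint, build: runBuild })`, where both functions are the project's own task implementations. If any task rejects, the returned promise rejects with that error.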

@GrosSacASac
Contributor

Using a tool to bundle your files could also help, so that each task only opens one big self-contained file.

@arcanis
Contributor

arcanis commented Aug 15, 2019

> Fwiw, I think Electron already supports something along these lines? Maybe they have some input here? @codebytere @ryzokuken

Electron uses ASAR archives, which are supported through a partial fs implementation. Yarn 2 does the same thing, but we use regular Zip archives and we cover a wider area of the fs interface.

Fwiw, some of the problems I've identified in those approaches, the way they're currently implemented:

  • It works well in a context where all modules operate in the same domain, but I'm not sure it's still the case if they all live in separate ones.

  • It's very hard to test the fs module. Splitting the FS testsuite into its own package that both Node and third-party FS could use would be very interesting.

On the other hand, some things work really well:

  • We don't only add support for Zip archives, we also have a virtual FS that we use to avoid having to create thousands of symlinks just to disambiguate peer dependencies. A simple support for Zip archives wouldn't solve that, so we'd still need to use our FS layer for that (note: we might be able to eventually get rid of this depending how the new module API turns out).

  • Some packages tend to use the fs methods on themselves, and adding this Zip support at the fs layer gives a better compatibility with those than if it was a dedicated "Resource API".

Overall, I think allowing modules to officially extend the regular fs interface would be extremely valuable - but the API should be standardized to prevent potential incompatibilities as much as possible.
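
To make the "support at the fs layer" idea concrete, here is a minimal sketch of a read-only overlay that serves archive entries from memory and falls back to the real fs for everything else. It is an assumption-laden illustration, not the Electron or Yarn implementation: `createOverlayFs` is a hypothetical name, only `existsSync` and `readFileSync` are covered, and the archive is assumed to be already extracted into a `Map` of absolute path to `Buffer`.

```javascript
'use strict';
const fs = require('node:fs');

// Hypothetical overlay over a small part of the fs interface.
// `entries` maps absolute paths to Buffers taken from an archive.
function createOverlayFs(entries, realFs = fs) {
  return {
    existsSync(file) {
      return entries.has(file) || realFs.existsSync(file);
    },
    readFileSync(file, options) {
      const data = entries.get(file);
      // Not an archive entry: defer to the underlying file system.
      if (data === undefined) return realFs.readFileSync(file, options);
      // Accept both readFileSync(p, 'utf8') and readFileSync(p, { encoding }).
      const encoding =
        typeof options === 'string' ? options : options && options.encoding;
      return encoding ? data.toString(encoding) : data;
    },
  };
}
```

The rest of the fs surface (stat, readdir, streams, promises, and so on) is what makes a complete implementation hard to build and test.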

@devsnek
Member

devsnek commented Aug 15, 2019

@bmeck

@bmeck
Member

bmeck commented Aug 15, 2019

@arcanis I think virtualizing fs is likely a separate issue to be discussed.

> Some packages tend to use the fs methods on themselves, and adding this Zip support at the fs layer gives a better compatibility with those than if it was a dedicated "Resource API".

I have concerns, but I think loading from different virtual systems should be supported. Not all systems are compatible with all features of fs, and sometimes things like case sensitivity can be incorrectly implemented or detected at runtime. I do not think loading https using fs.readFileSync, for example, is desirable, and I tend to find similar things incompatible as well. Archive-based loading is already proposed for the web using HTTP signed exchanges. Even though it may be tempting to support only fs-compatible systems of loading, that is not within the scope of this issue, and providing a custom fs utility is unrelated to loading archives and the formats of those archives. A higher-order runtime (such as electron/yarn/tink/meteor) can provide custom shims for backwards compatibility if it desires, but a resource loading API would be preferred for the general use case and as such is its own issue.

@github-actions
Contributor

There has been no activity on this feature request for 5 months and it is unlikely to be implemented. It will be closed 6 months after the last non-automated comment.

For more information on how the project manages feature requests, please consult the feature request management document.

@github-actions github-actions bot added the stale label Feb 25, 2022
@targos targos moved this to Pending Triage in Node.js feature requests Feb 25, 2022
@targos targos moved this from Pending Triage to Stale in Node.js feature requests Feb 25, 2022
@github-actions
Contributor

There has been no activity on this feature request and it is being closed. If you feel closing this issue is not the right thing to do, please leave a comment.

For more information on how the project manages feature requests, please consult the feature request management document.


10 participants