Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use fcopy for faster file copying #3290

Closed
wants to merge 1 commit into from
Closed

Conversation

sciolist
Copy link
Contributor

@sciolist sciolist commented May 1, 2017

(this is not something that should be merged now, needs more testing)

I dug up my old uv_sendfile code and thought it could be interesting to make a little package out of it (https://github.com/sciolist/fcopy) and see what impact it would have on yarn.

My test setup:

Results

Platform avg speed v0.24 (master) avg speed with fcopy lib
FreeBSD 32s 9s
Ubuntu 30s 9s
Os x 32s 10s
Windows 68s 32s

I've ran diffs on the output and worked with it a bit today without noticing any differences or issues, but, I wouldn't be surprised if there are some.. particularly on other bsds.

@sciolist sciolist force-pushed the fcopy branch 3 times, most recently from 666b4d1 to ec1774c Compare May 1, 2017 02:29
@bestander
Copy link
Member

Nice, what is the next step?

@Daniel15
Copy link
Member

Daniel15 commented May 12, 2017

Probably worth collaborating with @vbfox on this - He was looking at using the Win32 CopyFile API for copying files (#2960). Perhaps the Win32 stuff could be merged into fcopy if it helps.

@vbfox
Copy link
Contributor

vbfox commented May 12, 2017

So I tried on win32 with fcopy vs win32 CopyFile and there is no big gain to be made (~1% on the test project) but there is quite a bit of gain possible on win32 by removing the utimes call if we're sure that we're calling the native CopyFile (Windows preserve mtime by default and it's the only field Yarn seem to use).

On my PC this PR take 11.75s and without the utime call it's 10.21s

@sciolist
Copy link
Contributor Author

So I've been using it for about 2 weeks now, on os x and windows, a bit on linux.. without any issues so far.

Based on the perf. reported in #2960 it seems like this performs similarly to FileCopy, but a fair bit of that would probably be from ffi. I'll be interested in seeing if he finds a way to speed it up for windows.

One problem is that you need a version of the native extension built for your major version of node (specifically NODE_MODULE_VERSION from https://nodejs.org/en/download/releases/.) That makes distribution a bit tricky.

If you're using a mismatching node version it'll fall back to readstream+writestream just like yarn copies files today, so it isn't the end of the world.. and if you want to keep distributing yarn as a single js file that'll work fine as well.

My current setup is that it'll download the correct extension from https://github.com/sciolist/fcopy/releases/tag/v0.0.5 for your version of node when you yarn add fcopy.. But that of course won't work if you're installing yarn through an .msi. One approach would be to just download the binaries for all supported node versions up front so it could pick the correct one at runtime. Another option I was considering is to download the version it needs at runtime, but, that just doesn't feel very good.

@vbfox
Copy link
Contributor

vbfox commented May 12, 2017

I think packaging the binary with yarn would be nice. The total size per platform over supported node versions don't seem so big & the fallback is always there if the user got another version or is using the single-file version.

Also in the end I checked and mtime can't be optimized aways when fcopy is used as it's necessary here (libuv does the copy manually not by using OS functions so it doesn't use win32 API)
I don't think a win32 copier in fcopy is really needed (1% speedup isn't really worth it) but if you want I can PR it (The code is there if needed : https://github.com/vbfox/native-copy-file/blob/gyp/src/hello.cc)

My times btw (Avg over 5 run, offline mode)

0.23.2 CopyFile libuv
24.97s 11.48s 11.75s
100% 46% 47%

@Daniel15
Copy link
Member

My current setup is that it'll download the correct extension from https://github.com/sciolist/fcopy/releases/tag/v0.0.5 for your version of node when you yarn add fcopy.. But that of course won't work if you're installing yarn through an .msi. One approach would be to just download the binaries for all supported node versions up front so it could pick the correct one at runtime.

The binaries look pretty small, we could probably bundle all of them with Yarn and just load the correct one at runtime, if possible. I don't know a lot about how native extensions work though, and whether they can be dynamically loaded.

@bestander
Copy link
Member

bestander commented May 12, 2017 via email

@sciolist
Copy link
Contributor Author

@Daniel15 Yeah you can dynamically load them, you just require() the built .node file and it'll throw an exception if you try to load an incompatible library.. I could probably just package all the different versions into one tar.gz file per platform instead, that way no changes would have to be made in yarn.

@vbfox for a 1% gain it feels like it's probably preferable to just stick with libuv. :/ a bit less code to maintain, and the libuv code is probably at least somewhat battle tested. regarding utimes, you could maybe skip actually await-ing the utimes result.. not sure if that would cause other issues.

@sciolist
Copy link
Contributor Author

sciolist commented May 12, 2017

@bestander not really, though if fcopys post-install downloads binaries for all supported node versions there shouldn't have to be more handling in yarn than for any other library. the alternative would be to convince the node maintainers that a fast file copy function should be added to fs.. not sure how receptive they'd be to that.

That said I do definitely agree that it feels better to not need native extensions.

@sciolist
Copy link
Contributor Author

@Daniel15 Made a few updates, yarn add fcopy downloads binaries for several v8 versions.. I'll have to try out yarns dist builds to see how well that will work with them.

The function for downloading binaries is exposed now, so if there is no local version of the binary it can attempt to download it:

require('fcopy/lib/bindings').downloadIfNeeded(process.versions.modules).then(
  function () {},
  function (ex) { /* will fall back to fs streams */ }
);

Not sure if it should actually be used though.. For most situations it doesn't feel great to just download and execute binaries off the internet, but then yarn already does that, so maybe not a big deal.

I figured I'd also put up a quick video comparison of the current file copying in yarn:

fs streams (osx), taking 9.3 seconds to copy 42k files (304M)

fcopy (osx), taking 2.4 seconds for the same files.

It is a pretty significant difference, but if it's worth adding a binary dependency to get it is hard for me to say.

@Daniel15
Copy link
Member

Daniel15 commented Jun 6, 2017

This will be useful for copy-on-write filesystems and should provide a significant speed improvement for systems that use CoW filesystems. It would be good to get this in, we just need to think about the deployment strategy and how to handle it for the single-file build. Falling back to the JS code when the native library isn't available is probably sufficient.

@Daniel15
Copy link
Member

Daniel15 commented Jun 6, 2017

On the other hand, for a CoW filesystem, maybe simply generating a shell script with all the required cp --reflink=auto commands and execing that would be more maintainable than native code, as Node.js' handling of native modules is notoriously bad. 😛

@sciolist
Copy link
Contributor Author

sciolist commented Jun 6, 2017

Getting a copyfile function in libuv+node would be great, though I'm not sure what the likelihood is of that happening particularly soon.

Creating a shell script does have a compatibility advantage over a native module, since as you said they're pretty painful to maintain.. The biggest problem it that changing your local node version stops the native module from working, unless you keep a lot of different versions of the native module around (and when a new version of node comes out, you'd need to download a new native module.) I do have a fallback for that, but it isn't great that changing node version substantially reduces copy speed.

Definitely worth looking into alternatives.

@develar
Copy link

develar commented Jun 8, 2017

JFYI: Pre-bundled fcopy module size — only 174 KB (fcopy-pre-bundled). And it works great in my project (not yet tested on users :)). Quite safe since in any case there is fallback (but in my fork I removed install task and all other code that can lead to rebuild, develar/fcopy@21bb870)

@BYK
Copy link
Member

BYK commented Jul 5, 2017

Do we wanna still push forward with this? @bestander @arcanis @develar?

@develar
Copy link

develar commented Jul 5, 2017

Works well in the electron-builder (500K downloads in month) — no issues at all. One trick is required for npm 3 — empty install script, but in general it works well (fcopy-pre-bundled, if no prebuild, fallback is used, but not rebuild).

@bestander
Copy link
Member

@sciolist, @develar how does it compare with the latest Yarn?
We had a fs.copy change in 0.25/0.26 that gained a significant boost.

Could you think of how it would work with single-file build?
We bundle yarn into a .js file and download node-gyp, for example if it is not available.
That is the reason why we haven't merged it yet, we kind of like native-dependency-free build, it gives a lot of flexibility.

@develar
Copy link

develar commented Jul 6, 2017

@bestander Results from @sciolist#3539 (comment) In short: 14s vs 9s

we kind of like native-dependency-free build, it gives a lot of flexibility.

No doubt, that's why all native-related dependencies like nan are removed in the fcopy-pre-bundled and if no suitable pre bundled ("Pre-bundled fcopy module size — only 174 KB ") or error occurred during load native module, fallback is used. As workaround for npm@3 bug, empty install script "install": "echo ''" is added.

If you will decide to use fcopy, fallback should be improved — check file size and if lesser than some threshold, use current solution "do bulk file reads" (to ensure that performance will be not degraded significantly).

Update: got your point about single file — will experiment with it.

@bestander
Copy link
Member

Thanks for explaining, @develar.
Sent you and @sciolist an email to clarify a few things and see how we can move forward

@Daniel15
Copy link
Member

Daniel15 commented Jul 8, 2017

that's why all native-related dependencies like nan are removed in the fcopy-pre-bundled and if no suitable pre bundled ("Pre-bundled fcopy module size — only 174 KB ") or error occurred during load native module, fallback is used.

This sounds great. If this works with our standalone .js file build (ie. if we can just bundle the binary parts in our tarball and it all works properly), I think we should do it 😃

@sciolist
Copy link
Contributor Author

sciolist commented Jul 8, 2017

@Daniel15 What happens now when installing fcopy is that it will download all files for the current platform from https://github.com/sciolist/fcopy/releases/tag/bin-v1 (the v1 coming from a version tag in package.json.) Using the prebundled package is definitely also an option, it's either that or we'd have to update the yarn dist build to download the binaries.

One problem is that if a user upgrades their local nodejs, but their local version of yarn does not have a native module built for that new nodejs version, it will fall back to the slow path.

Something like nodejs/node#12902 would be great, but I'm not so sure it will happen. Also there's no handling of reflink/CoW in fcopy.

@bestander I saw your email, haven't had time to take a look at it.. trying to wrap some projects up for the summer.

@Daniel15
Copy link
Member

A PR just landed in libuv to add a native copyfile function: libuv/libuv#1465. This should mean that it'll come to Node.js soon!

@sciolist
Copy link
Contributor Author

That's great! A much better option than handling the issues with a native module.

@Daniel15
Copy link
Member

Issue for adding this to Node.js: nodejs/node#14906

BYK added a commit that referenced this pull request Sep 16, 2017
**Summary**

Fixes #4331. Supercedes #3290. Uses the newly added `fs.copyFile`
on Node 8.5 hen available and falls back to old buffer based
method otherwise. This patch also refactors the file copy code a
bit making it more efficient. Here are the durations on my computer
with [this package.json](https://github.com/sciolist/fcopy/blob/c0ded8b82bbaa3f4cdb0a67e951eb3f8b0f97fb7/perf/project/alottafiles/package.json):

| master | w/o fsCopy | w/ fsCopy |
| ~23s | ~19s | ~15s |

This is with `yarn.lock` in place and w/o `node_modules`.

**Test plan**

CI should pass.
BYK added a commit that referenced this pull request Sep 16, 2017
**Summary**

Fixes #4331. Supercedes #3290. Uses the newly added `fs.copyFile`
on Node 8.5 hen available and falls back to old buffer based
method otherwise. This patch also refactors the file copy code a
bit making it more efficient. Here are the durations on my computer
with [this package.json](https://github.com/sciolist/fcopy/blob/c0ded8b82bbaa3f4cdb0a67e951eb3f8b0f97fb7/perf/project/alottafiles/package.json):

| master | w/o fsCopy | w/ fsCopy |
| - | - | - |
| ~23s | ~19s | ~15s |

This is with `yarn.lock` in place and w/o `node_modules`.

**Test plan**

CI should pass.
@BYK
Copy link
Member

BYK commented Sep 16, 2017

Closing in favor of #4486.

Thanks a lot @sciolist for working on this and bringing it to our attention! 🎉 ❤️

@BYK BYK closed this Sep 16, 2017
BYK added a commit that referenced this pull request Sep 18, 2017
**Summary**

Fixes #4331. Supersedes #3290. Uses the newly added `fs.copyFile` 
on Node 8.5 hen available and falls back to the old buffer based 
method otherwise. This patch also refactors the file copy code a 
bit making it more efficient. Here are the durations on my computer 
with [this package.json](https://github.com/sciolist/fcopy/blob/c0ded8b82bbaa3f4cdb0a67e951eb3f8b0f97fb7/perf/project/alottafiles/package.json):

| master | w/o copyFile | w/ copyFile |
| - | - | - |
| ~23s | ~19s | ~14s |

This is with `yarn.lock` in place and w/o `node_modules`.

**Test plan**

CI should pass.
joaolucasl pushed a commit to joaolucasl/yarn that referenced this pull request Oct 27, 2017
**Summary**

Fixes yarnpkg#4331. Supersedes yarnpkg#3290. Uses the newly added `fs.copyFile` 
on Node 8.5 hen available and falls back to the old buffer based 
method otherwise. This patch also refactors the file copy code a 
bit making it more efficient. Here are the durations on my computer 
with [this package.json](https://github.com/sciolist/fcopy/blob/c0ded8b82bbaa3f4cdb0a67e951eb3f8b0f97fb7/perf/project/alottafiles/package.json):

| master | w/o copyFile | w/ copyFile |
| - | - | - |
| ~23s | ~19s | ~14s |

This is with `yarn.lock` in place and w/o `node_modules`.

**Test plan**

CI should pass.
stefanpenner added a commit to broccolijs/node-symlink-or-copy that referenced this pull request Jul 27, 2018
* Currently would only be utilized on windows
* available in node 8.5.x
* prefer Copy-On-Write if available
* good benchmarks yarnpkg/yarn#3290

Although node-copy-file-sync module exists, it uses more modern JS features. Which this library cannot yet support (without major version bump)
stefanpenner added a commit to broccolijs/node-symlink-or-copy that referenced this pull request Jul 27, 2018
* Currently would only be utilized on windows
* available in node 8.5.x
* prefer Copy-On-Write if available
* good benchmarks yarnpkg/yarn#3290

Although node-copy-file-sync module exists, it uses more modern JS features. Which this library cannot yet support (without major version bump)
stefanpenner added a commit to broccolijs/node-symlink-or-copy that referenced this pull request Jul 27, 2018
* Currently would only be utilized on windows
* available in node 8.5.x
* prefer Copy-On-Write if available
* good benchmarks yarnpkg/yarn#3290

Although node-copy-file-sync module exists, it uses more modern JS features. Which this library cannot yet support (without major version bump)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants