Optimize disk access during install #904
I don't specifically know what is inefficient here, but somebody could do some stracing on Linux and see if there's anything obvious.
This could probably be reduced to just one.
Given that the majority of filesystem operations which Rustup performs are reading and writing large (10s of MBs) datasets, I'm not sure how much there is to gain in general. I do, however, think that we should consider whether there's anything we can do to improve install performance on Windows.
Installation on Linux filesystems is fast even on HDD, so measuring performance there won't help much for Windows, but it'd still be nice to make it even faster. I don't know which is more limiting, NTFS or AV software, but shipping fewer files would be the way to go here.
If I understand correctly, the
I've done some basic analysis on a Windows 10 Surface, since OS-specific code paths can make Linux traces (while convenient to grab) misleading :). My test setup had Windows Defender running normally; controlled folder access was off, and no Defender exclusions were defined. The high level view is that for docs we spend:

Sidebar: we are quite chatty in the OS syscalls we make - many more opens than needed - but looking at the timing, that should be considered a microoptimisation, not a first-pass target. See the trace excerpts below for examples.

What should we be able to expect without getting into hairy levels of OS-specific integration?

Python benchmark:
```python
import os
import shutil
import time

now = time.time()
os.mkdir("root")
os.chdir("root")
written = 0
for dirname in range(80):
    os.mkdir("%s" % dirname)
    for fname in range(200):
        with open("%s/%s" % (dirname, fname), "wt") as f:
            written += f.write(("this is a small file %s\n" % fname) * 512)
finish = time.time()
os.chdir("..")
shutil.rmtree("root")
print("created 16000 files in 80 directories in %0.2fs with %s bytes"
      % (finish - now, written))
```

Total time right now is 3m20s, of which 60 seconds is entirely dealing with making an unused copy of the content: the staging process unpacks into a staging directory, copies that content into the final location, and then deletes the staging directory.
So we can shave a minute off if we unpack directly into the final directory. Would that be safe? If the archive has been validated during download, so we know its contents are intact, then there are no additional IO errors that can occur when unpacking directly.
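To make that concrete, here is a minimal, hypothetical sketch - not rustup's actual code - of unpacking a verified archive straight into its final location; the `tar`/`flate2` crate choice, function name, and paths are assumptions for illustration only:

```rust
use std::fs::File;
use std::io;
use std::path::Path;

use flate2::read::GzDecoder; // assumed crates: flate2 = "1", tar = "0.4"
use tar::Archive;

fn unpack_directly(archive_path: &Path, final_dir: &Path) -> io::Result<()> {
    // The archive's hash was already verified during download, so its
    // contents are known-good; any IO error here would also have hit the
    // staging path.
    let tarball = File::open(archive_path)?;
    let mut archive = Archive::new(GzDecoder::new(tarball));
    // Each file is created, written, and closed exactly once, directly in
    // its final location - no staging copy, no later delete.
    archive.unpack(final_dir)
}
```

The point is simply that every file lands in place with a single create/write/close, leaving no staging tree to copy from and remove afterwards.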
Alternatively, we could unpack to a staging directory but then rename into the final directory - that would require double-touching each file, but it would ensure we could fit all the files on disk before placing any of them into the target location (if that's important - I don't think it is :)). That would still leave 2m20s to unpack vs 60s for a trivial driver; some of that may be decompression overhead, IO from the source archive, etc. - I don't have a breakdown for that yet.
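For comparison, here is a hypothetical sketch of that staging-then-rename alternative; the directory layout and names are assumed, and it presumes the staging and final directories are on the same filesystem so each rename is a metadata-only operation:

```rust
use std::fs;
use std::io;
use std::path::Path;

fn rename_into_place(staging_dir: &Path, final_dir: &Path) -> io::Result<()> {
    fs::create_dir_all(final_dir)?;
    // Each top-level entry is touched twice: once when it is unpacked into
    // the staging directory, and again here when it is renamed into place.
    for entry in fs::read_dir(staging_dir)? {
        let entry = entry?;
        fs::rename(entry.path(), final_dir.join(entry.file_name()))?;
    }
    // The staging directory is empty now and cheap to remove.
    fs::remove_dir(staging_dir)
}
```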
Trace excerpts

(I will happily mail a tarball of the full trace in PML format to anyone that wants it)
Codewise this is layered like so:
Per rust-lang#904 copying the contents of dists out of the extracted staging directory and then later deleting that same staging directory consumes 60s out of a total of 200s on Windows. Wall clock testing shows this patch reduces `rustup toolchain install nightly` from 3m45 to 2m23 for me - including download times etc. I'm sure there is more that can be done, thus I'm not marking this as a closing merge for rust-lang#904.
Current performance quick metrics:

With a defender exclusion on ~/.rustup:

So there is still a lot of fat we should be able to cut; I have some experiments (see above) but nothing as definitive as halving the IO as my first patch did :). The benchmarks are pretty noisy too :/.

A variable I have yet to isolate is the Windows Search index; that is also possibly doing double or triple work (once on the initial write, once on the rename, and then removing from the index at the rename step as well). Setting the indexing service to pause reduces load while testing, but I haven't run a set of benchmarks with it enabled vs. off: when it is enabled it dynamically throttles, so enabled is not always enabled, which adds complexity to generating robust numbers.

tl;dr: there is hope for more improvements; a lot of syscall noise still remains, though fixing that will need changes deeper in e.g. tar and possibly Rust itself (e.g. setting the mode on a file on Windows should be done while it is open - chmod is sugar for fchmod; see the sketch below).

Users wanting things to be as fast as possible today should, IMO, add a Defender exclusion for ~/.rustup and consider pausing the Windows Search indexing service.
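As an aside on the fchmod point above, the following is a minimal, hypothetical sketch (the function name and parameters are invented for illustration) of adjusting a file's permissions through the still-open handle via the standard library's `File::set_permissions`, rather than doing a separate by-path chmod after the file has been closed:

```rust
use std::fs::File;
use std::io::{self, Write};

fn write_with_mode(path: &str, contents: &[u8], readonly: bool) -> io::Result<()> {
    let mut file = File::create(path)?;
    file.write_all(contents)?;
    // Adjust the mode through the open handle (the fchmod-style path)
    // instead of a later chmod by path, which implies another open.
    let mut perms = file.metadata()?.permissions();
    perms.set_readonly(readonly);
    file.set_permissions(perms)?;
    Ok(())
}
```

Doing it this way avoids the extra open that a path-based chmod implies on Windows.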
I think this ticket should be closed: we're doing create, write, close now for each file, with no extra copies. We do still do the move-into-place step, but since docs is moved as a top-level dir, it's a single syscall.
I concur, thank you @rbtcollins
Installation is noticeably slow. I believe there is redundant disk work during installation, associated with the transaction machinery.