-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Download crates in parallel with HTTP/2 #6005
Commits on Sep 18, 2018
-
Move downloading crates higher up in Cargo
This commit is intended to lay the groundwork for downloading crates in parallel from crates.io. It starts out by lifting up the download operation from deep within the `Source` trait to the `PackageSet`. This should allow us to maintain a queue of downloads specifically in a `PackageSet` and arbitrate access through that one type, making it easier for us to implement parallel downloads. The `Source` trait's `download` method may now fail saying "go download this data", and then the data is fed back into the trait object once it's complete to actually get the `Package` out.
Configuration menu - View commit details
-
Copy full SHA for c94804b - Browse repository at this point
Copy the full SHA c94804bView commit details -
Rename PackageSet::get to
get_one
This commit removes the old `PackageSet::get` method (used for downloading crates) to `PackageSet::get_one`. The new method explicitly indicates that only one package is being fetched. Additionally this commit also adds support for an API that allows supporting downloading multiple packages in parallel. Existing callers are all updated to use the parallel API where appropriate except for `BuildContext`, which will be updated in the next commit. Also note that no parallel downloads are done yet, they're still synchronously performed one at a time. A later commit will add support for truly downloading in parallel.
Configuration menu - View commit details
-
Copy full SHA for 7b1a0dc - Browse repository at this point
Copy the full SHA 7b1a0dcView commit details -
Refactor
build_unit_dependencies
for parallel downloadsThis commit refactors the `build_unit_dependencies` function for leveraging parallel downloads. This function is the primary location that crates are download as part of `cargo build`, which is one of the most critical locations to take advantage of parallel downloads! The problem here is somewhat unusal in that we don't want to download the entire crate graph but rather only the precise slice that we're compiling. This means that we're letting the calculation of unit dependencies entirely drive the decision of what crates to download. While we're calculating dependencies, though, we may need to download more crates! The strategy implemented here is to attempt to build the unit dependency graph. Any missing packages are skipped during this construction and enqueued for download. If any package was enqueued, we throw away the entire graph, wait for a download to finish, and then attempt to construct the unit graph again. The hope here is that we'll be able to incrementally make progress and incrementally add more and more crates to be downloaded in parallel. The worry is that this is a relatively inefficient strategy as we could reconstruct the unit dependency graph N times (where you have N total transitive dependencies). To help alleviate this concern wait for only a small handful of crates to actively be downloading before we continue to try to recreate the dependency graph. It's hoped that this will strike a good balance between ensuring we always have a full pipeline of downloads without re-downloading too much.
Configuration menu - View commit details
-
Copy full SHA for 44a7ee7 - Browse repository at this point
Copy the full SHA 44a7ee7View commit details -
Parallelize downloads with HTTP/2
This commit implements parallel downloads using `libcurl` powered by `libnghttp2` over HTTP/2. Using all of the previous refactorings this actually implements usage of `Multi` to download crates in parallel. This achieves some large wins locally, taking download times from 30s to 2s in the best case. The standard output of Cargo is also changed as a result of this commit. It's no longer useful for Cargo to print "Downloading ..." for each crate really as they all start instantaneously. Instead Cargo now no longer prints `Downloading` by default (unless attached to a pipe) and instead only has one progress bar for all downloads. Currently this progress bar is discrete and based on the total number of downloads, no longer specifying how much of one particular download has happened. This provides a less granular view into what Cargo is doing but it's hoped that it looks reasonable from an outside perspective as there's still a progress bar indicating what's happening.
Configuration menu - View commit details
-
Copy full SHA for 468f243 - Browse repository at this point
Copy the full SHA 468f243View commit details -
Only enable downloads through an RAII structure
This should help us scope downloads to ensure that we only ever have one download session at once and it's explicitly scoped so we can handle the progress bar and everything.
Configuration menu - View commit details
-
Copy full SHA for 0ea0c8b - Browse repository at this point
Copy the full SHA 0ea0c8bView commit details -
Configuration menu - View commit details
-
Copy full SHA for 775b4eb - Browse repository at this point
Copy the full SHA 775b4ebView commit details -
Update the progress bar for parallel downloads
This is actually a super tricky problem. We don't really have the capacity for more than one line of update-able information in Cargo right now, so we need to squeeze a lot of information into one line of output for Cargo. The main constraints this tries to satisfy are: * At all times it should be clear what's happening. Cargo shouldn't just hang with no output when downloading a crate for a long time, a counter ideally needs to be decreasing while the download progresses. * If a progress bar is shown, it shouldn't jump around. This ends up just being a surprising user experience for most. Progress bars should only ever increase, but they may increase at different speeds. * Cargo has, currently, at most one line of output (as mentioned above) to pack information into. We haven't delved into fancier terminal features that involve multiple lines of update-able output. * When downloading crates as part of `cargo build` (the norm) we don't actually know ahead of time how many crates are being downloaded. We rely on the calculation of unit dependencies to naturally feed into downloading more crates. * Furthermore, once we decide to download a crate, we don't actually know how big it is! We have to wait for the server to tell us how big it is. There doesn't really seem to be a great solution that satisfies all of these constraints unfortunately. As a result this commit implements a relatively conservative solution which should hopefully get us most of the way there. There isn't actually a progress bar but rather Cargo prints that it's got N crates left to download, and if it takes awhile it prints out that there are M bytes remaining. Unfortunately the progress is pretty choppy and jerky, not providing a smooth UI. This appears to largely be because Cargo will synchronously extract tarballs, which for large crates can cause a noticeable pause. Cargo's not really prepared internally to perform this work on helper threads, but ideally if it could do so it would improve the output quite a bit! (making it much smoother and also able to account for the time tarball extraction takes).
Configuration menu - View commit details
-
Copy full SHA for a46df8f - Browse repository at this point
Copy the full SHA a46df8fView commit details -
Configuration menu - View commit details
-
Copy full SHA for e2637b6 - Browse repository at this point
Copy the full SHA e2637b6View commit details -
Handle callbacks invoked outside of
tls::set
This'll happen apparently when we drop items, and no need to keep tracking information there!
Configuration menu - View commit details
-
Copy full SHA for 78922ca - Browse repository at this point
Copy the full SHA 78922caView commit details -
* Don't use `chain_err` internally inside `try` because then the spurious error printed internally is bland and doesn't contain the right error information. Instead place the `chain_err` on the outside so it propagates up to the user still but doesn't get printed for spurious errors. * Handle `pending_ids` management in the case of an error on the connection.
Configuration menu - View commit details
-
Copy full SHA for 304871c - Browse repository at this point
Copy the full SHA 304871cView commit details -
Use
Message::result_for
for better error messagesRecently fixed in the `curl` crate, this'll allow getting the full and complete error message for each transfer instead of just the error code.
Configuration menu - View commit details
-
Copy full SHA for bc94291 - Browse repository at this point
Copy the full SHA bc94291View commit details -
Don't print a download summary if an error happens
Otherwise it sort of just clutters things up!
Configuration menu - View commit details
-
Copy full SHA for 5a5b611 - Browse repository at this point
Copy the full SHA 5a5b611View commit details -
Avoid a debug overflow from curl
Curl doesn't guarantee that we'll get the total/current progress bars in a "as we expect" fashion, so throw out weird data points (such as we've downloaded more bytes than curl thinks we'll be downloading in total).
Configuration menu - View commit details
-
Copy full SHA for 2245fc4 - Browse repository at this point
Copy the full SHA 2245fc4View commit details -
Don't enable HTTP1 pipelining as apparently it's flaky
It seems to fix some issues perhaps!
Configuration menu - View commit details
-
Copy full SHA for 9da2e70 - Browse repository at this point
Copy the full SHA 9da2e70View commit details -
Disable multiplexing by default for now
Let's hopefully weed out any configuration issues before turning it on by default!
Configuration menu - View commit details
-
Copy full SHA for f4dcb5c - Browse repository at this point
Copy the full SHA f4dcb5cView commit details