Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Download crates in parallel with HTTP/2 #6005

Merged
merged 15 commits into from
Sep 19, 2018

Commits on Sep 18, 2018

  1. Move downloading crates higher up in Cargo

    This commit is intended to lay the groundwork for downloading crates in parallel
    from crates.io. It starts out by lifting up the download operation from deep
    within the `Source` trait to the `PackageSet`. This should allow us to maintain
    a queue of downloads specifically in a `PackageSet` and arbitrate access through
    that one type, making it easier for us to implement parallel downloads.
    
    The `Source` trait's `download` method may now fail saying "go download this
    data", and then the data is fed back into the trait object once it's complete to
    actually get the `Package` out.
    alexcrichton committed Sep 18, 2018
    Configuration menu
    Copy the full SHA
    c94804b View commit details
    Browse the repository at this point in the history
  2. Rename PackageSet::get to get_one

    This commit removes the old `PackageSet::get` method (used for downloading
    crates) to `PackageSet::get_one`. The new method explicitly indicates that only
    one package is being fetched. Additionally this commit also adds support for an
    API that allows supporting downloading multiple packages in parallel. Existing
    callers are all updated to use the parallel API where appropriate except for
    `BuildContext`, which will be updated in the next commit.
    
    Also note that no parallel downloads are done yet, they're still synchronously
    performed one at a time. A later commit will add support for truly downloading
    in parallel.
    alexcrichton committed Sep 18, 2018
    Configuration menu
    Copy the full SHA
    7b1a0dc View commit details
    Browse the repository at this point in the history
  3. Refactor build_unit_dependencies for parallel downloads

    This commit refactors the `build_unit_dependencies` function for leveraging
    parallel downloads. This function is the primary location that crates are
    download as part of `cargo build`, which is one of the most critical locations
    to take advantage of parallel downloads!
    
    The problem here is somewhat unusal in that we don't want to download the entire
    crate graph but rather only the precise slice that we're compiling. This means
    that we're letting the calculation of unit dependencies entirely drive the
    decision of what crates to download. While we're calculating dependencies,
    though, we may need to download more crates!
    
    The strategy implemented here is to attempt to build the unit dependency graph.
    Any missing packages are skipped during this construction and enqueued for
    download. If any package was enqueued, we throw away the entire graph, wait for
    a download to finish, and then attempt to construct the unit graph again.
    
    The hope here is that we'll be able to incrementally make progress and
    incrementally add more and more crates to be downloaded in parallel. The worry
    is that this is a relatively inefficient strategy as we could reconstruct the
    unit dependency graph N times (where you have N total transitive dependencies).
    To help alleviate this concern wait for only a small handful of crates to
    actively be downloading before we continue to try to recreate the dependency
    graph. It's hoped that this will strike a good balance between ensuring we
    always have a full pipeline of downloads without re-downloading too much.
    alexcrichton committed Sep 18, 2018
    Configuration menu
    Copy the full SHA
    44a7ee7 View commit details
    Browse the repository at this point in the history
  4. Parallelize downloads with HTTP/2

    This commit implements parallel downloads using `libcurl` powered by
    `libnghttp2` over HTTP/2. Using all of the previous refactorings this actually
    implements usage of `Multi` to download crates in parallel. This achieves some
    large wins locally, taking download times from 30s to 2s in the best case.
    
    The standard output of Cargo is also changed as a result of this commit. It's
    no longer useful for Cargo to print "Downloading ..." for each crate really as
    they all start instantaneously. Instead Cargo now no longer prints `Downloading`
    by default (unless attached to a pipe) and instead only has one progress bar for
    all downloads. Currently this progress bar is discrete and based on the total
    number of downloads, no longer specifying how much of one particular download
    has happened. This provides a less granular view into what Cargo is doing but
    it's hoped that it looks reasonable from an outside perspective as there's
    still a progress bar indicating what's happening.
    alexcrichton committed Sep 18, 2018
    Configuration menu
    Copy the full SHA
    468f243 View commit details
    Browse the repository at this point in the history
  5. Only enable downloads through an RAII structure

    This should help us scope downloads to ensure that we only ever have one
    download session at once and it's explicitly scoped so we can handle the
    progress bar and everything.
    alexcrichton committed Sep 18, 2018
    Configuration menu
    Copy the full SHA
    0ea0c8b View commit details
    Browse the repository at this point in the history
  6. Configuration menu
    Copy the full SHA
    775b4eb View commit details
    Browse the repository at this point in the history
  7. Update the progress bar for parallel downloads

    This is actually a super tricky problem. We don't really have the capacity for
    more than one line of update-able information in Cargo right now, so we need to
    squeeze a lot of information into one line of output for Cargo. The main
    constraints this tries to satisfy are:
    
    * At all times it should be clear what's happening. Cargo shouldn't just hang
      with no output when downloading a crate for a long time, a counter ideally
      needs to be decreasing while the download progresses.
    
    * If a progress bar is shown, it shouldn't jump around. This ends up just being
      a surprising user experience for most. Progress bars should only ever
      increase, but they may increase at different speeds.
    
    * Cargo has, currently, at most one line of output (as mentioned above) to pack
      information into. We haven't delved into fancier terminal features that
      involve multiple lines of update-able output.
    
    * When downloading crates as part of `cargo build` (the norm) we don't actually
      know ahead of time how many crates are being downloaded. We rely on the
      calculation of unit dependencies to naturally feed into downloading more
      crates.
    
    * Furthermore, once we decide to download a crate, we don't actually know how
      big it is! We have to wait for the server to tell us how big it is.
    
    There doesn't really seem to be a great solution that satisfies all of these
    constraints unfortunately. As a result this commit implements a relatively
    conservative solution which should hopefully get us most of the way there. There
    isn't actually a progress bar but rather Cargo prints that it's got N crates
    left to download, and if it takes awhile it prints out that there are M bytes
    remaining.
    
    Unfortunately the progress is pretty choppy and jerky, not providing a smooth
    UI. This appears to largely be because Cargo will synchronously extract
    tarballs, which for large crates can cause a noticeable pause. Cargo's not
    really prepared internally to perform this work on helper threads, but ideally
    if it could do so it would improve the output quite a bit! (making it much
    smoother and also able to account for the time tarball extraction takes).
    alexcrichton committed Sep 18, 2018
    Configuration menu
    Copy the full SHA
    a46df8f View commit details
    Browse the repository at this point in the history
  8. Review comments!

    alexcrichton committed Sep 18, 2018
    Configuration menu
    Copy the full SHA
    e2637b6 View commit details
    Browse the repository at this point in the history
  9. Handle callbacks invoked outside of tls::set

    This'll happen apparently when we drop items, and no need to keep tracking
    information there!
    alexcrichton committed Sep 18, 2018
    Configuration menu
    Copy the full SHA
    78922ca View commit details
    Browse the repository at this point in the history
  10. Tweaks to retry logic

    * Don't use `chain_err` internally inside `try` because then the spurious error
      printed internally is bland and doesn't contain the right error information.
      Instead place the `chain_err` on the outside so it propagates up to the user
      still but doesn't get printed for spurious errors.
    
    * Handle `pending_ids` management in the case of an error on the connection.
    alexcrichton committed Sep 18, 2018
    Configuration menu
    Copy the full SHA
    304871c View commit details
    Browse the repository at this point in the history
  11. Use Message::result_for for better error messages

    Recently fixed in the `curl` crate, this'll allow getting the full and complete
    error message for each transfer instead of just the error code.
    alexcrichton committed Sep 18, 2018
    Configuration menu
    Copy the full SHA
    bc94291 View commit details
    Browse the repository at this point in the history
  12. Don't print a download summary if an error happens

    Otherwise it sort of just clutters things up!
    alexcrichton committed Sep 18, 2018
    Configuration menu
    Copy the full SHA
    5a5b611 View commit details
    Browse the repository at this point in the history
  13. Avoid a debug overflow from curl

    Curl doesn't guarantee that we'll get the total/current progress bars in a "as
    we expect" fashion, so throw out weird data points (such as we've downloaded
    more bytes than curl thinks we'll be downloading in total).
    alexcrichton committed Sep 18, 2018
    Configuration menu
    Copy the full SHA
    2245fc4 View commit details
    Browse the repository at this point in the history
  14. Don't enable HTTP1 pipelining as apparently it's flaky

    It seems to fix some issues perhaps!
    alexcrichton committed Sep 18, 2018
    Configuration menu
    Copy the full SHA
    9da2e70 View commit details
    Browse the repository at this point in the history
  15. Disable multiplexing by default for now

    Let's hopefully weed out any configuration issues before turning it on by
    default!
    alexcrichton committed Sep 18, 2018
    Configuration menu
    Copy the full SHA
    f4dcb5c View commit details
    Browse the repository at this point in the history