-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cut down on data fetch from git dependencies #8363
Conversation
r? @ehuss (rust_highfive has picked a reviewer for you, use r? to override) |
src/cargo/sources/git/utils.rs
Outdated
.ok_or_else(|| anyhow!("couldn't find username"))?; | ||
let repository = pieces | ||
.next() | ||
.ok_or_else(|| anyhow!("couldn't find username"))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.ok_or_else(|| anyhow!("couldn't find username"))?; | |
.ok_or_else(|| anyhow!("couldn't find repository name"))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I've noticed is that this doesn't work if the URL has .git
on the end (like https://github.com/rust-lang/cargo.git). Maybe it should strip .git
if it is in the URL?
I had some questions just to clarify: Why change the refspec mapping to use Is it correct that in the "common" case of specifying a git dependency with a rev, or one without a rev/branch, that this won't help? That is, with |
Currently Cargo pretty heavily over-approximates data fetch for git dependencies. For the index it fetches precisely one branch, but for all other git dependencies Cargo will fetch all branches and all tags all the time. In each of these situations, however, Cargo knows if one branch is desired or if only one tag is desired. This commit updates Cargo's fetching logic to plumb the desired `GitReference` all the way down to `fetch`. In that one location we then determine what to fetch. Namely if a branch or tag is explicitly selected then we only fetch that one reference from the remote, cutting down on the amount of traffic to the git remote. Additionally a bugfix included here is that the GitHub fast path for checking if a repository is up-to-date now works for non-`master`-based branch dependencies.
I don't believe the refspec mapping matters, no. I only changed it really because it seemed more consistent with how the git CLI itself works. AFAIK this was first implemented by me and I didn't really know what I was doing at the time and You're right, yeah, that if you specify a rev this won't help. In general if you say I didn't realize, though, that this also affected lock files. You're right that with a |
This commit refactors various logic of the git source internals to ensure that if we have a locked revision that we plumb the desired branch/tag all the way through to the `fetch`. Previously we'd switch to `Rev` very early on, but the fetching logic for `Rev` is very eager and fetches too much, so instead we only resolve the locked revision later on. Internally this does some various refactoring to try to make various bits and pieces of logic a bit easyer to grok, although it's still perhaps not the cleanest implementation.
24c2701
to
ddc2799
Compare
Ok I think I've fixed the issue where if you use a branch/tag and you have a lock file this should still only fetch that one branch/tag, and then look for the revision in the repository. This... is technically a breaking change though. Previously you could lock to a revision, then rename all your branches in a repo, and the previous lock file would still work. Now that breaks because when fetching a locked version of a |
let repository = if repository.ends_with(".git") { | ||
&repository[..repository.len() - 4] | ||
} else { | ||
repository | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let repository = if repository.ends_with(".git") { | |
&repository[..repository.len() - 4] | |
} else { | |
repository | |
}; | |
let repository = repository.strip_suffix(".git").unwrap_or(repository); |
I dunno if that's too clever?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh nice, although that's still unstable for one more cycle I think
Regarding the "rename branches" case, IIUC the scenario is this:
That doesn't seem so bad to me, as technically the Also, just to verify, the fix is to edit Cargo.toml with the correct branch and run |
You're spot on with the problem and the fix. And yeah I don't really think that this is going to come up too often, so I'd personally be ok with that degree of breakage. |
👍 |
📌 Commit ddc2799 has been approved by |
☀️ Test successful - checks-azure |
Update cargo 9 commits in 089cbb80b73ba242efdcf5430e89f63fa3b5328d..c26576f9adddd254b3dd63aecba176434290a9f6 2020-06-15 14:38:34 +0000 to 2020-06-23 16:21:21 +0000 - Adding environment variable CARGO_PKG_LICENSE_FILE (rust-lang/cargo#8387) - Enable "--target-dir" in "cargo install" (rust-lang/cargo#8391) - Add support for `workspace.metadata` table (rust-lang/cargo#8323) - Fix overzealous `clean -p` for reserved names. (rust-lang/cargo#8398) - Fix order-dependent feature resolution. (rust-lang/cargo#8395) - Correct mispelling of `cargo`. (rust-lang/cargo#8389) - Add missing license field. (rust-lang/cargo#8386) - Adding environment variable CARGO_PKG_LICENSE (rust-lang/cargo#8325) - Cut down on data fetch from git dependencies (rust-lang/cargo#8363)
Currently Cargo pretty heavily over-approximates data fetch for git
dependencies. For the index it fetches precisely one branch, but for all
other git dependencies Cargo will fetch all branches and all tags all
the time. In each of these situations, however, Cargo knows if one
branch is desired or if only one tag is desired.
This commit updates Cargo's fetching logic to plumb the desired
GitReference
all the way down tofetch
. In that one location we thendetermine what to fetch. Namely if a branch or tag is explicitly
selected then we only fetch that one reference from the remote, cutting
down on the amount of traffic to the git remote.
Additionally a bugfix included here is that the GitHub fast path for
checking if a repository is up-to-date now works for non-
master
-basedbranch dependencies.