Cut down on data fetch from git dependencies #8363

Merged
merged 3 commits into from
Jun 18, 2020


alexcrichton
Member

Currently Cargo pretty heavily over-approximates data fetch for git
dependencies. For the index it fetches precisely one branch, but for all
other git dependencies Cargo will fetch all branches and all tags all
the time. In each of these situations, however, Cargo knows if one
branch is desired or if only one tag is desired.

This commit updates Cargo's fetching logic to plumb the desired
GitReference all the way down to fetch. In that one location we then
determine what to fetch. Namely if a branch or tag is explicitly
selected then we only fetch that one reference from the remote, cutting
down on the amount of traffic to the git remote.

Additionally a bugfix included here is that the GitHub fast path for
checking if a repository is up-to-date now works for non-master-based
branch dependencies.
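
To make the fetch decision described above concrete, here is a minimal sketch of mapping a desired reference to the refspecs requested from the remote. This is not Cargo's actual code: the `GitReference` enum is a simplified stand-in, `refspecs_for` is a hypothetical helper, and the destination namespaces (in particular the one used for tags) are assumptions made for illustration.

```rust
/// Simplified stand-in for Cargo's `GitReference` (illustrative only).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum GitReference {
    Branch(String),
    Tag(String),
    Rev(String),
}

/// Pick the refspecs to request from the remote for a given reference.
/// The destination ref names are assumptions, not taken from Cargo.
pub fn refspecs_for(reference: &GitReference) -> Vec<String> {
    match reference {
        // A named branch: fetch only that branch.
        GitReference::Branch(b) => {
            vec![format!("refs/heads/{0}:refs/remotes/origin/{0}", b)]
        }
        // A named tag: fetch only that tag.
        GitReference::Tag(t) => {
            vec![format!("refs/tags/{0}:refs/remotes/origin/tags/{0}", t)]
        }
        // An arbitrary revision could be anything, so fall back to
        // fetching every branch and every tag.
        GitReference::Rev(_) => vec![
            "refs/heads/*:refs/remotes/origin/*".to_string(),
            "refs/tags/*:refs/remotes/origin/tags/*".to_string(),
        ],
    }
}
```

With this shape, a branch or tag dependency yields a single narrow refspec, while a `Rev` still produces the two wildcard refspecs and so fetches everything.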

@rust-highfive

r? @ehuss

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 15, 2020
.ok_or_else(|| anyhow!("couldn't find username"))?;
let repository = pieces
.next()
.ok_or_else(|| anyhow!("couldn't find username"))?;
Contributor

Suggested change
- .ok_or_else(|| anyhow!("couldn't find username"))?;
+ .ok_or_else(|| anyhow!("couldn't find repository name"))?;

Contributor

One thing I've noticed is that this doesn't work if the URL has .git on the end (like https://github.com/rust-lang/cargo.git). Maybe it should strip .git if it is in the URL?
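
For illustration, a minimal sketch of what the two review comments above ask for: a distinct error message for a missing repository name, and tolerance for a trailing `.git`. The function name `github_owner_and_repo` is hypothetical, and plain `String` errors are used instead of `anyhow!` so the snippet stays self-contained; it is not the code from this PR.

```rust
/// Split a GitHub URL path such as "/rust-lang/cargo.git" into
/// (username, repository), dropping a trailing ".git" if present.
/// Hypothetical helper for illustration only.
pub fn github_owner_and_repo(path: &str) -> Result<(&str, &str), String> {
    // Ignore empty segments produced by leading or trailing slashes.
    let mut pieces = path.split('/').filter(|s| !s.is_empty());
    let username = pieces
        .next()
        .ok_or_else(|| "couldn't find username".to_string())?;
    let repository = pieces
        .next()
        .ok_or_else(|| "couldn't find repository name".to_string())?;
    // Treat "cargo.git" and "cargo" as the same repository.
    let repository = if repository.ends_with(".git") {
        &repository[..repository.len() - ".git".len()]
    } else {
        repository
    };
    Ok((username, repository))
}
```

Both `/rust-lang/cargo` and `/rust-lang/cargo.git` resolve to the same pair here, and a path with only one segment reports the missing repository name rather than the username.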

@ehuss
Contributor

ehuss commented Jun 15, 2020

I had some questions just to clarify:

Why change the refspec mapping to use refs/remotes/origin/… (from refs/heads/…)? It seems like it doesn't exactly matter, does it?

Is it correct that in the "common" case of specifying a git dependency with a rev, or one without a rev/branch, that this won't help? That is, with rev it still fetches everything. And without rev, if there is a Cargo.lock, it still fetches everything (because it uses the rev from Cargo.lock).

@alexcrichton
Member Author

I don't believe the refspec mapping matters, no. I only changed it really because it seemed more consistent with how the git CLI itself works. AFAIK this was first implemented by me and I didn't really know what I was doing at the time and refs/heads/*:refs/heads/* was just the first thing that worked. I don't think there's any crucial reason to do that specifically though.

You're right, yeah, that if you specify a rev this won't help. In general if you say rev = '...' in Cargo.toml we don't know if that's a branch/tag/etc, it's just something that goes through revparse_single which is documented here. AFAIK we can't ask the git server to fetch just that and only that, so we do what Cargo does today and fetch everything.

I didn't realize, though, that this also affected lock files. You're right that with a Cargo.lock, even if you're not otherwise using rev, Cargo automatically behaves as if you had used rev, which means you don't benefit from this. I'll see if I can do something about that.

This commit refactors various logic of the git source internals to
ensure that if we have a locked revision that we plumb the desired
branch/tag all the way through to the `fetch`. Previously we'd switch to
`Rev` very early on, but the fetching logic for `Rev` is very eager and
fetches too much, so instead we only resolve the locked revision later
on.

Internally this refactors various bits and pieces of logic to try to
make them a bit easier to grok, although it's still perhaps not the
cleanest implementation.
@alexcrichton
Member Author

Ok I think I've fixed the issue where if you use a branch/tag and you have a lock file this should still only fetch that one branch/tag, and then look for the revision in the repository.

This... is technically a breaking change though. Previously you could lock to a revision, then rename all your branches in a repo, and the previous lock file would still work. Now that breaks because when fetching a locked version of a Cargo.lock we still only fetch the original branch, and then it's assumed that the locked revision is somewhere in the history of that branch.
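
The assumption in the last sentence can be sketched as a reachability check: after fetching only the one branch, the locked revision must be the branch tip or one of its ancestors. The following is a toy model over an in-memory commit graph, not Cargo's implementation and not the git2 API; `CommitGraph` and `reachable_from` are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

/// Toy commit graph: each commit id maps to its parent ids.
/// Hypothetical type for illustration only.
pub type CommitGraph = HashMap<&'static str, Vec<&'static str>>;

/// Returns true if `locked` is `tip` itself or one of its ancestors.
pub fn reachable_from(graph: &CommitGraph, tip: &str, locked: &str) -> bool {
    let mut stack = vec![tip];
    let mut seen = HashSet::new();
    while let Some(commit) = stack.pop() {
        if commit == locked {
            return true;
        }
        // Skip commits already visited (merge histories can revisit them).
        if !seen.insert(commit) {
            continue;
        }
        if let Some(parents) = graph.get(commit) {
            for parent in parents {
                stack.push(*parent);
            }
        }
    }
    false
}
```

In this model, a lock file pointing at a commit that only lives on a renamed or deleted branch is simply unreachable from the branch that was fetched, which corresponds to the breakage described above.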

Comment on lines +983 to +987
let repository = if repository.ends_with(".git") {
&repository[..repository.len() - 4]
} else {
repository
};
Contributor

Suggested change
- let repository = if repository.ends_with(".git") {
- &repository[..repository.len() - 4]
- } else {
- repository
- };
+ let repository = repository.strip_suffix(".git").unwrap_or(repository);

I dunno if that's too clever?

Member Author

Oh nice, although that's still unstable for one more cycle I think

@ehuss
Contributor

ehuss commented Jun 17, 2020

Regarding the "rename branches" case, IIUC the scenario is this:

  1. Check out a project with a git dependency on a branch (or default master), and a Cargo.lock.
  2. The branch is renamed.
  3. I try to fetch, and it fails: even though the SHA in Cargo.lock is valid, the branch it points to no longer exists.

That doesn't seem so bad to me, as technically the Cargo.toml is now pointing to the wrong location. I can see how that violates the spirit of Cargo.lock, but even in the old case, cargo update would fail because the branch is gone, right? I also suspect branch renaming is probably rare. I imagine deleting a branch is maybe more common, but presumably at some point in the future any unique commits on that branch will get gc'd and no longer be fetchable.

Also, just to verify, the fix is to edit Cargo.toml with the correct branch and run cargo update?

@alexcrichton
Member Author

You're spot on with the problem and the fix. And yeah I don't really think that this is going to come up too often, so I'd personally be ok with that degree of breakage.

@ehuss
Contributor

ehuss commented Jun 18, 2020

👍
@bors r+

@bors
Contributor

bors commented Jun 18, 2020

📌 Commit ddc2799 has been approved by ehuss

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 18, 2020
@bors
Contributor

bors commented Jun 18, 2020

⌛ Testing commit ddc2799 with merge b80fc85...

@bors
Contributor

bors commented Jun 18, 2020

☀️ Test successful - checks-azure
Approved by: ehuss
Pushing b80fc85 to master...

@bors bors merged commit b80fc85 into rust-lang:master Jun 18, 2020
bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 24, 2020
Update cargo

9 commits in 089cbb80b73ba242efdcf5430e89f63fa3b5328d..c26576f9adddd254b3dd63aecba176434290a9f6
2020-06-15 14:38:34 +0000 to 2020-06-23 16:21:21 +0000
- Adding environment variable CARGO_PKG_LICENSE_FILE (rust-lang/cargo#8387)
- Enable "--target-dir" in "cargo install" (rust-lang/cargo#8391)
- Add support for `workspace.metadata` table (rust-lang/cargo#8323)
- Fix overzealous `clean -p` for reserved names. (rust-lang/cargo#8398)
- Fix order-dependent feature resolution. (rust-lang/cargo#8395)
- Correct mispelling of `cargo`. (rust-lang/cargo#8389)
- Add missing license field. (rust-lang/cargo#8386)
- Adding environment variable CARGO_PKG_LICENSE (rust-lang/cargo#8325)
- Cut down on data fetch from git dependencies (rust-lang/cargo#8363)
@alexcrichton alexcrichton deleted the less-git-data branch July 29, 2020 17:48
@ehuss ehuss added this to the 1.46.0 milestone Feb 6, 2022