Cargo time machine (generate lock files based on old registry state) #5221

Open
est31 opened this issue Mar 21, 2018 · 12 comments
Labels
A-interacts-with-crates.io Area: interaction with registries A-lockfile Area: Cargo.lock issues C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` Command-generate-lockfile S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted.

Comments

@est31
Member

est31 commented Mar 21, 2018

Some time ago I wanted to check how much faster my library had become across various Rust versions. So I cloned the repo, checked out an older git commit, used rustup to get an older rustc, and tested it with both the older rustc and the newer one. Cargo downloaded various dependencies and tried to build them with the older rustc, but the build failed because the crates on crates.io apparently required newer Rust versions than the one I was benchmarking my library with. So I figured out a trick: I told cargo not to use crates.io as a registry source but my own private clone, and I made that clone point to a commit from back when the compiler was released. This worked really well!

Now to my feature request. I'd like to have this automated, via a flag in cargo: if you invoke cargo generate-lockfile --registry-time 2017-01-01, cargo would check out a commit from that day from the registry and use that commit for lockfile generation.

I think it is justified to call this feature "time machine" because it emulates the time from back then.

Everyone who has missed the presence of a Cargo.lock can feel this I think :).

@alexcrichton alexcrichton added the C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` label Mar 21, 2018
@Michael-F-Bryan

This sounds like it'd be really awesome as some kind of sub-command/cargo wrapper!

So you might run cargo time-machine bench --registry-time 2017-01-01 -p my-crate and it'll run benchmarks using the version of rustc and your crate closest to that date. You could use cargo time-machine install --registry-time 2017-01-01 -p my-crate for installation, and so on.

@est31
Member Author

est31 commented Mar 29, 2018

@Michael-F-Bryan as we've got other commands to influence cargo resolution like #4100, I think the best place for integration is Cargo itself.

@joshtriplett
Member

Time aside, I'd like to have a way to do this via a git hash of the crates.io index. That'd be great for reproducing bug reports.

Also see #6161

@fpoli

fpoli commented May 28, 2020

Is the publication date of a package stored somewhere? With that, it would be possible to filter package versions from ret just before the sort_unstable_by:

    // When we attempt versions for a package we'll want to do so in a
    // sorted fashion to pick the "best candidates" first. Currently we try
    // prioritized summaries (those in `try_to_use`) and failing that we
    // list everything from the maximum version to the lowest version.
    ret.sort_unstable_by(|a, b| {
        let a_in_previous = self.try_to_use.contains(&a.package_id());
        let b_in_previous = self.try_to_use.contains(&b.package_id());
        let previous_cmp = a_in_previous.cmp(&b_in_previous).reverse();
        match previous_cmp {
            Ordering::Equal => {
                let cmp = a.version().cmp(b.version());
                if self.minimal_versions {
                    // Lower version ordered first.
                    cmp
                } else {
                    // Higher version ordered first.
                    cmp.reverse()
                }
            }
            _ => previous_cmp,
        }
    });
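
A minimal sketch of the filtering idea, assuming each candidate carried a publication timestamp. The `Candidate` type, its `published_at` field, and `candidates_as_of` are hypothetical stand-ins for illustration; they are not Cargo's resolver types, and the index does not currently expose such a field.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for a registry summary. `published_at` is an
/// assumed field (Unix seconds) that the real index does not provide.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    version: (u64, u64, u64),
    published_at: u64,
}

/// Drop candidates published after `cutoff`, then order the remainder
/// the same way the snippet above does for the non-prioritized case.
fn candidates_as_of(
    mut ret: Vec<Candidate>,
    cutoff: u64,
    minimal_versions: bool,
) -> Vec<Candidate> {
    // The filter runs just before the sort.
    ret.retain(|c| c.published_at <= cutoff);
    ret.sort_unstable_by(|a, b| {
        let cmp: Ordering = a.version.cmp(&b.version);
        if minimal_versions {
            // Lower version ordered first.
            cmp
        } else {
            // Higher version ordered first.
            cmp.reverse()
        }
    });
    ret
}

fn main() {
    let all = vec![
        Candidate { version: (1, 0, 0), published_at: 100 },
        Candidate { version: (1, 1, 0), published_at: 200 },
        Candidate { version: (1, 2, 0), published_at: 300 },
    ];
    // With a cutoff of 250, version 1.2.0 is not visible yet.
    for c in candidates_as_of(all, 250, false) {
        println!("{:?}", c.version);
    }
}
```

This leaves out the `try_to_use` prioritization and says nothing about yanked versions.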

@fpoli

fpoli commented May 28, 2020

Alternatively, to use a specific git hash one could modify this git fetch:

    let refspec = "refs/heads/master:refs/remotes/origin/master";
    let repo = self.repo.borrow_mut().unwrap();
    git::fetch(repo, url.as_str(), refspec, self.config)
        .chain_err(|| format!("failed to fetch `{}`", url))?;
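
As a rough sketch of what that modification might look like, the refspec could be built from an optional pinned commit. The `index_refspec` helper and its `pinned_commit` parameter are hypothetical and do not exist in Cargo. It also assumes the server allows fetching by commit ID, which not every Git host does.

```rust
/// Build the fetch refspec for the index repository.
/// `pinned_commit` is a hypothetical option, not an existing Cargo setting.
fn index_refspec(pinned_commit: Option<&str>) -> String {
    match pinned_commit {
        // The leading `+` permits a non-fast-forward update, which is needed
        // when the local tracking ref already points at a newer commit.
        Some(sha) => format!("+{}:refs/remotes/origin/master", sha),
        // Default behaviour: follow the tip of master, as in the snippet above.
        None => "refs/heads/master:refs/remotes/origin/master".to_string(),
    }
}

fn main() {
    // Placeholder commit ID, used only for illustration.
    let sha = "0123456789abcdef0123456789abcdef01234567";
    println!("{}", index_refspec(None));
    println!("{}", index_refspec(Some(sha)));
}
```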

@est31
Member Author

est31 commented May 28, 2020

@fpoli the publication date is stored in git history: you have to find the commit that introduces the crate, which is quite involved algorithmically and costly in time as well. I recommend going via git commit hashes, which is what I originally envisioned as well.

@mathstuf
Contributor

Could crate publication dates be detected once and added to the index data directly (with future additions adding the data automatically)? It should just be a blame for each entry, but that's mostly a guess not knowing the way the index is stored off-hand.

@est31
Member Author

est31 commented May 28, 2020

@mathstuf that wouldn't detect yanked/unyanked crates. It seems that you can yank and un-yank crates arbitrarily often.

@mathstuf
Contributor

mathstuf commented May 28, 2020

Hmm. It seems that the index could be fetched from an arbitrary refspec.

See that https://github.com/mathstuf/rust-keyutils/blob/master%40%7b2020-01-01%7d/.cirrus.yml is returning valid contents. So, at least for github-hosted index files, this kind of URL abuse is possible. Not so sure about other index hosting locations though.

This is basically using the master@{when} syntax for refs, so arbitrary Git-supported "when" clauses are likely allowed (last week, yesterday, specific times, etc.).
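
For illustration, here is a small sketch that builds such a date-qualified ref and percent-encodes it the way the URL above does. The helper names `rev_at` and `percent_encode` are made up for this example, and the encoder is a minimal hand-rolled one rather than a library call.

```rust
/// Format a date-qualified revision such as `master@{2020-01-01}`.
fn rev_at(branch: &str, when: &str) -> String {
    // `{{` and `}}` are escaped literal braces in a format string.
    format!("{}@{{{}}}", branch, when)
}

/// Percent-encode every byte outside the RFC 3986 unreserved set.
fn percent_encode(input: &str) -> String {
    let mut out = String::new();
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02x}", byte)),
        }
    }
    out
}

fn main() {
    let rev = rev_at("master", "2020-01-01");
    println!("{}", rev);
    println!("{}", percent_encode(&rev));
}
```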

@est31
Member Author

est31 commented May 28, 2020

Hmmm nice it works from Github's API as well:

curl -i 'https://api.github.com/repos/est31/cargo-udeps/commits/master@\{2020-01-01\}'

It could be special-cased for GitHub, with a fallback to a full clone of the index repo when it doesn't detect GitHub or hits an API limit or other HTTP error.
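
A rough sketch of that fallback logic, with the two lookups passed in as closures so that no real network or git calls are needed. Every name here (`Resolved`, `is_github`, `resolve_commit`) is hypothetical, and the host check is deliberately naive.

```rust
/// Which path produced the commit ID (hypothetical type).
#[derive(Debug, PartialEq)]
enum Resolved {
    ViaApi(String),
    ViaFullClone(String),
}

/// Naive host detection, assumed here for illustration only.
fn is_github(index_url: &str) -> bool {
    index_url.starts_with("https://github.com/")
}

/// Try the GitHub API first; on any error, or for non-GitHub hosts,
/// fall back to resolving the date from a full clone.
fn resolve_commit<A, C>(
    index_url: &str,
    api_lookup: A,
    clone_lookup: C,
) -> Result<Resolved, String>
where
    A: Fn(&str) -> Result<String, String>,
    C: Fn(&str) -> Result<String, String>,
{
    if is_github(index_url) {
        if let Ok(sha) = api_lookup(index_url) {
            return Ok(Resolved::ViaApi(sha));
        }
        // Rate limit or other HTTP error: fall through to the clone.
    }
    clone_lookup(index_url).map(Resolved::ViaFullClone)
}

fn main() {
    // Stubbed lookups; a real implementation would do I/O here.
    let api_fails = |_: &str| Err::<String, String>("rate limited".to_string());
    let clone_ok = |_: &str| Ok::<String, String>("abc123".to_string());
    let result = resolve_commit(
        "https://github.com/rust-lang/crates.io-index",
        api_fails,
        clone_ok,
    );
    println!("{:?}", result);
}
```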

@epage epage added Command-generate-lockfile A-cache-messages Area: caching of compiler messages and removed A-cache-messages Area: caching of compiler messages labels Apr 21, 2022
@epage epage changed the title Cargo time machine Cargo time machine (generate lock files based o Apr 21, 2022
@epage epage changed the title Cargo time machine (generate lock files based o Cargo time machine (generate lock files based on old registry state) Apr 21, 2022
@kornelski
Contributor

kornelski commented Jan 26, 2023

I've implemented this:

https://crates.io/crates/lts/0.2.0

@epage
Contributor

epage commented Oct 18, 2023

We talked about this recently somewhere. I wonder if it was in person at RustConf which means no notes.

I would expect this to be a part of an interface for cargo generate-lockfile. It doesn't need to exist everywhere.

It can't fully reproduce a lockfile from a past state because the lockfile only resolves maximally for the subsection of the dependency tree that changed. However, still being able to generate it for a given time can be useful.

When we had only the git registry, it was easy to think we could use the git history, at least until you take squashing into account.

Now we also have the sparse registry, so we'd need a design that can take that into account, including

  • What time resolution can work (or if we use some kind of counter, how to turn that into real-world measurement)
  • At what stage in the publish process do we capture that timestamp

Timestamps, instead of counters, seem really useful for dealing with the human side to this. I don't think we need fine resolution on this, if two packages are published in close succession, oh well.

I bet the server could even backfill the timestamps.

I don't think the specific stage will matter all that much either.

The big question though is yanks. We don't have a history of when things have been yanked and unyanked.
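
To make the yank problem concrete, here is a sketch of the data that would be needed to answer "was this version selectable at time T?". It assumes the registry recorded a publish timestamp plus a full yank/unyank history, which it does not do today. `VersionHistory`, `YankEvent`, and `selectable_at` are hypothetical names.

```rust
/// A single change to a version's yanked state (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum YankEvent {
    Yanked,
    Unyanked,
}

/// Hypothetical per-version record; timestamps are Unix seconds.
struct VersionHistory {
    published_at: u64,
    /// Every yank and unyank, each with the time it happened.
    yank_events: Vec<(u64, YankEvent)>,
}

/// True if the version existed at `t` and was not yanked at that moment.
fn selectable_at(history: &VersionHistory, t: u64) -> bool {
    if t < history.published_at {
        return false;
    }
    // The most recent event at or before `t` decides the yanked state.
    let last = history
        .yank_events
        .iter()
        .filter(|(when, _)| *when <= t)
        .max_by_key(|(when, _)| *when);
    !matches!(last, Some((_, YankEvent::Yanked)))
}

fn main() {
    let history = VersionHistory {
        published_at: 100,
        yank_events: vec![(200, YankEvent::Yanked), (300, YankEvent::Unyanked)],
    };
    for t in [50, 150, 250, 350] {
        println!("t={} selectable={}", t, selectable_at(&history, t));
    }
}
```

A single backfilled publish timestamp would cover the first check only; the second needs the event list, because a version can be yanked and unyanked repeatedly.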

@epage epage added the S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted. label Oct 18, 2023

8 participants