Cache dependencies between crates published by the same owner #1757
This sounds like […]. Shouldn't this prevent any possible malicious changes that the crate author can do? Also the […]
The most aggressive attack that @pietroalbini mentioned was:
Wouldn't that be prevented by only having […]? Or does […]
Oh, I forgot one indirection in the first step: the crate […]
(Sorry if I'm missing something; these build specifics are new to me.)
How would that be different with or without caching?
Without caching: if […]
Ah, then I get it. Thank you for spending the time explaining! Couldn't we keep the cache by top-level dependency and version? Of course this would mean duplicate storage of the output binaries, but the implementation could be quite simple.
That means the possible process could be (without timings), for each top-level dependency: […]

Once we have done this for all top-level dependencies, we can run […]
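The per-top-level-dependency flow being discussed could be sketched roughly like this. This is a minimal illustration, not docs.rs code: the function names, the cache key shape, and the final `cargo doc --no-deps` invocation are my assumptions; only the per-dependency `cargo check --timings=json` command comes from the issue description.

```python
def cache_key(dep: str, version: str) -> str:
    """Hypothetical cache entry key: top-level dependency plus version,
    so different versions get separate (duplicated) cached artifacts."""
    return f"{dep}-{version}"

def plan_build(top_level_deps: list[tuple[str, str]]) -> list[list[str]]:
    """Return the command sequence: one `cargo check` per top-level
    dependency (so each dependency's compile time can be measured and
    counted against the time limit), then a final `cargo doc` for the
    crate itself."""
    commands = []
    for dep, _version in top_level_deps:
        commands.append(
            ["cargo", "check", "--timings=json", "-Zunstable-options", "-p", dep]
        )
    # Assumed final step; the real build uses rustwide, not a bare cargo call.
    commands.append(["cargo", "doc", "--no-deps"])
    return commands
```

For example, `plan_build([("serde", "1.0.0")])` yields one `cargo check` command for `serde` followed by the `cargo doc` step.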
I think the most effective way would be to just keep the same target directory only when the previous crate was published by the same person as the new crate. So, for example: […]
That would solve the problem of big workspaces being published, as I assume they would all be published by the same user, and remove the chance of someone else sneaking something into someone else's docs.
Ah, so this would also depend on the queued order of the crates?
So it would only help for these bulk crates that are released in quick succession, and not for any other crates.
We have a hard limit of at most one day of caching because of compiler updates. There are probably plenty of cases where a few related crates are published throughout the day which we would miss, but the really bad situations are the mass publishes of workspaces with very deep dependency trees.
How would that look with multiple owners? Treat them as one (unique) new owner/team?
We can get the actual publisher of each release from the crates.io API (even for crates with multiple owners; if they're publishing a workspace at once, I'd assume a single account is used for all the crates).
FWIW this is painful enough I would be OK with omitting the "record the timings in a database" bit for now if it lets us get the caching in sooner. It can only have false positives (your crate intermittently builds when otherwise it would have always failed), not false negatives (your crate intermittently fails when it should have always succeeded). Verifying the crates are owned by the same owner is not something we can omit though, for security reasons.
I'm with you on both points; we already talked about the first one, and the second was clear to me. What's slowing me down is mostly the iteration speed / debugging with Docker on my Mac, since I have to go through the web container for everything. While improving the dev experience is certainly possible, I'm currently trying to invest my time into artifact caching here, now that the CPU load problem has been worked around and a possible fix is in a PR.
@syphar yup definitely, I wasn't trying to imply you weren't doing enough work ❤️ For context, @alice-i-cecile was asking how she could help speed up the build queue and I wanted to make sure the issue was up to date since she hasn't been sitting in on our meetings.
Yep, the Bevy community was curious about how we could help, and I was pointed here <3 |
I wasn't trying to imply that you implied ;) I myself feel slow when I'm working on the build process, also visible when I want to dig into the other, bigger build-process topic (adding builds into the DB before the build).
Ah, thanks for the explanation! If needed, I can create a WIP PR plus some explanation of where I am right now, if @alice-i-cecile wants to help out.
Sure, that sounds lovely :) I don't personally have a ton of time or expertise here, but I'll take a look and share it around the community: we have a lot of talented folks who care about improving things for the whole ecosystem.
I created the draft PR with what I already had: #2079
Right now, when a large workspace publishes its crates, we spend a lot of time rebuilding the same dependencies over and over again (e.g. https://github.com/aptos-labs/aptos-core). To reduce the load on docs.rs and speed up builds, we can cache those dependencies to avoid rebuilding them.
We are only planning to do this between crates owned by the same crates.io owner for security reasons; see https://discord.com/channels/442252698964721669/541978667522195476/996125244026794134 for a discussion of how this causes issues between crates. In practice, this should still be a giant speed up since most of our build times come from large workspaces.
In order to have deterministic time limits, we are going to count the time spent to compile a dependency against the time limit, even if we already have the dependency cached locally. To avoid running out of disk space we'll use an LRU cache that deletes crates when we have less than ~20 GB free on the production server. For reproducibility we are going to make the cache read-only, and wipe the `target/doc` directory after each root crate we build.

Here is the rough approach we're imagining:

- Run `cargo check --timings=json -Zunstable-options -p {dep}` for each dependency of the crate we're building.
- Record the timings in a database (so cached dependencies still count against the time limit).
- `chmod -w` everything in `target` (except for the top-level directory, so cargo can create `target/doc`).
- Run `cargo doc` and upload the result to S3.
- Wipe `target/doc`.
- Wipe the cache in `rustwide.update_toolchain`, since the new nightly will be unable to reuse the cache.
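The "delete crates when we have less than ~20 GB free" eviction policy could be sketched like this. It is a toy model: sizes are tracked in a dict instead of statting the filesystem, and the class and method names are illustrative, not the docs.rs implementation.

```python
from collections import OrderedDict

MIN_FREE_BYTES = 20 * 1024**3  # keep ~20 GB free, per the issue description

class ArtifactCache:
    """Toy LRU cache over per-crate cached target directories.
    Entry sizes are in bytes; eviction drops the least recently used
    entries until (simulated) free space is back above the threshold."""

    def __init__(self, disk_capacity: int):
        self.capacity = disk_capacity
        self.entries: OrderedDict[str, int] = OrderedDict()

    @property
    def free(self) -> int:
        # Simulated free disk space: capacity minus the cached bytes.
        return self.capacity - sum(self.entries.values())

    def touch(self, key: str) -> None:
        # Mark an entry as recently used so it is evicted last.
        if key in self.entries:
            self.entries.move_to_end(key)

    def insert(self, key: str, size: int) -> None:
        self.entries[key] = size
        self.entries.move_to_end(key)
        # Evict least recently used entries while below the free-space
        # threshold, always keeping at least the entry just inserted.
        while self.free < MIN_FREE_BYTES and len(self.entries) > 1:
            self.entries.popitem(last=False)
```

With a 25 GB disk, inserting a 3 GB entry leaves 22 GB free (no eviction); inserting a further 4 GB entry drops free space to 18 GB and evicts the older entry.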