-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Have an option to make Cargo attempt to clean up after itself. #6229
Comments
As I said on that thread I would love to see this experimented with out of tree. There are a lot of good heuristics that may work well. When we see which are widely useful we can talk about bringing them in. If out of tree development is not possible do to limitations of cargo, (like dose cargo leave a timestamp of the most recent time it used an artefact,) then let's see about getting it to be possible. I'd love a tool based on last used date. I'd love a tool that compared the version of the compiler used to the installed ones by rustup. I'd love to point it at a parent folder and have it search for all the places I hid a rust project. |
Actually I'm now wondering if this could be done as a cargo plugin? I confess I know nothing about how cargo's internals are structured. |
I would think a cargo plugin would work, if the needed information can be gained from cargo.
Might hack on something for this during this week if there is any interest in it :) |
@alexcrichton Is there a good way to determine the last time a file in @holmgr I would definitely be interested in seeing an initial hack! |
I don't think there's a great way to figure it out unfortunately other than running a full build and just seeing what was used (and considering everything else as unused). Supporting this in a first-class fashion I think will be a nontrivial undertaking! |
@alexcrichton I agree that "first-class" support will be hard, hence my encouragement of "out of tree" "derty hack" experimentation. Maybe I am missing something important, but at some point cargo has to make the list of things it needs to be on disk (or to build if they are not on disk). How hard would it be for cargo to take that list, for each item on the write a "x.timestamp" file? If all the cargos (that we care about) did that, then an out of tree tool could come along and del files associated with timestamp's that are older than 30 days. For initial testing purposes we could decide that the tool is only compatible with locally bilt cargos from someone's fork, if we are not willing to merge do to the feature freeze. |
Oh small things here and there probably aren't that hard I think? We could certainly try to patch the situation in the meantime! |
So I have been doing a bit of hacking on this issue to see what is possible, or not (https://github.com/holmgr/cargo-sweep). Since this type of solution is likely not the way we want to do this (i.e letting the tool maintain the timestamp files) I have been doing some digging around to see what else is possible. Finally, the most complex solution is to let cargo output the timestamp files as we discussed above. I have been searching through the source a bit, but not yet found a good place to place that code (partially because I am unsure of exactly which files we are interested in cleaning). |
Another update :) So I went with the second approach outlined in my comment above which seems to be the most straightforward solution, and it seems to be working pretty well. It is now published on crates.io and source is here. Hopefully this solves at least the basic needs for cleaning. |
@pietroalbini suggested a command that would clean up files needed to build a dependency but not needed to use that dependency. The big question is what are "they"? What are the minimum files required to use a dependency? |
cc some other discussion going on at #5885 (comment) |
touch some files when we use them This is a small change to improve the ability for a third party subcommand to clean up a target folder. I consider this part of the push to experiment with out of tree GC, as discussed in #6229. how it works? -------- This updates the modification time of a file in each fingerprint folder and the modification time of the intermediate outputs every time cargo checks that they are up to date. This allows a third party subcommand to look at the modification time of the timestamp file to determine the last time a cargo invocation required that file. This is far more reliable then the current practices of looking at the `accessed` time. `accessed` time is not available or disabled on many operating systems, and is routinely set by arbitrary other programs. is this enough to be useful? -------- The current implementation of cargo sweep on master will automatically use this data with no change to the code. With this PR, it will work even on systems that do not update `accessed` time. This also allows a crude script to clean some of the largest subfolders based on each files modification time. is this worth adding, or should we just build `clean --outdated` into cargo? ------ I would love to see a `clean --outdated` in cargo! However, I think there is a lot of design work before we can make something good enough to deserve the cargo teams stamp of approval. Especially as an in tree version will have to work with many use cases some of witch are yet to be designed (like distributed builds). Even just including `cargo-sweep`s existing functionality opens a full bike shop about what arguments to take, and in what form (`cargo-sweep` takes a days argument, but maybe we should have a minutes or a ISO standard time or ...). This PR, or equivalent, allows out of tree experimentation with all different interfaces, and is basically required for any `LRU` based system. (For example [Crater](rust-lang/crater#346) wants a GC that cleans files in an `LRU` manner to maintain a target folder below a target size. This is not a use case that is widely enough needed to be worth adding to cargo but one supported by this PR.) what are the downsides? ---- 1. There are legitimate performance concerns about writing so many small files during a NOP build. 2. There are legitimate concerns about unnecessary wrights on read-only filesystems. 3. If we add this, and it starts seeing widespread use, we may be de facto stabilizing the folder structure we use. (This is probably true of any system that allows out of tree experimentation.) 4. This may not be an efficient way to store the data. (It does have the advantage of not needing different cargos to manipulate the same file. But if you have a better idea please make a suggestion.)
I would live a feature as described above, is there any update since 2018 on ways to achieve this? |
File access time upkeeping is a heavy process as during each files access separate IO call is executed just to update Atime. So what happens on a lot of build hosts (at least in decent organizations) all file systems are mounted with noatime option |
A garbage collector is being worked on. The tracking issue is #12633. It is first starting with global resources and then we'll be looking at the target directory. |
Cargo leaves build artifacts around forever, without making any attempt to ever clean them up. This is a feature but also a somewhat inconvenient one since your
./target
folder grows forever, and this can make life harder for large projects and CI systems and such. You either delete everything and rebuild everything all the time, wasting a lot of time, or you deal with potentially very large folders. This also makes life harder for the idea of cargo eventually being able to share build artifacts between projects, as described in this reddit thread.I propose to add a simple, conservative command line flag/config option for Cargo to attempt to clean up files that have not been used in any builds for a while. Something like
--clean-unused 30d
to remove any files that haven't been used in any builds that have run in the last 30 days. Hopefully this will provide a simple and useful tool for solving this sort of problem in many cases, while not resulting in unnecessary rebuilding of things. It also should pave the way for experimenting with more sophisticated decision-making about which build artifacts to keep and which to delete in the future.Related: #5026
The text was updated successfully, but these errors were encountered: