Cache management facilities #1035

Closed
cgils opened this issue Mar 12, 2016 · 25 comments
Labels: P2 (We'll consider working on this in future; assignee optional), team-Performance (Issues for Performance teams), type: feature request

Comments

@cgils

cgils commented Mar 12, 2016

Please consider adding some way of setting a maximum cache size. At the moment our project's build caches grow without bound, so we have to discard them periodically.

A facility with which to display cache information would be nice as well. Something like the output of ccache -s, for example:

  cache directory                     /home/user/.ccache
  cache hit (direct)                 34993
  cache hit (preprocessed)             153
  cache miss                         23479
  compile failed                         3
  ccache internal error                  1
  preprocessor error                     1
  cache file missing                     1
  unsupported source language          382
  no input file                       7322
  files in cache                     70445
  cache size                          16.4 Gbytes
  max cache size                      90.0 Gbytes
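
As a rough illustration of the kind of report requested here, the sketch below walks a directory and prints a file count and total size. It is not a Bazel feature; `summarize_dir` and `format_size` are hypothetical helper names, and the directory is whatever path the caller passes in.

```python
import os


def summarize_dir(root):
    """Return (entry_count, total_bytes) for all non-directory entries under root."""
    entry_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat measures a symlink itself rather than its target.
                st = os.lstat(path)
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            entry_count += 1
            total_bytes += st.st_size
    return entry_count, total_bytes


def format_size(num_bytes):
    """Render a byte count in binary units."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"
```

For example, `summarize_dir(os.path.expanduser("~/.cache/bazel"))` would give the two numbers that correspond to the "files in cache" and "cache size" rows above. Hit and miss counters cannot be derived this way; they would have to be recorded by the tool itself.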
@dslomov dslomov added type: feature request P3 We're not considering working on this, but happy to review a PR. (No assignee) labels Mar 14, 2016
@dslomov
Contributor

dslomov commented Mar 14, 2016

@cgils, are you talking about the project "cache" (which is not really a cache, just your output directory) or the stuff that ends up in ~/.cache/bazel?

@cgils
Author

cgils commented Mar 14, 2016

Stuff that gets put into ~/.cache/bazel.

@glinscott

I vote for this to be higher priority :). Bazel is commonly filling up hard drives on our project from the ~/.cache/bazel directory growing unbounded.

@damienmg damienmg modified the milestone: 1.0 Jun 14, 2016
@kavehv

kavehv commented Jul 28, 2016

+1

@jianga

jianga commented Aug 22, 2016

+1

@jiayiliu

+1, esp. if the home dir is on NFS.

@lefromage

+1

@burnzzz

burnzzz commented Apr 23, 2017

+1

@softprops

+1

@llhe
Contributor

llhe commented May 15, 2018

+1

@dahlstrom-g

+1

@hlopko
Member

hlopko commented Jul 10, 2018

@buchgr

@buchgr buchgr self-assigned this Jul 10, 2018
@buchgr buchgr modified the milestones: 1.0, Remote Execution Aug 6, 2018
@buchgr buchgr removed their assignment Jan 16, 2019
@buchgr buchgr added untriaged and removed P3 We're not considering working on this, but happy to review a PR. (No assignee) type: feature request category: misc > misc untriaged labels Jan 16, 2019
@buchgr
Contributor

buchgr commented Jan 16, 2019

@jin do you have an idea to which component to assign this issue?

@jin
Member

jin commented Jan 16, 2019

@buchgr team-Local-Exec sounds like a good fit.

@jin jin added the untriaged label Jan 16, 2019
@jin jin added the team-Local-Exec Issues and PRs for the Execution (Local) team label Jan 16, 2019
@jmmv
Contributor

jmmv commented Jan 24, 2019

There are several kinds of data to be managed here:

  • Bazel "installation files". These are tracked in "Automatically clean up old install bases" (#2109), and they should be auto-purged. No question.

  • Output trees. These are just the build artifacts of a given build. They are not a cache. Bazel discarding them automatically is a bad idea as we'd be throwing away people's data. bazel clean removes these on a workspace basis. (I think these files living under ~/.cache/ is also a mistake.)

  • Fetch/remote cache (whatever it's called). bazel fetch can download a lot of stuff, and this stuff is shared across workspaces. This is a cache and could be cleaned automatically. However, pruning entries from it affects the ability to work offline, so I'd be wary of doing that too. It'd be pretty annoying to fetch everything before hopping on a plane, only to discover later that some random stuff was pruned and you can no longer do anything.

Separately, having a command to dump all files Bazel knows about, grouped by category/project and with a summary of their size, would be awesome. Tag each entry with an ID so you can tell bazel clean some-id and it'd be even better.
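
A minimal sketch of such a listing, assuming nothing about how Bazel lays out its directories: it sizes each direct child of a given root and numbers the entries, largest first. `tree_size` and `usage_report` are hypothetical names, and the numeric IDs are positional only, so they are not stable between runs.

```python
import os


def tree_size(path):
    """Total size in bytes of a file, or of everything under a directory."""
    if os.path.islink(path) or not os.path.isdir(path):
        try:
            return os.lstat(path).st_size
        except OSError:
            return 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # entry disappeared mid-walk
    return total


def usage_report(root):
    """Return [(id, name, bytes)] for root's direct children, largest first."""
    sizes = [(name, tree_size(os.path.join(root, name))) for name in os.listdir(root)]
    # Sort by descending size, breaking ties by name for a deterministic order.
    sizes.sort(key=lambda item: (-item[1], item[0]))
    return [(index, name, size) for index, (name, size) in enumerate(sizes, start=1)]
```

Grouping by category or project, as suggested above, would need knowledge of which directory belongs to which kind of data; this sketch only groups by top-level entry.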

@jmmv jmmv added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Jan 24, 2019
@edrumwri

edrumwri commented Feb 3, 2019

+1

@jmmv
Contributor

jmmv commented Mar 14, 2019

Per #2765, one more thing to consider if implementing a smarter clean command is bazel clean //target. Noting here because this would possibly conflict with the bazel clean some-id proposal above.

@ozio85

ozio85 commented Feb 21, 2020

+1

@jonatanj

+1

@meisterT meisterT removed this from the Remote Execution milestone May 12, 2020
@dchichkov

It has been some time since 2016; it'd be great for this to get higher priority.

To give some context: I'm on a 2 TB drive and have to keep fighting Bazel for living space. My ~/.cache/bazel consistently eats up to 0.7 TB of free space within 2-3 weeks. I then have to purge it, which results in multi-GB downloads (CUDA, etc.).

It'd be great to add a size limit and LRU-style eviction to the cache. Alternatively, remove the "{Fast, Correct} - Choose two" advertisement from the web page: eating all the space on the disk is not correct, and making a user purge the cache completely is not fast.
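
To make the size-limit-plus-LRU idea concrete, here is a minimal sketch of evicting the least recently accessed files from a directory until it fits under a byte budget. It is a generic illustration, not how Bazel works; `prune_to_limit` is a hypothetical name. It assumes the filesystem keeps usable access times (mounts with `noatime` do not), and it should only ever be pointed at a directory that really is a cache, since output trees are build artifacts rather than cache entries, as noted earlier in this thread.

```python
import os


def prune_to_limit(root, max_bytes):
    """Delete least recently accessed files under root until the total size
    is at most max_bytes. Returns the list of deleted paths."""
    entries = []
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            entries.append((st.st_atime, path, st.st_size))
            total += st.st_size

    deleted = []
    # Oldest access time first: the "least recently used" end of the list.
    for _atime, path, size in sorted(entries):
        if total <= max_bytes:
            break
        try:
            os.remove(path)
        except OSError:
            continue  # could not delete; leave it and try the next entry
        total -= size
        deleted.append(path)
    return deleted
```

Evicting file by file is the simplest policy to show; a real cache would more likely evict whole entries (for example, one downloaded archive and everything extracted from it) so that nothing is left half-deleted.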

@MilesCranmer

This issue should really be P1, not P2. It is extremely bad that on my system there have been nearly a million files generated in the ~/.cache that aren't automatically cleared.

On many personal computers, people might just not realize this is happening, and have their entire filesystem slowed down due to the additional effort needed by their system to index everything.

Then, on some shared computers where there are hard file limit constraints, bazel is effectively unusable because it generates so many files. (This is the only reason I noticed this in the first place – otherwise I would have just had a slower filesystem without knowing why)

@Ryang20718

+1

@gosha-frost

+1

@meisterT meisterT added team-Core Skyframe, bazel query, BEP, options parsing, bazelrc and removed team-Local-Exec Issues and PRs for the Execution (Local) team labels Jul 28, 2023
@StilesCrisis

+1

@meisterT meisterT added team-Performance Issues for Performance teams and removed team-Core Skyframe, bazel query, BEP, options parsing, bazelrc labels May 16, 2024
@tjgq
Contributor

tjgq commented May 23, 2024

We acknowledge the problem discussed here; however, for the sake of clarity, I'm going to close this issue in favor of the following narrower-scoped ones:

#2109
#5139
#22515
#22516

@tjgq tjgq closed this as not planned May 23, 2024