/var/lib/swupd/staged is retaining too many files #378
Comments
I vaguely remember talking with somebody about just culling all files that are not manifests in /var/lib/swupd as my heavy axe approach. In general the file cache doesn't seem to save time outside of bundle-add X -> bundle-remove X -> bundle-add X. I guess there are some cases where files are outside of a pack but haven't been altered and so wouldn't need to be downloaded again, but I don't know if that case is particularly impactful. |
@bryteise you were probably talking to me. That is my preferred approach. |
So how about just culling |
We could just do this with
in |
What about just |
Deleting everything would likely destroy any valuable files that we may want to inspect for debugging purposes. Hence keeping them for 24h or so doesn't seem like a bad idea. |
But I'm also thinking about the benefit to swupd of having the staged directory clear whenever we do our operations. We've had to do a few fixes in the last few months because that hasn't always been handled correctly, and I can't be 100% certain it is being handled correctly currently. |
Hrm having an ability to disable cleaning the state dir with an option seems fine to me for the usual developer case. Other than that I'm unsure. |
I would be okay with having an option to clean the state directory, but by default I think swupd-client should correctly manage its own state directory, and if there are bugs in that management, we should fix those bugs. |
@phmccarty that isn't the only reason to wipe it. We don't get much benefit from it being there and it takes up space on the system. |
Given the output of method 1 above in my text, I am dubious that "correct" management will do any significant cleaning, and will simply retain way too many files. |
BTW, with the modified unit proposed above, all you'd need to do to debug issues and keep all files around is disable auto-update. That seems entirely reasonable. |
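(For reference, disabling auto-update is a one-liner. Both forms below assume the swupd autoupdate subcommand and the Clear Linux timer name, so treat them as illustrative rather than prescriptive.)

```shell
# Either of these should stop the periodic update run (and, with the proposed
# modified unit, the automatic cleaning along with it):
swupd autoupdate --disable          # assumes the autoupdate subcommand is available
systemctl mask swupd-update.timer   # assumes the Clear Linux timer name
```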
One thing that we have talked about briefly for swupd is related to A/B updates. There we can regenerate the |
swupd has a lock per state directory for certain operations, so this one might conflict with a manual execution of swupd by the user. What about adding a subcommand to swupd to achieve the same effect, but that holds the lock, and we call it on the service file? |
I gave the subcommand approach a shot to see how it looks. The result is in PR #383. It adds a "swupd clean" that drops old files, but keeps enough manifest files to not harm the search functionality. So it is a hybrid between caring only about dates and being very precise about what we want to keep. For ignoring versions and ages, use "swupd clean --all". |
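As a usage sketch (the command and flag are those described above; the exact retention window is whatever default the PR settles on):

```shell
# Default behaviour: drop old files but keep enough manifests for swupd search.
swupd clean

# Ignore versions and ages, and remove everything that can be redownloaded.
swupd clean --all
```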
Why was mtime used to decide if a file is old? Mtime is the time the file was modified, i.e., the time it was generated on the server and not the time it was extracted on the system. So if we do a bundle-add && clean && bundle-remove && bundle-add, it's very likely we are going to download everything again. Is there any use case where using mtime to decide if the file is old is better than atime or ctime? Another question: if the size of the cache is a problem, why don't we keep tarballs instead of staged files? Btw, I'm missing an option to clean everything except Manifests. That's what I would use after each update on my system. |
See "Conclusion" right in the first comment for the general idea. I'm not familiar with how reliable atime/ctime are, but it would make sense to try them out for the staged files and compare (either tweaking clean or performing tests with find). It should be easy to swap which one is used in swupd clean.
I can say why I haven't done it: that is a larger change. The main issue we had was that nothing was being cleaned up at all, so even the "hammer" solution of mtime would already improve things a lot with a low investment. Note: I'm not sure if
I'd try as much as possible to avoid lots of options on this command. I've given a default behavior (that makes sense to be called periodically) and a "clean everything" behavior. Maybe we should just tweak the default behavior. If the reason to keep only the manifests is making swupd search happy, I think there are plans to make it use another data source, so |
We have not, and that is the only reason I've been leaving this option open. I would like to run |
The way it's done, it might keep some extra manifests, but not fewer. So it doesn't interfere with swupd search in negative ways. |
This is desired behavior after an update. We don't need the old manifests around. |
As far as I understand, the idea is to clean files older than x days. And I'm ok with that. The problem is the definition of old. I commented in this issue because I'm running some tests and I realized that swupd clean was cleaning more files than I thought it would. For example:
Is this expected? |
No. I wasn't explicit in my first reply, but I agree this should be fixed.
I'd prefer going with ctime here. In many cases we are using that file but we don't access it (because the update says it did not change, so we have no reason to touch it if the hash matches). |
Just to add to the fun, we play games with the times of files in the build process. The "export SOURCE_DATE_EPOCH=12345" stuff in the spec files drives this. This is necessary (but not sufficient) to get reproducible builds as the rpm files (which have cpio archives inside them) contain the dates of artifacts. You don't want to see the games played with '*.pyc' files (hinted at by "delete lots of python files"). So this pretty much rules out mtime. So we have ctime, which pretty much is the last time the file was written to (yes chmod will change ctime) or atime, which should be the last time the file was opened but often isn't because filesystems are mounted with the relatime option rather than strictatime. Fortunately relatime these days does update atime at least once a day. |
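To get a feel for how the three timestamps behave on a given system, a purely illustrative comparison against the staged directory could look like this:

```shell
# Show all three timestamps for a few staged files (GNU stat format strings).
stat -c 'atime: %x  mtime: %y  ctime: %z  %n' /var/lib/swupd/staged/* 2>/dev/null | head

# Count how many files each policy would consider "older than 3 days".
find /var/lib/swupd/staged -type f -mtime +3 | wc -l
find /var/lib/swupd/staged -type f -ctime +3 | wc -l
find /var/lib/swupd/staged -type f -atime +3 | wc -l
```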
I've come around entirely on my earlier assessment. Given the lack of data, I think we should drastically change our strategy:
that's it. We can try to do a little smartness to just retain the latest manifests, but everything else (staged, $VER folders) should just go. Right now we're hurting users badly with 3-5GB of wasted storage that is only going to be used in the worst case, and even then we know verify is going to be slow anyway. |
Except by:
And only because I don't know how it's being used and why. I'm with ahkok on that. |
+1 |
we have a tmpfiles.d entry for swupd:
However, due to the above issues, this is entirely without result. |
Eh, having the manifests for the current version stay around is a big win, especially for swupd search and bundle-add/remove code. I'd be happy with a |
@matthewrsj after #476 swupd clean will do exactly what you are suggesting (and what Auke suggested).
|
@cmarcelo ok that's fine. I'm just responding that we don't want to run |
@matthewrsj Got it! |
There is a reasonable amount of storage needed to keep updates efficiently, such that files that will be needed on the next release are kept until they are no longer needed. However, it appears that too many files are kept, resulting in wasted disk space.
I've poked at it and since I have several systems where /var/lib/swupd/staged is in the order of 7-9GB currently, I've got some data to discuss potential solutions.

why doesn't systemd-tmpfiles work?
Before I start with the solutions, I'd like to discuss the problems with systemd-tmpfiles, and why it doesn't work. Most of this is due to a combination of (1) our way of timestamping python files, (2) tmpfiles' way of looking at atime/mtime, and a small dose of us shipping files that are -r--r--r-- root root, which tmpfiles entirely ignores. The result is that systemd-tmpfiles is not usable for any reasonable amount of pruning.
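For reference, the kind of age-based tmpfiles.d rule under discussion looks roughly like the following; the path, mode, and age are illustrative, not the entry actually shipped:

```
# /etc/tmpfiles.d/swupd-staged.conf (hypothetical example)
# systemd-tmpfiles --clean would prune entries under staged/ older than 7 days,
# but the aging decision is keyed off the file timestamps and permissions
# described above, which is exactly where it breaks down for swupd.
d /var/lib/swupd/staged 0700 root root 7d
```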
We have 2 straightforward solutions, however.
1 - prune what we know is old
Given the manifests that are on the system, we can discover all hashes for files that are relevant for the current state of the system. Since we can compare this to the files in /staged, we know which files are no longer related to any known manifest on the system. This gives us a list of files that "definitely" can be deleted without ever needing to be redownloaded. In my local system, this would recover 3708879078 bytes from a total of 9354271423 (39%).
This is not a "cheap" operation. This solution would be costly for a shell script, requiring many seconds of processing, even up to a minute or so, to finish. If done inside swupd itself, it could probably be done much more efficiently.
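A rough sketch of what this pruning could look like as a shell script follows; the manifest field layout (content hash in the second column of file lines) and hash-named staged files are assumptions about the on-disk format, not the actual swupd implementation:

```shell
#!/bin/sh
# Sketch: delete staged files whose hash is not referenced by any manifest
# still present on the system. The "echo" keeps this a dry run.
STATE=/var/lib/swupd
known=$(mktemp)
awk 'NF >= 4 { print $2 }' "$STATE"/*/Manifest.* 2>/dev/null | sort -u > "$known"
for f in "$STATE"/staged/*; do
    [ -e "$f" ] || continue
    grep -qxF "$(basename "$f")" "$known" || echo rm -f "$f"
done
rm -f "$known"
```

The per-file grep is what makes the shell version slow; doing the same lookup inside swupd, where the manifest data is already parsed, is presumably why it could be much cheaper there.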
2 - prune based on mtime
Since we mostly create files in staged and rarely touch them afterwards, we could decide to use mtime to distinguish between files that are considered "aged" and files used recently. The benefit of this method is that it's really quick and can be done in a shell script. The downside is that it will specifically delete lots of Python files that may still be relevant and cause fullfiles to be re-downloaded.
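A minimal shell version of this approach, using the three-day threshold suggested in the conclusion below:

```shell
# Age out staged files by mtime; -mtime +3 spares anything modified in the last 3 days.
find /var/lib/swupd/staged -type f -mtime +3 -delete
```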
Here's what the data looks like:
As to why the gap exists between 2 and 3, this must be due to the actual update content pushing a significant rebuild out.
Conclusion
I'm tempted to pick a broad swinging axe, rely on the availability of data over the network, and pick solution 2, since it would have the most effect, and downloads are reasonably cheap. However, that method would cause a significant spike in redownloads from the CDN and potentially increase traffic significantly. Given the size of the spool use, I think -mtime +3 is reasonable, but we may want -mtime +2 in the long run.