/var/lib/swupd/staged is retaining too many files #378

Closed
ahkok opened this issue Feb 12, 2018 · 34 comments

@ahkok
Contributor

ahkok commented Feb 12, 2018

There is a reasonable amount of storage needed to keep updates efficiently, such that files that will be needed on the next release are kept until they are no longer needed. However, it appears that too many files are kept, resulting in wasted disk space.

I've poked at it, and since I have several systems where /var/lib/swupd/staged is on the order of 7-9 GB currently, I've got some data to discuss potential solutions.

why doesn't systemd-tmpfiles work?

Before I start with the solutions, I'd like to discuss the problems with systemd-tmpfiles, and why it doesn't work. Most of this is due to a combination of (1) our way of timestamping Python files, (2) tmpfiles' way of looking at atime/mtime, and (3) a small dose of us shipping files that are -r--r--r-- root root, which tmpfiles entirely ignores.

The result is that systemd-tmpfiles is not usable for any reasonable amount of pruning.

We have 2 straightforward solutions, however.

1 - prune what we know is old

Given the manifests that are on the system, we can discover all hashes for files that are relevant for the current state of the system. Since we can compare this to the files in /staged, we know which files are no longer related to any known manifest on the system. This gives us a list of files that "definitely" can be deleted without ever needing to be redownloaded.
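
A rough sketch of the idea in shell (assumptions: manifests live under /var/lib/swupd/$VERSION/, file entries are tab-separated with the content hash in the second field, and staged files are named by their hash):

awk -F'\t' 'NF >= 4 { print $2 }' /var/lib/swupd/*/Manifest.* | sort -u > /tmp/known-hashes
ls /var/lib/swupd/staged | sort > /tmp/staged-hashes
# remove every staged file whose hash no manifest on the system references
comm -23 /tmp/staged-hashes /tmp/known-hashes | sed 's|^|/var/lib/swupd/staged/|' | xargs -r rm -f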

In my local system, this would recover 3708879078 bytes from a total of 9354271423 (39%)

This is not a "cheap" operation. This solution would be costly for a shell script, requiring many seconds of processing, even up to a minute or so, to finish. If done inside swupd itself, it could probably be done much more efficiently.

2 - prune based on mtime

Since we mostly create files in staged and rarely touch them afterwards, we could decide to use mtime to distinguish files that are considered "aged" from files used recently.

The benefit of this method is that it's really quick and can be done in a shell script. The downside is that it will specifically delete lots of Python files that may still be relevant, causing fullfiles to be re-downloaded.
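
A minimal sketch of this approach (the three-day cutoff mirrors the conclusion below):

find /var/lib/swupd/staged -type f -mtime +3 -delete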

Here's what the data looks like:

mtime (days)   rm size (1)   rm size (2)
none           9354271423    3801303314
1              9294391574    3759230964
2              8704640059    3423811441
3              2880597007     795056079
4              2774649062     794839002
5              2685577506     794782008

As to why the gap exists between 2 and 3 days: this must be due to actual update content pushing out a significant rebuild in that window.

Conclusion

I'm tempted to swing a broad axe, rely on the availability of data over the network, and pick solution 2, since it would have the most effect and downloads are reasonably cheap. However, that method would cause a significant spike in redownloads from the CDN and could increase traffic significantly. Given the size of the spool in use, I think -mtime +3 is reasonable, but we may want -mtime +2 in the long run.

@bryteise
Member

I vaguely remember talking with somebody about just culling all files that are not manifests in /var/lib/swupd as my heavy-axe approach. In general the file cache doesn't seem to save time outside of bundle-add X -> bundle-remove X -> bundle-add X. I guess there are some cases where files are outside of a pack but haven't been altered and so wouldn't need to be downloaded again, but I don't know if that case is particularly impactful.

@matthewrsj
Contributor

@bryteise you were probably talking to me. That is my preferred approach.

@ahkok
Contributor Author

ahkok commented Feb 12, 2018

So how about just culling -mtime +1 in a post-update in some way?

@ahkok
Contributor Author

ahkok commented Feb 12, 2018

We could just do this with

ExecStartPost=-/usr/bin/find /var/lib/swupd/staged -type f -mtime +1 -exec rm -f {} \;

in swupd-update.service.

@matthewrsj
Contributor

What about just swupd_rm everything in /var/lib/swupd that isn't in a version folder? This would keep the manifests around and solve a lot of the issues we've seen (and been fixing) with swupd mishandling the cached files there. Really simplifies things a lot.

@matthewrsj
Contributor

@phmccarty @tmarcu

@ahkok
Contributor Author

ahkok commented Feb 12, 2018

Deleting everything would likely destroy any valuable files that we may want to inspect for debugging purposes. Hence keeping them for 24h or so doesn't seem like a bad idea.

@matthewrsj
Contributor

But I'm also thinking about the benefit to swupd of having the staged directory clear whenever we do our operations. We've had to do a few fixes in the last few months because that hasn't always been handled correctly, and I can't be 100% certain it is being handled correctly currently.

@bryteise
Member

Hrm, having an ability to disable cleaning the state dir with an option seems fine to me for the usual developer case. Other than that I'm unsure.

@icarus-sparry

I agree with @ahkok - better to leave clues around for debugging.

If we want the staged directory empty because we don't have confidence in our code, then perhaps moving the staged directory to a datestamped version, using the hypothetical option @bryteise suggests, would work.

@phmccarty
Contributor

I would be okay with having an option to clean the state directory, but by default I think swupd-client should correctly manage its own state directory, and if there are bugs in that management, we should fix those bugs.

@matthewrsj
Contributor

@phmccarty that isn't the only reason to wipe it. We don't get much benefit from it being there and it takes up space on the system.

@ahkok
Contributor Author

ahkok commented Feb 12, 2018

#define correct management

Given the output of method 1 above in my text, I am dubious that "correct" management will do any significant cleaning, and will simply retain way too many files.

@ahkok
Contributor Author

ahkok commented Feb 12, 2018

BTW, with the modified unit proposed above, all you'd need to do to debug issues and keep all files around is disable auto-update. That seems entirely reasonable.

@icarus-sparry

One thing that we have talked about briefly for swupd is related to A/B updates. There we can regenerate the staged directory from the existing A, to reduce download bandwidth. Strictly speaking we can do this even for the non A/B case. The Manifest gives us the required hash and the filename of the thing that is supposed to have that hash value, so we can copy that file to the staged directory and then validate it (order is important). If it fails to validate, we can delete it and fall back to downloading the file.
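
A hedged sketch of that copy-then-validate flow (swupd's real hash covers more than the file contents, so sha256sum below is only a stand-in, and the paths and variables are illustrative):

want_hash=$1    # hash recorded in the Manifest
src=$2          # path of the candidate file on the existing A side
staged="/var/lib/swupd/staged/$want_hash"
cp "$src" "$staged"
# validate after copying (order is important); on mismatch, fall back to downloading
if [ "$(sha256sum "$staged" | cut -d' ' -f1)" != "$want_hash" ]; then
    rm -f "$staged"
fi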

@cmarcelo
Contributor

We could just do this with

ExecStartPost=-/usr/bin/find /var/lib/swupd/staged -type f -mtime +1 -exec rm -f {} \;

in swupd-update.service.

swupd has a lock per state directory for certain operations, so this one might conflict with a manual execution of swupd by the user. What about adding a subcommand to swupd that achieves the same effect but holds the lock, and calling it from the service file?
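
For illustration, the service file could then invoke the subcommand instead of the raw find call (using the name the next comment ends up introducing):

ExecStartPost=-/usr/bin/swupd clean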

@cmarcelo
Contributor

I gave the subcommand approach a shot to see how it looks. The result is in PR #383. It adds a "swupd clean" that drops old files, but keeps enough manifest files to not harm the search functionality. So it is a hybrid between only caring about dates and being very precise about what we want to keep. For ignoring versions and ages, use "swupd clean --all".
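
Usage would look like:

swupd clean          # drop old files, but keep manifests needed for search
swupd clean --all    # ignore versions and ages, remove everything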

@matthewrsj
Contributor

@ahkok PR #383 has now been merged, so we could talk about whether we want this to be invoked periodically behind the scenes, or after an update.

@otaviobp
Contributor

otaviobp commented Jun 6, 2018

Why was mtime used to decide if a file is old? Mtime is the time the file was modified, i.e., the time it was generated on the server, not the time it was extracted on the system. So if we do a bundle-add && clean && bundle-remove && bundle-add, it's very likely we are going to download everything again. Is there any use case where using mtime to decide if the file is old is better than atime or ctime?
If we want to keep files that were recently downloaded, we should use ctime, but I think that atime makes more sense, because if a file was used recently, it's more likely that it's going to be used again (at least for the cases where I think a cache would be useful).

Another question: if the size of the cache is a problem, why don't we keep tarballs instead of staged files?

Btw, I'm missing an option to clean everything except Manifests. That's what I would use after each update on my system.

@cmarcelo
Contributor

cmarcelo commented Jun 6, 2018

Why was mtime used to decide if a file is old?

See "Conclusion" right in the first comment for the general idea. I'm not familiar with how reliable atime/ctime are, but would make sense to try them out for the staged files and compare (either tweaking clean or performing tests with find). It should be easy to swap which one is used in swupd clean.

Another question: if the size of the cache is a problem, why don't we keep tarballs instead of staged files?

I can say why I haven't done it: that is a larger change. The main issue we had was that nothing was being cleaned up at all, so even the "hammer" solution of mtime would already improve things a lot with a low investment.

Note: I'm not sure if swupd clean is being automatically called already, but that was the plan. @matthewrsj have we enabled something like that?

Btw, I'm missing an option to clean everything except Manifests. That's what I would use after each update on my system.

I'd try as much as possible to avoid lots of options on this command. I've given it a default behavior (that makes sense to call periodically) and a "clean everything" behavior. Maybe we should just tweak the default behavior.

If the reason to keep only the manifests is making swupd search happy, I think there are plans to make it use another data source, so swupd clean --all would be what you want.

@matthewrsj
Contributor

Note: I'm not sure if swupd clean is being automatically called already, but that was the plan. @matthewrsj have we enabled something like that?

We have not, and that is the only reason I've been leaving this option open. I would like to run swupd clean (without --all) after every update. Even if mtime is too aggressive, it will still keep the manifests around, right?

@cmarcelo
Contributor

cmarcelo commented Jun 7, 2018

swupd clean will remove all manifests that are not mentioned by the current version running.

The way it's done, it might keep some extra manifests, but never fewer. So it doesn't interfere with swupd search in negative ways.

@matthewrsj
Contributor

swupd clean will remove all manifests that are not mentioned by the current version running.

This is desired behavior after an update. We don't need the old manifests around.

@otaviobp
Contributor

otaviobp commented Jun 7, 2018

See "Conclusion" right in the first comment for the general idea. I'm not familiar with how reliable atime/ctime are, but would make sense to try them out for the staged files and compare (either tweaking clean or performing tests with find). It should be easy to swap which one is used in swupd clean.

As far as I understand, the idea is to clean files older than x days, and I'm OK with that. The problem is the definition of "old".
If we use mtime, we are going to remove all files that were built more than x days ago.
If we use atime, we are going to remove all files that weren't accessed on the system for more than x days.
If we use ctime, we are going to remove all files that were extracted more than x days ago.
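
For comparison, the three cutoffs expressed with find (the three-day cutoff is illustrative):

find /var/lib/swupd/staged -type f -mtime +3    # modified (built) more than 3 days ago
find /var/lib/swupd/staged -type f -atime +3    # not accessed for more than 3 days
find /var/lib/swupd/staged -type f -ctime +3    # inode changed (extracted) more than 3 days ago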

I commented in this issue because I'm running some tests and I realized that swupd clean was cleaning more files than I thought it would clean. For example:

# ls /var/lib/swupd/staged/|wc -l
0
# swupd bundle-add shells
Downloading packs...

Extracting shells pack for version 22690
...100%
Installing bundle(s) files...
	...100%
Calling post-update helper scripts.
Bundle(s) installation done.
# ls /var/lib/swupd/staged/|wc -l
2784
# swupd clean 
2784 files removed.
# ls /var/lib/swupd/staged/|wc -l
0

Is this expected?

@cmarcelo
Contributor

cmarcelo commented Jun 7, 2018

Is this expected?

No. I wasn't explicit in my first reply; I agree this should be fixed.

If we use atime, we are going to remove all files that weren't accessed on the system for more than x days.
If we use ctime, we are going to remove all files that were extracted more than x days ago.

I'd prefer going with ctime here. In many cases we are using that file but we don't access it (because the update says it did not change, so we have no reason to touch it if the hash matches).

@icarus-sparry

Just to add to the fun, we play games with the times of files in the build process. The "export SOURCE_DATE_EPOCH=12345" stuff in the spec files drives this. This is necessary (but not sufficient) to get reproducible builds, as the rpm files (which have cpio archives inside them) contain the dates of artifacts.

You don't want to see the games played with '*.pyc' files (hinted at by "delete lots of Python files" above).

So this pretty much rules out mtime.

So we have ctime, which is pretty much the last time the file was written to (yes, chmod will change ctime), or atime, which should be the last time the file was opened but often isn't, because filesystems are mounted with the relatime option rather than strictatime. Fortunately, relatime these days does update atime at least once a day.

@ahkok
Contributor Author

ahkok commented Jun 7, 2018

I've entirely reversed my earlier assessment. Given the lack of data, I think we should drastically change our strategy:

  • stop using tmpfiles, entirely
  • after swupd update, delete all

That's it. We can try to be a little smart and retain just the latest manifests, but everything else (staged, $VER folders) should just go.

Right now we're hurting users badly with 3-5 GB of wasted storage that is only going to be used in the worst case, and even then we know verify is going to be slow anyway.

@mbelluzzo

mbelluzzo commented Jun 7, 2018

Except for:

stop using tmpfiles, entirely

and only because I don't know how it's being used and why. Otherwise, I'm with ahkok on that.

@otaviobp
Contributor

otaviobp commented Jun 7, 2018

after swupd update, delete all

+1

@ahkok
Contributor Author

ahkok commented Jun 7, 2018

And only because I don't know how it's being used and why

We have a tmpfiles.d entry for swupd:

d /var/lib/swupd 0700 root root 4d -

However, due to the above issues, this is entirely without effect.
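
For reference, the fields in that line follow tmpfiles.d(5):

# type  path            mode  user  group  age  argument
d       /var/lib/swupd  0700  root  root   4d   -

The d entry creates the directory if missing, and the 4d age asks systemd-tmpfiles to clean out contents older than four days based on their file timestamps, which is exactly the mechanism shown above to be unreliable for our staged files.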

@matthewrsj
Contributor

after swupd update, delete all

Eh, having the manifests for the current version stay around is a big win, especially for the swupd search and bundle-add/remove code. I'd be happy with a clean option that deleted all content but left the manifests referenced in the current MoM behind.

@cmarcelo
Contributor

cmarcelo commented Jun 8, 2018

@matthewrsj after #476 swupd clean will do exactly what you are suggesting (and what Auke suggested).

  • swupd clean deletes everything except the relevant manifests (and any other metadata we think is worth keeping in the future)
  • swupd clean --all deletes everything

@matthewrsj
Contributor

@cmarcelo OK, that's fine. I'm just responding that we don't want to run swupd clean --all after an update and delete everything, which is what I understood from "after swupd update, delete all".

@cmarcelo
Contributor

cmarcelo commented Jun 8, 2018

@matthewrsj Got it!
