-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
flux-shutdown: add --gc garbage collection option #4303
Conversation
0e97b7c
to
bebe13d
Compare
I went ahead and added test coverage here and have been testing this on my home test system, so removing the WIP. I'm still open to doing this another way if people have better ideas! |
Sorry, I haven't had time to take a peek at this. I can't think of a any better interface, given that garbage collection must occur as part of a dump/restore. I can't remember, does GC happen just as a natural result of the restore? As a more general point (and I guess unrelated to this PR), I'm a little worried there will be confusion when to use |
Yes a dump/restore walks the KVS metadata starting with the last root hash, so when it is restored, all the unreferenced data is left behind. In addition, the archive only contains "files", unlike a file system archive created with tar (for example). So empty job directories are removed also. One can still run One other weakness I see here is the dump files aren't removed and will pile up after a while. I was vaguely thinking that this could be valuable if we wanted to revert to a previous checkpoint if the db was corrupted or whatever, and that sys admins could manage the dump files with log rotation tools. Is that reasonable? I guess the other question - is |
Ah, thanks for that refresher, that was helpful. Given the above, this approach seems just fine IMO. I like the idea of extending shutdown semantics in the future. Also, if dumpfiles tend to accumulate, maybe logrotate or systemd-tmpfiles could be configured to automatically clean things up? (Edit: I see now you already mentioned this approach in your previous post. It does seem reasonable to me!) |
And FWIW, I don't have a problem with |
0d2f2d7
to
4384fd9
Compare
Pushed a tmpfiles.d config file and moved "auto" dumps to Tested on my test system instance by running |
Codecov Report
@@ Coverage Diff @@
## master #4303 +/- ##
==========================================
+ Coverage 83.62% 83.64% +0.01%
==========================================
Files 389 389
Lines 65388 65421 +33
==========================================
+ Hits 54680 54720 +40
+ Misses 10708 10701 -7
|
Repushed with reference to #258 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall lgtm, just a few comments / nits I found
@@ -807,7 +807,7 @@ static int process_args (struct content_sqlite *ctx, | |||
*truncate = true; | |||
} | |||
else { | |||
flux_log_error (ctx->h, "Unknown module option: '%s'", argv[i]); | |||
flux_log (ctx->h, LOG_ERR, "Unknown module option: '%s'", argv[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps should stylize do a similar change in content-files
? (content-files
sets errno = EINVAL to make it not as bad)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh and I guess w/ content-s3
too (given follow up commit to this one)
test_expect_success 'content-files module load fails with unknown option' ' | ||
test_must_fail flux module load content-files notoption | ||
' | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, should this test be a different commit? not really related to content-files: add truncate module option
exit_rc=1 | ||
fi | ||
fi | ||
flux module remove ${backingmod} || exit_rc=1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should use modrm
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
modrm
tests $RANK before running flux module remove
, but this is within a block that is already conditional on $RANK. I thought it kind of weird to trade a straightforward one-liner for a function call to do same when it was necessary to repeat the rank constraint. Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh that makes sense, you have the rank == 0
check above this.
fi | ||
fi | ||
if test -n "${dumpfile}"; then | ||
flux module load ${backingmod} truncate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use modload for consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same deal here
rm -f ${dumplink} | ||
fi | ||
else | ||
flux module load ${backingmod} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use modload for consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ibid
if test -n "${dumplink}"; then | ||
rm -f ${dumplink} | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should only remove the link if the restore is successful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but the rc1 shebang is #!/bin/bash -e
so the script aborts before the remove if the restore is unsuccessful.
Problem: rc scripts use the content backing store 'truncate' option to manage offline garbage collection, but content-sqlite does not support this option. Add a truncate module option that unlinks the database file before the database is opened, thereby emptying it. Add test.
Problem: if an unknown module option is supplied flux_log_error() is called without errno set. Log the error with flux_log (LOG_ERR) instead.
Problem: there is no option to query the number of objects held by content-files in test. Override the stats.get built-in RPC handler with one that provides the object count. So like content-sqlite, flux module stats content-files returns the count.
Problem: rc scripts use the content backing store 'truncate' option to manage offline garbage collection, but content-files does not support this option. Add a truncate module option that recursively removes the db dir before the database is opened. Add test.
Problem: rc scripts use the content backing store 'truncate' option to manage offline garbage collection, but content-s3 does not support this option. Add a truncate module option. Since emptying an s3 bucket is not directly suported by libs3, this is a rather involved process. For now, if this option is supplied, log an error instructing the user to purge the bucket using s3 console or another mechanism and return failure. Add test.
Problem: we need a way to tell rc scripts to restore content on startup, and dump content on shutdown, for offline KVS garbage collection of a system instance or user checkpoint/restart. Add some logic to rc1 and rc3: rc1: If the content.restore broker attribute is set to a file path, then load the content backing store module with the 'truncate' option, and restore content from the file before loading the KVS. rc3: If the content.dump broker attribute is set to a file path, then dump content to the file after unloading the KVS. Additionally, if content.restore=auto, then rc1 looks for a symlink named RESTORE in the broker's current working directory or ${statedir} if defined. If the symlink exists, then restore content from the file it points to and remove the symlink on success. If content.dump=auto, then rc3 dumps content to an automatically generated file name containing the date in the current working directory or ${statedir} if defined, and creates the RESTORE symlink pointing to it.
Problem: content.restore is not set for the system instance, so automatic restore from a dump for garbage collection purposes cannot be automated. Set content.restore=auto, so if the ${statedir}/RESTORE symlink exists, content will be truncated and then restored from a previously created archive.
Problem: a system instance that runs flux-dump(1) from rc3 might get killed by systemd TimeoutStopSec. Have flux-shutdown(1) arrange for the dump. If the instance is being shut down by this method, then systemctl stop is not being run, so TimeoutStopSec does not apply. Fixes flux-framework#258
Problem: system tests do not set statedir like systemd unit file. Set statedir to a subdirectory under $workdir.
Problem: there is no test coverage for offline KVS garbage collection. Add a sharness script that exercises this functionality. Augment the shutdown-cmd sharness script to cover new shutdown options.
Problem: dump files created for garbage collection may accumulate in $statedir of a system instance. Install a tmpfiles.d config file that removes dumps older than 30 days.
Problem: content-files logs "<option>: Invalid argument" on an invalid module option, rather than mentioning "module option" in the error, which would be more helpful. Fix log message.
Problem: content-s3 logs "<option>" on an invalid module option, which is a bit vague. Change error message to be more descriptive.
Problem: the sharness test for content-files does not cover a bad module option. Add test.
Just pushed fixes based on @chu11's comments, and also rebased on current master. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Thanks! I'll set MWP. |
This adds some logic to rc1 and rc3 to enable offline garbage collection of a system instance using
flux-dump(1)
andflux-restore(1)
. If requested byflux shutdown --gc
, a dump is produced in ${statedir} during shutdown with a RESTORE symlink pointed to it. On startup, if RESTORE exists, the current backing store is truncated and content is restored from the archive file. Then the RESTORE symlink is removed.For example on my test system:
It's also possible to checkpoint/restart an instance with:
although this is only practical if R hasn't changed at the moment.
Marking a WIP as I wanted to get feedback on the approach before writing tests.
I had thought maybe garbage collection could be automated somehow by tracking some metric that could be used as an indicator of the need. However, maybe getting a bit of experience with doing it manually first makes sense.