flux-shutdown: add --gc garbage collection option #4303

garlick · 2022-04-26T00:57:31Z

This adds some logic to rc1 and rc3 to enable offline garbage collection of a system instance using flux-dump(1) and flux-restore(1). If requested by flux shutdown --gc, a dump is produced in ${statedir} during shutdown with a RESTORE symlink pointed to it. On startup, if RESTORE exists, the current backing store is truncated and content is restored from the archive file. Then the RESTORE symlink is removed.

For example on my test system:

$ sudo flux shutdown --gc
flux-shutdown: shutdown will dump KVS (this may take some time)
broker.info[0]: cleanup.0: flux queue stop --quiet Exited (rc=0) 0.0s
broker.info[0]: cleanup.1: flux job cancelall --user=all --quiet -f --states RUN Exited (rc=0) 0.0s
broker.info[0]: cleanup.2: flux queue idle --quiet Exited (rc=0) 0.0s
broker.info[0]: cleanup-success: cleanup->shutdown 0.104717s
broker.info[0]: children-none: shutdown->finalize 0.179421ms
broker.info[0]: rc3.0: dumping content to /var/lib/flux/dump-20220425_174130.tgz
broker.info[0]: rc3.0: /usr/local/etc/flux/rc3 Exited (rc=0) 0.6s
broker.info[0]: rc3-success: finalize->goodbye 0.578407s
$ sudo ls -l /var/lib/flux
total 39684
-rw-r--r-- 1 flux flux  1073152 Apr 25 17:41 content.sqlite
-rw-r--r-- 1 flux flux    48183 Apr 25 17:41 dump-20220425_174130.tgz
-rw-r--r-- 1 flux flux 39510016 Apr 24 18:09 job-archive.sqlite
lrwxrwxrwx 1 flux flux       24 Apr 25 17:41 RESTORE -> dump-20220425_174130.tgz
$ sudo systemctl start flux
$ sudo ls -l /var/lib/flux
total 40896
-rw-r--r-- 1 flux flux     4096 Apr 25 17:42 content.sqlite
-rw-r--r-- 1 flux flux  2307232 Apr 25 17:42 content.sqlite-wal
-rw-r--r-- 1 flux flux    48183 Apr 25 17:41 dump-20220425_174130.tgz
-rw-r--r-- 1 flux flux 39510016 Apr 24 18:09 job-archive.sqlite
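
For readers following along, the shutdown half of the session above can be sketched roughly as follows. This is not the rc3 code from this PR: `make_dump` is a hypothetical stand-in for running flux-dump(1), and a temporary directory stands in for ${statedir}. It only illustrates the date-based file name and the relative RESTORE symlink seen in the listing.

```shell
#!/bin/bash
# Sketch of the "auto" dump naming and RESTORE symlink handling.
# Assumption: make_dump is a hypothetical stand-in for flux-dump(1).
set -e

statedir=$(mktemp -d)          # stands in for ${statedir}

make_dump() {
    # A real rc3 would dump KVS content here; we write a placeholder.
    echo "placeholder archive" >"$1"
}

# File name containing the date, like dump-20220425_174130.tgz above
dumpname="dump-$(date +%Y%m%d_%H%M%S).tgz"
make_dump "${statedir}/${dumpname}"

# Point RESTORE at the new dump using a relative link target,
# matching the 'RESTORE -> dump-....tgz' listing above.
ln -sf "${dumpname}" "${statedir}/RESTORE"

link_target=$(readlink "${statedir}/RESTORE")
echo "RESTORE -> ${link_target}"
```

Using a relative link target keeps the symlink valid if the state directory is moved or mounted elsewhere, which is presumably why the listing shows a bare file name.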

It's also possible to checkpoint/restart an instance with:

$ flux shutdown --dump=foo.tar.bz2
$ flux start -o,-Scontent.restore=foo.tar.bz2

although at the moment this is only practical if R hasn't changed.

Marking as WIP as I wanted to get feedback on the approach before writing tests.

I had thought maybe garbage collection could be automated somehow by tracking some metric that could be used as an indicator of the need. However, maybe getting a bit of experience with doing it manually first makes sense.

garlick · 2022-04-26T23:10:40Z

I went ahead and added test coverage here and have been testing this on my home test system, so removing the WIP. I'm still open to doing this another way if people have better ideas!

grondo · 2022-04-27T01:13:25Z

Sorry, I haven't had time to take a peek at this. I can't think of any better interface, given that garbage collection must occur as part of a dump/restore. I can't remember, does GC happen just as a natural result of the restore?

As a more general point (and I guess unrelated to this PR), I'm a little worried there will be confusion about when to use flux shutdown vs systemd for stopping a Flux system instance. I'm not sure there are many systemd services that are stopped via a different command.

garlick · 2022-04-27T02:08:47Z

Yes, a dump/restore walks the KVS metadata starting with the last root hash, so when it is restored, all the unreferenced data is left behind. In addition, the archive only contains "files", unlike a file system archive created with tar (for example). So empty job directories are also removed.
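
To make the "unreferenced data is left behind" point concrete, here is a toy model, not the real KVS or content-store format. It assumes objects are plain files whose `ref NAME` lines point at other objects, and that `root2` is the latest root; `copy_reachable` is a hypothetical helper that plays the role of dump followed by restore.

```shell
#!/bin/bash
# Toy model of garbage collection by dump/restore: copy only the objects
# reachable from the current root into a fresh store.
# Assumption: objects are plain files whose "ref NAME" lines point at
# other objects; this is not the real KVS/content-store format.
set -e

old=$(mktemp -d)   # store before GC
new=$(mktemp -d)   # store after "restore"

printf 'ref dirA\n'   >"$old/root2"   # current root
printf 'ref val1\n'   >"$old/dirA"
printf 'data hello\n' >"$old/val1"
printf 'ref stale\n'  >"$old/root1"   # superseded root
printf 'data old\n'   >"$old/stale"   # only reachable from root1

copy_reachable() {
    local name=$1 kind target
    # Skip objects that were already copied
    if test -e "$new/$name"; then
        return 0
    fi
    cp "$old/$name" "$new/$name"
    # Follow every reference held by this object
    while read -r kind target; do
        if test "$kind" = "ref"; then
            copy_reachable "$target"
        fi
    done <"$old/$name"
}

copy_reachable root2

kept=$(ls "$new" | sort | tr '\n' ' ')
echo "kept: ${kept}"
```

In this model `root1` and `stale` never make it into the new store because nothing reachable from `root2` refers to them, which is the effect the truncate-then-restore sequence relies on.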

One can still run systemctl stop flux on the rank 0 node to shut down the instance. And one could trigger a dump/restore as part of that by manually setting the content.dump attribute to auto. However, I didn't want to encourage that because then the dump would be subject to the systemd TimeoutStopSec and would risk getting killed before the dump is complete. Hence adding the option to shutdown so it's tied to that way of bringing down the instance. I anticipate that we will add other things to flux-shutdown(1), like options to stop the queue and let running jobs complete, or to shut down at a future time. So maybe it will become a natural way to stop flux.

One other weakness I see here is the dump files aren't removed and will pile up after a while. I was vaguely thinking that this could be valuable if we wanted to revert to a previous checkpoint if the db was corrupted or whatever, and that sys admins could manage the dump files with log rotation tools. Is that reasonable?

I guess the other question - is --gc an annoyingly terse option? It's just a shorthand for --dump=auto. Should we just go with that? The purpose of --gc was to provide an option that matched the desired end effect (garbage collecting the KVS). I'm open to better names.

grondo · 2022-04-27T02:22:55Z

Ah, thanks for that refresher, that was helpful. Given the above, this approach seems just fine IMO. I like the idea of extending shutdown semantics in the future. Also, if dumpfiles tend to accumulate, maybe logrotate or systemd-tmpfiles could be configured to automatically clean things up? (Edit: I see now you already mentioned this approach in your previous post. It does seem reasonable to me!)

grondo · 2022-04-27T02:23:30Z

And FWIW, I don't have a problem with --gc. It is close enough to git gc where I understand what it is meant to do.

garlick · 2022-05-01T23:14:18Z

Pushed a tmpfiles.d config file and moved "auto" dumps to $statedir/dump since that made the tmpfiles rule easier to write.

Tested on my test system instance by running flux shutdown --gc a few times and reducing the age setting for dump files, then observing that they were purged with systemd-tmpfiles --clean.
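
The tmpfiles.d rule itself is not shown in this thread. As a rough illustration of the same age-based purge, the sketch below uses find(1) instead of systemd-tmpfiles. It assumes GNU touch and find, uses a temporary directory in place of $statedir/dump, and only looks at modification time, which is a simplification of how systemd-tmpfiles evaluates file age.

```shell
#!/bin/bash
# Sketch of age-based dump cleanup using find(1) instead of systemd-tmpfiles.
# Assumptions: GNU touch/find; dumpdir stands in for $statedir/dump;
# the file names are made up for illustration.
set -e

dumpdir=$(mktemp -d)

# One dump older than the cutoff, one fresh dump
touch -d '40 days ago' "${dumpdir}/dump-old.tgz"
touch "${dumpdir}/dump-new.tgz"

# Remove regular files last modified more than 30 days ago
find "${dumpdir}" -type f -name 'dump-*.tgz' -mtime +30 -delete

remaining=$(ls "${dumpdir}")
echo "remaining: ${remaining}"
```

Lowering the `-mtime` threshold here corresponds to the "reducing the age setting" step used for testing above.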

codecov · 2022-05-02T15:11:19Z

Codecov Report

Merging #4303 (4384fd9) into master (d53b662) will increase coverage by 0.01%.
The diff coverage is 85.71%.

❗ Current head 4384fd9 differs from pull request most recent head 2011670. Consider uploading reports for the commit 2011670 to get more accurate results

@@            Coverage Diff             @@
##           master    #4303      +/-   ##
==========================================
+ Coverage   83.62%   83.64%   +0.01%     
==========================================
  Files         389      389              
  Lines       65388    65421      +33     
==========================================
+ Hits        54680    54720      +40     
+ Misses      10708    10701       -7

Impacted Files	Coverage Δ
src/cmd/builtin/shutdown.c	`87.27% <80.00%> (-0.73%)`	⬇️
src/modules/content-files/content-files.c	`78.91% <81.48%> (+2.69%)`	⬆️
src/modules/content-sqlite/content-sqlite.c	`63.00% <100.00%> (+0.44%)`	⬆️
src/modules/job-archive/job-archive.c	`62.13% <0.00%> (-0.74%)`	⬇️
src/shell/pmi/pmi.c	`82.29% <0.00%> (-0.66%)`	⬇️
src/common/libpmi/simple_server.c	`86.63% <0.00%> (-0.50%)`	⬇️
src/cmd/flux-module.c	`83.96% <0.00%> (-0.30%)`	⬇️
src/cmd/flux-job.c	`87.27% <0.00%> (-0.14%)`	⬇️
src/broker/overlay.c	`86.69% <0.00%> (-0.11%)`	⬇️
src/common/libsdprocess/sdprocess.c	`69.25% <0.00%> (+0.12%)`	⬆️
... and 9 more

garlick · 2022-05-02T15:11:53Z

Repushed with reference to #258

chu11

overall lgtm, just a few comments / nits I found

chu11 · 2022-05-02T21:11:34Z

src/modules/content-sqlite/content-sqlite.c

@@ -807,7 +807,7 @@ static int process_args (struct content_sqlite *ctx,
            *truncate = true;
        }
        else {
-            flux_log_error (ctx->h, "Unknown module option: '%s'", argv[i]);
+            flux_log (ctx->h, LOG_ERR, "Unknown module option: '%s'", argv[i]);


perhaps we should make a similar change in content-files, for stylistic consistency? (content-files sets errno = EINVAL, which makes it not as bad)

oh and I guess w/ content-s3 too (given follow up commit to this one)

chu11 · 2022-05-02T21:18:09Z

t/t0018-content-files.t

+test_expect_success 'content-files module load fails with unknown option' '
+	test_must_fail flux module load content-files notoption
+'
+


nit, should this test be a different commit? not really related to content-files: add truncate module option

chu11 · 2022-05-02T21:40:28Z

etc/rc3

+            exit_rc=1
+        fi
+    fi
+    flux module remove ${backingmod} || exit_rc=1


should use modrm?

modrm tests $RANK before running flux module remove, but this is within a block that is already conditional on $RANK. I thought it kind of weird to trade a straightforward one-liner for a function call to do same when it was necessary to repeat the rank constraint. Does that make sense?

ahh that makes sense, you have the rank == 0 check above this.

chu11 · 2022-05-02T21:41:12Z

etc/rc1

+        fi
+    fi
+    if test -n "${dumpfile}"; then
+        flux module load ${backingmod} truncate


use modload for consistency?

same deal here

chu11 · 2022-05-02T21:41:15Z

etc/rc1

+            rm -f ${dumplink}
+        fi
+    else
+        flux module load ${backingmod}


use modload for consistency?

chu11 · 2022-05-02T21:42:29Z

etc/rc1

+        if test -n "${dumplink}"; then
+            rm -f ${dumplink}
+        fi


should only remove the link if the restore is successful?

Yes, but the rc1 shebang is #!/bin/bash -e, so the script aborts before the remove if the restore is unsuccessful.
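
A small demonstration of that behavior, under the assumption that `false` stands in for a failing restore step and that the inline script is only a miniature imitation of rc1:

```shell
#!/bin/bash
# Demonstrates why 'rm -f ${dumplink}' is skipped when the restore fails
# under 'bash -e'. Assumption: 'false' stands in for a failing restore.

workdir=$(mktemp -d)
ln -s dump.tgz "${workdir}/RESTORE"

# Run a miniature rc1-like script with -e, as in '#!/bin/bash -e'
status=0
bash -e -c '
    dumplink=$1
    false                # hypothetical failing restore step
    rm -f "${dumplink}"  # never reached
' rc1-sketch "${workdir}/RESTORE" || status=$?

if test -L "${workdir}/RESTORE"; then
    link_state=kept
else
    link_state=removed
fi
echo "exit status ${status}, RESTORE ${link_state}"
```

Because the script exits at the first failing command, the symlink survives and a later startup can retry the restore.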

Problem: rc scripts use the content backing store 'truncate' option to manage offline garbage collection, but content-sqlite does not support this option. Add a truncate module option that unlinks the database file before the database is opened, thereby emptying it. Add test.

Problem: if an unknown module option is supplied flux_log_error() is called without errno set. Log the error with flux_log (LOG_ERR) instead.

Problem: there is no option to query the number of objects held by content-files in test. Override the stats.get built-in RPC handler with one that provides the object count. So like content-sqlite, flux module stats content-files returns the count.

Problem: rc scripts use the content backing store 'truncate' option to manage offline garbage collection, but content-files does not support this option. Add a truncate module option that recursively removes the db dir before the database is opened. Add test.

Problem: rc scripts use the content backing store 'truncate' option to manage offline garbage collection, but content-s3 does not support this option. Add a truncate module option. Since emptying an s3 bucket is not directly supported by libs3, this is a rather involved process. For now, if this option is supplied, log an error instructing the user to purge the bucket using s3 console or another mechanism and return failure. Add test.

Problem: we need a way to tell rc scripts to restore content on startup, and dump content on shutdown, for offline KVS garbage collection of a system instance or user checkpoint/restart.

Add some logic to rc1 and rc3:

rc1: If the content.restore broker attribute is set to a file path, then load the content backing store module with the 'truncate' option, and restore content from the file before loading the KVS.

rc3: If the content.dump broker attribute is set to a file path, then dump content to the file after unloading the KVS.

Additionally, if content.restore=auto, then rc1 looks for a symlink named RESTORE in the broker's current working directory or ${statedir} if defined. If the symlink exists, then restore content from the file it points to and remove the symlink on success.

If content.dump=auto, then rc3 dumps content to an automatically generated file name containing the date in the current working directory or ${statedir} if defined, and creates the RESTORE symlink pointing to it.
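
The startup-side decision described in this commit message can be sketched as below. `resolve_restore` is a hypothetical helper, not code from rc1; it takes the content.restore value and an optional statedir, and it assumes the RESTORE symlink holds a relative target.

```shell
#!/bin/bash
# Sketch of resolving which dump file to restore from on startup.
# Assumption: resolve_restore is a hypothetical helper, not code from rc1;
# it takes the content.restore value and an optional statedir.
set -e

resolve_restore() {
    local restore=$1 statedir=$2 dir
    if test -z "${restore}"; then
        return 0                       # attribute unset: nothing to restore
    fi
    if test "${restore}" != "auto"; then
        echo "${restore}"              # explicit file path
        return 0
    fi
    dir=${statedir:-.}                 # statedir if defined, else cwd
    if test -L "${dir}/RESTORE"; then
        # Assumes a relative link target, resolved against its directory
        echo "${dir}/$(readlink "${dir}/RESTORE")"
    fi
}

statedir=$(mktemp -d)
touch "${statedir}/dump-example.tgz"
ln -s dump-example.tgz "${statedir}/RESTORE"

auto_result=$(resolve_restore auto "${statedir}")
path_result=$(resolve_restore foo.tar.bz2 "${statedir}")
echo "auto: ${auto_result}"
echo "path: ${path_result}"
```

An empty result means there is nothing to restore, in which case the backing store would be loaded without the 'truncate' option.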

Problem: content.restore is not set for the system instance, so automatic restore from a dump for garbage collection purposes cannot be automated. Set content.restore=auto, so if the ${statedir}/RESTORE symlink exists, content will be truncated and then restored from a previously created archive.

Problem: a system instance that runs flux-dump(1) from rc3 might get killed by systemd TimeoutStopSec. Have flux-shutdown(1) arrange for the dump. If the instance is being shut down by this method, then systemctl stop is not being run, so TimeoutStopSec does not apply. Fixes flux-framework#258

Problem: system tests do not set statedir like systemd unit file. Set statedir to a subdirectory under $workdir.

Problem: there is no test coverage for offline KVS garbage collection. Add a sharness script that exercises this functionality. Augment the shutdown-cmd sharness script to cover new shutdown options.

Problem: dump files created for garbage collection may accumulate in $statedir of a system instance. Install a tmpfiles.d config file that removes dumps older than 30 days.

Problem: content-files logs "<option>: Invalid argument" on an invalid module option, rather than mentioning "module option" in the error, which would be more helpful. Fix log message.

Problem: content-s3 logs "<option>" on an invalid module option, which is a bit vague. Change error message to be more descriptive.

Problem: the sharness test for content-files does not cover a bad module option. Add test.

garlick · 2022-05-02T22:44:00Z

Just pushed fixes based on @chu11's comments, and also rebased on current master.

chu11

LGTM!

garlick · 2022-05-02T23:09:46Z

Thanks! I'll set MWP.

garlick force-pushed the content_truncate branch 2 times, most recently from 0e97b7c to bebe13d Compare April 26, 2022 21:31

garlick changed the title ~~WIP: flux-shutdown: add --gc garbage collection option~~ flux-shutdown: add --gc garbage collection option Apr 26, 2022

garlick force-pushed the content_truncate branch 4 times, most recently from 0d2f2d7 to 4384fd9 Compare May 1, 2022 22:44

garlick force-pushed the content_truncate branch from 4384fd9 to 2011670 Compare May 2, 2022 15:10

garlick mentioned this pull request May 2, 2022

automate KVS garbage collection #4311

Closed

chu11 reviewed May 2, 2022


garlick added 12 commits May 2, 2022 15:43

content-sqlite: fix uninitialized errno

27153d0

Problem: if an unknown module option is supplied flux_log_error() is called without errno set. Log the error with flux_log (LOG_ERR) instead.

testsuite: set statedir attr for system tests

9c4b56c

Problem: system tests do not set statedir like systemd unit file. Set statedir to a subdirectory under $workdir.

testsuite: cover offline KVS garbage collection

e9cfaa7

Problem: there is no test coverage for offline KVS garbage collection. Add a sharness script that exercises this functionality. Augment the shutdown-cmd sharness script to cover new shutdown options.

tmpfiles: add tmpfiles.d config to purge old dumps

9f6da91

Problem: dump files created for garbage collection may accumulate in $statedir of a system instance. Install a tmpfiles.d config file that removes dumps older than 30 days.

content-files: improve log message on bad option

fb2ad9c

Problem: content-files logs "<option>: Invalid argument" on an invalid module option, rather than mentioning "module option" in the error, which would be more helpful. Fix log message.

garlick added 2 commits May 2, 2022 15:43

content-s3: improve log message on bad option

ddec542

Problem: content-s3 logs "<option>" on an invalid module option, which is a bit vague. Change error message to be more descriptive.

testsuite: cover bad content-files module option

2ae2799

Problem: the sharness test for content-files does not cover a bad module option. Add test.

garlick force-pushed the content_truncate branch from 2011670 to 2ae2799 Compare May 2, 2022 22:43

chu11 approved these changes May 2, 2022


garlick added the merge-when-passing label May 2, 2022

mergify bot merged commit d9c64e7 into flux-framework:master May 2, 2022

garlick deleted the content_truncate branch May 3, 2022 13:18

garlick mentioned this pull request May 3, 2022

content-s3: cosmetic cleanup #4314

Merged
