automate KVS garbage collection #4311

Closed
Tracked by #4428
garlick opened this issue May 2, 2022 · 1 comment · Fixed by #4528

garlick commented May 2, 2022

Problem: assuming #4303 is merged, we have a mechanism for offline KVS garbage collection that has to be manually invoked by running flux shutdown --gc. We need a way to trigger this automatically based on an estimate of how productive it would be.

For example, maybe if flux shutdown is invoked without --gc, and the KVS root sequence number is past some threshold, OR the number of jobs purged is beyond some threshold, garbage collection could be automatically selected. Possibly with a prompt e.g.

$ sudo flux shutdown
OK to perform dump/restore for offline KVS garbage collection (Y/N)?
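
A minimal sketch of the decision described above, assuming two independent signals (the KVS root sequence number and a count of purged jobs), each with an optional threshold. The function name, parameter names, and the convention that `None` disables a threshold are all hypothetical and not taken from flux-core:

```python
from typing import Optional


def gc_recommended(root_seq: int,
                   jobs_purged: int,
                   seq_threshold: Optional[int] = None,
                   purge_threshold: Optional[int] = None) -> bool:
    """Return True if either enabled threshold has been passed.

    A threshold of None means that signal is ignored (hypothetical
    convention for this sketch).
    """
    if seq_threshold is not None and root_seq > seq_threshold:
        return True
    if purge_threshold is not None and jobs_purged > purge_threshold:
        return True
    return False
```

With both thresholds left at `None` the function never recommends garbage collection, so the behavior of a plain `flux shutdown` would be unchanged unless a site opts in.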

garlick commented Jul 29, 2022

Couple of notes:

  • Once "kvs: add root sequence number to checkpoint object" (#4446) is resolved, the root sequence number, counting from 0 up to some threshold, may be used as an indicator of how "busy" the KVS has been since the last dump.
  • The root sequence number resets to zero across GC because the checkpoint is not part of the dump.
  • The flux restore --checkpoint called from rc1 ensures that a new checkpoint (seq=0) is written during startup after the dump is restored. The kvs module on rank 0 initializes from this checkpoint when it is loaded.
  • The sequence threshold should be configurable via the TOML [kvs] table and be disabled by default.
  • The actual GC should be triggered by flux shutdown, to be carried out in rc3, as it is now when the --gc option is specified.
  • If the threshold is exceeded, add a prompt to flux shutdown like: "flux-shutdown: gc threshold exceeded, do you want to perform garbage collection (Y/n)?"
  • It is important that GC be initiated by the shutdown command, so that the dump is not inadvertently performed under the systemd stop timeout.
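
The notes above could be sketched roughly as follows. This is an illustration only: it assumes the configuration has already been parsed from TOML into a nested dict, that the key in the [kvs] table is called `gc-threshold`, and that an empty reply to the (Y/n) prompt means yes. The helper names are hypothetical and do not come from flux-core:

```python
from typing import Optional


def gc_threshold(config: dict) -> Optional[int]:
    """Return the configured threshold, or None when it is not set.

    Not set means disabled, matching the "disabled by default" note.
    The key name "gc-threshold" is an assumption of this sketch.
    """
    value = config.get("kvs", {}).get("gc-threshold")
    if value is None:
        return None
    if isinstance(value, bool) or not isinstance(value, int) or value < 0:
        raise ValueError("kvs.gc-threshold must be a non-negative integer")
    return value


def should_prompt(root_seq: int, threshold: Optional[int]) -> bool:
    """Prompt only when a threshold is configured and has been exceeded."""
    return threshold is not None and root_seq > threshold


def parse_answer(answer: str, default: bool = True) -> bool:
    """Interpret a reply to a (Y/n) prompt; empty input picks the default."""
    text = answer.strip().lower()
    if not text:
        return default
    if text in ("y", "yes"):
        return True
    if text in ("n", "no"):
        return False
    raise ValueError(f"unrecognized answer: {answer!r}")
```

Because the sequence number restarts at zero after a dump and restore, comparing it directly against the threshold measures activity since the last GC without any extra bookkeeping.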

chu11 self-assigned this Aug 26, 2022
chu11 added a commit to chu11/flux-core that referenced this issue Aug 30, 2022
Problem: KVS garbage collection is only done when an
administrator runs flux-shutdown and chooses to
garbage collect via the --dump or --gc options.

Solution: Support a kvs gc-threshold configuration option.
This configuration will take an integer count of KVS changes
(the KVS version number or sequence number).  Once the threshold
has been crossed, flux-shutdown will ask the user if they wish to
garbage collect.  This offers an easy way for administrators
to be reminded of garbage collection on a regular basis.

Fixes flux-framework#4311
chu11 added a commit to chu11/flux-core that referenced this issue Aug 31, 2022
Problem: KVS garbage collection is only done when an
administrator runs flux-shutdown and chooses to
garbage collect via the --dump or --gc options.

Solution: Support a kvs gc-threshold configuration option.
This configuration will take an integer count of KVS changes
(the KVS version number or sequence number).  Once the threshold
has been crossed, flux-shutdown will ask the user if they wish to
garbage collect.  Additional options are added to specify yes/no
if the user is scripting with flux-shutdown.

This offers an easy way for administrators to be reminded of garbage
collection on a regular basis.

Fixes flux-framework#4311
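
For scripted use, the yes/no options mentioned in this commit message might interact with the prompt along these lines. This is a hedged sketch, not the flux-shutdown implementation: the option names `--yes` and `--no`, the `build_parser` and `decide_gc` helpers, and the `ask` callback are all assumptions made for illustration:

```python
import argparse
from typing import Callable


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser with an explicit --gc flag and scripted answers."""
    parser = argparse.ArgumentParser(prog="shutdown-sketch")
    parser.add_argument("--gc", action="store_true",
                        help="always garbage collect")
    answer = parser.add_mutually_exclusive_group()
    answer.add_argument("--yes", action="store_true",
                        help="answer yes to any prompt (hypothetical name)")
    answer.add_argument("--no", action="store_true",
                        help="answer no to any prompt (hypothetical name)")
    return parser


def decide_gc(args: argparse.Namespace,
              threshold_crossed: bool,
              ask: Callable[[], bool]) -> bool:
    """Decide whether to garbage collect without prompting when scripted."""
    if args.gc:
        return True        # explicit request, no question needed
    if not threshold_crossed:
        return False       # nothing to ask about
    if args.yes:
        return True
    if args.no:
        return False
    return ask()           # interactive case: fall back to the prompt
```

Making the two answer options mutually exclusive lets the argument parser reject contradictory input before any shutdown work begins.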
mergify bot closed this as completed in #4528 Sep 1, 2022