kvs: support gc-threshold config #4528

Merged

Conversation

chu11
Member

@chu11 chu11 commented Aug 30, 2022

Problem: KVS garbage collection is only done when an administrator runs flux-shutdown and chooses to garbage collect via the --dump or --gc options.

Solution: Support a kvs gc-threshold configuration option. This configuration will take an integer count of KVS changes (the KVS version number or sequence number). Once the threshold has been crossed, flux-shutdown will ask the user if they wish to garbage collect. This offers an easy way for administrators to be reminded of garbage collection on a regular basis.

Fixes #4311

@chu11 chu11 force-pushed the issue4311_automate_kvs_garbage_collection branch from 367097d to 39e3e64 Compare August 30, 2022 20:20
Comment on lines 24 to 30
static void get_checkpoint_sequence (flux_t *h, int *seq) {
    flux_future_t *f;
    (*seq) = 0;
    if (!(f = kvs_checkpoint_lookup (h, NULL, 0))
        || (kvs_checkpoint_lookup_get_sequence (f, seq) < 0
            && errno != ENOENT))
        log_msg_exit ("Error fetching checkpoint sequence: %s",
                      future_strerror (f, errno));
}
Member

Quick comment as I head out to a doctor's apt:

  • open brace for the function should be on a line by itself per project style
  • I think we'll want to check the current live kvs root sequence number, not the stored checkpoint since the latter will be out of date

Comment on lines 26 to 31
(optional) Sets the KVS garbage collection sequence number
threshold. Once this threshold is crossed in the KVS checkpoint,
it will inform :man1:`flux-shutdown` to ask the user to perform
offline KVS garbage collection. It is recommended the
``checkpoint-period`` configuration also be configured if this is
set.
Member

Suggested change:

(optional) Sets the number of KVS commits (distinct root snapshots) after which offline garbage collection is performed by flux-shutdown(1). A value of 100000 may be a good starting point. (Default: garbage collection must be manually requested with flux-shutdown --gc).
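The threshold check this option implies could be sketched as follows. This is a minimal illustration, not the code from this PR: `gc_threshold_exceeded` is a hypothetical helper, and both the "0 or less means unset" convention and the strict `>` comparison for "crossed" are assumptions. The caller is assumed to pass the live KVS root sequence number rather than the stored checkpoint value.

```c
#include <stdbool.h>

/* Hypothetical helper (not from the PR): return true if the live KVS
 * root sequence number has passed the configured gc-threshold.
 * Assumption: a threshold of 0 or less means the option is unset,
 * so garbage collection is never suggested automatically. */
bool gc_threshold_exceeded (int gc_threshold, int root_seq)
{
    if (gc_threshold <= 0)
        return false;
    return root_seq > gc_threshold;
}
```

With the suggested starting value of 100000, a root sequence of 100001 would trigger the question at shutdown under these assumptions.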

@chu11 chu11 force-pushed the issue4311_automate_kvs_garbage_collection branch 2 times, most recently from 8abbe8e to e371c08 Compare August 30, 2022 23:16
@chu11
Member Author

chu11 commented Aug 30, 2022

re-pushed with fixes per comment above.

Member

@garlick garlick left a comment


This is great!
Just one comment about the prompt machinery.

Comment on lines 54 to 57
printf ("gc threshold exceeded, do you want to perform garbage collection (Y/n)? ");
scanf ("%ms", &s);
if (!s || strncasecmp (s, "y", 1) == 0)
    rc = true;
Member

If I just type a carriage return here, I get no feedback but scanf is still waiting for input. Maybe something like the following would work better, and would also handle stdin not being a tty (e.g. when scripted):

#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Ask a yes/no question.  The default is returned if stdin is not a
 * tty, on EOF or read error, or if the user just presses return. */
bool askyn (char *prompt, bool default_value)
{
    char buf[16];

    for (;;) {
        printf ("%s [%s]? ", prompt, default_value ? "Y/n" : "y/N");
        fflush (stdout);
        if (!isatty (STDIN_FILENO)
            || fgets (buf, sizeof (buf), stdin) == NULL
            || buf[0] == '\n')
            break;
        if (buf[0] == 'y' || buf[0] == 'Y')
            return true;
        if (buf[0] == 'n' || buf[0] == 'N')
            return false;
        printf ("Please answer y or n\n");
    }
    return default_value;
}

Member Author

looks good, and it seems that several builders didn't like the newer %ms that (I thought) was supported in all newer scanfs

Member

on second thought, it might be a little weird to take the default on stdin EOF or !isatty(). Perhaps this function should be modified so that it can return an error in those cases, and the caller could do something like abort and suggest a -y option?

Also, this seems like it could be a useful libutil function for reuse...
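One way the error-returning variant suggested here might look is sketched below. It is an illustration under stated assumptions, not the function that was merged: the name `askyn_checked`, the `FILE *in` parameter, and the `interactive` flag are all hypothetical, introduced so the helper can be exercised without a real terminal. A caller would presumably pass `isatty (STDIN_FILENO)` for `interactive`.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical sketch: ask a yes/no question on 'in'.
 * Returns 0 and stores the answer in *result on success.
 * Returns -1 if the input is not interactive, or on EOF or read error,
 * so the caller can abort instead of silently taking the default.
 * An empty line selects default_value. */
int askyn_checked (FILE *in, bool interactive, const char *prompt,
                   bool default_value, bool *result)
{
    char buf[16];

    if (!interactive)
        return -1;
    for (;;) {
        printf ("%s [%s]? ", prompt, default_value ? "Y/n" : "y/N");
        fflush (stdout);
        if (fgets (buf, sizeof (buf), in) == NULL)
            return -1;
        if (buf[0] == '\n') {
            *result = default_value;
            return 0;
        }
        if (buf[0] == 'y' || buf[0] == 'Y') {
            *result = true;
            return 0;
        }
        if (buf[0] == 'n' || buf[0] == 'N') {
            *result = false;
            return 0;
        }
        printf ("Please answer y or n\n");
    }
}
```

On a -1 return, the caller could print a message suggesting a -y option and exit with an error, which keeps scripted use explicit rather than dependent on a default.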

Member Author

@chu11 chu11 Aug 31, 2022


I was actually thinking about that when I wrote the initial version above. Wouldn't it be normal for people to put echo "y" | flux shutdown in a script? That's why I didn't think checking for a tty was a good idea, unless we want to enforce the idea that the prompt is only for people running this by hand, and --dump or --gc is the way to go in a script.

As an aside, if we check for !isatty(), I'm not 100% sure how to adjust my tests. Not sure if we use expect or similar in other unit tests (we must do something like that in the flux-top tests, will check later).

Contributor

You can use the runpty script (as used by flux top and other pty driven tests)

@chu11 chu11 force-pushed the issue4311_automate_kvs_garbage_collection branch 2 times, most recently from 8107b7b to e9e6b35 Compare August 31, 2022 20:52
@chu11
Member Author

chu11 commented Aug 31, 2022

re-pushed, I went with this approach:

  • if tty, flux shutdown prints the question and the user needs to answer yes or no
  • if no tty, it returns an error unless the user specifies -y or -n
  • added more tests to cover (I hope) all scenarios
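The decision logic in the bullets above can be summarized in a small sketch. The enum, the function name `gc_decide`, and its parameters are hypothetical and not taken from the PR. Two behaviors are assumptions: that -y or -n bypass the prompt even on a tty, and that giving both options together is treated as an error.

```c
#include <stdbool.h>

/* Hypothetical outcomes of the shutdown-time garbage collection check. */
enum gc_decision {
    GC_SKIP,    /* do not garbage collect */
    GC_RUN,     /* garbage collect without asking */
    GC_PROMPT,  /* ask the user on the terminal */
    GC_ERROR,   /* cannot decide: fail and suggest -y or -n */
};

/* Sketch of the policy: explicit -y/-n win, otherwise prompt on a tty,
 * otherwise report an error.  Nothing happens unless the threshold
 * has been exceeded. */
enum gc_decision gc_decide (bool threshold_exceeded,
                            bool opt_yes,
                            bool opt_no,
                            bool have_tty)
{
    if (!threshold_exceeded)
        return GC_SKIP;
    if (opt_yes && opt_no)
        return GC_ERROR;    /* assumption: conflicting options are rejected */
    if (opt_yes)
        return GC_RUN;
    if (opt_no)
        return GC_SKIP;
    return have_tty ? GC_PROMPT : GC_ERROR;
}
```

Keeping the policy in one pure function like this would make the scripted and interactive scenarios straightforward to cover in unit tests, separately from the pty-driven tests of the prompt itself.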

Member

@garlick garlick left a comment


Just a couple of minor nits o/w LGTM!
(I did test it out on my test system).

I think this is an important feature so thanks for pushing it through!

(void)json_unpack (o, "{s:{s:i}}", "kvs", "gc-threshold", gc_threshold);
}

int askyn (char *prompt, bool default_value, bool *result)
Member

Suggestion: make askyn() static like the other local functions.

Comment on lines 196 to 201
{ .name = "yes", .key = 'y', .has_arg = 0,
  .usage = "If garbage collection threshold exceeded, "
           "perform garbage collection",
},
{ .name = "no", .key = 'n', .has_arg = 0,
  .usage = "If garbage collection threshold exceeded, "
           "do not perform garbage collection",
},
Member

How about for the usage just "answer y to all y/n questions"?

Member Author

and will tweak manpage as well

chu11 added 2 commits August 31, 2022 16:46
Problem: KVS garbage collection is only done when an
administrator runs flux-shutdown and chooses to
garbage collect via the --dump or --gc options.

Solution: Support a kvs gc-threshold configuration option.
This configuration will take an integer count of KVS changes
(the KVS version number or sequence number).  Once the threshold
has been crossed, flux-shutdown will ask the user if they wish to
garbage collect.  Additional options are added to specify yes/no
if the user is scripting with flux-shutdown.

This offers an easy way for administrators to be reminded of garbage
collection on a regular basis.

Fixes flux-framework#4311
@chu11 chu11 force-pushed the issue4311_automate_kvs_garbage_collection branch from e9e6b35 to 9ddda87 Compare August 31, 2022 23:52
@chu11
Member Author

chu11 commented Aug 31, 2022

re-pushed with fixes per comments above

@mergify mergify bot merged commit f2cd2b4 into flux-framework:master Sep 1, 2022
@chu11 chu11 deleted the issue4311_automate_kvs_garbage_collection branch September 1, 2022 05:26
Development

Successfully merging this pull request may close these issues.

automate KVS garbage collection
3 participants