Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvs: add date to kvs-primary checkpoint #4136

Merged
merged 1 commit into from
Feb 18, 2022

Conversation

chu11
Copy link
Member

@chu11 chu11 commented Feb 15, 2022

Thought I'd tackle this old issue. I did a relatively simple thing for the check pointing, writing out a serialized json object storing the rootref + timestamp instead of just the rootref.

Multiple other possibilities, but thought this was the simplest / reasonable (i.e. no need to change content store implementation). Other options include writing two keys out instead of one.

Note that I intentionally did not support backwards compatibility. Figured use of this was rare at the moment.

Problem: It'd be convenient if we knew the date when the kvs
primary checkpoint was checkpointed.

Solution: When checkpointing the primary KVS, store a json
object with both the rootref and timestamp, instead of just
the rootref string.  On retrieval, parse appropriately and
retrieve timestamp for output in logs.

Fixes #3580

@chu11
Copy link
Member Author

chu11 commented Feb 16, 2022

hmmmm few builders failing with

  not ok 36 - unloaded kvs module
  FAIL: t0017-security.t 36 - unloaded kvs module
  #	
  #		flux module remove kvs
  #	
  not ok 37 - flux content flush not allowed for guest user
  FAIL: t0017-security.t 37 - flux content flush not allowed for guest user
  #	
  #		! FLUX_HANDLE_ROLEMASK=0x2 flux content flush 2>content.flush.err &&
  #		grep -q "Operation not permitted" content.flush.err
  #	
  not ok 38 - flux content load/store allowed for guest user
  FAIL: t0017-security.t 38 - flux content load/store allowed for guest user
  #	
  #		echo Hello >content.store.value &&
  #		FLUX_HANDLE_ROLEMASK=0x2 \
  #		    flux content store <content.store.value >content.store.ref &&
  #		FLUX_HANDLE_ROLEMASK=0x2 \
  #		    flux content load $(cat content.store.ref) >content.load.value &&
  #		test_cmp content.store.value content.load.value
  #	
  # failed 3 among 38 test(s)

not entirely sure how what i did could have mucked up these tests, and only on these builders. hmmm

@garlick
Copy link
Member

garlick commented Feb 16, 2022

I like the idea of json here since we can add other metadata as it comes up without necessarily breaking backwards compatibility.

However, breaking backwards compatibility today will require fluke and elmerfudd to blow away their content.sqlite file and start over. It might be good to make an effort to avoid that, for example if unpacking the JSON fails, fall back to the old method? It may also be a good idea to include a version number in the json object to make parsing multiple versions easier going forward.

@garlick
Copy link
Member

garlick commented Feb 16, 2022

Although.... if were were to have an eventlog in the KVS that's posted to on shutdown, right before the checkpoint is written, we could get the same effect and already have tooling for it. I was already thinking about something like that in conjunction with #4128. Can we pause and ponder?

@chu11 chu11 changed the title kvs: add date to kvs-primary checkpoint [WIP] kvs: add date to kvs-primary checkpoint Feb 16, 2022
@chu11
Copy link
Member Author

chu11 commented Feb 16, 2022

Can we pause and ponder?

Seems worthwhile to pause and think about. This was somewhat experimental.

@garlick
Copy link
Member

garlick commented Feb 16, 2022

Thinking about this a bit more, this seems like a perfectly fine solution to #3580. The use of an eventlog for tracking intermediate checkpoints or whatever could be a totally separate thing handled at a higher level than this. (And in fact reading an eventlog within the kvs itself seems like a level violation of some kind).

I think you should carry on but do add a version, and do support restoring from the current checkpoint format if that's not too hard.

@chu11 chu11 force-pushed the issue3580_kvs_checkpoint_date branch from 662e768 to f80b1a5 Compare February 16, 2022 20:42
@chu11 chu11 changed the title [WIP] kvs: add date to kvs-primary checkpoint kvs: add date to kvs-primary checkpoint Feb 16, 2022
@chu11
Copy link
Member Author

chu11 commented Feb 16, 2022

re-pushed, addressing the comments made by @garlick above

@chu11
Copy link
Member Author

chu11 commented Feb 16, 2022

doh!

  ./t2010-kvs-snapshot-restore.t: 3: eval: /usr/bin/sqlite3: not found
  ./t2010-kvs-snapshot-restore.t: 3: eval: /usr/bin/sqlite3: not found

will try and redo test using python sqlite3

Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like code footprint could be greatly reduced here.

@@ -2707,14 +2707,50 @@ static void process_args (struct kvs_ctx *ctx, int ac, char **av)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit message for f80b1a5:

Maybe should read "...when the primary namespace was checkpointed."

and "When checkpointing the primary namespace..."

@@ -2707,14 +2707,50 @@ static void process_args (struct kvs_ctx *ctx, int ac, char **av)
}
}

static int checkpoint_get_version0 (flux_t *h, const char *value,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop unused flux_t * parameter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: don't bother with errno here since there's only ever one value.

return 0;
}

static int checkpoint_get_version1 (flux_t *h, json_t *o,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop unused flux_t * parameter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: take string value not json_t * and do the json_loads() in this function.
Similarly, don't waste effort on errno here.

Comment on lines 2770 to 2734
if (!(o = json_loads (value, 0, NULL))) {
errno = EINVAL;
goto error;
}
strcpy (buf, value);
flux_future_destroy (f);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With above changes, this can just be

if (checkpoint_get_version1() < 0 && checkpoint_get_version0() < 0 {
    // error interpreting checkpoint object
}

Comment on lines 2800 to 2807
static int get_timestamp_now (double *timestamp)
{
struct timespec ts;
if (clock_gettime (CLOCK_REALTIME, &ts) < 0)
return -1;
*timestamp = (1E-9 * ts.tv_nsec) + ts.tv_sec;
return 0;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could flux_reactor_now() be used instead of adding this function?

Comment on lines +2856 to +2848
json_decref (o);
free (value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially clobbers errno.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

json_decref() and free() potentially clobber errno.

Comment on lines 2956 to 2966
char datestr[128];
time_t sec = timestamp;
struct tm tm;
if (timestamp > 0.) {
gmtime_r (&sec, &tm);
strftime (datestr, sizeof (datestr), "%FT%T", &tm);
}
else
snprintf (datestr, sizeof (datestr), "N/A");
flux_log (h, LOG_INFO,
"restored kvs-primary from checkpoint on %s", datestr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be in its own function, if not folded into checkpoint_get().

@chu11 chu11 force-pushed the issue3580_kvs_checkpoint_date branch 2 times, most recently from d0ae5b2 to db3f6fa Compare February 17, 2022 20:42
@chu11
Copy link
Member Author

chu11 commented Feb 17, 2022

re-pushed addressing all of @garlick's comments + fixing the /usr/bin/sqlite3 CI issue and fixing up some stupid things along the way, including a mem corruption i accidentally introduced that only showed up on CI el8!!

I kept everything as fixups, lemme know if it would be preferred to just squash all the fixups.

@chu11 chu11 force-pushed the issue3580_kvs_checkpoint_date branch from db3f6fa to 3175ec6 Compare February 17, 2022 21:00
@garlick
Copy link
Member

garlick commented Feb 17, 2022

Thanks! I think you can squash the fixups, then I'll give it another pass.

@chu11 chu11 force-pushed the issue3580_kvs_checkpoint_date branch from 3175ec6 to 3bee205 Compare February 17, 2022 22:00
@chu11
Copy link
Member Author

chu11 commented Feb 17, 2022

squashed all the fixups and re-pushed

as an aside, I tried to generate the change-checkpoint.py script into the middle of the test to avoid creating yet another helper script. But there were so many quoting & escaping issues, it became a pain, thus putting it off into a separate script.

@grondo
Copy link
Contributor

grondo commented Feb 17, 2022

as an aside, I tried to generate the change-checkpoint.py script into the middle of the test to avoid creating yet another helper script. But there were so many quoting & escaping issues, it became a pain, thus putting it off into a separate script.

Sometimes it is easier to use a here-doc outside of any test_expect_success(), then you don't have to deal with the quoting issues. (e.g. see t0016-cron-faketime.t)

Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Just noticed one minor errno clobber looks like it's still there.

Couple questions:

  • where does the python sqlite3 module come from? Is there a dependency that configure.ac should be checking for? (I realize it wasn't introduced in this PR, so just wondering)
  • I have another PR going right now that calls flux_kvs_getroot() then writes a checkpoint. I wonder if it would be a good idea to add some non-exported functions to libkvs for reuse within flux-core?

I could actually take your functions and turn them into shared ones in my PR if that helps keep things moving along.

Comment on lines +2856 to +2848
json_decref (o);
free (value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

json_decref() and free() potentially clobber errno.

@chu11 chu11 force-pushed the issue3580_kvs_checkpoint_date branch from 3bee205 to 48d89cd Compare February 17, 2022 22:31
@chu11
Copy link
Member Author

chu11 commented Feb 17, 2022

just did a minor re-push, i realized flux kvs getroot | jq -r .data[0] could be replaced with flux kvs getroot -b and thus jq is no longer required for those tests. Hopefully didn't clobber anything you were reviewing.

@chu11
Copy link
Member Author

chu11 commented Feb 17, 2022

where does the python sqlite3 module come from? Is there a dependency that configure.ac should be checking for? (I realize it wasn't introduced in this PR, so just wondering)

Good question, I initially assumed we didn't have it installed, then to my surprise its actually packaged with the base python3-libs package!

I have another PR going right now that calls flux_kvs_getroot() then writes a checkpoint. I wonder if it would be a ood idea to add some non-exported functions to libkvs for reuse within flux-core?

I had pondered about this too, b/c we checkpoint the guest namespaces in job-exec. Although that checkpointing is a bit of a different beast, was unsure if it would be worthwhile to put all those checkpoints in the same format.

I could actually take your functions and turn them into shared ones in my PR if that helps keep things moving along.

If you're thinking they'd be useful, perhaps some higher level functions would be worthwhile. Are you mostly thinking about functions that would create the checkpoint object for users? and presumably parse said checkpoint object?

Problem: It'd be convenient if we knew the date when the
primary namespace was checkpointed.

Solution: When checkpointing the primary namespace, store a json
object with version, rootref, and timestamp, instead of just
the rootref string.  On retrieval, parse appropriately and
retrieve timestamp for output in logs.  Support the original
checkpointing format by checking if the checkpoint object
is a raw blobref string first.

Fixes flux-framework#3580
@chu11 chu11 force-pushed the issue3580_kvs_checkpoint_date branch from 48d89cd to e1b0450 Compare February 17, 2022 22:43
@chu11
Copy link
Member Author

chu11 commented Feb 17, 2022

re-pushed, fixing up the potential errno clobber @garlick noticed above. Just did a save_errno = errno kinda thing.

@garlick
Copy link
Member

garlick commented Feb 17, 2022

What I have at the moment is a flux kvs flush command that calls

  1. flux_kvs_getroot()
  2. RPC to content.flush on rank 0 (flush KVS to backing store)
  3. content.checkpoint-put the rootref obtained in 1.

Calling this from cron, for example, would let us recover more data in a crash. Also, if we append to an eventlog on startup and on exit, we can detect when an instance starts up that was not properly shut down. But we need to update the checkpoint after the startup append, otherwise the instance always reverts to the last valid shutdown.

@codecov
Copy link

codecov bot commented Feb 17, 2022

Codecov Report

Merging #4136 (892953a) into master (db2c75e) will decrease coverage by 3.25%.
The diff coverage is 92.85%.

❗ Current head 892953a differs from pull request most recent head e1b0450. Consider uploading reports for the commit e1b0450 to get more accurate results

@@            Coverage Diff             @@
##           master    #4136      +/-   ##
==========================================
- Coverage   83.33%   80.08%   -3.26%     
==========================================
  Files         376      376              
  Lines       63037    62636     -401     
==========================================
- Hits        52533    50162    -2371     
- Misses      10504    12474    +1970     
Impacted Files Coverage Δ
src/modules/kvs/kvs.c 64.91% <92.85%> (-4.85%) ⬇️
src/modules/job-info/allow.c 28.57% <0.00%> (-42.86%) ⬇️
src/cmd/top/ucache.c 58.62% <0.00%> (-28.05%) ⬇️
src/modules/job-manager/getattr.c 37.03% <0.00%> (-22.23%) ⬇️
src/common/librlist/rnode.c 65.10% <0.00%> (-19.71%) ⬇️
src/common/libsubprocess/remote.c 53.21% <0.00%> (-17.65%) ⬇️
src/shell/affinity.c 62.22% <0.00%> (-17.19%) ⬇️
src/shell/output.c 61.00% <0.00%> (-15.81%) ⬇️
src/common/librouter/rpc_track.c 80.28% <0.00%> (-15.61%) ⬇️
src/broker/groups.c 68.36% <0.00%> (-14.04%) ⬇️
... and 200 more

@chu11
Copy link
Member Author

chu11 commented Feb 17, 2022

flux_kvs_getroot()
RPC to content.flush on rank 0 (flush KVS to backing store)
content.checkpoint-put the rootref obtained in 1.

would it worthwhile for content.checkpoint-get and content.checkpoint-put to be "smarter" than just taking a string and reading/writing it? Perhaps not for this PR, but could be a refactoring so that it's better in general. Like handle the version & timestamp for the user?

@garlick
Copy link
Member

garlick commented Feb 18, 2022

Well, I do think we want to keep the backends fairly "dumb" so any future changes happen in one place instead of three (or more). However, one nice change we could make would be to use a json "o" as the value, and allow it to be either a json string or a json object, so we don't have to encode the json object as a json string that we put inside a json object :-)

@chu11
Copy link
Member Author

chu11 commented Feb 18, 2022

However, one nice change we could make would be to use a json "o" as the value, and allow it to be either a json string or a json object, so we don't have to encode the json object as a json string that we put inside a json object :-)

Ahh that's a good idea. I was going to do a flux_rpc_pack ("{s:o}") in this PR until I realized I couldn't. Perhaps that could be a follow up PR, and would make your code easier / better too?

@garlick
Copy link
Member

garlick commented Feb 18, 2022

Maybe the next PR could introduce some reusable (but not public) functions in libkvs and make that protocol change?

@chu11
Copy link
Member Author

chu11 commented Feb 18, 2022

Maybe the next PR could introduce some reusable (but not public) functions in libkvs and make that protocol change?

sounds good!, will make an issue after this PR goes in (not sure why the flux-sched build is taking forever)

@chu11
Copy link
Member Author

chu11 commented Feb 18, 2022

re-kicking the tests, the sched builder got stuck in git clone for some reason

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants