-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve KVS checkpoint protocol to allow for future changes #4149
improve KVS checkpoint protocol to allow for future changes #4149
Conversation
ci testing for it was added with #3025, although I'm having a bit of trouble unwinding how to run it manually. At least if you add tests to From raw logs of
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of quick comments inline.
I think we probably only need to support "version 0" in content-sqlite, since I seriously doubt anybody has a running system instance using the other backing stores at this point.
Edit: in the other backing stores, you could leave a comment in the checkpoint_get()
functions that recovery from a version 0 checkpoint is not supported.
if (strlen (key) == 0) { | ||
errno = EINVAL; | ||
goto error; | ||
} | ||
if (!(value = json_dumps (o, JSON_ENCODE_ANY))) { | ||
errno = EINVAL; | ||
goto error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really only need to support reading the version 0 checkpoint, not writing it, so JSON_ENCODE_ANY is not needed in the json_dumps()
. While you're there, may as well add JSON_COMPACT. The strlen check can be dropped too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahhh i remember why i did JSON_ENCODE_ANY before, it's b/c tons of tests like in t0012-content-sqlite
still write general strings instead of objects. Do we still want to support generically writing strings to checkpoint? I assume no b/c this is specifically the kvs checkpoint put/get rpc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I had forgotten about those. No, we should probably fix the tests to write valid objects.
@@ -353,18 +356,28 @@ void checkpoint_get_cb (flux_t *h, | |||
errno = ENOENT; | |||
goto error; | |||
} | |||
s = (char *)sqlite3_column_text (ctx->checkpt_get_stmt, 0); | |||
if (!(o = json_loads (s, JSON_DECODE_ANY, NULL))) { | |||
/* version 0 checkpoint may have been just a blobref string */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/may have been just/is/
Could also add a note that version 0 was used prior to flux-core 0.36?
Didn't think of this before, but instead of json_string(s)
maybe we should just do
o = json_pack ("{s:s s:s s:f}", "version", 0, "value", "rootref", s, "timestamp", 0.);
That would simplify the code on the other end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
o = json_pack ("{s:s s:s s:f}", "version", 0, "value", "rootref", s, "timestamp", 0.);
I'm not following. Perhaps you messed up the json_pack? I think there's more args than the format can take (edit: and "version" is taking a string instead of int?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooops, right - delete "value". I mean why not construct a consistent "checkpoint object" for version 0 for outside users since it doesn't add any more complexity really.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh wait, i think i know what you meant ...
if (!(o = json_loads (...))) {
/* if data is a version 0 blobref, respond with version 1 format */
if (blobref_validate (str))
o = json_pack ("{s:i s:s s:f}", "version", 0, "rootref", s, "timestamp", 0.);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The title of this PR might be better for the release notes if it read "enhance KVS checkpoint protocol."
10497b9
to
a0b5c86
Compare
re-pushed, correctly things from above + more
Edit: apologies if anyone started reviewing, i think i pushed an old branch to github at one point. Updated ~3:29pm |
a0b5c86
to
0659a7b
Compare
Ooops I missed that this was ready for another review. Do you want to squash the fixups and I'll make another pass? |
99a334d
to
0a96fd4
Compare
206323b
to
93503b3
Compare
Maybe it would be a good idea to combine this and that? It's an internal function so probably not vital that it appear standalone in release notes. |
Sure, we can do that! |
93503b3
to
53e63b9
Compare
re-pushed, adding the new kvs checkpoint helper functions in |
hmmm, hit a concerning build error on
haven't seen this locally yet. will be adding some debug commits, perhaps |
one builder also failed with this in t3200-instance-restart. I actually saw a few times under a system with heavy load but couldn't reproduce after awhile, thought maybe system was crazy busy for a short burst. But hmmm.
|
Finally got a good piece of debug info from a local run on
then looked at
huh? why is there an extra curly brace in there? That would explain the EINVAL. Some buffer overflow or non-NULL termination? thought about it and came to following theory,
is ok. But the With the new json object format, the object can be of variable length (b/c of the timestamp field), thus we're overwriting a prior entry, and if they are of different lengths ... badness. Thus the possibility of an invalid json object eventually being stored in the So I think we need to add a Whew, this was a tough one. Hopefully this is correct :P Edit: and this also explains the seeming randomness of the errors. It seemed "racy", but it's more about dumb luck about how the timestamp is formatted in the stored |
Nice bit of sleuthing there! |
f700c0b
to
267dcbc
Compare
Codecov Report
@@ Coverage Diff @@
## master #4149 +/- ##
==========================================
- Coverage 83.42% 83.41% -0.01%
==========================================
Files 379 380 +1
Lines 63426 63471 +45
==========================================
+ Hits 52910 52947 +37
- Misses 10516 10524 +8
|
re-pushed, added a commit for adding Before adding After adding |
If this is close, don't forget to update the PR title for release notes (and I'll update release notes). |
Ah O_TRUNC looks like it was the right thing. Good job. OK, starting a review pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a few comments - feel free to push back if you feel I'm off base.
Also: is there a way we could restructure the commits to avoid the history churn within this PR at the call points of the kvs_checkpoint*()
helpers? For example, introduce the helpers first but implement the old protocol, then switch users to the helpers, then change protocol?
src/cmd/builtin/startlog.c
Outdated
@@ -46,26 +46,45 @@ static void content_flush (flux_t *h) | |||
static void kvs_checkpoint_put (flux_t *h, const char *treeobj) | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the commit message:
content-{sqlite,files,s3}: refactor checkpoint
This goes beyond those three modules, so the prefix should probably be dropped. Also "refactor checkpoint" does not communicate much about what is changing. Maybe something like
improve KVS checkpoint protocol to allow for future changes
Problem: the
kvs-checkpoint.get
andkvs-checkpoint.put
methods provided
by content back end modules operate on (key, blobref) tuples, but we would like
to store other metadata with checkpoints, such as a date, and not have to
change the database schema in every back end whenever the value format changes.Solution: change the stored value to a json object that contains a blobref, a date,
and a version=1. The back ends just passively shuttle the json object without
interpretation, so they should not need to be changed for updates going forward.
There is one exception this time: to avoid losing data on system instance installations
when rolling out this change,content-sqlite
translates the old format to a json object
with version=0.Update users of the
kvs-checkpoint
methods and add tests.
src/cmd/builtin/startlog.c
Outdated
/* version 0 checkpoint | ||
* - blobref string only | ||
* version 1 checkpoint object | ||
* - {"version":1 "rootref":s "timestamp":f} | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop comment that's removed by a later commit.
src/cmd/builtin/startlog.c
Outdated
errno = ENOMEM; | ||
log_err_exit ("Error encoding checkpoint object"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the ENOMEM is just a guess anyway, suggest skipping it and just calling log_msg_exit()
.
if (size > 0) { | ||
/* recovery from version 0 checkpoint blobref not supported */ | ||
if (!(o = json_loads (data, 0, NULL))) { | ||
errno = EINVAL; | ||
goto error; | ||
} | ||
} | ||
else | ||
o = json_null (); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not thinking of a use case for translating empty value to json_null()
so maybe that check is not needed and the empty string should just be passed to json_loads()
, which will then fail.
Speaking of, we have a size for the data, so it might be good to use json_loadb()
instead.
Also, might as well pass it a json_error_t
and then set errstr
to error.text
before the goto error
.
if (!(value = json_dumps (o, JSON_COMPACT))) { | ||
errno = EINVAL; | ||
goto error; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might want to set errstr = "failed to encode checkpoint value";
here.
if (!(value = json_dumps (o, JSON_COMPACT))) { | ||
errno = EINVAL; | ||
goto error; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might want to set errstr = "failed to encode checkpoint value"; here.
src/modules/kvs/kvs.c
Outdated
/* Synchronously get checkpoint databy key from checkpoint service. | ||
* Copy rootref buf with '\0' termination. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/databy/data by/
@@ -0,0 +1,33 @@ | |||
/************************************************************\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there isn't a conflicting public version of this header like say message.h
in libflux, maybe this can just be called kvs_checkpoint.h
?
src/common/libkvs/kvs_checkpoint.c
Outdated
flux_future_t *kvs_checkpoint_update (flux_t *h, | ||
const char *key, | ||
const char *rootref) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would kvs_checkpoint_commit()
sound slightly better? I dunno, just a thought.
@@ -87,7 +87,7 @@ int filedb_put (const char *dbpath, | |||
*errstr = "key name too long for internal buffer"; | |||
return -1; | |||
} | |||
if ((fd = open (path, O_WRONLY | O_CREAT, 0666)) < 0) | |||
if ((fd = open (path, O_WRONLY | O_CREAT | O_TRUNC, 0666)) < 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good find, this is just a straight up bug (and my bad).
Hmmm, not too easily, as I'd be changing a lot of code, sort of starting from scratch. Unfortunately started this before your |
Re-pushed addressing all of the comments above. Went ahead and squashed since everything is tiny changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the changes, looking good!
I just had a couple more suggestions for cleanup, but they're fairly trivial so I'll tentatively mark approved.
src/common/libkvs/kvs_checkpoint.c
Outdated
/* version 0 checkpoint | ||
* - rootref string only | ||
* version 1 checkpoint object | ||
* - {"version":1 "rootref":s "timestamp":f} | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the commit function only ever writes version 1, comment can be deleted.
(Anyway, from here, the version 0 checkpoint is also an object, just missing timestamp)
src/common/libkvs/kvs_checkpoint.c
Outdated
if (!(o = json_pack ("{s:i s:s s:f}", | ||
"version", 1, | ||
"rootref", rootref, | ||
"timestamp", timestamp))) { | ||
errno = ENOMEM; | ||
goto error; | ||
} | ||
|
||
if (!(f = flux_rpc_pack (h, | ||
"kvs-checkpoint.put", | ||
0, | ||
0, | ||
"{s:s s:O}", | ||
"key", | ||
key, | ||
"value", | ||
o))) | ||
goto error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for the separate json_pack()
. Just build the whole payload in flux_rpc_pack()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh duh, that's obvious now. I think I could do in a few other places too.
src/common/libkvs/kvs_checkpoint.c
Outdated
if (flux_rpc_get_unpack (f, "{s:o}", "value", &o) < 0) | ||
return -1; | ||
|
||
/* N.B. no need to check version, all versions support rootref */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should actually check the version here (and generally, that the checkpoint object is well formed), since the back ends are only checking that there is a valid json object?
Also: combine unpacks (see comment in get_formatted_timestamp()
)
src/common/libkvs/kvs_checkpoint.c
Outdated
if (flux_rpc_get_unpack (f, "{s:o}", "value", &o) < 0) | ||
return -1; | ||
|
||
if (json_unpack (o, "{s:i}", "version", &version) < 0) { | ||
errno = EINVAL; | ||
return -1; | ||
} | ||
|
||
if (version == 0) { | ||
snprintf (buf, len, "N/A"); | ||
} | ||
else if (version == 1) { | ||
double timestamp; | ||
time_t sec; | ||
struct tm tm; | ||
|
||
if (json_unpack (o, "{s:f}", "timestamp", ×tamp) < 0) { | ||
errno = EINVAL; | ||
return -1; | ||
} | ||
sec = timestamp; | ||
gmtime_r (&sec, &tm); | ||
strftime (buf, len, "%FT%T", &tm); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: combine the three unpacks into one, e.g.
if (flux_rpc_get_unpack (o, "{s:{s:i s:s s?f}}", ...) < 0)
return -1;
if (version != 0 && version != 1) {
errno = EINVAL;
return -1;
}
400d7da
to
34eaf92
Compare
re-pushed addressing the comments above. @garlick wanna do one more quick skim before I set MWP? |
hmmm, hit one build error that i'm 99% sure is not releated to this PR. restarted builders.
|
We've been seeing that one lately: #4097 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! This time just a couple of minor things. If ci isn't complaining, no biggie I guess.
src/common/libkvs/kvs_checkpoint.c
Outdated
int version; | ||
double timestamp; | ||
|
||
if (!f || !buf || len <= 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think len can be less than zero as size_t
is unsigned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, actually, probably the right thing is to ignore len
since zero isn't the only value that's too short. Instead, check the return value of snprintf()
and strftime()
to catch any overflow.
src/common/libkvs/kvs_checkpoint.c
Outdated
size_t len) | ||
{ | ||
int version; | ||
double timestamp; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would initialize timestamp to 0. for good measure since it's conditionally assigned. It's also conditionally used, but matching those two conditions up might be too much to ask of some compilers.
34eaf92
to
9ac82ef
Compare
just re-pushed, just cleaned up the two things @garlick noted above
|
Ready for MWP on this one? |
yup! thanks for review. |
Problem: When calling flux_respond_pack(), an unnecessary parameter is passed to the variable argument function. Solution: Remove the unnecessary argument.
Problem: In the t0018-content-files.t and t0024-content-s3.t tests, a test output file was used twice, thus overwriting the output from a prior test. Solution: Rename the filenames to be unique.
Problem: Tests in t2010-kvs-snapshot-restore.t largely duplicated tests in t0012-content-sqlite.t. Remove the duplicate tests and test files.
Problem: The kvs-checkpoint.put callback checks if the key input by the user is non-empty. This check is unnecessary. Remove the check.
Problem: the kvs-checkpoint.get and kvs-checkpoint.put methods provided by content back end modules operate on (key, blobref) tuples, but we would like to store other metadata with checkpoints, such as a date, and not have to change the database schema in every back end whenever the value format changes. Solution: change the stored value to a json object that contains a blobref, a date, and a version=1. The back ends just passively shuttle the json object without interpretation, so they should not need to be changed for updates going forward. There is one exception this time: to avoid losing data on system instance installations when rolling out this change, content-sqlite translates the old format to a json object with version=0. Update users of the kvs-checkpoint methods and add tests. Fixes flux-framework#4144
Problem: Some kvs checkpointing operations are duplicated. It would be convenient if there were common functions for it. Solution: Add common checkpointing operations into a new kvs_checkpoint api in libkvs. Keep the API private for now. Add unit tests. Fixes flux-framework#4145
Problem: startlog code can be simplified by using kvs checkpointing helper functions. Solution: Use kvs_checkpoint_update() to simplify kvs checkpoint code.
Problem: kvs code can be simplified by using kvs checkpointing helper functions. Solution: Use kvs checkpointing helper functions when handling kvs primary checkpoint.
Problem: In content-files, most files are created using a hash of the data. So in most cases the filenames are unique. However, the kvs-checkpoint.put rpc will typically write to the same file over and over again. In the event the file already exists, content-files did not clear the data in a file before writing. This can lead to corrupted data if the user wrote fewer bytes to the file than before. Solution: Call open() with O_TRUNC to truncate the file before writing new data. Add tests increasing / decreasing checkpoint data size.
9ac82ef
to
43bc1db
Compare
hmm, builder failed with
All tests passed in |
@Mergifyio refresh |
✅ Pull request refreshed |
Problem: We would like to checkpoint more complex json data structures to the content backing service, but since the content checkpoint services only take string values, it is inconvenient to have to encode/decode all json objects into/from strings.
Solution: Update the checkpoint protocol to instead take / send a json object. Support backwards compatibility by sending a json string when the data stored is not valid json. Update all callers accordingly and add additional tests.
Fixes #4144
Couple of side notes:
no new tests via s3 content backing, as I'm not entirely sure how to test at the moment. Perhaps there's some notes somewhere I'm not looking?
i pondered updating
value
todata
everywhere, b/cvalue
suggests just a string. But thesqlite
database usesvalue
as the column header, so I kept it as is.