-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvs: add defensive checkpoint via sync configuration option #4383
kvs: add defensive checkpoint via sync configuration option #4383
Conversation
On a quick skim, it looks like the timer wakes up even when there has been no kvs activity, and also that it commits actual data to the KVS each time it syncs. I don't think either of those things are desirable or necessary. Maybe a timer is the wrong approach - could the commit logic check how long it's been since the last sync and just do one if periodic sync is configured AND it's been long enough? |
Yup, it does wake up if there hasn't been any activity, but it maintains the last sequence number of the root so that it won't do another commit if unnecessary.
Hmmm, that's an interesting idea. Perhaps if there hasn't been a checkpoint in awhile, just add |
It's probably a non starter to have the code periodically commit something, especially something different each time since that contributes to content.sqlite growth. |
Ahhh that's a good point. Can definitely write the same thing over and over again instead.
As I was trying to experiment with this idea, it occurred to me that adding a FLUX_KVS_SYNC to a kvs transaction probably isn't a good idea. We wouldn't want an error on content.flush or checkpoint to lead to a commit error, especially for a KVS transaction the kvs module doesn't "own". We could add "back out" logic if the sync fails, but things start to get complicated. Leading to the idea, perhaps using the timer technique but have it just do a content.flush and checkpoint, independent of the commit logic. However, I realized in a prior prototype that that would be racy with anyone using the FLUX_KVS_SYNC flag. Thus, by doing a commit w/ the FLUX_KVS_SYNC flag, all commits aren't racy against other attempts to sync. I'll continue to ponder, maybe there's another idea I'm not thinking about yet. |
If a sync fails, something is really going wrong with the system or the backing store, and KVS data is in jeopardy with no real recovery options, so it probably would be fine IMHO to fail a commit in that case. |
Hmmm, here's an idea. What if we implemented a hypothetical FLUX_KVS_SYNC_NO_COMMIT flag (this flag could potentially be internal to KVS only). It's a KVS transaction that has no operations, it just performs the flush & checkpoint. I don't think it'd be too hard b/c it'd just be a tweak to the If we keep the timer approach, not that different than what is in this PR once above is implemented. If we go with the "check how recent last commit was an if it's > sync time ago", we don't have to alter the current transaction. Just add a new KVS transaction. FWIW, I'm currently less than enthused with the latter approach. Would add a |
Oh, duh - the way I proposed (not using a timer) could leave dirty data unsynced longer the configured time if another commit doesn't come along. Sorry about that, your intuition was right that a timer is the way to go! The spurious wakeup can be avoided by not starting the timer watcher until a commit comes in. E.g. each commit checks if timer is active, and if not , sets the time and starts it. Timer expires, sync happens, and the system could be completely idle until another commit comes in and starts the timer again. You're the expert on the best way to do this inside the KVS, so take this with a grain of salt but instead of a flag could we just have the timer post a regular commit with no ops (empty txn) and the FLUX_KVS_SYNC flag, and then just ensure that the state machine does the right thing with that if it doesn't already? It seems like it would be nice to allow that anyway, from outside the KVS module. |
I was just thinking about this. For some reason, I felt that a txn must not be empty. But if it is empty, that should totally be allowed. Then this would be easy and I can perhaps add some "state machine jumps" to skip states that are no longer important. |
So what do we think the average "defensive checkpoint' time length is going to be? B/c I'd have to add a IMO, the extra checking on every commit is a bit much, although I didn't do any testing / measurement yet. |
Could you just do (on every commit): if (!timer_flag) {
flux_timer_watcher_reset (w, sync_time, 0.);
flux_timer_watcher_start (w);
timer_flag = true;
} Then in the timer callback, start the sync and set So you'd get approximately a sync every If the sync time is less than some threshold we could always bump it up and document this. Say 1m? |
You mean not do the checks I was referring to above (or just not do the |
Oh yes, sorry, those checks would be needed. Commits arriving together are normally combined so maybe those checks would effectively be rate limited to the batch timeout? Can't imagine they would amount to much overhead in any case, but famous last words! |
ae3fd87
to
826306a
Compare
re-pushed, per discussion above went with the "no ops transaction" method. Added a test for "no ops" b/c it wasn't actually covered, then I added an optimization to skip several parts of the transaction state machine when there are no ops. Then started the sync timer depended on if a commit just happened and stopped it after a recent sync completed and if there are no more commits in the pipeline. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice - a few minor comments for you.
src/modules/kvs/kvs.c
Outdated
if (kvstxn_mgr_ready_transaction_count (root->ktm) == 0) { | ||
flux_watcher_stop (ctx->sync_w); | ||
ctx->sync_timer_running = false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be clearer to use a one-shot timer, and start it after a commit completes instead of when one is queued? Then you could just set the running flag false here unconditionally and avoid the kvsroot check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmm, ok, I see your point, instead of the root check here and the queue-ing checks everywhere else, effectively do some timer math to see if the timer should restarted after every commit. That might be simpler, lemme give it a try.
As an aside, I realized I do have a corner case, in that the current code guarantees to checkpoint within "sync time" after the first commit (or if the KVS has been non-busy, the first commit in awhile in which the timer is woken up), which is probably not correct. It should be "sync time" after the KVS started (or sync immediately if there hasn't been a sync in > "sync time").
Also, it occurred to me, to remove the wasteful "strcmp" on every namespace string, I could just add a special flag in the kvsroot
struct to indicate "i am the primary namespace". I think that'd be useful.
src/modules/kvs/kvs.c
Outdated
|
||
if (flux_rpc_get (f, NULL) < 0) | ||
flux_log_error (ctx->h, "%s: flux_rpc_get", __FUNCTION__); | ||
flux_future_destroy (ctx->sync_f); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might want to use just f
in the destroy to be consistent with the flux_rpc_get()
call.
src/modules/kvs/kvs.c
Outdated
|
||
/* N.B. no actual changes in this transaction */ | ||
if (!(txn = flux_kvs_txn_create ())) { | ||
flux_log_error (ctx->h, "%s: flux_kvs_txn_create", __FUNCTION__); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The log messages in this function are all equally unlikely and don't provide much info. Maybe all the failure cases could use one better message like "periodic checkpoint setup failed: ..."
826306a
to
17f2b60
Compare
re-pushed
|
src/modules/kvs/kvs.c
Outdated
if (t < ctx->sync) | ||
after = ctx->sync - t; | ||
/* else we haven't synced in a long time, so set timer | ||
* to set off immediately (i.e. after = 0.0) | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a bit weird to force a sync on the first commit after the system has been idle, when this commit is likely to be immediately followed by others if it 's a job (the common case). Maybe we could describe the sync parameter as "the maximum length of time dirty KVS data will remain in memory before being flushed to the backing store" and then just set the timer to the fixed configured value?
That seems more intuitive to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the comment perhaps misleading? I initialize last_sync
to when the module was loaded, so the first sync should occur "sync period" time after the first commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm confused, or maybe I wasn't clear - I thought t
was tracking the time since the last sync and would shorten the timer according to how long it's been since the last sync. So for example the sync period is 30m and the system is idle for a day and then someone runs a job, the first commit would trigger a sync. But since nothing else has been committed, there really isn't any increased urgency to do the sync right then.
If instead we think about the sync time period as placing an upper bound on how long after a change we wait to checkpoint it, then we would just start the timer with a fixed duration when ever something changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, i see your point now. So instead of "checkpoint every X time", you're thinking it should be "checkpoint if anything older than X time"
17f2b60
to
0c58cae
Compare
re-pushed, tweaking sync logic per @garlick comments above. Tweaked KVS manpage language as a result. As fixed up some manpage generation fail that I messed up. |
src/modules/kvs/kvs.c
Outdated
flux_log_error (ctx->h, "%s: flux_rpc_get", __FUNCTION__); | ||
flux_future_destroy (f); | ||
ctx->sync_f = NULL; | ||
ctx->sync_started = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be set false in sync_cb()
? If another commit completes before an in-progress sync does, then it wouldn't start the timer again, and could sit there until another commit comes along.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. The timer could be restarted before the last sync completes. Here's the way I think about it. Here's the transaction queue where a bunch of commits come in and are put on the queue.
putA -> putB -> putC -> NULL
putA
finishes, leading to the sync timer to be started. At some point in the future the sync_cb()
will be called and add a transaction to the end of the queue to do the sync. So perhaps its looks like this at that point (putB
has completed before the sync
was added):
putC -> putD -> putE -> sync -> NULL
if we set sync_started
to false in sync_cb()
, that means the timer can be restarted after putC
completes, which is not what we want. We'd want the sync timer to be restarted on a putF
that will come later after the sync.
But your comment does make me think, is it possible a putF
could complete before the sync_started
flag is set to false in the continuation? The odds are low, but I'm not sure if it could be guaranteed. There's a lot of reactor stuff going on in the background.
I suppose the safe thing is sync_started
would maybe have to be set to false when the sync is simply completed, perhaps via some callback mechanism.
These corner cases are tough to think about wrap around, and I definitely had to ponder them a bit. Is it worth considering going back to something closer to the initial implementation I proposed? Where its simply a timer that sets off every X time? It largely avoids thinking about a lot of these corner cases.
I think it is safe as long as 1) timer is running or started when each
commit finishes, and 2) timer is stopped when sync is started. When a sync
starts it is capturing the root ref at that moment, so it needs to be
possible for a commit that occurs right after that to start the timer again.
Does that make sense? Sorry all I have is my phone this weekend and so I
might be missing some detail I would catch if I could review code....
…On Fri, Jul 1, 2022, 9:22 PM Al Chu ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In src/modules/kvs/kvs.c
<#4383 (comment)>
:
> @@ -1223,6 +1243,47 @@ static int heartbeat_root_cb (struct kvsroot *root, void *arg)
return 0;
}
+static void sync_continuation (flux_future_t *f, void *arg)
+{
+ struct kvs_ctx *ctx = arg;
+
+ if (flux_rpc_get (f, NULL) < 0)
+ flux_log_error (ctx->h, "%s: flux_rpc_get", __FUNCTION__);
+ flux_future_destroy (f);
+ ctx->sync_f = NULL;
+ ctx->sync_started = false;
I don't think so. The timer could be restarted before the last sync
completes. Here's the way I think about it. Here's the transaction queue
where a bunch of commits come in and are put on the queue.
putA -> putB -> putC -> NULL
putA finishes, leading to the sync timer to be started. At some point in
the future the sync_cb() will be called and add a transaction to the end
of the queue to do the sync. So perhaps its looks like this at that point (
putB has completed before the sync was added):
putC -> putD -> putE -> sync -> NULL
if we set sync_started to false in sync_cb(), that means the timer can be
restarted after putC completes, which is not what we want. We'd want the
sync timer to be restarted on a putF that will come later after the sync.
But your comment does make me think, is it possible a putF could complete
before the sync_started flag is set to false in the continuation? The
odds are low, but I'm not sure if it could be guaranteed. There's a lot of
reactor stuff going on in the background.
I suppose the safe thing is sync_started would maybe have to be set to
false when the sync is simply completed, perhaps via some callback
mechanism.
These corner cases are tough to think about wrap around, and I definitely
had to ponder them a bit. Is it worth considering going back to something
closer to the initial implementation I proposed? Where its simply a timer
that sets off every X time? It largely avoids thinking about a lot of these
corner cases.
—
Reply to this email directly, view it on GitHub
<#4383 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABJPW5SRJH5C5TTAZYYUIDVR67ZLANCNFSM52EKWOXA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
38cde07
to
8ff0952
Compare
re-pushed, missed a "&&" chain in a test |
src/modules/kvs/kvs.c
Outdated
@@ -1053,6 +1072,7 @@ static void kvstxn_apply (kvstxn_t *kt) | |||
} | |||
setroot (ctx, root, kvstxn_get_newroot_ref (kt), root->seq + 1); | |||
setroot_event_send (ctx, root, names, kvstxn_get_keys (kt)); | |||
start_sync_timer (ctx, root); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By chance is the timer re-armed here by the sync (empty commit) itself? I'm noting that even when the system should be idle I'm getting repeated syncs as though the timer were periodic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm that makes sense, although I'm not sure how this isn't being caught by tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahhh, my tests only ensure that the rootref hasn't changed, which doesn't mean syncs aren't attempted. I assume you're putting in printfs or something and notice the sync callback being called more times than it should?
We need another flag to ensure this doesn't happen, one for the sync timer and one for a sync being active.
But b/c of the racy thing I mentioned above, (a commit could complete before the sync_continuation()
is called), I think this would involve more hackery than I'd like. Lets talk on the coffee time about this.
This has certainly exceeded the complexity I originally imagined this PR to be, I'm not sure if you had other ideas as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. No coffee time today though (it's tuesday). Did you want to chat now? LMK when you're available.
They only race I can imagine is maybe if the sync commit took longer than a concurrent timer.
I didn't entirely understand how the empty commit is handled, but it would seem like it must be publishing the root if it's also starting the timer since that's right next to it in the code? It probably shouldn't be doing that either, so maybe it needs to short circuit out of there earlier?
A side note, @garlick and I chatted and we realized perhaps the problem is that we are using FLUX_KVS_SYNC here. Effectively, Instead, will make an attempt to try and implement as a "feature" within the KVS transaction manager. If that fails, we may fall back and just do a normal cron-like-timer. |
8ff0952
to
82723d5
Compare
Re-pushed, altering the logic of how the syncs are done.
This was important so we don't have to deal with the complexity from before of "what if there's a sync transaction already on the transaction queue". It's now the next transaction all the time, keeping this simpler. We also add
|
fe8e64e
to
239a65b
Compare
rebased, re-pushed, squashed, fixing up a few of the things mentioned by @garlick above |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are a few comments.
I'm not sure if it's the right thing but I'll throw it out there: would it make sense to create a sync.c source file for this stuff, just to keep it contained?
src/modules/kvs/kvstxn.c
Outdated
@@ -37,13 +37,15 @@ | |||
#define KVSTXN_PROCESSING 0x01 | |||
#define KVSTXN_MERGED 0x02 /* kvstxn is a merger of transactions */ | |||
#define KVSTXN_MERGE_COMPONENT 0x04 /* kvstxn is member of a merger */ | |||
#define KVSTXN_SYNC 0x08 /* is special kvstxn sync */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be helpful if this flag was named something like KVSTXN_SYNC_ONLY or KVSTXN_SPECIAL_SYNC or something like that, so it won't look like an internal version of FLUX_KVS_SYNC.
src/modules/kvs/kvs.c
Outdated
const char *errstr = "Failed to reconfigure kvs"; | ||
char errbuf[256]; | ||
if (flux_conf_reload_decode (msg, &conf) < 0) | ||
goto error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest you just set errstr to NULL for reload_decode failure, since that failure is unlikely and "Failed to reconfigure kvs" would be a misleading error message if it ever did fail.
src/modules/kvs/kvs.c
Outdated
snprintf (errbuf, | ||
errbufsize, | ||
"error reading config for kvs: %s", | ||
err.text); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These days we have flux_error_t
and errprintf()
. Suggest you use those here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahhh i happened to copy & paste from resource
, perhaps i should update there too while I'm at it.
src/modules/kvs/kvs.c
Outdated
/* If this transaction is the special internal sync | ||
* transaction, no need to setroot or send event, we just | ||
* checkpointed, so do nothing else */ | ||
if (!kvstxn_is_sync (kt)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might want to flip the two legs of the if here to match the comment for clarity.
src/modules/kvs/kvs.c
Outdated
/* setup timer if it wasn't done before */ | ||
if (sync > 0.0 && !ctx->sync_w) { | ||
flux_reactor_t *r = flux_get_reactor (ctx->h); | ||
if (!(ctx->sync_w = flux_timer_watcher_create (r, | ||
sync, | ||
0.0, | ||
sync_cb, | ||
ctx))) { | ||
snprintf (errbuf, | ||
errbufsize, | ||
"error creating sync timer: %s", | ||
flux_strerror (errno)); | ||
return -1; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be wise to create the timer watcher unconditionally in the kvs initialization so the module fails to load on failure, instead of needing to handle that error in the reconfig path when the system is running (and sending the error back to flux config reload
).
src/modules/kvs/kvs.c
Outdated
if (ctx->sync_timer_started) { | ||
if (sync > 0.0) { | ||
flux_watcher_stop (ctx->sync_w); | ||
flux_timer_watcher_reset (ctx->sync_w, sync, 0.); | ||
flux_watcher_start (ctx->sync_w); | ||
} | ||
else { /* sync == 0.0 */ | ||
flux_watcher_stop (ctx->sync_w); | ||
ctx->sync_timer_started = false; | ||
} | ||
} | ||
else { | ||
if (sync > 0.0 | ||
&& ctx->sync == 0.0 | ||
&& ctx->primary_commit_since_last_sync) { | ||
/* on rank 0, primary namespace has to be here */ | ||
struct kvsroot *root; | ||
root = kvsroot_mgr_lookup_root (ctx->krm, | ||
KVS_PRIMARY_NAMESPACE); | ||
assert (root); | ||
run_sync_timer (ctx); | ||
} | ||
} | ||
} | ||
|
||
ctx->sync = sync; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be expressed more simply, and without the need for the big block comment, as
ctx->sync = sync;
flux_watcher_stop (ctx->sync_w);
ctx->sync_timer_started = false;
if (ctx->primary_commit_since_last_sync)
start_sync_timer (ctx);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure I can simplify it that much, but you're right that it could be simplified. The above did have some dead code (kvsroot_mgr_lookup_root()
looks up something that is not used). If I used start_sync_timer_if_needed()
, I'd have to do the lookup.
I did clean it up to the following.
if (ctx->rank == 0 && sync != ctx->sync) {
ctx->sync = sync;
if (!ctx->sync_txn_submitted) {
if (ctx->sync_timer_started) {
flux_watcher_stop (ctx->sync_w);
if (ctx->sync > 0.0) {
flux_timer_watcher_reset (ctx->sync_w, ctx->sync, 0.);
flux_watcher_start (ctx->sync_w);
}
else /* ctx->sync == 0.0 */
ctx->sync_timer_started = false;
}
else {
if (ctx->primary_commit_since_last_sync)
start_sync_timer (ctx); /* N.B. this was previously run_sync_timer() */
}
}
}
src/modules/kvs/kvstxn.c
Outdated
char *name = NULL; | ||
|
||
/* N.B. we "fake" prepend to the ready list by storing the sync | ||
* transactino in a variable in the kvstxn_mgr_t. If we really |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo "transactino"
src/modules/kvs/kvstxn.c
Outdated
|
||
error: | ||
kvstxn_destroy (kt); | ||
free (name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
free()
might clobber errno, and flux_log_error()
is called when this function fails.
src/modules/kvs/kvs.c
Outdated
double sync; /* in seconds */ | ||
flux_watcher_t *sync_w; | ||
bool sync_timer_started; | ||
bool sync_txn_submitted; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may add unnecessary complexity to have the sync_txn_submitted
flag when you could just allow another sync to be submitted when one is in progress. It would do no harm, right?
Well I suppose if the sync time was very short it could go badly. Regardless, maybe it would be a good idea to have a lower bound on the configured sync time? Like say 1m?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh good catch, i don't allow two "sync prepends" to the kvstxn queue, so it literally would do nothing if a second were attempted to be added by the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh wait ... this was there to catch a corner case. a "sync prepend" may occur while a commit is in progress, so the checkoutpoint would occur right after the current commit in progress is complete.
If we removed this flag, the sync timer could go off for the commit right before the checkpoint is done. I guess we could allow it? But it seems sort of silly to me to allow this. We'd want the sync timer to start for the commit right after the sync/checkpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having a hard time following this, and that's not a good indication that someone coming along later will be able to quickly understand it enough to make changes. I'm trying to think what would cordon this feature off from the rest of the complex kvs stuff, so it could be grokked in isolation.
What if we just create a new source module sync.c and each time we complete a transaction on the root namespace, we call a function like sync_setroot()
. Then in that module (without hooking into any other part of the KVS except maybe a configuration callback) we implement the content flush + checkpoint and timer? Maybe sync.c could contain functions that are also used by the FLUX_KVS_SYNC flag handling in the commit code?
Too much of a do-over?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lemme try to split out into another file, see if that will make things clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another thought, would it be worthwhile to go back to the earlier timer based method? Simply wake up every X time and perform a sync using FLUX_KVS_SYNC? So much of the complexity is eliminated, but at the cost of an occasional timer wakeup and the setroot event that goes out every once in awhile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems wrong to broadcast the setroot event when the KVS content has not changed.
Doing this standalone really shouldn't be that bad - in fact, couldn't it just subscribe to the setroot event and use that to start the timer?
src/modules/kvs/kvs.c
Outdated
ctx->sync_timer_started = true; | ||
} | ||
|
||
static void start_sync_timer (struct kvs_ctx *ctx, struct kvsroot *root) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The names of these two functions are a bit misleading since neither of them "run" the timer. They both only "start" the timer, one conditionally. How about:
start_sync_timer()
start_sync_timer_if_needed()
or similar?
talked to @garlick, we both agree that this has become more complicated than either of as initially imagined. So we're going to backtrack. We're going to go with the timer loop idea that I originally proposed. But we'll create a special internal flag like KVSTXN_NO_PUBLISH or something like that, where it will not publish the setroot event after a transaction has completed. we'll come back to this someday if this ends up being a performance issue, which we both agree is unlikely if the user sticks to sensible checkpoint periods (like 30 minutes), but can be a problem if the period goes down (which is where a lot of the complicated corner cases were coming in). |
239a65b
to
11ba858
Compare
rebased & re-pushed
|
src/modules/kvs/kvs.c
Outdated
@@ -1239,6 +1245,60 @@ static void heartbeat_sync_cb (flux_future_t *f, void *arg) | |||
flux_future_reset (f); | |||
} | |||
|
|||
static void checkpoint_period_cb (flux_reactor_t *r, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason this stuff was not moved to a new source file? It seems like it could be fairly self-contained and kvs.c is already really long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seemed just below the threshold for a new file, but it's close. I'll splice out into a new file.
11ba858
to
6c64cb6
Compare
re-pushed, splicing all the checkpoint-period stuff into its own file. There were two tricky parts
I just made it so the checkpoint-period "struct" is set to NULL on rank != 0, and all of the checkpoint-period functions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great @chu11. I think breaking the functionality out to its own source file was a good move. At least for a relative outsider to the code base, it's way easier to grok when it can be isolated.
I made some suggestions but they are all pretty superficial at this point.
Oh I'll set this up on my test system and give it a quick "real life" test.
src/modules/kvs/kvstxn.c
Outdated
@@ -55,7 +55,8 @@ struct kvstxn { | |||
json_t *ops; | |||
json_t *keys; | |||
json_t *names; | |||
int flags; | |||
int flags; /* kvs flags from request caller */ | |||
int api_flags; /* special kvstxn api flags */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure the api_
prefix is communicating that it's an internal flag. Would internal_flags
be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wasn't the biggest fan of "api_flags" either. The issue was there is already an "internal_flags" used internally within the kvstxn
api (internal means not externally exposed to users).
perhaps could be called kvstxn_flags
? perhaps the other flags needs to renamed to op_flags
? lemme think about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh i had a good idea, the "internal_flags" didn't really have to be bitmasks, they were just there as "bools" for certain conditions. So I replaced them with a bunch of bools. Then api_flags could be "internal_flags" instead.
src/modules/kvs/kvs.c
Outdated
/* ctx->kcp will not be created on rank != 0, all kvs_checkpoint | ||
* functions will do nothing when ctx->kcp is NULL. | ||
*/ | ||
ctx->kcp = kvs_checkpoint_create (h, | ||
NULL, /* set later */ | ||
0.0, /* default 0.0, set later */ | ||
work_queue_check_append_wrapper, | ||
ctx); | ||
if (!ctx->kcp) | ||
goto error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this code is inside an if (ctx->rank == 0)
block, maybe the comment is not needed here? Might be better to just document in kvs_checkpoint.h
that each function is a no-op if called with a NULL kvs_checkpiont_t
argument to avoid excessive rank 0 tests in the code.
src/modules/kvs/kvs.c
Outdated
@@ -2703,6 +2727,30 @@ static void setroot_unpause_request_cb (flux_t *h, flux_msg_handler_t *mh, | |||
flux_log_error (h, "%s: flux_respond_error", __FUNCTION__); | |||
} | |||
|
|||
static void reload_cb (flux_t *h, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe naming this config_reload_cb()
would be better in such a large code base as the kvs?
src/modules/kvs/kvs_checkpoint.c
Outdated
if (kcp->last_checkpoint_seq == kcp->root_primary->seq) | ||
return; | ||
|
||
if (asprintf (&name, "checkpoint-period.%u", kcp->root_primary->seq) < 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably be better to use a static buffer instead of asprintf()
here since the string is of predictable length.
src/modules/kvs/kvs_checkpoint.c
Outdated
free (name); | ||
json_decref (ops); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clobbers errno
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it matter in this case? the function returns void.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh sorry, I thought I was following a different call stack. This is the timer watcher callback (duh).
Given that the function cannot return an error, maybe it would make sense to rename the goto label to "done".
src/modules/kvs/kvs_checkpoint.h
Outdated
kvs_checkpoint_txn_cb txn_cb, | ||
void *txn_cb_arg); | ||
|
||
/* will update internal checkpoint_period setting as needed */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can drop "will" in all the comments in this header file.
t/t1011-kvs-checkpoint-period.t
Outdated
cat >kvs.toml <<EOF && | ||
[kvs] | ||
checkpoint-period = "1Z" | ||
EOF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: use <<-EOF
and the hereis content can be indented wtih tab characters.
I set my test system up for 30m checkpoints, put something in the kvs, then after an hour ran |
bb2d01a
to
0e32da4
Compare
re-based and re-pushed with fixes given comments above. Perhaps want to give it one more skim @garlick? The big change is making the previously named
and the previously named |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good after a final skim!
Problem: The kvsroot.h header defines the kvsroot struct, which includes a zlist_t member. However, it does not include the appropriate headers to ensure this type is defined. Solution: Add include of czmq_containers.h into kvsroot.h.
Problem: In the kvstxn code, an internal bitmask of flags is used for "internal" purposes. The use of a bitmask for internal purposes can be confused with other flag parameters that may be passed in by users. Solution: Remove the "internal_flags" and replace with bools representing the conditions the internal code cares about.
Problem: In the future we'd like to add some special options to transactions in the kvstxn API. However, nothing in the api currently supports this. Solution: Add a new "internal_flags" parameter to kvstxn_mgr_add_transaction() which will allow users to set special kvstxn flags. Add a new kvstxn_get_internal_flags() to extract those flags from a transaction. Note that no flags are currently supported.
Problem: We would like to mark some transactions to indicate they should not publish their changes after completion. Solution: Add a new internal flag called KVSTXN_INTERNAL_FLAG_NO_PUBLISH. If set, do not call setroot() or setroot_event_send().
Problem: While there are mechanisms to checkpoint the primary KVS namespace via commands (such as the flux kvs --sync option), there is no configurable way to do it at timed intervals. Solution: Add a "checkpoint-period" KVS configuration option. The configuration option takes a fsd time length as input and will checkpoint the present KVS root at each time interval. Fixes flux-framework#4301
Problem: No documentation for the new [kvs] config options exists. Add man page.
0e32da4
to
882a320
Compare
Codecov Report
@@ Coverage Diff @@
## master #4383 +/- ##
=======================================
Coverage 83.32% 83.32%
=======================================
Files 395 395
Lines 66652 66652
=======================================
Hits 55535 55535
Misses 11117 11117 |
Per discussion in #4301, add a
sync = <time>
option to the KVS, so that the primary kvs namespace can be checkpointed on a regular basis.