-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvs: support ability to "revert" / "back out" merged commits #1346
Conversation
This sounds like a clean approach! Better to get terminology right in the commit log, if possible. Maybe say "fall back to indiivdual commits if merged commit fails"? Or "upon failure of merged commit, retry the original commits individually"? |
You know what, I like the idea of using "fallback". I think it is far more clearer than "revert" or "unmerge". I'll tweak the commit log, update the variable names, and re-push. |
f680b56
to
3fb1fd7
Compare
re-pushed using the wording "fallback" instead of "revert" everywhere. Went ahead and squashed it since it was only a change in the last commit. |
Codecov Report
@@ Coverage Diff @@
## master #1346 +/- ##
==========================================
+ Coverage 78.47% 78.51% +0.03%
==========================================
Files 162 162
Lines 29689 29739 +50
==========================================
+ Hits 23298 23349 +51
+ Misses 6391 6390 -1
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does continuing to merge after a commit has begun processing prevent fallback on error?
Would be good to explain this in the commit message so that we aren't tempted to add it back later for performance if it would break something.
(oops meant to add that as a single review comment on the first commit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments, mainly suggestions for improving commit messages.
src/modules/kvs/commit.c
Outdated
@@ -1222,8 +1222,7 @@ static int commit_merge (commit_t *dest, commit_t *src) | |||
/* Merge ready commits that are mergeable, where merging consists of | |||
* popping the "donor" commit off the ready list, and appending its | |||
* ops to the top commit. The top commit can be appended to if it | |||
* hasn't started, or is still building the rootcpy, e.g. stalled | |||
* walking the namespace. | |||
* hasn't started. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: add a note here on why only COMMIT_STATE_INIT is mergeable.
Commit message summary should be more descriptive, e.g. "modules/kvs: only merge commit in INIT state" or similar.
src/modules/kvs/commit.c
Outdated
@@ -1219,10 +1219,46 @@ static int commit_merge (commit_t *dest, commit_t *src) | |||
return -1; | |||
} | |||
|
|||
static commit_t *commit_create_empty (commit_mgr_t *cm) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: change commit message to "modules/kvs: merge to new empty commit".
Suggestion: restructure to avoid repetition, e.g.
if (!(cnew = calloc ()))
goto error_nomem;
if (!(cnew->ops = json_array()))
goto error_nomem;
...
error_nomem:
commit_destroy (cnew);
errno = ENOMEM;
return NULL;
...
src/modules/kvs/commit.c
Outdated
@@ -1164,59 +1164,34 @@ int commit_mgr_ready_commit_count (commit_mgr_t *cm) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what the "atomic" comment refers to. This appears to just be about how this function cleans up on error?
Better commit summary message would be useful, as well as expanded commit main message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, perhaps "atomic" is the wrong word. In the past, data structures were modified on the fly as merging occurred. Any error would lead to exit()
. As we refactored exit()
away and returned errors, we couldn't return half modified data structures. So a number of functions were modified to be "atomic", where the data structure was successfully modified completely or not at all. Maybe there's a better word than "atomic" here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, well I would say in the context of processing transactions/commits, the "atomic" term is a bit overloaded :-) Maybe just "fully clean up on error"?
src/modules/kvs/commit.c
Outdated
@@ -1253,6 +1253,7 @@ int commit_mgr_merge_ready_commits (commit_mgr_t *cm) | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it would be clearer to make the commit summary "modules/kvs: don't modify ready queue on error"?
src/modules/kvs/commit.c
Outdated
@@ -45,7 +45,9 @@ | |||
|
|||
#define FENCE_READY_MASK 0x01 | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe "modules/kvs: preserve orig commits during merge"?
src/modules/kvs/commit.c
Outdated
@@ -163,6 +163,13 @@ int commit_set_aux_errnum (commit_t *c, int errnum) | |||
return c->aux_errnum; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: modules/kvs: try commits individually if merged commit fails
"core KVS file" could be KVS main or kvs.c.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or "try orig commit if merged comit fails"
Ahh, I should change that commit log message. It's no longer true (was from a prior attempt). I left this in for another reason. Now that we are no longer removing commits from the queue, I'd have to regularly scan the ready queue to see if there are new things to merge. That or keep a pointer to the "last merge" point. Hmmm. I suppose this would be doable, it's just a single pointer. Let me think about this. |
Fix forgotten change to function name.
In kvstxn_mgr_merge_ready_transactions(), instead of merging transactions into the current head ready transaction, create a new empty transaction and merge contents into it. Then push that new transaction onto the head of the ready list. Requires users to call kvstxn_mgr_get_ready_commit() after the merge to get the new head.
With recent changes, kvstxn_merge() no longer needs to be fully cleaned up on error. An error code can be returned to the caller kvstxn_mgr_merge_ready_commits(), which will handle full cleanup.
When merging transactions, also ensure flags are identical.
Alter logic in kvstxn_mgr_merge_ready_transactions(), so that on error, no modifications to the kvstxn ready queue occur.
Add internal checks that ensures only kvstxn's that are ready for processing are passed to processing functions. Add unit tests appropriately.
Do not destroy transactions after they have been merged. Instead flag them as components of a larger merge. When the kvstxn of a set of merged transactions completes/is removed, at that point in time remove all of the components of the larger merge. As a consequence of this change and for optimization purposes, once a merger of transactions has occured, there can no longer be any more mergers until the head merged transaction has completed. If this were not done, the ready queue would constantly be iterted through and new head merged transactions would be created. This can be optimized at a later time. Add unit tests.
In kvstxn_mgr_remove_transaction() support flag for user to fallback a merged kvstxn to the original transactions that made up the merge. By doing so, the user need not send an error to all transactions merged into that kvstxn. Instead, each of the original transactions can be replayed individually, and an error will only be sent to the offending commit/fence transaction. Support kvstxn_fallback_mergeable() so user knows if a kvstxn can be falled back on. In kvstxn_apply(), take advantage of this by not sending an error when a kvstxn's merging can be falled back on. As an exception, do not fallback if it's a "death"-like error (e.g. ENOMEM). Add internal kvstxn API unit tests. Fixes flux-framework#1337
Just re-pushed with updated patches based on current master. Discounting the renaming of data structures/variables/names/etc., changes are largely the same. I did decide to squash some patches into other ones. The one notable difference is I removed my prior change where merges can only occur for transactions in state KVSTXN_STATE_INIT. I instead do not allow merges if a merge as already occurred. The net affect is identical, more clear, and does protect against a corner case where the user calls the merge function multiple times. I put in the commit message why I did this and note that the reason for doing this could be optimized in the future. I may try and optimize before this PR is merged. Gonna think about it a bit, but didn't want that to be the hold up for pushing/merging this PR. |
and two soak runs to compare master
this branch
little surprised the mean is faster. Perhaps just a lucky run. Or atleast ballpark similar performance. |
Does this limit merging to 2:1? |
It limits merging to whatever was in the ready queue at the time of the merge. May it be 2 transactions or a bajillion transactions. |
I'll need to see a test case added for that. |
OK, I thought I must have misunderstood you there. Good! Ready for merge? |
yup, and I'll write up an issue for the optimization of merges. Already working on an idea. |
As described in #1337, a failure in a merged commit should not fail all transactions that make up that commit. In order to accomplish this, this is what I did:
When merging, do not merge ops/names into an existing
commit_t
, instead create a newcommit_t
and merge into that data structure.Flag that the new
commit_t
as a collection of merges and that the other ones are components of a merge. Leave all the componentcommit_t
's on the ready queue as they were.Push this new merged
commit_t
onto the ready queue and use it.If the new merged
commit_t
succeeds, when it is done remove it and all the "components" of the merge. If the new mergedcommit_t
fails, remove it from the ready queue, and leave all of the "components" there for processing later. Flag them as non-mergeable going forward and let processing continue as normal.Ran soak tests over 1000 jobs, and overall performance is < 1% slower. Understandable that there is slowdown as there are additional allocations and what not.
Before
After
(I should say, the "before" is before PR #1343, the refactor done right before this PR.)
Note, I used the word "revert" in the documentation and code to describe a failed merged commit becoming "un-merged". I'm not 100% happy with the use of this term to describe what's going on, but
couldn't think of a better word. Both it and "backout" give the impression that the commit completed and you want to "revert" or "backout" of it. "unmerge" seems to give the wrong impression
(i.e. it's not mergeable). Perhaps this is something that should be solved with #1344, as once the data structure names are changed, a more obvious choice of language would emerge.
Also, no unit tests at the moment for the kvs module, only some unit tests for the internal commit API. Outside of pure instrumentation, impossible to test as the commit merging is racy. It would be easier once #1341 is implemented, as a stress test across namespaces could be done.