kvs: support ability to "revert" / "back out" merged commits #1346

Merged
merged 8 commits into from
Mar 7, 2018

chu11
Member

@chu11 chu11 commented Feb 22, 2018

As described in #1337, a failure in a merged commit should not fail all transactions that make up that commit. In order to accomplish this, this is what I did:

  1. When merging, do not merge ops/names into an existing commit_t, instead create a new commit_t and merge into that data structure.

  2. Flag the new commit_t as a collection of merges and the other ones as components of a merge. Leave all the component commit_t's on the ready queue as they were.

  3. Push this new merged commit_t onto the ready queue and use it.

  4. If the new merged commit_t succeeds, when it is done remove it and all the "components" of the merge. If the new merged commit_t fails, remove it from the ready queue, and leave all of the "components" there for processing later. Flag them as non-mergeable going forward and let processing continue as normal.
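
The four steps above can be sketched roughly as follows. This is a simplified illustration, not the module's code: `struct txn`, `struct ready_queue`, the `TXN_*` flags, the fixed-size arrays, and the function names are all hypothetical stand-ins for `commit_t` and the ready queue, and component transactions are assumed to be owned by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the kvs module's structures. */
enum {
    TXN_MERGED          = 1,  /* new txn holding the ops of several others */
    TXN_MERGE_COMPONENT = 2,  /* original txn whose ops were copied */
    TXN_NO_MERGE        = 4,  /* must be processed on its own */
};
#define MAX_OPS   32
#define MAX_READY 16

struct txn {
    const char *ops[MAX_OPS];
    int nops;
    int flags;
};
struct ready_queue {
    struct txn *q[MAX_READY];   /* q[0] is the head */
    int len;
};

static int queue_append (struct ready_queue *rq, struct txn *t)
{
    if (rq->len == MAX_READY)
        return -1;
    rq->q[rq->len++] = t;
    return 0;
}

/* Steps 1-3: copy the ops of the leading mergeable txns into a NEW txn,
 * flag the originals as components (they stay queued), and push the new
 * txn onto the head.  On failure the queue is left untouched.
 */
static struct txn *merge_ready (struct ready_queue *rq)
{
    struct txn *m;
    int i, n;

    if (rq->len == MAX_READY || !(m = calloc (1, sizeof (*m))))
        return NULL;
    m->flags = TXN_MERGED;
    for (n = 0; n < rq->len; n++) {
        struct txn *t = rq->q[n];
        if ((t->flags & (TXN_MERGED | TXN_NO_MERGE))
            || m->nops + t->nops > MAX_OPS)
            break;
        memcpy (&m->ops[m->nops], t->ops, t->nops * sizeof (t->ops[0]));
        m->nops += t->nops;
    }
    if (n < 2) {                /* fewer than two txns: nothing to merge */
        free (m);
        return NULL;
    }
    for (i = 0; i < n; i++)
        rq->q[i]->flags |= TXN_MERGE_COMPONENT;
    memmove (&rq->q[1], &rq->q[0], rq->len * sizeof (rq->q[0]));
    rq->q[0] = m;
    rq->len++;
    return m;
}

/* Step 4: remove the head.  For a merged head that succeeded
 * (fallback == false) the components are dropped from the queue too.
 * If it failed (fallback == true) the components stay queued, flagged
 * non-mergeable, so they are processed one at a time.
 */
static void remove_head (struct ready_queue *rq, bool fallback)
{
    int i, drop = 1;            /* number of queue entries to remove */

    assert (rq->len > 0);
    if (rq->q[0]->flags & TXN_MERGED) {
        for (i = 1; i < rq->len
                    && (rq->q[i]->flags & TXN_MERGE_COMPONENT); i++) {
            if (fallback) {
                rq->q[i]->flags &= ~TXN_MERGE_COMPONENT;
                rq->q[i]->flags |= TXN_NO_MERGE;
            }
            else
                drop++;
        }
        free (rq->q[0]);        /* only the merged txn is owned here */
    }
    memmove (&rq->q[0], &rq->q[drop], (rq->len - drop) * sizeof (rq->q[0]));
    rq->len -= drop;
}
```

A caller would append transactions, call `merge_ready()`, process the head, and then call `remove_head()` with `fallback` set only when the merged transaction failed. Because the originals never leave the queue during the merge, falling back needs no reconstruction.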

Ran soak tests over 1000 jobs, and overall performance is < 1% slower. Some slowdown is understandable given the additional allocations and the like.

Before

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2100  0.2300  0.2300  0.2353  0.2400  0.3300 

After

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2100  0.2300  0.2400  0.2376  0.2400  0.3100 

(I should say, the "before" is before PR #1343, the refactor done right before this PR.)
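
For reference, the "< 1%" figure can be checked from the two means above. The helper below is only an illustration (`percent_change` is a made-up name); with the reported means of 0.2353 and 0.2376 it gives roughly 0.98%.

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent. */
static double percent_change (double before, double after)
{
    assert (before != 0.0);
    return (after - before) / before * 100.0;
}
```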

Note, I used the word "revert" in the documentation and code to describe a failed merged commit becoming "un-merged". I'm not 100% happy with the use of this term to describe what's going on, but couldn't think of a better word. Both it and "backout" give the impression that the commit completed and you want to "revert" or "backout" of it. "unmerge" seems to give the wrong impression (i.e. it's not mergeable). Perhaps this is something that should be solved with #1344, as once the data structure names are changed, a more obvious choice of language would emerge.

Also, there are no unit tests for the kvs module itself at the moment, only some unit tests for the internal commit API. Outside of pure instrumentation, this is impossible to test because commit merging is racy. It would be easier once #1341 is implemented, as a stress test across namespaces could be done.

@chu11 chu11 self-assigned this Feb 22, 2018
@garlick
Member

garlick commented Feb 23, 2018

This sounds like a clean approach!

Better to get terminology right in the commit log, if possible. Maybe say "fall back to individual commits if merged commit fails"? Or "upon failure of merged commit, retry the original commits individually"?

@chu11
Member Author

chu11 commented Feb 23, 2018

You know what, I like the idea of using "fallback". I think it is far clearer than "revert" or "unmerge". I'll tweak the commit log, update the variable names, and re-push.

@chu11
Member Author

chu11 commented Feb 23, 2018

re-pushed using the wording "fallback" instead of "revert" everywhere. Went ahead and squashed it since it was only a change in the last commit.

@codecov-io

codecov-io commented Feb 23, 2018

Codecov Report

Merging #1346 into master will increase coverage by 0.03%.
The diff coverage is 87.8%.

@@            Coverage Diff             @@
##           master    #1346      +/-   ##
==========================================
+ Coverage   78.47%   78.51%   +0.03%     
==========================================
  Files         162      162              
  Lines       29689    29739      +50     
==========================================
+ Hits        23298    23349      +51     
+ Misses       6391     6390       -1

Impacted Files                    Coverage Δ
src/modules/kvs/kvstxn.c          78.97% <87.5%> (+0.9%) ⬆️
src/modules/kvs/kvs.c             65.21% <90%> (+0.28%) ⬆️
src/common/libflux/rpc.c          93.38% <0%> (-0.83%) ⬇️
src/common/libutil/base64.c       95.07% <0%> (-0.71%) ⬇️
src/common/libflux/future.c       88.78% <0%> (ø) ⬆️
src/common/libflux/message.c      81.72% <0%> (+0.47%) ⬆️
src/common/libflux/mrpc.c         86.66% <0%> (+1.17%) ⬆️

Member

@garlick garlick left a comment

Why does continuing to merge after a commit has begun processing prevent fallback on error?

Would be good to explain this in the commit message so that we aren't tempted to add it back later for performance if it would break something.

@garlick
Member

garlick commented Feb 23, 2018

(oops meant to add that as a single review comment on the first commit)

Member

@garlick garlick left a comment

Some comments, mainly suggestions for improving commit messages.

@@ -1222,8 +1222,7 @@ static int commit_merge (commit_t *dest, commit_t *src)
 /* Merge ready commits that are mergeable, where merging consists of
  * popping the "donor" commit off the ready list, and appending its
  * ops to the top commit. The top commit can be appended to if it
- * hasn't started, or is still building the rootcpy, e.g. stalled
- * walking the namespace.
+ * hasn't started.
Member

Suggestion: add a note here on why only COMMIT_STATE_INIT is mergeable.

Commit message summary should be more descriptive, e.g. "modules/kvs: only merge commit in INIT state" or similar.

@@ -1219,10 +1219,46 @@ static int commit_merge (commit_t *dest, commit_t *src)
return -1;
}

static commit_t *commit_create_empty (commit_mgr_t *cm)
Member

Suggestion: change commit message to "modules/kvs: merge to new empty commit".

Suggestion: restructure to avoid repetition, e.g.

if (!(cnew = calloc ()))
    goto error_nomem;
if (!(cnew->ops = json_array()))
    goto error_nomem;
...
error_nomem:
    commit_destroy (cnew);
    errno = ENOMEM;
    return NULL;
...
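
A compilable version of the suggested shape might look like the sketch below. It is only an illustration under stated assumptions: `struct commit`, its `ops`/`names` buffers, and the `capacity` parameter are hypothetical stand-ins (the real `commit_t` holds jansson arrays, replaced here with plain buffers so the example builds without external libraries). The point is the single `error_nomem` label, which relies on `commit_destroy()` tolerating a partially built or NULL object.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for commit_t. */
struct commit {
    char **ops;
    char **names;
    size_t capacity;
};

/* Safe on NULL and on partially initialized objects. */
static void commit_destroy (struct commit *c)
{
    if (c) {
        free (c->ops);          /* free (NULL) is a no-op */
        free (c->names);
        free (c);
    }
}

static struct commit *commit_create_empty (size_t capacity)
{
    struct commit *cnew;

    assert (capacity > 0);
    if (!(cnew = calloc (1, sizeof (*cnew))))
        goto error_nomem;
    if (!(cnew->ops = calloc (capacity, sizeof (cnew->ops[0]))))
        goto error_nomem;
    if (!(cnew->names = calloc (capacity, sizeof (cnew->names[0]))))
        goto error_nomem;
    cnew->capacity = capacity;
    return cnew;
error_nomem:
    commit_destroy (cnew);      /* one cleanup path for every failure */
    errno = ENOMEM;
    return NULL;
}
```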

@@ -1164,59 +1164,34 @@ int commit_mgr_ready_commit_count (commit_mgr_t *cm)

Member

Not sure what the "atomic" comment refers to. This appears to just be about how this function cleans up on error?

Better commit summary message would be useful, as well as expanded commit main message.

Member Author

Yeah, perhaps "atomic" is the wrong word. In the past, data structures were modified on the fly as merging occurred. Any error would lead to exit(). As we refactored exit() away and returned errors, we couldn't return half modified data structures. So a number of functions were modified to be "atomic", where the data structure was successfully modified completely or not at all. Maybe there's a better word than "atomic" here.

Member

Ah, well I would say in the context of processing transactions/commits, the "atomic" term is a bit overloaded :-) Maybe just "fully clean up on error"?
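
The "modified completely or not at all" behavior discussed in this thread can be sketched as below. Everything here is hypothetical: `struct oplist`, `oplist_merge()`, and the rule that an op id of zero or less cannot be merged exist only so the loop has a way to fail midway. The technique shown is building the result off to the side and publishing it only once nothing can fail.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical op list; an op id <= 0 stands in for "any op that
 * cannot be merged".
 */
struct oplist {
    int *ops;
    size_t len;
};

/* Append every op in src to dest.  Returns 0 on success, -1 with errno
 * set on failure.  On failure dest is exactly as it was before the call.
 */
static int oplist_merge (struct oplist *dest, const struct oplist *src)
{
    size_t i, newlen;
    int *tmp;

    assert (dest && src);
    if (src->len == 0)
        return 0;
    if (src->len > SIZE_MAX / sizeof (*tmp) - dest->len) {
        errno = EOVERFLOW;
        return -1;
    }
    newlen = dest->len + src->len;
    if (!(tmp = malloc (newlen * sizeof (*tmp)))) {
        errno = ENOMEM;
        return -1;
    }
    /* Build the result off to the side. */
    for (i = 0; i < dest->len; i++)
        tmp[i] = dest->ops[i];
    for (i = 0; i < src->len; i++) {
        if (src->ops[i] <= 0) {     /* failure midway: discard the copy */
            free (tmp);
            errno = EINVAL;
            return -1;
        }
        tmp[dest->len + i] = src->ops[i];
    }
    /* Nothing can fail past this point, so publish the result. */
    free (dest->ops);
    dest->ops = tmp;
    dest->len = newlen;
    return 0;
}
```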

@@ -1253,6 +1253,7 @@ int commit_mgr_merge_ready_commits (commit_mgr_t *cm)
{
Member

maybe it would be clearer to make the commit summary "modules/kvs: don't modify ready queue on error"?

@@ -45,7 +45,9 @@

#define FENCE_READY_MASK 0x01

Member

maybe "modules/kvs: preserve orig commits during merge"?

@@ -163,6 +163,13 @@ int commit_set_aux_errnum (commit_t *c, int errnum)
return c->aux_errnum;
Member

Suggestion: modules/kvs: try commits individually if merged commit fails

"core KVS file" could be KVS main or kvs.c.

Member

or "try orig commit if merged commit fails"

@chu11
Member Author

chu11 commented Feb 23, 2018

Why does continuing to merge after a commit has begun processing prevent fallback on error?

Ahh, I should change that commit log message. It's no longer true (was from a prior attempt). I left this in for another reason. Now that we are no longer removing commits from the queue, I'd have to regularly scan the ready queue to see if there are new things to merge.

That or keep a pointer to the "last merge" point. Hmmm. I suppose this would be doable, it's just a single pointer. Let me think about this.

chu11 added 8 commits March 6, 2018 16:45
Fix forgotten change to function name.
In kvstxn_mgr_merge_ready_transactions(), instead of merging
transactions into the current head ready transaction, create a
new empty transaction and merge contents into it.  Then push
that new transaction onto the head of the ready list.

Requires users to call kvstxn_mgr_get_ready_commit() after the
merge to get the new head.
With recent changes, kvstxn_merge() no longer needs to be fully cleaned
up on error.  An error code can be returned to the caller
kvstxn_mgr_merge_ready_commits(), which will handle full cleanup.
When merging transactions, also ensure flags are identical.
Alter logic in kvstxn_mgr_merge_ready_transactions(), so that
on error, no modifications to the kvstxn ready queue occur.
Add internal checks that ensure only kvstxns that are
ready for processing are passed to processing functions.

Add unit tests appropriately.
Do not destroy transactions after they have been merged.  Instead
flag them as components of a larger merge.  When the kvstxn
of a set of merged transactions completes/is removed, at that point
in time remove all of the components of the larger merge.

As a consequence of this change and for optimization purposes, once
a merger of transactions has occurred, there can no longer be any more
mergers until the head merged transaction has completed.  If this were
not done, the ready queue would constantly be iterated through and
new head merged transactions would be created.  This can be optimized
at a later time.

Add unit tests.
In kvstxn_mgr_remove_transaction(), support a flag that lets the user
fall back from a merged kvstxn to the original transactions that made
up the merge.

By doing so, the user need not send an error to all transactions merged
into that kvstxn.  Instead, each of the original transactions
can be replayed individually, and an error will only be sent to
the offending commit/fence transaction.

Support kvstxn_fallback_mergeable() so the user knows whether a kvstxn
can fall back.

In kvstxn_apply(), take advantage of this by not sending an error
when a merged kvstxn can fall back.  As an exception,
do not fall back if it's a "death"-like error (e.g. ENOMEM).

Add internal kvstxn API unit tests.

Fixes flux-framework#1337
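
The decision described in the last commit message can be sketched as a small pure function. The names below (`enum txn_action`, `errnum_is_fatal()`, `txn_next_action()`) are hypothetical and are not the kvstxn API; `fallback_mergeable` stands for whatever `kvstxn_fallback_mergeable()` reports. ENOMEM is the only "death"-like error the commit message names, so it is the only one listed here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical names; only the decision logic follows the commit message. */
enum txn_action {
    TXN_ACTION_COMPLETE,    /* no error: finish normally */
    TXN_ACTION_FALLBACK,    /* retry the original transactions one by one */
    TXN_ACTION_SEND_ERROR,  /* report errnum to the requester(s) */
};

/* "Death"-like errors are not retried.  ENOMEM is the only example the
 * commit message gives; any further entries would be an assumption.
 */
static bool errnum_is_fatal (int errnum)
{
    return errnum == ENOMEM;
}

static enum txn_action txn_next_action (int errnum, bool fallback_mergeable)
{
    assert (errnum >= 0);
    if (errnum == 0)
        return TXN_ACTION_COMPLETE;
    if (fallback_mergeable && !errnum_is_fatal (errnum))
        return TXN_ACTION_FALLBACK;
    return TXN_ACTION_SEND_ERROR;
}
```

With this shape, an error in a merged transaction reaches a requester only after the original transactions have been replayed individually and one of them fails on its own.
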
@chu11 chu11 force-pushed the issue1337-part3 branch from 3fb1fd7 to 96b46c0 Compare March 7, 2018 21:39
@chu11
Member Author

chu11 commented Mar 7, 2018

Just re-pushed with updated patches based on current master. Discounting the renaming of data structures/variables/names/etc., changes are largely the same. I did decide to squash some patches into other ones.

The one notable difference is that I removed my prior change where merges could only occur for transactions in state KVSTXN_STATE_INIT. Instead, I do not allow merges if a merge has already occurred. The net effect is identical, the logic is clearer, and it protects against a corner case where the user calls the merge function multiple times.

I put in the commit message why I did this and note that the reason for doing this could be optimized in the future. I may try and optimize before this PR is merged. Gonna think about it a bit, but didn't want that to be the hold up for pushing/merging this PR.

@chu11
Member Author

chu11 commented Mar 7, 2018

and two soak runs to compare

master

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2100  0.2300  0.2400  0.2362  0.2400  0.2900 

this branch

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2100  0.2300  0.2400  0.2353  0.2400  0.2900 

A little surprised the mean is faster; perhaps just a lucky run. Either way, performance is at least in the same ballpark.

@coveralls

Coverage Status

Coverage increased (+0.04%) to 78.825% when pulling 96b46c0 on chu11:issue1337-part3 into c6c48fd on flux-framework:master.

@garlick
Member

garlick commented Mar 7, 2018

I instead do not allow merges if a merge has already occurred

Does this limit merging to 2:1?

@chu11
Member Author

chu11 commented Mar 7, 2018

It limits merging to whatever was in the ready queue at the time of the merge, be it 2 transactions or a bajillion transactions.

@grondo
Contributor

grondo commented Mar 7, 2018

or a bajillion transactions.

I'll need to see a test case added for that.
😜

@garlick
Member

garlick commented Mar 7, 2018

OK, I thought I must have misunderstood you there. Good! Ready for merge?

@chu11
Member Author

chu11 commented Mar 7, 2018

yup, and I'll write up an issue for the optimization of merges. Already working on an idea.

@garlick garlick merged commit 5dc1611 into flux-framework:master Mar 7, 2018
@grondo grondo mentioned this pull request May 10, 2018
@chu11 chu11 deleted the issue1337-part3 branch June 5, 2021 17:02