kvs: support ability to "revert" / "back out" merged commits #1346

chu11 · 2018-02-22T23:51:38Z

As described in #1337, a failure in a merged commit should not fail all transactions that make up that commit. In order to accomplish this, this is what I did:

When merging, do not merge ops/names into an existing commit_t, instead create a new commit_t and merge into that data structure.
Flag the new commit_t as a collection of merges and the other ones as components of a merge. Leave all the component commit_t's on the ready queue as they were.
Push this new merged commit_t onto the ready queue and use it.
If the new merged commit_t succeeds, when it is done remove it and all the "components" of the merge. If the new merged commit_t fails, remove it from the ready queue, and leave all of the "components" there for processing later. Flag them as non-mergeable going forward and let processing continue as normal.
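The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `struct commit`, `struct ready_queue`, `merge_ready()` and `finish_merged()` are hypothetical stand-ins, and an integer `nops` stands in for the real ops/names payload.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_READY 16

/* Hypothetical, simplified commit: nops stands in for the ops/names payload. */
struct commit {
    int nops;
    bool merged;          /* a collection of merged commits */
    bool merge_component; /* has been copied into a merged commit */
    bool no_merge;        /* must be processed individually from now on */
};

/* Hypothetical ready queue; index 0 is the head. */
struct ready_queue {
    struct commit *q[MAX_READY];
    size_t len;
};

static bool mergeable (const struct commit *c)
{
    return !c->merged && !c->merge_component && !c->no_merge;
}

/* Merge the run of mergeable commits at the head into the caller-supplied
 * commit "merged" and push it onto the head.  The originals stay queued.
 * Returns the number of components merged, 0 if there was nothing to
 * merge, or -1 if the queue is full (in which case nothing is modified). */
static int merge_ready (struct ready_queue *rq, struct commit *merged)
{
    size_t n = 0;

    while (n < rq->len && mergeable (rq->q[n]))
        n++;
    if (n < 2)
        return 0;
    if (rq->len == MAX_READY)
        return -1;
    *merged = (struct commit){ .merged = true };
    for (size_t i = 0; i < n; i++) {
        merged->nops += rq->q[i]->nops;
        rq->q[i]->merge_component = true;
    }
    for (size_t i = rq->len; i > 0; i--)
        rq->q[i] = rq->q[i - 1];
    rq->q[0] = merged;
    rq->len++;
    return (int)n;
}

/* Remove the merged commit at the head.  On success its components are
 * removed too; on failure they stay queued, flagged non-mergeable. */
static void finish_merged (struct ready_queue *rq, bool success)
{
    size_t out = 0;

    assert (rq->len > 0 && rq->q[0]->merged);
    for (size_t i = 1; i < rq->len; i++) {
        struct commit *c = rq->q[i];
        if (c->merge_component) {
            if (success)
                continue;               /* dropped along with the merge */
            c->merge_component = false;
            c->no_merge = true;         /* retried individually later */
        }
        rq->q[out++] = c;
    }
    rq->len = out;
}
```

In this sketch a failed merge costs only the merged commit itself: the components are still queued in their original order and are simply processed one at a time.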

Ran soak tests over 1000 jobs, and overall performance is < 1% slower. Some slowdown is understandable, as there are additional allocations and whatnot.

Before

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2100  0.2300  0.2300  0.2353  0.2400  0.3300

After

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2100  0.2300  0.2400  0.2376  0.2400  0.3100

(I should say, the "before" is before PR #1343, the refactor done right before this PR.)

Note, I used the word "revert" in the documentation and code to describe a failed merged commit becoming "un-merged". I'm not 100% happy with the use of this term to describe what's going on, but
couldn't think of a better word. Both it and "backout" give the impression that the commit completed and you want to "revert" or "backout" of it. "unmerge" seems to give the wrong impression
(i.e. it's not mergeable). Perhaps this is something that should be solved with #1344, as once the data structure names are changed, a more obvious choice of language would emerge.

Also, no unit tests at the moment for the kvs module, only some unit tests for the internal commit API. Outside of pure instrumentation, impossible to test as the commit merging is racy. It would be easier once #1341 is implemented, as a stress test across namespaces could be done.

garlick · 2018-02-23T00:22:18Z

This sounds like a clean approach!

Better to get terminology right in the commit log, if possible. Maybe say "fall back to individual commits if merged commit fails"? Or "upon failure of merged commit, retry the original commits individually"?

chu11 · 2018-02-23T00:28:13Z

You know what, I like the idea of using "fallback". I think it is far clearer than "revert" or "unmerge". I'll tweak the commit log, update the variable names, and re-push.

chu11 · 2018-02-23T00:57:41Z

re-pushed using the wording "fallback" instead of "revert" everywhere. Went ahead and squashed it since it was only a change in the last commit.

codecov-io · 2018-02-23T01:22:14Z

Codecov Report

Merging #1346 into master will increase coverage by 0.03%.
The diff coverage is 87.8%.

@@            Coverage Diff             @@
##           master    #1346      +/-   ##
==========================================
+ Coverage   78.47%   78.51%   +0.03%     
==========================================
  Files         162      162              
  Lines       29689    29739      +50     
==========================================
+ Hits        23298    23349      +51     
+ Misses       6391     6390       -1

Impacted Files	Coverage Δ
src/modules/kvs/kvstxn.c	`78.97% <87.5%> (+0.9%)`	⬆️
src/modules/kvs/kvs.c	`65.21% <90%> (+0.28%)`	⬆️
src/common/libflux/rpc.c	`93.38% <0%> (-0.83%)`	⬇️
src/common/libutil/base64.c	`95.07% <0%> (-0.71%)`	⬇️
src/common/libflux/future.c	`88.78% <0%> (ø)`	⬆️
src/common/libflux/message.c	`81.72% <0%> (+0.47%)`	⬆️
src/common/libflux/mrpc.c	`86.66% <0%> (+1.17%)`	⬆️

garlick

Why does continuing to merge after a commit has begun processing prevent fallback on error?

Would be good to explain this in the commit message so that we aren't tempted to add it back later for performance if it would break something.

garlick · 2018-02-23T14:44:31Z

(oops meant to add that as a single review comment on the first commit)

garlick

Some comments, mainly suggestions for improving commit messages.

garlick · 2018-02-23T16:47:40Z

src/modules/kvs/commit.c

@@ -1222,8 +1222,7 @@ static int commit_merge (commit_t *dest, commit_t *src)
 /* Merge ready commits that are mergeable, where merging consists of
 * popping the "donor" commit off the ready list, and appending its
 * ops to the top commit.  The top commit can be appended to if it
- * hasn't started, or is still building the rootcpy, e.g. stalled
- * walking the namespace.
+ * hasn't started.


Suggestion: add a note here on why only COMMIT_STATE_INIT is mergeable.

Commit message summary should be more descriptive, e.g. "modules/kvs: only merge commit in INIT state" or similar.

garlick · 2018-02-23T16:54:58Z

src/modules/kvs/commit.c

@@ -1219,10 +1219,46 @@ static int commit_merge (commit_t *dest, commit_t *src)
    return -1;
 }

+static commit_t *commit_create_empty (commit_mgr_t *cm)


Suggestion: change commit message to "modules/kvs: merge to new empty commit".

Suggestion: restructure to avoid repetition, e.g.

    if (!(cnew = calloc ()))
        goto error_nomem;
    if (!(cnew->ops = json_array ()))
        goto error_nomem;
    ...
error_nomem:
    commit_destroy (cnew);
    errno = ENOMEM;
    return NULL;
    ...
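A compilable version of this single-exit cleanup pattern might look like the sketch below. It is only an illustration: `struct commit` here is a hypothetical two-member stand-in, plain `calloc()` replaces `json_array()` so the example has no external dependencies, and `INITIAL_SLOTS` is an arbitrary size chosen for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define INITIAL_SLOTS 8   /* arbitrary size for this example */

/* Hypothetical stand-in for commit_t */
struct commit {
    int *ops;
    char **names;
};

/* Safe on NULL and on partially constructed objects, so every failure
 * path in the constructor can share it.  Preserves errno for the caller. */
static void commit_destroy (struct commit *c)
{
    if (c) {
        int saved_errno = errno;
        free (c->ops);
        free (c->names);
        free (c);
        errno = saved_errno;
    }
}

static struct commit *commit_create_empty (void)
{
    struct commit *cnew;

    if (!(cnew = calloc (1, sizeof (*cnew))))
        goto error_nomem;
    if (!(cnew->ops = calloc (INITIAL_SLOTS, sizeof (*cnew->ops))))
        goto error_nomem;
    if (!(cnew->names = calloc (INITIAL_SLOTS, sizeof (*cnew->names))))
        goto error_nomem;
    return cnew;
error_nomem:
    commit_destroy (cnew);   /* cnew may be NULL or half built */
    errno = ENOMEM;
    return NULL;
}
```

The pattern relies on `calloc()` zeroing the struct, so members that were never allocated are NULL and `free()` on them is a no-op.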

garlick · 2018-02-23T17:06:07Z

src/modules/kvs/commit.c

@@ -1164,59 +1164,34 @@ int commit_mgr_ready_commit_count (commit_mgr_t *cm)



Not sure what the "atomic" comment refers to. This appears to just be about how this function cleans up on error?

Better commit summary message would be useful, as well as expanded commit main message.

Yeah, perhaps "atomic" is the wrong word. In the past, data structures were modified on the fly as merging occurred. Any error would lead to exit(). As we refactored exit() away and returned errors, we couldn't return half modified data structures. So a number of functions were modified to be "atomic", where the data structure was successfully modified completely or not at all. Maybe there's a better word than "atomic" here.

Ah, well I would say in the context of processing transactions/commits, the "atomic" term is a bit overloaded :-) Maybe just "fully clean up on error"?
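To make the "modified completely or not at all" property concrete, here is a small sketch. It assumes a hypothetical `struct ops` holding a plain int array rather than the real commit_t contents, and `ops_append_all()` is an illustrative name, not a function from this PR.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container standing in for a commit's list of ops */
struct ops {
    int *v;
    size_t len;
};

/* Append all of src to dest.  On failure dest is left exactly as it was:
 * every check and the allocation happen before dest is touched. */
static int ops_append_all (struct ops *dest, const struct ops *src)
{
    int *tmp;

    if (src->len == 0)
        return 0;
    /* Refuse sizes whose byte count would overflow size_t */
    if (src->len > SIZE_MAX / sizeof (int) - dest->len) {
        errno = ENOMEM;
        return -1;
    }
    /* A failed realloc() leaves the original block valid and unchanged */
    if (!(tmp = realloc (dest->v, (dest->len + src->len) * sizeof (int)))) {
        errno = ENOMEM;
        return -1;
    }
    memcpy (tmp + dest->len, src->v, src->len * sizeof (int));
    dest->v = tmp;
    dest->len += src->len;
    return 0;
}
```

Because the caller never sees a half-modified `dest`, it can return the error upward instead of calling exit().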

garlick · 2018-02-23T17:42:29Z

src/modules/kvs/commit.c

@@ -1253,6 +1253,7 @@ int commit_mgr_merge_ready_commits (commit_mgr_t *cm)
 {


maybe it would be clearer to make the commit summary "modules/kvs: don't modify ready queue on error"?

garlick · 2018-02-23T17:44:11Z

src/modules/kvs/commit.c

@@ -45,7 +45,9 @@

 #define FENCE_READY_MASK 0x01



maybe "modules/kvs: preserve orig commits during merge"?

garlick · 2018-02-23T17:47:07Z

src/modules/kvs/commit.c

@@ -163,6 +163,13 @@ int commit_set_aux_errnum (commit_t *c, int errnum)
    return c->aux_errnum;


Suggestion: modules/kvs: try commits individually if merged commit fails

"core KVS file" could be KVS main or kvs.c.

or "try orig commit if merged commit fails"

chu11 · 2018-02-23T18:20:53Z

Why does continuing to merge after a commit has begun processing prevent fallback on error?

Ahh, I should change that commit log message. It's no longer true (was from a prior attempt). I left this in for another reason. Now that we are no longer removing commits from the queue, I'd have to regularly scan the ready queue to see if there are new things to merge.

That or keep a pointer to the "last merge" point. Hmmm. I suppose this would be doable, it's just a single pointer. Let me think about this.

In kvstxn_mgr_merge_ready_transactions(), instead of merging transactions into the current head ready transaction, create a new empty transaction and merge contents into it. Then push that new transaction onto the head of the ready list. Requires users to call kvstxn_mgr_get_ready_commit() after the merge to get the new head.

Do not destroy transactions after they have been merged. Instead flag them as components of a larger merge. When the kvstxn of a set of merged transactions completes/is removed, at that point in time remove all of the components of the larger merge. As a consequence of this change and for optimization purposes, once a merger of transactions has occurred, there can no longer be any more mergers until the head merged transaction has completed. If this were not done, the ready queue would constantly be iterated through and new head merged transactions would be created. This can be optimized at a later time. Add unit tests.

In kvstxn_mgr_remove_transaction() support flag for user to fall back a merged kvstxn to the original transactions that made up the merge. By doing so, the user need not send an error to all transactions merged into that kvstxn. Instead, each of the original transactions can be replayed individually, and an error will only be sent to the offending commit/fence transaction. Support kvstxn_fallback_mergeable() so user knows if a kvstxn can be fallen back on. In kvstxn_apply(), take advantage of this by not sending an error when a kvstxn's merging can be fallen back on. As an exception, do not fall back if it's a "death"-like error (e.g. ENOMEM). Add internal kvstxn API unit tests. Fixes flux-framework#1337
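The decision described in that last commit message could be sketched as below. The types and names (`struct txn`, `txn_fallback_mergeable()`, `on_txn_failure()`) are hypothetical simplifications, not the kvstxn API, and treating only ENOMEM as a "death"-like error is an assumption based on the single example given in the message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified transaction */
struct txn {
    bool merged;   /* built by merging other transactions */
};

enum failure_action {
    REPORT_ERROR,        /* send the error to the requester(s) */
    RETRY_INDIVIDUALLY   /* drop the merge, replay the originals one by one */
};

/* Only a merged transaction has originals to fall back on */
static bool txn_fallback_mergeable (const struct txn *t)
{
    return t->merged;
}

static enum failure_action on_txn_failure (const struct txn *t, int errnum)
{
    /* Assumption: ENOMEM is the only "death"-like error considered here.
     * Retrying would most likely fail the same way, so report it. */
    if (errnum == ENOMEM)
        return REPORT_ERROR;
    if (txn_fallback_mergeable (t))
        return RETRY_INDIVIDUALLY;
    return REPORT_ERROR;
}
```

With this split, an error such as EINVAL in a merged transaction is not reported immediately; it surfaces only when the offending original is replayed on its own.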

chu11 · 2018-03-07T21:49:11Z

Just re-pushed with updated patches based on current master. Discounting the renaming of data structures/variables/names/etc., changes are largely the same. I did decide to squash some patches into other ones.

The one notable difference is I removed my prior change where merges can only occur for transactions in state KVSTXN_STATE_INIT. I instead do not allow merges if a merge has already occurred. The net effect is identical, clearer, and does protect against a corner case where the user calls the merge function multiple times.

I put in the commit message why I did this and note that the reason for doing this could be optimized in the future. I may try and optimize before this PR is merged. Gonna think about it a bit, but didn't want that to be the hold up for pushing/merging this PR.

chu11 · 2018-03-07T22:03:19Z

and two soak runs to compare

master

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2100  0.2300  0.2400  0.2362  0.2400  0.2900

this branch

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2100  0.2300  0.2400  0.2353  0.2400  0.2900

little surprised the mean is faster. Perhaps just a lucky run. Or at least ballpark similar performance.

coveralls · 2018-03-07T22:05:29Z

Coverage increased (+0.04%) to 78.825% when pulling 96b46c0 on chu11:issue1337-part3 into c6c48fd on flux-framework:master.

garlick · 2018-03-07T22:13:52Z

I instead do not allow merges if a merge has already occurred

Does this limit merging to 2:1?

chu11 · 2018-03-07T22:53:34Z

It limits merging to whatever was in the ready queue at the time of the merge, be it 2 transactions or a bajillion transactions.

grondo · 2018-03-07T22:56:36Z

or a bajillion transactions.

I'll need to see a test case added for that.
😜

garlick · 2018-03-07T23:05:52Z

OK, I thought I must have misunderstood you there. Good! Ready for merge?

chu11 · 2018-03-07T23:13:55Z

yup, and I'll write up an issue for the optimization of merges. Already working on an idea.

chu11 self-assigned this Feb 22, 2018

chu11 force-pushed the issue1337-part3 branch from f680b56 to 3fb1fd7 Compare February 23, 2018 00:57

garlick reviewed Feb 23, 2018

chu11 added 8 commits March 6, 2018 16:45

modules/kvs: Update kvstxn comments

ba7c249

Fix forgotten change to function name.

modules/kvs: Refactor internal kvstxn_merge()

c2d030d

With recent changes, kvstxn_merge() no longer needs to be fully cleaned up on error. An error code can be returned to the caller kvstxn_mgr_merge_ready_commits(), which will handle full cleanup.

modules/kvs: Check flags on kvstxn merge

3f0b313

When merging transactions, also ensure flags are identical.

modules/kvs: don't modify ready queue on error

9dd5fb7

Alter logic in kvstxn_mgr_merge_ready_transactions(), so that on error, no modifications to the kvstxn ready queue occur.

modules/kvs: Add check in internal kvstxn API

79c176a

Add internal checks that ensure only kvstxn's that are ready for processing are passed to processing functions. Add unit tests appropriately.

chu11 force-pushed the issue1337-part3 branch from 3fb1fd7 to 96b46c0 Compare March 7, 2018 21:39

chu11 mentioned this pull request Mar 7, 2018

kvs: optimize transaction merging #1352

Closed

garlick merged commit 5dc1611 into flux-framework:master Mar 7, 2018

grondo mentioned this pull request May 10, 2018

0.9.0 Release #1479

Closed

chu11 deleted the issue1337-part3 branch June 5, 2021 17:02
