Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: transfer all pending prerequisites during command cancellation #17939

Merged

Conversation

nvanbenschoten
Copy link
Member

@nvanbenschoten nvanbenschoten commented Aug 26, 2017

Fixes #16266.

The first two commits overhaul the cmdQCancelTest type, which allows us to do more in-depth testing on command cancellation. This includes testing exactly where each command is with respect to the CommandQueue at a given time (pending, cancelled, running, or finished). We can then assert that dependencies are properly maintained during command cancellation. The test framework also adds support for handling local keys, which are tested in more depth in the fourth commit.

The third commit fixes the issue discussed in #16266.
The use of the cancelled flag on cmds in the CommandQueue was flawed in
two ways. First, we were only setting it on the cmd for the
SpanAccess/spanScope combination that was active during the context
cancellation. This meant that cmds in later SpanAccess/spanScope combinations
would not have the flag set even if its corresponding batch was cancelled.
Second, neither the cancelled flag nor the prereq list was not being set on
child cmds, only on parent cmds. This meant that context cancellation would
not work properly for BatchRequests that create more than one *cmd for any
access/scope combination.

This commit fixes both of these issues by removing the cancelled flag and
making sure that parent and child cmds keep prereqs in-sync. Instead of
using the cancelled flag to signify a cancelled cmd, we now
uses the cmd.prereq slice itself to signify cmd cancellation. cmds now
remove prerequisites from their prereq set as the prereqs stop pending.
If a cmd cancels early, it will leave around a non-empty prereq set.
The transitive dependency migration happens whenever a prereqs stop pending
while still holding onto prereqs itself. In this case, the dependent command
will add all remaining prereqs to its set. This should be less error-prone
and more easy to reason about since we have to maintain less state.

\cc. @petermattis @tschottdorf

@nvanbenschoten nvanbenschoten requested a review from a team August 26, 2017 01:01
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@petermattis
Copy link
Collaborator

The removal of the cmd.cancelled field looks good, but we need to reproduce the problem in #16266 before giving this the go ahead.


Reviewed 4 of 4 files at r1.
Review status: 2 of 5 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.


pkg/storage/replica_test.go, line 2456 at r2 (raw file):

	instrs = append(instrs, cancelInstr{reqOverride: &heartbearBa, cancel: false})
	instrs = append(instrs, cancelInstr{reqOverride: &pushBa, cancel: false})
	ct.RunWithoutInitialSpan(t, instrs)

That's some serious test infrastructure. I'm not following how you're controlling the cancellation order described in #16266.


Comments from Reviewable

@bdarnell
Copy link
Contributor

Reviewed 4 of 4 files at r1, 3 of 3 files at r2.
Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


pkg/storage/replica.go, line 2039 at r2 (raw file):

						// Either the prerequisite command finished executing or it was cancelled.
						// If the command finished cleanly, there's nothing for us to do except
						// wait for the next prerequisite to finish. Here, prereq.prereqs == nil

So this is equivalent to the old code and the second commit is just a test with no functional change so far, right?


pkg/storage/replica_test.go, line 2474 at r1 (raw file):

}

// letCmdsRun unblocks commands that are ready to exit the CommandQueue and finsh. It

s/Run/Finish/


pkg/storage/replica_test.go, line 2488 at r1 (raw file):

	ba *roachpb.BatchRequest, action storagebase.CommandQueueAction,
) {
	if _, ok := ba.Requests[0].GetInner().(*roachpb.DeleteRangeRequest); ok {

What if you use reqOverride? Is it OK not to instrument those? (nevermind, I see the change in the second commit. If you're up for a little commit surgery it would be nice to either backport the cmdQCancelTestTimestamp part to the first commit or move the introduction of reqOverride to the second)


pkg/storage/replica_test.go, line 2456 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

That's some serious test infrastructure. I'm not following how you're controlling the cancellation order described in #16266.

Are you relying on the randomness in cancelCmds? It's not great to introduce a test that is expected to fail by becoming flaky instead of being deterministic. It would be better to be able to pass in a cancellation order (perhaps while leaving the randomized option for a fuzzy test).


Comments from Reviewable

@bdarnell
Copy link
Contributor

Review status: all files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.


pkg/storage/replica_test.go, line 2434 at r2 (raw file):

	endTxnBa.Timestamp = now
	endTxnBa.Txn = &txn
	endTxnBa.Add(&roachpb.EndTransactionRequest{

EndTransactionRequest with a split trigger declares a lot more keys than a plain EndTransactionRequest. That may be necessary to trigger the bug. Specifically, with a SplitTrigger the EndTransaction both reads and writes the abort cache key, which I think would move the detection of that collision into a different scope/access pair.


Comments from Reviewable

@petermattis
Copy link
Collaborator

Review status: all files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.


pkg/storage/replica.go, line 2039 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

So this is equivalent to the old code and the second commit is just a test with no functional change so far, right?

No, this is where the bug was hiding. Previously the code added the prereq.prereqs only if prereq.cancelled was set. But there was a bug in setting prereq.cancelled: we were only doing it for newCmd. Any remaining commands in newCmds were not having their prereqs propagated to dependents correctly.


Comments from Reviewable

@bdarnell
Copy link
Contributor

Review status: all files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.


pkg/storage/replica.go, line 2039 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

No, this is where the bug was hiding. Previously the code added the prereq.prereqs only if prereq.cancelled was set. But there was a bug in setting prereq.cancelled: we were only doing it for newCmd. Any remaining commands in newCmds were not having their prereqs propagated to dependents correctly.

OK, got it now. This comment confused me and I had expected the solution to look more like repeating the block that previously set .cancelled for all the newCmds. Anyway, we still need to figure out a reproduction to make sure.


Comments from Reviewable

@petermattis
Copy link
Collaborator

Review status: all files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.


pkg/storage/replica.go, line 2039 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

OK, got it now. This comment confused me and I had expected the solution to look more like repeating the block that previously set .cancelled for all the newCmds. Anyway, we still need to figure out a reproduction to make sure.

Agreed we need to figure out a reproduction. I'm testing this fix on denim. Given how long the problem takes to reproduce doing so won't give us complete reassure if there are no failures, though it will indicate this isn't the complete fix if a failure does occur.


Comments from Reviewable

@nvanbenschoten
Copy link
Member Author

Yeah, I wasn't suggesting pushing this until we actually reproduced the issue in TestReplicaCommandQueueCancellation16266 and proven that the fix addresses it.


Review status: all files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.


pkg/storage/replica.go, line 2039 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Agreed we need to figure out a reproduction. I'm testing this fix on denim. Given how long the problem takes to reproduce doing so won't give us complete reassure if there are no failures, though it will indicate this isn't the complete fix if a failure does occur.

Yeah like @petermattis said, the second commit fixes the bug discussed in #16266 as well. It does so by removing the cancelled flag, which was being mishandled. Now dependency transfer during cancellation is simply handled by leaving *cmds in a command's prereqs slice when closing it's pending channel, which is a bit more elegant and leaves less room for errors like we had before. I'll improve the comment so that it's more explicit.

@petermattis I'm also going to make a prediction that this doesn't fix the issue because I think I found a bigger one with how cmd cancellation currently interacts with the coveringOptimization.


pkg/storage/replica_test.go, line 2474 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

s/Run/Finish/

Done.


pkg/storage/replica_test.go, line 2488 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

What if you use reqOverride? Is it OK not to instrument those? (nevermind, I see the change in the second commit. If you're up for a little commit surgery it would be nice to either backport the cmdQCancelTestTimestamp part to the first commit or move the introduction of reqOverride to the second)

Done.


pkg/storage/replica_test.go, line 2456 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Are you relying on the randomness in cancelCmds? It's not great to introduce a test that is expected to fail by becoming flaky instead of being deterministic. It would be better to be able to pass in a cancellation order (perhaps while leaving the randomized option for a fuzzy test).

Good point. I'll make the order of cancellation deterministic.


Comments from Reviewable

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/fixCancel branch from 998c131 to 57e29bd Compare August 28, 2017 07:21
@nvanbenschoten
Copy link
Member Author

nvanbenschoten commented Aug 28, 2017

I updated this with the full fix discussed in #16266 along with more testing around local keys. PTAL.

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/fixCancel branch 2 times, most recently from b11f9dc to 7ded8a8 Compare August 28, 2017 07:42
@petermattis
Copy link
Collaborator

:lgtm:


Review status: 2 of 5 files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.


pkg/storage/command_queue.go, line 226 at r6 (raw file):

	for i := range c.children {
		c.children[i].prereqs = c.prereqs
	}

I think this loop is an optimization and not strictly necessary. When a dependent command waits on the prereqs of c.children[i], if one of those prereqs was cancelled we'll call ResolvePendingPrereq which will add the remaining prereqs for that cancelled dependency to c.children[i].

I'd prefer to leave this loop out because I'm having a hard time reasoning about the locking here and why it is safe to modify c.children[i].prereqs. If I'm missing something and the loop is necessary, then there is a gap in the test coverage as removing the loop doesn't cause any failures.


pkg/storage/replica.go, line 2033 at r6 (raw file):

				// prerequisites can add new transitive dependencies to a command, so newCmd.prereqs
				// should not be accessed directly (see ResolvePendingPrereq).
				for pre := newCmd.PendingPrereq(); pre != nil; pre = newCmd.PendingPrereq() {

The repetition of newCmd.PendingPrereq() could be avoided by doing:

for {
  pre := newCmd.PendingPrereq()
  if pre == nil {
    break
  }
}

A few more lines of code, but I find this clearer. Feel free to ignore.


Comments from Reviewable


// Truncate the command's prerequisite list so that it no longer includes
// the first prerequisite. Before doing so, nil out prefix of slice to allow
// GC of the first command. This prevents us from leaking large chunks of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment accurate? I.e. is

var s []*int
for i := 0; i < 1E10; i++ {
  s = append(s, new(5))
  s = s[:1]
}

any different from the version with a s[0] = nil added in? Seems that in both cases GC can only reclaim the old chunk of backing memory after it's had to copy into a new one.

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/fixCancel branch from 7ded8a8 to de768dc Compare August 28, 2017 14:15
@nvanbenschoten
Copy link
Member Author

nvanbenschoten commented Aug 28, 2017

Review status: 2 of 5 files reviewed at latest revision, 8 unresolved discussions.


pkg/storage/command_queue.go, line 215 at r5 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Is this comment accurate? I.e. is

var s []*int
for i := 0; i < 1E10; i++ {
  s = append(s, new(5))
  s = s[:1]
}

any different from the version with a s[0] = nil added in? Seems that in both cases GC can only reclaim the old chunk of backing memory after it's had to copy into a new one.

We're not trying to allow the initial chunk of the slice to be GCed, we're trying to allow GC of the cmd it points to.


pkg/storage/command_queue.go, line 226 at r6 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I think this loop is an optimization and not strictly necessary. When a dependent command waits on the prereqs of c.children[i], if one of those prereqs was cancelled we'll call ResolvePendingPrereq which will add the remaining prereqs for that cancelled dependency to c.children[i].

I'd prefer to leave this loop out because I'm having a hard time reasoning about the locking here and why it is safe to modify c.children[i].prereqs. If I'm missing something and the loop is necessary, then there is a gap in the test coverage as removing the loop doesn't cause any failures.

So right now it is necessary because a parent *cmd and its child cmds share the same underlying prereq slice. The reason you weren't seeing any test failures is because the c.prereqs = append(c.prereqs, pre.prereqs...) above happened to be reallocating the slice in all test scenarios. If I initialize prereqs in CommandQueue.getPrereqs with a large capacity so that the append doesn't always reallocate, multiple tests will fail. I can call this out more directly in the comment or remove the c.prereqs[0] = nil above so that we're not modifying the same slice. I strongly prefer the first option though, because letting parent and child cmds diverge is what got us into this mess.


pkg/storage/replica.go, line 2033 at r6 (raw file):

Previously, petermattis (Peter Mattis) wrote…

The repetition of newCmd.PendingPrereq() could be avoided by doing:

for {
  pre := newCmd.PendingPrereq()
  if pre == nil {
    break
  }
}

A few more lines of code, but I find this clearer. Feel free to ignore.

Done. Good call.


Comments from Reviewable

@petermattis
Copy link
Collaborator

Review status: 2 of 5 files reviewed at latest revision, 9 unresolved discussions, all commit checks successful.


pkg/storage/command_queue.go, line 215 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We're not trying to allow the initial chunk of the slice to be GCed, we're trying to allow GC of the *cmd it points to.

The comment talks about leaking, but it isn't really a leak, just a reference.


pkg/storage/command_queue.go, line 222 at r6 (raw file):

	// Keep the prereq list in all child commands in-sync in case dependents
	// have dependencies on the expanded child commands instead of the parent
	// command. If we didn't keep them in-sync, cancellation may not properly

The first sentence of this comment implies that dependents could have dependencies on the parent command, but that never happens because getPrereqs always returns commands without children.


pkg/storage/command_queue.go, line 226 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

So right now it is necessary because a parent *cmd and its child cmds share the same underlying prereq slice. The reason you weren't seeing any test failures is because the c.prereqs = append(c.prereqs, pre.prereqs...) above happened to be reallocating the slice in all test scenarios. If I initialize prereqs in CommandQueue.getPrereqs with a large capacity so that the append doesn't always reallocate, multiple tests will fail. I can call this out more directly in the comment or remove the c.prereqs[0] = nil above so that we're not modifying the same slice. I strongly prefer the first option though, because letting parent and child cmds diverge is what got us into this mess.

Ah, so much subtlety here. Rather than keeping the c.children[i].prereqs in sync with c.prereqs, perhaps they should point to the same slice:

type cmd struct {
  ...
  prereqs    *[]*cmd
  prereqsBuf []*cmd
}

...

func (cq *CommandQueue) add(...) {
  ...
  cmd.prereqsBuf = prereqs
  cmd.prereqs = &cmd.prereqsBuf
  ...
  for i, span := range spans {
    child := &cmd.children[i]
    child.prereqs = cmd.prereqs
  }
}

Comments from Reviewable

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/fixCancel branch from de768dc to 01d3d51 Compare August 28, 2017 15:09
@nvanbenschoten
Copy link
Member Author

Review status: 2 of 5 files reviewed at latest revision, 9 unresolved discussions.


pkg/storage/command_queue.go, line 215 at r5 (raw file):
I guess leak in the literal sense wasn't the right word. I rephrased it to:

Without this, large chunks of the dependency graph would be prevented from being GCed longer than necessary, especially during cascade command cancellation.


pkg/storage/command_queue.go, line 222 at r6 (raw file):

Previously, petermattis (Peter Mattis) wrote…

The first sentence of this comment implies that dependents could have dependencies on the parent command, but that never happens because getPrereqs always returns commands without children.

Couldn't it happen if the request has only 1 span, so that there are no children cmds and only a parent? I think I may be misunderstanding your question, but this comment is removed now anyway because of your other suggestion.


pkg/storage/command_queue.go, line 226 at r6 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Ah, so much subtlety here. Rather than keeping the c.children[i].prereqs in sync with c.prereqs, perhaps they should point to the same slice:

type cmd struct {
  ...
  prereqs    *[]*cmd
  prereqsBuf []*cmd
}

...

func (cq *CommandQueue) add(...) {
  ...
  cmd.prereqsBuf = prereqs
  cmd.prereqs = &cmd.prereqsBuf
  ...
  for i, span := range spans {
    child := &cmd.children[i]
    child.prereqs = cmd.prereqs
  }
}

I like that! Done.


Comments from Reviewable

@petermattis
Copy link
Collaborator

Review status: 2 of 5 files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.


pkg/storage/command_queue.go, line 222 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Couldn't it happen if the request has only 1 span, so that there are no children cmds and only a parent? I think I may be misunderstanding your question, but this comment is removed now anyway because of your other suggestion.

If a parent doesn't have children, is it really a parent?


Comments from Reviewable

@nvanbenschoten
Copy link
Member Author

Review status: 2 of 5 files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.


pkg/storage/command_queue.go, line 222 at r6 (raw file):

Previously, petermattis (Peter Mattis) wrote…

If a parent doesn't have children, is it really a parent?

That's a larger philosophical question than I'm qualified to answer.


Comments from Reviewable

@bdarnell
Copy link
Contributor

:lgtm:


Reviewed 2 of 3 files at r7, 1 of 1 files at r8.
Review status: all files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


pkg/storage/replica_test.go, line 2514 at r8 (raw file):

		ct.RunWithoutInitialSpan(instrs, []int{2})
	})
	t.Run("CancelEndTxn", func(t *testing.T) {

Are these other scenarios derived from anything in particular?


Comments from Reviewable

@nvanbenschoten
Copy link
Member Author

Review status: all files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


pkg/storage/replica_test.go, line 2514 at r8 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Are these other scenarios derived from anything in particular?

No, I was just trying to think of interesting cases that created dependency chains across local keys and global keys and across reads and writes. Are there any other interesting scenarios you think we should add here that might create interesting dependency patterns?


Comments from Reviewable

@bdarnell
Copy link
Contributor

Review status: all files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


pkg/storage/replica_test.go, line 2514 at r8 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

No, I was just trying to think of interesting cases that created dependency chains across local keys and global keys and across reads and writes. Are there any other interesting scenarios you think we should add here that might create interesting dependency patterns?

EndTransaction with a SplitTrigger is interesting since it mixes local and global keys and blocks on the entire data span.


Comments from Reviewable

This allows us to test more about which commands are let through the command
queue at a given time and make more directed assertions. This is important
when testing the exact effect of command cancellation.
Previously, `cmdQCancelTest` would cancel commands in a random order.
The test structure now takes the `cancelOrder` as a parameter.
The use of the `cancelled` flag on `cmds` in the `CommandQueue` was flawed in
two ways. First, we were only setting it on the `cmd` for the
SpanAccess/spanScope combination that was active during the context
cancellation. This meant that `cmds` in later SpanAccess/spanScope combinations
would not have the flag set even if its corresponding batch was cancelled.
Second, neither the `cancelled` flag nor the `prereq` list was not being set on
child `cmds`, only on parent `cmds`. This meant that context cancellation would
not work properly for BatchRequests that create more than one `*cmd` for any
access/scope combination.

This commit fixes both of these issues by removing the `cancelled` flag and
making sure that parent and child `cmds` keep `prereqs` in-sync. Instead of
using the `cancelled` flag to signify a cancelled `cmd`, the change now
uses the `cmd.prereq` slice itself to signify `cmd` cancellation. `cmds` now
remove prerequisites from their `prereq` set as the prereqs stop pending.
If a `cmd` cancels early, it will leave around a non-empty `prereq` set.
The transitive dependency migration happens whenever a prereqs stop pending
while still holding onto prereqs itself. In this case, the dependent command
will add all remaining prereqs to its set. This should be less error-prone
and more easy to reason about since we have to maintain less state.
@nvanbenschoten
Copy link
Member Author

Review status: all files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


pkg/storage/replica_test.go, line 2514 at r8 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

EndTransaction with a SplitTrigger is interesting since it mixes local and global keys and blocks on the entire data span.

Done.


Comments from Reviewable

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/fixCancel branch from 01d3d51 to 93f7fe8 Compare August 28, 2017 19:54
@nvanbenschoten
Copy link
Member Author

I'll merge if #17815 proves successful.

@petermattis has the binary running on denim been updated with the newer version of this (the one from last night onward)?

@petermattis
Copy link
Collaborator

@petermattis has the binary running on denim been updated with the newer version of this (the one from last night onward)?

I ran a binary with your PR as of this morning for much of the day. No failures detected. I've since switched to testing something else on denim, but will go back to the splits test again tomorrow. I think this can merge if #17815 proves successful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

storage: transitive prerequisites are not always transferred on command cancellation
5 participants