
storage: add to perf comment in handleRaftReady #38380

Merged: 1 commit merged into cockroachdb:master from fix/handle-ready-comment on Jun 26, 2019

Conversation

tbg (Member) commented Jun 24, 2019

I pulled on a thread in etcd/raft today where I was worried that
something was off with our perf improvements around sending MsgApp
before persisting HardState and Entries. Everything ended up being
OK but I updated a comment.

Release note: None

tbg requested review from a team and nvanbenschoten on June 24, 2019 20:35
cockroach-teamcity (Member) commented: This change is Reviewable

nvanbenschoten (Member) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)


pkg/storage/replica_raft.go, line 653 at r1 (raw file):

	// etcd/raft does not support commit quorums that do not include the leader,
	// even though the Raft thesis states that this would technically be safe:
	// > The leader may even commit an entry before it has been written to its

If we're going to change the indentation for quotes, do it here too.


pkg/storage/replica_raft.go, line 663 at r1 (raw file):

or for which a corresponding Entry is not previously persisted

I'm not sure I understand this case. When would a leader be sending out a commit index above the index of entries which it has not previously persisted? The paragraph before this one presents an argument for why this would never happen. Am I missing something? Perhaps around leadership changes?

Or are you talking more abstractly here and then saying that this isn't actually a problem given the implementation? If so, I'd soften the language around "the committed index may require persisting Entries from the current Ready". That's only true in the single-node Raft group, right? In which case, there are no messages.


pkg/storage/replica_raft.go, line 716 at r1 (raw file):

		// committed index larger than the last index, which will set off
		// alarms. Right now everything is in the same batch and so that problem
		// does not exist, but should that change we can lower Committed before

s/can lower/can manually lower/ to make it clear that this is an action we'd have to take in this code path. Also maybe talk about why it's ok to do so.

Also, wouldn't appending the entries before writing the HardState be another solution to this if we no longer had atomic batch semantics for the writes? Or is that not true given some of the changes you're making at the moment?
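The "manually lower Committed" option in this thread amounts to clamping HardState.Commit to the last entry index that is already durable whenever the HardState would be written in a separate batch from the entries. A minimal, hypothetical Go sketch of that clamp — the hardState type and clampCommit helper are illustrative stand-ins, not the actual replica_raft.go code:

```go
// Hypothetical sketch only: clamp the commit index so a separately written
// HardState never points past the entries that are already durable.
// hardState is a stand-in for raftpb.HardState; this is not CockroachDB code.
package main

import "fmt"

type hardState struct {
	Term   uint64
	Vote   uint64
	Commit uint64
}

// clampCommit lowers hs.Commit to lastPersistedIndex if it points beyond it.
func clampCommit(hs hardState, lastPersistedIndex uint64) hardState {
	if hs.Commit > lastPersistedIndex {
		hs.Commit = lastPersistedIndex
	}
	return hs
}

func main() {
	hs := hardState{Term: 7, Vote: 1, Commit: 120}
	// Entries up to index 115 are durable; 116..120 are still in flight.
	fmt.Printf("%+v\n", clampCommit(hs, 115)) // Commit is lowered to 115.
}
```

Lowering Commit like this is generally safe: the persisted commit index is only a durable lower bound on what is known to be committed, and raft re-advances it from later messages once the remaining entries have been appended.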

tbg force-pushed the fix/handle-ready-comment branch from 73a5555 to 816bcc3 on June 25, 2019 08:07
tbg force-pushed the fix/handle-ready-comment branch from 816bcc3 to 0635f83 on June 25, 2019 08:08
tbg (Member, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/storage/replica_raft.go, line 663 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
or for which a corresponding Entry is not previously persisted

I'm not sure I understand this case. When would a leader be sending out a commit index above the index of entries which it has not previously persisted? The paragraph before this one presents an argument for why this would never happen. Am I missing something? Perhaps around leadership changes?

Or are you talking more abstractly here and then saying that this isn't actually a problem given the implementation? If so, I'd soften the language around "the committed index may require persisting Entries from the current Ready". That's only true in the single-node Raft group, right? In which case, there are no messages.

I think the answers to your questions were all in the comment, but I agree that it was hard to parse. I took a stab at reworking this whole block more holistically, PTAL


pkg/storage/replica_raft.go, line 716 at r1 (raw file):

Also, wouldn't appending the entries before writing the HardState be another solution to this if we no longer had atomic batch semantics for the writes? Or is that not true given some of the changes you're making at the moment?

Yes, see later in the same sentence:

(or take care to persist the HardState after appending)

However, when we make the changes for etcd-io/etcd#7625 (comment) we want the opposite: we require that certain appends won't be persisted without updating HardState.Commit first. Even worse, when you're a follower catching up on a whole chunk of historical log (which may contain config changes), you'll get commit indexes pointing at the end of each chunk of entries, and you must persist them together with the chunk (not before, because the commit index would then refer to unknown entries; not after, because that would violate the joint-quorum invariant); see the sketch below. Hmm, we might have to just lower the committed index to lastIndex if necessary when starting up a group. Anyway, this is certainly something to keep looking at as perf work happens on the Raft ready loop.

I'm also planning to copy this commentary in this file upstream so that by the time all of these questions become real there's documentation on the allowed concurrency patterns in etcd/raft itself.
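To make the ordering constraint above concrete, here is a hypothetical sketch of a follower staging a chunk of entries together with the HardState whose Commit points at the end of that chunk, in one atomic batch. The entry, hardState, and batch types and the stageChunk helper are illustrative stand-ins, not the actual storage code:

```go
// Hypothetical sketch: a follower catching up on a chunk of log entries whose
// accompanying commit index points at the end of the chunk. The entries and
// the HardState must become durable in the same atomic write. All types here
// are stand-ins, not the actual CockroachDB storage layer.
package main

import "fmt"

type entry struct {
	Index uint64
	Data  []byte
}

type hardState struct {
	Term, Vote, Commit uint64
}

// batch models an atomic storage write (e.g. a RocksDB write batch).
type batch struct {
	entries   []entry
	hardState *hardState
}

// stageChunk stages the chunk and its HardState together so that neither a
// commit index referring to unknown entries (Commit persisted first) nor
// entries without the matching Commit (Commit persisted last) can ever be
// observed after a crash.
func stageChunk(b *batch, chunk []entry, hs hardState) {
	b.entries = append(b.entries, chunk...)
	b.hardState = &hs
	// The caller commits b to stable storage atomically.
}

func main() {
	var b batch
	chunk := []entry{{Index: 11}, {Index: 12}, {Index: 13}}
	stageChunk(&b, chunk, hardState{Term: 4, Commit: 13})
	fmt.Printf("staged %d entries, commit=%d\n", len(b.entries), b.hardState.Commit)
}
```

With a single atomic batch the "not before / not after" constraint is satisfied trivially; it only turns into a real ordering question if the entries and the HardState are ever split into separate writes, which is where a clamp like the one sketched earlier (or clamping the committed index to lastIndex at group startup) would come in.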

nvanbenschoten (Member) left a comment

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

tbg (Member, Author) commented Jun 25, 2019

bors r=nvanbenschoten

craig bot pushed a commit that referenced this pull request Jun 25, 2019
38380: storage: add to perf comment in handleRaftReady r=nvanbenschoten a=tbg

Co-authored-by: Tobias Schottdorf <[email protected]>
craig bot (Contributor) commented Jun 25, 2019

Build failed

tbg (Member, Author) commented Jun 26, 2019

In the above flake (unrelated to this PR), acceptance/version-upgrade failed on

t.Fatalf("%d: %s", i, err)

and the harness never quite manages to shut down, eventually hitting the 10-minute timeout and then getting stuck until TeamCity comes in for a mercy killing (artifacts are lost).

@andreimatei are you touching this at all in #30977? This seems to be a bug in roachtest (on top of a bug in the test/crdb).

[  11] acceptance/version-upgrade: ??? (0s)
22:40:37 test.go:808: test failure: 	upgrade.go:429,retry.go:184,upgrade.go:423,upgrade.go:568,acceptance.go:90,test.go:1249: 2: dial tcp 127.0.0.1:26259: connect: connection refused
22:40:37 cluster.go:344: > /go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor local --oneshot --ignore-empty-nodes
22:40:38 test.go:808: test failure: 	cluster.go:1033,context.go:122,cluster.go:1022,panic.go:406,test.go:783,test.go:773,upgrade.go:429,retry.go:184,upgrade.go:423,upgrade.go:568,acceptance.go:90,test.go:1249: dead node detection: /go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor local --oneshot --ignore-empty-nodes: exit status 1 4: skipped
		2: dead
		1: 23187
		3: 23306
		Error:  2: dead
22:40:38 cluster.go:1102: running (fast) consistency checks on node 1
[  12] acceptance/version-upgrade: ??? (0s)
[  13] acceptance/version-upgrade: ??? (0s)
[  14] acceptance/version-upgrade: ??? (0s)
[  15] acceptance/version-upgrade: ??? (0s)
[  16] acceptance/version-upgrade: ??? (0s)
[  17] acceptance/version-upgrade: ??? (0s)
[  18] acceptance/version-upgrade: ??? (0s)
[  19] acceptance/version-upgrade: ??? (0s)
22:48:55 test.go:808: test failure: 	test.go:1235: test timed out (10m0s)
22:48:55 cluster.go:992: fetching debug zip
22:48:55 cluster.go:344: > /go/src/github.com/cockroachdb/cockroach/bin/roachprod ssh local:1 -- ./cockroach debug zip --url {pgurl:1} debug.zip
22:48:56 cluster.go:270: > /go/src/github.com/cockroachdb/cockroach/bin/roachprod get local:1 debug.zip artifacts/acceptance/version-upgrade/debug.zip
local: getting debug.zip artifacts/acceptance/version-upgrade/debug.zip

tbg (Member, Author) commented Jun 26, 2019

Filed #38428

bors r=nvanbenschoten

craig bot pushed a commit that referenced this pull request Jun 26, 2019
38380: storage: add to perf comment in handleRaftReady r=nvanbenschoten a=tbg

Co-authored-by: Tobias Schottdorf <[email protected]>
craig bot (Contributor) commented Jun 26, 2019

Build succeeded

craig bot merged commit 0635f83 into cockroachdb:master on Jun 26, 2019
andreimatei (Contributor) commented
@andreimatei are you touching this at all in #30977? This seems to be a bug in roachtest (on top of a bug in the test/crdb).

Hopefully... But I don't entirely follow. Do you understand what's going on?

tbg deleted the fix/handle-ready-comment branch on July 12, 2019 08:15
tbg (Member, Author) commented Jul 12, 2019

No, let's see if it happens again.
