storage: coalesce heartbeats #9380
Conversation
Review status: 0 of 10 files reviewed at latest revision, 13 unresolved discussions, some commit checks failed.

storage/raft.proto, line 32 at r1 (raw file):
Lots of duplicate (and unnecessary?) info in …

storage/raft.proto, line 34 at r1 (raw file):
What is the difference between …

storage/raft.proto, line 36 at r1 (raw file):
Might be better to use …

storage/replica.go, line 1720 at r1 (raw file):
No need for …

storage/replica.go, line 1727 at r1 (raw file):
Is continuing here appropriate?

storage/replica.go, line 1742 at r1 (raw file):
This is racy and can block if you lose the race. The better structure is something along the lines of: … But really this makes me suspect that using a buffered channel for …

storage/replica.go, line 1913 at r1 (raw file):
I would have … Or, given that this is the only place …

storage/store.go, line 298 at r1 (raw file):
Yes.

storage/store.go, line 2336 at r1 (raw file):
Couldn't you just move this metrics accounting below the coalesced heartbeat fanout?

storage/store.go, line 2749 at r1 (raw file):
Is there an advantage to making this separate from …

storage/store.go, line 2772 at r1 (raw file):
Since you need per-node heartbeat lists here, rather than a single slice of queued heartbeats in …

storage/store.go, line 2796 at r1 (raw file):
Per an earlier comment, since …

storage/store.go, line 2806 at r1 (raw file):
Maybe loop over all of the replicas and call …

Comments from Reviewable
Force-pushed from 73f77be to 0f479fc.
Reworked, now failing some …

Review status: 0 of 10 files reviewed at latest revision, 13 unresolved discussions, some commit checks failed.

storage/raft.proto, line 32 at r1 (raw file):
…
Force-pushed from 0f479fc to c3fb5cd.
Review status: 0 of 10 files reviewed at latest revision, 19 unresolved discussions, some commit checks failed.

storage/raft.proto, line 67 at r2 (raw file):
I'm not sure if … is allowed, but if it is, add …

storage/replica.go, line 2266 at r2 (raw file):
You should move this into a …

storage/replica.go, line 2283 at r2 (raw file):
This isn't necessary. Appending to a nil slice is the same as appending to an empty slice.

storage/store.go, line 2338 at r2 (raw file):
Should failure to fan out one heartbeat stop processing of the remainder?

storage/store.go, line 2353 at r2 (raw file):
I wonder if this should be moved to the caller. It is slightly odd to see …

storage/store.go, line 2757 at r2 (raw file):
I would send the queued heartbeats first. By enqueuing the tick first you're going to be ticking and queueing up some additional heartbeats before sending.

Comments from Reviewable
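To illustrate the nil-slice point from the comment above (storage/replica.go, line 2283), here is a small self-contained Go check; the variable names are made up for this example.

```go
package main

import "fmt"

func main() {
	var fromNil []int // nil slice: append allocates on first use
	fromNil = append(fromNil, 1, 2)

	fromEmpty := []int{} // empty, non-nil slice
	fromEmpty = append(fromEmpty, 1, 2)

	fmt.Println(fromNil, fromEmpty) // both print [1 2]
}
```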
It might be useful to rework this in light of quiescence and only quiesce …

Review status: 0 of 10 files reviewed at latest revision, 19 unresolved discussions, some commit checks failed.

Comments from Reviewable
When most ranges are quiesced, I think that quiescence largely undermines the argument for coalescing heartbeats - I don't think it's going to help much in the steady state of our current test clusters. Where it might help, though, is in cases like #9446, in which quiescing doesn't work as well. I'm concerned that sending per-range information in the heartbeats is not going to save enough relative to the non-coalesced case. In the million-ranges-per-store case that motivated coalesced heartbeats, those messages would still be many megabytes (of course, we're not to that scale yet). What if we batched all raft messages by introducing a …?

Review status: 0 of 10 files reviewed at latest revision, 19 unresolved discussions, some commit checks failed.

Comments from Reviewable
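As a rough illustration of the "batch all Raft messages" idea (the concrete message name in this comment did not survive extraction, so every identifier below is hypothetical):

```go
package main

import "fmt"

// raftMessage stands in for a single per-range Raft message.
type raftMessage struct {
	RangeID int64
	Payload []byte
}

// raftMessageBatch is a hypothetical wrapper carrying every Raft message
// bound for one destination store, so the transport sends one request
// per destination instead of one per range.
type raftMessageBatch struct {
	ToNodeID, ToStoreID int32
	Messages            []raftMessage
}

func main() {
	batch := raftMessageBatch{ToNodeID: 2, ToStoreID: 2}
	for rangeID := int64(1); rangeID <= 3; rangeID++ {
		batch.Messages = append(batch.Messages, raftMessage{RangeID: rangeID})
	}
	fmt.Printf("one send carrying %d messages\n", len(batch.Messages))
}
```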
Re: batching all Raft messages: heartbeats can be delayed for brief periods without issue. But don't we need to send …
Yeah, I was thinking we'd read from the channel until we reached our target message size or it would block, and then send that. The channel buffer would have time to fill up while we …
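A minimal sketch of the "read until the target message size or the channel would block" idea described here; the function name, the byte-slice payloads, and the size threshold are all illustrative.

```go
package main

import "fmt"

// collectBatch blocks for the first message, then keeps draining msgs
// without blocking until the accumulated size reaches target or the
// channel is empty, returning what it gathered for a single send.
func collectBatch(msgs <-chan []byte, target int) [][]byte {
	first := <-msgs
	batch := [][]byte{first}
	size := len(first)
	for size < target {
		select {
		case m := <-msgs:
			batch = append(batch, m)
			size += len(m)
		default:
			// The channel would block: send what we have now. The buffer
			// refills while the send is in flight.
			return batch
		}
	}
	return batch
}

func main() {
	msgs := make(chan []byte, 8)
	for i := 0; i < 5; i++ {
		msgs <- make([]byte, 100)
	}
	batch := collectBatch(msgs, 1<<20)
	fmt.Printf("sending a batch of %d messages\n", len(batch))
}
```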
Force-pushed from 5f9adb6 to fbdb90c.
I think quiescence and coalesced heartbeats complement each other well - quiescence ensures that we never have a million active ranges per store, and coalesced heartbeats ensure that when we do have a few thousand active ranges per store, we keep network traffic down.

Test failures: currently failing …

Review status: 0 of 10 files reviewed at latest revision, 19 unresolved discussions, some commit checks pending.

storage/raft.proto, line 67 at r2 (raw file):
…
Force-pushed from d78d223 to 5a2856f.
Reviewed 1 of 10 files at r1, 1 of 9 files at r2, 7 of 8 files at r3, 1 of 1 files at r4.

storage/replica.go, line 2323 at r4 (raw file):
wrap this in accordance with the style guide.

storage/replica.go, line 2325 at r4 (raw file):
this construction is oddly repetitive. How about: …

storage/store.go, line 297 at r4 (raw file):
you should call this …

storage/store.go, line 2383 at r4 (raw file):
why is this …

storage/store.go, line 2413 at r4 (raw file):
why is this …

storage/store.go, line 2971 at r4 (raw file):
wrap this in accordance with the style guide.

Comments from Reviewable
Force-pushed from 5a2856f to 8778a98.
Force-pushed from 3e677f6 to ef6c024.
Ready for another review. Only failing …

Review status: 5 of 10 files reviewed at latest revision, 25 unresolved discussions, some commit checks pending.

storage/replica.go, line 2323 at r4 (raw file):
…
Reviewed 5 of 5 files at r5.

storage/store.go, line 2383 at r4 (raw file):
…
Force-pushed from 1022dda to f4cf613.
In the TestRaftRemoveRace failure, the log line …

Reviewed 1 of 10 files at r1, 1 of 9 files at r2, 3 of 8 files at r3, 3 of 5 files at r5, 2 of 2 files at r6.

storage/raft.proto, line 32 at r1 (raw file):
…
Review status: all files reviewed at latest revision, 12 unresolved discussions, some commit checks failed.

storage/replica.go, line 2332 at r6 (raw file):
I think you can do this locking after the switch statement. Even better to move it down to just before …

storage/store.go, line 2353 at r2 (raw file):
…
Reviewed 2 of 2 files at r6.

Comments from Reviewable
Force-pushed from ecc01bd to e3f8e85.
Reviewed 1 of 9 files at r8, 1 of 5 files at r10, 1 of 7 files at r12, 1 of 2 files at r13, 1 of 3 files at r14, 5 of 5 files at r15.

Comments from Reviewable
Force-pushed from 6c40fe5 to b194fc4.
Force-pushed from a978f94 to aa2546e.
Moved the heartbeat coalescing loop to its own separate timer, which is set to run at 10x the speed of the raft tick loop. This makes heartbeating operate on a faster cadence than quiescing, which ensures that delayed heartbeat responses do not unquiesce ranges.

Review status: 0 of 11 files reviewed at latest revision, 19 unresolved discussions, some commit checks pending.

storage/replica.go, line 2494 at r13 (raw file):
This is the offending line that causes a deadlock in (and resulting failure of) …

Comments from Reviewable
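A rough sketch of the separate, faster heartbeat timer described above; the tick interval and loop body are illustrative, not the PR's actual values.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Illustrative value only; the store's real raft tick interval may differ.
	const raftTickInterval = 200 * time.Millisecond

	// Run the coalesced-heartbeat flush 10x per raft tick so that
	// heartbeat responses are not delayed by up to a full tick.
	heartbeatTicker := time.NewTicker(raftTickInterval / 10)
	defer heartbeatTicker.Stop()

	stop := time.After(raftTickInterval) // end the demo after one raft tick
	for {
		select {
		case <-heartbeatTicker.C:
			fmt.Println("flush coalesced heartbeats")
		case <-stop:
			return
		}
	}
}
```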
Reviewed 20 of 21 files at r16.

Comments from Reviewable
Review status: all files reviewed at latest revision, 25 unresolved discussions, all commit checks successful.

pkg/storage/replica.go, line 2390 at r16 (raw file):
This comment seems out of place given that you're not doing anything to address it here. And is this even still a TODO?

pkg/storage/store.go, line 523 at r16 (raw file):
Need additional commentary here or somewhere explaining why delaying the heartbeat for an entire tick is problematic.

pkg/storage/store.go, line 664 at r16 (raw file):
How did you arrive at …

pkg/storage/store.go, line 2456 at r16 (raw file):
Doesn't …

pkg/storage/store.go, line 2483 at r16 (raw file):
Ditto.

pkg/storage/store.go, line 2487 at r16 (raw file):
Ditto.

pkg/storage/store.go, line 3078 at r16 (raw file):
Should call …

Comments from Reviewable
Review status: 9 of 11 files reviewed at latest revision, 25 unresolved discussions.

pkg/storage/replica.go, line 2390 at r16 (raw file):
…
Review status: 9 of 11 files reviewed at latest revision, 21 unresolved discussions.

pkg/storage/store.go, line 664 at r16 (raw file):
…
Coalesce heartbeats and heartbeat responses bound for the same store into a single proto. Introduce a new environment flag (default: true) COCKROACH_ENABLE_COALESCED_HEARTBEATS to turn this feature off. Added metrics and a graph in the admin UI to track the number of queued heartbeats waiting to be coalesced. The frequency of heartbeat coalescing is controlled by a new timer, which is set by default to run 10x per raft tick.
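A minimal sketch of how an environment gate like COCKROACH_ENABLE_COALESCED_HEARTBEATS can be read, assuming the described default of true; the real code presumably uses its own env helper rather than this standalone function.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// coalescedHeartbeatsEnabled returns true unless the environment variable
// is set to an explicit false value, matching the "default: true" behavior
// described in the commit message.
func coalescedHeartbeatsEnabled() bool {
	v, ok := os.LookupEnv("COCKROACH_ENABLE_COALESCED_HEARTBEATS")
	if !ok {
		return true
	}
	enabled, err := strconv.ParseBool(v)
	if err != nil {
		return true
	}
	return enabled
}

func main() {
	fmt.Println("coalesced heartbeats enabled:", coalescedHeartbeatsEnabled())
}
```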
Review status: 9 of 11 files reviewed at latest revision, 21 unresolved discussions.

pkg/storage/store.go, line 524 at r17 (raw file):
…
Review status: 9 of 11 files reviewed at latest revision, 19 unresolved discussions, all commit checks successful.

pkg/storage/store.go, line 664 at r16 (raw file):
…
Coalesce heartbeats and heartbeat responses bound for the same store
into a single proto. Introduce a new environment flag (default: true
to enable this in integration tests, will be changed to false before
merging) COCKROACH_ENABLE_COALESCED_HEARTBEATS to turn this feature
on. Added metrics and a graph in the admin UI to track the number of
queued heartbeats waiting to be coalesced.
Unlike the earlier proposal for coalesced heartbeats, which would contain zero
additional information, this version sends large heartbeat packets that wrap up
all the individual heartbeat messages.
I would appreciate feedback on the following points:
… as opposed to the current channel-based implementation?
This change is Reviewable
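To make the per-store coalescing described in this change concrete, here is an illustrative sketch; every identifier is invented for this example and does not come from the PR.

```go
package main

import "fmt"

// storeIdent identifies a destination node/store pair.
type storeIdent struct {
	NodeID, StoreID int32
}

// heartbeat stands in for one per-range heartbeat message.
type heartbeat struct {
	RangeID int64
}

// coalescer queues outgoing heartbeats by destination store so that each
// flush sends one combined message per destination instead of one message
// per range.
type coalescer struct {
	heartbeats map[storeIdent][]heartbeat
}

func (c *coalescer) enqueue(to storeIdent, hb heartbeat) {
	c.heartbeats[to] = append(c.heartbeats[to], hb)
}

func (c *coalescer) flush() {
	for to, hbs := range c.heartbeats {
		fmt.Printf("send one message with %d heartbeats to n%d/s%d\n",
			len(hbs), to.NodeID, to.StoreID)
		delete(c.heartbeats, to)
	}
}

func main() {
	c := &coalescer{heartbeats: map[storeIdent][]heartbeat{}}
	dst := storeIdent{NodeID: 2, StoreID: 2}
	for r := int64(1); r <= 3; r++ {
		c.enqueue(dst, heartbeat{RangeID: r})
	}
	c.flush()
}
```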