kvserver: e2e flow control for raft messages #79755
Labels: A-admission-control, C-enhancement (solution expected to add code/behavior + preserve backward-compat; pg compat issues are the exception)
I read some of the discussion above and linked issues and PRs. I think this is going to be entirely encompassed by #95563.
Is your feature request related to a problem? Please describe.
The flow control for raft messages is currently overly simplistic. A high-level overview of the current state follows.
Sending
Raft messages originate in a `raft.Ready` (`cockroach/pkg/kv/kvserver/replica_raft.go`, lines 596 to 598 in `40fd5d3`) and are passed to the `RaftTransport` queues here (`cockroach/pkg/kv/kvserver/replica_raft.go`, lines 1443 to 1465 in `40fd5d3`).
There is a single queue for messages to each destination store (irrespective of rangeID), which is essentially a 10k-buffered channel (`cockroach/pkg/kv/kvserver/raft_transport.go`, lines 556 to 557 in `c23d1aa`), from which batches of messages are put on the wire (`cockroach/pkg/kv/kvserver/raft_transport.go`, lines 515 to 534 in `c23d1aa`).
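To make the send path concrete, here is a minimal sketch of the behavior described above (not the actual `RaftTransport` code; `RaftMessage`, `sendQueue`, `drainBatch`, and the 128-message batch size are illustrative, and only the ~10k buffer per destination store comes from the snippet above): enqueueing is non-blocking, so a full queue means the message is dropped and raft has to recover via retries, and a drain step batches whatever is queued onto the wire.

```go
package main

import "fmt"

// RaftMessage stands in for the real outbound raft message type (illustrative).
type RaftMessage struct {
	RangeID int64
	To      int64 // destination store
}

// sendQueue models the per-destination-store queue: one buffered channel
// shared by all ranges with a replica on that store, regardless of rangeID.
type sendQueue struct {
	ch chan RaftMessage
}

func newSendQueue() *sendQueue {
	// The real queue is roughly a 10k-entry buffered channel.
	return &sendQueue{ch: make(chan RaftMessage, 10000)}
}

// enqueue is non-blocking: when the queue is full the message is dropped,
// which is the coarse back-pressure the issue wants to replace.
func (q *sendQueue) enqueue(m RaftMessage) bool {
	select {
	case q.ch <- m:
		return true
	default:
		return false // queue full; drop and rely on raft retries
	}
}

// drainBatch pulls up to maxBatch queued messages so they can be put on
// the wire in a single send.
func (q *sendQueue) drainBatch(maxBatch int) []RaftMessage {
	var batch []RaftMessage
	for len(batch) < maxBatch {
		select {
		case m := <-q.ch:
			batch = append(batch, m)
		default:
			return batch
		}
	}
	return batch
}

func main() {
	q := newSendQueue()
	for i := 0; i < 5; i++ {
		if !q.enqueue(RaftMessage{RangeID: int64(i), To: 2}) {
			fmt.Println("dropped message for range", i)
		}
	}
	fmt.Printf("putting %d messages on the wire\n", len(q.drainBatch(128)))
}
```

The property that matters for this issue is that the only back-pressure signal is "queue full, drop": there is no accounting for message size, priority, or receiver health.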
Receiving
Messages arrive here (`cockroach/pkg/kv/kvserver/raft_transport.go`, lines 386 to 395 in `c23d1aa`) and are put on the raft receive queue (`cockroach/pkg/kv/kvserver/store_raft.go`, lines 167 to 185 in `98a66b5`).
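A similarly rough sketch of the receive queue (illustrative names and limits; the real queue in `store_raft.go` is more involved): inbound messages are appended to a bounded queue, the caller learns whether the queue went from empty to nonempty so it can enqueue the range with the raft scheduler, and a full queue again means the message is dropped.

```go
package main

import (
	"fmt"
	"sync"
)

// raftReceiveQueue is an illustrative per-range inbound queue; the size
// limit and field names are assumptions.
type raftReceiveQueue struct {
	mu     sync.Mutex
	msgs   [][]byte
	maxLen int
}

// Append adds a message and reports whether the queue transitioned from
// empty to nonempty (so the caller knows to wake the raft scheduler) and
// whether the message was accepted at all.
func (q *raftReceiveQueue) Append(msg []byte) (wasEmpty, ok bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.msgs) >= q.maxLen {
		return false, false // queue full: drop the inbound message
	}
	wasEmpty = len(q.msgs) == 0
	q.msgs = append(q.msgs, msg)
	return wasEmpty, true
}

func main() {
	q := &raftReceiveQueue{maxLen: 3}
	for i := 0; i < 5; i++ {
		wasEmpty, ok := q.Append([]byte("msg"))
		if !ok {
			fmt.Println("dropped inbound raft message", i)
			continue
		}
		if wasEmpty {
			fmt.Println("enqueue range with raft scheduler")
		}
	}
}
```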
The next available raft scheduler goroutine picks up the nonempty queue (`cockroach/pkg/kv/kvserver/store_raft.go`, lines 442 to 460 in `98a66b5`) and hands it to the raft group, which does not perform work yet but instead stages everything for the next Ready (`cockroach/pkg/kv/kvserver/store_raft.go`, lines 254 to 256 in `98a66b5`). Finally, a scheduler goroutine processes the replica for raft ready handling (`cockroach/pkg/kv/kvserver/store_raft.go`, lines 497 to 506 in `98a66b5`), which is where I/O happens and the memory is flushed to disk.
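The scheduler side, under the same caveat (names like `workItem` and `processRange` are made up for illustration): a fixed pool of worker goroutines pulls range IDs off a shared queue, steps the buffered inbound messages into the raft group without doing I/O, and then handles the resulting Ready, which is where log entries are synced and outgoing messages are handed back to the transport.

```go
package main

import "fmt"

// workItem says which range needs processing (illustrative).
type workItem struct {
	rangeID int64
}

// processRange mimics the two phases described above: stepping the queued
// messages into the raft group is cheap and does no I/O; handling the
// Ready is where entries are written to disk and messages are sent.
func processRange(id int64, inbound []string) {
	for _, m := range inbound {
		fmt.Printf("r%d: step %s into raft group (no I/O yet)\n", id, m)
	}
	// Ready handling: persist entries, apply committed commands, and hand
	// outgoing messages to the RaftTransport send queues.
	fmt.Printf("r%d: handle Ready (write to disk, send messages)\n", id)
}

func main() {
	work := make(chan workItem, 8)
	done := make(chan struct{})
	// A scheduler worker: in the real system a fixed pool of these
	// goroutines is shared by all ranges on the store.
	go func() {
		for w := range work {
			processRange(w.rangeID, []string{"MsgApp", "MsgHeartbeat"})
		}
		close(done)
	}()
	work <- workItem{rangeID: 1}
	work <- workItem{rangeID: 2}
	close(work)
	<-done
}
```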
Describe the solution you'd like
Introduce a model that more deliberately handles the case in which a follower receives more load than it can handle. Control memory usage while prioritizing heartbeats (and generally allowing QoS), without excessive performance cliffs. Integrate with admission control.
See also #79215.
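For illustration only, one shape such a model could take (nothing here is prescribed by this issue; the byte budget, the priority split, and the `flowController` name are assumptions): admit non-urgent messages against a per-destination byte budget that is returned once the receiver has processed them, and always let heartbeats through so liveness does not depend on bulk traffic draining.

```go
package main

import (
	"fmt"
	"sync"
)

type priority int

const (
	prioLow  priority = iota // e.g. bulk log appends
	prioHigh                 // e.g. heartbeats, which must not be starved
)

// flowController is an illustrative per-destination-store byte budget.
// A real design would integrate with admission control rather than use a
// bare mutex-guarded counter.
type flowController struct {
	mu        sync.Mutex
	available int64 // bytes we may still send
}

// admit reports whether the message may be sent now. High-priority
// traffic (heartbeats) is always admitted; everything else must fit in
// the remaining budget, otherwise the caller should back off.
func (fc *flowController) admit(size int64, pri priority) bool {
	fc.mu.Lock()
	defer fc.mu.Unlock()
	if pri == prioHigh {
		return true
	}
	if size > fc.available {
		return false
	}
	fc.available -= size
	return true
}

// release returns budget once the receiver has processed the message
// (e.g. acknowledged via a returned token).
func (fc *flowController) release(size int64) {
	fc.mu.Lock()
	defer fc.mu.Unlock()
	fc.available += size
}

func main() {
	fc := &flowController{available: 1 << 20} // 1 MiB budget (assumed)
	fmt.Println(fc.admit(512<<10, prioLow))   // true: fits in budget
	fmt.Println(fc.admit(768<<10, prioLow))   // false: would exceed budget
	fmt.Println(fc.admit(1<<10, prioHigh))    // true: heartbeats always pass
	fc.release(512 << 10)
	fmt.Println(fc.admit(768<<10, prioLow)) // true after budget is returned
}
```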
Jira issue: CRDB-15932
Epic CRDB-15069