Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

base: reduce RaftMaxInflightBytes to 32 MB #101437

Merged
merged 1 commit into from
Apr 13, 2023

Conversation

erikgrinaker
Copy link
Contributor

@erikgrinaker erikgrinaker commented Apr 13, 2023

This patch reduces the default RaftMaxInflightBytes from 256 MB to 32 MB, to reduce the out-of-memory incidence during bulk operations like RESTORE on clusters with overloaded disks.

RaftMaxInflightBytes specifies the maximum aggregate byte size of Raft log entries that a leader will send to a follower without hearing responses. As such, it also bounds the amount of replication data buffered in memory on the receiver. Individual messages can still exceed this limit (consider the default command size limit at 64 MB).

Normally, RaftMaxInflightMsgs * RaftMaxSizePerMsg will bound this at 4 MB (128 messages at 32 KB each). However, individual messages are allowed to exceed the 32 KB limit, typically large AddSSTable commands that can be around 10 MB each. To prevent followers running out of memory, we place an additional total byte limit of 32 MB, which is 8 times more than normal.

A survey of CC clusters over the past 30 days showed that, excluding a single outlier cluster, the total outstanding raft.rcvd.queued_bytes of any individual node never exceeded 500 MB, and was roughly 0 across all clusters for the majority of time.

Touches #71805.
Resolves #100341.
Resolves #100804.
Resolves #100983.
Resolves #101426.

Epic: none
Release note (ops change): the amount of replication traffic in flight from a single Raft leader to a follower has been reduced from 256 MB to 32 MB, in order to reduce the chance of running out of memory during bulk write operations. This can be controlled via the environment variable COCKROACH_RAFT_MAX_INFLIGHT_BYTES.

@erikgrinaker erikgrinaker requested review from pav-kv and a team April 13, 2023 13:20
@erikgrinaker erikgrinaker requested a review from a team as a code owner April 13, 2023 13:20
@erikgrinaker erikgrinaker self-assigned this Apr 13, 2023
@erikgrinaker erikgrinaker requested a review from a team as a code owner April 13, 2023 13:20
@blathers-crl
Copy link

blathers-crl bot commented Apr 13, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@erikgrinaker
Copy link
Contributor Author

This still needs some benchmarking, but initial results in #100341 are promising.

Copy link
Collaborator

@pav-kv pav-kv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks for experimenting!

This patch reduces the default `RaftMaxInflightBytes` from 256 MB to 32
MB, to reduce the out-of-memory incidence during bulk operations like
`RESTORE` on clusters with overloaded disks.

`RaftMaxInflightBytes` specifies the maximum aggregate byte size of Raft
log entries that a leader will send to a follower without hearing
responses. As such, it also bounds the amount of replication data
buffered in memory on the receiver. Individual messages can still exceed
this limit (consider the default command size limit at 64 MB).

Normally, `RaftMaxInflightMsgs` * `RaftMaxSizePerMsg` will bound this at
4 MB (128 messages at 32 KB each). However, individual messages are
allowed to exceed the 32 KB limit, typically large AddSSTable commands
that can be around 10 MB each. To prevent followers running out of
memory, we place an additional total byte limit of 32 MB, which is 8
times more than normal.

A survey of CC clusters over the past 30 days showed that, excluding a
single outlier cluster, the total outstanding `raft.rcvd.queued_bytes`
of any individual node never exceeded 500 MB, and was roughly 0 across
all clusters for the majority of time.

Epic: none
Release note (ops change): the amount of replication traffic in flight
from a single Raft leader to a follower has been reduced from 256 MB to
32 MB, in order to reduce the chance of running out of memory during
bulk write operations. This can be controlled via the environment
variable `COCKROACH_RAFT_MAX_INFLIGHT_BYTES`.
@erikgrinaker erikgrinaker force-pushed the raft-max-inflight-bytes branch from 42f50f2 to 8136e8e Compare April 13, 2023 13:51
@erikgrinaker erikgrinaker added backport-23.1.x Flags PRs that need to be backported to 23.1 backport-23.1.0 labels Apr 13, 2023
@erikgrinaker
Copy link
Contributor Author

erikgrinaker commented Apr 13, 2023

I ran some kv0 experiments with batches of 10x 1MB blocks at concurrency 32 on a 5-node cluster with 100 GB pd-ssd, which saturated write bandwidth, and we didn't really see any queueing (running without this patch). Even when I starved IO on one node, the queues didn't increase too much (never beyond 100 MB total per node). So it doesn't really seem like this should affect regular SQL traffic.

@erikgrinaker
Copy link
Contributor Author

This PR successfully prevented OOMs in 10 runs of restore/tpce/8TB/aws/nodes=10/cpus=8. Merging to see how it fares in the nightlies, will consider a 23.1 backport after some days of baking.

Thanks for the review, and for implementing RaftMaxInflightBytes in the first place @pavelkalinnikov!

bors r+

@craig
Copy link
Contributor

craig bot commented Apr 13, 2023

Build succeeded:

@blathers-crl
Copy link

blathers-crl bot commented Apr 13, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error setting reviewers, but backport branch blathers/backport-release-23.1-101437 is ready: POST https://api.github.com/repos/cockroachdb/cockroach/pulls/101507/requested_reviewers: 422 Reviews may only be requested from collaborators. One or more of the teams you specified is not a collaborator of the cockroachdb/cockroach repository. []

Backport to branch 23.1.x failed. See errors above.


error setting reviewers, but backport branch blathers/backport-release-23.1.0-101437 is ready: POST https://api.github.com/repos/cockroachdb/cockroach/pulls/101508/requested_reviewers: 422 Reviews may only be requested from collaborators. One or more of the teams you specified is not a collaborator of the cockroachdb/cockroach repository. []

Backport to branch 23.1.0 failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1
Projects
None yet
3 participants