
gossip: delay and batch gossip info propagation #119426

Draft · wants to merge 1 commit into master
Conversation

nvanbenschoten
Member

Fixes #119420.

This commit addresses #119420 by delaying gossiping updated infos to peers by up to 10ms. In doing so, we more effectively batch info updates and bound the amount of time we spend computing info deltas.

With maxHops = 5, a gossipPropagateInfosDelay of 10ms means that we will delay the propagation of an info update by up to 50ms end-to-end (10ms per hop). This should be a reasonable delay for most use cases, given the benefit of this change.

TODO: run some tests.

Release note (performance improvement): gossip info propagation is now delayed by up to 10ms in order to promote more batching of gossip updates. The effect of this is TBD.


blathers-crl bot commented Feb 20, 2024

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Member

This change is Reviewable

@a-robinson
Contributor

a-robinson commented Mar 29, 2024

FWIW this seems like a pretty clear win; I'd love to see it land, @nvanbenschoten.

It's nice to see some interest in gossip performance recently. If there's something particular you'd like help with, let me know and I may be able to.

I have a bunch of old ideas, of varying usefulness, scattered around for how to optimize parts of this code, for example splitting up the infoStore map as in #51838 (comment), plus some other random stuff written up in personal notes. Scanning through them now, I don't think many of them would be huge wins, but I'm happy to do what I can if there's more interest in this part of the code these days.

@spencerkimball
Member

How did you choose 10ms for the delay? Is that an empirically-determined optimum? I can imagine it could be a substantially longer delay – 50ms, or 100ms...? I think it's worth balancing it against the rate at which the items being gossiped change, and how sensitive the system is to increasing levels of staleness in the gossiped info.

Also, putting some randomness into the delay probably makes sense.

@nvanbenschoten
Member Author

How did you choose 10ms for the delay?

@spencerkimball it's not at all empirically determined to be an optimum between staleness and CPU. The value came out of some experimentation with a 1,000-node cluster: at 400 nodes, a 10ms propagation delay was enough to reduce idle CPU load from 0.69 cpus/node (34.8% of each n2-standard-2 instance) to 0.38 cpus/node. This was after #119252 had already reduced the idle load from 1.02 cpus/node to 0.69 cpus/node.¹ Combining those two draft changes allowed a cluster of 1,000 n2-standard-4 instances to mostly just work and pretty comfortably serve 1.2M qps.

I think it's worth balancing it against the rate at which the items being gossiped change, and how sensitive the system is to increasing levels of staleness in the gossiped info.

We've recently bumped up against the upper bounds of tolerable gossip delay on some larger customer clusters (~200 nodes, 1,500 stores). It's on the order of seconds, so I agree that a longer delay (50ms or 100ms) is probably fine. We just need to keep in mind that the per-node delay will be multiplied by the number of hops in the gossip network.

Also, putting some randomness into the delay probably makes sense.

Agreed, a jitter feels appropriate.

We're continuing to consider targeted investments in gossip as we look to larger scale clusters, so I imagine we'll take this draft and turn it into something real sometime soon.

Footnotes

  1. @iskettaneh just landed a real version of that on master in https://github.com/cockroachdb/cockroach/pull/126892.

@spencerkimball
Member

All makes sense. This suggestion from @andrewbaptist feels like it would be extremely consequential at scale: #117393. Would need to implement some custom diff & merge logic to update store descriptors with very partial information, according to the real-time store capacity and usage metrics.
