Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: wait on latches on each key in reverse acquisition order #109349

Merged

Conversation

nvanbenschoten
Copy link
Member

@nvanbenschoten nvanbenschoten commented Aug 23, 2023

This commit allocates latch IDs from the top of the uint64 space and in reverse order. This is done to order latches in the tree on the same key in reverse order of acquisition. Doing so ensures that when we iterate over the tree and see a key with many conflicting latches, we visit the latches on that key in the reverse order that they will be released. In doing so, we minimize the number of open channels that we wait on (calls to waitForSignal) and minimize the number of goroutine scheduling points. This is important to avoid spikes in runnable goroutine after each request completes, which can negatively affect node health.

See experiments below.

Epic: None
Release note (performance improvement): The impact of high concurrency blind writes to the same key on goroutine scheduling latency was reduced.

@blathers-crl
Copy link

blathers-crl bot commented Aug 23, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@nvanbenschoten
Copy link
Member Author

nvanbenschoten commented Aug 23, 2023

The following test demonstrates the expected improvement from this PR. I ran the following test, which spins up 512 concurrent workload threads to UPSERT the same key.

roachprod create local -n1
roachprod put    local cockroach
roachprod start  local
roachprod run    local -- ./cockroach workload run kv --init --cycle-length=1 --concurrency=512 --duration=5m

For the first 5 minutes, I ran with master. For the next 5, I ran with this patch. We see a large improvement in Goroutine Scheduling Latency: 99th percentile and Runnable Goroutines per CPU with this patch.

Screenshot 2023-08-23 at 1 45 55 PM Screenshot 2023-08-23 at 1 46 01 PM

This commit allocates latch IDs from the top of the uint64 space and in
reverse order. This is done to order latches in the tree on a same key
in reverse order of acquisition. Doing so ensures that when we iterate
over the tree and see a key with many conflicting latches, we visit the
latches on that key in the reverse order that they will be released. In
doing so, we minimize the number of open channels that we wait on (calls
to `waitForSignal`) and minimize the number of goroutine scheduling
points. This is important to avoid spikes in runnable goroutine after
each request completes, which can negatively affect node health.

Epic: None
Release note (performance improvement): The impact of high concurrency
blind writes to the same key on goroutine scheduling latency was reduced.
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/latchWaitOrder branch from 43a7df2 to c506290 Compare August 23, 2023 17:53
@nvanbenschoten nvanbenschoten changed the title [DNM] kv: wait on latches on each key in reverse acquisition order kv: wait on latches on each key in reverse acquisition order Aug 23, 2023
@nvanbenschoten nvanbenschoten marked this pull request as ready for review August 23, 2023 17:53
@nvanbenschoten nvanbenschoten requested a review from a team as a code owner August 23, 2023 17:53
@nvanbenschoten nvanbenschoten added the backport-23.1.x Flags PRs that need to be backported to 23.1 label Aug 23, 2023
@nvanbenschoten
Copy link
Member Author

Adding the backport-23.1.x label, but I'll let this bake on master for a few weeks before landing the backport.

Copy link
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @kvoli)

Copy link
Collaborator

@kvoli kvoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@nvanbenschoten
Copy link
Member Author

TFTR!

bors r=arulajmani,kvoli

@craig
Copy link
Contributor

craig bot commented Aug 23, 2023

Build succeeded:

@craig craig bot merged commit 44f66d4 into cockroachdb:master Aug 23, 2023
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/latchWaitOrder branch August 23, 2023 21:34
@shralex shralex added O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs C-escalation-improvement Having this feature would have made an escalation easier labels Sep 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1 C-escalation-improvement Having this feature would have made an escalation easier O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants