
kv: avoid immediately launching goroutine for txn heartbeat loop #35015

Closed

Conversation

ajwerner
Contributor

This PR comes in two commits. The first extends the stop.Stopper infrastructure to support delaying async tasks. The second adopts this new RunDelayedAsyncTask mechanism to avoid immediately launching goroutines for the txn heartbeat loop.
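For intuition, here is a minimal sketch of the delayed-task idea, assuming a hypothetical `DelayedTask` type rather than the PR's actual `stop.Stopper` changes: the work sits behind a runtime timer, and a goroutine is only created if the delay elapses before the task is canceled.

```go
package main

import (
	"context"
	"sync"
	"time"
)

// DelayedTask is a hypothetical stand-in for the RunDelayedAsyncTask
// mechanism: it defers goroutine creation until the delay expires.
type DelayedTask struct {
	mu    sync.Mutex
	timer *time.Timer
}

// Start schedules f to run on a new goroutine after delay. No goroutine
// exists until the timer fires, so a task canceled early costs only a timer.
func (d *DelayedTask) Start(ctx context.Context, delay time.Duration, f func(context.Context)) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.timer = time.AfterFunc(delay, func() { f(ctx) })
}

// Cancel stops the task if it has not started yet.
func (d *DelayedTask) Cancel() {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.timer != nil {
		d.timer.Stop()
	}
}

func main() {
	var d DelayedTask
	// Schedule a heartbeat-style loop, then cancel before it ever runs:
	// no goroutine is launched for the task.
	d.Start(context.Background(), time.Second, func(ctx context.Context) { /* heartbeat loop */ })
	d.Cancel()
}
```

The real change also has to hand the goroutine off to the stopper for draining and handle cancellation racing with the timer firing; this sketch ignores both.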

Note that I still need to add unit tests for the kv portion of the code; I just want to get this out there and see whether it's what you had in mind.

Fixes #35009

@ajwerner ajwerner requested review from nvanbenschoten and a team February 15, 2019 23:06
@cockroach-teamcity
Member

This change is Reviewable

Before this PR every transaction would start a goroutine for its heartbeat loop
immediately upon creation. Most transactions never need to heartbeat, so this
goroutine is wasteful. This change delays the start of the heartbeat loop until
after the first heartbeatInterval has passed by using the new DelayedStopper.

Fixes cockroachdb#35009

Release note: None
@ajwerner ajwerner force-pushed the ajwerner/stopper-delayed-async branch from b4e2deb to 5127770 Compare February 19, 2019 05:02
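To make the commit message above concrete, here is a hypothetical illustration of the adoption side (identifiers and the interval value are mine, not the PR's): the transaction schedules its heartbeat loop one interval out, so transactions that finish quickly never pay for the goroutine.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

const heartbeatInterval = 50 * time.Millisecond // illustrative value only

type txn struct {
	heartbeatTimer *time.Timer
}

func beginTxn(ctx context.Context) *txn {
	t := &txn{}
	// Defer the loop's start by one interval rather than launching it now.
	t.heartbeatTimer = time.AfterFunc(heartbeatInterval, func() { t.heartbeatLoop(ctx) })
	return t
}

func (t *txn) heartbeatLoop(ctx context.Context) {
	ticker := time.NewTicker(heartbeatInterval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			fmt.Println("heartbeat")
		case <-ctx.Done():
			return
		}
	}
}

func (t *txn) finish() {
	// Short transactions stop the timer before it ever fires.
	t.heartbeatTimer.Stop()
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	tx := beginTxn(ctx)
	tx.finish() // completes within the first interval: no heartbeat goroutine
	time.Sleep(100 * time.Millisecond)
}
```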
@nvanbenschoten
Member

Yes, this is almost exactly what I had in mind! Having some structure hold on to a Stopper and run an AsyncTask through it when necessary is exactly what we'll want to do. I don't necessarily think this structure needs to embed a Stopper, but we can figure that out later.

In the meantime, I'm curious whether you're able to see this have any effect on workloads that require transaction heartbeat loops (like tpcc or any workload that uses explicit SQL transactions). A baseline throughput win would be nice, but let's start by confirming that this reduces the number of goroutines in the system by roughly the number of client connections.
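One simple way to check the goroutine count, sketched below under the assumption that a sampler can run inside (or alongside) the process under test; in practice the admin UI or the pprof goroutine endpoint gives the same number:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	// Sample runtime.NumGoroutine periodically while the workload runs,
	// then compare the steady-state counts before and after the change.
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for i := 0; i < 10; i++ {
		<-ticker.C
		fmt.Printf("t=%ds goroutines=%d\n", i+1, runtime.NumGoroutine())
	}
}
```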

@ajwerner
Contributor Author

Initial benchmarks of this change fail to show any positive impact. The first version of the commit actually hurt performance slightly; adding a buffer to the channels eliminated that degradation. The change does show some decrease in the number of goroutines, but no change in performance. I went fishing for a benchmark that might show an improvement; my hunch was that it would matter for highly concurrent transactional workloads. I tested using KV0 with 1-byte values and either a batch size of 2 or a secondary index, on 32-core nodes, with a variety of concurrency levels and split counts. None showed any positive impact.

The runs pretty much all looked like this:

name            old ops/s   new ops/s   delta
KV0-throughput  2.73k ± 1%  2.73k ± 1%   ~     (p=0.690 n=5+5)

name            old ms/s    new ms/s    delta
KV0-P50           336 ± 0%    342 ± 3%   ~     (p=0.556 n=4+5)
KV0-Avg           190 ± 3%    188 ± 3%   ~     (p=1.000 n=5+5)

In all of these cases the difference in goroutine count, as visible in the admin UI, appeared to be somewhat under 500, while the total goroutine count was generally in the 30k range.
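On the buffering point above, a toy sketch of why it plausibly matters (my reading, not the PR's code): with an unbuffered channel, each transaction blocks on a synchronous handoff to the scheduler goroutine, putting that handoff on the hot path; a small buffer lets it enqueue its delayed task and move on immediately.

```go
package main

import "fmt"

type task struct{ name string }

func main() {
	unbuffered := make(chan task)   // send blocks until a receiver is ready
	buffered := make(chan task, 64) // send returns immediately while space remains

	// Non-blocking enqueue onto the buffered channel.
	buffered <- task{name: "txn-heartbeat"}
	fmt.Println("enqueued without waiting; queue depth:", len(buffered))

	// The unbuffered send needs a concurrent receiver or it deadlocks.
	go func() { <-unbuffered }()
	unbuffered <- task{name: "txn-heartbeat"}
	fmt.Println("handoff completed")
}
```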

@nvanbenschoten
Member

Thanks for looking into this. Part of the hope here was that this would reduce the time we spend in runtime.findrunnable, which can hit upwards of 8% of a CPU profile. If you're still up to one more experiment, I'd be curious whether you see any movement here.

Regardless, if we don't see any movement in top-line performance even when we kill off 500 goroutines (what percent of total goroutines was this?), then we should put this on the back burner. We can revisit later, or get rid of individual txn heartbeats entirely, as the original issue alluded to.
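A minimal sketch of one way to check the findrunnable share (an assumed workflow; CockroachDB's own debug endpoints would also work): capture a CPU profile around the workload and inspect it with go tool pprof.

```go
package main

import (
	"log"
	"os"
	"runtime/pprof"
	"time"
)

func main() {
	f, err := os.Create("cpu.out")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Profile for the duration of the workload, then inspect with:
	//   go tool pprof -top cpu.out | grep findrunnable
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	time.Sleep(30 * time.Second) // stand-in for running the benchmark
}
```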

@ajwerner
Contributor Author

Anecdotally, from a few runs on 32-core nodes with 512 splits and a concurrency of 1024, I see a goroutine count difference that bounces around but seems to average ~200 fewer with the change. That is roughly in line with expectations, given we'd expect a bit more than 300 outstanding requests per gateway, not all of which are currently executing. The total goroutine count is ~9k, so we're talking about a ~2% decrease in goroutines. That's about what I had observed before at higher range counts, where it was even closer to 1%.

@tbg tbg added the X-noremind Bots won't notify about PRs with X-noremind label Jun 19, 2019
@ajwerner
Contributor Author

ajwerner commented Jun 9, 2021

Closing as stale.

@ajwerner ajwerner closed this Jun 9, 2021