
kv: avoid immediately launching goroutine for txn heartbeat loop #35015

Closed

Conversation

ajwerner
Contributor

This PR comes in two commits. The first extends the stop.Stopper infrastructure to support delaying async tasks. The second adopts this new RunDelayedAsyncTask mechanism to avoid immediately launching goroutines for the txn heartbeat loop.
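For intuition, here is a minimal sketch of the delayed-task idea, assuming a hypothetical `DelayedTask` type rather than the PR's actual `stop.Stopper` changes: the work sits behind a runtime timer, and a goroutine is only created if the delay elapses before the task is canceled.

```go
package main

import (
	"context"
	"sync"
	"time"
)

// DelayedTask is a hypothetical stand-in for the RunDelayedAsyncTask
// mechanism: it defers goroutine creation until the delay expires.
type DelayedTask struct {
	mu    sync.Mutex
	timer *time.Timer
}

// Start schedules f to run on a new goroutine after delay. No goroutine
// exists until the timer fires, so a task canceled early costs only a timer.
func (d *DelayedTask) Start(ctx context.Context, delay time.Duration, f func(context.Context)) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.timer = time.AfterFunc(delay, func() { f(ctx) })
}

// Cancel stops the task if it has not started yet.
func (d *DelayedTask) Cancel() {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.timer != nil {
		d.timer.Stop()
	}
}

func main() {
	var d DelayedTask
	// Schedule a heartbeat-style loop, then cancel before it ever runs:
	// no goroutine is launched for the task.
	d.Start(context.Background(), time.Second, func(ctx context.Context) { /* heartbeat loop */ })
	d.Cancel()
}
```

The real change also has to hand the goroutine off to the stopper for draining and handle cancellation racing with the timer firing; this sketch ignores both.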

Note that I still need to add unit tests for the kv portion of the code; I just want to get this out there and see whether it's what you had in mind.

Fixes #35009

@ajwerner ajwerner requested review from nvanbenschoten and a team February 15, 2019 23:06
@cockroach-teamcity
Member

This change is Reviewable

Before this PR every transaction would start a goroutine for its heartbeat loop
immediately upon creation. Most transactions never need to heartbeat, so this
goroutine is wasteful. This change delays the start of the heartbeat loop until
after the first heartbeatInterval has passed by using the new DelayedStopper.

Fixes cockroachdb#35009

Release note: None
@ajwerner ajwerner force-pushed the ajwerner/stopper-delayed-async branch from b4e2deb to 5127770 Compare February 19, 2019 05:02
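To make the commit message above concrete, here is a hypothetical illustration of the adoption side (identifiers and the interval value are mine, not the PR's): the transaction schedules its heartbeat loop one interval out, so transactions that finish quickly never pay for the goroutine.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

const heartbeatInterval = 50 * time.Millisecond // illustrative value only

type txn struct {
	heartbeatTimer *time.Timer
}

func beginTxn(ctx context.Context) *txn {
	t := &txn{}
	// Defer the loop's start by one interval rather than launching it now.
	t.heartbeatTimer = time.AfterFunc(heartbeatInterval, func() { t.heartbeatLoop(ctx) })
	return t
}

func (t *txn) heartbeatLoop(ctx context.Context) {
	ticker := time.NewTicker(heartbeatInterval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			fmt.Println("heartbeat")
		case <-ctx.Done():
			return
		}
	}
}

func (t *txn) finish() {
	// Short transactions stop the timer before it ever fires.
	t.heartbeatTimer.Stop()
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	tx := beginTxn(ctx)
	tx.finish() // completes within the first interval: no heartbeat goroutine
	time.Sleep(100 * time.Millisecond)
}
```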
@nvanbenschoten
Member

Yes, this is almost exactly what I had in mind! Having some structure hold on to a Stopper and run an AsyncTask through it when necessary is exactly what we'll want to do. I don't necessarily think this structure needs to embed a Stopper, but we can figure that out later.

In the meantime, I'm curious whether you're able to see this have any effect on workloads that require transaction heartbeat loops (like tpcc or any workload that uses explicit SQL transactions). A baseline throughput win would be nice, but let's start by confirming that this reduces the number of goroutines in the system by roughly the number of client connections.
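One simple way to check the goroutine count, sketched below under the assumption that a sampler can run inside (or alongside) the process under test; in practice the admin UI or the pprof goroutine endpoint gives the same number:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	// Sample runtime.NumGoroutine periodically while the workload runs,
	// then compare the steady-state counts before and after the change.
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for i := 0; i < 10; i++ {
		<-ticker.C
		fmt.Printf("t=%ds goroutines=%d\n", i+1, runtime.NumGoroutine())
	}
}
```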

@ajwerner
Contributor Author

Initial benchmarks of this change fail to show any positive impact. The first version of the commit actually hurt performance slightly; adding a buffer to the channels eliminated that degradation. The change does show some decrease in the number of goroutines, but no change in performance. I went fishing for a benchmark that might show an improvement; my hunch was that it would matter for highly concurrent transactional workloads. I tested using KV0 with 1-byte values and either a batch size of 2 or a secondary index, on 32-core nodes, with a variety of concurrency levels and split counts. None showed any positive impact.

The runs pretty much all looked like this:

name            old ops/s   new ops/s   delta
KV0-throughput  2.73k ± 1%  2.73k ± 1%   ~     (p=0.690 n=5+5)

name            old ms/s    new ms/s    delta
KV0-P50           336 ± 0%    342 ± 3%   ~     (p=0.556 n=4+5)
KV0-Avg           190 ± 3%    188 ± 3%   ~     (p=1.000 n=5+5)

In all of these cases the difference in goroutine count, as visible in the admin UI, appeared to be somewhat under 500, while the total goroutine count was generally in the 30k range.
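On the buffering point above, a toy sketch of why it plausibly matters (my reading, not the PR's code): with an unbuffered channel, each transaction blocks on a synchronous handoff to the scheduler goroutine, putting that handoff on the hot path; a small buffer lets it enqueue its delayed task and move on immediately.

```go
package main

import "fmt"

type task struct{ name string }

func main() {
	unbuffered := make(chan task)   // send blocks until a receiver is ready
	buffered := make(chan task, 64) // send returns immediately while space remains

	// Non-blocking enqueue onto the buffered channel.
	buffered <- task{name: "txn-heartbeat"}
	fmt.Println("enqueued without waiting; queue depth:", len(buffered))

	// The unbuffered send needs a concurrent receiver or it deadlocks.
	go func() { <-unbuffered }()
	unbuffered <- task{name: "txn-heartbeat"}
	fmt.Println("handoff completed")
}
```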

@nvanbenschoten
Member

Thanks for looking into this. Part of the hope here was that this would reduce the time we spend in runtime.findrunnable, which can hit upwards of 8% of a CPU profile. If you're still up to one more experiment, I'd be curious whether you see any movement here.

Regardless, if we don't see any movement in top-line performance even when we kill off 500 goroutines (what percent of total goroutines was this?), then we should put this on the back burner. We can revisit later, or get rid of individual txn heartbeats entirely, as the original issue alluded to.
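A minimal sketch of one way to check the findrunnable share (an assumed workflow; CockroachDB's own debug endpoints would also work): capture a CPU profile around the workload and inspect it with go tool pprof.

```go
package main

import (
	"log"
	"os"
	"runtime/pprof"
	"time"
)

func main() {
	f, err := os.Create("cpu.out")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Profile for the duration of the workload, then inspect with:
	//   go tool pprof -top cpu.out | grep findrunnable
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	time.Sleep(30 * time.Second) // stand-in for running the benchmark
}
```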

@ajwerner
Contributor Author

Anecdotally, from a few runs on 32-core nodes with 512 splits and a concurrency of 1024, I see a goroutine count difference that bounces around but seems to average ~200 fewer with the change. That is roughly in line with expectations, given we'd expect a bit more than 300 outstanding requests per gateway, not all of which are currently executing. The total goroutine count is ~9k, so we're talking about a ~2% decrease in goroutines. That's about what I had observed before at higher range counts, where it was even closer to 1%.

@tbg tbg added the X-noremind Bots won't notify about PRs with X-noremind label Jun 19, 2019
@ajwerner
Contributor Author

ajwerner commented Jun 9, 2021

Closing as stale.

@ajwerner ajwerner closed this Jun 9, 2021