kv: avoid immediately launching goroutine for txn heartbeat loop #35015
Release note: None
Before this PR, every transaction would start a goroutine for its heartbeat loop immediately upon creation. Most transactions never need to heartbeat, so this goroutine is wasteful. This change delays the start of the heartbeat loop until after the first heartbeatInterval has passed by using the new DelayedStopper.

Fixes cockroachdb#35009

Release note: None
Yes, this is almost exactly what I had in mind! Having some structure hold on to a In the meantime, I'm curious whether you're able to see this have any effect on workloads that require transaction heartbeat loops (like |
Initial benchmarks of this change fail to show any positive impact. The initial version of the commit actually hurt performance slightly. Adding a buffer to the channels eliminated the performance degradation. The change does show some decrease in the number of goroutines but no change in performance. I went fishing for a benchmark which might show some improvement. My hunch was that it would matter for highly concurrent transactional workloads. I tested using KV0 with 1 byte values and a batch size of 2 or with a secondary index on 32 core nodes. I ran with a variety of concurrency and splits and none showed any positive impact. The runs pretty much all looked like this.
In all of these cases the observed difference in goroutines as visible in the admin UI seemed to be less than 500, but the goroutine count was generally in the 30k range. |
Thanks for looking into this. Part of the hope here was that this would reduce the time we spend in Regardless, if we don't see any movement on top-line performance even when we kill off 500 goroutines (what percent of total goroutines was this?) then we should put this on the backburner. We can revisit later, or we can get rid of individual txn heartbeats entirely like the original issue alluded to. |
Anecdotally, from a few runs on 32-core nodes with 512 splits and concurrency 1024, I see a goroutine count difference that bounces around but seems to average ~200 fewer with the change (which is about in line with expectations, given we'd expect there to be a bit more than 300 outstanding requests per gateway but not all are currently executing). The goroutine count is ~9k, so we're talking about a 2% decrease in goroutines. That's about what I had observed before at higher range counts, but there it was even closer to 1%. |
Closing as stale. |
This PR comes in two commits. The first extends the stop.Stopper infrastructure to support delaying async tasks. The second adopts this new RunDelayedAsyncTask mechanism to avoid immediately launching goroutines for the txn heartbeat loop.
Note I still need to add unit testing to the kv portion of the code. Just want to get this out there and see if it's what you had in mind.
Fixes #35009