
kvcoord: Eliminate 1 Go routine from MuxRangeFeed #96756

Merged
merged 3 commits on Mar 14, 2023

Conversation

miretskiy
Contributor

@miretskiy miretskiy commented Feb 7, 2023

Prior to this PR, the server-side MuxRangeFeed
implementation spawned a separate goroutine executing a
single RangeFeed for each incoming request.

This is wasteful and unnecessary.
Instead of blocking and waiting for a single RangeFeed to complete,
have rangefeed-related functions return a promise to deliver
a *roachpb.Error once the rangefeed completes (future.Future[*roachpb.Error]).

Prior to this change, MuxRangeFeed would spin up 4 goroutines
per range. With this PR, the number is down to 3.
This improvement is particularly important when executing
rangefeeds against large tables (tens to hundreds of thousands of ranges).

Informs #96395
Epic: None

Release note (enterprise change): Changefeeds running with the
changefeed.mux_rangefeed.enabled setting set to true are
more efficient, particularly when executing against large tables.
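
To make the structural change concrete, here is a minimal, hypothetical Go sketch (not the actual CockroachDB code; `errFuture`, `WhenCompleted`, `runRangeFeedBefore`/`runRangeFeedAfter`, and the simulated rangefeed bodies are illustrative stand-ins for `future.Future[*roachpb.Error]` and the real machinery). The point is that the "before" variant parks a dedicated goroutine per request just to forward the terminal error, while the "after" variant registers a completion callback on a future that the rangefeed's own goroutine invokes:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// errFuture is a hypothetical stand-in for future.Future[*roachpb.Error]:
// it is completed exactly once, and callbacks registered via WhenCompleted
// run on the goroutine that completes it.
type errFuture struct {
	mu        sync.Mutex
	completed bool
	err       error
	callbacks []func(error)
}

func (f *errFuture) WhenCompleted(fn func(error)) {
	f.mu.Lock()
	if f.completed {
		err := f.err
		f.mu.Unlock()
		fn(err)
		return
	}
	f.callbacks = append(f.callbacks, fn)
	f.mu.Unlock()
}

func (f *errFuture) Complete(err error) {
	f.mu.Lock()
	f.completed, f.err = true, err
	cbs := f.callbacks
	f.callbacks = nil
	f.mu.Unlock()
	for _, fn := range cbs {
		fn(err)
	}
}

// Before: one goroutine per incoming request, parked only to forward the
// rangefeed's terminal error.
func runRangeFeedBefore(rangeID int, forward func(error)) {
	errC := make(chan error, 1)
	go func() { // existing rangefeed machinery, simulated
		time.Sleep(10 * time.Millisecond)
		errC <- nil
	}()
	go func() {
		forward(<-errC) // this goroutine exists only to wait
	}()
}

// After: the rangefeed hands back a future immediately; the forwarding
// callback runs on the goroutine that completes the rangefeed, so no
// dedicated waiter goroutine is needed.
func runRangeFeedAfter(rangeID int, forward func(error)) {
	f := &errFuture{}
	go func() { // existing rangefeed machinery, simulated
		time.Sleep(10 * time.Millisecond)
		f.Complete(nil)
	}()
	f.WhenCompleted(forward)
}

func main() {
	var wg sync.WaitGroup
	wg.Add(2)
	runRangeFeedBefore(1, func(err error) { fmt.Println("before:", err); wg.Done() })
	runRangeFeedAfter(2, func(err error) { fmt.Println("after:", err); wg.Done() })
	wg.Wait()
}
```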

@miretskiy miretskiy requested review from ajwerner, tbg, irfansharif and a team February 7, 2023 23:34
@miretskiy miretskiy requested review from a team as code owners February 7, 2023 23:34

Contributor

@andreimatei andreimatei left a comment


-- commits line 31 at r2:
drive by: I don't think this is a release note that users can understand :P

@miretskiy miretskiy force-pushed the future branch 2 times, most recently from 8adcdfe to 4e2ce15 on February 8, 2023 01:14
@miretskiy
Contributor Author


> -- commits line 31 at r2: drive by: I don't think this is a release note that users can understand :P

rephrased.

@miretskiy miretskiy force-pushed the future branch 2 times, most recently from 2c64887 to 5aff403 on February 8, 2023 02:07
@erikgrinaker
Contributor

I'm curious if we measured the impact of this change? I didn't think these goroutines would be considered for scheduling until the channel send, so I'm curious where the cost comes from.

@miretskiy
Contributor Author

> I'm curious if we measured the impact of this change? I didn't think these goroutines would be considered for scheduling until the channel send, so I'm curious where the cost comes from.

I did a few tests: 75k ranges on a 5-node cluster (KV workload with 75k splits). Even without any workload, you could see the impact on the Go scheduler. You are right, of course, that nothing should happen until the send. But even without work happening on those ranges, each one still emitted checkpoints, one every 200ms.

@erikgrinaker
Contributor

> I'm curious if we measured the impact of this change? I didn't think these goroutines would be considered for scheduling until the channel send, so I'm curious where the cost comes from.
>
> I did a few tests: 75k ranges on a 5-node cluster (KV workload with 75k splits). Even without any workload, you could see the impact on the Go scheduler. You are right, of course, that nothing should happen until the send. But even without work happening on those ranges, each one still emitted checkpoints, one every 200ms.

Sure, but I'm specifically talking about the goroutines we're removing in this PR -- do they actually cost anything, or are they ~free? They're only waiting for the final error result from the registration, right, so they shouldn't be scheduled until the rangefeed terminates?

@miretskiy
Contributor Author

> Sure, but I'm specifically talking about the goroutines we're removing in this PR -- do they actually cost anything, or are they ~free? They're only waiting for the final error result from the registration, right, so they shouldn't be scheduled until the rangefeed terminates?

I can post some info while this is getting reviewed. Roughly free is not the same as free, especially since you have so many. I tested with another PR which removes one more goroutine on the client side, but that change is a lot more disruptive, so I pulled it out of this PR; I will get updated numbers.
Conceptually, I think this PR should help.
I'm not a Go scheduler expert; I do suspect that even though these goroutines are not active, all of them are in the runnable state; many goroutines in the runnable state will slow down the scheduler, since it has to work harder to select the next goroutine to run. Fewer goroutines is better, in my opinion, even if each one is roughly free.

@erikgrinaker
Contributor

> I'm not a Go scheduler expert; I do suspect that even though these goroutines are not active, all of them are in the runnable state; many goroutines in the runnable state will slow down the scheduler, since it has to work harder to select the next goroutine to run.

I don't think they are. They should be in the blocked state until someone sends on the channel, at which point the sending goroutine will mark them as runnable: https://codeburst.io/diving-deep-into-the-golang-channels-549fd4ed21a8.
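
For what it's worth, here is a tiny self-contained illustration of that parked state (not from this PR): goroutines blocked on a channel receive show up as `chan receive` in a stack dump rather than `runnable`, and only become runnable when a sender hands them a value.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	errC := make(chan error)
	for i := 0; i < 5; i++ {
		go func() {
			<-errC // parked until a sender completes the handoff
		}()
	}
	time.Sleep(100 * time.Millisecond) // give the goroutines time to park

	// Dump all goroutine stacks: the blocked ones are reported as
	// "[chan receive]", not "[runnable]". (The goroutines are never
	// woken here; the program simply exits after printing.)
	buf := make([]byte, 1<<16)
	n := runtime.Stack(buf, true)
	fmt.Printf("%s\n", buf[:n])
}
```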

@miretskiy
Contributor Author

> I'm not a Go scheduler expert; I do suspect that even though these goroutines are not active, all of them are in the runnable state; many goroutines in the runnable state will slow down the scheduler, since it has to work harder to select the next goroutine to run.
>
> I don't think they are. They should be in the blocked state until someone sends on the channel, at which point the sending goroutine will mark them as runnable: https://codeburst.io/diving-deep-into-the-golang-channels-549fd4ed21a8.

As I said, I'm not a Go scheduler expert. It still seems to me that using fewer resources is better, even if each resource is cheap.

@miretskiy
Contributor Author

I'll post some benches later

Member

@tbg tbg left a comment

Cursory first review only, to get my bearings. Thank you for giving this code some attention (I know REPL nominally owns it, so double thanks!). Generally speaking I think the introduction of the Future makes sense - even if this extra goroutine doesn't weigh heavily on the scheduler, it still seems like a useful pattern to introduce. I'm not sold, however, on the "Promise", which is also not used - I'd much prefer we removed that.

I was wondering if this "flattening" could complicate the work we expect we have to do regardless, to avoid the periodic "pulsing" of the goroutines via the closed timestamp interval. I don't think so, which is good.

Could you list the goroutines before and after in the commit message? I think that would be instructive for most readers including myself. I can see that we're saving a goroutine in rangeFeedWithRangeID (previously on <-errC). We have (*registration).outputLoop. We have the incoming client goroutine, which will now sit on the promise directly (skipping the <-errC). And then we have a goroutine in gRPC servicing the streaming RPC.

Is anything conceptually in the way of eliding outputLoop as well, by hooking it up directly to the registration? Instead of returning a Future to the client goroutine, we'd return - essentially - the *registration itself and task the client goroutine with servicing it.

Review thread on pkg/util/future/future.go
@tbg tbg mentioned this pull request Feb 13, 2023
tbg added a commit to tbg/cockroach that referenced this pull request Feb 13, 2023
Exploration that might be useful for cockroachdb#96756.
@tbg
Member

tbg commented Feb 13, 2023

Ah, I meant to say, curious to see the overhead as well. I played around with #97028

and got (on my M1 Mac)

go run ./pkg/cmd/goroutinepulser 100000 200ms
[...]
avg    p50    p75    p90    p99    p99.9  p99.99  pMax
6.87   6.52   9.65   12.57  16.69  19.90  21.89   31.46 [cum ms]

go run ./pkg/cmd/goroutinepulser 75000 200ms
[...]
avg    p50    p75    p90    p99    p99.9  p99.99  pMax
6.63   6.42   8.98   11.45  18.17  26.25  29.23   31.46 [cum ms]

so here a 25% reduction isn't dramatic (plus Erik's point about the goroutines that are being reduced never becoming runnable).

I'm curious to see how a "real" CockroachDB cluster will do.

@miretskiy
Contributor Author

> Cursory first review only, to get my bearings. Thank you for giving this code some attention (I know REPL nominally owns it, so double thanks!). Generally speaking I think the introduction of the Future makes sense - even if this extra goroutine doesn't weigh heavily on the scheduler, it still seems like a useful pattern to introduce. I'm not sold, however, on the "Promise", which is also not used - I'd much prefer we removed that.

I am actually using the promise quite extensively in the latest version I just pushed.

> I was wondering if this "flattening" could complicate the work we expect we have to do regardless, to avoid the periodic "pulsing" of the goroutines via the closed timestamp interval. I don't think so, which is good.

I don't think so; but I've been wrong before.

> Could you list the goroutines before and after in the commit message? I think that would be instructive for most readers including myself. I can see that we're saving a goroutine in rangeFeedWithRangeID (previously on <-errC). We have (*registration).outputLoop. We have the incoming client goroutine, which will now sit on the promise directly (skipping the <-errC). And then we have a goroutine in gRPC servicing the streaming RPC.

It only removes the goroutine started by MuxRangeFeed to run the underlying single rangefeed.

> Is anything conceptually in the way of eliding outputLoop as well, by hooking it up directly to the registration? Instead of returning a Future to the client goroutine, we'd return - essentially - the *registration itself and task the client goroutine with servicing it.

Yes, we should do that. We start off with 5 goroutines for a regular rangefeed; we are down to 4 with Mux; this PR brings it down to 3.
Then, rewriting the client portion of mux removes another goroutine per range.
And then finally, we drop the output loop.
Everything should move to O(num nodes) instead of O(num ranges).

@miretskiy
Contributor Author

@erikgrinaker @tbg
I got some benchmarks. As you correctly point out, @erikgrinaker, the goroutines this PR removes should be roughly free. And they are. But they are not entirely free. The setup is a 5-node cluster of n32-standard machines; the KV workload was initialized with 100k splits -- so 20k ranges per node. That's a bit too much, but it's good for this test to see the impact of those goroutines.

The biggest savings of this PR come from the startup costs:

[Screenshot: runnable goroutine count, master MuxRangeFeed vs. this PR]

You can clearly tell the difference in runnable count -- up to 160(!) on the left, which was running the master version of MuxRangeFeed, versus the right, which is running this change.

You can also (obviously) see the total number of goroutines go down from 400k to 300k:
[Screenshot: total goroutine count, 400k vs. 300k]

More interesting is the p99.99 impact on SQL latency:
[Screenshot: p99.99 SQL latency, master MuxRangeFeed vs. this PR]

So, to summarize this PR: it eliminates 1 goroutine from MuxRangeFeed. This goroutine used to be idle, so it was mostly free. However, at the start of a rangefeed that extra goroutine wants to run -- and it is this extra goroutine that would make latency worse, simply by creating more work for the Go scheduler.

To be very clear: there are other ways to solve this -- perhaps by putting some sort of rate limit on the creation of those extra goroutines. But again, I just don't see why we needed them in the first place.

@tbg
Member

tbg commented Feb 20, 2023

Thanks for running the experiments! Feel free to request a review once the conflicts have been rebased away and it's ready for an in-depth review.

@erikgrinaker
Contributor

Good point, spawning all of these in a loop will put a fair bit of load on the scheduler. I was mostly thinking about the individual rangefeed approach where the goroutine is already spawned by gRPC, not MuxRangeFeed where we don't need to spawn them in the first place. Let's get this rebased and wrapped up, and we'll review it. Thanks!

@miretskiy miretskiy force-pushed the future branch 3 times, most recently from 528ca82 to 1834760 on March 2, 2023 13:44
@miretskiy
Contributor Author

@tbg, @erikgrinaker -- all is green in the land of CI with all comments addressed.

@miretskiy miretskiy force-pushed the future branch 2 times, most recently from fb20260 to a90009a on March 2, 2023 22:27
@dhartunian dhartunian removed the request for review from a team March 6, 2023 16:03
Contributor

@erikgrinaker erikgrinaker left a comment

Basically LGTM, but let's resolve the comments first. Thanks a lot for taking this on!

Review threads on:
pkg/kv/kvserver/rangefeed/metrics.go
pkg/kv/kvserver/rangefeed/registry.go
pkg/util/future/future.go
pkg/util/future/future_test.go
pkg/server/node.go
pkg/kv/kvserver/stores.go
pkg/kv/kvserver/replica_rangefeed.go
Add a gauge metric to keep track of currently active registrations.

Epic: None

Release note: None
@miretskiy
Contributor Author

@erikgrinaker -- updates pushed. Comments addressed, hopefully.

@miretskiy
Contributor Author

miretskiy commented Mar 14, 2023 via email

@miretskiy
Contributor Author

bors r=erikgrinaker

@miretskiy
Contributor Author

bors r-

@craig
Contributor

craig bot commented Mar 14, 2023

Canceled.

Yevgeniy Miretskiy added 2 commits March 14, 2023 09:30
Add a library implementing a promise/future abstraction.
`Future[T]` describes a value of type T which will become
available in the future. The caller may then wait until
the future becomes available.

Release note: None
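
As a rough illustration of what such an abstraction can look like, here is a minimal generic sketch in Go; this is not the actual pkg/util/future API, and the names `Make`, `Set`, and `Get` are assumptions made for the example:

```go
package main

import (
	"fmt"
	"sync"
)

// Future holds a value of type T that is set exactly once and can be
// awaited by any number of callers.
type Future[T any] struct {
	once  sync.Once
	done  chan struct{}
	value T
}

// Make returns an empty, not-yet-completed Future.
func Make[T any]() *Future[T] {
	return &Future[T]{done: make(chan struct{})}
}

// Set completes the future; only the first call has any effect.
func (f *Future[T]) Set(v T) {
	f.once.Do(func() {
		f.value = v
		close(f.done)
	})
}

// Get blocks until the future has been completed, then returns its value.
func (f *Future[T]) Get() T {
	<-f.done
	return f.value
}

func main() {
	f := Make[error]()
	go f.Set(nil) // completed by whichever goroutine finishes the work
	fmt.Println("completed with:", f.Get())
}
```
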
Prior to this PR, the server-side `MuxRangeFeed`
implementation spawned a separate goroutine executing a
single RangeFeed for each incoming request.

This is wasteful and unnecessary.
Instead of blocking and waiting for a single RangeFeed to complete,
have rangefeed-related functions return a promise to deliver
a `*kvpb.Error` once the rangefeed completes (`future.Future[*kvpb.Error]`).

Prior to this change, MuxRangeFeed would spin up 4 goroutines
per range. With this PR, the number is down to 3.
This improvement is particularly important when executing
rangefeeds against large tables (tens to hundreds of thousands of ranges).

Informs cockroachdb#96395
Epic: None

Release note (enterprise change): Changefeeds running with the
`changefeed.mux_rangefeed.enabled` setting set to true are
more efficient, particularly when executing against large tables.
@miretskiy
Contributor Author

bors r+

@craig
Contributor

craig bot commented Mar 14, 2023

Build failed (retrying...):

@miretskiy
Contributor Author

bors r+

@craig
Contributor

craig bot commented Mar 14, 2023

Already running a review

@craig
Contributor

craig bot commented Mar 14, 2023

Build succeeded:
