
insights: ingester #85350

Merged 1 commit, Aug 11, 2022

Conversation

@matthewtodd (Contributor) commented Jul 29, 2022

Closes #81021.

Here we begin observing statements and transactions asynchronously, to
avoid slowing down the hot SQL execution path as much as possible.

Release note: None

@cockroach-teamcity (Member)

This change is Reviewable

@matthewtodd requested a review from a team, July 29, 2022 21:14
@j82w (Contributor) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @j82w and @matthewtodd)


pkg/sql/sqlstats/insights/ingester.go line 33 at r1 (raw file):

type concurrentBufferIngester struct {
	guard struct {

What is the reason for using the ConcurrentBufferGuard instead of just a normal Channel? Channels are thread safe so I'm trying to understand what benefits it gives.


pkg/sql/sqlstats/insights/ingester.go line 39 at r1 (raw file):

	sink     chan *block
	delegate Registry

Would it be better to have a channel per registry? That way, if one is running slow or its buffer is full, the ingester could skip that input for that registry without impacting the other registries.


pkg/sql/sqlstats/insights/ingester.go line 92 at r1 (raw file):

	sessionID clusterunique.ID, statement *Statement,
) {
	i.guard.AtomicWrite(func(writerIdx int64) {

Should there be a check for whether the buffer is full, logging a message instead of blocking, in case the registry is running too slowly?

@matthewtodd (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @j82w)


pkg/sql/sqlstats/insights/ingester.go line 33 at r1 (raw file):

Previously, j82w wrote…

What is the reason for using the ConcurrentBufferGuard instead of just a normal Channel? Channels are thread safe so I'm trying to understand what benefits it gives.

Good question! A channel would be way simpler, but we're trying to be super fast here since we're on the SQL execution path. I ran some benchmarks, and the ConcurrentBufferGuard approach is about 7.5 times faster than just using a channel. (140ns/op vs. 1100ns/op on my laptop.)

There's a lot more goroutine coordination in the channel approach; ConcurrentBufferGuard is built around an atomic int and a read mutex, which is far lighter-weight.
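
To make the comparison concrete, here is a minimal, self-contained sketch of the atomic-index write pattern described above. It is not the actual ConcurrentBufferGuard implementation; the names, buffer size, and drop-on-overflow behavior are made up for illustration.

package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

const bufferSize = 8192

type event struct{ sessionID, fingerprint string }

type guardedBuffer struct {
	mu      sync.RWMutex // writers share the read lock; flushing takes the write lock
	nextIdx int64        // next free slot, reserved with a single atomic add
	events  [bufferSize]event
	sink    chan []event // filled blocks are handed off here for async processing
}

// record is the hot-path write: one atomic add to claim a slot and one array
// store, with no channel send and no goroutine handoff.
func (b *guardedBuffer) record(e event) {
	b.mu.RLock()
	idx := atomic.AddInt64(&b.nextIdx, 1) - 1
	if idx < bufferSize {
		b.events[idx] = e
	}
	b.mu.RUnlock()
	// The writer that claims the last slot flushes. In this sketch, writes that
	// arrive while the buffer is full are simply dropped; the real guard is
	// more careful than that.
	if idx == bufferSize-1 {
		b.flush()
	}
}

// flush briefly takes the write lock, copies the filled slots onto the sink
// channel, and resets the buffer; this heavier work is amortized over
// bufferSize hot-path writes.
func (b *guardedBuffer) flush() {
	b.mu.Lock()
	defer b.mu.Unlock()
	n := atomic.LoadInt64(&b.nextIdx)
	if n == 0 {
		return
	}
	if n > bufferSize {
		n = bufferSize
	}
	out := make([]event, n)
	copy(out, b.events[:n])
	b.sink <- out
	atomic.StoreInt64(&b.nextIdx, 0)
}

func main() {
	b := &guardedBuffer{sink: make(chan []event, 1)}
	b.record(event{sessionID: "s1", fingerprint: "SELECT _"})
	b.flush() // normally triggered by filling the buffer
	fmt.Println("flushed", len(<-b.sink), "event(s)")
}

The per-observation cost in this sketch is an atomic add plus an array store under a shared read lock, while the channel send happens only once per filled buffer, which is the sort of amortization the benchmark comparison above is getting at.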


pkg/sql/sqlstats/insights/ingester.go line 39 at r1 (raw file):

Previously, j82w wrote…

Would it be better to have a channel per registry? That way, if one is running slow or its buffer is full, the ingester could skip that input for that registry without impacting the other registries.

There is only one Registry instance. I have been hoping that will be sufficient, since keeping the inner implementation "global" and single-threaded makes it easy to reason about and unit test. The hope is for the ingester to handle all the async / parallel work, feeding observations into it.

I think you're right, though, that there's perhaps more to understand and design for in how this thing behaves when it's overwhelmed. At the moment, some micro-benchmarking here shows performance holding steady at 140ns/op with 1K concurrent sessions hammering the ingester, getting slightly slower at 10K concurrent sessions, so I feel confident enough to subject this thing to the roachtests as our next step.
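
As a companion to the sketch above, here is a rough illustration of the consumer side being described, again not the PR's actual code: a single goroutine drains blocks from the sink and feeds them to the lone Registry, keeping everything downstream single-threaded. The Registry interface and its ObserveStatement method are assumptions made for this example.

package main

import (
	"context"
	"fmt"
	"time"
)

type Statement struct{ Fingerprint string }

// Registry stands in for the single insights registry; the real interface
// lives in pkg/sql/sqlstats/insights and is assumed here, not quoted.
type Registry interface {
	ObserveStatement(sessionID string, stmt *Statement)
}

type event struct {
	sessionID string
	statement *Statement
}

type block [8192]event

type ingester struct {
	sink     chan *block
	delegate Registry
}

// Start launches the drain loop. Everything past the sink runs on this one
// goroutine, which is what keeps the Registry easy to reason about and to
// unit test.
func (i *ingester) Start(ctx context.Context) {
	go func() {
		for {
			select {
			case b := <-i.sink:
				for _, e := range b {
					if e.statement == nil {
						break // unused tail of a partially filled block
					}
					i.delegate.ObserveStatement(e.sessionID, e.statement)
				}
			case <-ctx.Done():
				return
			}
		}
	}()
}

// printingRegistry is a stub delegate used only to demonstrate the flow.
type printingRegistry struct{}

func (printingRegistry) ObserveStatement(sessionID string, stmt *Statement) {
	fmt.Println(sessionID, stmt.Fingerprint)
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	ing := &ingester{sink: make(chan *block, 4), delegate: printingRegistry{}}
	ing.Start(ctx)

	var b block
	b[0] = event{sessionID: "s1", statement: &Statement{Fingerprint: "SELECT _"}}
	ing.sink <- &b

	time.Sleep(10 * time.Millisecond) // crude wait so the demo's drain loop can run
}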


pkg/sql/sqlstats/insights/ingester.go line 92 at r1 (raw file):

Previously, j82w wrote…

Should there be a check for whether the buffer is full, logging a message instead of blocking, in case the registry is running too slowly?

At this level, I think we can let the ConcurrentBufferGuard handle that for us. While it will lock to flush the buffer into the channel, I think we're okay with amortizing that cost. And, again, I think the micro-benchmarks here do give us some reasonable first-order confidence that the Registry has more than enough performance capacity to keep up.

But, yes, I think some safety releases probably make sense for us to experiment with next week.
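
For a sense of what such a safety valve might look like, here is a hypothetical sketch (not something this PR implements): if the sink channel is full because the consumer has fallen behind, the block is dropped and counted rather than blocking the SQL execution path.

package main

import (
	"fmt"
	"sync/atomic"
)

type block [8192]string // stand-in for a filled block of observations

type ingester struct {
	sink    chan *block
	dropped int64 // blocks discarded because the consumer fell behind
}

// offer hands a full block to the consumer if there is room and otherwise
// drops it, so a slow Registry can never stall writers. A rate-limited log
// message could be emitted alongside the counter.
func (i *ingester) offer(b *block) {
	select {
	case i.sink <- b:
	default:
		atomic.AddInt64(&i.dropped, 1)
	}
}

func main() {
	ing := &ingester{sink: make(chan *block, 1)}
	ing.offer(&block{}) // fits in the buffered channel
	ing.offer(&block{}) // channel is full, so this block is dropped
	fmt.Println("dropped blocks:", atomic.LoadInt64(&ing.dropped))
}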

@matthewtodd marked this pull request as ready for review, August 11, 2022 14:44
@matthewtodd requested a review from a team, August 11, 2022 14:44
@j82w (Contributor) left a comment

:lgtm:

Reviewed 3 of 5 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @matthewtodd)

@matthewtodd (Contributor, Author)

bors r+

@craig (craig bot) commented Aug 11, 2022

Build succeeded:

@craig (craig bot) merged commit 8e3ee57 into cockroachdb:master, Aug 11, 2022
@matthewtodd deleted the insights-ingester branch, August 11, 2022 18:35
@renatolabs (Contributor) commented Aug 11, 2022

This change introduced a data race on TestConsumeJoinToken (#85988). I'm going to create a PR shortly to skip that test under race for the time being.

Edit: PR to skip the test: #85989.

@msbutler (Collaborator) commented Aug 11, 2022

This PR also introduces a consistent data race in my PR, which was ready to merge until I rebased onto this. Should we revert this PR, as it seems to be exposing the codebase to data races? My data race stack also points to #83080.

@matthewtodd (Contributor, Author)

I've just sent #85994 to disable this problematic codepath until I can get to the bottom of it. I'm sorry for the disruption!

@msbutler (Collaborator)

thanks!

craig bot pushed a commit that referenced this pull request Aug 11, 2022
85994: insights: remove the ingester for now r=matthewtodd a=matthewtodd

In #85350 we introduced a data race that's affecting many branches in
CI. Until we can get to the bottom of it, probably in #83080, let's just
remove the offending codepath.

Release note: None

Co-authored-by: Matthew Todd <[email protected]>
Successfully merging this pull request may close these issues:

outliers: move processing off the hot path