-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
insights: ingester #85350
insights: ingester #85350
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @j82w and @matthewtodd)
pkg/sql/sqlstats/insights/ingester.go
line 33 at r1 (raw file):
type concurrentBufferIngester struct { guard struct {
What is the reason for using the ConcurrentBufferGuard instead of just a normal Channel? Channels are thread safe so I'm trying to understand what benefits it gives.
pkg/sql/sqlstats/insights/ingester.go
line 39 at r1 (raw file):
sink chan *block delegate Registry
Would it be better to have a channel per a registry? That way if one is running slow or if the buffer is full it can skip that input for that registry and not impact the other registries.
pkg/sql/sqlstats/insights/ingester.go
line 92 at r1 (raw file):
sessionID clusterunique.ID, statement *Statement, ) { i.guard.AtomicWrite(func(writerIdx int64) {
Should there be a check if the buffer is full and log a message to avoid blocking to the registry is running to slow?
Closes #81021. Here we begin observing statements and transactions asynchronously, to avoid slowing down the hot sql execution path as much as possible. Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @j82w)
pkg/sql/sqlstats/insights/ingester.go
line 33 at r1 (raw file):
Previously, j82w wrote…
What is the reason for using the ConcurrentBufferGuard instead of just a normal Channel? Channels are thread safe so I'm trying to understand what benefits it gives.
Good question! A channel would be way simpler, but we're trying to be super fast here since we're on the SQL execution path. I ran some benchmarks, and the ConcurrentBufferGuard approach is about 7.5 times faster than just using a channel. (140ns/op vs. 1100ns/op on my laptop.)
There's a lot more goroutine coordination in the channel approach; ConcurrentBufferGuard is built around an atomic int and a read mutex, which is far lighter-weight.
pkg/sql/sqlstats/insights/ingester.go
line 39 at r1 (raw file):
Previously, j82w wrote…
Would it be better to have a channel per a registry? That way if one is running slow or if the buffer is full it can skip that input for that registry and not impact the other registries.
There is only one Registry instance. I have been hoping that will be sufficient, since keeping the inner implementation "global" and single-threaded is easy to reason about and unit test. The hope for the ingester is to handle all the async / parallel work, feeding observations into it.
I think you're right, though, that there's perhaps more to understand and design for about how this thing behaves when it's overwhelmed. At the moment, some micro-benchmarking here shows level performance at 140ns/op with 1K concurrent sessions hammering the ingester, getting slightly slower at 10K concurrent sessions, so I feel confident enough to subject this thing to the roachtests as our next step.
pkg/sql/sqlstats/insights/ingester.go
line 92 at r1 (raw file):
Previously, j82w wrote…
Should there be a check if the buffer is full and log a message to avoid blocking to the registry is running to slow?
At this level, I think we can let the ConcurrentBufferGuard handle that for us. While it will lock to flush the buffer into the channel, I think we're okay with amortizing that cost. And, again, I think the micro-benchmarks here do give us some reasonable first-order confidence that the Registry has more than enough performance capacity to keep up.
But, yes, I think some safety releases probably make sense for us to experiment with next week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 5 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @matthewtodd)
bors r+ |
Build succeeded: |
I've just sent #85994 to disable this problematic codepath until I can get to the bottom of it. I'm sorry for the disruption! |
thanks! |
85994: insights: remove the ingester for now r=matthewtodd a=matthewtodd In #85350 we introduced a data race that's affecting many branches in CI. Until we can get to the bottom of it, probably in #83080, let's just remove the offending codepath. Release note: None Co-authored-by: Matthew Todd <[email protected]>
Closes #81021.
Here we begin observing statements and transactions asynchronously, to
avoid slowing down the hot sql execution path as much as possible.
Release note: None