-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: introduce contention event store #76549
Conversation
4d7ccd0
to
cbe49db
Compare
cfc2ec4
to
a0bbc1d
Compare
4d7bc84
to
b5cb37d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! 🎉
Reviewed 10 of 13 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @matthewtodd)
-- commits, line 16 at r2:
nit: "bounded FIFO store"?
pkg/sql/contention/cluster_settings.go, line 24 at r2 (raw file):
settings.TenantWritable, "sql.contention.event_store.resolution_interval", "the interval at which transaction ID resolution is performed (set to 0 to disable)",
nit: Is this "transaction fingerprint ID resolution"?
pkg/sql/contention/event_store.go, line 40 at r2 (raw file):
// eventWriter provides interfaces to write contention event into eventStore. type eventWriter interface { addEvent(roachpb.ContentionEvent)
Do you want contentionpb.ExtendedContentionEvent
here instead?
pkg/sql/contention/event_store.go, line 57 at r2 (raw file):
return i } }
Would it make sense to keep track of the length as we add & remove instead?
pkg/sql/contention/event_store.go, line 137 at r2 (raw file):
}, OnEvictedEntry: func(entry *cache.Entry) { entrySize := int64(entry.Value.(*contentionpb.ExtendedContentionEvent).Size())
bug: add the UUID's size. (So, maybe a function to avoid the duplication?)
pkg/sql/contention/event_store.go, line 147 at r2 (raw file):
return eventBatchSize }, /* limiter */ func(currentWriterIndex int64) {
nit: _
pkg/sql/contention/event_store.go, line 162 at r2 (raw file):
func (s *eventStore) start(ctx context.Context, stopper *stop.Stopper) { s.startResolver(ctx, stopper) s.startEventIntake(ctx, stopper)
nit: reverse these 2 lines?
pkg/sql/contention/event_store.go, line 297 at r2 (raw file):
s.guard.ForceSync() result, err := s.resolver.dequeue(ctx)
Is it worth calling out here that the asynchrony means that the resolver probably won't yet have enqueued all the events from the intake buffer?
pkg/sql/contention/event_store.go, line 298 at r2 (raw file):
result, err := s.resolver.dequeue(ctx) // Ensure that all the resolved contention events are dequeued from the
nit: reword this: "are added to the store"
pkg/sql/contention/event_store.go, line 321 at r2 (raw file):
} func (s *eventStore) nextResolutionInterval() time.Duration {
nit: instead of next, maybe "jittered"? Or resolutionIntervalWithJitter? Not sure.
pkg/sql/contentionpb/contention.go, line 75 at r2 (raw file):
// Valid returns if the ExtendedContentionEvent is valid. func (e *ExtendedContentionEvent) Valid() bool { return e.BlockingEvent.TxnMeta.ID != uuid.UUID{}
nit: wanna do the nil thing here?
pkg/sql/contention/event_store_test.go, line 33 at r2 (raw file):
) func TestEventStore(t *testing.T) {
use the metamorphic stuff?
pkg/sql/contention/event_store_test.go, line 51 at r2 (raw file):
// Up to 10 nodes. numOfCoordinators := rand.Intn(10)
nit: do the random counts provide any value? Maybe it's better to always have large numbers here? Also, Intn can return 0. @Azhng suggests Intn + some minimum.
pkg/sql/contention/event_store_test.go, line 57 at r2 (raw file):
// Randomize input. testCases := randomlyGenerateTestCases(testSize, numOfCoordinators)
nit: rename to testData?
pkg/sql/contention/event_store_test.go, line 68 at r2 (raw file):
// The contention event should immediately be available to be read from // the event store (after synchronization). However, certain information will
nit: this sentence is misleading, since the ForceSync doesn't make events immediately available to forEachEvent.
pkg/sql/contention/event_store_test.go, line 70 at r2 (raw file):
// the event store (after synchronization). However, certain information will // remain unavailable until the resolution is performed. testutils.SucceedsWithin(t, func() error {
nit: maybe call out that this block's testing the intake goroutine, so that's why we keep retrying. (Since we can't directly control when it does its work.)
pkg/sql/contention/event_store_test.go, line 111 at r2 (raw file):
require.NoError(t, store.flushAndResolve(ctx)) require.NoError(t, store.forEachEvent(
nit: maybe a comment saying something like "Now that we've resolved all the txn fingerprint ids, the event store has all the information we expect."
pkg/sql/contention/event_store_test.go, line 125 at r2 (raw file):
st := cluster.MakeTestingClusterSettings() // TxnIDResolutionInterval.Override(ctx, &st.SV, time.Millisecond*300)
nit: delete
pkg/sql/contention/event_store_test.go, line 153 at r2 (raw file):
numOfOps := b.N / numOfConcurrentWriter inputIdx := numOfOps * writerIdx
nit: comment about how each writer's reading from its own section of the input?
pkg/sql/contention/event_store_test.go, line 208 at r2 (raw file):
// CollectionTs is generated at the runtime. It's tricky to assert its value. // We simply assert that it's nonzero and zero it out afterwards. require.True(t, actual.CollectionTs.After(expected.CollectionTs),
Another option could be passing in a fake clock. Not sure if it matters here.
02ff5f6
to
2032e1e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @matthewtodd)
Previously, matthewtodd (Matthew Todd) wrote…
nit: "bounded FIFO store"?
Done.
pkg/sql/contention/event_store.go, line 40 at r2 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
Do you want
contentionpb.ExtendedContentionEvent
here instead?
Ah my mistake, this change will be introduced in the following PR.
pkg/sql/contention/event_store.go, line 57 at r2 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
Would it make sense to keep track of the length as we add & remove instead?
Hmm it could make sending the batch over the channel slightly more complicated. I slightly prefer just to make the intake goroutine to do a little bit extra work.
pkg/sql/contention/event_store.go, line 137 at r2 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
bug: add the UUID's size. (So, maybe a function to avoid the duplication?)
Done.
pkg/sql/contention/event_store.go, line 162 at r2 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
nit: reverse these 2 lines?
Done.
pkg/sql/contention/event_store.go, line 297 at r2 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
Is it worth calling out here that the asynchrony means that the resolver probably won't yet have enqueued all the events from the intake buffer?
Done.
pkg/sql/contention/event_store_test.go, line 33 at r2 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
use the metamorphic stuff?
Actually I realized metamorphic constants was used for something else. Those are constants that can be changed without affecting program expected output. I think I will just leave these number as random number here.
pkg/sql/contention/event_store_test.go, line 51 at r2 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
nit: do the random counts provide any value? Maybe it's better to always have large numbers here? Also, Intn can return 0. @Azhng suggests Intn + some minimum.
Done.
pkg/sql/contention/event_store_test.go, line 68 at r2 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
nit: this sentence is misleading, since the ForceSync doesn't make events immediately available to forEachEvent.
Done. Reworded.
pkg/sql/contention/event_store_test.go, line 70 at r2 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
nit: maybe call out that this block's testing the intake goroutine, so that's why we keep retrying. (Since we can't directly control when it does its work.)
Done.
pkg/sql/contention/event_store_test.go, line 111 at r2 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
nit: maybe a comment saying something like "Now that we've resolved all the txn fingerprint ids, the event store has all the information we expect."
Done.
pkg/sql/contention/event_store_test.go, line 125 at r2 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
nit: delete
Done.
pkg/sql/contention/event_store_test.go, line 153 at r2 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
nit: comment about how each writer's reading from its own section of the input?
Done.
pkg/sql/contention/event_store_test.go, line 208 at r2 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
Another option could be passing in a fake clock. Not sure if it matters here.
Done.
b277821
to
36fe969
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 6 files at r1, 13 of 13 files at r5, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @Azhng)
pkg/sql/contention/event_store.go, line 204 at r5 (raw file):
TxnIDResolutionInterval.SetOnChange(&s.st.SV, func(ctx context.Context) { select { case resolutionIntervalChanged <- struct{}{}:
nit: Does publishing to this channel require the select
block?
36fe969
to
578aaf3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @matthewtodd)
pkg/sql/contention/event_store.go, line 204 at r5 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
nit: Does publishing to this channel require the
select
block?
Good point. Removed the select block.
This commit introduces contention event resolver. The new resolver implements a queue-like interface, where unresolved contention events will be queued into the resolver. The resolution is performed by invoking the `resolve()` method on the resolver queue, and the result of the resolution can be retrieved by calling `dequeue()` method. Release note: None
This commit introduces contention event store, an in-memory bounded FIFO store that stores historical contention events and map collected contention events to their corresponding transaction fingerprint IDs. Release note (sql change): A new cluster setting `sql.contention.event_store.capacity` is added . This cluster setting can be used to control the in-memory capacity of contention event store. When this setting is set to zero, the contention event store is disabled.
578aaf3
to
c0ddc37
Compare
Reviewer note: only last commit is relevant.
This commit introduces contention event store, an in-memory key-value
FIFO store that stores historical contention events and map collected
contention events to their corresponding transaction fingerprint IDs.
Release note (sql change): A new cluster setting
sql.contention.event_store.capacity
is added . This cluster settingcan be used to control the in-memory capacity of contention event store.
When this setting is set to zero, the contention event store is
disabled.