Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: introduce contention event store #76549

Merged
merged 2 commits into from
Feb 26, 2022
Merged

Conversation

Azhng
Copy link
Contributor

@Azhng Azhng commented Feb 15, 2022

Reviewer note: only last commit is relevant.


This commit introduces contention event store, an in-memory key-value
FIFO store that stores historical contention events and map collected
contention events to their corresponding transaction fingerprint IDs.

Release note (sql change): A new cluster setting
sql.contention.event_store.capacity is added . This cluster setting
can be used to control the in-memory capacity of contention event store.
When this setting is set to zero, the contention event store is
disabled.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@Azhng Azhng force-pushed the txn-id-resolution branch 3 times, most recently from 4d7ccd0 to cbe49db Compare February 15, 2022 00:52
@Azhng Azhng requested a review from a team February 15, 2022 00:52
@Azhng Azhng force-pushed the txn-id-resolution branch 8 times, most recently from cfc2ec4 to a0bbc1d Compare February 16, 2022 00:52
@Azhng Azhng marked this pull request as ready for review February 16, 2022 02:18
@Azhng Azhng requested a review from a team as a code owner February 16, 2022 02:18
@Azhng Azhng force-pushed the txn-id-resolution branch 7 times, most recently from 4d7bc84 to b5cb37d Compare February 17, 2022 14:46
Copy link
Contributor

@matthewtodd matthewtodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! 🎉

Reviewed 10 of 13 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @matthewtodd)


-- commits, line 16 at r2:
nit: "bounded FIFO store"?


pkg/sql/contention/cluster_settings.go, line 24 at r2 (raw file):

	settings.TenantWritable,
	"sql.contention.event_store.resolution_interval",
	"the interval at which transaction ID resolution is performed (set to 0 to disable)",

nit: Is this "transaction fingerprint ID resolution"?


pkg/sql/contention/event_store.go, line 40 at r2 (raw file):

// eventWriter provides interfaces to write contention event into eventStore.
type eventWriter interface {
	addEvent(roachpb.ContentionEvent)

Do you want contentionpb.ExtendedContentionEvent here instead?


pkg/sql/contention/event_store.go, line 57 at r2 (raw file):

			return i
		}
	}

Would it make sense to keep track of the length as we add & remove instead?


pkg/sql/contention/event_store.go, line 137 at r2 (raw file):

		},
		OnEvictedEntry: func(entry *cache.Entry) {
			entrySize := int64(entry.Value.(*contentionpb.ExtendedContentionEvent).Size())

bug: add the UUID's size. (So, maybe a function to avoid the duplication?)


pkg/sql/contention/event_store.go, line 147 at r2 (raw file):

			return eventBatchSize
		}, /* limiter */
		func(currentWriterIndex int64) {

nit: _


pkg/sql/contention/event_store.go, line 162 at r2 (raw file):

func (s *eventStore) start(ctx context.Context, stopper *stop.Stopper) {
	s.startResolver(ctx, stopper)
	s.startEventIntake(ctx, stopper)

nit: reverse these 2 lines?


pkg/sql/contention/event_store.go, line 297 at r2 (raw file):

	s.guard.ForceSync()

	result, err := s.resolver.dequeue(ctx)

Is it worth calling out here that the asynchrony means that the resolver probably won't yet have enqueued all the events from the intake buffer?


pkg/sql/contention/event_store.go, line 298 at r2 (raw file):

	result, err := s.resolver.dequeue(ctx)
	// Ensure that all the resolved contention events are dequeued from the

nit: reword this: "are added to the store"


pkg/sql/contention/event_store.go, line 321 at r2 (raw file):

}

func (s *eventStore) nextResolutionInterval() time.Duration {

nit: instead of next, maybe "jittered"? Or resolutionIntervalWithJitter? Not sure.


pkg/sql/contentionpb/contention.go, line 75 at r2 (raw file):

// Valid returns if the ExtendedContentionEvent is valid.
func (e *ExtendedContentionEvent) Valid() bool {
	return e.BlockingEvent.TxnMeta.ID != uuid.UUID{}

nit: wanna do the nil thing here?


pkg/sql/contention/event_store_test.go, line 33 at r2 (raw file):

)

func TestEventStore(t *testing.T) {

use the metamorphic stuff?


pkg/sql/contention/event_store_test.go, line 51 at r2 (raw file):

	// Up to 10 nodes.
	numOfCoordinators := rand.Intn(10)

nit: do the random counts provide any value? Maybe it's better to always have large numbers here? Also, Intn can return 0. @Azhng suggests Intn + some minimum.


pkg/sql/contention/event_store_test.go, line 57 at r2 (raw file):

	// Randomize input.
	testCases := randomlyGenerateTestCases(testSize, numOfCoordinators)

nit: rename to testData?


pkg/sql/contention/event_store_test.go, line 68 at r2 (raw file):

	// The contention event should immediately be available to be read from
	// the event store (after synchronization). However, certain information will

nit: this sentence is misleading, since the ForceSync doesn't make events immediately available to forEachEvent.


pkg/sql/contention/event_store_test.go, line 70 at r2 (raw file):

	// the event store (after synchronization). However, certain information will
	// remain unavailable until the resolution is performed.
	testutils.SucceedsWithin(t, func() error {

nit: maybe call out that this block's testing the intake goroutine, so that's why we keep retrying. (Since we can't directly control when it does its work.)


pkg/sql/contention/event_store_test.go, line 111 at r2 (raw file):

	require.NoError(t, store.flushAndResolve(ctx))

	require.NoError(t, store.forEachEvent(

nit: maybe a comment saying something like "Now that we've resolved all the txn fingerprint ids, the event store has all the information we expect."


pkg/sql/contention/event_store_test.go, line 125 at r2 (raw file):

	st := cluster.MakeTestingClusterSettings()
	// TxnIDResolutionInterval.Override(ctx, &st.SV, time.Millisecond*300)

nit: delete


pkg/sql/contention/event_store_test.go, line 153 at r2 (raw file):

				numOfOps := b.N / numOfConcurrentWriter
				inputIdx := numOfOps * writerIdx

nit: comment about how each writer's reading from its own section of the input?


pkg/sql/contention/event_store_test.go, line 208 at r2 (raw file):

	// CollectionTs is generated at the runtime. It's tricky to assert its value.
	// We simply assert that it's nonzero and zero it out afterwards.
	require.True(t, actual.CollectionTs.After(expected.CollectionTs),

Another option could be passing in a fake clock. Not sure if it matters here.

@Azhng Azhng force-pushed the txn-id-resolution branch 3 times, most recently from 02ff5f6 to 2032e1e Compare February 17, 2022 19:11
Copy link
Contributor Author

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @matthewtodd)


-- commits, line 16 at r2:

Previously, matthewtodd (Matthew Todd) wrote…

nit: "bounded FIFO store"?

Done.


pkg/sql/contention/event_store.go, line 40 at r2 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

Do you want contentionpb.ExtendedContentionEvent here instead?

Ah my mistake, this change will be introduced in the following PR.


pkg/sql/contention/event_store.go, line 57 at r2 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

Would it make sense to keep track of the length as we add & remove instead?

Hmm it could make sending the batch over the channel slightly more complicated. I slightly prefer just to make the intake goroutine to do a little bit extra work.


pkg/sql/contention/event_store.go, line 137 at r2 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

bug: add the UUID's size. (So, maybe a function to avoid the duplication?)

Done.


pkg/sql/contention/event_store.go, line 162 at r2 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

nit: reverse these 2 lines?

Done.


pkg/sql/contention/event_store.go, line 297 at r2 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

Is it worth calling out here that the asynchrony means that the resolver probably won't yet have enqueued all the events from the intake buffer?

Done.


pkg/sql/contention/event_store_test.go, line 33 at r2 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

use the metamorphic stuff?

Actually I realized metamorphic constants was used for something else. Those are constants that can be changed without affecting program expected output. I think I will just leave these number as random number here.


pkg/sql/contention/event_store_test.go, line 51 at r2 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

nit: do the random counts provide any value? Maybe it's better to always have large numbers here? Also, Intn can return 0. @Azhng suggests Intn + some minimum.

Done.


pkg/sql/contention/event_store_test.go, line 68 at r2 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

nit: this sentence is misleading, since the ForceSync doesn't make events immediately available to forEachEvent.

Done. Reworded.


pkg/sql/contention/event_store_test.go, line 70 at r2 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

nit: maybe call out that this block's testing the intake goroutine, so that's why we keep retrying. (Since we can't directly control when it does its work.)

Done.


pkg/sql/contention/event_store_test.go, line 111 at r2 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

nit: maybe a comment saying something like "Now that we've resolved all the txn fingerprint ids, the event store has all the information we expect."

Done.


pkg/sql/contention/event_store_test.go, line 125 at r2 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

nit: delete

Done.


pkg/sql/contention/event_store_test.go, line 153 at r2 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

nit: comment about how each writer's reading from its own section of the input?

Done.


pkg/sql/contention/event_store_test.go, line 208 at r2 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

Another option could be passing in a fake clock. Not sure if it matters here.

Done.

@Azhng Azhng force-pushed the txn-id-resolution branch 2 times, most recently from b277821 to 36fe969 Compare February 23, 2022 23:57
Copy link
Contributor

@matthewtodd matthewtodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 6 files at r1, 13 of 13 files at r5, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @Azhng)


pkg/sql/contention/event_store.go, line 204 at r5 (raw file):

		TxnIDResolutionInterval.SetOnChange(&s.st.SV, func(ctx context.Context) {
			select {
			case resolutionIntervalChanged <- struct{}{}:

nit: Does publishing to this channel require the select block?

@Azhng Azhng force-pushed the txn-id-resolution branch from 36fe969 to 578aaf3 Compare February 24, 2022 19:21
Copy link
Contributor Author

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @matthewtodd)


pkg/sql/contention/event_store.go, line 204 at r5 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

nit: Does publishing to this channel require the select block?

Good point. Removed the select block.

This commit introduces contention event resolver. The new resolver
implements a queue-like interface, where unresolved contention events
will be queued into the resolver. The resolution is performed by
invoking the `resolve()` method on the resolver queue, and the result of
the resolution can be retrieved by calling `dequeue()` method.

Release note: None
This commit introduces contention event store, an in-memory bounded
FIFO store that stores historical contention events and map collected
contention events to their corresponding transaction fingerprint IDs.

Release note (sql change): A new cluster setting
`sql.contention.event_store.capacity` is added . This cluster setting
can be used to control the in-memory capacity of contention event store.
When this setting is set to zero, the contention event store is
disabled.
@Azhng Azhng force-pushed the txn-id-resolution branch from 578aaf3 to c0ddc37 Compare February 25, 2022 22:28
@craig craig bot merged commit c0ddc37 into cockroachdb:master Feb 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants