Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

split: add observability for when load based splitting cannot find a key #88720

Merged
merged 1 commit into from
Oct 5, 2022

Conversation

KaiSun314
Copy link
Contributor

@KaiSun314 KaiSun314 commented Sep 26, 2022

Fixes #87276

Previously, there were no metrics or logging in the load-based splitter.

This was inadequate because there is minimal observability into why the load splitter could not find a split key.

To address this, this patch adds metrics and logging to the load splitter, including counter metrics indicating number of times could not find a split key and popular key (>25% occurrence) and logging indicating causes for no split key (insufficient counters, imbalance, too many contained).

Release note (ops change): Added observability for when load based splitting cannot find a key to indicate the reasons why the load splitter could not find a split key, enabling us to have more observability and insight to debug why a range is not splitting more easily.

Metrics / Logging Screenshots (in metrics: black line = no split key count, yellow line = popular key count):

YCSB Workload A - Metrics
YCSB-A Metrics

YCSB Workload B - Metrics
YCSB-B Metrics

YCSB Workload B (QPS threshold set to 100) - Metrics and logging
YCSB-B 100 Metrics
YCSB-B 100 Logs

YCSB Workload C - Metrics and logging
YCSB-C Metrics
YCSB-C Logs

YCSB Workload D - Metrics
YCSB-D Metrics

YCSB Workload E - Metrics and logging
YCSB-E Metrics
YCSB-E Logs

YCSB Workload F - Metrics
YCSB-F Metrics

KV Workload with span limit 1000, span percent 0.95 - Metrics and logging
KV-Span Metrics
KV-Span Logs

@KaiSun314 KaiSun314 requested a review from a team September 26, 2022 15:16
@KaiSun314 KaiSun314 requested a review from a team as a code owner September 26, 2022 15:17
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@KaiSun314 KaiSun314 force-pushed the load_split_observability branch from 695508c to f446d8e Compare September 26, 2022 15:18
@KaiSun314 KaiSun314 requested a review from kvoli September 27, 2022 17:08
@KaiSun314 KaiSun314 force-pushed the load_split_observability branch from f446d8e to 40e7ff8 Compare September 27, 2022 22:12
Copy link
Collaborator

@kvoli kvoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 16 files at r1, 2 of 5 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @KaiSun314 and @kvoli)


pkg/kv/kvserver/split/decider.go line 175 at r2 (raw file):

					majorityKeyFrequency := d.mu.splitFinder.MajorityKeyFrequency()
					loggedStr := fmt.Sprintf(
						"No split key found: insufficient counters = %v, imbalance = %v, too many contained = %v, imbalance and too many contained = %v",

Can we update to use %d instead of %v


pkg/kv/kvserver/split/decider.go line 179 at r2 (raw file):

					if majorityKeyFrequency >= (1+splitKeyThreshold)/2 {
						d.loadSplitterMetrics.PopularKeyCount.Inc(1)
						loggedStr += fmt.Sprintf(", most popular key occurs in %v%% of samples", majorityKeyFrequency*100)

Same as above.


pkg/kv/kvserver/split/decider.go line 182 at r2 (raw file):

					}
					d.loadSplitterMetrics.NoSplitKeyCount.Inc(1)
					log.KvDistribution.Infof(ctx, "%s", loggedStr)

nit: log.kvDistribution.Info(...) rather than Infof


pkg/kv/kvserver/split/decider_test.go line 410 at r2 (raw file):

}

func TestMetrics(t *testing.T) {

Nice testing

nit: TestDeciderMetrics


pkg/kv/kvserver/split/finder_test.go line 276 at r2 (raw file):

}

func TestFinder_NoSplitKeyCause(t *testing.T) {

nit: drop the _ between parts.


pkg/kv/kvserver/split/finder_test.go line 320 at r2 (raw file):

	}

	testCases := []struct {

It isn't necessary to add a for loop for a single test case here. Nice test logic.


pkg/kv/kvserver/split/finder_test.go line 339 at r2 (raw file):

		}
		if imbalance != test.expectedImbalance {
			t.Errorf(

Older tests tend to use t.Errorf but newer ones have migrated to use require.Equal(t, expected, actual, msg, args). We should swap it out w/ the newer format.


pkg/kv/kvserver/split/finder_test.go line 356 at r2 (raw file):

}

func TestFinder_MajorityKeyFrequency(t *testing.T) {

Same as above.

@KaiSun314 KaiSun314 force-pushed the load_split_observability branch 2 times, most recently from 6ead457 to 7d9a47f Compare September 28, 2022 19:43
Copy link
Contributor Author

@KaiSun314 KaiSun314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli)


pkg/kv/kvserver/split/decider.go line 175 at r2 (raw file):

Previously, kvoli (Austen) wrote…

Can we update to use %d instead of %v

Done.


pkg/kv/kvserver/split/decider.go line 179 at r2 (raw file):

Previously, kvoli (Austen) wrote…

Same as above.

Done.


pkg/kv/kvserver/split/decider.go line 182 at r2 (raw file):

Previously, kvoli (Austen) wrote…

nit: log.kvDistribution.Info(...) rather than Infof

Unfortunately doing log.KvDistribution.Info(ctx, loggedStr) gives the build error:

compilepkg: nogo: errors found by nogo during build-time code analysis:
pkg/kv/kvserver/split/decider.go:182:29: recordLocked(): message argument is not a constant expression\n++Tip: use YourFuncf("descriptive prefix %s", ...) or list new formatting wrappers in pkg/testutils/lint/passes/fmtsafe/functions.go. (fmtsafe)
Target //pkg/cmd/cockroach:cockroach failed to build


pkg/kv/kvserver/split/decider_test.go line 410 at r2 (raw file):

Previously, kvoli (Austen) wrote…

Nice testing

nit: TestDeciderMetrics

Done.


pkg/kv/kvserver/split/finder_test.go line 320 at r2 (raw file):

Previously, kvoli (Austen) wrote…

It isn't necessary to add a for loop for a single test case here. Nice test logic.

Done.


pkg/kv/kvserver/split/finder_test.go line 339 at r2 (raw file):

Previously, kvoli (Austen) wrote…

Older tests tend to use t.Errorf but newer ones have migrated to use require.Equal(t, expected, actual, msg, args). We should swap it out w/ the newer format.

Done.


pkg/kv/kvserver/split/finder_test.go line 356 at r2 (raw file):

Previously, kvoli (Austen) wrote…

Same as above.

Done.

@KaiSun314 KaiSun314 force-pushed the load_split_observability branch 8 times, most recently from 6ca44e0 to 1ab9460 Compare October 3, 2022 18:13
Previously, there were no metrics or logging in the load-based splitter.

This was inadequate because there is minimal observability into why the
load splitter could not find a split key.

To address this, this patch adds metrics and logging to the load
splitter, including counter metrics indicating number of times could not
find a split key and popular key (>25% occurrence) and logging
indicating causes for no split key (insufficient counters, imbalance,
too many contained).

Release note (ops change): Added observability for when load based
splitting cannot find a key to indicate the reasons why the load
splitter could not find a split key, enabling us to have more
observability and insight to debug why a range is not splitting more
easily.
@KaiSun314 KaiSun314 force-pushed the load_split_observability branch from 1ab9460 to 8ad1bf7 Compare October 3, 2022 19:12
Copy link
Collaborator

@kvoli kvoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice patch. :lgtm_strong:

Great to have this obs in.

Reviewed 2 of 16 files at r1, 2 of 7 files at r3, 5 of 6 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @kvoli)

@KaiSun314
Copy link
Contributor Author

bors r=kvoli

@craig
Copy link
Contributor

craig bot commented Oct 5, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kvserver: add observability for when load based splitting cannot find a key
3 participants