-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
split: add observability for when load based splitting cannot find a key #88720
Conversation
695508c
to
f446d8e
Compare
f446d8e
to
40e7ff8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 16 files at r1, 2 of 5 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @KaiSun314 and @kvoli)
pkg/kv/kvserver/split/decider.go
line 175 at r2 (raw file):
majorityKeyFrequency := d.mu.splitFinder.MajorityKeyFrequency() loggedStr := fmt.Sprintf( "No split key found: insufficient counters = %v, imbalance = %v, too many contained = %v, imbalance and too many contained = %v",
Can we update to use %d
instead of %v
pkg/kv/kvserver/split/decider.go
line 179 at r2 (raw file):
if majorityKeyFrequency >= (1+splitKeyThreshold)/2 { d.loadSplitterMetrics.PopularKeyCount.Inc(1) loggedStr += fmt.Sprintf(", most popular key occurs in %v%% of samples", majorityKeyFrequency*100)
Same as above.
pkg/kv/kvserver/split/decider.go
line 182 at r2 (raw file):
} d.loadSplitterMetrics.NoSplitKeyCount.Inc(1) log.KvDistribution.Infof(ctx, "%s", loggedStr)
nit: log.kvDistribution.Info(...)
rather than Infof
pkg/kv/kvserver/split/decider_test.go
line 410 at r2 (raw file):
} func TestMetrics(t *testing.T) {
Nice testing
nit: TestDeciderMetrics
pkg/kv/kvserver/split/finder_test.go
line 276 at r2 (raw file):
} func TestFinder_NoSplitKeyCause(t *testing.T) {
nit: drop the _ between parts.
pkg/kv/kvserver/split/finder_test.go
line 320 at r2 (raw file):
} testCases := []struct {
It isn't necessary to add a for loop for a single test case here. Nice test logic.
pkg/kv/kvserver/split/finder_test.go
line 339 at r2 (raw file):
} if imbalance != test.expectedImbalance { t.Errorf(
Older tests tend to use t.Errorf
but newer ones have migrated to use require.Equal(t, expected, actual, msg, args)
. We should swap it out w/ the newer format.
pkg/kv/kvserver/split/finder_test.go
line 356 at r2 (raw file):
} func TestFinder_MajorityKeyFrequency(t *testing.T) {
Same as above.
6ead457
to
7d9a47f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @kvoli)
pkg/kv/kvserver/split/decider.go
line 175 at r2 (raw file):
Previously, kvoli (Austen) wrote…
Can we update to use
%d
instead of%v
Done.
pkg/kv/kvserver/split/decider.go
line 179 at r2 (raw file):
Previously, kvoli (Austen) wrote…
Same as above.
Done.
pkg/kv/kvserver/split/decider.go
line 182 at r2 (raw file):
Previously, kvoli (Austen) wrote…
nit:
log.kvDistribution.Info(...)
rather thanInfof
Unfortunately doing log.KvDistribution.Info(ctx, loggedStr) gives the build error:
compilepkg: nogo: errors found by nogo during build-time code analysis:
pkg/kv/kvserver/split/decider.go:182:29: recordLocked(): message argument is not a constant expression\n++Tip: use YourFuncf("descriptive prefix %s", ...) or list new formatting wrappers in pkg/testutils/lint/passes/fmtsafe/functions.go. (fmtsafe)
Target //pkg/cmd/cockroach:cockroach failed to build
pkg/kv/kvserver/split/decider_test.go
line 410 at r2 (raw file):
Previously, kvoli (Austen) wrote…
Nice testing
nit:
TestDeciderMetrics
Done.
pkg/kv/kvserver/split/finder_test.go
line 320 at r2 (raw file):
Previously, kvoli (Austen) wrote…
It isn't necessary to add a for loop for a single test case here. Nice test logic.
Done.
pkg/kv/kvserver/split/finder_test.go
line 339 at r2 (raw file):
Previously, kvoli (Austen) wrote…
Older tests tend to use
t.Errorf
but newer ones have migrated to userequire.Equal(t, expected, actual, msg, args)
. We should swap it out w/ the newer format.
Done.
pkg/kv/kvserver/split/finder_test.go
line 356 at r2 (raw file):
Previously, kvoli (Austen) wrote…
Same as above.
Done.
6ca44e0
to
1ab9460
Compare
Previously, there were no metrics or logging in the load-based splitter. This was inadequate because there is minimal observability into why the load splitter could not find a split key. To address this, this patch adds metrics and logging to the load splitter, including counter metrics indicating number of times could not find a split key and popular key (>25% occurrence) and logging indicating causes for no split key (insufficient counters, imbalance, too many contained). Release note (ops change): Added observability for when load based splitting cannot find a key to indicate the reasons why the load splitter could not find a split key, enabling us to have more observability and insight to debug why a range is not splitting more easily.
1ab9460
to
8ad1bf7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to have this obs in.
Reviewed 2 of 16 files at r1, 2 of 7 files at r3, 5 of 6 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @kvoli)
bors r=kvoli |
Build succeeded: |
Fixes #87276
Previously, there were no metrics or logging in the load-based splitter.
This was inadequate because there is minimal observability into why the load splitter could not find a split key.
To address this, this patch adds metrics and logging to the load splitter, including counter metrics indicating number of times could not find a split key and popular key (>25% occurrence) and logging indicating causes for no split key (insufficient counters, imbalance, too many contained).
Release note (ops change): Added observability for when load based splitting cannot find a key to indicate the reasons why the load splitter could not find a split key, enabling us to have more observability and insight to debug why a range is not splitting more easily.
Metrics / Logging Screenshots (in metrics: black line = no split key count, yellow line = popular key count):
YCSB Workload A - Metrics
YCSB Workload B - Metrics
YCSB Workload B (QPS threshold set to 100) - Metrics and logging
YCSB Workload C - Metrics and logging
YCSB Workload D - Metrics
YCSB Workload E - Metrics and logging
YCSB Workload F - Metrics
KV Workload with span limit 1000, span percent 0.95 - Metrics and logging