Add time-series metrics re: LSM state #1140

Closed
joshimhoff opened this issue May 14, 2021 · 4 comments

@joshimhoff
Contributor

joshimhoff commented May 14, 2021

Stats like count & size make fantastic time-series metrics:

[n1,s1] 5660606 +__level_____count____size___score______in__ingest(sz_cnt)____move(sz_cnt)___write(sz_cnt)____read___r-amp___w-amp
[n1,s1] 5660606 +    WAL         1    53 M       -    32 G       -       -       -       -    33 G       -       -       -     1.0
[n1,s1] 5660606 +      0        13    22 M    3.14    33 G    39 M       3     0 B       0   4.4 G   2.6 K     0 B       3     0.1
[n1,s1] 5660606 +      1         0     0 B    0.00     0 B     0 B       0     0 B       0     0 B       0     0 B       0     0.0
[n1,s1] 5660606 +      2         4    16 M    0.96   4.4 G    26 M       1    16 M       8    20 G   7.7 K    21 G       1     4.5
[n1,s1] 5660606 +      3        10    47 M    0.59   3.1 G     0 B       0   375 M      99    28 G    11 K    29 G       1     9.1
[n1,s1] 5660606 +      4       189   119 M    0.16   2.7 G     0 B       0   550 M      95   110 G    12 K   110 G       1    40.3
[n1,s1] 5660606 +      5      1154    33 G    1.00   6.5 G    10 K       8   471 M      84   148 G   5.3 K   148 G       1    22.8
[n1,s1] 5660606 +      6      5438   280 G       -   6.7 G     0 B       0     0 B       0    40 G     701    42 G       1     5.9
[n1,s1] 5660606 +  total      6808   313 G       -    33 G    65 M      12   1.4 G     286   383 G    39 K   350 G       8    11.6
[n1,s1] 5660606 +  flush       590
[n1,s1] 5660606 +compact      3758    38 M            19 G  (size == estimated-debt, in = in-progress-bytes)
[n1,s1] 5660606 + memtbl         1    64 M
[n1,s1] 5660606 +zmemtbl         0     0 B
[n1,s1] 5660606 +   ztbl         0     0 B
[n1,s1] 5660606 + bcache     180 K   3.0 G   81.3%  (score == hit-rate)
[n1,s1] 5660606 + tcache     6.8 K   4.0 M  100.0%  (score == hit-rate)
[n1,s1] 5660606 + titers        13
[n1,s1] 5660606 + filter         -       -   91.6%  (score == utility)

My understanding is that the number of levels doesn't grow without bound, so we can break the metrics down by level without worrying about cardinality limits.
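A minimal sketch of what a per-level breakdown could look like, assuming a fixed number of levels (the table above shows levels 0 through 6). The types, function, and metric names here (`levelStats`, `perLevelSamples`, `lsm_level_count`, `lsm_level_size_bytes`) are hypothetical and not taken from Pebble or CockroachDB. The point is only that the number of series is the number of levels times the number of stats, which is a small constant:

```go
package main

import "fmt"

// numLevels is an assumption for illustration, based on levels 0-6 in the table above.
const numLevels = 7

// levelStats is a hypothetical stand-in for per-level LSM stats.
type levelStats struct {
	Count int64 // number of files in the level
	Size  int64 // total size of the level in bytes
}

// sample is one labeled data point, identified by metric name and level.
type sample struct {
	Name  string
	Level int
	Value float64
}

// perLevelSamples emits two samples per level, so the series count
// is always 2*numLevels regardless of how much data the store holds.
func perLevelSamples(levels [numLevels]levelStats) []sample {
	out := make([]sample, 0, 2*numLevels)
	for i, l := range levels {
		out = append(out,
			sample{"lsm_level_count", i, float64(l.Count)},
			sample{"lsm_level_size_bytes", i, float64(l.Size)},
		)
	}
	return out
}

func main() {
	var levels [numLevels]levelStats
	levels[0] = levelStats{Count: 13, Size: 22 << 20}
	levels[6] = levelStats{Count: 5438, Size: 280 << 30}
	for _, s := range perLevelSamples(levels) {
		fmt.Printf("%s{level=\"%d\"} %g\n", s.Name, s.Level, s.Value)
	}
}
```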

Such metrics make it much quicker to get a basic sense of what is going on. Logs are good too, but metrics can be interpreted faster. It is also easier to correlate such metrics with impact-based metrics (SQL latencies or the kvprober error rate) than it is to do so with logs.

This will complement #1141.

@petermattis
Collaborator

Pebble already exports this info at a very fine granularity. See https://github.com/cockroachdb/pebble/blob/master/metrics.go#L125. Pebble itself doesn't create any time-series metrics; it only ever exports data that can be used to power metrics at the application level. I suspect there is nothing to be done here in Pebble.
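A rough sketch of that division of labor, not Pebble's or CockroachDB's actual code: the storage engine only returns a point-in-time snapshot, and the application polls it and appends timestamped points. `snapshot`, `recorder`, and `poll` are hypothetical names, and the single `L0Files` field is an assumed stand-in for whatever stat the application chooses to track:

```go
package main

import "fmt"

// snapshot is a hypothetical stand-in for a point-in-time metrics struct
// returned by a storage engine. It carries no history of its own.
type snapshot struct {
	L0Files int64
}

// point is one time-series data point kept by the application.
type point struct {
	At    int64 // Unix seconds
	Value float64
}

// recorder lives in the application and turns successive snapshots
// into a time series.
type recorder struct {
	source func() snapshot
	series []point
}

// poll reads the current snapshot and appends one timestamped point.
func (r *recorder) poll(now int64) {
	s := r.source()
	r.series = append(r.series, point{At: now, Value: float64(s.L0Files)})
}

func main() {
	files := int64(13)
	r := &recorder{source: func() snapshot {
		files++ // pretend the store gains one L0 file between polls
		return snapshot{L0Files: files}
	}}
	// Simulate three polls, ten seconds apart.
	for i := int64(0); i < 3; i++ {
		r.poll(1620950400 + 10*i)
	}
	for _, p := range r.series {
		fmt.Println(p.At, p.Value)
	}
}
```

In a real application the loop would be driven by a ticker and the points would go to a metrics system rather than a slice.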

@joshimhoff
Contributor Author

Nice. So then this is a CRDB issue, @petermattis? I'll look for the time-series metrics first.

@joshimhoff
Contributor Author

Yup. Example metric that could be broken down by level but isn't:

[image]

Will open in CRDB repo.

@joshimhoff
Contributor Author

Closing in favor of cockroachdb/cockroach#65276.
