Add per-tenant cache metrics #6289
I like where this is headed :)
@@ -1833,6 +1833,10 @@ the index to a backing cache store.
# CLI flag: -<prefix>.cache.enable-fifocache
[enable_fifocache: <boolean>]

# Add tenant labels to cache-related metrics.
# CLI flag: -<prefix>.cache.per-tenant-metrics
[per_tenant_metrics: <boolean> | default = false]
WDYT about just always doing this based on if tenancy is enabled (ie. auth = true)? Curious how much cardinality explosion we're really protecting users from here, and we already have sooooo many configs.
You know it's rude to review draft PRs 😛 but thanks for taking a look
Cardinality is something that I plan to address in the description of the PR, but finishing something else up first.
haha, it's like we're pairing...but asynchronously! yeah, I didn't take too critical an eye, but couldn't resist getting my opinions in early.
Signed-off-by: Danny Kopping <[email protected]>
0a7b9a3 to d064453
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+ ingester 0%
+ distributor 0.3%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0%
I'm not in favor of adding this due to cardinality concerns on large clusters. It will be helpful on smaller clusters, but at the expense of being a toggle away from a cardinality bomb on large ones :(. Instead, can we include tenant information in logs for later extraction via loki?
I'm fine with parking this for now 👍
What this PR does / why we need it:
To improve and troubleshoot our cache performance, we need per-tenant metrics. Right now we track cache value size, hit/miss ratio, request latency, and requested key count only at a global level. All of these except request latency are too coarse to be very useful. Request latency is left global: we have no way to dedicate a caching backend per tenant, so a per-tenant latency metric would be very noisy.
Which issue(s) this PR fixes:
#6318
Special notes for your reviewer:
This setting defaults to `false` due to potential cardinality explosions.
This PR adds a `tenant` label to the following metrics:
- `loki_cache_value_size_bytes`
- `loki_cache_fetched_keys`
- `loki_cache_hits`
If one has a large number of active tenants, this can result in the creation of many more metric series.
Checklist
CHANGELOG.md
.docs/sources/upgrading/_index.md