-
Notifications
You must be signed in to change notification settings - Fork 806
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make cortex_bucket_store_blocks_loaded metric per user #4918
Make cortex_bucket_store_blocks_loaded metric per user #4918
Conversation
@@ -212,7 +212,7 @@ func (m *BucketStoreMetrics) Collect(out chan<- prometheus.Metric) { | |||
data.SendSumOfCounters(out, m.blockDrops, "thanos_bucket_store_block_drops_total") | |||
data.SendSumOfCounters(out, m.blockDropFailures, "thanos_bucket_store_block_drop_failures_total") | |||
|
|||
data.SendSumOfGauges(out, m.blocksLoaded, "thanos_bucket_store_blocks_loaded") | |||
data.SendSumOfGaugesPerUser(out, m.blocksLoaded, "thanos_bucket_store_blocks_loaded") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this cause an explosion in cardinality if there is a high churn in users?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My 2 cents:
- We have per user metrics already in other Cortex components
- For cardinality, I feel user label is okay. Compared to short lived job names like pod name, container ID, etc, user ID is relatively bounded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this change is ok, but it looks like store-gateway only "soft deletes" per user metrics:
m.regs.RemoveUserRegistry(user, false) |
(the second parameter is "hard delete?"
If that's the case, turning a metric to per-user may not be a good idea because of memory leak.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, I think this PR would prevent memory leak.
so, I think we are good to go.
For cardinality, I feel user label is okay. Compared to short lived job names like pod name, container ID, etc, user ID is relatively bounded.
The user
label can be "short lived" too; consider if you are running some test continuously, and each test run creates new user :)
However, I think this metrics is useful to be per user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are talking about benchmarking and continuously tests then every label can be "short lived", right? If we are not reusing the same test user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea :)
I think the per user metrics is not removed by store-gateway
I see. I am not sure how should I clean up the stale user metrics. Isn't it the same as other metrics we have that contain the |
Don’t worry about this, my comment was staled; metrics are cleaned up properly |
Signed-off-by: Ben Ye <[email protected]>
Signed-off-by: Ben Ye <[email protected]>
Signed-off-by: Ben Ye <[email protected]>
3e8b7f5
to
f492637
Compare
Conflict fixed. PTAL |
Signed-off-by: Ben Ye <[email protected]>
f492637
to
9c2bbc4
Compare
CHANGELOG.md
Outdated
@@ -41,8 +41,9 @@ | |||
* [CHANGE] Disables TSDB isolation. #4825 | |||
* [CHANGE] Drops support Prometheus 1.x rule format on configdb. #4826 | |||
* [CHANGE] Removes `-ingester.stream-chunks-when-using-blocks` experimental flag and stream chunks by default when `querier.ingester-streaming` is enabled. #4864 | |||
* [CHANGE] Compactor: Added `cortex_compactor_runs_interrupted_total` to separate compaction interruptions from failures | |||
* [CHANGE] Compactor: Added `cortex_compactor_runs_interrupted_total` to separate compaction interruptions from failures. #4921 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should normally avoid this and create a specific PR for that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I will remove this change for this pr.
Signed-off-by: Ben Ye <[email protected]>
…#4918) * make cortex_bucket_store_blocks_loaded metric per user Signed-off-by: Ben Ye <[email protected]> * fix integration test Signed-off-by: Ben Ye <[email protected]> * update changelog Signed-off-by: Ben Ye <[email protected]> * fix test Signed-off-by: Ben Ye <[email protected]> * update changelog Signed-off-by: Ben Ye <[email protected]> Signed-off-by: Ben Ye <[email protected]>
Signed-off-by: Friedrich Gonzalez <[email protected]>
Signed-off-by: Friedrich Gonzalez <[email protected]>
Signed-off-by: Ben Ye [email protected]
What this PR does:
Right now
cortex_bucket_store_blocks_loaded
metric is the total blocks loaded for each store gateway instance.This pr makes it per user so that we can know the # of blocks for each tenant.
This pr changes the behavior of the metric so we need
sum
to get the previous value back.If we want to have compatibility then I can add a separate metric
cortex_bucket_store_blocks_loaded_per_user
rather than changing existing ones.Which issue(s) this PR fixes:
Fixes #
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]