Make cortex_bucket_store_blocks_loaded metric per user #4918

Contributor

@yeya24 yeya24 commented Oct 17, 2022

Signed-off-by: Ben Ye [email protected]

What this PR does:

Right now, the cortex_bucket_store_blocks_loaded metric reports the total number of blocks loaded by each store-gateway instance.
This PR makes it per user so that we can see the number of blocks loaded for each tenant.

This PR changes the behavior of the metric, so we need to sum over the user label to get the previous value back.
If we want to keep compatibility, I can add a separate metric, cortex_bucket_store_blocks_loaded_per_user, rather than changing the existing one.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@@ -212,7 +212,7 @@ func (m *BucketStoreMetrics) Collect(out chan<- prometheus.Metric) {
 	data.SendSumOfCounters(out, m.blockDrops, "thanos_bucket_store_block_drops_total")
 	data.SendSumOfCounters(out, m.blockDropFailures, "thanos_bucket_store_block_drop_failures_total")

-	data.SendSumOfGauges(out, m.blocksLoaded, "thanos_bucket_store_blocks_loaded")
+	data.SendSumOfGaugesPerUser(out, m.blocksLoaded, "thanos_bucket_store_blocks_loaded")
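
To illustrate the behavior change in this hunk, here is a minimal, standard-library-only sketch of the difference between emitting one summed gauge and emitting one gauge per tenant. It is not the Cortex implementation: the types `userGauges` and `sample` and the functions `sumOfGauges` and `sumOfGaugesPerUser` are hypothetical names chosen for this example. It assumes each tenant has its own gauge value and that the per-user variant attaches the tenant ID as a label.

```go
package main

import (
	"fmt"
	"sort"
)

// userGauges is a hypothetical stand-in for the per-tenant registries:
// tenant ID -> current value of one gauge in that tenant's registry.
type userGauges map[string]float64

// sample is one exported series: an optional user label plus a value.
type sample struct {
	User  string
	Value float64
}

// sumOfGauges mimics the old behavior: a single series without a user label.
func sumOfGauges(g userGauges) sample {
	total := 0.0
	for _, v := range g {
		total += v
	}
	return sample{Value: total}
}

// sumOfGaugesPerUser mimics the new behavior: one series per tenant.
func sumOfGaugesPerUser(g userGauges) []sample {
	out := make([]sample, 0, len(g))
	for user, v := range g {
		out = append(out, sample{User: user, Value: v})
	}
	// Sort for deterministic output; Go map iteration order is unspecified.
	sort.Slice(out, func(i, j int) bool { return out[i].User < out[j].User })
	return out
}

func main() {
	g := userGauges{"tenant-a": 12, "tenant-b": 30}

	fmt.Println("before:", sumOfGauges(g))
	fmt.Println("after: ", sumOfGaugesPerUser(g))

	// Summing the per-user series recovers the old per-instance value.
	recovered := 0.0
	for _, s := range sumOfGaugesPerUser(g) {
		recovered += s.Value
	}
	fmt.Println("recovered total matches:", recovered == sumOfGauges(g).Value)
}
```

In this sketch, the number of exported series grows from one per instance to one per tenant per instance, which is the cardinality trade-off raised in the review comments.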
Contributor

@harry671003 harry671003 Oct 18, 2022

Will this cause an explosion in cardinality if there is a high churn in users?

Contributor Author

@yeya24 yeya24 Oct 19, 2022

My 2 cents:

  1. We already have per-user metrics in other Cortex components.
  2. For cardinality, I feel the user label is okay. Compared to short-lived values like pod names, container IDs, etc., the user ID is relatively bounded.

Contributor

@alvinlin123 alvinlin123 Oct 27, 2022

I think this change is OK, but it looks like store-gateway only "soft deletes" per-user metrics:

m.regs.RemoveUserRegistry(user, false)

(the second parameter is "hard delete?")

If that's the case, turning a metric into a per-user one may not be a good idea because of a memory leak.

Contributor

@alvinlin123 alvinlin123 Oct 27, 2022

Actually, I think this PR would prevent a memory leak.

So, I think we are good to go.

> For cardinality, I feel the user label is okay. Compared to short-lived values like pod names, container IDs, etc., the user ID is relatively bounded.

The user label can be "short lived" too; consider running some test continuously, where each test run creates a new user :)

However, I think it is useful for this metric to be per user.

Contributor Author

If you are talking about benchmarking and continuous tests, then every label can be "short lived" if we are not reusing the same test user, right?

Contributor

yea :)

alvinlin123 previously approved these changes Oct 27, 2022
@alvinlin123 alvinlin123 self-requested a review October 27, 2022 03:16
@alvinlin123 alvinlin123 dismissed their stale review October 27, 2022 03:16

I think the per-user metrics are not removed by store-gateway

Contributor Author

yeya24 commented Oct 27, 2022

> I think the per-user metrics are not removed by store-gateway

I see. I am not sure how I should clean up the stale user metrics. Isn't it the same as the other metrics we have that contain the user label?

@alvinlin123
Contributor

> > I think the per-user metrics are not removed by store-gateway
>
> I see. I am not sure how I should clean up the stale user metrics. Isn't it the same as the other metrics we have that contain the user label?

Don't worry about this; my comment was stale. Metrics are cleaned up properly.
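
The soft-delete versus hard-delete distinction discussed earlier in this thread can be sketched as follows. This is a hedged illustration only, not the Cortex code: the types `userRegistry` and `registries` and the methods `removeUserRegistry` and `perUserSeries` are hypothetical, and the sketch assumes that a soft delete keeps the last recorded values while detaching them from the tenant ID, so that per-user output no longer includes the removed tenant.

```go
package main

import "fmt"

// userRegistry is a hypothetical per-tenant registry holding one gauge.
type userRegistry struct {
	user         string // empty once the registry has been soft-deleted
	blocksLoaded float64
}

// registries is a hypothetical container of per-tenant registries.
type registries struct {
	regs []userRegistry
}

// removeUserRegistry removes a tenant. With hard=true the entry is dropped
// entirely. With hard=false (assumed soft-delete semantics) the last values
// are kept but detached from the tenant ID.
func (r *registries) removeUserRegistry(user string, hard bool) {
	kept := r.regs[:0] // filter in place; writes never overtake reads
	for _, reg := range r.regs {
		if reg.user != user {
			kept = append(kept, reg)
			continue
		}
		if !hard {
			reg.user = "" // soft delete: keep the values, forget the owner
			kept = append(kept, reg)
		}
	}
	r.regs = kept
}

// perUserSeries returns the series that would carry a user label.
// Soft-deleted entries have no tenant ID, so they are skipped and no
// stale user label is exported.
func (r *registries) perUserSeries() map[string]float64 {
	out := map[string]float64{}
	for _, reg := range r.regs {
		if reg.user == "" {
			continue
		}
		out[reg.user] = reg.blocksLoaded
	}
	return out
}

func main() {
	r := &registries{regs: []userRegistry{
		{user: "tenant-a", blocksLoaded: 12},
		{user: "tenant-b", blocksLoaded: 30},
	}}
	r.removeUserRegistry("tenant-a", false) // soft delete
	fmt.Println(r.perUserSeries())          // only tenant-b remains labeled
}
```

Under these assumptions, a soft-deleted tenant stops contributing a labeled series, which is consistent with the conclusion above that the per-user metric is cleaned up.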

@yeya24 yeya24 force-pushed the add-blocks-loaded-per-user-store-gateway branch from 3e8b7f5 to f492637 Compare October 30, 2022 05:55
Contributor Author

yeya24 commented Oct 30, 2022

Conflict fixed. PTAL

Signed-off-by: Ben Ye <[email protected]>
@yeya24 yeya24 force-pushed the add-blocks-loaded-per-user-store-gateway branch from f492637 to 9c2bbc4 Compare October 30, 2022 06:35
CHANGELOG.md Outdated
@@ -41,8 +41,9 @@
 * [CHANGE] Disables TSDB isolation. #4825
 * [CHANGE] Drops support Prometheus 1.x rule format on configdb. #4826
 * [CHANGE] Removes `-ingester.stream-chunks-when-using-blocks` experimental flag and stream chunks by default when `querier.ingester-streaming` is enabled. #4864
-* [CHANGE] Compactor: Added `cortex_compactor_runs_interrupted_total` to separate compaction interruptions from failures
+* [CHANGE] Compactor: Added `cortex_compactor_runs_interrupted_total` to separate compaction interruptions from failures. #4921
Member

@friedrichg friedrichg Oct 31, 2022

We should normally avoid this and create a separate PR for it.

Contributor Author

OK, I will remove this change from this PR.

Signed-off-by: Ben Ye <[email protected]>
@alvinlin123 alvinlin123 merged commit ed36a62 into cortexproject:master Oct 31, 2022
t00350320 pushed a commit to t00350320/cortex that referenced this pull request Nov 1, 2022
…#4918)

* make cortex_bucket_store_blocks_loaded metric per user

Signed-off-by: Ben Ye <[email protected]>

* fix integration test

Signed-off-by: Ben Ye <[email protected]>

* update changelog

Signed-off-by: Ben Ye <[email protected]>

* fix test

Signed-off-by: Ben Ye <[email protected]>

* update changelog

Signed-off-by: Ben Ye <[email protected]>

Signed-off-by: Ben Ye <[email protected]>
friedrichg added a commit to cortexproject/cortex-jsonnet that referenced this pull request Jun 12, 2023