compactor: no downsampling, why? How to enable/activate/implement downsampling? #6866
Can you post a screenshot of the block UI from the compactor? You can access it on the HTTP port which you've set.
So downsampling happens only for blocks that are 2 days or 14 days in duration. Your bucket seems to have many 2-hour blocks, which is probably because no compaction is taking place. Do you see any logs related to compaction?
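You can also check the compactor's own metrics to see whether it is progressing or has halted (a sketch; the host and port are placeholders, use whatever you configured via --http-address):

```
# Sketch: scrape the compactor's metrics endpoint and look at the key counters/gauges.
# "thanos-compact:10902" is a placeholder; substitute your compactor's --http-address.
curl -s http://thanos-compact:10902/metrics | grep -E \
  'thanos_compact_halted|thanos_compact_iterations_total|thanos_compact_group_compactions_total|thanos_compact_todo_downsample_blocks'
# thanos_compact_halted 1 means compaction (and therefore downsampling) has stopped.
```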
I am not sure if any compaction is taking place. I think that the pending values go down due to retention, not because compactions are completed. Any idea on how to check?
We seem to be having a similar issue since we upgraded to v0.32.x from v0.31. No error logs at all, but we can see that the metric
Updated: never mind, you already mentioned you read the docs. Downsampling is enabled by default. You can scale up compactors to catch up. @Kiara0107 can you check? One possibility is if you have
Could you post a graph of bucket operations? Also, this could be a bug introduced in 0.32, as noted in another issue. Could you try 0.31 to see if it makes a difference?
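Something like this would do (a sketch; the job label value and $PROM, your Prometheus/Querier base URL, are assumptions about your setup):

```
# Sketch: per-operation rate of object storage requests made by the compactor,
# queried through the Prometheus HTTP API. Adjust the job selector to your setup.
curl -s "$PROM/api/v1/query" --data-urlencode \
  'query=sum by (operation) (rate(thanos_objstore_bucket_operations_total{job="thanos-compact"}[5m]))'
```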
I think something is corrupted now; I'm seeing the "critical error detected; halting" message in the log (way too long to paste here in full).
Similar log for me actually. I guess a problem with a single block is halting all compaction?
Mine is slightly different, starts with: Edit: there are over 1500 blocks mentioned. I tried to log on to the container and use the bucket tools, but without success:
Any suggestions on how to fix this 'overlap' error? Manually removing buckets from S3 doesn't feel like the way to go :S
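For context, this is the kind of bucket-tools check I was attempting (a sketch only, assuming the standard `thanos tools bucket` sub-commands; the objstore config path is a placeholder):

```
# Sketch: list the blocks and verify the bucket for overlap issues, read-only.
# /etc/thanos/bucket.yml is a placeholder for the objstore config.
thanos tools bucket inspect --objstore.config-file=/etc/thanos/bucket.yml
thanos tools bucket verify --objstore.config-file=/etc/thanos/bucket.yml \
  --issues=overlapped_blocks   # issue name assumed from the verifier list
```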
It seems my compactor is now making progress. Apparently, the compactor had been running for weeks doing nothing after it encountered an error with a block (our Thanos compactor dashboard wasn't showing the most critical metric). The only solution I could think of was manually marking the failing blocks for no-compaction.
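Roughly like this (a sketch, assuming the standard `thanos tools bucket mark` command; the block ULID and config path are placeholders):

```
# Sketch: mark a problematic block so the compactor skips it instead of halting.
thanos tools bucket mark \
  --objstore.config-file=/etc/thanos/bucket.yml \
  --marker=no-compact-mark.json \
  --id=<BLOCK_ULID> \
  --details="block halts compaction; skip it"
```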
@Kiara0107 do you have replicated blocks, e.g. coming from Prometheus pairs?
@fpetkovski yes I have. We have one Prometheus pair running (two replicas):
In this case you should use the compactor's deduplication flags.
My hero. It looks like that is indeed helping; the Thanos compact log shows:
Which is definitely different than before. I will leave it running now and see whether compaction continues and downsampling gets activated. FYI, I've added these 2 flags to the docker run command:
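For illustration, a sketch of what such a compactor invocation can look like (the image tag, paths, replica label, and the penalty dedup function are assumptions, not necessarily the exact values I used):

```
# Sketch: thanos compact with vertical deduplication for an HA Prometheus pair.
docker run quay.io/thanos/thanos:v0.32.5 compact \
  --data-dir=/var/thanos/compact \
  --objstore.config-file=/etc/thanos/bucket.yml \
  --deduplication.replica-label=prometheus_replica \
  --deduplication.func=penalty \
  --wait
```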
Since this is a configuration issue, I will close this one. We should update the compactor backlog troubleshooting doc to mention checking for a compactor halt first.
So every time I have two or more replicas of Prometheus scraping the same targets (same cluster), distinguished by a label, should I set:
- --deduplication.func=
- --deduplication.replica-label=prometheus_replica

Or what is the best and recommended way to set up the compactor when I have HA Prometheus in the same cluster? Thanks
Thanos, Prometheus and Golang version used:
Object Storage Provider: S3 - Wasabi
What happened: no downsampling, but also no error
Metric: thanos_compact_todo_downsample_blocks = 0 (flatliner)
bucket_store_block_series with tag block.resolution = 0 (always)
What you expected to happen: since there is data for the last 10 months I would expect downsampled blocks.
How to reproduce it (as minimally and precisely as possible):
Prometheus docker run:
Thanos compact:
Full logs to relevant components:
N/A; there are no errors, or even any mention of downsampling, in the logs.
Anything else we need to know:
I've read the docs at https://thanos.io/tip/components/compact.md/#downsampling, https://thanos.io/tip/operating/compactor-backlog.md/, and https://thanos.io/tip/components/sidecar.md/, but I'm still clueless about how to enable downsampling.