Compactor doesn't compact data as configured #6313

Open
amewayne opened this issue Apr 24, 2023 · 6 comments

amewayne commented Apr 24, 2023

Thanos, Prometheus and Golang version used:
Thanos: v0.30.2
Prometheus: 2.42.0
Go: 1.19.5

Object Storage Provider:

Minio/S3

What happened:

Thanos Compact stopped compacting data from Minio after a period of time.
I configured retention.resolution-raw=2d when starting the compactor, but I can see raw data older than 2 days.
I tried restarting the compactor; it worked for a while, but soon stopped compacting again.

I have no clue what is causing this, because everything looks fine and there are no errors in the logs.

This is the command that I used to start the compactor:

docker run --name="thanos-compactor" -d --restart=always \
  -v /opt/prometheus-config:/etc/prometheus \
  -v /data0/compact-data:/data0/compact-data \
  --net=host thanos:v0.30.2 compact \
  --http-address=0.0.0.0:10902 \
  --data-dir=/data0/compact-data \
  --objstore.config-file=/etc/prometheus/minio/prod.yml \
  --compact.cleanup-interval=5m \
  --wait --wait-interval=5m \
  --retention.resolution-raw=2d \
  --retention.resolution-5m=10d \
  --retention.resolution-1h=60d \
  --log.level=info

What you expected to happen:

The compactor keeps compacting data, and no raw data older than 2 days remains on Minio.

How to reproduce it (as minimally and precisely as possible):

Full logs to relevant components:

level=info ts=2023-04-24T07:52:16.002244007Z caller=clean.go:34 msg="started cleaning of aborted partial uploads"
level=info ts=2023-04-24T07:52:16.002258707Z caller=clean.go:61 msg="cleaning of aborted partial uploads done"
level=info ts=2023-04-24T07:52:16.002268737Z caller=blocks_cleaner.go:44 msg="started cleaning of blocks marked for deletion"
level=info ts=2023-04-24T07:52:16.002307288Z caller=blocks_cleaner.go:58 msg="cleaning of blocks marked for deletion done"
level=info ts=2023-04-24T07:52:18.262869543Z caller=fetcher.go:478 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=2.260546335s duration_ms=2260 cached=1973 returned=1878 partial=0
level=info ts=2023-04-24T07:52:18.263340051Z caller=compact.go:1296 msg="start of GC"
level=info ts=2023-04-24T07:52:18.270631636Z caller=compact.go:1319 msg="start of compactions"
level=info ts=2023-04-24T07:52:18.271490099Z caller=compact.go:1005 group="0@{monitor=\"kubernetes\", replica=\"1\"}" groupKey=0@10753051500942971680 msg="compaction available and planned; downloading blocks" plan="[01GYG0KACQXWFAXK6K7727TZ2S (min time: 1682006400000, max time: 1682013600000) 01GYG7F1MVKRFJDP3JKGP1S4N2 (min time: 1682013600000, max time: 1682020800000) 01GYGEARWWHF48ZQF6PH69D1XS (min time: 1682020800000, max time: 1682028000000) 01GYGN6G5CZYV7QDFVP6W5NJEK (min time: 1682028000000, max time: 1682035200000)]"
level=info ts=2023-04-24T07:53:08.08875303Z caller=fetcher.go:478 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=2.069151008s duration_ms=2069 cached=1973 returned=1973 partial=0

Anything else we need to know:

(screenshot attached)

amewayne (Author) commented

I finally got some clues about this issue. I cleaned up the compactor's data directory and restarted it with the --compact.concurrency flag, and the compactor seems to be working well this time.

If my understanding is correct, thanos_compact_todo_compaction_blocks represents the current backlog of blocks that still need to be compacted. It showed over 2k blocks waiting to be handled, which is why the compactor appeared to have stopped working. After adding --compact.concurrency, thanos_compact_todo_compaction_blocks keeps going down, so I guess it's related.
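
A quick way to check this backlog, and whether the compactor has halted on an error, is to scrape its metrics endpoint directly. A minimal sketch, assuming the --http-address=0.0.0.0:10902 from the command above and that the compactor host is reachable as localhost:

# thanos_compact_todo_compaction_blocks is the backlog metric discussed above;
# thanos_compact_halted is 1 if the compactor stopped because of an error.
curl -s http://localhost:10902/metrics | grep -E 'thanos_compact_todo_compaction_blocks|thanos_compact_halted'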

beramod commented Jul 27, 2023

Hi @amewayne
I have a similar problem.
Could you tell me how you set compact.concurrency?

amewayne (Author) commented

Hi @amewayne I have a similar problem. Could you tell me how you set compact.concurrency?

I set --compact.concurrency=15 on a 32-core server.
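
For illustration, a sketch of how the original command above changes with this flag, assuming the same volumes and object store config; the value 15 comes from this comment and should be tuned to your own CPU count:

docker run --name="thanos-compactor" -d --restart=always \
  -v /opt/prometheus-config:/etc/prometheus \
  -v /data0/compact-data:/data0/compact-data \
  --net=host thanos:v0.30.2 compact \
  --http-address=0.0.0.0:10902 \
  --data-dir=/data0/compact-data \
  --objstore.config-file=/etc/prometheus/minio/prod.yml \
  --compact.cleanup-interval=5m \
  --wait --wait-interval=5m \
  --retention.resolution-raw=2d \
  --retention.resolution-5m=10d \
  --retention.resolution-1h=60d \
  --compact.concurrency=15 \
  --log.level=info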

Kiara0107 commented

I have the same issue, see #6866. I increased the concurrency from 1 to 6, but I see no difference. Do you have any other tips?

yeya24 (Contributor) commented Nov 2, 2023

@Kiara0107 Please take a look at this doc: https://thanos.io/tip/operating/compactor-backlog.md/#troubleshoot-compactor-backlog

It might take some time to see a difference after increasing concurrency, since compaction itself takes time.
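
As a rough way to confirm the compactor is actually catching up after raising concurrency, you can watch the backlog alongside the rate of completed group compactions. A minimal sketch using promtool, assuming Prometheus scrapes the compactor and is reachable at the placeholder URL below:

# backlog of blocks still to compact; should trend down over hours or days, not minutes
promtool query instant http://prometheus.example:9090 'thanos_compact_todo_compaction_blocks'
# completed group compactions per second over the last hour; should be non-zero while catching up
promtool query instant http://prometheus.example:9090 'rate(thanos_compact_group_compactions_total[1h])'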

Kiara0107 commented

Thanks, I've read the doc; that's why I increased the concurrency. Unfortunately, there has been no change after 24 hours. I would have expected to see some results by now.
