Shipper doesn't upload OOO compacted blocks #6462
Comments
I can see that blocks are still being compacted locally due to overlapping; they were generated from OOO blocks.
Is there a way to distinguish an OOO block from a regular one?
@fpetkovski, there is a method to set the compaction hint to out-of-order: https://github.com/prometheus/prometheus/blob/main/tsdb/block.go#L196. But this seems to be available only on level-1 blocks created from the OOO head; after one compaction the hint is removed.
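As a rough, purely illustrative sketch (independent of the actual shipper code, using a hypothetical minimal struct rather than the real metadata types), detecting that hint only requires reading the block's meta.json and looking for the "from-out-of-order" value that also appears in the metas quoted later in this thread:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// blockMeta is a minimal view of the fields we need from a block's meta.json.
type blockMeta struct {
	ULID       string `json:"ulid"`
	Compaction struct {
		Level int      `json:"level"`
		Hints []string `json:"hints"`
	} `json:"compaction"`
}

// isFromOutOfOrder reports whether a block directory carries the
// "from-out-of-order" compaction hint in its meta.json.
func isFromOutOfOrder(blockDir string) (bool, error) {
	b, err := os.ReadFile(filepath.Join(blockDir, "meta.json"))
	if err != nil {
		return false, err
	}
	var m blockMeta
	if err := json.Unmarshal(b, &m); err != nil {
		return false, err
	}
	for _, h := range m.Compaction.Hints {
		if h == "from-out-of-order" { // hint written for blocks created from the OOO head
			return true, nil
		}
	}
	return false, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: oocheck <block-dir>")
		os.Exit(2)
	}
	ok, err := isFromOutOfOrder(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("from-out-of-order:", ok)
}
```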
Hm, maybe the shipper could inspect the source block metas and figure out that way whether a block was created from an OOO chunk.
@fpetkovski Not sure if it works or not, because the source blocks will be deleted after compaction, so the shipper might miss that? I guess having an fsnotify watcher might work, but that overcomplicates things. What's the downside if we just enable uploading compacted blocks when OOO is enabled?
I have a different take on the same issue:
So the TSDB of receive actually rewrites a new block with the hint from-out-of-order. However, both source blocks have been uploaded by the shipper, which halts my compactor because of overlapping blocks.
meta.json 01H66BABNW48B74PW4YRAM31TQ
{
"ulid": "01H66BABNW48B74PW4YRAM31TQ",
"minTime": 1690272000000,
"maxTime": 1690279200000,
"stats": {
"numSamples": 135016,
"numSeries": 91424,
"numChunks": 91424
},
"compaction": {
"level": 1,
"sources": [
"01H66BABNW48B74PW4YRAM31TQ"
],
"hints": [
"from-out-of-order"
]
},
"version": 1,
"thanos": {
"labels": {
"prometheus": "rbox",
"receive": "true",
"receive_replica": "thanos-receive-rbox-ingester-0",
"receiver": "rbox",
"stack": "ccp-ne-ogob01a",
"tenant_id": "rbox"
},
"downsample": {
"resolution": 0
},
"source": "receive",
"segment_files": [
"000001"
],
"files": [
{
"rel_path": "chunks/000001",
"size_bytes": 2271083
},
{
"rel_path": "index",
"size_bytes": 16322685
},
{
"rel_path": "meta.json"
}
]
}
}
meta.json 01H66BAH90X25EJN5S5BJ4JEF2 (not uploaded, that's the issue)
{
"ulid": "01H66BAH90X25EJN5S5BJ4JEF2",
"minTime": 1690272000000,
"maxTime": 1690279200000,
"stats": {
"numSamples": 63400665,
"numSeries": 255020,
"numChunks": 532703
},
"compaction": {
"level": 2,
"sources": [
"01H66B9DW1T2V781EZQ4FN5CEX",
"01H66BABNW48B74PW4YRAM31TQ"
],
"parents": [
{
"ulid": "01H66B9DW1T2V781EZQ4FN5CEX",
"minTime": 1690272000000,
"maxTime": 1690279200000
},
{
"ulid": "01H66BABNW48B74PW4YRAM31TQ",
"minTime": 1690272000000,
"maxTime": 1690279200000
}
]
},
"version": 1
}
(I deleted 01H66B9DW1T2V781EZQ4FN5CEX to fix my compactor, but it was a classic block with no hint.) Here the source blocks were uploaded (which halts my compactor because of overlapping blocks) but not the level-2 compacted block. I think this level-2 block should be the only block uploaded to long-term storage, maybe by adding an upload delay to the shipper (the source blocks are removed by the TSDB, but only right after the shipper has uploaded them :( see logs)? WDYT?
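To make the upload-delay idea concrete, here is a hypothetical shipper-side filter (names and the grace period are invented; this is not existing shipper behaviour): blocks whose meta.json is younger than the grace period are skipped in the current iteration and retried later, giving the local TSDB a chance to compact away the OOO source blocks first.

```go
package sketch

import (
	"os"
	"path/filepath"
	"time"
)

// uploadDelay is a made-up grace period, for illustration only.
const uploadDelay = 10 * time.Minute

// oldEnoughToUpload reports whether a block directory's meta.json has
// existed for at least uploadDelay, so freshly written OOO source blocks
// are not uploaded before a local compaction can replace them.
func oldEnoughToUpload(blockDir string) bool {
	fi, err := os.Stat(filepath.Join(blockDir, "meta.json"))
	if err != nil {
		return false
	}
	return time.Since(fi.ModTime()) >= uploadDelay
}
```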
@ahurtaud I think it is an interesting idea. But it is hard to ensure that the OOO block compaction will happen within this delay so that we can get the L2 block.
I wonder if we can modify the duplicate filter to filter out source OOO blocks once a compacted version is available. I am not familiar with OOO so I don't exactly know how that detection would work. |
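Sketching that idea (purely illustrative, not the actual Thanos deduplication filter, and using a made-up minimal meta type): given the metas of the local blocks, a block could be dropped from the upload set once another block lists it among its compaction sources.

```go
package sketch

// blockMeta is a hypothetical, minimal stand-in for a block's metadata.
type blockMeta struct {
	ULID    string
	Level   int
	Sources []string // ULIDs of the blocks this block was compacted from
}

// filterCompactedAway returns only the blocks that are not listed in the
// compaction sources of some other local block, i.e. OOO source blocks are
// dropped once their compacted version exists on disk.
func filterCompactedAway(metas []blockMeta) []blockMeta {
	covered := map[string]bool{}
	for _, m := range metas {
		for _, src := range m.Sources {
			if src != m.ULID { // a level-1 block lists itself as its own source
				covered[src] = true
			}
		}
	}
	out := make([]blockMeta, 0, len(metas))
	for _, m := range metas {
		if !covered[m.ULID] {
			out = append(out, m)
		}
	}
	return out
}
```

In the metas quoted above, 01H66BABNW48B74PW4YRAM31TQ would be dropped as soon as the level-2 block 01H66BAH90X25EJN5S5BJ4JEF2 appears, since the latter lists it as a source.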
@fpetkovski I was thinking about the same thing, but it requires the upload delay and I feel it cannot guarantee we don't upload L1 blocks.
I think what we will do in Cortex is always enable the shipper to upload compacted blocks. The shipper is used in the ingester, which is almost the same as the receiver.
We now disable overlapping compaction in the TSDB/Receiver so this doesn't happen anymore. I'll close the ticket.
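For context, a minimal sketch of the knobs presumably involved, assuming the OutOfOrderTimeWindow and EnableOverlappingCompaction fields of Prometheus TSDB's Options (how Receive actually wires this up may differ):

```go
package sketch

import (
	"time"

	"github.com/prometheus/prometheus/tsdb"
)

// buildTSDBOptions keeps OOO ingestion enabled while turning off local
// compaction of overlapping blocks, leaving the merge to the object-store
// compactor instead of the TSDB inside the receiver.
func buildTSDBOptions() *tsdb.Options {
	opts := tsdb.DefaultOptions()
	opts.OutOfOrderTimeWindow = int64(30 * time.Minute / time.Millisecond) // 30m OOO window, in milliseconds
	opts.EnableOverlappingCompaction = false
	return opts
}
```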
Hey, is there an MR or release that contains this change/fix/improvement?
… parameterize uploading compacted blocks (#5959) In v1.15.2, ingesters configured with OOO samples ingestion enabled could hit this bug (#5402) where ingesters would not upload compacted blocks (thanos-io/thanos#6462). In v1.16.1, ingesters are configured to always upload compacted blocks (#5625). In v1.17, ingesters stopped uploading compacted blocks (#5735). This can cause problems for users upgrading from v1.15.2 with OOO ingestion enabled to v1.17 because both versions are hard coded to disable uploading compacted blocks from the ingesters. The workaround was to downgrade from v1.17 to v1.16 to allow those compacted blocks to be uploaded (and eventually deleted). The new flag is set to true by default which reverts the behavior of the ingester uploading compacted blocks back to v1.16. Signed-off-by: Charlie Le <[email protected]>
What happened:
If OOO is enabled, the shipper won't upload the OOO-compacted blocks: their compaction level is > 1, and by default the shipper doesn't upload compacted blocks.
https://github.com/thanos-io/thanos/blob/main/pkg/shipper/shipper.go#L295
See also cortexproject/cortex#5402.
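Paraphrasing the linked check (not the verbatim Thanos source, and using a made-up minimal meta type): the shipper iterates over the local block metas and skips anything above compaction level 1 unless it was explicitly configured to upload compacted blocks, which is what makes the OOO-compacted level-2 block invisible to it.

```go
package sketch

// meta is a hypothetical, minimal stand-in for a block's metadata.
type meta struct {
	ULID            string
	CompactionLevel int
}

// blocksToUpload mirrors the gist of the shipper's selection loop:
// compacted (level > 1) blocks are only eligible when uploadCompacted is
// set, so level-2 blocks produced from OOO data are silently skipped by
// default.
func blocksToUpload(metas []meta, uploadCompacted bool) []meta {
	var out []meta
	for _, m := range metas {
		if m.CompactionLevel > 1 && !uploadCompacted {
			continue // the case hit by OOO-compacted blocks
		}
		out = append(out, m)
	}
	return out
}
```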
What you expected to happen:
If the out-of-order samples feature is enabled, the shipper should still be able to upload those compacted blocks to the object store.
How to reproduce it (as minimally and precisely as possible):
Run two Prometheus instances with the same external labels. Remote-write to one receiver with an object store configured and OOO enabled.
Alternatives:
I am not sure about Prometheus' logic for OOO block compaction, but it sounds weird: compaction should be turned off, so even if we have overlapping blocks generated from OOO, we shouldn't do any local compaction and should instead let the compactor take care of it. It seems the TSDB always tries to compact overlapping blocks, and we don't have a way to disable that.