
compact: could "blocks marked for deletions" be deleted at the beginning? #2605

Closed · ulikl opened this issue May 14, 2020 · 11 comments

@ulikl

ulikl commented May 14, 2020

Thanos, Prometheus and Golang version used:
# thanos --version
thanos, version 0.12.2 (branch: HEAD, revision: 52e10c6)
build user: root@c1a6cf60f03d
build date: 20200430-16:24:03
go version: go1.13.6

Object Storage Provider:
S3 compatible

What happened:

I have a problem with the S3 storage quota caused by the new feature #2136 from v0.12.0.

Generally this is a great feature.
But the blocks marked for deletion are only deleted at the very end, after the retention actions.
This requires more object storage than necessary, because the preceding steps create new files.

What you expected to happen:

Could the blocks that are ready to be deleted be removed at the beginning of a compaction iteration?
This should not interfere with the other compaction actions, as these blocks are no longer used by compaction.

How to reproduce it (as minimally and precisely as possible):
With every compact execution.

Full logs to relevant components:

You can see the order of the actions in the log file:

level=info ts=2020-05-13T11:27:32.088281638Z caller=compact.go:892 msg="start of GC"
level=info ts=2020-05-13T11:27:32.088452538Z caller=compact.go:904 msg="start of compactions"
level=info ts=2020-05-13T11:27:32.170084918Z caller=compact.go:633 group="0@{store=\"emea\"}" groupKey=0@2207242157792197222 msg="compaction available and planned; downloading block
...
level=info ts=2020-05-13T11:32:11.373968898Z caller=compact.go:887 msg="start sync of metas"
level=info ts=2020-05-13T11:33:25.185776942Z caller=fetcher.go:451 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=1m13.811763883s cached=689 returned=59 partial=0
level=info ts=2020-05-13T11:33:25.187312386Z caller=compact.go:892 msg="start of GC"
level=info ts=2020-05-13T11:33:25.187461328Z caller=compact.go:904 msg="start of compactions"
level=info ts=2020-05-13T11:33:25.330177556Z caller=compact.go:936 msg="compaction iterations done"
level=info ts=2020-05-13T11:33:25.330237655Z caller=compact.go:410 msg="downsampling was explicitly disabled"
level=info ts=2020-05-13T11:34:25.790819672Z caller=fetcher.go:451 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=1m0.46055915s cached=689 returned=59 partial=0
level=info ts=2020-05-13T11:34:25.792727418Z caller=retention.go:30 msg="start optional retention"
level=info ts=2020-05-13T11:34:25.792957546Z caller=retention.go:45 msg="optional retention apply done"
level=info ts=2020-05-13T11:34:25.792978207Z caller=clean.go:33 msg="started cleaning of aborted partial uploads"
level=info ts=2020-05-13T11:34:25.792989017Z caller=clean.go:60 msg="cleaning of aborted partial uploads done"
level=info ts=2020-05-13T11:34:25.793001202Z caller=blocks_cleaner.go:43 msg="started cleaning of blocks marked for deletion"
level=info ts=2020-05-13T11:34:26.257318243Z caller=blocks_cleaner.go:53 msg="deleted block marked for deletion" block=01E7N100XFABK55ZBB38PQPGS7
...
@bwplotka
Member

I guess we could, but still, how would that be helpful for you? It's a matter of a few iterations really. It should not matter over a longer period of time (:

@ulikl
Author

ulikl commented May 19, 2020

You are right, usually it should not make much of a difference.
I was just uploading a few pending days at once, as there were problems accessing my storage.
So the uncompacted blocks marked for deletion were adding up quite a bit.

@GiedriusS
Member

I started thinking about this too. It could be painful: you might run into quota problems if the Thanos Compactor doesn't finish its iteration quickly enough. Maybe it would be nice to have some kind of subcommand under tools that would go ahead and wipe all of the blocks that have the deletion mark?
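
For illustration only, a rough Go sketch of such a one-shot wipe could look like the snippet below. The Bucket interface and its methods are simplified, hypothetical stand-ins (not the real Thanos objstore.Bucket API); deletion-mark.json is the marker file written when a block is flagged for deletion.

// Rough sketch of a one-shot "wipe blocks marked for deletion" helper.
package wipe

import (
	"context"
	"fmt"
	"path"
)

// Bucket is a hypothetical, simplified object-store client.
type Bucket interface {
	// ListBlockDirs returns the top-level block (ULID) directories.
	ListBlockDirs(ctx context.Context) ([]string, error)
	// Exists reports whether the named object exists.
	Exists(ctx context.Context, name string) (bool, error)
	// DeletePrefix removes every object under the given prefix.
	DeletePrefix(ctx context.Context, prefix string) error
}

// WipeMarkedBlocks deletes every block that carries a deletion-mark.json,
// without doing any compaction, downsampling or retention work first.
func WipeMarkedBlocks(ctx context.Context, bkt Bucket) error {
	dirs, err := bkt.ListBlockDirs(ctx)
	if err != nil {
		return err
	}
	for _, dir := range dirs {
		marked, err := bkt.Exists(ctx, path.Join(dir, "deletion-mark.json"))
		if err != nil {
			return err
		}
		if !marked {
			continue // block is still live, leave it alone
		}
		if err := bkt.DeletePrefix(ctx, dir); err != nil {
			return err
		}
		fmt.Println("deleted block marked for deletion:", dir)
	}
	return nil
}

In practice such a command would presumably also have to respect the same deletion delay as the compactor, so it never removes blocks that readers might still be using.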

@pracucci
Contributor

We're currently running into a similar problem. We have a compactor which is continuously looping within BucketCompactor.Compact() because there's always some work to do, so we never hit blocksCleaner.DeleteMarkedBlocks().

Have you considered decoupling the block deletion from the compactor main loop and running it in the background with its own periodic ticker? It would require another full bucket scan to find blocks marked for deletion (using a dedicated metadata fetcher), but it wouldn't be blocked by the compactor.
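
Not the actual Thanos implementation, just a rough sketch of that idea: the cleanup runs on its own ticker, with a cleaner interface standing in for whatever the real blocksCleaner provides.

// Sketch: run marked-block cleanup on its own ticker so a busy compaction
// loop can never starve it.
package cleanup

import (
	"context"
	"log"
	"time"
)

// cleaner is a stand-in for the real block cleaner.
type cleaner interface {
	// DeleteMarkedBlocks re-scans for deletion marks and removes the
	// corresponding blocks.
	DeleteMarkedBlocks(ctx context.Context) error
}

// RunBackgroundCleanup deletes marked blocks every interval, independently
// of how long the compaction iterations take. It returns when ctx is done.
func RunBackgroundCleanup(ctx context.Context, c cleaner, interval time.Duration) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			if err := c.DeleteMarkedBlocks(ctx); err != nil {
				log.Printf("background block cleanup failed: %v", err)
			}
		}
	}
}

As noted above, the cleaner would need its own metadata fetcher (and bucket scan) so it does not depend on the state of the compaction loop.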

@Jakob3xD
Contributor

Jakob3xD commented Jun 2, 2020

Just to visualize the issue, here is a picture of my graph. Due to the delayed deletion it creates a "peak" of ~2 TB of data, which in my opinion is quite heavy.
[attached image: object storage usage graph showing the ~2 TB peak]

@zygiss

zygiss commented Jun 17, 2020

As a workaround, would running another instance of Compactor as a cron job with --debug.max-compaction-level=1 be a terrible idea here?

If I'm reading it right, this instance of Compactor will not try to compact any blocks, so it can safely run alongside existing Compactors and, instead of compacting, skip straight to removing blocks marked for deletion.

@sevagh
Contributor

sevagh commented Jul 1, 2020

Based on @zygiss's idea above to disable compaction, but it being 1 am and me feeling fuzzy in the brain, I wasn't certain about modifying the appropriate hidden compaction level, so I compiled myself a special thanos binary with the following modifications:

diff --git a/cmd/thanos/compact.go b/cmd/thanos/compact.go
index 627010f1..67dcea3e 100644
--- a/cmd/thanos/compact.go
+++ b/cmd/thanos/compact.go
@@ -121,6 +121,9 @@ func registerCompact(m map[string]setupFunc, app *kingpin.Application) {
                "as querying long time ranges without non-downsampled data is not efficient and useful e.g it is not possible to render all samples for a human eye anyway").
                Default("false").Bool()

+       disableCompacting := cmd.Flag("compacting.disable", "Disables compacting to only delete blocks. Use at your own risk").
+               Default("false").Bool()
+
        maxCompactionLevel := cmd.Flag("debug.max-compaction-level", fmt.Sprintf("Maximum compaction level, default is %d: %s", compactions.maxLevel(), compactions.String())).
                Hidden().Default(strconv.Itoa(compactions.maxLevel())).Int()

@@ -168,6 +171,7 @@ func registerCompact(m map[string]setupFunc, app *kingpin.Application) {
                        },
                        component.Compact,
                        *disableDownsampling,
+                       *disableCompacting,
                        *maxCompactionLevel,
                        *blockSyncConcurrency,
                        *compactionConcurrency,
@@ -195,6 +199,7 @@ func runCompact(
        retentionByResolution map[compact.ResolutionLevel]time.Duration,
        component component.Component,
        disableDownsampling bool,
+       disableCompacting bool,
        maxCompactionLevel, blockSyncConcurrency int,
        concurrency int,
        dedupReplicaLabels []string,
@@ -382,8 +387,10 @@ func runCompact(
        }

        compactMainFn := func() error {
-               if err := compactor.Compact(ctx); err != nil {
-                       return errors.Wrap(err, "compaction")
+               if !disableCompacting {
+                       if err := compactor.Compact(ctx); err != nil {
+                               return errors.Wrap(err, "compaction")
+                       }
                }

                if !disableDownsampling {

It ran pretty well, proceeding immediately to deletion. For safety, I shut off the normal compactor (to avoid any risk of race conditions, even though there may not be any in the deletion function).

Logs:

Jun 30 21:56:31 sv3-thanos2 thanos-custom-compact[29759]: level=info ts=2020-07-01T04:56:31.87090466Z caller=blocks_cleaner.go:53 msg="deleted block marked for deletion" block=01EBJRWBN8J8GF703ZVHDJZ29C
Jun 30 21:56:31 sv3-thanos2 thanos-custom-compact[29759]: level=info ts=2020-07-01T04:56:31.908009471Z caller=blocks_cleaner.go:53 msg="deleted block marked for deletion" block=01EBZ08SDN8GFYKWS535GPDSR3
Jun 30 21:56:32 sv3-thanos2 thanos-custom-compact[29759]: level=info ts=2020-07-01T04:56:32.36013667Z caller=blocks_cleaner.go:53 msg="deleted block marked for deletion" block=01EBK21JXGCZ3ECE2BMBMKWTAQ
Jun 30 21:56:32 sv3-thanos2 thanos-custom-compact[29759]: level=info ts=2020-07-01T04:56:32.455291751Z caller=blocks_cleaner.go:53 msg="deleted block marked for deletion" block=01EBR4HGXVSPJZYSM2MF76W3XR

It ran fine, exited, and now I'm back to my normal compactor.

@stale

stale bot commented Aug 6, 2020

Hello 👋 Looks like there was no activity on this issue for the last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this issue or push a commit. Thanks! 🤗
If there is no activity in the next week, this issue will be closed (we can always reopen an issue if we need to!). Alternatively, use the remind command if you wish to be reminded at some point in the future.

@stale stale bot added the stale label Aug 6, 2020
@GiedriusS GiedriusS removed the stale label Aug 6, 2020
@Jakob3xD
Contributor

Jakob3xD commented Aug 13, 2020

Would it be possible to add the --wait option to the bucket downsample tool, to split the workload of the compactor into two parts: the compactor itself compacting the raw blocks and deleting files, and the downsampler just creating downsampled blocks?
My main issue with the current compactor is that the downsampling takes so much time. As there is no option to boost or tune the downsampling, it would be nice to split it, so that the compactor cleans the bucket more often.

@stale

stale bot commented Nov 8, 2020

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this issue or push a commit. Thanks! 🤗
If there is no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need to!). Alternatively, use the remind command if you wish to be reminded at some point in the future.

@stale stale bot added the stale label Nov 8, 2020
@GiedriusS
Member

Would it be possible to add the --wait option to the bucket downsample tool, to split the workload of the compactor into two parts: the compactor itself compacting the raw blocks and deleting files, and the downsampler just creating downsampled blocks?
My main issue with the current compactor is that the downsampling takes so much time. As there is no option to boost or tune the downsampling, it would be nice to split it, so that the compactor cleans the bucket more often.

Maybe let's open a separate issue for this? As for the OP, it has been implemented in #3115 and will be in the next version. Closing as fixed.
