Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compactor: Add offline deduplication #4239

Merged
merged 12 commits into from
Jun 16, 2021
Merged

compactor: Add offline deduplication #4239

merged 12 commits into from
Jun 16, 2021

Conversation

yeya24
Copy link
Contributor

@yeya24 yeya24 commented May 16, 2021

Signed-off-by: yeya24 [email protected]

Fixes #1014

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Another attempt on offline deduplication using the existing querier's penalty deduplication algorithm.

This pr is WIP as I need to do more testing and is currently blocked by prometheus/prometheus#8836, so I use my own fork here.

But as mentioned in #1014 (comment), this deduplication algorithm works for 99% of cases but is wrong in 1% case. So it has its own risk.

Verification

Tested with 2 Prometheus replicas running in k8s. I replicate the minio bucket to a local directory as a filesystem objstore for testing. The local directory is used for running dedup compactor. There are 12h test data and I run 1 thanos store and 1 thanos querier for each bucket.

Undeduped bucket:

Query deduplication has to be enabled.

Screenshot from 2021-05-17 11-09-16

Bucket size: 41m (36m if normal compaction is enabled)

Offline deduped bucket:

Query deduplication disabled.

Screenshot from 2021-05-17 11-10-36

Bucket size: 19m

@yeya24
Copy link
Contributor Author

yeya24 commented May 16, 2021

I think the failed E2E test is related to my change, but I am not sure why.

bwplotka
bwplotka previously approved these changes May 17, 2021
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is epic, If not for the file move this would have been 50 line change to add something so powerful. Well done!

I have a strong suggestion regarding the bool flag, otherwise LGTM!

One extra question: Are we sure penalty algo will work properly on 1:1 dups? (: (extra test would help)

cmd/thanos/compact.go Outdated Show resolved Hide resolved
cmd/thanos/compact.go Show resolved Hide resolved
cmd/thanos/compact.go Outdated Show resolved Hide resolved
pkg/compact/dedup.go Outdated Show resolved Hide resolved
cmd/thanos/compact.go Show resolved Hide resolved
@yeya24 yeya24 force-pushed the dedup-comp branch 2 times, most recently from 4fdedb0 to 143cd07 Compare May 17, 2021 18:00
@yeya24 yeya24 changed the title WIP: Add offline dedup compactor Add offline dedup compactor May 17, 2021
@yeya24 yeya24 force-pushed the dedup-comp branch 2 times, most recently from 9b005a3 to 3e11f2b Compare May 17, 2021 18:08
@yeya24
Copy link
Contributor Author

yeya24 commented May 18, 2021

This works for downsampled blocks now. Here is one example. Each qurier runs against one bucket. The two buckets contain the same downsample level1 blocks and one bucket is deduplicated then. The result is almost the same.

Screenshot from 2021-05-17 20-40-38

pkg/dedup/chunk_iter.go Outdated Show resolved Hide resolved
pkg/block/fetcher.go Outdated Show resolved Hide resolved
go.mod Outdated Show resolved Hide resolved
go.mod Outdated Show resolved Hide resolved
@yeya24 yeya24 force-pushed the dedup-comp branch 7 times, most recently from 1ece1ce to 3a1c8ef Compare May 21, 2021 02:38
@yeya24 yeya24 force-pushed the dedup-comp branch 2 times, most recently from ced8fda to 1b9630c Compare May 27, 2021 18:02
@yeya24 yeya24 mentioned this pull request May 28, 2021
2 tasks
Copy link
Member

@kakkoyun kakkoyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this really close to the finish line.

cmd/thanos/compact.go Show resolved Hide resolved
cmd/thanos/compact.go Show resolved Hide resolved
kakkoyun
kakkoyun previously approved these changes Jun 2, 2021
Copy link
Member

@kakkoyun kakkoyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! LGTM! 🥇

Let's give @bwplotka on last chance to review and we can merge.

@yeya24
Copy link
Contributor Author

yeya24 commented Jun 3, 2021

Friendly ping @bwplotka.

Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few nits but overall LGTM 🤗

cmd/thanos/compact.go Outdated Show resolved Hide resolved
this requires more complex deduplication algorithms. For example one that is used to [deduplicate on the fly on the Querier](query.md#run-time-deduplication-of-ha-groups). This is common
case when Prometheus HA replicas are used. [Offline deduplication for this is in progress](https://github.com/thanos-io/thanos/issues/1014).
case when Prometheus HA replicas are used. You can enable this deduplication via `--deduplication.func=penalty` flag.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we please clarify in the docs in exactly which 1% of the cases the penalty-based deduplication algorithm doesn't work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about the 1% case. I only know it is related to counter reset/ rate functions. Is this #2890 the right issue for it?

@bwplotka @kakkoyun for more input.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe let's say something like:

even though the deduplication algorithm has been battle-tested for quite a long time but still very few issues come up from time to time such as https://github.com/thanos-io/thanos/issues/2890

? It would be more concrete. And it wouldn't sound like we know that it is bad in some 1% of cases & we aren't fixing it 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated docs. PTAL

pkg/dedup/chunk_iter.go Outdated Show resolved Hide resolved
test/e2e/compact_test.go Show resolved Hide resolved
cmd/thanos/compact.go Outdated Show resolved Hide resolved
Signed-off-by: yeya24 <[email protected]>
docs/components/tools.md Outdated Show resolved Hide resolved
Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have been running this new mode for a few days and haven't spotted any abnormalities. 💪 Let's merge?

@yeya24
Copy link
Contributor Author

yeya24 commented Jun 15, 2021

I feel it is ready to go unless @bwplotka has some comments.
Will merge tomorrow if there is no objection.

@yeya24 yeya24 merged commit f75a233 into thanos-io:main Jun 16, 2021
@yeya24 yeya24 deleted the dedup-comp branch June 16, 2021 16:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Compact: Offline deduplication
4 participants