-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
compactor: Add offline deduplication #4239
Conversation
I think the failed E2E test is related to my change, but I am not sure why. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is epic, If not for the file move this would have been 50 line change to add something so powerful. Well done!
I have a strong suggestion regarding the bool flag, otherwise LGTM!
One extra question: Are we sure penalty algo will work properly on 1:1 dups? (: (extra test would help)
4fdedb0
to
143cd07
Compare
9b005a3
to
3e11f2b
Compare
9b53a9e
to
f1a7888
Compare
5a71d03
to
a3b4834
Compare
1ece1ce
to
3a1c8ef
Compare
ced8fda
to
1b9630c
Compare
Signed-off-by: yeya24 <[email protected]>
Signed-off-by: yeya24 <[email protected]>
Signed-off-by: yeya24 <[email protected]>
Signed-off-by: yeya24 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this really close to the finish line.
Signed-off-by: yeya24 <[email protected]>
Signed-off-by: yeya24 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! LGTM! 🥇
Let's give @bwplotka on last chance to review and we can merge.
Friendly ping @bwplotka. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few nits but overall LGTM 🤗
this requires more complex deduplication algorithms. For example one that is used to [deduplicate on the fly on the Querier](query.md#run-time-deduplication-of-ha-groups). This is common | ||
case when Prometheus HA replicas are used. [Offline deduplication for this is in progress](https://github.com/thanos-io/thanos/issues/1014). | ||
case when Prometheus HA replicas are used. You can enable this deduplication via `--deduplication.func=penalty` flag. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we please clarify in the docs in exactly which 1% of the cases the penalty-based deduplication algorithm doesn't work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe let's say something like:
even though the deduplication algorithm has been battle-tested for quite a long time but still very few issues come up from time to time such as https://github.com/thanos-io/thanos/issues/2890
? It would be more concrete. And it wouldn't sound like we know that it is bad in some 1% of cases & we aren't fixing it 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated docs. PTAL
Signed-off-by: yeya24 <[email protected]>
Signed-off-by: yeya24 <[email protected]>
Signed-off-by: yeya24 <[email protected]>
…-comp Signed-off-by: yeya24 <[email protected]>
Signed-off-by: yeya24 <[email protected]>
Signed-off-by: yeya24 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have been running this new mode for a few days and haven't spotted any abnormalities. 💪 Let's merge?
I feel it is ready to go unless @bwplotka has some comments. |
Signed-off-by: yeya24 [email protected]
Fixes #1014
Changes
Another attempt on offline deduplication using the existing querier's penalty deduplication algorithm.
This pr is WIP as I need to do more testing and is currently blocked by prometheus/prometheus#8836, so I use my own fork here.But as mentioned in #1014 (comment), this deduplication algorithm works for 99% of cases but is wrong in 1% case. So it has its own risk.
Verification
Tested with 2 Prometheus replicas running in k8s. I replicate the minio bucket to a local directory as a filesystem objstore for testing. The local directory is used for running dedup compactor. There are 12h test data and I run 1 thanos store and 1 thanos querier for each bucket.
Undeduped bucket:
Query deduplication has to be enabled.
Bucket size: 41m (36m if normal compaction is enabled)
Offline deduped bucket:
Query deduplication disabled.
Bucket size: 19m