query: Improve deduplication of same series (e.g from receiver) #2303

Closed
bwplotka opened this issue Mar 23, 2020 · 3 comments

@bwplotka
Member

Right now deduplication works for both Prometheus replicas and receiver replicas, but there are optimizations we can make when we work on receiver-only data.

@stale

stale bot commented May 1, 2020

Hello 👋 Looks like there was no activity on this issue for last 30 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label May 1, 2020
@bwplotka
Member Author

bwplotka commented May 2, 2020

Essentially, we can dedup at the chunk level, without iterating over samples (!) in query.
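To illustrate the idea, here is a minimal sketch of chunk-level deduplication. It assumes that replicas return byte-identical encoded chunks for the same time range, so duplicates can be dropped by comparing time bounds and raw bytes without decoding any samples. The `Chunk` type and the `DedupChunks` function are hypothetical names for this example, not the actual Thanos types.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// Chunk is a hypothetical stand-in for an encoded chunk of samples.
type Chunk struct {
	MinTime, MaxTime int64
	Data             []byte // encoded samples, treated as opaque bytes
}

// compare orders chunks by MinTime, then MaxTime, then raw bytes.
func compare(a, b Chunk) int {
	switch {
	case a.MinTime < b.MinTime:
		return -1
	case a.MinTime > b.MinTime:
		return 1
	case a.MaxTime < b.MaxTime:
		return -1
	case a.MaxTime > b.MaxTime:
		return 1
	}
	return bytes.Compare(a.Data, b.Data)
}

// DedupChunks returns the chunks sorted, with exact duplicates removed.
// No sample is decoded; only time bounds and bytes are compared.
func DedupChunks(chunks []Chunk) []Chunk {
	sorted := append([]Chunk(nil), chunks...) // do not mutate the input
	sort.Slice(sorted, func(i, j int) bool { return compare(sorted[i], sorted[j]) < 0 })

	out := make([]Chunk, 0, len(sorted))
	for _, c := range sorted {
		if n := len(out); n > 0 && compare(out[n-1], c) == 0 {
			continue // identical to the previous chunk, skip it
		}
		out = append(out, c)
	}
	return out
}

func main() {
	chunks := []Chunk{
		{MinTime: 0, MaxTime: 10, Data: []byte("a")},
		{MinTime: 11, MaxTime: 20, Data: []byte("b")},
		{MinTime: 0, MaxTime: 10, Data: []byte("a")}, // duplicate from a second replica
	}
	fmt.Println(len(DedupChunks(chunks)))
}
```

Chunks that overlap in time but differ in bytes are kept by this sketch; those would still need sample-level deduplication later.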

@stale stale bot removed the stale label May 2, 2020
@bwplotka
Member Author

This also covers deduplicating exactly the same series returned by a store and Prometheus, or by many stores.

This might mean it would be useful to move the deduplication layer to proxy.go.
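As a rough sketch of what merging at the proxy level could look like: series with identical label sets coming from different stores are combined into a single series before any deduplication runs. The `Label` and `Series` types and the `MergeSameSeries` function are hypothetical, chunks are represented by plain string identifiers, and labels are assumed to be sorted by name. This is not the code in proxy.go.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Label and Series are hypothetical, simplified types for this example.
type Label struct{ Name, Value string }

type Series struct {
	Labels []Label  // assumed sorted by Name
	Chunks []string // placeholder chunk identifiers
}

// compareLabels compares two sorted label sets field by field.
func compareLabels(a, b []Label) int {
	for i := 0; i < len(a) && i < len(b); i++ {
		if c := strings.Compare(a[i].Name, b[i].Name); c != 0 {
			return c
		}
		if c := strings.Compare(a[i].Value, b[i].Value); c != 0 {
			return c
		}
	}
	return len(a) - len(b)
}

// MergeSameSeries combines the responses of several stores and merges
// series that carry exactly the same labels into one series.
func MergeSameSeries(responses ...[]Series) []Series {
	var all []Series
	for _, r := range responses {
		all = append(all, r...)
	}
	sort.SliceStable(all, func(i, j int) bool {
		return compareLabels(all[i].Labels, all[j].Labels) < 0
	})

	var out []Series
	for _, s := range all {
		if n := len(out); n > 0 && compareLabels(out[n-1].Labels, s.Labels) == 0 {
			// Same series seen again: append its chunks to the existing entry.
			out[n-1].Chunks = append(out[n-1].Chunks, s.Chunks...)
			continue
		}
		out = append(out, Series{Labels: s.Labels, Chunks: append([]string(nil), s.Chunks...)})
	}
	return out
}

func main() {
	sidecar := []Series{{Labels: []Label{{"job", "a"}}, Chunks: []string{"c1"}}}
	store := []Series{{Labels: []Label{{"job", "a"}}, Chunks: []string{"c2"}}}
	for _, s := range MergeSameSeries(sidecar, store) {
		fmt.Println(s.Labels, s.Chunks)
	}
}
```

Once all chunks of one series sit together, exact duplicates among them can be dropped in one place instead of in every consumer.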

bwplotka added a commit that referenced this issue May 13, 2020
…ng for StoreAPI.

Also: Merge same series together on proxy level instead select. This allows better dedup efficiency.

Partially fixes: #2303

Cases like overlapped data from store and sidecar and 1:1 duplicates are optimized as soon as it's possible.
This case was highly visible on GitLab repro data and exists in most of Thanos setup.

Signed-off-by: Bartlomiej Plotka <[email protected]>
bwplotka added a commit that referenced this issue May 15, 2020
…ng for StoreAPI.

Also: Merge same series together on proxy level instead select. This allows better dedup efficiency.

Partially fixes: #2303

Cases like overlapped data from store and sidecar and 1:1 duplicates are optimized as soon as it's possible.
This case was highly visible on GitLab repro data and exists in most of Thanos setup.

Signed-off-by: Bartlomiej Plotka <[email protected]>
brancz pushed a commit that referenced this issue May 18, 2020
…rting for StoreAPI. (#2603)

* Deduplicate chunk dups on proxy StoreAPI level. Recommend chunk sorting for StoreAPI.

Also: Merge same series together on proxy level instead select. This allows better dedup efficiency.

Partially fixes: #2303

Cases like overlapped data from store and sidecar and 1:1 duplicates are optimized as soon as it's possible.
This case was highly visible on GitLab repro data and exists in most of Thanos setup.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Optimized algorithm to combine series only on start.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Optimized chunk comparison for overlaps.

Signed-off-by: Bartlomiej Plotka <[email protected]>
bwplotka added a commit that referenced this issue Jun 3, 2020
…ng for StoreAPI.

Also: Merge same series together on proxy level instead select. This allows better dedup efficiency.

Partially fixes: #2303

Cases like overlapped data from store and sidecar and 1:1 duplicates are optimized as soon as it's possible.
This case was highly visible on GitLab repro data and exists in most of Thanos setup.

Signed-off-by: Bartlomiej Plotka <[email protected]>
bwplotka added a commit that referenced this issue Jun 3, 2020
…rting for StoreAPI + Optimized iter chunk dedup. (#2710)

* Deduplicate chunk dups on proxy StoreAPI level. Recommend chunk sorting for StoreAPI.

Also: Merge same series together on proxy level instead select. This allows better dedup efficiency.

Partially fixes: #2303

Cases like overlapped data from store and sidecar and 1:1 duplicates are optimized as soon as it's possible.
This case was highly visible on GitLab repro data and exists in most of Thanos setup.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Optimized algorithm to combine series only on start.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Optimized chunk comparison for overlaps.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Optimized deduplication for deduplicated chunk on query level as well.

Never use proto .String() in fast path!

Signed-off-by: Bartlomiej Plotka <[email protected]>
# Conflicts:
#	CHANGELOG.md
#	pkg/store/storepb/custom.go
#	pkg/store/storepb/custom_test.go
brancz pushed a commit that referenced this issue Jun 4, 2020
…rting for StoreAPI + Optimized iter chunk dedup. (#2710) (#2711)

* Deduplicate chunk dups on proxy StoreAPI level. Recommend chunk sorting for StoreAPI.

Also: Merge same series together on proxy level instead select. This allows better dedup efficiency.

Partially fixes: #2303

Cases like overlapped data from store and sidecar and 1:1 duplicates are optimized as soon as it's possible.
This case was highly visible on GitLab repro data and exists in most of Thanos setup.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Optimized algorithm to combine series only on start.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Optimized chunk comparison for overlaps.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Optimized deduplication for deduplicated chunk on query level as well.

Never use proto .String() in fast path!

Signed-off-by: Bartlomiej Plotka <[email protected]>
# Conflicts:
#	CHANGELOG.md
#	pkg/store/storepb/custom.go
#	pkg/store/storepb/custom_test.go