Improve metric queries by computing samples at the edges. #2293
Merged
Wondering how we're going to achieve fast mutation of labels. Signed-off-by: Cyril Tovena <[email protected]>
I realize I need a hash for deduping lines. Going to benchmark some. Signed-off-by: Cyril Tovena <[email protected]>
…arams. Signed-off-by: Cyril Tovena <[email protected]>
Codecov Report
@@            Coverage Diff             @@
##           master    #2293      +/-   ##
==========================================
+ Coverage   62.29%   62.55%   +0.26%
==========================================
  Files         158      159       +1
  Lines       12766    13361     +595
==========================================
+ Hits         7952     8358     +406
- Misses       4201     4363     +162
- Partials      613      640      +27
cyriltovena added a commit to cyriltovena/loki that referenced this pull request Jul 6, 2020
This PR removes the mostcommon and sort insert functions in the heap iterator. I discovered while working on grafana#2293 that they are not actually helping, since we're deduping those lines anyway. There were no tests checking that deduping worked correctly, so I added some. As a bonus, deduping will run faster and the code is less complex. The only side effect concerns the order of entries that share the same timestamp: before, the most common entry would appear first; now we keep the same order as we stored them, which I think is better. I also changed the label ordering: whether we are going forward or backward, we should keep the same alphabetical label ordering. I'm not sure why direction was altering this before. Signed-off-by: Cyril Tovena <[email protected]>
owen-d pushed a commit that referenced this pull request Jul 8, 2020
* Improve entry deduplication. This PR removes the mostcommon and sort insert functions in the heap iterator. I discovered while working on #2293 that they are not actually helping, since we're deduping those lines anyway. There were no tests checking that deduping worked correctly, so I added some. As a bonus, deduping will run faster and the code is less complex. The only side effect concerns the order of entries that share the same timestamp: before, the most common entry would appear first; now we keep the same order as we stored them, which I think is better. I also changed the label ordering: whether we are going forward or backward, we should keep the same alphabetical label ordering. I'm not sure why direction was altering this before. Signed-off-by: Cyril Tovena <[email protected]>
* Improve heap iterator backward test. Signed-off-by: Cyril Tovena <[email protected]>
slim-bean approved these changes Jul 11, 2020
More great work @cyriltovena !! Great work with tests!
# Conflicts:
#   pkg/ingester/instance.go
#   pkg/logql/series_extractor_test.go
This PR pushes metric extraction/transformation to the edges (ingester/storage) instead of doing it in the engine. This allows metrics to be created without allocating a string from the line buffer while decompressing, which drastically reduces memory allocations and speeds up metric queries.
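To illustrate the idea of extracting at the edges, here is a minimal sketch in Go. It is not the code from this PR: the `entry`, `sample`, and `extractor` types and the two extractor functions are hypothetical names chosen for the example. The point it shows is that a sample value can be computed directly from the `[]byte` line, so no `string` conversion is needed per line.

```go
package main

import "fmt"

// entry is a hypothetical log entry whose Line points into a read buffer.
type entry struct {
	Timestamp int64
	Line      []byte
}

// sample is a hypothetical metric sample produced where the data is read.
type sample struct {
	Timestamp int64
	Value     float64
}

// extractor turns a raw line into a sample value without converting it to a string.
type extractor func(line []byte) float64

// countExtractor backs count-style queries: every line is worth 1.
func countExtractor(_ []byte) float64 { return 1 }

// bytesExtractor backs byte-based queries: the value is the line length.
func bytesExtractor(line []byte) float64 { return float64(len(line)) }

// extractSamples emits one sample per entry using the given extractor.
func extractSamples(entries []entry, ex extractor) []sample {
	out := make([]sample, 0, len(entries))
	for _, e := range entries {
		out = append(out, sample{Timestamp: e.Timestamp, Value: ex(e.Line)})
	}
	return out
}

func main() {
	entries := []entry{
		{Timestamp: 1, Line: []byte("level=info msg=ok")},
		{Timestamp: 2, Line: []byte("level=error msg=timeout")},
	}
	fmt.Println(extractSamples(entries, countExtractor))
	fmt.Println(extractSamples(entries, bytesExtractor))
}
```

With this shape, the engine only ever sees numeric samples, and the line bytes never need to leave the component that read them.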
I have observed a 2x improvement for all metric queries. Deduping of log lines is now done using a hash of the log line rather than its content. I'm using xxhash, which has shown very good performance and a low chance of collisions; see https://github.com/Cyan4973/xxHash.
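Hash-based deduplication can be sketched roughly as below. To keep the example runnable with the standard library only, it swaps in FNV-1a from `hash/fnv` as a stand-in for xxhash; the `sample` type, `hashLine`, and `dedupe` are hypothetical names for illustration, not the functions added in this PR.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// sample is a hypothetical metric sample carrying the hash of its source line.
type sample struct {
	Timestamp int64
	Value     float64
	Hash      uint64
}

// hashLine hashes the raw line bytes without converting them to a string.
// FNV-1a is only a stand-in here; the PR itself uses xxhash.
func hashLine(line []byte) uint64 {
	h := fnv.New64a()
	h.Write(line) // Write on a hash.Hash never returns an error
	return h.Sum64()
}

// dedupKey identifies a line by timestamp and content hash.
type dedupKey struct {
	ts   int64
	hash uint64
}

// dedupe keeps the first sample seen for each (timestamp, hash) pair and
// preserves the input order of the samples it keeps.
func dedupe(samples []sample) []sample {
	seen := make(map[dedupKey]struct{}, len(samples))
	out := make([]sample, 0, len(samples))
	for _, s := range samples {
		k := dedupKey{ts: s.Timestamp, hash: s.Hash}
		if _, dup := seen[k]; dup {
			continue
		}
		seen[k] = struct{}{}
		out = append(out, s)
	}
	return out
}

func main() {
	h := hashLine([]byte("error: timeout"))
	samples := []sample{
		{Timestamp: 1, Value: 1, Hash: h},
		{Timestamp: 1, Value: 1, Hash: h}, // same line, e.g. from another replica
		{Timestamp: 2, Value: 1, Hash: h}, // same line at a later timestamp
	}
	fmt.Println(len(dedupe(samples)))
}
```

Comparing a 64-bit hash is cheaper than comparing line contents, and it means the sample no longer has to carry the line itself; the trade-off is a small probability of treating two different lines as duplicates on a collision.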
Another interesting change: the store now implements chunk.Store and logql.Querier, which makes it easier to use with the LogQL engine.
The PR is big, as it splits the whole code base in two.
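As a rough illustration of one type satisfying both a storage interface and a query interface, the sketch below uses two heavily simplified, hypothetical interfaces named `Store` and `Querier`. Their method sets are invented for the example and do not reflect the real chunk.Store or logql.Querier definitions.

```go
package main

import (
	"errors"
	"fmt"
)

// Store is a hypothetical, simplified write-side interface.
type Store interface {
	Put(id string, data []byte) error
}

// Querier is a hypothetical, simplified read-side interface used by an engine.
type Querier interface {
	SelectSamples(selector string) ([]float64, error)
}

// memStore is a toy in-memory store that satisfies both interfaces.
type memStore struct {
	chunks map[string][]byte
}

// Compile-time checks that memStore implements both interfaces.
var (
	_ Store   = (*memStore)(nil)
	_ Querier = (*memStore)(nil)
)

func newMemStore() *memStore {
	return &memStore{chunks: map[string][]byte{}}
}

func (s *memStore) Put(id string, data []byte) error {
	if id == "" {
		return errors.New("empty chunk id")
	}
	s.chunks[id] = data
	return nil
}

// SelectSamples ignores the selector in this toy version and returns one
// sample per stored chunk whose value is the chunk size in bytes.
func (s *memStore) SelectSamples(_ string) ([]float64, error) {
	out := make([]float64, 0, len(s.chunks))
	for _, data := range s.chunks {
		out = append(out, float64(len(data)))
	}
	return out, nil
}

// sum stands in for an engine that depends only on the Querier interface.
func sum(q Querier, selector string) (float64, error) {
	samples, err := q.SelectSamples(selector)
	if err != nil {
		return 0, err
	}
	var total float64
	for _, v := range samples {
		total += v
	}
	return total, nil
}

func main() {
	s := newMemStore()
	_ = s.Put("a", []byte("xy"))
	_ = s.Put("b", []byte("z"))
	total, err := sum(s, `{app="foo"}`)
	fmt.Println(total, err)
}
```

Because the engine-side function accepts the interface, the same store value can be handed to it directly without an adapter.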
Next possible steps:
* A Seek function in the SampleIterator, allowing the range vector iterator to skip through lazy chunks and blocks. This is especially useful for queries where the range is smaller than the step, e.g. rate({app="foo"}[1m]) with a step of 5m.
When deploying this change, ingesters should be fully rolled out first, as it introduces a new gRPC service for requesting samples from ingesters.
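The Seek idea listed under the next steps could look something like the sketch below. Since this is only a proposed follow-up, everything here is an assumption: `sliceIterator`, its `Seek`, `Peek`, and `Next` methods, and `countOverTime` are hypothetical, and timestamps are plain integer seconds for readability. It shows how, with a 1m range and a 5m step, the iterator can jump over samples that no window needs.

```go
package main

import (
	"fmt"
	"sort"
)

// sample is a hypothetical metric sample; Timestamp is in seconds.
type sample struct {
	Timestamp int64
	Value     float64
}

// sliceIterator is a hypothetical iterator over samples sorted by timestamp.
type sliceIterator struct {
	samples []sample
	pos     int // index of the next sample to return
	visited int // number of samples returned by Next
}

// Seek moves forward so that the next sample has Timestamp >= ts.
// It never moves backwards.
func (it *sliceIterator) Seek(ts int64) {
	rest := it.samples[it.pos:]
	it.pos += sort.Search(len(rest), func(i int) bool {
		return rest[i].Timestamp >= ts
	})
}

// Peek returns the next sample without consuming it.
func (it *sliceIterator) Peek() (sample, bool) {
	if it.pos >= len(it.samples) {
		return sample{}, false
	}
	return it.samples[it.pos], true
}

// Next consumes and returns the next sample.
func (it *sliceIterator) Next() (sample, bool) {
	s, ok := it.Peek()
	if ok {
		it.pos++
		it.visited++
	}
	return s, ok
}

// countOverTime counts samples in the window (t-rng, t] for each step t,
// seeking to the start of each window instead of scanning everything.
func countOverTime(it *sliceIterator, start, end, step, rng int64) []int {
	var out []int
	for t := start; t <= end; t += step {
		it.Seek(t - rng + 1) // first integer timestamp strictly after t-rng
		n := 0
		for {
			s, ok := it.Peek()
			if !ok || s.Timestamp > t {
				break
			}
			it.Next()
			n++
		}
		out = append(out, n)
	}
	return out
}

func main() {
	var samples []sample
	for ts := int64(0); ts < 600; ts += 10 {
		samples = append(samples, sample{Timestamp: ts, Value: 1})
	}
	it := &sliceIterator{samples: samples}
	counts := countOverTime(it, 300, 600, 300, 60)
	fmt.Println(counts, it.visited, len(samples)) // prints: [6 5] 11 60
}
```

In this toy run only 11 of the 60 samples are read. In a real chunk-backed iterator the benefit would come from skipping whole blocks without decompressing them, which this in-memory sketch does not model.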
I really wanted to get this change in before we introduce LogQL v2; I believe it is easier to do now. Again, I'm sorry for the big PR.