Fix badger merge-join algorithm to correctly filter indexes #1721

Merged: 2 commits merged into jaegertracing:master on Aug 19, 2019

Contributor

@burmanm burmanm commented Aug 8, 2019

Which problem is this PR solving?

Resolves #1719: the index seeks were not correctly merged and filtered.

Short description of the changes

Make the merge-join advance both indices when it encounters equal items. The input to each merge must also be the output of the previous merge. In addition, the ASC-to-DESC reversal now happens after the top query filtering, which avoids unnecessary work.
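For readers unfamiliar with the merge step, here is a minimal sketch of a two-pointer merge-join over two sorted id lists. It is an illustration only, not the code from this PR: the name `mergeJoinSketch` is hypothetical, and it assumes both inputs are sorted ascending by `bytes.Compare` and contain no duplicates.

```go
package main

import (
	"bytes"
	"fmt"
)

// mergeJoinSketch returns the ids that appear in both left and right.
// Assumption: both inputs are sorted ascending (bytes.Compare order)
// and contain no duplicates.
func mergeJoinSketch(left, right [][]byte) [][]byte {
	// The result can never be longer than the shorter input.
	size := len(left)
	if len(right) < size {
		size = len(right)
	}
	merged := make([][]byte, 0, size)

	l, r := 0, 0
	for l < len(left) && r < len(right) {
		switch bytes.Compare(left[l], right[r]) {
		case 0:
			// Equal items: keep the id and advance BOTH indices.
			merged = append(merged, left[l])
			l++
			r++
		case -1:
			// left is behind, so only left advances.
			l++
		default:
			// right is behind, so only right advances.
			r++
		}
	}
	return merged
}

func main() {
	left := [][]byte{{1}, {3}, {5}, {7}}
	right := [][]byte{{3}, {4}, {7}, {9}}
	fmt.Println(mergeJoinSketch(left, right)) // prints [[3] [7]]
}
```

Advancing both indices on a match is the step the description calls out; advancing only one would leave the other side pointing at an id that has already been consumed.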

codecov bot commented Aug 9, 2019

Codecov Report

Merging #1721 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1721      +/-   ##
==========================================
+ Coverage   98.36%   98.36%   +<.01%     
==========================================
  Files         193      193              
  Lines        9358     9361       +3     
==========================================
+ Hits         9205     9208       +3     
  Misses        119      119              
  Partials       34       34
Impacted Files Coverage Δ
plugin/storage/badger/spanstore/reader.go 96.66% <100%> (+0.03%) ⬆️

@pavolloffay pavolloffay added the storage/badger Issues related to badger storage label Aug 9, 2019
@pavolloffay pavolloffay changed the title Fix badger merge-join algorithm to correctly filter indexes, closes #1719 Fix badger merge-join algorithm to correctly filter indexes Aug 14, 2019
plugin/storage/badger/spanstore/reader.go
@@ -346,53 +345,60 @@ func (r *TraceReader) durationQueries(query *spanstore.TraceQueryParameters, ids
return ids
}

func mergeJoinIds(left, right [][]byte) [][]byte {
Member

rename to mergeEqualIds ?

Member

Are the ids sorted? Maybe that should be documented somewhere.

Contributor Author

It's mentioned at the beginning of the package. Everything is sorted (Badger is a sorted K/V store).

Contributor Author

As for the name: the algorithm is called a "sort-merge join" and is used in relational databases. Here the sort phase happens in the DB and the merge phase in this code. I think the name is descriptive: if someone wants to improve this method, for example by running it in parallel or by sharding across multiple Badger instances, there are known algorithms for those variations too (which would still use this merge underneath).

Member

Thanks for the explanation. Now it rings a bell.
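Since the merge phase is only correct on sorted input, the precondition discussed in this thread can be made explicit with a small check. The sketch below is an assumption-laden illustration: `idsSorted` is a hypothetical helper, not part of the Jaeger code base, and it assumes ascending `bytes.Compare` order.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// idsSorted reports whether ids are in ascending byte order,
// which is the order a merge-join over them would rely on.
// Hypothetical helper for illustration only.
func idsSorted(ids [][]byte) bool {
	return sort.SliceIsSorted(ids, func(i, j int) bool {
		return bytes.Compare(ids[i], ids[j]) < 0
	})
}

func main() {
	fmt.Println(idsSorted([][]byte{{1}, {2}, {3}})) // prints true
	fmt.Println(idsSorted([][]byte{{2}, {1}, {3}})) // prints false
}
```

A check like this is linear in the list length, so it would more naturally live in a test or debug assertion than in the query path.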

Contributor

@objectiser left a comment

LGTM. It essentially looks like the same id must exist in every id list in the supplied array. Sorry, I haven't had a chance to dig into the implementation in more detail. Is there a quick explanation of what each id list represents?

@@ -200,6 +205,7 @@ func TestIndexSeeks(t *testing.T) {
params.OperationName = "operation-1"
tags := make(map[string]string)
tags["k11"] = "val0"
tags["error"] = "true"
Contributor

Curious why this was added, as it doesn't seem related to the PR?

Contributor Author

The devil is in the details. That single line triggers the bug (the test fails with the older version), since it adds another index query against the tags.

Contributor Author

As for the id list, it is basically the list of matches for one search query: a form of posting list (of traceIDs), if you think in ES terms.

In relational database terms, it's equivalent to something like: SELECT id FROM dbo.spans WHERE service = 'invoices'

That is, a single id list is the result of one such query. Each id list comes from one similar query that touches a single index and a single value. It doesn't matter whether the index is the same or not (one query could be against the service index, another against the tags index, and so on).
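To make the "one id list per index query" picture concrete, the sketch below intersects several sorted id lists by feeding the output of each merge into the next one, and reverses the order only once at the end. All names (`mergeJoin`, `intersectAll`, `reversed`) and the sample ids are hypothetical, and the inputs are assumed to be sorted ascending without duplicates; the actual reader code may be organized differently.

```go
package main

import (
	"bytes"
	"fmt"
)

// mergeJoin returns the ids present in both sorted lists.
func mergeJoin(left, right [][]byte) [][]byte {
	merged := make([][]byte, 0)
	l, r := 0, 0
	for l < len(left) && r < len(right) {
		switch bytes.Compare(left[l], right[r]) {
		case 0:
			merged = append(merged, left[l])
			l++
			r++
		case -1:
			l++
		default:
			r++
		}
	}
	return merged
}

// intersectAll folds mergeJoin over every id list. The input to each
// merge is the output of the previous merge, so an id survives only
// if it appears in every list.
func intersectAll(lists [][][]byte) [][]byte {
	if len(lists) == 0 {
		return nil
	}
	result := lists[0]
	for _, next := range lists[1:] {
		result = mergeJoin(result, next)
		if len(result) == 0 {
			break // nothing left to match
		}
	}
	return result
}

// reversed returns a copy of ids in descending order (ASC to DESC).
func reversed(ids [][]byte) [][]byte {
	out := make([][]byte, len(ids))
	for i, id := range ids {
		out[len(ids)-1-i] = id
	}
	return out
}

func main() {
	serviceIDs := [][]byte{{1}, {2}, {3}, {5}, {8}} // e.g. one service index query
	operationIDs := [][]byte{{2}, {3}, {5}, {9}}    // e.g. one operation index query
	tagIDs := [][]byte{{3}, {5}, {8}}               // e.g. one tag index query

	matches := intersectAll([][][]byte{serviceIDs, operationIDs, tagIDs})
	// Reversing after filtering touches only the surviving ids.
	fmt.Println(reversed(matches)) // prints [[5] [3]]
}
```

Reversing after the intersection means only the surviving ids are reordered, rather than every input list.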

Signed-off-by: Michael Burman <[email protected]>
Contributor

@objectiser left a comment

@burmanm Thanks for the explanation.

@pavolloffay pavolloffay merged commit ecdecd1 into jaegertracing:master Aug 19, 2019
Development

Successfully merging this pull request may close these issues.

Badger index merge is working incorrectly
3 participants