-
Notifications
You must be signed in to change notification settings - Fork 455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP [dbnode] Refactor posting list operations to be lazily evaluated and zero copy #2791
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM w nits
var ( | ||
union = make([]postings.List, 0, len(s.searchers)) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: don't need var block here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a cursory glance over but didn't go super deep on it, figuring a lot of it was very similar to Pilosa's implementation; if this is a bad assumption can go back with more discerning review
"github.com/uber-go/tally" | ||
) | ||
|
||
// LatencyBuckets are a set of latency buckets useful for measuring things. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit measuring latencies
func (i *multiBitmap) IsEmpty() bool { | ||
iter := i.Iterator() | ||
hasAny := iter.Next() | ||
_ = iter.Err() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, should this return false if iter.Err() is non-nil? Otherwise no need to call Err
type containerIterator interface { | ||
NextContainer() bool | ||
ContainerKey() uint64 | ||
ContainerUnion(ctx containerOpContext, target *bitmapContainer) | ||
ContainerIntersect(ctx containerOpContext, target *bitmapContainer) | ||
ContainerNegate(ctx containerOpContext, target *bitmapContainer) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: do these need to be exported? If so, may need comments (and may be worth exporting containerIterator?)
i.bitmap.Reset(true) | ||
|
||
intersects := i.filter(i.multiContainerIter.containerIters, multiContainerOpIntersect) | ||
negates := i.filter(i.multiContainerIter.containerIters, multiContainerOpNegate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this overwrite the i.filtered
in intersects
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes, TY!
func (i *multiBitmapContainersIterator) NextContainer() bool { | ||
if len(i.iters) != 0 { | ||
// Exhausted. | ||
return true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be
if len(i.iters) == 0 {
return false
}
src/x/cache/lru_cache.go
Outdated
|
||
// Tell any other callers that we're done loading | ||
if entry.loadingCh != nil { | ||
close(entry.loadingCh) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice pattern 👍
} | ||
// Only valid if all intersecting actually matched, | ||
// if zero intersecting then postings does not contain ID. | ||
return len(i.intersect) > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit; can move this to 181, before checking intersectNegate
lists
case elem.multiBitmap != nil: | ||
it = elem.multiBitmap.containerIterator() | ||
|
||
case elem.bitmap != nil: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: needs to fail if both set?
|
||
require.False(t, first.Equal(second)) | ||
} | ||
// func TestRoaringPostingsListEqualWithOtherNonRoaring(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit; delete?
@@ -39,16 +39,17 @@ import ( | |||
|
|||
func TestConcurrentQueries(t *testing.T) { | |||
parameters := gopter.DefaultTestParameters() | |||
seed := time.Now().UnixNano() | |||
parameters.MinSuccessfulTests = 100 | |||
// seed := time.Now().UnixNano() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit; can probably resume using rng seed
// simpleReader, err := simpleSeg.Reader() | ||
// require.NoError(t, err) | ||
// simpleExec := executor.NewExecutor([]index.Reader{simpleReader}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit; delete
require.NoError(t, err) | ||
matchedDocs, err := collectDocs(dOrg) | ||
require.NoError(t, err) | ||
require.NoError(t, err, fmt.Sprintf("query: %v\n", q.String())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit; require.NoError(t, err, q.String())
may be easier?
if okA && okB && countA != countB { | ||
return false | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If using countfast here, we should also make sure that
if !okA && !okB {
return EqualIterator(a.Iterator(), b.Iterator())
}
return false
intersects []postings.List, | ||
negates []postings.List, | ||
) (postings.List, error) { | ||
intersect := make([]readOnlyIterable, 0, len(intersects)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: prealloc a slice of iterables := make([]readOnlyIterable, 0, len(intersects) + len(negates)
and have both intersect and negate be taken from that, e.g. intersect, negate := iterables[:len(intersects)], iterables[len(intersects):]
Contains(id postings.ID) bool | ||
ContainerIterator() containerIterator | ||
} | ||
|
||
type containerIterator interface { | ||
NextContainer() bool | ||
ContainerKey() uint64 | ||
ContainerUnion(ctx containerOpContext, target *bitmapContainer) | ||
ContainerIntersect(ctx containerOpContext, target *bitmapContainer) | ||
ContainerNegate(ctx containerOpContext, target *bitmapContainer) | ||
Err() error | ||
Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit; comments here
func (i *multiBitmap) IsEmpty() bool { | ||
iter := i.Iterator() | ||
hasAny := iter.Next() | ||
_ = iter.Err() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do these need to check err?
…indexed snapshot during a GC run
c0e6836
to
3699014
Compare
…-index-query-allocs-rebase
…ding to rebuild all cached searches again
…terministic query key, also returned optimized form
What this PR does / why we need it:
This reduces a significant amount of all index query allocations by implementing the conjunction search as a lazy iterator over existing postings lists. It also caches regexp compilations with an LRU cache and allows setting different sizes for the regexp cache.
This change completely re-implements roaring bitmaps for query execution, this means pilosa roaring bitmaps are only used and index segment build time and all queries execute new read only roaring bitmaps which are completely mmap backed without any bitmap container/other allocations involved.
Special notes for your reviewer:
Does this PR introduce a user-facing and/or backwards incompatible change?:
Does this PR require updating code package or user-facing documentation?: