-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up ordinal lookups in composite aggregation #78313
Conversation
This change is an optimization on top of #, that sorts ordinals to perform lookups. The sorting ensures that we don't do the de-compression of blocks in the dictionary of terms more than necessary. In the worst case today, we can decompress the same block for each lookup term per segment, while this change requires only one decompression. This commit also creates the doc values lookup once per request per segment. This is useful when inverted lists are used to shortcut the collection since terms are already sorted in the dictionary.
Pinging @elastic/es-analytics-geo (Team:Analytics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for looking into this. I've left two comments to address (and a bunch of optional ones)
...rc/main/java/org/elasticsearch/search/aggregations/bucket/composite/OrdinalValuesSource.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/elasticsearch/search/aggregations/bucket/composite/OrdinalValuesSource.java
Outdated
Show resolved
Hide resolved
assert leafReaderContextChanged == false || invariant(); // for performance reasons only check invariant upon change | ||
return new LeafBucketCollector() { | ||
@Override | ||
public void collect(int doc, long bucket) throws IOException { | ||
// caller of getLeafCollector ensures that collection happens before requesting a new leaf collector | ||
// this is important as ordinals only make sense in the context of the current lookup | ||
assert dvs == lookup; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could still assert that they have same leaf reader context ordinal? i.e. that the ordinal-based operations here make sense.
...rc/main/java/org/elasticsearch/search/aggregations/bucket/composite/OrdinalValuesSource.java
Show resolved
Hide resolved
...rc/main/java/org/elasticsearch/search/aggregations/bucket/composite/OrdinalValuesSource.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/elasticsearch/search/aggregations/bucket/composite/OrdinalValuesSource.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/elasticsearch/search/aggregations/bucket/composite/OrdinalValuesSource.java
Show resolved
Hide resolved
Have we run the ML performance test against this branch? This seems like it should help, but it'd be good to have the numbers to back that up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left two more minor comments, looking good o.w.
...rc/main/java/org/elasticsearch/search/aggregations/bucket/composite/OrdinalValuesSource.java
Show resolved
Hide resolved
if (leafReaderContextChanged) { | ||
remapOrdinals(lookup, dvs); | ||
leafReaderOrd = context.ord; | ||
// use a separate instance for ordinal and term lookups, that is cached per segment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment isn't right for this method. Here, after your refactoring, we don't have a separate instance as there is no iteration anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we run the ML performance test against this branch? This seems like it should help, but it'd be good to have the numbers to back that up.
It would be great to have the results of this test before we merge this in, otherwise without an existing performance infrastructure, I feel like we are running blind with these changes.
Jim has done some ad-hoc testing. Once this PR is merged (e.g. master-only in a first step), it can be validated by the ML QA team (@wwang500). There's ongoing work to add benchmarking capabilities to Rally for paging through composite aggregations. I would like this PR to go into 7.16 and not be blocked on benchmarking infrastructure. |
[7.14 update](http://github.com/elastic/elasticsearch/pull/74559) exacerbated performance concerns on composite aggregations & [fix in discussion](#78313) is either reverting or will come in 7.16. In the mean time users need disclaimer that before change this search was expensive & esp. now or w/upgrade may impact cluster performance.
I am going to merge in master only to allow ml to run their performance test. Depending on the results we'll decide if we can backport safely. |
This change is an optimization on top of #74559, that sorts ordinals to perform
lookups. The sorting ensures that we don't do the de-compression of
blocks in the dictionary of terms more than necessary.
In the worst case today, we can decompress the same block for each lookup
term per segment, while this change requires only one decompression.
This commit also creates the doc values lookup once
per request per segment. This is useful when inverted lists
are used to shortcut the collection since terms are already sorted
in the dictionary.