Speed up ordinal lookups in composite aggregation #78313

Merged

jimczi merged 3 commits into elastic:master from jimczi:composite_sorted_ordinals

Oct 6, 2021

Contributor

jimczi commented Sep 27, 2021

This change is an optimization on top of #74559 that sorts ordinals before performing lookups. Sorting ensures that we don't decompress blocks in the terms dictionary more often than necessary. In the worst case today, we can decompress the same block once for each lookup term per segment, whereas this change requires only one decompression.

This commit also creates the doc values lookup once per request per segment. This is useful when inverted lists are used to shortcut the collection, since terms are already sorted in the dictionary.
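
To illustrate why lookup order matters, here is a minimal sketch under stated assumptions. It is not the code from this PR and does not use Lucene's or Elasticsearch's classes: `BlockDictionary`, `lookupOrd`, `countDecompressions`, and the block size of 4 are all hypothetical. The toy dictionary assumes that terms are stored in fixed-size blocks and that only the most recently decompressed block stays available, so jumping back and forth between blocks forces repeated decompression.

```java
import java.util.Arrays;

// Hypothetical sketch: a toy model of a block-compressed terms dictionary,
// not the actual Lucene/Elasticsearch implementation.
class SortedOrdinalLookupSketch {

    /** Toy dictionary holding sorted terms in fixed-size blocks. */
    static final class BlockDictionary {
        private final String[] terms;
        private final int blockSize;
        private int currentBlock = -1;   // assumption: only the last decompressed block is kept
        private int decompressions = 0;  // counts simulated block decompressions

        BlockDictionary(String[] sortedTerms, int blockSize) {
            this.terms = sortedTerms;
            this.blockSize = blockSize;
        }

        /** Resolves an ordinal to its term, "decompressing" its block if it is not the current one. */
        String lookupOrd(long ord) {
            int block = (int) (ord / blockSize);
            if (block != currentBlock) {
                currentBlock = block;
                decompressions++;
            }
            return terms[(int) ord];
        }

        int decompressions() {
            return decompressions;
        }
    }

    /** Looks up every ordinal, optionally sorting first, and returns the number of decompressions. */
    static int countDecompressions(String[] terms, int blockSize, long[] ords, boolean sortFirst) {
        long[] order = ords.clone();
        if (sortFirst) {
            Arrays.sort(order);
        }
        BlockDictionary dict = new BlockDictionary(terms, blockSize);
        for (long ord : order) {
            dict.lookupOrd(ord);
        }
        return dict.decompressions();
    }

    public static void main(String[] args) {
        String[] terms = {"a", "b", "c", "d", "e", "f", "g", "h"};
        long[] ords = {0, 4, 1, 5, 2, 6}; // alternates between block 0 and block 1
        System.out.println("unsorted: " + countDecompressions(terms, 4, ords, false)); // prints "unsorted: 6"
        System.out.println("sorted:   " + countDecompressions(terms, 4, ords, true));  // prints "sorted:   2"
    }
}
```

In this toy model, the unsorted order touches a different block on every lookup, while the sorted order visits each block exactly once. The real savings depend on how the dictionary is actually encoded and cached.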


          Speed up ordinal lookups in composte aggregation

6c2744b

jimczi added >enhancement :Analytics/Aggregations v8.0.0 v7.16.0 labels

jimczi requested a review from ywelsch

September 27, 2021 11:21

elasticmachine added the Team:Analytics label

Collaborator

elasticmachine commented Sep 27, 2021

Pinging @elastic/es-analytics-geo (Team:Analytics)

ywelsch reviewed

Contributor

ywelsch left a comment

Thank you for looking into this. I've left two comments to address (and a bunch of optional ones).

...rc/main/java/org/elasticsearch/search/aggregations/bucket/composite/OrdinalValuesSource.java

                       assert leafReaderContextChanged == false || invariant(); // for performance reasons only check invariant upon change
                       return new LeafBucketCollector() {
                           @Override
                           public void collect(int doc, long bucket) throws IOException {
                               // caller of getLeafCollector ensures that collection happens before requesting a new leaf collector
                               // this is important as ordinals only make sense in the context of the current lookup
-                              assert dvs == lookup;

Contributor

ywelsch Sep 27, 2021

We could still assert that they have the same leaf reader context ordinal, i.e. that the ordinal-based operations here make sense?

ywelsch changed the title ~~Speed up ordinal lookups in composte aggregation~~ Speed up ordinal lookups in composite aggregation

jimczi added 2 commits

September 27, 2021 15:46


          apply review comments

18ee058


          remove noop if

f54398f

Member

not-napoleon commented Sep 27, 2021

Have we run the ML performance test against this branch? This seems like it should help, but it'd be good to have the numbers to back that up.

ywelsch approved these changes

Contributor

ywelsch left a comment

I've left two more minor comments; looking good otherwise.

...rc/main/java/org/elasticsearch/search/aggregations/bucket/composite/OrdinalValuesSource.java

                       if (leafReaderContextChanged) {
-                          remapOrdinals(lookup, dvs);
-                          leafReaderOrd = context.ord;
+                          // use a separate instance for ordinal and term lookups, that is cached per segment

Contributor

ywelsch Sep 28, 2021

This comment isn't right for this method. Here, after your refactoring, we don't have a separate instance as there is no iteration anymore.

imotov reviewed

Contributor

imotov left a comment

> Have we run the ML performance test against this branch? This seems like it should help, but it'd be good to have the numbers to back that up.

It would be great to have the results of this test before we merge this in, otherwise without an existing performance infrastructure, I feel like we are running blind with these changes.

Contributor

ywelsch commented Sep 30, 2021

> > Have we run the ML performance test against this branch? This seems like it should help, but it'd be good to have the numbers to back that up.
>
> It would be great to have the results of this test before we merge this in, otherwise without an existing performance infrastructure, I feel like we are running blind with these changes.

Jim has done some ad-hoc testing.

Once this PR is merged (e.g. master-only in a first step), it can be validated by the ML QA team (@wwang500).

There's ongoing work to add benchmarking capabilities to Rally for paging through composite aggregations.

I would like this PR to go into 7.16 and not be blocked on benchmarking infrastructure.

stefnestor added a commit that referenced this pull request


          Add prod warning to Composite Aggregation

The [7.14 update](http://github.com/elastic/elasticsearch/pull/74559) exacerbated performance concerns with composite aggregations, and the [fix under discussion](#78313) will either revert that change or arrive in 7.16. In the meantime, users need a disclaimer that this search was already expensive before the change and that, especially now or after an upgrade, it may impact cluster performance.

stefnestor mentioned this pull request

Add prod warning to Composite Aggregation #78723

Merged

Contributor Author

jimczi commented Oct 6, 2021

I am going to merge in master only to allow ml to run their performance test. Depending on the results we'll decide if we can backport safely.

jimczi merged commit f2580da into elastic:master

jimczi deleted the composite_sorted_ordinals branch

October 6, 2021 16:19

jimczi added the backport pending label

jimczi mentioned this pull request

Fix failures in CompositeAggregatorTests #78926

Merged

jimczi added a commit that referenced this pull request


          Fix failures in CompositeAggregatorTests

68bc320

The random tests in CompositeAggregatorTests create lots of segments since #78313. That can lead to out-of-memory errors in tests.
The additional commits were added to simulate the multi-segment case, but the random index writer should already perform some random overcommits.

Closes #78919

jakelandis added v8.0.0-beta1 and removed v8.0.0 labels

not-napoleon removed the backport pending label

Labels

:Analytics/Aggregations >enhancement Team:Analytics v7.16.0 v8.0.0-beta1