Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor children aggregator into a generic ParentJoinAggregator #34845

Merged
merged 2 commits into from
Oct 26, 2018

Conversation

jimczi
Copy link
Contributor

@jimczi jimczi commented Oct 25, 2018

This commit adds a new ParentJoinAggregator that implements a join using global ordinals
in a way that can be reused by the children and the upcoming parent aggregation.
This new aggregator is a refactor of the existing ParentToChildrenAggregator with two main changes:

  • It uses a dense bit array instead of a long array when the aggregation does not have any parent.
  • It uses a single aggregator per bucket if it is nested under another aggregation.

For the latter case we use a MultiBucketAggregatorWrapper in the factory in order to ensure that each
instance of the aggregator handles a single bucket. This is more inlined with the strategy we use for other
aggregations like terms aggregation for instance since the number of buckets to handle should be low (thanks to the breadth_first strategy).

This change is also required for #34210 which adds the parent aggregation in the parent-join module.

This commit adds a new ParentJoinAggregator that implements a join using global ordinals
in a way that can be reused by the `children` and the upcoming `parent` aggregation.
This new aggregator is a refactor of the existing ParentToChildrenAggregator with two main changes:
* It uses a dense bit array instead of a long array when the aggregation does not have any parent.
* It uses a single aggregator per bucket if it is nested under another aggregation.
For the latter case we use a `MultiBucketAggregatorWrapper` in the factory in order to ensure that each
instance of the aggregator handles a single bucket. This is more inlined with the strategy we use for other
aggregations like `terms` aggregation for instance since the number of buckets to handle should be low (thanks to the breadth_first strategy).
This change is also required for elastic#34210 which adds the `parent` aggregation in the parent-join module.

Relates elastic#34508
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 These changes make sense to me.

@jimczi
Copy link
Contributor Author

jimczi commented Oct 26, 2018

run gradle build tests

@jimczi jimczi merged commit 1b879ea into elastic:master Oct 26, 2018
@jimczi jimczi deleted the parent_join_aggregator branch October 26, 2018 14:26
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Oct 26, 2018
* 'master' of github.com:elastic/elasticsearch:
  Fix line length for org.elasticsearch.common.* files (elastic#34888)
  [ML] Extract common native process base class (elastic#34856)
  Refactor children aggregator into a generic ParentJoinAggregator (elastic#34845)
  [Style] Fix line lengths in action.admin.indices (elastic#34890)
  HLRC - add support for source exists API (elastic#34519)
  [CCR] Retry when no index shard stats can be found (elastic#34852)
  [Docs] audit logfile structured format (elastic#34584)
  [Test] Fix FullClusterRestartIT.testShrink() with copy_settings param (elastic#34853)
  Fix LineLength Check Suppressions: index.fielddata (elastic#34891)
  TEST: Stablize Minio Free Port Search (elastic#34894)
  Delete flaky SettingsBasedHostProviderIT test (elastic#34813)
  [ML] Include message in field_stats for text log files (elastic#34861)
  [TEST] HLRC: Expand failure messages in API checks (elastic#34838)
  Lowercase static final DeprecationLogger instance names (elastic#34887)
jimczi added a commit that referenced this pull request Oct 26, 2018
)

This commit adds a new ParentJoinAggregator that implements a join using global ordinals
in a way that can be reused by the `children` and the upcoming `parent` aggregation.
This new aggregator is a refactor of the existing ParentToChildrenAggregator with two main changes:
* It uses a dense bit array instead of a long array when the aggregation does not have any parent.
* It uses a single aggregator per bucket if it is nested under another aggregation.
For the latter case we use a `MultiBucketAggregatorWrapper` in the factory in order to ensure that each
instance of the aggregator handles a single bucket. This is more inlined with the strategy we use for other
aggregations like `terms` aggregation for instance since the number of buckets to handle should be low (thanks to the breadth_first strategy).
This change is also required for #34210 which adds the `parent` aggregation in the parent-join module.

Relates #34508
@jimczi
Copy link
Contributor Author

jimczi commented Oct 26, 2018

Thanks @martijnvg !

jasontedor added a commit to martijnvg/elasticsearch that referenced this pull request Oct 26, 2018
* master:
  Introduce cross-cluster replication API docs (elastic#34726)
  Responses can use Writeable.Reader interface (elastic#34655)
  SQL: Provide null-safe scripts for Not and Neg (elastic#34877)
  Fix put/resume follow request parsing (elastic#34913)
  Fix line length for org.elasticsearch.common.* files (elastic#34888)
  [ML] Extract common native process base class (elastic#34856)
  Refactor children aggregator into a generic ParentJoinAggregator (elastic#34845)
  [Style] Fix line lengths in action.admin.indices (elastic#34890)
  HLRC - add support for source exists API (elastic#34519)
centic9 added a commit to centic9/elasticsearch that referenced this pull request Oct 26, 2018
The required changes are now even less as the new base class
ParentJoinAggregator implements collecting buckets in a different
way.
kcm pushed a commit that referenced this pull request Oct 30, 2018
)

This commit adds a new ParentJoinAggregator that implements a join using global ordinals
in a way that can be reused by the `children` and the upcoming `parent` aggregation.
This new aggregator is a refactor of the existing ParentToChildrenAggregator with two main changes:
* It uses a dense bit array instead of a long array when the aggregation does not have any parent.
* It uses a single aggregator per bucket if it is nested under another aggregation.
For the latter case we use a `MultiBucketAggregatorWrapper` in the factory in order to ensure that each
instance of the aggregator handles a single bucket. This is more inlined with the strategy we use for other
aggregations like `terms` aggregation for instance since the number of buckets to handle should be low (thanks to the breadth_first strategy).
This change is also required for #34210 which adds the `parent` aggregation in the parent-join module.

Relates #34508
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants