Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add aggregatorMergeStrategy property in SegmentMetadata queries #14560

Merged
merged 5 commits into from
Jul 13, 2023

Conversation

abhishekrb19
Copy link
Contributor

@abhishekrb19 abhishekrb19 commented Jul 10, 2023

Motivation

SegmentMetadata queries currently supports two types of aggregator merge strategies, namely strict and lenient, when "aggregators" analysis type is enabled. Users often want something less strict than a lenient policy, where the most recent aggregator is selected for a column in an evolving data model. A strict strategy is best suited once the data model is locked in.

This PR:

  • Adds a new property, aggregatorMergeStrategy, to segmentMetadata queries. Please see docs/querying/segmentmetadataquery.md for more information. This also allows us to define an earliest strategy (similar to the latest strategy) or more sophisticated merge strategies as needed.
  • Deprecates the existing lenientAggregatorMerge boolean property in favor of aggregatorMergeStrategy.
  • Validates that both the old and new property is not set. The new property is backwards compatible with the deprecated property, so the default is strict when aggregators analysis type is enabled.
  • Changes the segment id of merged segments as part of segmentMetadata analysis, from merged to <datasource>_<interval>_merged_<partition_number> format.
  • Adjusts unit tests to assert the complete SegmentAnalysis object instead of just the aggregators. Also, add tests for latest aggregator merge and backwards compatibility logic.

Release note

lenientAggregatorMerge property in segment metadata queries is deprecated in favor of a new property aggregatorMergeStrategy. aggregatorMergeStrategy also supports a latest strategy in addition to existing strict and lenient strategies from lenientAggregatorMerge.


Key changed/added classes in this PR
  • AggregatorMergeStrategy.java
  • SegmentMetadataQuery.java
  • SegmentMetadataQueryQueryToolChest.java
  • SegmentMetadataQueryTest.java
  • SegmentMetadataQueryQueryToolChestTest.java

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

- Adds a new property aggregatorMergeStrategy to segmentMetadata query.
aggregatorMergeStrategy currently supports three types of merge strategies -
the legacy strict and lenient strategies, and the new latest strategy.
- The latest strategy considers the latest aggregator from the latest segment
by time order when there's a conflict when merging aggregators from different
segments.
- Deprecate lenientAggregatorMerge property; The API validates that both the new
and old properties are not set, and returns an exception.
- When merging segments as part of segmentMetadata query, the segments have a more
elaborate id -- <datasource>_<interval>_merged_<partition_number> format, similar to
the name format that segments usually contain. Previously it was simply "merged".
- Adjust unit tests to test the latest strategy, to assert the returned complete
SegmentAnalysis object instead of just the aggregators for completeness.
Comment on lines +858 to +864
FACTORY.mergeRunners(
Execs.directExecutor(),
Lists.newArrayList(
toolChest.preMergeQueryDecoration(runner1),
toolChest.preMergeQueryDecoration(runner2)
)
)

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation

Invoking [QueryRunnerFactory.mergeRunners](1) should be avoided because it has been deprecated.
@zachjsh zachjsh self-requested a review July 12, 2023 16:33
Copy link
Contributor

@zachjsh zachjsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@ektravel ektravel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few suggestions related to the documentation.

@zachjsh zachjsh merged commit f4ee58e into apache:master Jul 13, 2023
sergioferragut pushed a commit to sergioferragut/druid that referenced this pull request Jul 21, 2023
…ache#14560)

* Add aggregatorMergeStrategy property to SegmentMetadaQuery.

- Adds a new property aggregatorMergeStrategy to segmentMetadata query.
aggregatorMergeStrategy currently supports three types of merge strategies -
the legacy strict and lenient strategies, and the new latest strategy.
- The latest strategy considers the latest aggregator from the latest segment
by time order when there's a conflict when merging aggregators from different
segments.
- Deprecate lenientAggregatorMerge property; The API validates that both the new
and old properties are not set, and returns an exception.
- When merging segments as part of segmentMetadata query, the segments have a more
elaborate id -- <datasource>_<interval>_merged_<partition_number> format, similar to
the name format that segments usually contain. Previously it was simply "merged".
- Adjust unit tests to test the latest strategy, to assert the returned complete
SegmentAnalysis object instead of just the aggregators for completeness.

* Don't explicitly set strict strategy in tests

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <[email protected]>

* Update docs/querying/segmentmetadataquery.md

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <[email protected]>

---------

Co-authored-by: Katya Macedo <[email protected]>
@LakshSingla LakshSingla added this to the 28.0 milestone Oct 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants