Add aggregatorMergeStrategy property in SegmentMetadata queries #14560
- Adds a new property, `aggregatorMergeStrategy`, to the `segmentMetadata` query. `aggregatorMergeStrategy` currently supports three merge strategies: the legacy `strict` and `lenient` strategies, and the new `latest` strategy.
- When merging aggregators from different segments, the `latest` strategy resolves a conflict by taking the aggregator from the most recent segment in time order.
- Deprecates the `lenientAggregatorMerge` property. The API validates that the new and old properties are not both set, and throws an exception if they are.
- When segments are merged as part of a `segmentMetadata` query, the merged result now gets a more descriptive id in the `<datasource>_<interval>_merged_<partition_number>` format, similar to regular segment ids. Previously the id was simply `merged`.
- Adjusts unit tests to cover the `latest` strategy and, for completeness, to assert the complete returned `SegmentAnalysis` object instead of just the aggregators.
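A minimal, self-contained sketch of the behavior described above, to make the three strategies and the legacy-flag handling concrete. Everything in it is hypothetical (`AggregatorMergeSketch`, `MergeStrategy`, `SegmentAggs`, `resolve`, `merge`) and is not Druid's actual code: aggregators are modeled as column-to-type strings, a "conflict" is simply two different type names (Druid decides this from the aggregator factories themselves), and segments are ordered by a hypothetical interval-start timestamp. It also assumes `strict` is the default when neither property is set, and that `latest`, like `lenient`, skips segments whose aggregators are unknown.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AggregatorMergeSketch {

  /** Hypothetical stand-in for the strategies described in this PR. */
  enum MergeStrategy { STRICT, LENIENT, LATEST }

  /**
   * Hypothetical per-segment input: an interval start used for time ordering, and a
   * column -> aggregator type map, or null if the segment's aggregators are unknown.
   */
  record SegmentAggs(long intervalStartMillis, Map<String, String> aggregators) {}

  /** Picks the effective strategy from the new property and the deprecated boolean. */
  static MergeStrategy resolve(MergeStrategy aggregatorMergeStrategy, Boolean lenientAggregatorMerge) {
    if (aggregatorMergeStrategy != null && lenientAggregatorMerge != null) {
      throw new IllegalArgumentException(
          "aggregatorMergeStrategy and lenientAggregatorMerge cannot both be set");
    }
    if (lenientAggregatorMerge != null) {
      // Legacy boolean: true meant lenient merging, false meant strict merging.
      return lenientAggregatorMerge ? MergeStrategy.LENIENT : MergeStrategy.STRICT;
    }
    // Assumed default when neither property is set.
    return aggregatorMergeStrategy != null ? aggregatorMergeStrategy : MergeStrategy.STRICT;
  }

  /** Merges aggregators across segments; null means no merged aggregator info is available. */
  static Map<String, String> merge(List<SegmentAggs> segments, MergeStrategy strategy) {
    // Process oldest first so that, under LATEST, newer segments overwrite older ones.
    List<SegmentAggs> ordered = new ArrayList<>(segments);
    ordered.sort(Comparator.comparingLong(SegmentAggs::intervalStartMillis));

    Map<String, String> merged = new LinkedHashMap<>();
    for (SegmentAggs segment : ordered) {
      if (segment.aggregators() == null) {
        if (strategy == MergeStrategy.STRICT) {
          return null; // strict: any segment with unknown aggregators voids the result
        }
        continue; // lenient and (assumed) latest: ignore segments with unknown aggregators
      }
      for (Map.Entry<String, String> entry : segment.aggregators().entrySet()) {
        String column = entry.getKey();
        String type = entry.getValue();
        if (!merged.containsKey(column)) {
          merged.put(column, type); // first time this column is seen
        } else if (strategy == MergeStrategy.LATEST) {
          merged.put(column, type); // the most recent segment wins
        } else if (!type.equals(merged.get(column))) {
          if (strategy == MergeStrategy.STRICT) {
            return null; // strict: any conflict voids the whole result
          }
          merged.put(column, null); // lenient: only the conflicting column becomes null
        }
      }
    }
    return merged;
  }
}
```

With an older segment where `added` is a `longSum` and a newer one where it is a `doubleSum`, this sketch gives `doubleSum` under `latest`, maps `added` to null under `lenient`, and returns null for the whole aggregator map under `strict`.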
```java
FACTORY.mergeRunners(
    Execs.directExecutor(),
    Lists.newArrayList(
        toolChest.preMergeQueryDecoration(runner1),
        toolChest.preMergeQueryDecoration(runner2)
    )
)
```
Check notice (Code scanning / CodeQL): Deprecated method or constructor invocation
Resolved review threads:
- sql/src/test/java/org/apache/druid/sql/calcite/schema/SegmentMetadataCacheTest.java (outdated)
- ...essing/src/main/java/org/apache/druid/query/metadata/SegmentMetadataQueryQueryToolChest.java (three threads)
- processing/src/main/java/org/apache/druid/query/metadata/metadata/SegmentMetadataQuery.java
LGTM
Left a few suggestions related to the documentation.
Co-authored-by: Katya Macedo <[email protected]>
Resolved review thread: ...essing/src/main/java/org/apache/druid/query/metadata/SegmentMetadataQueryQueryToolChest.java
Squash commit (#14560):
* Add aggregatorMergeStrategy property to SegmentMetadataQuery (same changes as listed in the description above)
* Don't explicitly set strict strategy in tests
* Apply suggestions from code review
* Update docs/querying/segmentmetadataquery.md
* Apply suggestions from code review

Co-authored-by: Katya Macedo <[email protected]>
Motivation
SegmentMetadata queries currently support two aggregator merge strategies, strict and lenient, when the "aggregators" analysis type is enabled. In an evolving data model, users often want a policy that resolves a conflict by selecting the most recent aggregator for a column, rather than nulling it out as the lenient policy does. A strict strategy is best suited once the data model is locked in.
This PR:
- Adds a new property, `aggregatorMergeStrategy`, to `segmentMetadata` queries. Please see `docs/querying/segmentmetadataquery.md` for more information. This also allows us to define an earliest strategy (similar to the latest strategy) or more sophisticated merge strategies as needed.
- Deprecates the `lenientAggregatorMerge` boolean property in favor of `aggregatorMergeStrategy`.
- Uses `strict` as the default merge strategy when the aggregators analysis type is enabled.
- Changes the id of merged segments from `merged` to the `<datasource>_<interval>_merged_<partition_number>` format.
- Adjusts unit tests to assert the complete `SegmentAnalysis` object instead of just the aggregators, and adds tests for the latest aggregator merge and the backwards-compatibility logic.

Release note
The `lenientAggregatorMerge` property in segment metadata queries is deprecated in favor of a new property, `aggregatorMergeStrategy`. In addition to the existing `strict` and `lenient` strategies from `lenientAggregatorMerge`, `aggregatorMergeStrategy` also supports a `latest` strategy.

Key changed/added classes in this PR
AggregatorMergeStrategy.java
SegmentMetadataQuery.java
SegmentMetadataQueryQueryToolChest.java
SegmentMetadataQueryTest.java
SegmentMetadataQueryQueryToolChestTest.java
This PR has: