-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor ValuesSource and related classes #42949
Comments
Pinging @elastic/es-analytics-geo |
Notes from discussion on 2019-06-19
|
We should also make sure that type errors are consistent in how they are reported to the user. Currently, some situations throw |
Re: Immutability of VSConfig Spotted a location where we use the mutability: We can re-arrange the unmapped part (check parent first), but not sure how to handle the new fieldtype overwriting. Maybe generate a new config builder from the first config, set the new context and re-build? Dunno. :) |
this commit introduces more thorough field-type support tests as well as tests for unmapped and missing values. relates elastic#42949.
* Remove ValuesSourceType argument to ValuesSourceAggregationBuilder (#48638) * ValuesSourceRegistry Prototype (#48758) * Remove generics from ValuesSource related classes (#49606) * fix percentile aggregation tests (#50712) * Basic thread safety for ValuesSourceRegistry (#50340) * Remove target value type from ValuesSourceAggregationBuilder (#49943) * Cleanup default values source type (#50992) * CoreValuesSourceType no longer implements Writable (#51276) * Remove genereics & hard coded ValuesSource references from Matrix Stats (#51131) * Put values source types on fields (#51503) * Remove VST Any (#51539) * Rewire terms agg to use new VS registry (#51182) Also adds some basic AggTestCases for untested code paths (and boilerplate for future tests once the IT are converted over) * Wire Cardinality aggregation to work with the ValuesSourceRegistry (#51337) * Wire Percentiles aggregator into new VS framework (#51639) This required a bit of a refactor to percentiles itself. Before, the Builder would switch on the chosen algo to generate an algo-specific factory. This doesn't work (or at least, would be difficult) in the new VS framework. This refactor consolidates both factories together and introduces a PercentilesConfig object to act as a standardized way to pass algo-specific parameters through the factory. This object is then used when deciding which kind of aggregator to create Note: CoreValuesSourceType.HISTOGRAM still lives in core, and will be moved in a subsequent PR. * Remove generics and target value type from MultiVSAB (#51647) * fix checkstyle after merge (#52008) * Plumb ValuesSourceRegistry through to QuerySearchContext (#51710) * Convert RareTerms to new VS registry (#52166) * Wire up Value Count (#52225) * Wire up Max & Min aggregations (#52219) * ValuesSource refactoring: Wire up Sum aggregation (#52571) * ValuesSource refactoring: Wire up SigTerms aggregation (#52590) * Soft immutability for VSConfig (#52729) * Unmute testSupportedFieldTypes, fix Percentiles/Ranks/Terms tests (#52734) Also fixes Percentiles which was incorrectly specified to only accept numeric, but in fact also accepts Boolean and Date (because those are numeric on master - thanks `testSupportedFieldTypes` for catching it!) * VS refactoring: Wire up stats aggregation (#52891) * ValuesSource refactoring: Wire up string_stats aggregation (#52875) * VS refactoring: Wire up median (MAD) aggregation (#52945) * fix valuesourcetype issue with constant_keyword field (#53041) this commit implements `getValuesSourceType` for the ConstantKeyword field type. master was merged into feature/extensible-values-source introducing a new field type that was not implementing `getValuesSourceType`. * ValuesSource refactoring: Wire up Avg aggregation (#52752) * Wire PercentileRanks aggregator into new VS framework (#51693) * Add a VSConfig resolver for aggregations not using the registry (#53038) * Vs refactor wire up ranges and date ranges (#52918) * Wire up geo_bounds aggregation to ValuesSourceRegistry (#53034) This commit updates the geo_bounds aggregation to depend on registering itself in the ValuesSourceRegistry relates #42949. * VS refactoring: convert Boxplot to new registry (#53132) * Wire-up geotile_grid and geohash_grid to ValuesSourceRegistry (#53037) This commit updates the geo*_grid aggregations to depend on registering itself in the ValuesSourceRegistry relates to the values-source refactoring meta issue #42949. * Wire-up geo_centroid agg to ValuesSourceRegistry (#53040) This commit updates the geo_centroid aggregation to depend on registering itself in the ValuesSourceRegistry. relates to the values-source refactoring meta issue #42949. * Fix type tests for Missing aggregation (#53501) * ValuesSource Refactor: move histo VSType into XPack module (#53298) - Introduces a new API (`getBareAggregatorRegistrar()`) which allows plugins to register aggregations against existing agg definitions defined in Core. - This moves the histogram VSType over to XPack where it belongs. `getHistogramValues()` still remains as a Core concept - Moves the histo-specific bits over to xpack (e.g. the actual aggregator logic). This requires extra boilerplate since we need to create a new "Analytics" Percentile/Rank aggregators to deal with the histo field. Doubly-so since percentiles/ranks are extra boiler-plate'y... should be much lighter for other aggs * Wire up DateHistogram to the ValuesSourceRegistry (#53484) * Vs refactor parser cleanup (#53198) Co-authored-by: Zachary Tong <[email protected]> Co-authored-by: Zachary Tong <[email protected]> Co-authored-by: Christos Soulios <[email protected]> Co-authored-by: Tal Levy <[email protected]>
Part of refactoring ValuesSource in elastic#42949
* ValuesSource refactor wire up missing agg (#53511) Part of refactoring ValuesSource in #42949 * ValuesSource refactoring: Wire up ExtendedStats aggregation (#53227) * Javadoc for ValuesSource related work (#53427) Co-authored-by: Andy Bristol <[email protected]> Co-authored-by: Christos Soulios <[email protected]>
Tests for unmapped fields, the missing parameter, scripting, and correct ValuesSource types in MissingAggregatorTests. Basic yaml tests for the missing agg For #42949
* Remove ValuesSourceType argument to ValuesSourceAggregationBuilder (elastic#48638) * ValuesSourceRegistry Prototype (elastic#48758) * Remove generics from ValuesSource related classes (elastic#49606) * fix percentile aggregation tests (elastic#50712) * Basic thread safety for ValuesSourceRegistry (elastic#50340) * Remove target value type from ValuesSourceAggregationBuilder (elastic#49943) * Cleanup default values source type (elastic#50992) * CoreValuesSourceType no longer implements Writable (elastic#51276) * Remove genereics & hard coded ValuesSource references from Matrix Stats (elastic#51131) * Put values source types on fields (elastic#51503) * Remove VST Any (elastic#51539) * Rewire terms agg to use new VS registry (elastic#51182) Also adds some basic AggTestCases for untested code paths (and boilerplate for future tests once the IT are converted over) * Wire Cardinality aggregation to work with the ValuesSourceRegistry (elastic#51337) * Wire Percentiles aggregator into new VS framework (elastic#51639) This required a bit of a refactor to percentiles itself. Before, the Builder would switch on the chosen algo to generate an algo-specific factory. This doesn't work (or at least, would be difficult) in the new VS framework. This refactor consolidates both factories together and introduces a PercentilesConfig object to act as a standardized way to pass algo-specific parameters through the factory. This object is then used when deciding which kind of aggregator to create Note: CoreValuesSourceType.HISTOGRAM still lives in core, and will be moved in a subsequent PR. * Remove generics and target value type from MultiVSAB (elastic#51647) * fix checkstyle after merge (elastic#52008) * Plumb ValuesSourceRegistry through to QuerySearchContext (elastic#51710) * Convert RareTerms to new VS registry (elastic#52166) * Wire up Value Count (elastic#52225) * Wire up Max & Min aggregations (elastic#52219) * ValuesSource refactoring: Wire up Sum aggregation (elastic#52571) * ValuesSource refactoring: Wire up SigTerms aggregation (elastic#52590) * Soft immutability for VSConfig (elastic#52729) * Unmute testSupportedFieldTypes, fix Percentiles/Ranks/Terms tests (elastic#52734) Also fixes Percentiles which was incorrectly specified to only accept numeric, but in fact also accepts Boolean and Date (because those are numeric on master - thanks `testSupportedFieldTypes` for catching it!) * VS refactoring: Wire up stats aggregation (elastic#52891) * ValuesSource refactoring: Wire up string_stats aggregation (elastic#52875) * VS refactoring: Wire up median (MAD) aggregation (elastic#52945) * fix valuesourcetype issue with constant_keyword field (elastic#53041)x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/job/RollupIndexer.java this commit implements `getValuesSourceType` for the ConstantKeyword field type. master was merged into feature/extensible-values-source introducing a new field type that was not implementing `getValuesSourceType`. * ValuesSource refactoring: Wire up Avg aggregation (elastic#52752) * Wire PercentileRanks aggregator into new VS framework (elastic#51693) * Add a VSConfig resolver for aggregations not using the registry (elastic#53038) * Vs refactor wire up ranges and date ranges (elastic#52918) * Wire up geo_bounds aggregation to ValuesSourceRegistry (elastic#53034) This commit updates the geo_bounds aggregation to depend on registering itself in the ValuesSourceRegistry relates elastic#42949. * VS refactoring: convert Boxplot to new registry (elastic#53132) * Wire-up geotile_grid and geohash_grid to ValuesSourceRegistry (elastic#53037) This commit updates the geo*_grid aggregations to depend on registering itself in the ValuesSourceRegistry relates to the values-source refactoring meta issue elastic#42949. * Wire-up geo_centroid agg to ValuesSourceRegistry (elastic#53040) This commit updates the geo_centroid aggregation to depend on registering itself in the ValuesSourceRegistry. relates to the values-source refactoring meta issue elastic#42949. * Fix type tests for Missing aggregation (elastic#53501) * ValuesSource Refactor: move histo VSType into XPack module (elastic#53298) - Introduces a new API (`getBareAggregatorRegistrar()`) which allows plugins to register aggregations against existing agg definitions defined in Core. - This moves the histogram VSType over to XPack where it belongs. `getHistogramValues()` still remains as a Core concept - Moves the histo-specific bits over to xpack (e.g. the actual aggregator logic). This requires extra boilerplate since we need to create a new "Analytics" Percentile/Rank aggregators to deal with the histo field. Doubly-so since percentiles/ranks are extra boiler-plate'y... should be much lighter for other aggs * Wire up DateHistogram to the ValuesSourceRegistry (elastic#53484) * Vs refactor parser cleanup (elastic#53198) Co-authored-by: Zachary Tong <[email protected]> Co-authored-by: Zachary Tong <[email protected]> Co-authored-by: Christos Soulios <[email protected]> Co-authored-by: Tal Levy <[email protected]>
* Add ValuesSource Registry and associated logic (#54281) * Remove ValuesSourceType argument to ValuesSourceAggregationBuilder (#48638) * ValuesSourceRegistry Prototype (#48758) * Remove generics from ValuesSource related classes (#49606) * fix percentile aggregation tests (#50712) * Basic thread safety for ValuesSourceRegistry (#50340) * Remove target value type from ValuesSourceAggregationBuilder (#49943) * Cleanup default values source type (#50992) * CoreValuesSourceType no longer implements Writable (#51276) * Remove genereics & hard coded ValuesSource references from Matrix Stats (#51131) * Put values source types on fields (#51503) * Remove VST Any (#51539) * Rewire terms agg to use new VS registry (#51182) Also adds some basic AggTestCases for untested code paths (and boilerplate for future tests once the IT are converted over) * Wire Cardinality aggregation to work with the ValuesSourceRegistry (#51337) * Wire Percentiles aggregator into new VS framework (#51639) This required a bit of a refactor to percentiles itself. Before, the Builder would switch on the chosen algo to generate an algo-specific factory. This doesn't work (or at least, would be difficult) in the new VS framework. This refactor consolidates both factories together and introduces a PercentilesConfig object to act as a standardized way to pass algo-specific parameters through the factory. This object is then used when deciding which kind of aggregator to create Note: CoreValuesSourceType.HISTOGRAM still lives in core, and will be moved in a subsequent PR. * Remove generics and target value type from MultiVSAB (#51647) * fix checkstyle after merge (#52008) * Plumb ValuesSourceRegistry through to QuerySearchContext (#51710) * Convert RareTerms to new VS registry (#52166) * Wire up Value Count (#52225) * Wire up Max & Min aggregations (#52219) * ValuesSource refactoring: Wire up Sum aggregation (#52571) * ValuesSource refactoring: Wire up SigTerms aggregation (#52590) * Soft immutability for VSConfig (#52729) * Unmute testSupportedFieldTypes, fix Percentiles/Ranks/Terms tests (#52734) Also fixes Percentiles which was incorrectly specified to only accept numeric, but in fact also accepts Boolean and Date (because those are numeric on master - thanks `testSupportedFieldTypes` for catching it!) * VS refactoring: Wire up stats aggregation (#52891) * ValuesSource refactoring: Wire up string_stats aggregation (#52875) * VS refactoring: Wire up median (MAD) aggregation (#52945) * fix valuesourcetype issue with constant_keyword field (#53041)x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/job/RollupIndexer.java this commit implements `getValuesSourceType` for the ConstantKeyword field type. master was merged into feature/extensible-values-source introducing a new field type that was not implementing `getValuesSourceType`. * ValuesSource refactoring: Wire up Avg aggregation (#52752) * Wire PercentileRanks aggregator into new VS framework (#51693) * Add a VSConfig resolver for aggregations not using the registry (#53038) * Vs refactor wire up ranges and date ranges (#52918) * Wire up geo_bounds aggregation to ValuesSourceRegistry (#53034) This commit updates the geo_bounds aggregation to depend on registering itself in the ValuesSourceRegistry relates #42949. * VS refactoring: convert Boxplot to new registry (#53132) * Wire-up geotile_grid and geohash_grid to ValuesSourceRegistry (#53037) This commit updates the geo*_grid aggregations to depend on registering itself in the ValuesSourceRegistry relates to the values-source refactoring meta issue #42949. * Wire-up geo_centroid agg to ValuesSourceRegistry (#53040) This commit updates the geo_centroid aggregation to depend on registering itself in the ValuesSourceRegistry. relates to the values-source refactoring meta issue #42949. * Fix type tests for Missing aggregation (#53501) * ValuesSource Refactor: move histo VSType into XPack module (#53298) - Introduces a new API (`getBareAggregatorRegistrar()`) which allows plugins to register aggregations against existing agg definitions defined in Core. - This moves the histogram VSType over to XPack where it belongs. `getHistogramValues()` still remains as a Core concept - Moves the histo-specific bits over to xpack (e.g. the actual aggregator logic). This requires extra boilerplate since we need to create a new "Analytics" Percentile/Rank aggregators to deal with the histo field. Doubly-so since percentiles/ranks are extra boiler-plate'y... should be much lighter for other aggs * Wire up DateHistogram to the ValuesSourceRegistry (#53484) * Vs refactor parser cleanup (#53198) Co-authored-by: Zachary Tong <[email protected]> Co-authored-by: Zachary Tong <[email protected]> Co-authored-by: Christos Soulios <[email protected]> Co-authored-by: Tal Levy <[email protected]> * First batch of easy fixes * Remove List.of from ValuesSourceRegistry Note that we intend to have a follow up PR dealing with the mutability of the registry, so I didn't even try to address that here. * More compiler fixes * More compiler fixes * More compiler fixes * Precommit is happy and so am I * Add new Core VSTs to tests * Disabled supported type test on SigTerms until we can backport it's fix * fix checkstyle * Fix test failure from semantic merge issue * Fix some metaData->metadata replacements that got lost * Fix list of supported types for MinAggregator * Fix list of supported types for Avg * remove unused import Co-authored-by: Zachary Tong <[email protected]> Co-authored-by: Zachary Tong <[email protected]> Co-authored-by: Christos Soulios <[email protected]> Co-authored-by: Tal Levy <[email protected]>
hey @not-napoleon, I notice there is some interdependence on ValuesSource and the CompositeValuesSourceBuilder#innerBuild, which I believe gets built after parsing in CompositeAggregationBuilder's parser. Given that this parsing assumes a one-to-one relation between builders and their supported valuessource types, I am not sure how to best generalize it across potential the registered aggregators. Are there any aspects of this that you've run into and have thoughts about? |
@talevy Yeah, Composite aggs work on a whole parallel system. That's one of the things (along with Array and Multi values source aggregations) that we cut from supporting in the refactor for now. Obviously at some point we need to come back to that, but it's not currently on my todo list. |
Following the implementation of the aggregate_metric_double field mapper(#49830) we are implementing the Min, Max, ValueCount, Sum and Average aggregations on aggregate metrics. The code builds on the excellent work done for #42949 and uses the extensible ValuesSources infrastructure to wire up common metric aggregation on the aggregate_metric_double field type. This PR is part of the rollups v2 refactoring as described in meta issue #42720
With many thanks to everyone who's contributed code, reviews, comments, and moral support to this project, I'm happy to say that with #57762 merged, we can call this done. There is still some related cleanup work that can be done, but it's pretty low priority right now and has other tickets tracking it (see #58139, #58135, and #58136). Please feel free to tag me with any values source related issues going forward. |
The aggregations framework defines several similar but subtly different classes & enums to represent what type of data an aggregation operates on. Their correct usage and interaction is not obvious, especially to new developers. This refactoring aims to make input type specification for aggregations easier to understand.
Plan
To replace the assorted hard coded
ValuesSource
references, we want to build a dynamic registry which will map Field types to Values Sources and specific aggregator implementations.TODO -
beforeFast follow after merging to masterValuesSources
out ofValuesSourceConfig
toValuesSource
(Extract Empty/Script/Missing ValuesSource behavior to an interface #48320)resolve
- This will get done with the registry prototypeValueSourceType
(working in prototype for simple case)ValuesSourceType
dynamically[ ] Significant Text (@polyfractal WIP)not VSABValueType
. This will involve changing how we do type checking of userValueTypeHints in the parser. Vs refactor parser cleanup #53198ValuesSource
based aggsValueType
ValueType
, but refactoring those is out of scope for the initial merge.ValueType
for parsing user value hints. There's a post-merge PR for removing that.ValuesSourceType
info fromValuesSourceAggregationBuilder
(Remove ValuesSourceType argument to ValuesSourceAggregationBuilder #48638)ValuesSourceConfig
(and probably other places)ValuesSourceType.ANY
& and remove ANY Remove ValuesSourceType.ANY #51539CoreValuesSourceType
CoreValuesSourceType no longer implements Writable #51276ValuesSource
referencesValuesSourceConfig
immutable Soft immutability for VSConfig #52729o.e.s.aggregations.support
(Tozzi) Vs refactor javadoc and comment cleanup #53427TODO - backport
List.of()
usageTODO - Post merge
serializeTargetValueType
(doesn't strictly need to be part of this refactor). The following classes override this method:CardinalityAggregationBuilder
MissingAggregationBuilder
ValueCountAggregationBuilder
RareTermsAggregationBuilder
SignificantTermsAggregationBuilder
TermsAggregationBuilder
AggregatorSupplier
) between Min&Max and other metrics.MultiValuesSourceAggregationBuilder
subclassesArrayValuesSourceAggregationBuilder
subclassesValueType
Out of Scope:
After considerable discussion, we've decided to deal with the multivalued aggregations at a later point. The current abstractions there aren't working as well as we'd like, so investing more in them right now doesn't seem like a good use of time.
Background
Involved Classes
ValuesSource & its subclasses
These are thin wrappers which provide a unified interface to different ways of getting input data (e.g. DocValues from Lucene, or script output). A class hierarchy defines the type of values returned by the source.
ValueSourceType
A small enum who's values roughly correspond to the first level subclasses of
ValuesSource
. This gets passed as a constructor argument toValuesSourceConifg
as a hint for what type of values source it should later construct.ValueType
A more robust enum specifying a wider range of types than
ValuesSourceType
. The instances of this enum provide a mapping back to aValuesSourceType
. This is used when users supply type hints, e.g. as part of a missing value specification.ValuesSourceConfig
This class resolves the actual
ValuesSource
to use on a given shard for the aggregation. It considers script values, missing values, and mapped fields.ValuesSourceAggregationBuilder
This is the base class for all aggregations using the value source model, and it ties together usage of all of the above classes. It is generic over a
ValuesSource
class, and accepts aValuesSourceType
and twoValueType
parameters. How those different value classes interact is not at all obvious; a good first step at this refactoring would be just documenting those relationships.Next Steps
I'm open to discussion as to what the best way to clean this up should be. In the interest of seeding that discussion, the following are my first-pass suggestions.
Clarify ValuesSourceAggregationBuilder
This would consist of descriptive field names & comments around the intended use of the various type parameters
ValuesSourceAggregationBuilder
accepts. Some javadoc on the intended roles ofValueSourceType
andValueType
would help too.Formalize relationship between
ValuesSourceType
andValuesSource
Currently, these are related via some cascading
if
statements inValuesSourceConfig
, but this misses out on many of the benefits of having an enum in the first place. Notably, nothing enforces that all enum values are accounted for, and developers wishing to understand the relationship between two classes must look to a third class for that information.ValueSourceType.ANY
ValuesSourceType.ANY
creates a few edge cases. It's the onlyValuesSourceType
that doesn't map cleanly back to aValuesSource
subclass.ValuesSourceConfig
interprets it as a bytes source, except in the case of a script, which allowsBYTES
but notANY
. Aggregations that can operate on multiple input types (e.g. TermsAggregation, which can operate on strings or numbers) useValuesSourceType.ANY
to indicate this, which is deceptive since later in the process they are restricted to more specific source types.ValuesSourceType Ordinal Serialization Issue
In addition to any other refactoring work being done here,
ValuesSourceType
is an enum being serialized by ordinal value, which is prone to error. If we keep this enum, we should adopt a pattern of serializing based on an ID we control, to allow for deletes & reorderings later. SeeValueType
, which does this correctly.ValueType.NUMBER
andValueType.NUMERIC
ValueType
specifies bothNUMBER
andNUMERIC
, which are apparently identical as far as the enum is concerned. There's a five year old TODO asking about the difference between these two. We should either merge them, or clarify why we need both.Rename
ValuesSource
ValuesSource
is very close toValueSource
, which is an interface inorg.elasticsearch.ingest
. It's easy to pull up the wrong class in Intellij, especially if you've only heard the name not seen it.The text was updated successfully, but these errors were encountered: