Add WeightedAvg metric aggregation #31037
Pinging @elastic/es-search-aggs
I left some comments.
> A new overload for `MultiValueMode` was added. I wanted to reuse the capabilities of `MultiValueMode`, but all the existing selectors always returned `true` for `advanceDoc()` and set a default value
Agreed, we should refactor MultiValueMode to decouple selection from applying a default value.
docWeights.advanceExact(doc);
final double weight = docWeights.doubleValue();

weights.increment(bucket, weight);
should we sum up weights using kahan summation too?
++ good point, should definitely have kahan summation too.
compensations = bigArrays.grow(compensations, bucket + 1);

if (docValues.advanceExact(doc)) {
    docWeights.advanceExact(doc);
can you assert that it returns true?
👍
Ok, added some more tests, documentation and some small fixes. I think this is ready for a review now.
@polyfractal I left some changes but I like where this is going.
|Parameter Name |Description |Required |Default Value
|`field` | The field that weights should be extracted from |Required |
|`missing` | A weight to use if the field is missing entirely |Optional |
|`multi` | If a document has multiple values for the field, how should the values be combined |Optional | `avg`
Should this say `weights` instead of `values`?
|`field` | The field that weights should be extracted from |Required |
|`missing` | A weight to use if the field is missing entirely |Optional |
|`multi` | If a document has multiple values for the field, how should the values be combined |Optional | `avg`
|`script` | A script which provides the values for the document. This is mutually exclusive with `field` |Optional |
Should this say `weights` instead of `values`?
double newSum = sum + corrected;
sumCompensation = (newSum - sum) - corrected;
sum = newSum;
}
Could you add comments to the loop to explain why we need each of the conditions?
I'm actually not entirely sure. @jpountz, are the conditionals to keep the naive behavior with infinites? E.g. if an infinite is added it converts the final value to infinite, whereas kahan summing would do something different?
So it's basically bwc with how we did things before?
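To make the question above concrete, here is a minimal sketch, not the code under review. The class and method names (`KahanInfinitySketch`, `plainKahan`, `guardedKahan`) are hypothetical, and the guard conditions are an assumption about what the loop's conditionals are for. Textbook Kahan summation turns a sum that contains an infinity into NaN, because the compensation term becomes `Infinity - Infinity`. Adding non-finite values naively, and skipping compensation once the sum is non-finite, preserves the result a naive sum would give.

```java
final class KahanInfinitySketch {

    // Textbook Kahan summation with no special handling of non-finite values.
    static double plainKahan(double[] values) {
        double sum = 0.0;
        double compensation = 0.0;
        for (double value : values) {
            double corrected = value - compensation;
            double newSum = sum + corrected;
            compensation = (newSum - sum) - corrected;
            sum = newSum;
        }
        return sum;
    }

    // Assumed guards: non-finite values are added naively, and finite values
    // are only compensated while the running sum is still finite.
    static double guardedKahan(double[] values) {
        double sum = 0.0;
        double compensation = 0.0;
        for (double value : values) {
            if (Double.isFinite(value) == false) {
                sum += value;
            } else if (Double.isFinite(sum)) {
                double corrected = value - compensation;
                double newSum = sum + corrected;
                compensation = (newSum - sum) - corrected;
                sum = newSum;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] values = {1.0, Double.POSITIVE_INFINITY, 1.0};
        System.out.println(plainKahan(values));   // the compensation term poisons the sum
        System.out.println(guardedKahan(values)); // same result as a naive sum
    }
}
```

If this reading is right, the conditionals are indeed about backwards compatibility with naive summation when infinities are present.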
DoubleArray sums;
DoubleArray sumCompensations;
DoubleArray weightCompensations;
DocValueFormat format;
can these be made private and final?
Private yes, but not final. They are grown down in the collector (e.g. `weights = bigArrays.grow(weights, bucket + 1);`)
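A small sketch of why a grown array field cannot be `final`. It uses plain `double[]` and a hypothetical `grow` helper as a stand-in, not the real `BigArrays` API; the only assumption carried over from the thread is that growing may return a different, larger array that has to be reassigned to the field.

```java
import java.util.Arrays;

final class GrowableWeightsSketch {

    // Private, but not final: grow() may hand back a different, larger array.
    private double[] weights = new double[1];

    // Hypothetical stand-in for a grow helper: returns the same array when it
    // is already large enough, otherwise a bigger copy with the old contents.
    private static double[] grow(double[] array, int minSize) {
        if (minSize <= array.length) {
            return array;
        }
        return Arrays.copyOf(array, Math.max(minSize, array.length * 2));
    }

    void collect(int bucket, double weight) {
        weights = grow(weights, bucket + 1); // reassignment is why the field is not final
        weights[bucket] += weight;
    }

    double weightOf(int bucket) {
        return bucket < weights.length ? weights[bucket] : 0.0;
    }
}
```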
import java.util.Map;
import java.util.Objects;

public abstract class MultiValuesSourceAggregationBuilder<VS extends ValuesSource, AB extends MultiValuesSourceAggregationBuilder<VS, AB>>
I think it's worth adding a JavaDoc to this class. Additionally, I would point out in the JavaDoc that this class makes the assumption that all ValuesSources are of the same value type. I think this is a fine assumption to make, at least for now, but it's worth pointing it out.
Ah good point, I didn't even think of that limitation. Will document.
If/when we need multiple value source types... that's gonna get fun :/
import java.io.IOException;
import java.util.function.BiFunction;

public class MultiValuesSourceFieldConfig implements Writeable, ToXContentFragment {
Could this maybe wrap `ValuesSourceConfig` so we ensure that as features are added to one they are added to the other?
So the layout is a bit tricky, and naming is maybe (probably) confusing. Open to suggestions.
`MultiValuesSourceFieldConfig` is spiritually related to `ValuesSourceParseHelper#declareFields()` in that it is basically the parser and builder object for the commonly shared fields.
I think `MultiValuesSourceConfig` is closer to what you were expecting, which is the final object. This contains a map of fields, where each entry's value is a Wrapper object containing a `ValuesSourceConfig` and a `MultiValueMode`.
So the underlying features should be shared, but the parsing is indeed still different. I can see if there's a way to share the parsing, but it may be tricky since the regular ValuesSourceConfig also defines a `targetValueType` as part of the common fields, but that only applies to the total MultiValuesSource, not the individual fields.
I'll poke at it a bit. Might be easier to zoom about this when you're back too.
Jenkins, run gradle build tests
Fixes an issue where assertions were being tripped on REST tests due to using the wrong stream ctor
@colings86 this should be good to go for another review whenever you have time, no rush :)
@colings86 Removed the multi-value mode as discussed, but decided to also remove MultiValuesSourceConfig and just use a Map everywhere. Seemed silly to have a wrapper around the map without any additional functionality, and it didn't save much in the way of typing due to the length of its name vs. a map :) This is good to go for another review whenever you have a few minutes.
I left a couple of minor comments but LGTM
If you have this situation, you will need to specify a `script` for the weight field, and use the script
to combine the multiple values into a single value to be used.

This single weight will be applied independently to each value extracted from the `value` field.
I wonder if it's worth having an example of a single weight being applied to each value independently to help solidify what we mean?
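A worked sketch of what "applied independently to each value" could mean for a single document. The class and method names are hypothetical, and it assumes that the weight is added to the denominator once per value, which is how the sentence under review reads. A document with values `[1, 2, 3]` and a single weight of `2` then contributes `(1*2) + (2*2) + (3*2) = 12` to the numerator and `2 + 2 + 2 = 6` to the denominator.

```java
final class SingleWeightSketch {

    // Returns {sum(value * weight), sum(weight)} contributed by one document
    // that has several values but only one weight.
    static double[] contribution(double[] values, double weight) {
        double weightedSum = 0.0;
        double weightSum = 0.0;
        for (double value : values) {
            weightedSum += value * weight;
            weightSum += weight; // assumption: the weight counts once per value
        }
        return new double[] {weightedSum, weightSum};
    }

    public static void main(String[] args) {
        double[] parts = contribution(new double[] {1.0, 2.0, 3.0}, 2.0);
        System.out.println(parts[0] / parts[1]); // weighted average of this one document
    }
}
```

With only this document in the bucket, the weighted average is `12 / 6 = 2`, the same as the plain average of its values, because every value carries the same weight.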
"single document. Use a script to combine multiple weights-per-doc into a single value."); | ||
}
// There should always be one weight if advanceExact lands us here, either
// a real weight or a `missing` value
nit: `missing` value -> `missing` weight
Adds a new single-value metrics aggregation that computes the weighted average of numeric values that are extracted from the aggregated documents. These values can be extracted from specific numeric fields in the documents.

When calculating a regular average, each datapoint has an equal "weight"; it contributes equally to the final value. In contrast, weighted averages scale each datapoint differently. The amount that each datapoint contributes to the final value is extracted from the document, or provided by a script.

As a formula, a weighted average is `∑(value * weight) / ∑(weight)`.

A regular average can be thought of as a weighted average where every value has an implicit weight of `1`.

Closes #15731
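The formula in the description can be sketched as follows. This is an illustration only: `WeightedAvgSketch` and `weightedAvg` are hypothetical names, the inputs are plain arrays instead of document fields, and the sums are naive rather than Kahan-compensated.

```java
final class WeightedAvgSketch {

    // Computes sum(value * weight) / sum(weight) over parallel arrays.
    static double weightedAvg(double[] values, double[] weights) {
        if (values.length != weights.length) {
            throw new IllegalArgumentException("values and weights must have the same length");
        }
        double weightedSum = 0.0;
        double weightSum = 0.0;
        for (int i = 0; i < values.length; i++) {
            weightedSum += values[i] * weights[i];
            weightSum += weights[i];
        }
        return weightedSum / weightSum;
    }

    public static void main(String[] args) {
        double[] values = {1.0, 2.0, 3.0};
        // Heavier weights on the smaller values pull the result below the plain average.
        System.out.println(weightedAvg(values, new double[] {3.0, 2.0, 1.0}));
        // Every weight equal to 1 reduces to the regular average.
        System.out.println(weightedAvg(values, new double[] {1.0, 1.0, 1.0}));
    }
}
```

With weights `3, 2, 1` the numerator is `3 + 4 + 3 = 10` and the denominator is `6`; with all weights `1` the result is the regular average, `2`.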
* 6.x: Security: revert to old way of merging automata (#32254) Fix a test bug in RangeQueryBuilderTests introduced in the field aliases backport. Introduce Application Privileges with support for Kibana RBAC (#32309) Undo a debugging change that snuck in during the field aliases merge. [test] port linux package packaging tests (#31943) Painless: Update More Methods to New Naming Scheme (#32305) Tribe: Add error with secure settings copied to tribe (#32298) Add V_6_3_3 version constant Add ERR to ranking evaluation documentation (#32314) [DOCS] Added link to 6.3.2 RNs [DOCS] Updates 6.3.2 release notes with PRs from ml-cpp repo (#32334) [Kerberos] Add Kerberos authentication support (#32263) [ML] Extract persistent task methods from MlMetadata (#32319) Backport - Add Snapshots Status API to High Level Rest Client (#32295) Make release notes ignore the `>test-failure` label. (#31309) [DOCS] Adds release highlights for search for 6.4 (#32095) Allow Integ Tests to run in a FIPS-140 JVM (#32316) Add support for field aliases to 6.x. 
(#32184) Register ERR metric with NamedXContentRegistry (#32320) fixes broken build for third-party-tests (#32315) Relates #31918 / Closes infra/issues/6085 [DOCS] Rollup Caps API incorrectly mentions GET Jobs API (#32280) Rest HL client: Add put watch action (#32026) (#32191) Add WeightedAvg metric aggregation (#31037) Consistent encoder names (#29492) Switch monitoring to new style Requests (#32255) specify subdirs of lib, bin, modules in package (#32253) Rename ranking evaluation `quality_level` to `metric_score` (#32168) Add new permission for JDK11 to load JAAS libraries (#32132) Switch x-pack:core to new style Requests (#32252) Watcher: Store username on watch execution (#31873) Silence SSL reload test that fails on JDK 11 Painless: Clean up add methods in PainlessLookup (#32258) CCE when re-throwing "shard not available" exception in TransportShardMultiGetAction (#32185) Fail shard if IndexShard#storeStats runs into an IOException (#32241) Fix `range` queries on `_type` field for singe type indices (#31756) (#32161) AwaitsFix RecoveryIT#testHistoryUUIDIsGenerated Add new fields to monitoring template for Beats state (#32085) (#32273) [TEST] improve REST high-level client naming conventions check (#32244) Check that client methods match API defined in the REST spec (#31825)
* master: Security: revert to old way of merging automata (#32254) Networking: Fix test leaking buffer (#32296) Undo a debugging change that snuck in during the field aliases merge. Painless: Update More Methods to New Naming Scheme (#32305) [TEST] Fix assumeFalse -> assumeTrue in SSLReloadIntegTests Ingest: Support integer and long hex values in convert (#32213) Introduce fips_mode setting and associated checks (#32326) Add V_6_3_3 version constant [DOCS] Removed extraneous callout number. Rest HL client: Add put license action (#32214) Add ERR to ranking evaluation documentation (#32314) Introduce Application Privileges with support for Kibana RBAC (#32309) Build: Shadow x-pack:protocol into x-pack:plugin:core (#32240) [Kerberos] Add Kerberos authentication support (#32263) [ML] Extract persistent task methods from MlMetadata (#32319) Add Restore Snapshot High Level REST API Register ERR metric with NamedXContentRegistry (#32320) fixes broken build for third-party-tests (#32315) Allow Integ Tests to run in a FIPS-140 JVM (#31989) [DOCS] Rollup Caps API incorrectly mentions GET Jobs API (#32280) awaitsfix testRandomClusterStateUpdates [TEST] add version skip to weighted_avg tests Consistent encoder names (#29492) Add WeightedAvg metric aggregation (#31037) Switch monitoring to new style Requests (#32255) Rename ranking evaluation `quality_level` to `metric_score` (#32168) Fix a test bug around nested aggregations and field aliases. 
(#32287) Add new permission for JDK11 to load JAAS libraries (#32132) Silence SSL reload test that fails on JDK 11 [test] package pre-install java check (#32259) specify subdirs of lib, bin, modules in package (#32253) Switch x-pack:core to new style Requests (#32252) awaitsfix SSLConfigurationReloaderTests Painless: Clean up add methods in PainlessLookup (#32258) Fail shard if IndexShard#storeStats runs into an IOException (#32241) AwaitsFix RecoveryIT#testHistoryUUIDIsGenerated Remove unnecessary warning supressions (#32250) CCE when re-throwing "shard not available" exception in TransportShardMultiGetAction (#32185) Add new fields to monitoring template for Beats state (#32085)
WIP, but putting this up to see how @colings86 feels about the MultiValueSource stuff. Still needs loads of tests, comments and documentation.
Notable changes in this PR:
* `ArrayValueSource`: named this way because it takes multiple fields in an array. This was done because I couldn't find a good way to refactor the matrix aggs to use the new multi-value style, but didn't want to leave the name the same (and it also caused conflict issues).
* A new overload for `MultiValueMode` was added. I wanted to reuse the capabilities of `MultiValueMode`, but all the existing selectors always returned `true` for `advanceDoc()` and set a default value. I wanted the normal `advanceDoc()` behavior, but the multiple-mode avg/sum/min/max functionality when a field has multiple values.
* The new multivalue stuff tries to be reasonably generic, allowing the agg to define how fields are exposed via helpers. For example, the `weighted_avg` defines two fields like this:

Closes #15731