[7.x] Add support for filters to T-Test aggregation (elastic#54980) (elastic#55066)

Adds support for filters to T-Test aggregation. The filters can be used to
select populations based on some criteria and use values from the same or
different fields.

Closes elastic#53692
imotov authored Apr 13, 2020
1 parent a2fafa6 commit 51c6f69
Showing 14 changed files with 510 additions and 83 deletions.
14 changes: 7 additions & 7 deletions docs/build.gradle
@@ -548,7 +548,7 @@ buildRestTests.setups['node_upgrade'] = '''
number_of_replicas: 1
mappings:
properties:
name:
group:
type: keyword
startup_time_before:
type: long
@@ -560,17 +560,17 @@ buildRestTests.setups['node_upgrade'] = '''
refresh: true
body: |
{"index":{}}
{"name": "A", "startup_time_before": 102, "startup_time_after": 89}
{"group": "A", "startup_time_before": 102, "startup_time_after": 89}
{"index":{}}
{"name": "B", "startup_time_before": 99, "startup_time_after": 93}
{"group": "A", "startup_time_before": 99, "startup_time_after": 93}
{"index":{}}
{"name": "C", "startup_time_before": 111, "startup_time_after": 72}
{"group": "A", "startup_time_before": 111, "startup_time_after": 72}
{"index":{}}
{"name": "D", "startup_time_before": 97, "startup_time_after": 98}
{"group": "B", "startup_time_before": 97, "startup_time_after": 98}
{"index":{}}
{"name": "E", "startup_time_before": 101, "startup_time_after": 102}
{"group": "B", "startup_time_before": 101, "startup_time_after": 102}
{"index":{}}
{"name": "F", "startup_time_before": 99, "startup_time_after": 98}'''
{"group": "B", "startup_time_before": 99, "startup_time_after": 98}'''

// Used by iprange agg
buildRestTests.setups['iprange'] = '''
75 changes: 69 additions & 6 deletions docs/reference/aggregations/metrics/t-test-aggregation.asciidoc
@@ -1,7 +1,7 @@
[role="xpack"]
[testenv="basic"]
[[search-aggregations-metrics-ttest-aggregation]]
=== TTest Aggregation
=== T-Test Aggregation

A `t_test` metrics aggregation performs a statistical hypothesis test in which the test statistic follows a Student's t-distribution
under the null hypothesis on numeric values extracted from the aggregated documents or generated by provided scripts. In practice, this
@@ -43,8 +43,8 @@ GET node_upgrade/_search
}
--------------------------------------------------
// TEST[setup:node_upgrade]
<1> The field `startup_time_before` must be a numeric field
<2> The field `startup_time_after` must be a numeric field
<1> The field `startup_time_before` must be a numeric field.
<2> The field `startup_time_after` must be a numeric field.
<3> Since we have data from the same nodes, we are using paired t-test.

The response will return the p-value or probability value for the test. It is the probability of obtaining results at least as extreme as
@@ -74,6 +74,69 @@ The `t_test` aggregation supports unpaired and paired two-sample t-tests. The ty
`"type": "homoscedastic"`:: performs two-sample equal variance test
`"type": "heteroscedastic"`:: performs two-sample unequal variance test (this is default)

==== Filters

It is also possible to run an unpaired t-test on different sets of records using filters. For example, to test the difference
in startup times before the upgrade between two groups of nodes, we can use the same field `startup_time_before` for both
populations and select the two groups of nodes with `term` filters on the group name field:

[source,console]
--------------------------------------------------
GET node_upgrade/_search
{
"size" : 0,
"aggs" : {
"startup_time_ttest" : {
"t_test" : {
"a" : {
"field" : "startup_time_before", <1>
"filter" : {
"term" : {
"group" : "A" <2>
}
}
},
"b" : {
"field" : "startup_time_before", <3>
"filter" : {
"term" : {
"group" : "B" <4>
}
}
},
"type" : "heteroscedastic" <5>
}
}
}
}
--------------------------------------------------
// TEST[setup:node_upgrade]
<1> The field `startup_time_before` must be a numeric field.
<2> Any query that separates two groups can be used here.
<3> We are using the same field
<4> but we are using different filters.
<5> Since we have data from different nodes, we cannot use paired t-test.


[source,console-result]
--------------------------------------------------
{
...
"aggregations": {
"startup_time_ttest": {
"value": 0.2981858007281437 <1>
}
}
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
<1> The p-value.

In this example, we are using the same field for both populations. However, this is not a requirement; different fields, and even
combinations of fields and scripts, can be used. The populations don't have to be in the same index either. If the data sets are
located in different indices, a `term` filter on the <<mapping-index-field,`_index`>> field can be used to select the populations.
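
On the Java side, the same kind of population selection can be expressed with the `MultiValuesSourceFieldConfig`
builder that this commit extends with a filter (see the code changes further down). This is a minimal sketch, not
part of the commit: the indices `node_upgrade_before` and `node_upgrade_after` and the field `startup_time` are
illustrative, and wiring the two configs into a `t_test` request is omitted.

[source,java]
--------------------------------------------------
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.support.MultiValuesSourceFieldConfig;

public class CrossIndexTTestConfigSketch {
    public static void main(String[] args) {
        // Population "a": startup_time values taken only from documents in node_upgrade_before.
        MultiValuesSourceFieldConfig a = new MultiValuesSourceFieldConfig.Builder()
            .setFieldName("startup_time")
            .setFilter(QueryBuilders.termQuery("_index", "node_upgrade_before"))
            .build();

        // Population "b": the same field, restricted to documents in node_upgrade_after.
        MultiValuesSourceFieldConfig b = new MultiValuesSourceFieldConfig.Builder()
            .setFieldName("startup_time")
            .setFilter(QueryBuilders.termQuery("_index", "node_upgrade_after"))
            .build();

        System.out.println(a.getFieldName() + " filtered by " + a.getFilter());
        System.out.println(b.getFieldName() + " filtered by " + b.getFilter());
    }
}
--------------------------------------------------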

==== Script

The `t_test` metric supports scripting. For example, if we need to adjust out load times for the before values, we could use
@@ -108,7 +171,7 @@ GET node_upgrade/_search
// TEST[setup:node_upgrade]

<1> The `field` parameter is replaced with a `script` parameter, which uses the
script to generate values which percentiles are calculated on
<2> Scripting supports parameterized input just like any other script
<3> We can mix scripts and fields
script to generate values on which the t-test is calculated.
<2> Scripting supports parameterized input just like any other script.
<3> We can mix scripts and fields.

WeightedAvgAggregationBuilder.java
@@ -25,6 +25,7 @@
import org.elasticsearch.common.xcontent.ObjectParser;
import org.elasticsearch.common.xcontent.ToXContent;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryShardContext;
import org.elasticsearch.search.DocValueFormat;
import org.elasticsearch.search.aggregations.AggregationBuilder;
@@ -51,8 +52,8 @@ public class WeightedAvgAggregationBuilder extends MultiValuesSourceAggregationB
ObjectParser.fromBuilder(NAME, WeightedAvgAggregationBuilder::new);
static {
MultiValuesSourceParseHelper.declareCommon(PARSER, true, ValueType.NUMERIC);
MultiValuesSourceParseHelper.declareField(VALUE_FIELD.getPreferredName(), PARSER, true, false);
MultiValuesSourceParseHelper.declareField(WEIGHT_FIELD.getPreferredName(), PARSER, true, false);
MultiValuesSourceParseHelper.declareField(VALUE_FIELD.getPreferredName(), PARSER, true, false, false);
MultiValuesSourceParseHelper.declareField(WEIGHT_FIELD.getPreferredName(), PARSER, true, false, false);
}

public WeightedAvgAggregationBuilder(String name) {
@@ -99,10 +100,11 @@ public BucketCardinality bucketCardinality() {

@Override
protected MultiValuesSourceAggregatorFactory<Numeric> innerBuild(QueryShardContext queryShardContext,
Map<String, ValuesSourceConfig<Numeric>> configs,
DocValueFormat format,
AggregatorFactory parent,
Builder subFactoriesBuilder) throws IOException {
Map<String, ValuesSourceConfig<Numeric>> configs,
Map<String, QueryBuilder> filters,
DocValueFormat format,
AggregatorFactory parent,
Builder subFactoriesBuilder) throws IOException {
return new WeightedAvgAggregatorFactory(name, configs, format, queryShardContext, parent, subFactoriesBuilder, metadata);
}

MultiValuesSourceAggregationBuilder.java
@@ -22,6 +22,7 @@
import org.elasticsearch.common.io.stream.StreamInput;
import org.elasticsearch.common.io.stream.StreamOutput;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryShardContext;
import org.elasticsearch.search.DocValueFormat;
import org.elasticsearch.search.aggregations.AbstractAggregationBuilder;
@@ -168,13 +169,15 @@ protected final MultiValuesSourceAggregatorFactory<VS> doBuild(QueryShardContext
ValueType finalValueType = this.valueType != null ? this.valueType : targetValueType;

Map<String, ValuesSourceConfig<VS>> configs = new HashMap<>(fields.size());
Map<String, QueryBuilder> filters = new HashMap<>(fields.size());
fields.forEach((key, value) -> {
ValuesSourceConfig<VS> config = ValuesSourceConfig.resolve(queryShardContext, finalValueType,
value.getFieldName(), value.getScript(), value.getMissing(), value.getTimeZone(), format);
configs.put(key, config);
filters.put(key, value.getFilter());
});
DocValueFormat docValueFormat = resolveFormat(format, finalValueType);
return innerBuild(queryShardContext, configs, docValueFormat, parent, subFactoriesBuilder);
return innerBuild(queryShardContext, configs, filters, docValueFormat, parent, subFactoriesBuilder);
}


@@ -191,6 +194,7 @@ private static DocValueFormat resolveFormat(@Nullable String format, @Nullable V

protected abstract MultiValuesSourceAggregatorFactory<VS> innerBuild(QueryShardContext queryShardContext,
Map<String, ValuesSourceConfig<VS>> configs,
Map<String, QueryBuilder> filters,
DocValueFormat format, AggregatorFactory parent,
Builder subFactoriesBuilder) throws IOException;

MultiValuesSourceFieldConfig.java
@@ -30,26 +30,30 @@
import org.elasticsearch.common.xcontent.ToXContentObject;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentParser;
import org.elasticsearch.index.query.AbstractQueryBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.script.Script;

import java.io.IOException;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.Objects;
import java.util.function.BiFunction;

public class MultiValuesSourceFieldConfig implements Writeable, ToXContentObject {
private String fieldName;
private Object missing;
private Script script;
private ZoneId timeZone;
private final String fieldName;
private final Object missing;
private final Script script;
private final ZoneId timeZone;
private final QueryBuilder filter;

private static final String NAME = "field_config";

public static final BiFunction<Boolean, Boolean, ObjectParser<MultiValuesSourceFieldConfig.Builder, Void>> PARSER
= (scriptable, timezoneAware) -> {
public static final ParseField FILTER = new ParseField("filter");

ObjectParser<MultiValuesSourceFieldConfig.Builder, Void> parser
public static <C> ObjectParser<MultiValuesSourceFieldConfig.Builder, C> parserBuilder(boolean scriptable, boolean timezoneAware,
boolean filtered) {

ObjectParser<MultiValuesSourceFieldConfig.Builder, C> parser
= new ObjectParser<>(MultiValuesSourceFieldConfig.NAME, MultiValuesSourceFieldConfig.Builder::new);

parser.declareString(MultiValuesSourceFieldConfig.Builder::setFieldName, ParseField.CommonFields.FIELD);
@@ -71,14 +75,21 @@ public class MultiValuesSourceFieldConfig implements Writeable, ToXContentObject
}
}, ParseField.CommonFields.TIME_ZONE, ObjectParser.ValueType.LONG);
}

if (filtered) {
parser.declareField(MultiValuesSourceFieldConfig.Builder::setFilter,
(p, context) -> AbstractQueryBuilder.parseInnerQueryBuilder(p),
FILTER, ObjectParser.ValueType.OBJECT);
}
return parser;
};

private MultiValuesSourceFieldConfig(String fieldName, Object missing, Script script, ZoneId timeZone) {
protected MultiValuesSourceFieldConfig(String fieldName, Object missing, Script script, ZoneId timeZone, QueryBuilder filter) {
this.fieldName = fieldName;
this.missing = missing;
this.script = script;
this.timeZone = timeZone;
this.filter = filter;
}

public MultiValuesSourceFieldConfig(StreamInput in) throws IOException {
@@ -94,6 +105,11 @@ public MultiValuesSourceFieldConfig(StreamInput in) throws IOException {
} else {
this.timeZone = in.readOptionalZoneId();
}
if (in.getVersion().onOrAfter(Version.V_7_8_0)) {
this.filter = in.readOptionalNamedWriteable(QueryBuilder.class);
} else {
this.filter = null;
}
}

public Object getMissing() {
@@ -112,6 +128,10 @@ public String getFieldName() {
return fieldName;
}

public QueryBuilder getFilter() {
return filter;
}

@Override
public void writeTo(StreamOutput out) throws IOException {
if (out.getVersion().onOrAfter(Version.V_7_6_0)) {
@@ -126,6 +146,9 @@ public void writeTo(StreamOutput out) throws IOException {
} else {
out.writeOptionalZoneId(timeZone);
}
if (out.getVersion().onOrAfter(Version.V_7_8_0)) {
out.writeOptionalNamedWriteable(filter);
}
}

@Override
@@ -143,6 +166,10 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
if (timeZone != null) {
builder.field(ParseField.CommonFields.TIME_ZONE.getPreferredName(), timeZone.getId());
}
if (filter != null) {
builder.field(FILTER.getPreferredName());
filter.toXContent(builder, params);
}
builder.endObject();
return builder;
}
@@ -155,12 +182,13 @@ public boolean equals(Object o) {
return Objects.equals(fieldName, that.fieldName)
&& Objects.equals(missing, that.missing)
&& Objects.equals(script, that.script)
&& Objects.equals(timeZone, that.timeZone);
&& Objects.equals(timeZone, that.timeZone)
&& Objects.equals(filter, that.filter);
}

@Override
public int hashCode() {
return Objects.hash(fieldName, missing, script, timeZone);
return Objects.hash(fieldName, missing, script, timeZone, filter);
}

@Override
@@ -173,6 +201,7 @@ public static class Builder {
private Object missing = null;
private Script script = null;
private ZoneId timeZone = null;
private QueryBuilder filter = null;

public String getFieldName() {
return fieldName;
@@ -210,6 +239,11 @@ public Builder setTimeZone(ZoneId timeZone) {
return this;
}

public Builder setFilter(QueryBuilder filter) {
this.filter = filter;
return this;
}

public MultiValuesSourceFieldConfig build() {
if (Strings.isNullOrEmpty(fieldName) && script == null) {
throw new IllegalArgumentException("[" + ParseField.CommonFields.FIELD.getPreferredName()
@@ -223,7 +257,7 @@ public MultiValuesSourceFieldConfig build() {
"Please specify one or the other.");
}

return new MultiValuesSourceFieldConfig(fieldName, missing, script, timeZone);
return new MultiValuesSourceFieldConfig(fieldName, missing, script, timeZone, filter);
}
}
}
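
For illustration only (not part of this commit): a small self-contained program that builds a field config with
a term filter, mirroring the "a" population from the documentation example above, and prints its XContent, which
should now include the new "filter" entry. The class name and the expected output shown in the comment are
assumptions, not taken from the commit.

import org.elasticsearch.common.Strings;
import org.elasticsearch.common.xcontent.ToXContent;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.support.MultiValuesSourceFieldConfig;

public class FilteredFieldConfigDemo {
    public static void main(String[] args) throws Exception {
        // startup_time_before restricted to documents where group == "A"
        MultiValuesSourceFieldConfig config = new MultiValuesSourceFieldConfig.Builder()
            .setFieldName("startup_time_before")
            .setFilter(QueryBuilders.termQuery("group", "A"))
            .build();

        XContentBuilder builder = XContentFactory.jsonBuilder();
        config.toXContent(builder, ToXContent.EMPTY_PARAMS);
        // Expected to contain both "field" and the new "filter" entry, roughly:
        // {"field":"startup_time_before","filter":{"term":{"group":{...}}}}
        System.out.println(Strings.toString(builder));
    }
}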
MultiValuesSourceParseHelper.java
@@ -50,10 +50,10 @@ public static <VS extends ValuesSource, T> void declareCommon(

public static <VS extends ValuesSource, T> void declareField(String fieldName,
AbstractObjectParser<? extends MultiValuesSourceAggregationBuilder<VS, ?>, T> objectParser,
boolean scriptable, boolean timezoneAware) {
boolean scriptable, boolean timezoneAware, boolean filterable) {

objectParser.declareField((o, fieldConfig) -> o.field(fieldName, fieldConfig.build()),
(p, c) -> MultiValuesSourceFieldConfig.PARSER.apply(scriptable, timezoneAware).parse(p, null),
(p, c) -> MultiValuesSourceFieldConfig.parserBuilder(scriptable, timezoneAware, filterable).parse(p, null),
new ParseField(fieldName), ObjectParser.ValueType.OBJECT);
}
}
