forked from elastic/elasticsearch
Add Student's t-test aggregation support
Adds a t_test metric aggregation that can perform paired and unpaired two-sample t-tests. Support for filters in unpaired tests is still missing from this PR and will be added in a follow-up PR. Relates to elastic#53692
Showing 26 changed files with 2,447 additions and 5 deletions.
111 changes: 111 additions & 0 deletions
docs/reference/aggregations/metrics/t-test-aggregation.asciidoc
@@ -0,0 +1,111 @@
[role="xpack"]
[testenv="basic"]
[[search-aggregations-metrics-ttest-aggregation]]
=== TTest Aggregation

A `t_test` metrics aggregation performs a statistical hypothesis test in which the test statistic follows a Student's t-distribution
under the null hypothesis. The test is performed on numeric values extracted from the aggregated documents or generated by provided scripts.

==== Syntax

A `t_test` aggregation looks like this in isolation:

[source,js]
--------------------------------------------------
{
  "t_test": {
    "a": "value_before",
    "b": "value_after",
    "type": "paired"
  }
}
--------------------------------------------------
// NOTCONSOLE

Assuming that we have a record of node startup times before
and after an upgrade, let's run a t-test to see whether the upgrade affected
the node startup time in a meaningful way.

[source,console]
--------------------------------------------------
GET node_upgrade/_search
{
  "size": 0,
  "aggs" : {
    "startup_time_ttest" : {
      "t_test" : {
        "a" : { "field": "startup_time_before" }, <1>
        "b" : { "field": "startup_time_after" }, <2>
        "type": "paired" <3>
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:node_upgrade]
<1> The field `startup_time_before` must be a numeric field.
<2> The field `startup_time_after` must be a numeric field.
<3> Since we have data from the same nodes, we are using a paired t-test.
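
For intuition about what the paired test computes, the following minimal Python sketch calculates a paired t statistic by hand. It assumes the textbook definition: the mean of the per-node differences divided by its standard error, with n - 1 degrees of freedom. The `paired_t` helper and the sample startup times are hypothetical, and this is not the aggregation's actual implementation.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Hypothetical helper: return (t, df) for a paired two-sample t-test.

    The test works on per-pair differences, so both lists must have the
    same length and element i of each list must come from the same node.
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(before, after)]
    n = len(diffs)
    # Standard error of the mean difference: s_d / sqrt(n)
    se = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / se, n - 1


# Hypothetical startup times (seconds) for the same four nodes
before = [30.0, 32.0, 35.0, 31.0]
after = [28.0, 31.0, 32.0, 30.0]
t, df = paired_t(before, after)
print(f"t = {t:.4f}, df = {df}")
```

Pairing matters here: because each difference comes from a single node, node-to-node variation in startup time cancels out and only the change caused by the upgrade remains.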

The response will look like this:

[source,console-result]
--------------------------------------------------
{
  ...
  "aggregations": {
    "startup_time_ttest": {
      "value": 0.1914368843365979
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
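
The `value` in the response is, as we understand it, the p-value of the test: the probability of obtaining a result at least as extreme as the observed one if the null hypothesis (no difference between the population means) were true. Smaller values are stronger evidence that the means really differ. As a sketch of how a p-value can be derived from a t statistic and its degrees of freedom, the Python code below computes a two-tailed p-value through the identity p = I_{df/(df+t^2)}(df/2, 1/2), where I is the regularized incomplete beta function, evaluated with a continued-fraction expansion (Lentz's method). The helper names are hypothetical, the code is meant for modest degrees of freedom, and it stands in for a statistics library; it is not how Elasticsearch computes the value.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=1000, eps=3e-14):
    """Hypothetical helper: continued fraction for I_x(a, b), Lentz's method."""
    tiny = 1e-300  # guards against division by zero
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            return h
    raise RuntimeError("continued fraction did not converge")


def regularized_beta(x, a, b):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # The continued fraction converges fastest below this threshold;
    # otherwise use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def two_tailed_p(t, df):
    """Hypothetical helper: two-tailed p-value of t with df degrees of freedom."""
    x = df / (df + t * t)
    return regularized_beta(x, df / 2.0, 0.5)


print(two_tailed_p(2.0, 2))  # about 0.1835 (exactly 1 - 2/sqrt(6))
```

For one degree of freedom the t-distribution is the Cauchy distribution, so `two_tailed_p(1.0, 1)` should come out at 0.5 up to floating-point error, which makes a handy sanity check.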

==== T-Test Types

The `t_test` aggregation supports unpaired and paired two-sample t-tests. The type of the test can be specified using the `type` parameter:

`"type": "paired"`:: performs a paired t-test
`"type": "homoscedastic"`:: performs a two-sample equal variance test
`"type": "heteroscedastic"`:: performs a two-sample unequal variance test (this is the default)
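
The two unpaired types differ in how they estimate the standard error of the difference between the means. The Python sketch below shows the textbook formulas, assuming they match what each type computes: the equal-variance (homoscedastic) test pools the two sample variances and uses n_a + n_b - 2 degrees of freedom, while the unequal-variance (heteroscedastic) test, commonly known as Welch's t-test, keeps the variances separate and approximates the degrees of freedom with the Welch-Satterthwaite formula. The function names and sample data are hypothetical, and the aggregation's internal implementation may differ in details.

```python
import math
from statistics import mean, variance


def homoscedastic_t(a, b):
    """Hypothetical helper: two-sample t-test assuming equal variances."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance weights each sample variance by its degrees of freedom
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df


def heteroscedastic_t(a, b):
    """Hypothetical helper: Welch's t-test, no equal-variance assumption."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    se = math.sqrt(va + vb)
    # Welch-Satterthwaite approximation; df is usually not an integer
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return (mean(a) - mean(b)) / se, df


# Hypothetical startup times from two different groups of nodes
a = [30.0, 32.0, 35.0, 31.0, 33.0]
b = [28.0, 31.0, 29.0]
print(homoscedastic_t(a, b))
print(heteroscedastic_t(a, b))
```

When both samples have the same size, the two statistics coincide and only the degrees of freedom differ; the distinction matters most when the groups have different sizes and different spreads.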

==== Script

The `t_test` metric supports scripting. For example, if we need to adjust the startup times recorded before the upgrade, we could use
a script to recalculate them on the fly:

[source,console]
--------------------------------------------------
GET node_upgrade/_search
{
  "size": 0,
  "aggs" : {
    "startup_time_ttest" : {
      "t_test" : {
        "a": {
          "script" : {
            "lang": "painless",
            "source": "doc['startup_time_before'].value - params.adjustment", <1>
            "params" : {
              "adjustment" : 10 <2>
            }
          }
        },
        "b": {
          "field": "startup_time_after" <3>
        },
        "type": "paired"
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:node_upgrade]

<1> The `field` parameter is replaced with a `script` parameter, which uses the
script to generate the values that the t-test is performed on.
<2> Scripting supports parameterized input just like any other script.
<3> We can mix scripts and fields.
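
Because a paired test only looks at the per-node differences, subtracting a constant such as `params.adjustment` from every `a` value shifts the mean difference by that constant while leaving the standard deviation of the differences unchanged, so only the numerator of the t statistic moves. The following Python sketch mirrors the script above on hypothetical startup times to show this; it illustrates the arithmetic only and does not call Elasticsearch.

```python
from statistics import mean, stdev

# Hypothetical startup times for the same four nodes
startup_time_before = [30.0, 32.0, 35.0, 31.0]
startup_time_after = [28.0, 31.0, 32.0, 30.0]
adjustment = 10  # plays the role of params.adjustment in the script

# Equivalent of doc['startup_time_before'].value - params.adjustment
adjusted_before = [v - adjustment for v in startup_time_before]

raw_diffs = [x - y for x, y in zip(startup_time_before, startup_time_after)]
adj_diffs = [x - y for x, y in zip(adjusted_before, startup_time_after)]

print(mean(raw_diffs), mean(adj_diffs))    # the mean shifts by the adjustment
print(stdev(raw_diffs), stdev(adj_diffs))  # the spread is unchanged
```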
@@ -0,0 +1 @@
ec2544ab27e110d2d431bdad7d538ed509b21e62