Aggregations: add serial differencing pipeline aggregation #11196

polyfractal · 2015-05-15T21:08:05Z

No need for assignment or review yet, still need to write tests!

Serial Differencing

Serial differencing (or just differencing) is a technique where an earlier value of a time series is subtracted from the current value, at a chosen time lag or period. For example, each datapoint becomes f(x_t) - f(x_{t-n}), where n is the period being used.

A period of 1 is equivalent to a derivative: it is simply the change from one point to the next. Single periods are useful for removing constant, linear trends.
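As a rough illustration of the definition above (not the code in this PR), here is a minimal standalone Java sketch of lag-n differencing over a plain array. The class name `SerialDiffSketch`, the method `serialDiff`, and the choice to mark positions without a lagged value as NaN are all assumptions made for the example.

```java
import java.util.Arrays;

// Illustrative sketch only: names are hypothetical and not part of this PR.
final class SerialDiffSketch {

    // Returns an array of the same length as `values`.
    // Positions that have no value `lag` steps back are set to NaN
    // (an assumption for this sketch, to keep indices aligned).
    static double[] serialDiff(double[] values, int lag) {
        if (lag < 1) {
            throw new IllegalArgumentException("lag must be a positive integer");
        }
        double[] out = new double[values.length];
        Arrays.fill(out, Double.NaN);
        for (int t = lag; t < values.length; t++) {
            // f(x_t) - f(x_{t-n}) with n = lag
            out[t] = values[t] - values[t - lag];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] series = {10, 12, 15, 19, 24};
        // Lag 1: the change from one point to the next.
        System.out.println(Arrays.toString(serialDiff(series, 1)));
        // prints [NaN, 2.0, 3.0, 4.0, 5.0]
        // Lag 2: the change relative to two points back.
        System.out.println(Arrays.toString(serialDiff(series, 2)));
        // prints [NaN, NaN, 5.0, 7.0, 9.0]
    }
}
```

With a lag of 1 the output is just the point-to-point change, which is why it behaves like a derivative.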

Single periods are also useful for transforming data into a stationary series. In this example, the Dow Jones is plotted over ~250 days. The raw data is not stationary, which would make it difficult to use with some techniques.

But once we plot the first-difference, it becomes a stationary series (we know this because the first difference is randomly distributed around zero, and doesn't seem to exhibit any pattern/behavior). The transformation reveals that the dataset follows a random-walk model, which lets us apply further analysis.

Larger periods can be used to remove seasonal / cyclic behavior. In this example, a population of lemmings was synthetically generated with a sine wave + constant linear trend + random noise. The sine wave has a period of 30 days.

The first-difference removes the constant trend, leaving just a sine wave. The 30th-difference is then applied to the first-difference to remove the cyclic behavior, leaving a stationary series which is amenable to other analysis.
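The chaining described above can be sketched in plain Java, assuming a simplified version of the lemming data: a linear trend plus a 30-day sine wave, with the random noise left out so the result is deterministic. The slope, amplitude, class name, and helper names are hypothetical and chosen only for this illustration.

```java
// Illustrative sketch only: names and constants are hypothetical, not from this PR.
final class LemmingDiffSketch {

    // Lag-n differencing; positions without a lagged value are NaN (assumption).
    static double[] serialDiff(double[] values, int lag) {
        double[] out = new double[values.length];
        for (int t = 0; t < values.length; t++) {
            out[t] = t >= lag ? values[t] - values[t - lag] : Double.NaN;
        }
        return out;
    }

    // Linear trend + sine wave with a 30-day period; noise is deliberately omitted.
    static double[] syntheticLemmings(int days) {
        double[] out = new double[days];
        for (int t = 0; t < days; t++) {
            out[t] = 0.5 * t + 10.0 * Math.sin(2 * Math.PI * t / 30.0);
        }
        return out;
    }

    // Largest absolute value, ignoring NaN placeholders.
    static double maxAbs(double[] values) {
        double max = 0.0;
        for (double v : values) {
            if (!Double.isNaN(v)) {
                max = Math.max(max, Math.abs(v));
            }
        }
        return max;
    }

    public static void main(String[] args) {
        double[] raw = syntheticLemmings(120);
        double[] first = serialDiff(raw, 1);        // removes the linear growth
        double[] thirtieth = serialDiff(first, 30); // removes the 30-day cycle
        System.out.println("max |first difference|     = " + maxAbs(first));
        System.out.println("max |thirtieth difference| = " + maxAbs(thirtieth));
    }
}
```

Without noise, the first difference still oscillates with the 30-day cycle (around a constant offset equal to the slope), while the thirtieth difference of that series collapses to approximately zero. With noise added, what remains after both steps would be the noise alone.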

(Old PR and comments: #10190)

API

{
   "aggs": {
      "my_date_histo": {
         "date_histogram": {
            "field": "timestamp",
            "interval": "day"
         },
         "aggs": {
            "the_sum": {
               "sum": {
                  "field": "lemmings"
               }
            },
            "first_difference": {
               "serial_diff": {
                  "buckets_path": "the_sum",
                  "lag" : 1
               }
            },
            "thirtieth_difference": {
               "serial_diff": {
                  "buckets_path": "first_difference",
                  "lag" : 30
               }
            }
         }
      }
   }
}

polyfractal · 2015-07-07T19:25:23Z

@colings86 Low priority, but this is up for review whenever you have a few spare minutes. It is blissfully simple compared to moving_avg :)

Open question: currently, if there is not enough data (or the lag is too large), you just don't get any serial_diff metric values. We could also throw an exception, but that seems like poor behavior (the rest of your aggs may work fine). Thoughts?

colings86 · 2015-07-08T10:03:10Z

.../org/elasticsearch/search/aggregations/pipeline/serialdiff/SerialDiffPipelineAggregator.java

+        PipelineAggregatorStreams.registerStream(STREAM, TYPE.stream());
+    }
+
+    private static final Function<Aggregation, InternalAggregation> FUNCTION = new Function<Aggregation, InternalAggregation>() {


Could use PipelineAggregator.AGGREGATION_TRANFORM_FUNCTION instead of this?

colings86 · 2015-07-08T10:15:59Z

@polyfractal left some comments but I really like this aggregation and the documentation for it is great :)

To your question: I am struggling to decide what is best. As you say, throwing an exception seems unfriendly and would be different behaviour to other aggregations. But equally if we just don't output anything then it can easily confuse users as to why the aggregation is not working since there will be no message or indication anywhere of what caused the aggregation to not output any data.

polyfractal · 2015-07-10T16:39:34Z

@colings86 cleaned up, ready at your leisure :)

colings86 · 2015-07-10T16:50:23Z

LGTM

polyfractal added v2.0.0-beta1 WIP :Analytics/Aggregations Aggregations labels May 15, 2015

colings86 mentioned this pull request May 15, 2015

Add ability to perform computations on aggregations #9876

Closed

24 tasks

polyfractal force-pushed the feature/aggs_2_0_diff branch from d6e0c55 to b1d07f0 Compare May 28, 2015 19:05

clintongormley changed the title ~~Aggregations: Add serial differencing aggregation~~ Add serial differencing aggregation Jun 8, 2015

clintongormley added the >feature label Jun 8, 2015

polyfractal force-pushed the feature/aggs_2_0_diff branch 3 times, most recently from 4cc3ed9 to e133bdf Compare July 7, 2015 19:24

polyfractal added review and removed WIP labels Jul 7, 2015

colings86 reviewed Jul 8, 2015

polyfractal changed the title ~~Add serial differencing aggregation~~ Aggregations: add serial differencing pipeline aggregation Jul 10, 2015

Aggregations: add serial differencing pipeline aggregation

e3f9d56

polyfractal force-pushed the feature/aggs_2_0_diff branch from f443ded to e3f9d56 Compare July 10, 2015 22:24

polyfractal added a commit that referenced this pull request Jul 10, 2015

Merge pull request #11196 from polyfractal/feature/aggs_2_0_diff

bb9c160

Aggregations: add serial differencing pipeline aggregation

polyfractal merged commit bb9c160 into elastic:master Jul 10, 2015

kevinkluge removed the review label Jul 10, 2015

patjlm mentioned this pull request Dec 10, 2015

Different timeshift on same panel grafana/grafana#2093

Closed

arcolife mentioned this pull request Jul 13, 2016

use timeshifting feature to compare 2 different sa binaries distributed-system-analysis/sarjitsu#28

Closed

colings86 mentioned this pull request Aug 4, 2016

Should we remove/modify some of the experiment tags in the documentation #19798

Closed
