Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MovingFunction pipeline aggregation #25137

Closed
wants to merge 1 commit into from

Conversation

polyfractal
Copy link
Contributor

This adds a moving_fn pipeline aggregation which executes a function on the values in a sliding window. For example, you can find the minimum value within a window of time, which then slides forward to find the next min, etc etc.

It supports four pre-built functions:

  • Min
  • Max
  • Sum
  • Median

It also supports user-defined scripts for custom functionality.

The PR looks larger than it really is for two reasons: first, the movavg package was renamed to moving so that it could accommodate the new agg. Second, some of the methods of MovAvgModel were moved into a new MovModel class (and MovAvgModel now extends that). That way simple functions like min can extend MovModel without extra cruft like predictions.

Also cleaned up some generics in movavg.... we only use Doubles, no need for generic numeric support.

Closes #17607, although we may want to add some more functions in the future as mentioned in the issue (moving std dev, etc).

This adds a pipeline aggregation which executes a function on the
values in a sliding window.  For example, you can find the minimum
value within a window of time, which then slides forward to find
the next min, etc etc.

It supports four pre-built functions:

- Min
- Max
- Sum
- Median

It also supports user-defined scripts for custom functionality.
Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to have all these models? Can this just use scripts? min/max/etc are available already, and we will soon have custom whitelists which will allow adding other statistical functions just for this context.

if (script.getParams() != null) {
vars.putAll(script.getParams());
}
vars.put("values", values.toArray(new Double[values.size()]));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great case to use the new script contexts. Can you instead create a MovingFunctionScript class and have the execute method take double[] values?

@polyfractal
Copy link
Contributor Author

Regarding using a script for everything and getting rid of pre-built functions, we discussed this in the search meeting today.

The general consensus is that if we can offer a really simple one-liner that'd be fine. But if the one-liner is the standard java streaming min, it'll be too complicated and we should just stick with the pre-built functions to give the user a simpler interface.

E.g.

  • return Math.min(params.values); is acceptable
  • return params.values.stream().mapToDouble(Double::doubleValue).min().orElse(Double.NaN); is too long

@rjernst
Copy link
Member

rjernst commented Jun 12, 2017

Math.min takes 2 numbers, not an array. There are two options, either using Collections.min(values), or when whitelisting is ready in painless, adding a min augmentation that works on double[] or whatever, so that you can do values.min().

@polyfractal
Copy link
Contributor Author

Right, I meant the syntactic-sugar equivalent of Math.min() via whitelisting (or whatever), although I had forgotten Collections had min/max.

That still leaves sum and median, which I don't think has a (nice) one-liner available at the present time? How far off is custom whitelisting in Painless?

@rjernst
Copy link
Member

rjernst commented Jun 12, 2017

How far off is custom whitelisting in Painless?

Not far. Should make 6.0, I think.

@polyfractal
Copy link
Contributor Author

@clintongormley what're your thoughts on using pre-built functions now vs. waiting on custom-whitelists in Painless for 6.0?

@clintongormley
Copy link
Contributor

@polyfractal I'd wait for painless support in this case. This PR can be merged without waiting for Painless, but it will benefit from those changes when they arrive

@polyfractal
Copy link
Contributor Author

👍 I'll update with the script context stuff, then put it on the shelf until whitelisting is in :)

@tusharm
Copy link

tusharm commented Jan 18, 2018

Any updates on this? I am looking to use moving averages with medians/percentiles - this would be helpful.

@polyfractal
Copy link
Contributor Author

The blocking functionality (whitelists in painless) just got merged in #28161, so I'm hoping to get back to this in the somewhat near future. Need to finish up some other priority work first but I think this is mostly unblocked now. :)

@polyfractal
Copy link
Contributor Author

I'm going to close this, because the PR will need significant changes to incorporate the Scripting functionality.

We definitely want this agg, so it's not an indication of lack of interest... just logistically it'll be easier to start a new PR than try to update this one. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants