rolling functions, rolling aggregates, sliding window, moving average #2778

jangorecki · 2018-04-21T03:52:54Z

jangorecki · 2018-04-27T04:09:57Z

@mattdowle answering questions from PR

> Why are we doing this inside data.table? Why are we integrating it instead of contributing to existing packages and using them from data.table?

There were 3 different issues created asking for that functionality in data.table, plus multiple SO questions tagged data.table. Users expect it to be in the scope of data.table.
data.table is a perfect fit for time-series data, and rolling aggregates are pretty useful statistics there.

> My guess is it comes down to syntax (features only possible or convenient if built into data.table; e.g. inside [...] and optimized) and building data.table internals into the rolling function at C level; e.g. froll* should be aware of and use data.table indices and key. If so, more specifics on that are needed; e.g. a simple short example.

For me personally it is about speed and the lack of a chain of dependencies, which is not easy to achieve nowadays.
Key/indices could be useful for frollmin/frollmax, but it is unlikely that a user will create an index on a measure variable; also, we haven't made this optimization for min/max yet. I don't see much sense in GForce optimization, because the memory allocated by a roll* call is not released afterwards but returned as the answer (as opposed to non-rolling mean, sum, etc.).

> If there is no convincing argument for integrating, then we should contribute to the other packages instead.

I listed some above. If you are not convinced, I recommend putting the question to data.table users (ask on Twitter, etc.) to check the response. This feature has been requested for a long time and by many users. If the response doesn't convince you, then you can close this issue.

jangorecki · 2023-08-30T11:13:26Z

rollcor
rollcov
rollrank
rollunqn
rolllm

went out of scope for now. All of them can be done with frollapply (on the PR branches, not master), just not super fast. We could consider adding them to scope in the future. For now, the following set feels fine and complete to me: sum, mean, prod, min, max, sd, var, median.
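To illustrate why a generic apply is "not super fast" compared with a dedicated rolling function, here is a minimal Python sketch. It is not data.table's C code; `roll_apply` and `roll_mean` are hypothetical names used only for this illustration. The generic version calls an arbitrary function on every window, while the specialized version keeps a running sum and does constant work per element.

```python
from statistics import median


def roll_apply(x, n, fun):
    """Generic rolling apply: call `fun` on every full window of length n.

    Positions without a full window are filled with None.
    """
    out = [None] * len(x)
    for i in range(n - 1, len(x)):
        out[i] = fun(x[i - n + 1 : i + 1])
    return out


def roll_mean(x, n):
    """Specialized rolling mean: update a running sum in O(1) per element."""
    out = [None] * len(x)
    total = 0.0
    for i, v in enumerate(x):
        total += v
        if i >= n:
            total -= x[i - n]  # drop the value that left the window
        if i >= n - 1:
            out[i] = total / n
    return out


x = [1.0, 2.0, 4.0, 8.0, 16.0]
print(roll_mean(x, 3))
print(roll_apply(x, 3, median))
```

Any statistic can go through `roll_apply`, but each window is processed from scratch, so the cost grows with the window size. A dedicated function avoids that only when the statistic has a cheap incremental update.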

roaldarbol · 2024-09-30T13:06:44Z

@jangorecki just following up here based on your comment in {roll}. I was happy to see that frollmedian and friends will be available in {data.table}! What is the status of frollmedian - do you have a rough ETA? I can see that the PR has not been worked on since January and currently fails checks.

jangorecki · 2024-09-30T14:58:50Z

No ETA (it requires multiple other branches to be merged first). I recommend using the rollmedian branch directly. It was made from a very stable point in master (cascading through the other rolling-related branches). I know it is being used in production.

roaldarbol · 2024-09-30T15:16:03Z

Sounds good, I'll try that. Which rolling functions are available on that branch? Just frollmedian or also others? (I'm doing some benchmarking, so I just want to make sure I get as many of your implementations as possible) 😊

jangorecki · 2024-09-30T20:48:45Z

Others as well. rollmedian is the most recent of all the rolling branches, so it includes the rest as well. There is also a rewritten frollapply for applying any function, which is multithreaded and memory-optimized.

MichaelChirico · 2024-09-30T22:26:53Z

@roaldarbol if you're keen, the blocker for merging the existing PRs is a lack of reviewer and author bandwidth. We could use someone to either:

- Help as reviewer: review the existing (large) PRs, starting from "rolling functions: adaptive left, frollmax, frollapply adaptive, partial" #5441 and including the PRs under label:froll, except the frollmaxN splits.
- Help as author: split up "rolling functions: adaptive left, frollmax, frollapply adaptive, partial" #5441 into small, digestible PRs. An attempt was made in the frollmaxN splits, but ideally we would have a chain of PRs, like the one for cbindlist()/mergelist(), that is easy for a reviewer to digest.

roaldarbol · 2024-10-02T12:26:10Z

> Others as well. rollmedian is the most recent of all the rolling branches, so it includes the rest as well. There is also a rewritten frollapply for applying any function, which is multithreaded and memory-optimized.

That's great, I'll give it a spin! @MichaelChirico I unfortunately don't have the time currently, but if the need is still there a few months from now I might have a look. 😊

roaldarbol · 2024-10-06T14:37:22Z

I've started benchmarking the various rolling stats across a bunch of packages, and the new data.table implementations are sweeping the floor! Hope we can get those PRs over the line!

PS As someone new here, would such a benchmark be worth adding to the Articles?

jangorecki · 2024-10-07T04:20:40Z

Thanks for benchmarking.

Note that readers will not really know how those functions scale, which reduces the utility of the benchmark. It is always good to present multiple input vector sizes as well as multiple window sizes. With at least three different sizes, it is possible to conclude whether a function scales linearly, worse than linearly, or better.
This scaling effect gives a bit more insight than a single fixed input size and window size.
To give a practical example, you can have a situation where one tool is the fastest at window size 10 but the slowest at window size 1000.
And here the median result will probably be different if you increase the window size. I could add the RollingWindow package to this benchmark: https://github.com/jangorecki/rollbench so it will be easily visible.
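A minimal Python sketch of such a benchmark grid follows. It is an illustration only, not the code from the rollbench repository: `bench` and the two toy rolling means are hypothetical stand-ins for the package functions one would really compare. It times every combination of input size and window size so that the scaling of each implementation becomes visible.

```python
import random
import time


def roll_mean_naive(x, n):
    """Recompute each window from scratch: O(len(x) * n) work."""
    return [sum(x[i - n + 1 : i + 1]) / n for i in range(n - 1, len(x))]


def roll_mean_online(x, n):
    """Update a running sum: O(len(x)) work regardless of n."""
    out = []
    total = sum(x[: n - 1])
    for i in range(n - 1, len(x)):
        total += x[i]
        out.append(total / n)
        total -= x[i - n + 1]  # drop the oldest value before moving on
    return out


def bench(fun, sizes, windows, repeats=3):
    """Return {(size, window): best elapsed seconds} over the full grid."""
    rng = random.Random(42)
    results = {}
    for size in sizes:
        x = [rng.random() for _ in range(size)]
        for n in windows:
            best = float("inf")
            for _ in range(repeats):
                start = time.perf_counter()
                fun(x, n)
                best = min(best, time.perf_counter() - start)
            results[(size, n)] = best
    return results


sizes = [1_000, 4_000, 16_000]
windows = [10, 100, 1_000]
for name, fun in [("naive", roll_mean_naive), ("online", roll_mean_online)]:
    for (size, n), secs in bench(fun, sizes, windows).items():
        print(f"{name:>6} size={size:>6} window={n:>5} {secs:.5f}s")
```

With three input sizes and three window sizes per implementation, a reader can check whether the timing grows with the window, with the input length, or with both, instead of seeing a single data point.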

When benchmarking a custom function with rollapply, I would go for a real custom function, as there may be an optimization that detects "sum" and switches to an optimized sum.

It definitely makes sense to add it to the articles; this is what the articles on the data.table wiki are for.

roaldarbol · 2024-10-07T08:52:25Z

Oh yeah, absolutely, I'm quite aware of the scaling dimension - I started out with those here: jasonjfoster/roll#44. But I also find that a lot of the nuance in the smaller values disappears (e.g. when data.table is 5x faster than another fast function), and I want to show both, so I'm currently experimenting with better ways (or combinations of ways) to visualize benchmarks. Hope that makes sense. :-)

Thanks for the note on custom functions, I'll change that. And then I'll add it to articles once I've found a preferred way of visualising the scaling.

jangorecki added the feature request label Apr 21, 2018

jangorecki added this to the v1.11.2 milestone Apr 21, 2018

jangorecki self-assigned this Apr 21, 2018

This was referenced Apr 21, 2018

[R-Forge #2187] Add/document rolling mean, median etc.. combined with i #624

Closed

[R-Forge #2185] Add features/documentation for sliding windows with data.table #626

Closed

[Request] rollapply written in data.table #1855

Closed

jangorecki added a commit that referenced this issue Apr 24, 2018

roll.md moved to gh issue #2778

61d2ff6

jangorecki mentioned this issue Apr 24, 2018

[WIP] Rolling functions: rollmean #2795

Closed

jangorecki added a commit that referenced this issue May 19, 2018

roll.md moved to gh issue #2778

7206ea5

jangorecki added a commit that referenced this issue May 29, 2018

roll.md moved to gh issue #2778

51f1310

jangorecki mentioned this issue Jul 3, 2018

rolling mean #2961

Merged

ja-thomas mentioned this issue Nov 8, 2018

make sliding window faster QuayAu/fxtract#8

Closed

jangorecki added a commit that referenced this issue Nov 11, 2018

roll.md moved to gh issue #2778

5c1730d

jangorecki mentioned this issue Dec 15, 2018

rollmean post merge improvements #3224

Closed

3 tasks

st-pasha mentioned this issue Dec 21, 2018

Rolling aggregate support based on windows within a DT h2oai/datatable#1500

Open

jangorecki removed this from the 1.12.0 milestone Jan 5, 2019

MichaelChirico added the High label May 30, 2020

jangorecki removed the High label Jun 3, 2020

AdrianAntico mentioned this issue Aug 27, 2021

Benchmark on bigger data SebKrantz/collapse#184

Closed

jangorecki mentioned this issue Aug 31, 2022

rolling functions: adaptive left, frollmax, frollapply adaptive, partial #5441

Open

10 tasks

jangorecki added the froll label Sep 26, 2022

jangorecki mentioned this issue Sep 2, 2023

more rolling functions #5682

Open

MichaelChirico added the "top request" label (one of our most-requested issues) Apr 14, 2024

roaldarbol mentioned this issue Dec 9, 2024

Methods for smoothing/filtering, filter_ functions roaldarbol/animovement#41

Closed
