rolling funs / shift could support logical window #3241
Comments
An interesting package written by @gogonzo already supports the feature requested in this issue: https://github.com/gogonzo/runner. The feature is called "handling missings" there. Another interesting feature in the runner package is "varying window size". It is implemented only to return all window values, without applying any function, so it requires much more memory, but the result can also be post-processed more flexibly.
Another package implementing the functionality described in this post is https://github.com/DavisVaughan/slide
I am leaning towards removing this functionality from the rolling functions implementation, because this use case fits perfectly well into adaptive rolling functions, which have been part of rolling functions since the beginning. Therefore, instead of adding support for it in our C code, we can simply provide a helper function that generates expected
library(data.table)
id = c(0L,1L,2L,5L,6L,8L)
x = data.table(date=as.IDate(id), value=c(1,2,3,4,5,6))
x
# date value
#1: 1970-01-01 1
#2: 1970-01-02 2
#3: 1970-01-03 3
#4: 1970-01-06 4
#5: 1970-01-07 5
#6: 1970-01-09 6
## non-adaptive window of width 3
n = 3L
x[, n3 := frollsum(value, n)]
## adaptive window of 3 days
an = c(3L,3L,3L,1L,2L,2L)
x[, an3 := frollsum(value, an, adaptive=TRUE)]
x
# date value n3 an3
# <IDat> <num> <num> <num>
#1: 1970-01-01 1 NA NA
#2: 1970-01-02 2 NA NA
#3: 1970-01-03 3 6 6
#4: 1970-01-06 4 9 4
#5: 1970-01-07 5 12 9
#6: 1970-01-09 6 15 11
So the whole point is to provide a function adapt = function(index, window) ... that for index column ( This will obviously not address the feature for
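A minimal sketch of what such a helper could look like, in base R. The name adapt and its arguments follow the signature mentioned above; the body is an assumption made for illustration, not data.table code. It assumes a strictly increasing integer-like index and returns the full window width for rows whose window would start before the first observation, so that the adaptive rolling function yields NA there, as in the an vector above.

```r
# Hypothetical helper, illustration only (not part of data.table's API).
# For each position, count the observations whose index falls in
# (index[i] - window, index[i]], i.e. within the last `window` index units.
adapt = function(index, window) {
  stopifnot(is.numeric(index), !anyNA(index),
            !is.unsorted(index, strictly = TRUE),
            length(window) == 1L, window >= 1L)
  # number of observations at or before the position just outside the window
  before = findInterval(index - window, index)
  n = seq_along(index) - before
  # windows reaching back before the first observation are incomplete:
  # use the full width so an adaptive rolling function returns NA for them
  incomplete = index - window + 1L < index[1L]
  n[incomplete] = window
  as.integer(n)
}

id = c(0L, 1L, 2L, 5L, 6L, 8L)
an = adapt(id, 3L)
print(an)
```

For the id vector from the example this gives the same widths as the hand-written an, so the result could be passed on as frollsum(value, an, adaptive=TRUE).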
Updated timings based on
library(slider)
library(data.table)
set.seed(108)
N = 1e6
n = 1e3
x = rnorm(N)
## slightly sparse
idx = sort(sample(N*1.1, N))
system.time(s <- slide_index_dbl(x, idx, mean, .before=n-1L, .complete=TRUE))
# user system elapsed
# 9.041 0.075 9.117
system.time(d <- frollmean(x, frolladapt(idx, n), adaptive=TRUE))
# user system elapsed
# 0.016 0.000 0.012
all.equal(d, s)
#[1] TRUE
## sparse
idx = sort(sample(N*2, N))
system.time(s <- slide_index_dbl(x, idx, mean, .before=n-1L, .complete=TRUE))
# user system elapsed
# 7.900 0.008 7.908
system.time(d <- frollmean(x, frolladapt(idx, n), adaptive=TRUE))
# user system elapsed
# 0.027 0.000 0.022
all.equal(d, s)
#[1] TRUE
@jangorecki a better slider benchmark is probably against But nice work with
library(slider)
set.seed(108)
N = 1e6
n = 1e3
x = rnorm(N)
## slightly sparse
idx = sort(sample(N*1.1, N))
system.time(s <- slide_index_dbl(x, idx, mean, .before=n-1L, .complete=TRUE))
#> user system elapsed
#> 6.350 0.551 6.907
system.time(s2 <- slide_index_mean(x, idx, before=n-1L, complete=TRUE))
#> user system elapsed
#> 0.270 0.012 0.282
all.equal(s, s2)
#> [1] TRUE
## sparse
idx = sort(sample(N*2, N))
system.time(s <- slide_index_dbl(x, idx, mean, .before=n-1L, .complete=TRUE))
#> user system elapsed
#> 5.089 0.345 5.437
system.time(s2 <- slide_index_mean(x, idx, before=n-1L, complete=TRUE))
#> user system elapsed
#> 0.291 0.016 0.308
all.equal(s, s2)
#> [1] TRUE
Created on 2023-01-08 with reprex v2.0.2.9000
Thanks for pointing out the _mean version. I thought it was only available for the non-index version and must have missed this one. frollapply doesn't really do much here; it's adaptive=TRUE in rolling functions that does almost all the work. I actually developed it for a different purpose: adaptive rolling functions. Unevenly spaced time series turned out to be a special case of it.
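To illustrate why an adaptive window covers the unevenly spaced case, here is a rough base R sketch of an adaptive rolling sum built on cumulative sums. The function name adaptive_rollsum is hypothetical, and this is not how data.table implements adaptive=TRUE in C; it only shows that a per-row window width is enough to express a time-based window.

```r
# Hypothetical illustration, not data.table's implementation.
# x: numeric values; n: per-row window widths (same length as x).
adaptive_rollsum = function(x, n) {
  stopifnot(length(x) == length(n))
  cs = c(0, cumsum(x))          # cs[k + 1] holds sum(x[1:k])
  i = seq_along(x)
  start = i - n                 # number of leading elements to drop
  out = rep(NA_real_, length(x))
  ok = start >= 0               # window fits inside the data
  out[ok] = cs[i[ok] + 1L] - cs[start[ok] + 1L]
  out
}

value = c(1, 2, 3, 4, 5, 6)
an = c(3L, 3L, 3L, 1L, 2L, 2L)  # 3-day windows from the earlier example
res = adaptive_rollsum(value, an)
print(res)
```

With the widths from the earlier 3-day example, the sketch reproduces the an3 column shown there (NA, NA, 6, 4, 9, 11).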
I am filing this issue as a placeholder to evaluate user demand for such a feature. At present there are no plans to incorporate it, so if you need it, be sure to upvote.
Extension of #2778.
Rolling functions and shift have been implemented to operate on the physical order of data, which means that they do not handle "gaps" in, for example, time/date fields. If one wants to shift an IDate type vector by one day, one has to ensure that every single day is included in the vector. If it isn't, then one has to expand the vector (or eventually a data.table) and perform the shift afterwards. This can be solved flexibly and time-efficiently using a "rolling join", but the problem is memory consumption, especially for very sparse data. In an ideal world we would prefer to isolate the roll functionality of rolling joins into a helper function and re-use it in those cases.
Some examples of expected output for input x:
Related issue tagged as data.table: https://stackoverflow.com/questions/33553230/calculate-moving-average-every-n-hours
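As a rough illustration of shifting by logical rather than physical order (an assumption about the intended semantics, not the examples originally posted in this issue), the base R sketch below lags a value by one day on a date vector with gaps, without expanding it. The helper name shift_logical is hypothetical.

```r
# Hypothetical helper, illustration only.
# Returns, for each row, the value observed exactly `n` index units earlier,
# or NA when no such observation exists.
shift_logical = function(x, index, n = 1L) {
  idx = as.integer(index)       # for Date: days since 1970-01-01
  x[match(idx - n, idx)]
}

date = as.Date(c(0, 1, 2, 5, 6, 8), origin = "1970-01-01")
value = c(1, 2, 3, 4, 5, 6)
lagged = shift_logical(value, date, 1L)
print(lagged)
```

A plain positional lag would instead pair 1970-01-06 with the value from 1970-01-03, which is three days earlier rather than one.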
It is worth noting that pandas, as of 0.23.4, does support rolling functions by logical order when the window argument receives an offset instead of an int: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html