add by.column=F argument in frollapply #4887

Open
matthewgson opened this issue Feb 2, 2021 · 6 comments · May be fixed by #5575

@matthewgson

matthewgson commented Feb 2, 2021

I may not fully understand frollapply, but in my experiments I could not run a custom rolling function that requires multiple columns.

I found that zoo::rollapply has a by.column=F argument that let me do the job. However, it is not fully compatible with data.table's [, by=] grouping, so I had to loop manually. Could this by.column argument, or something similar, be implemented in the future? If I'm missing an existing feature or workaround, please let me know. Thank you.

@jangorecki
Member

Good feature request. Please provide a reproducible zoo example, or your current loop code.

@matthewgson
Author

matthewgson commented Feb 4, 2021

Here's sample code similar to what I did.

library(data.table)
library(zoo)
iris = as.data.table(iris)

# rolling calculation on two columns

flow_dt = function(DT){ 
  # Data table with two columns
  # needs to be applied in the zoo::rollapply function. 
  flow = (DT[2,1] - DT[1,1] * (1+DT[2,2])) / (DT[1,1])
  return(flow)
}

# store the result under a name that does not shadow base::return
res = rollapply(iris[,1:2], 2, flow_dt, by.column=FALSE)
dim(iris) # 150 5
length(res) # 149
frollapply(iris[,1:2], 2, flow_dt) # Error in DT[2, 1] : incorrect number of dimensions

iris[, flow := c(NA, rollapply(iris[,1:2], 2, flow_dt, by.column=F))] # works fine
iris[, flow := c(NA, rollapply(iris[,1:2], 2, flow_dt, by.column=F)), by=Species] # error

I looped my code by splitting the data.table by group (Species) and running zoo::rollapply on each piece.


split_table = split(iris, by='Species')
split_table
for (dt in split_table){
  dt[, flow := c(NA, rollapply(dt[,1:2], 2, flow_dt, by.column=F))]
}

result = rbindlist(split_table)
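
For comparison, here is a minimal sketch of a workaround that avoids the manual split and works with by= on a released data.table. It does not use frollapply or zoo; instead it passes the two columns to a small helper that slides over row positions. The helper name roll2 and the two-argument flow2 are hypothetical names made up for this illustration, not part of data.table or zoo, and the plain R loop is not optimized for speed.

```r
library(data.table)

# Copy of the built-in data so the original iris is left untouched
DT = as.data.table(datasets::iris)

# Same calculation as flow_dt, written for two plain vectors of length 2
flow2 = function(v1, v2) (v1[2L] - v1[1L] * (1 + v2[2L])) / v1[1L]

# Hypothetical helper: apply f to aligned, right-anchored windows of x and y;
# positions without a full window of n rows stay NA
roll2 = function(x, y, n, f) {
  len = length(x)
  out = rep(NA_real_, len)
  if (len >= n) {
    for (i in n:len) {
      idx = (i - n + 1L):i
      out[i] = f(x[idx], y[idx])
    }
  }
  out
}

# Columns are passed by name, so the window never crosses a group boundary
DT[, flow := roll2(Sepal.Length, Sepal.Width, 2L, flow2), by = Species]
```

Because each group is processed separately, the first row of every Species is NA, which is the behaviour the split-and-loop code above was aiming for.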

@jangorecki
Member

jangorecki commented Oct 7, 2022

@matthewgson Hi there,
there is a PR candidate that implements by.column=FALSE

install.packages("data.table", repos="https://jangorecki.gitlab.io/data.table")
library(data.table)
iris = as.data.table(iris)
flow_dt = function(DT){ 
  flow = (DT[2,1] - DT[1,1] * (1+DT[2,2])) / (DT[1,1])
  return(flow)
}
frollapply(iris[,1:2], 2, flow_dt, by.column=FALSE, fill=data.table(Sepal.Length=NA_real_))
#     Sepal.Length
#            <num>
#  1:           NA
#  2:    -3.039216
#  3:    -3.240816
#  4:    -3.121277
#  5:    -3.513043
# ---             
#146:    -3.000000
#147:    -2.559701
#148:    -2.968254
#149:    -3.446154
#150:    -3.048387
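
As a sanity check on the output above, the value in row 2 can be reproduced by hand from the first two rows of the built-in iris data, using only base R. The variable names here are made up for this sketch.

```r
# First two rows of the two columns used by flow_dt
len = datasets::iris$Sepal.Length[1:2]   # 5.1, 4.9
wid = datasets::iris$Sepal.Width[1:2]    # 3.5, 3.0

# Same formula as flow_dt, applied to the first window of size 2
flow_row2 = (len[2] - len[1] * (1 + wid[2])) / len[1]
round(flow_row2, 6)
# -3.039216
```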

It is currently in my private fork, because it is based on another branch rather than on master. Once that other branch is merged into master, I will rebase this one onto master and push it to GitHub.
The manual can be found at https://jangorecki.gitlab.io/data.table/reference/frollapply.html
Testing is very welcome.

@jangorecki
Member

jangorecki commented Oct 7, 2022

Btw, I simplified your function, as it was returning single-row, single-column data.tables. Now it returns just a scalar numeric, and fill is handled automatically as well. This makes it easier to apply by group.

flow = function(DT) {
  v1 = DT[[1L]]
  v2 = DT[[2L]]
  (v1[2L] - v1[1L] * (1+v2[2L])) / v1[1L]
}
iris[, "flow" := frollapply(.SD, 2, flow, by.column=F), by=Species, .SDcols=1:2][]
#     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species      flow
#            <num>       <num>        <num>       <num>    <fctr>     <num>
#  1:          5.1         3.5          1.4         0.2    setosa        NA
#  2:          4.9         3.0          1.4         0.2    setosa -3.039216
#  3:          4.7         3.2          1.3         0.2    setosa -3.240816
#  4:          4.6         3.1          1.5         0.2    setosa -3.121277
#  5:          5.0         3.6          1.4         0.2    setosa -3.513043
# ---                                                                      
#146:          6.7         3.0          5.2         2.3 virginica -3.000000
#147:          6.3         2.5          5.0         1.9 virginica -2.559701
#148:          6.5         3.0          5.2         2.0 virginica -2.968254
#149:          6.2         3.4          5.4         2.3 virginica -3.446154
#150:          5.9         3.0          5.1         1.8 virginica -3.048387

@jangorecki jangorecki added this to the 1.14.7 milestone Oct 10, 2022
@jangorecki jangorecki linked a pull request Jan 3, 2023 that will close this issue
@jangorecki jangorecki modified the milestones: 1.14.11, 1.15.1 Oct 29, 2023
@Waldi73

Waldi73 commented Mar 5, 2024

@jangorecki, thanks for adding this option; it would be great for rolling regression like here.
I didn't find it in 1.15.2 yet. Is a merge planned for an upcoming version?

@jangorecki
Member

Hopefully in 1.16.0, but there are many PRs in the queue that have to be merged first. If you need it badly, you can install the branch of the PR that closes this issue. You are also welcome to contribute by making the requested changes to the PRs that need to be merged before this one.

@MichaelChirico MichaelChirico modified the milestones: 1.16.0, 1.17.0 Jul 14, 2024
@MichaelChirico MichaelChirico modified the milestones: 1.17.0, 1.18.0 Jan 17, 2025