Add new functions fold and fold_over #2

Open
TimTeaFan opened this issue May 11, 2021 · 3 comments
Labels: enhancement, further discussion needed, next major release

Comments

@TimTeaFan
Owner

TimTeaFan commented May 11, 2021

Based on this gist, fold and fold_over might be useful add-on functions for a future version of dplyover. There should be a better name than fold for this kind of function.

library(dplyr)

likert_col <- function(n = 10) {
  # draw n responses on a 7-point Likert scale
  sample(7, size = n, replace = TRUE)
}

# toy data
dat <- tibble(
  cat_1 = likert_col(),
  cat_2 = likert_col(),
  cat_3 = likert_col(),
  dog_1 = likert_col(),
  dog_2 = likert_col()
)

# `fold` does not exist yet
dat %>% 
  transmute(fold(starts_with("cat"),
                 list(sum = ~ rowSums(.x),
                      mean = ~ rowMeans(.x))))

# A tibble: 10 x 2
   cat_sum cat_mean
     <dbl>    <dbl>
 1      11     3.67
 2      10     3.33
 3       6     2   
 4       4     1.33
 5      10     3.33
 6       7     2.33
 7      12     4   
 8      12     4   
 9      17     5.67
10      13     4.33

# `fold_over` does not exist yet
dat %>% 
  transmute(fold_over(cut_names("_[0-9]*$"),
                      ~ starts_with(.x),
                      ~ rowSums(.x)))

# A tibble: 10 x 2
     cat   dog
   <dbl> <dbl>
 1    11    11
 2    10    10
 3     6     6
 4     4     4
 5    10    10
 6     7     7
 7    12    12
 8    12    12
 9    17    17
10    13    13
TimTeaFan added the enhancement label May 11, 2021
TimTeaFan self-assigned this May 11, 2021
TimTeaFan added the next major release label May 22, 2021
@TimTeaFan
Owner Author

TimTeaFan commented May 22, 2021

I think fold would be a great extension of {dplyover}, but a better name should be found, given that {rsample} uses vfold and {furrr} also has a fold function.

Then again, fold does pretty much what it says: it folds several columns of a data.frame down into one column, for example by calculating the row means.
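To make the folding idea concrete, here is a minimal sketch of the behaviour using only dplyr. The helper fold_sketch and its prefix argument are hypothetical names chosen for illustration; this is not the proposed dplyover implementation, just one way the described result could be produced.

```r
library(dplyr)

# Hypothetical helper (not part of dplyover): applies each function in `fns`
# to the selected columns as a whole and returns one column per function.
fold_sketch <- function(data, cols, fns, prefix) {
  # resolve the tidy-select expression against the data
  sub_df <- select(data, {{ cols }})
  # turn each formula into a function and call it on the selected columns
  out <- lapply(fns, function(f) unname(rlang::as_function(f)(sub_df)))
  names(out) <- paste(prefix, names(fns), sep = "_")
  as_tibble(out)
}

set.seed(1)
dat <- tibble(
  cat_1 = sample(7, 10, replace = TRUE),
  cat_2 = sample(7, 10, replace = TRUE),
  cat_3 = sample(7, 10, replace = TRUE),
  dog_1 = sample(7, 10, replace = TRUE),
  dog_2 = sample(7, 10, replace = TRUE)
)

# three cat columns are folded down into cat_sum and cat_mean
res <- fold_sketch(dat, starts_with("cat"),
                   list(sum = ~ rowSums(.x), mean = ~ rowMeans(.x)),
                   prefix = "cat")
res
```

The functions receive the selected columns as one data.frame, which is what separates this from across, where each function is applied to one column at a time.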

TimTeaFan added the further discussion needed label May 22, 2021
TimTeaFan added this to the next major release milestone May 22, 2021
@vorpalvorpal

Firstly, thanks for the package. I think this has a far more common use case than Hadley suggested.

Secondly, maybe I'm misunderstanding the purpose of fold here, but wouldn't

summarise(over(starts_with("cat"),
               list(sum = ~ rowSums(.x),
                    mean = ~ rowMeans(.x))))

do the same thing? At least that way you avoid using the name "fold".

@TimTeaFan
Owner Author

TimTeaFan commented Aug 18, 2021

Thank you for your feedback! Unfortunately, over and the other functions in the over-across function family don't work like that. over loops over a vector and creates a new column for each element. Apart from that, over does not support tidy-select syntax in its .x argument.

However, we could create a named list of data.frames on the fly as input to over and then produce a similar outcome. Dedicated functions like fold and fold_over would still be helpful, I guess, since we wouldn't need one or several select calls as input to over.

# instead of fold_over we could do:
dat %>% 
  summarise(over(list(cat = select(., starts_with("cat")),
                      dog = select(., starts_with("dog"))),
                 list(sum  = rowSums,
                      mean = rowMeans)))

#> # A tibble: 10 x 4
#>    cat_sum cat_mean dog_sum dog_mean
#>      <dbl>    <dbl>   <dbl>    <dbl>
#>  1      12     4         12      6  
#>  2      11     3.67       3      1.5
#>  3      19     6.33       4      2  
#>  4       6     2          9      4.5
#>  5       9     3         14      7  
#>  6       4     1.33       7      3.5
#>  7       7     2.33      10      5  
#>  8       8     2.67       3      1.5
#>  9       9     3          9      4.5
#> 10      10     3.33       7      3.5

Created on 2021-08-19 by the reprex package (v0.3.0)
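The workaround above needs one select call per prefix. As a rough sketch of what fold_over could save, the prefixes can be derived from the column names instead. The helper fold_over_sketch is a hypothetical name for illustration, written with dplyr and base R only; it does not reflect how dplyover would implement the function.

```r
library(dplyr)

# Hypothetical helper (not part of dplyover): derives group prefixes by
# stripping `pattern` from the column names, then applies `fn` to the
# columns of each group as a whole.
fold_over_sketch <- function(data, pattern, fn) {
  prefixes <- unique(sub(pattern, "", names(data)))
  out <- lapply(prefixes, function(p) {
    unname(fn(select(data, starts_with(p))))
  })
  names(out) <- prefixes
  as_tibble(out)
}

set.seed(42)
dat <- tibble(
  cat_1 = sample(7, 10, replace = TRUE),
  cat_2 = sample(7, 10, replace = TRUE),
  cat_3 = sample(7, 10, replace = TRUE),
  dog_1 = sample(7, 10, replace = TRUE),
  dog_2 = sample(7, 10, replace = TRUE)
)

# one column per prefix: `cat` sums three columns, `dog` sums two
res <- fold_over_sketch(dat, "_[0-9]*$", rowSums)
res
```

Because the grouping comes from the column names, adding a third prefix to the data would add a third output column without changing the call.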
