data.table::fcase with vectorized default #110

Closed
r2evans opened this issue Aug 12, 2020 · 2 comments
Labels
documentation Improvements or additions to documentation

Comments

@r2evans

r2evans commented Aug 12, 2020

https://markfairbanks.github.io/tidytable/articles/speed_comparisons.html correctly states that data.table::fcase(..., default) must be length 1, but you can work around this using a catch-all rep(TRUE, .N):

library(data.table)

as.data.table(mtcars)[, a := fcase(
  cyl == 4     , "_4",
  rep(TRUE, .N), as.character(cyl)
)][1:3,]
#     mpg cyl disp  hp drat    wt  qsec vs am gear carb  a
# 1: 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4  6
# 2: 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4  6
# 3: 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 _4

(A similar approach was mentioned several months ago in Rdatatable/data.table#4258 (comment).)

It's a little hack-y, but if you feel your speed comparison would be a better representation with vectorized defaults, then this is an option. Alternatively, it could be an additional comparison, demonstrating why vectorized defaults (or non-vectorized "when" values) could be a good thing for data.table.

(More of a suggestion than an issue, feel free to close.)

@markfairbanks
Owner

you can work around this using a catch-all

Hmm, it might be a good idea to change the speed tests to use this. I'll think about it a bit.

As for vectorized defaults, I was a little surprised something like this wasn't possible using fcase():

pacman::p_load(tidytable)

test_df <- data.table(x = 1:5)

test_df[, new_x := case.(x <= 3, 1, default = x)][]
#>    x new_x
#> 1: 1     1
#> 2: 2     1
#> 3: 3     1
#> 4: 4     4
#> 5: 5     5

This one could be covered by fifelse(), but it seems like you should be able to do a default of "leave the value alone". Oh well.

Either way, thanks for pointing out this workaround 👍
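For reference, here is a minimal sketch of how the same "leave the value alone" result could be reached with data.table alone, once with fifelse() and once with the catch-all fcase() pattern from the first comment. The column names new_x_fifelse and new_x_fcase are made up for illustration, and the integer literal 1L is used on the assumption that both functions require all result values to share a type with the integer column x.

```r
library(data.table)

test_df <- data.table(x = 1:5)

# fifelse(): the "yes" and "no" values must share a type,
# so 1L is used to match the integer column x.
test_df[, new_x_fifelse := fifelse(x <= 3L, 1L, x)]

# fcase(): a final always-TRUE condition stands in for a vectorized default.
test_df[, new_x_fcase := fcase(
  x <= 3L      , 1L,
  rep(TRUE, .N), x
)]

test_df[]
```

Both new columns should match the new_x column from the case.() example above; the fcase() version generalizes to more than one condition, which fifelse() does not without nesting.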

@r2evans
Author

r2evans commented Aug 12, 2020

The data.table issue I linked to (still open) discusses that at length, and it is on the "Master list of most-requested issues" (Rdatatable/data.table#3189). I agree, though my expectation of safe recycling differs from R's default behavior: I'd expect length .N or 1, nothing else (not .N/2, for instance).
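As a rough illustration of that "length .N or 1, nothing else" expectation, the check can be done by hand before applying the catch-all workaround. This is only a sketch: dt, default_vals, and y are hypothetical names, and the explicit length check is my own guard, not something fcase() does.

```r
library(data.table)

dt <- data.table(x = 1:6)

# Hypothetical default: either a single value or one value per row.
default_vals <- dt$x * 10L

# Reject any length other than 1 or nrow(dt) instead of silently recycling
# (a length-3 default would fail here rather than being repeated twice).
stopifnot(length(default_vals) %in% c(1L, nrow(dt)))

dt[, y := fcase(
  x <= 3L      , 0L,
  rep(TRUE, .N), rep_len(default_vals, .N)
)]
```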

@markfairbanks markfairbanks added the documentation Improvements or additions to documentation label Aug 18, 2020