Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] forder na.last=NA on characters #1580

Closed
arunsrinivasan opened this issue Mar 7, 2016 · 1 comment
Closed

[Enhancement] forder na.last=NA on characters #1580

arunsrinivasan opened this issue Mar 7, 2016 · 1 comment

Comments

@arunsrinivasan
Copy link
Member

forder on char with sort=FALSE is great (for speedup). And na.last=FALSE and na.last=TRUE need not hold much weight (since sort=FALSE).

However, na.last=NA implies omitting NA rows, sorting has no say here, I think. i.e., irrespective of whether sort=FALSE or TRUE, na.last=NA should not return indices of NAs in order indices.

Right now:

x = c("b", "b",  NA, "a", "b", "a")
data.table:::forderv(x, sort=FALSE, na.last=NA, retGrp=TRUE) 
[1] 1 2 5 3 4 6
attr(,"starts")
[1] 1 4 5
attr(,"maxgrpn")
[1] 3

but it should be:

[1] 1 2 5 0 4 6
attr(,"starts")
[1] 1 4 5
attr(,"maxgrpn")
[1] 3

This is useful in at least one scenario. Speeding up uniqueN when na.rm=TRUE. See #1455.

@arunsrinivasan
Copy link
Member Author

Hm, doesn't seem to speedup any better than sort=TRUE. In fact, sort=FALSE is slower (on 100,000 unique values). Sticking to current approach.

@arunsrinivasan arunsrinivasan removed this from the v1.9.8 milestone Mar 7, 2016
@arunsrinivasan arunsrinivasan reopened this Mar 8, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant