fct_lump() shouldn't relabel if no lumping occurs #130
library(forcats)
x <- as_factor(c("apple", "apple", "apple", "banana", "banana", "orange"))
fct_lump(x, 2)
#> [1] apple apple apple banana banana Other
#> Levels: apple banana Other
Created on 2019-01-19 by the reprex package (v0.2.1)
Here is the reprex!
nRows <- 500
vec <- as.factor(c(rep("X",0.32*nRows),
rep("Y",0.08*nRows),
rep("Z",0.4*nRows),
rep('W', 0.2*nRows)))
rebinned_vec <- forcats::fct_lump(vec, prop = 0.1)
prop.table(table(rebinned_vec))
#> rebinned_vec
#> W X Z Other
#> 0.20 0.32 0.40 0.08
Created on 2019-01-19 by the reprex package (v0.2.1)
Thanks for the reprexes! Do either of you want to take a stab at fixing this issue?
hadley added the feature (a feature request or enhancement) label and removed the reprex (needs a minimal reproducible example) label on Jan 19, 2019
hadley changed the title from "Issue with fct_lump rebinning" to "fct_lump() shouldn't relabel if no lumping occurs" on Jan 19, 2019
Working on it now.
If only one level in a vector has a share below 'prop', fct_lump still moves it into a new level called 'Other'. Ideally, it should keep the original level name, since only one level is affected.
Example:
nRows <- 500
vec <- as.factor(c(rep("X", 0.32*nRows), rep("Y", 0.08*nRows), rep("Z", 0.4*nRows), rep('W', 0.2*nRows)))
rebinned_vec <- fct_lump(vec, prop = 0.1)
prop.table(table(rebinned_vec)) gives the following output:
W X Z Other
0.20 0.32 0.40 0.08
In the above code, only the level 'Y' is affected, as it has less than a 10% share. Since it is the only level affected, shouldn't fct_lump leave 'Y' as it is rather than renaming it to 'Other'?
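The requested behaviour could be sketched in base R roughly as follows. This is only an illustration of the idea, not the forcats implementation or the eventual fix: the helper name lump_if_multiple and its arguments are hypothetical, and it assumes a level counts as rare when its share is at or below 'prop'.

```r
# Hypothetical helper, not part of forcats: collapse rare levels into
# `other_level` only when more than one level falls at or below `prop`.
lump_if_multiple <- function(f, prop, other_level = "Other") {
  f <- as.factor(f)
  shares <- prop.table(table(f))          # share of each level
  rare <- names(shares)[shares <= prop]   # levels that would be lumped

  # With zero or one rare level, lumping would only rename it, so keep f.
  if (length(rare) <= 1) {
    return(f)
  }

  lv <- levels(f)
  lv[lv %in% rare] <- other_level
  levels(f) <- lv                         # duplicated labels merge the levels

  # Put the lumped level last, mirroring the output shown in the issue.
  factor(f, levels = c(setdiff(levels(f), other_level), other_level))
}

# Same shares as the example above, written with explicit counts.
vec <- factor(c(rep("X", 160), rep("Y", 40), rep("Z", 200), rep("W", 100)))

levels(lump_if_multiple(vec, prop = 0.1))   # only 'Y' is rare: W, X, Y, Z kept
levels(lump_if_multiple(vec, prop = 0.25))  # 'W' and 'Y' are rare: X, Z, Other
```

With prop = 0.1 the factor comes back unchanged, which is the behaviour asked for here; lumping only kicks in once at least two levels would be merged.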