fct_lump() shouldn't relabel if no lumping occurs #130

Closed
ahhaque opened this issue May 30, 2018 · 6 comments · Fixed by #168
Labels: feature (a feature request or enhancement)

Comments

@ahhaque

ahhaque commented May 30, 2018

If only one level in a vector has a share below `prop`, fct_lump() still relabels it as 'Other'. Ideally, it should keep the original level name, since only one level is affected.

Example:
library(forcats)

nRows <- 500
vec <- as.factor(c(rep("X", 0.32 * nRows), rep("Y", 0.08 * nRows), rep("Z", 0.4 * nRows), rep("W", 0.2 * nRows)))
rebinned_vec <- fct_lump(vec, prop = 0.1)

prop.table(table(rebinned_vec)) gives the following output:
W X Z Other
0.20 0.32 0.40 0.08

In the code above, only the level 'Y' is affected, because it is the only one with less than a 10% share. Since it is the only level affected, shouldn't fct_lump() leave 'Y' as it is rather than create the 'Other' level?
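The behaviour requested above can be sketched in base R. This is only an illustration of the idea, not the change that was eventually merged into forcats: `lump_if_multiple()` and its `other_level` argument are hypothetical names, and the sketch assumes that "rare" means a share strictly below `prop`.

```r
# Hypothetical helper, not part of forcats: combine levels whose share is
# below `prop` into `other_level`, but only when more than one level qualifies.
lump_if_multiple <- function(f, prop, other_level = "Other") {
  f <- as.factor(f)
  shares <- prop.table(table(f))      # share of each level
  rare <- names(shares)[shares < prop]

  # Zero or one rare level: nothing to combine, so keep the original labels.
  if (length(rare) <= 1) {
    return(f)
  }

  # Otherwise relabel the rare levels and put the combined level last.
  new_levels <- unique(c(setdiff(levels(f), rare), other_level))
  x <- as.character(f)
  x[x %in% rare] <- other_level
  factor(x, levels = new_levels)
}

vec <- factor(c(rep("X", 160), rep("Y", 40), rep("Z", 200), rep("W", 100)))

levels(lump_if_multiple(vec, prop = 0.1))   # only 'Y' is rare: W, X, Y, Z
levels(lump_if_multiple(vec, prop = 0.25))  # 'W' and 'Y' are rare: X, Z, Other
```

With `prop = 0.1` only 'Y' falls below the threshold, so the factor comes back unchanged. With `prop = 0.25` both 'W' and 'Y' qualify and are combined.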

hadley added the "reprex" label (needs a minimal reproducible example) on Jan 4, 2019

@davidbody

library(forcats)

x <- as_factor(c("apple", "apple", "apple", "banana", "banana", "orange"))
fct_lump(x, 2)
#> [1] apple  apple  apple  banana banana Other 
#> Levels: apple banana Other

Created on 2019-01-19 by the reprex package (v0.2.1)

@zhiiiyang
Contributor

Here is the reprex!

nRows <- 500
vec <- as.factor(c(rep("X",0.32*nRows),
                   rep("Y",0.08*nRows), 
                   rep("Z",0.4*nRows), 
                   rep('W', 0.2*nRows)))
rebinned_vec <- forcats::fct_lump(vec, prop = 0.1)

prop.table(table(rebinned_vec)) 
#> rebinned_vec
#>     W     X     Z Other 
#>  0.20  0.32  0.40  0.08

Created on 2019-01-19 by the reprex package (v0.2.1)

@hadley
Member

hadley commented Jan 19, 2019

Thanks for the reprexes! Do either of you want to take a stab at fixing this issue?

hadley added the "feature" label (a feature request or enhancement) and removed the "reprex" label (needs a minimal reproducible example) on Jan 19, 2019
hadley changed the title from "Issue with fct_lump rebinning" to "fct_lump() shouldn't relabel if no lumping occurs" on Jan 19, 2019
@zhiiiyang
Contributor

> Thanks for the reprexes! Do either of you want to take a stab at fixing this issue?

Working on it now.


4 participants