Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fct_sort function #156

Closed
wants to merge 5 commits into from
Closed

Add fct_sort function #156

wants to merge 5 commits into from

Conversation

russHyde
Copy link

@russHyde russHyde commented Jan 8, 2019

fct_sort takes a factor or character-vector (implictly converted to factor) and reorders the levels of that factor using a user-specified function .fun.

Code added to R/sort.R, unit tests into tests/testthat/test-fct_sort.R.

An example showing how to use fct_sort to sort number-containing character levels by the contained number.

Ensures that the values in the factor levels are unchanged by the sorting function, and uses ... to pass further arguments for the sorting function.

`fct_sort` takes a factor or character-vector (implictly converted to factor) and reorders the `levels` of that factor using a user-specified function `.fun`.

Code added to `R/sort.R`, unit tests into `tests/testthat/test-fct_sort.R`.

An example showing how to use `fct_sort` to sort number-containing character `levels` by the contained number.
@hadley hadley changed the title fix #117: add fct_sort function Add fct_sort function Jan 19, 2019
@hadley
Copy link
Member

hadley commented Jan 19, 2019

Fixes #117

R/sort.R Outdated

old_levels <- levels(f)
new_levels <- .fun(old_levels, ...)
stopifnot(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind making this a slightly friendlier function using the style guide at http://style.tidyverse.org/error-messages.html ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more informative messages to indicate:

  • when .fun returns with something other than a vector

  • when the sorted-levels are of different length from the input-levels

  • when the sorted-levels contains at least one level that is absent from the input-levels

R/sort.R Outdated
@@ -0,0 +1,38 @@
#' Automatically sort factor levels according to a user-defined function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe remove "Automatically"?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description now starts #' Sort factor levels ...

R/sort.R Outdated
#' # naive alphanumeric sorting "1" < "10" < "2"
#' fct_sort(chr_fac, sort)
#'
#' # number-based alphanumeric sorting "1" < "2" < "10"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is maybe a bit complicated for an example? Maybe just do something with alphabetical sorting? And maybe sample() to show random reordering?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced the example. The new examples includes

  • alphabetical-sort,
  • alphabetical-decreasing-sort (equiv to fct_rev)
  • alphabetical sort with an out-of-order baseline level (equiv to fct_relevel)
  • sampling from the levels

@jennybc
Copy link
Member

jennybc commented Jan 20, 2019

The name fct_sort() feels potentially misleading to me, as it's not clear if it acts on the factor (what I would assume) or its levels (what actually happens).

Worth contemplating: could this be achieved this by making fct_relevel() more capable? I.e. make it work more like purrr::set_names(), which can accept a vector of names, individual names, or a function or formula.

If that's a no-go, I still think it's worth trying to think of a better name here.

@huftis
Copy link

huftis commented Jan 20, 2019

The name fct_sort() feels potentially misleading to me, as it's not clear if it acts on the factor (what I would assume) or its levels (what actually happens).

The same is true for other forcats functions, like fct_rev() and fct_shuffle().

But perhaps fct_sort_levels() would be work? It doesn’t really take any longer time to write if you have auto-completion enabled.

@russHyde
Copy link
Author

Totally agree with @jennybc and @huftis that a better name should be applied. Happy to rewrite as fct_sort_levels.

I didn't consider modifying the fct_relevel formals, but that is a good proposal and would be possible, eg, as fct_relevel(.f, ..., after = 0L, .fun)

@huftis
Copy link

huftis commented Jan 20, 2019

I didn't consider modifying the fct_relevel formals, but that is a good proposal and would be possible, eg, as fct_relevel(.f, ..., after = 0L, .fun)

I would prefer having a separate fct_sort_levels() function. IMHO, having the sorting function as the second argument is much nicer. It’s faster and feels cleaner to write fct_sort_levels(x, mysort) than fct_relevel(x, .fun = mysort).

@hadley
Copy link
Member

hadley commented Jan 21, 2019

I think @jennybc was suggesting that fct_relevel(x, mysort) could be made to work.

Renamed man/R/tests files to correspond to the renamed function.

Changed the description of `fct_sort_levels` to "Sort factor levels ..."
following code review.
@russHyde
Copy link
Author

I believe I have made all changes suggested by @hadley and @huftis .
The function has been renamed fct_sort_levels.

#' Sort factor levels according to a user-defined function
#'
#' @param .f A factor (or character vector).
#' @param .fun A function that will sort or permute the existing factor levels.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#' @param .fun A function that will sort or permute the existing factor levels.
#' @param .fun A function that will permute the existing factor levels.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hadley : Given the emphasis on permutation in your edited version of the docs, would fct_permute_levels be a more appropriate name for the function?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, indeed, lvls_permute?

#' @param .f A factor (or character vector).
#' @param .fun A function that will sort or permute the existing factor levels.
#' It must accept one character argument and return a character argument of
#' the same length as it's input.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#' the same length as it's input.
#' the same length as its input.

#' # Default (alphabetical) level-sorting:
#' fct_sort_levels(medieval_experiment, sort)
#'
#' # Reversed ordering (equivalent to `fct_rev`):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not in general equivalent to fct_rev(), so I think it'd be better to just remove that comparison

#' # Reversed ordering (equivalent to `fct_rev`):
#' fct_sort_levels(medieval_experiment, sort, decreasing = TRUE)
#'
#' # Level-sorting with "Control" as the first level
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this seems inferior to fct_relevel() so I'd also drop it.

#' medieval_experiment,
#' function(x) c("Control", sort(setdiff(x, "Control"))))
#'
#' # Randomised sorting of the levels:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#' # Randomised sorting of the levels:
#' # Randomly permute the levels:

call. = FALSE
)
}
if (length(old_levels) != length(new_levels)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a little overwrought to me. Doesn't fct_relevel() already give error messages?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it provdes an error in this setting - if passed a (vector) proper subset of a factor's levels, fct_relevel just pulls the subset to the start of the factor's levels:

fct_relevel(factor(letters[1:3]), c("b", "a"))
[1] a b c
Levels: b a c

(I thought the error messages might be overkill, but was trying to follow the examples in your style guide)

@hadley
Copy link
Member

hadley commented Feb 16, 2019

After thinking about @jennybc's comment more, I think this is best implemented as a special case of fct_relevel(). Your PR was very useful to help us think through the API and implementation — thanks!

@hadley hadley closed this Feb 16, 2019
@russHyde
Copy link
Author

thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants