Sort WORDLIST in a locale-independent way #48
Comments
Hi, thanks for the suggestion. I'm not super familiar with locales. Can you give an example of how the sorting is different? I'd rather not change locales because that has funky side effects on some systems (though |
Here is a reprex:
test <- c(
letters,
LETTERS,
"Hugo's words",
"change"
)
library(withr)
with_locale(c("LC_COLLATE" = "fr_FR.UTF-8"), sort(test))
#> [1] "a" "A" "b" "B" "c"
#> [6] "C" "change" "d" "D" "e"
#> [11] "E" "f" "F" "g" "G"
#> [16] "h" "H" "Hugo's words" "i" "I"
#> [21] "j" "J" "k" "K" "l"
#> [26] "L" "m" "M" "n" "N"
#> [31] "o" "O" "p" "P" "q"
#> [36] "Q" "r" "R" "s" "S"
#> [41] "t" "T" "u" "U" "v"
#> [46] "V" "w" "W" "x" "X"
#> [51] "y" "Y" "z" "Z"
with_locale(c("LC_COLLATE" = "C"), sort(test))
#> [1] "A" "B" "C" "D" "E"
#> [6] "F" "G" "H" "Hugo's words" "I"
#> [11] "J" "K" "L" "M" "N"
#> [16] "O" "P" "Q" "R" "S"
#> [21] "T" "U" "V" "W" "X"
#> [26] "Y" "Z" "a" "b" "c"
#> [31] "change" "d" "e" "f" "g"
#> [36] "h" "i" "j" "k" "l"
#> [41] "m" "n" "o" "p" "q"
#> [46] "r" "s" "t" "u" "v"
#> [51] "w" "x" "y" "z"
with_locale(c("LC_COLLATE" = "sk_SK.UTF-8"), sort(test))
#> [1] "a" "A" "b" "B" "c"
#> [6] "C" "d" "D" "e" "E"
#> [11] "f" "F" "g" "G" "h"
#> [16] "H" "Hugo's words" "change" "i" "I"
#> [21] "j" "J" "k" "K" "l"
#> [26] "L" "m" "M" "n" "N"
#> [31] "o" "O" "p" "P" "q"
#> [36] "Q" "r" "R" "s" "S"
#> [41] "t" "T" "u" "U" "v"
#> [46] "V" "w" "W" "x" "X"
#> [51] "y" "Y" "z" "Z"
Created on 2020-04-03 by the reprex package (v0.3.0)
There are also plenty of other examples where it can go wrong because of diacritics in names, but I can't find a good reprex for this right now. For a simple real-life example, see ropensci/lightr@515d193#diff-89da0e7dae7c72fd9541f184b5112343L13-L15 where EDIT: from
|
And if you think changing locales is a bad idea,
test <- c(
letters,
LETTERS,
"Hugo's words",
"change"
)
library(withr)
with_locale(c("LC_COLLATE" = "fr_FR.UTF-8"), sort(test, method = "radix"))
#> [1] "A" "B" "C" "D" "E"
#> [6] "F" "G" "H" "Hugo's words" "I"
#> [11] "J" "K" "L" "M" "N"
#> [16] "O" "P" "Q" "R" "S"
#> [21] "T" "U" "V" "W" "X"
#> [26] "Y" "Z" "a" "b" "c"
#> [31] "change" "d" "e" "f" "g"
#> [36] "h" "i" "j" "k" "l"
#> [41] "m" "n" "o" "p" "q"
#> [46] "r" "s" "t" "u" "v"
#> [51] "w" "x" "y" "z"
with_locale(c("LC_COLLATE" = "C"), sort(test, method = "radix"))
#> [1] "A" "B" "C" "D" "E"
#> [6] "F" "G" "H" "Hugo's words" "I"
#> [11] "J" "K" "L" "M" "N"
#> [16] "O" "P" "Q" "R" "S"
#> [21] "T" "U" "V" "W" "X"
#> [26] "Y" "Z" "a" "b" "c"
#> [31] "change" "d" "e" "f" "g"
#> [36] "h" "i" "j" "k" "l"
#> [41] "m" "n" "o" "p" "q"
#> [46] "r" "s" "t" "u" "v"
#> [51] "w" "x" "y" "z"
with_locale(c("LC_COLLATE" = "sk_SK.UTF-8"), sort(test, method = "radix"))
#> [1] "A" "B" "C" "D" "E"
#> [6] "F" "G" "H" "Hugo's words" "I"
#> [11] "J" "K" "L" "M" "N"
#> [16] "O" "P" "Q" "R" "S"
#> [21] "T" "U" "V" "W" "X"
#> [26] "Y" "Z" "a" "b" "c"
#> [31] "change" "d" "e" "f" "g"
#> [36] "h" "i" "j" "k" "l"
#> [41] "m" "n" "o" "p" "q"
#> [46] "r" "s" "t" "u" "v"
#> [51] "w" "x" "y" "z"
Created on 2020-04-03 by the reprex package (v0.3.0) |
Ok, sounds good. Can you send a PR? |
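A quick check of why the three radix results above match the "C" run from the first reprex: as far as I know, method = "radix" always collates as in the C locale, whatever LC_COLLATE is set to. The sketch below uses only base R (no withr) and temporarily switches the collation locale by hand.

```r
test <- c(letters, LETTERS, "Hugo's words", "change")

# Sort under an explicit C collation, then restore the previous setting.
old <- Sys.getlocale("LC_COLLATE")
Sys.setlocale("LC_COLLATE", "C")
c_sorted <- sort(test)
Sys.setlocale("LC_COLLATE", old)

# Sort with radix under whatever locale the session is using.
radix_sorted <- sort(test, method = "radix")

# Both orderings should be the same.
identical(c_sorted, radix_sorted)
```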
Currently, the word order in WORDLIST is locale-dependent, which can create large spurious diffs when multiple people contribute to the package but use different locales.
I see two solutions:
method = "radix" in sort(). It is, to my knowledge, the only locale-independent sorting method.
The nice thing about the second option is that you can set the locale to the one specified in DESCRIPTION.
Please let me know if you'd like me to submit a PR for this.
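To make the radix option concrete, here is a minimal sketch of what it could look like. The helper name sort_wordlist() and the default path inst/WORDLIST are assumptions for illustration, not the package's actual code.

```r
# Hypothetical helper (not part of the package): rewrite a WORDLIST file
# in an order that does not depend on the contributor's locale.
sort_wordlist <- function(path = file.path("inst", "WORDLIST")) {
  words <- readLines(path, warn = FALSE, encoding = "UTF-8")
  # Drop empty lines and duplicate entries before sorting.
  words <- unique(words[nzchar(words)])
  # method = "radix" ignores LC_COLLATE and sorts as in the C locale.
  sorted <- sort(words, method = "radix")
  writeLines(enc2utf8(sorted), path, useBytes = TRUE)
  invisible(sorted)
}
```

With this, two contributors running the helper under fr_FR.UTF-8 and sk_SK.UTF-8 should write the same file, at the cost of uppercase entries sorting before lowercase ones.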