(textproc/R-stringi) Updated 1.4.6 to 1.7.4, make test passes
## 1.7.4 (2021-08-12)

* [BUGFIX] #449: Fixed segfaults generated by `stri_sprintf`.
* [BUILD TIME] No longer defining `USE_RINTERNALS` and `R_NO_REMAP`.

## 1.7.3 (2021-07-15)

* [BUGFIX] Fixed the previous patch of ICU55 causing a build failure on, amongst others, CRAN's Solaris-based target.

## 1.7.2 (2021-07-14)

* [BUGFIX] Workaround for a bug in `tools::checkFF` failing when `NA_character_` is passed to `.Call`.

## 1.7.1 (2021-07-14)

* [BACKWARD INCOMPATIBILITY] `%s$%` and `%stri$%` now use the new `stri_sprintf` (see below) function instead of `base::sprintf`.
* [BACKWARD INCOMPATIBILITY, NEW FEATURE] In `stri_sub<-` and `stri_sub_all<-`, providing a negative `length` from now on does not result in the corresponding input string being altered.
* [BACKWARD INCOMPATIBILITY, NEW FEATURE] In `stri_sub` and `stri_sub_all`, negative `length` results in the corresponding output being `NA` or not extracted at all, depending on the setting of the new argument `ignore_negative_length`.
* [BACKWARD INCOMPATIBILITY, BUGFIX, NEW FEATURE] In `stri_subset*` and their replacement versions, `pattern` and `value` cannot be longer than `str` (but now they are recycled if necessary).
* [BACKWARD INCOMPATIBILITY, NEW FEATURE] `stri_sub*` now accept the `from` argument being a matrix like `cbind(from, length=length)`. Unnamed columns or any other names are still interpreted as `cbind(from, to)`. Also, the new argument `use_matrix` can be used to disable the special treatment of such matrices.
* [DOCUMENTATION] It has been clarified that the syntax of `*_charclass` (e.g., used in `stri_trim*`) differs slightly from regex character classes.
* [NEW FEATURE] #420: `stri_sprintf` (alias: `stri_string_format`) is a Unicode-aware replacement for and enhancement of the base `sprintf`: it adds a customised handling of `NA`s (on demand), computing field size based on code point width, outputting substrings of at most given width, variable width and precision (both at the same time), etc. Moreover, `stri_printf` can be used to display formatted strings conveniently.
* [NEW FEATURE] #153: `stri_match_*_regex` now extract capture group names.
* [NEW FEATURE] #25: `stri_locate_*_regex` now have a new argument, `capture_groups`, which allows for extracting positions of matches to parenthesised subexpressions.
* [NEW FEATURE] `stri_locate_*` now have a new argument, `get_length`, whose setting may result in generating *from-length* matrices (instead of *from-to* ones).
* [NEW FEATURE] #438: `stri_trans_general` now supports rule-based as well as reverse-direction transliteration.
* [NEW FEATURE] #434: `stri_datetime_format` and `stri_datetime_parse` are now vectorised also with respect to the `format` argument.
* [NEW FEATURE] `stri_datetime_fstr` has a new argument, `ignore_special`, which defaults to `TRUE` for backward compatibility.
* [NEW FEATURE] `stri_datetime_format`, `stri_datetime_add`, and `stri_datetime_fields` now call `as.POSIXct` more eagerly.
* [NEW FEATURE] `stri_trim*` now have a new argument, `negate`.
* [NEW FEATURE] `stri_replace_rstr` converts `gsub`-style replacement strings to `stri_replace`-style.
* [INTERNAL] `stri_prepare_arg*` have been refactored, buffer overruns in the exception handling subsystem are now avoided.
* [BUGFIX] Few functions (`stri_length`, `stri_enc_toutf32`, etc.) did not throw an exception on an invalid UTF-8 byte sequence (and merely issued a warning instead).
* [BUGFIX] `stri_datetime_fstr` did not honour `NA_character_` and did not parse format strings such as `"%Y%m%d"` correctly. It has now been completely rewritten (in C).
* [BUGFIX] `stri_wrap` did not recognise the width of certain Unicode sequences correctly.

## 1.6.2 (2021-05-14)

* [BACKWARD INCOMPATIBILITY] In `stri_enc_list()`, `simplify` now defaults to `TRUE`.
* [NEW FEATURE] #425: The outputs of `stri_enc_list()`, `stri_locale_list()`, `stri_timezone_list()`, and `stri_trans_list()` are now sorted.
* [NEW FEATURE] #428: In `stri_flatten`, `na_empty=NA` now omits missing values.
* [BUILD TIME] #431: Pre-4.9.0 GCC has `::max_align_t`, but not `std::max_align_t`, added a (possible) workaround, see the `INSTALL` file.
* [BUGFIX] #429: `stri_width()` misclassified the width of certain code points (including grave accent, Eszett, etc.); General category *Sk* (Symbol, modifier) is no longer of width 0, `UCHAR_EAST_ASIAN_WIDTH` of `U_EA_AMBIGUOUS` is no longer of width 2.
* [BUGFIX] #354: `ALTREP` `CHARSXP`s were not copied, and thus could have been garbage collected in the so-called meanwhile (with thanks to @jimhester).

## 1.6.1 (2021-05-05)

* [GENERAL] #401: stringi is now bundled with ICU4C 69.1 (upgraded from 61.1), which is used on most Windows and OS X builds as well as on *nix systems not equipped with system ICU. However, if the C++11 support is disabled, stringi will be built against the battle-tested ICU4C 55.1. The update to ICU brings Unicode 13.0 and CLDR 39 support.
* [DOCUMENTATION] A draft version of a paper on `stringi` is now available at https://stringi.gagolewski.com/_static/vignette/stringi.pdf
* [GENERAL] stringi now requires R >= 3.1 (`CXX_STD` of `CXX11` or `CXX1X`).
* [NEW FEATURE] #408: `stri_trans_casefold()` performs case folding; this is different from case mapping, which is locale-dependent. Folding makes two pieces of text that differ only in case identical. This can come in handy when comparing strings.
* [NEW FEATURE] #421: `stri_rank()` ranks strings in a character vector (e.g., for ordering data frames with regards to multiple criteria, the ranks can be passed to `order()`, see #219).
* [NEW FEATURE] #266: `stri_width()` now supports emojis.
* [NEW FEATURE] `%s$%` and `%stri$%` are now vectorised with respect to both arguments.
* [BUGFIX] `stri_sort_key()` now outputs `bytes`-encoded strings.
* [BUGFIX] #415: `locale=''` was not equivalent to `locale=NULL` in `stri_opts_collator()`.
* [INTERNAL] #414: Use `LEVELS(x)` macro instead of accessing `(x)->sxpinfo.gp` directly (@lukaszdaniel).

## 1.5.3 (2020-09-04)

* [DOCUMENTATION] stringi home page has moved to https://stringi.gagolewski.com and now includes a comprehensive reference manual.
* [NEW FEATURE] #400: `%s$%` and `%stri$%` are now binary operators that call base R's `sprintf()`.
* [NEW FEATURE] #399: The `%s*%` and `%stri*%` operators can be used in addition to `stri_dup()`, for the very same purpose.
* [NEW FEATURE] #355: `stri_opts_regex()` now accepts the `time_limit` and `stack_limit` options so as to prevent malformed or malicious regexes from running for too long.
* [NEW FEATURE] #345: `stri_startswith()` and `stri_endswith()` are now equipped with the `negate` parameter.
* [NEW FEATURE] #382: Incorrect regexes are now reported to ease debugging.
* [DEPRECATION WARNING] #347: Any unknown option passed to `stri_opts_fixed()`, `stri_opts_regex()`, `stri_opts_coll()`, and `stri_opts_brkiter()` now generates a warning. In the future, the `...` parameter will be removed, so that will be an error.
* [DEPRECATION WARNING] `stri_duplicated()`'s `fromLast` argument has been renamed `from_last`. `fromLast` is now its alias scheduled for removal in a future version of the package.
* [DEPRECATION WARNING] `stri_enc_detect2()` is scheduled for removal in a future version of the package. Use `stri_enc_detect()` or the more targeted `stri_enc_isutf8()`, `stri_enc_isascii()`, etc., instead.
* [DEPRECATION WARNING] `stri_read_lines()`, `stri_write_lines()`, `stri_read_raw()`: use `con` argument instead of `fname` now. The argument `fallback_encoding` is scheduled for removal and is no longer used. `stri_read_lines()` does not support `encoding="auto"` anymore.
* [DEPRECATION WARNING] `nparagraphs` in `stri_rand_lipsum()` has been renamed `n_paragraphs`.
* [NEW FEATURE] #398: Alternative, British spelling of function parameters has been introduced, e.g., `stri_opts_coll()` now supports both `normalization` and `normalisation`.
* [NEW FEATURE] #393: `stri_read_bin()`, `stri_read_lines()`, and `stri_write_lines()` are no longer marked as draft API.
* [NEW FEATURE] #187: `stri_read_bin()`, `stri_read_lines()`, and `stri_write_lines()` now support connection objects as well.
* [NEW FEATURE] #386: New function `stri_sort_key()` for generating locale-dependent sort keys which can be ordered at the byte level and return an equivalent ordering to the original string (@DavisVaughan).
* [BUGFIX] #138: `stri_encode()` and `stri_rand_strings()` now can generate strings of much larger lengths.
* [BUGFIX] `stri_wrap()` did not honour `indent` correctly when `use_width` was `TRUE`.
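
The 1.6.1 entry for `stri_trans_casefold()` says that case folding makes strings that differ only in case identical. As a rough illustration of that idea only, here is a minimal Python sketch built on the standard library's `str.casefold()`. It is an analogy, not stringi's API or implementation, and the helper name `same_ignoring_case` is hypothetical.

```python
# Python stand-in for the idea behind case folding; this is NOT stringi code.
# same_ignoring_case is a hypothetical helper introduced for illustration.

def same_ignoring_case(a: str, b: str) -> bool:
    """Compare two strings after Unicode case folding."""
    return a.casefold() == b.casefold()

# Plain lowercasing keeps the Eszett, so the two spellings stay different.
lower_equal = "Straße".lower() == "STRASSE".lower()

# Case folding expands the Eszett to "ss", so the two spellings compare equal.
folded_equal = same_ignoring_case("Straße", "STRASSE")

print(lower_equal, folded_equal)
```

In this sketch, lowercasing alone leaves the two spellings unequal, while folding both sides first makes them compare equal, which is the comparison use case the changelog entry mentions.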