#401 ongoing: add icudt 69.1
gagolews committed May 1, 2021
1 parent 39fb530 commit a55f42d
Showing 13 changed files with 916 additions and 119 deletions.
4 changes: 2 additions & 2 deletions .Rbuildignore
@@ -1,15 +1,15 @@
^NEWS_TODO$
^src/icu[0-9]+/data
^.*\.Rproj$
^\.Rproj\.user$
^.*\.kdev4$
^\.kdev4
^devel
kate-swp$
^src/boost
^src-i386
^src-x64
^src/.*\.o$
^src/icu55/data
^src/icu61/data
^src/.*\.a$
^src/.*\.so$
^src/.*\.dll$
1 change: 1 addition & 0 deletions .github/workflows/r-icu-bundle.yml
@@ -26,3 +26,4 @@ jobs:
sudo make r-icu-bundle
sudo make tinytest
sudo make check
sudo make check-cran
8 changes: 1 addition & 7 deletions .gitignore
@@ -17,6 +17,7 @@ config.status
.Rhistory
.RData
stringi.Rcheck
*.kate-swp

# compiled/precompiled files
*.o
@@ -44,11 +45,4 @@ src-x64/
cache-knitr/
cache/

# benchmark results and data
devel/benchmarks/results*
devel/benchmarks/figure
devel/benchmarks/report-*.md
devel/benchmarks/report-*.html
devel/benchmarks/test1.csv.gz

.vscode
11 changes: 7 additions & 4 deletions Makefile
@@ -6,9 +6,6 @@

all: r

#CPPFLAGS="-fopenmp -march=native -mtune=native"
#LDFLAGS="-fopenmp"

autoconf:
autoconf
Rscript -e 'roxygen2::roxygenise(roclets=c("rd", "collate", "namespace", "vignette"), load_code=roxygen2::load_installed)'
@@ -35,6 +32,10 @@ build:
check: build
cd .. && R CMD check `ls -t stringi*.tar.gz | head -1` --no-manual #--as-cran

check-cran: build
cd .. && STRINGI_DISABLE_PKG_CONFIG=1 R CMD check `ls -t stringi*.tar.gz | head -1` --as-cran


weave:
cd devel/sphinx/weave && make && cd ../../../

@@ -56,7 +57,9 @@ sphinx: r weave rd2rst news
touch docs/.nojekyll

clean:
rm -f src/*.o src/*.so src/Makevars src/uconfig_local.h \
find src -name '*.o' -exec rm {} \;
find src -name '*.so' -exec rm {} \;
rm -f src/Makevars src/uconfig_local.h \
src/install.libs.R config.log config.status src/symbols.rds

purge: clean
27 changes: 15 additions & 12 deletions NEWS
@@ -8,32 +8,35 @@
#405 (stri_sub match.length attrib)



## 1.6.1 (2021-XX-YY) **devel**

* [GENERAL] #401: stringi is now bundled with ICU4C 69.1 (upgraded from 61.1),
which is used on most Windows and OS X builds as well as on *nix systems
not equipped with system ICU. However, if the C++11 support is disabled,
stringi will be built against the battle-tested ICU4C 55.1.
The update to ICU brings Unicode 13.0 and CLDR 39 support.

* [GENERAL] stringi now requires R >= 3.1 (`CXX_STD=CXX11`).

* [DOCUMENTATION] A draft version of a paper on `stringi` is now available at
https://stringi.gagolewski.com/_static/vignette/stringi.pdf

* ...todo... #401 (update ICU4C to 69.1),
The ICU4C bundle has been updated from version 61.1 to 69.1
which features Unicode 13.0 and CLDR 39.

* [NEW FEATURE] #408: `stri_trans_casefold()` performs case folding;
this is different from case mapping, which is locale-dependent.
Folding makes two pieces of text that differ only in case identical.
This can come in handy when comparing strings.
this is different from case mapping, which is locale-dependent.
Folding makes two pieces of text that differ only in case identical.
This can come in handy when comparing strings.

* [NEW FEATURE] #421: `stri_rank()` ranks strings in a character vector
(e.g., for ordering data frames with regards to multiple criteria,
the ranks can be passed to `order()`, see #219).
(e.g., for ordering data frames with regards to multiple criteria,
the ranks can be passed to `order()`, see #219).

* [BUGFIX] `stri_sort_key()` now outputs `bytes`-encoded strings.

* [BUGFIX] #415: `locale=''` was not equivalent to `locale=NULL`
in `stri_opts_collator()`.
in `stri_opts_collator()`.

* [INTERNAL] #414: Use `LEVELS(x)` macro instead of accessing `(x)->sxpinfo.gp`
directly (@lukaszdaniel).
directly (@lukaszdaniel).
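
The two new functions named in the NEWS entries above could be tried out roughly as follows. This is a minimal sketch, assuming a stringi build that already exports `stri_trans_casefold()` and `stri_rank()` (1.6.1 or later); the data frame `df` and its column names are hypothetical and only serve as an illustration.

```r
library(stringi)  # assumption: stringi >= 1.6.1 is installed

# Case folding: strings that differ only in case should become identical,
# which makes them directly comparable.
x <- c("Hello", "HELLO", "hello")
folded <- stri_trans_casefold(x)
same_after_folding <- all(folded == folded[1])

# Ranking: the ranks are plain integers, so they can be combined with
# other sort keys in order() (the use case mentioned in #219).
df <- data.frame(                 # hypothetical example data
    name  = c("b", "a", "b", "a"),
    value = c(2, 1, 1, 2)
)
sorted <- df[order(stri_rank(df$name), df$value), ]
print(sorted)
```

Going through `stri_rank()` instead of sorting on the raw strings means the first key is ordered by the ICU collator while the remaining keys are still ordered by base R.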


## 1.5.3 (2020-09-04) **CRAN**
124 changes: 45 additions & 79 deletions R/install.R
@@ -31,55 +31,40 @@
## EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


# @title
# Installation-Related Utilities [DEPRECATED]
#
# @description
# These functions are responsible for checking and guaranteeing
# that the ICU data library (icudt) is available and that \pkg{stringi}
# is ready to use.
#
# These functions are deprecated and will no longer be available
# in future \pkg{stringi} releases.
#
# @details
# ICU makes use of a wide variety of data tables to provide many
# of its services. Examples include converter mapping tables,
# collation rules, transliteration rules, break iterator rules
# and dictionaries, and other locale data.
#
# Without the ICU data library (icudt) many \pkg{stringi} features
# will not be available. The size of icudt is approx. 10-30 MB.
#
# \code{stri_install_check()} tests whether some ICU services
# are available. If this is not the case, it is most likely due to
# unavailable ICU data library.
#
# \code{stri_install_icudt()} downloads and installs the ICU data library
# specific to your platform (little/big-endian). The downloaded
# file will be decompressed into the directory where the package has been
# installed, see \code{\link{find.package}}, so make sure
# you have sufficient write permissions.
#
# @param silent suppress diagnostic messages
# @param check enable \code{stri_install_check()} tests
# @param outpath path to install icudt to. If \code{NULL}, then
# \code{file.path(path.package('stringi'), 'libs')} will be used.
# @param inpath path to search icudt archive in.
# If \code{NULL}, then only stringi mirror servers will be used.
# Mainly of interest to system admins and software developers.
#
# @return These functions return a logical value, invisibly.
# \code{TRUE} denotes that the requested operation has been completed
# successfully.
#
# @references
# \emph{ICU Data} -- ICU User Guide,
# \url{http://userguide.icu-project.org/icudata}
#
# @examples
# stri_install_check()
#
# internal functions used whilst installing stringi



icudt_fname <- c(
little55 = "icudt55l.zip",
big55 = "icudt55b.zip",
little61 = "icudt61l.zip",
big61 = "icudt61b.zip",
little69 = "icu4c-69_1-data-bin-l.zip",
big69 = "icu4c-69_1-data-bin-b.zip"
)

icudt_md5ex <- c(
little55 = "ff345529f230cc39bb8d450af0607708",
big55 = "1194f0dd879d3c1c1f189cde5fd90efe",
little61 = "6d14e059b26606f08bad3b41eb3b5c93",
big61 = "45719f3208b2d67132efa620cecccb56",
little69 = "58ecd3e72e9d96ea2876dd89627afeb8",
big69 = "e86eba75d1f39be63713569dc0dc9524"
)


# icudt_mirrors <- c(
# # "https://github.com/unicode-org/icu/releases/download/release-69-1/",
# "https://raw.githubusercontent.com/gagolews/stringi/master/src/icu69/data/",
# "https://raw.githubusercontent.com/gagolews/stringi/master/src/icu61/data/",
# "https://raw.githubusercontent.com/gagolews/stringi/master/src/icu55/data/",
# "http://raw.githubusercontent.com/gagolews/stringi/master/src/icu69/data/",
# "http://raw.githubusercontent.com/gagolews/stringi/master/src/icu61/data/",
# "http://raw.githubusercontent.com/gagolews/stringi/master/src/icu55/data/"
# )


# @rdname stri_install
stri_install_check <- function(silent = FALSE)
{
@@ -115,38 +100,19 @@ stri_install_check <- function(silent = FALSE)
}




icudt_fname <- c(
little55 = "icudt55l.zip",
big55 = "icudt55b.zip",
little61 = "icudt61l.zip",
big61 = "icudt61b.zip")

icudt_md5ex <- c(
little55 = "ff345529f230cc39bb8d450af0607708",
big55 = "1194f0dd879d3c1c1f189cde5fd90efe",
little61 = "6d14e059b26606f08bad3b41eb3b5c93",
big61 = "45719f3208b2d67132efa620cecccb56")

icudt_mirrors <- c("https://raw.githubusercontent.com/gagolews/stringi/master/src/icu61/data/",
"https://raw.githubusercontent.com/gagolews/stringi/master/src/icu55/data/",
"http://raw.githubusercontent.com/gagolews/stringi/master/src/icu61/data/",
"http://raw.githubusercontent.com/gagolews/stringi/master/src/icu55/data/",
"http://www.ibspan.waw.pl/~gagolews/stringi/",
"http://www.gagolewski.com/software/stringi/")



# @rdname stri_install
stri_download_icudt <- function(inpath, icu_bundle_version)
{

fname <- icudt_fname[paste0(.Platform$endian, icu_bundle_version)]

md5ex <- icudt_md5ex[paste0(.Platform$endian, icu_bundle_version)]

mirrors <- icudt_mirrors
# mirrors <- icudt_mirrors
mirrors <- sprintf(
"%s://raw.githubusercontent.com/gagolews/stringi/master/src/icu%d/data/",
c("https", "http"),
icu_bundle_version
)

icudtzipfname <- file.path(inpath, fname) #tempfile(fileext='.zip')

@@ -173,7 +139,7 @@ stri_download_icudt <- function(inpath, icu_bundle_version)
tryCatch({
suppressWarnings(file.remove(icudtzipfname))
# download icudt
if (download.file(paste(href, fname, sep = ""), icudtzipfname, mode = "wb") != 0) {
if (download.file(paste(href, fname, sep = ""), icudtzipfname, mode = "wb") != 0) {
return("download error")
}
if (!file.exists(icudtzipfname))
@@ -219,11 +185,11 @@ stri_download_icudt <- function(inpath, icu_bundle_version)
stri_install_icudt <- function(check = TRUE, outpath = NULL, inpath = NULL, icu_bundle_version = NULL)
{
# As of v1.1.3, this function is no longer exported.
# It was deprecated in 0.5-5.
# It was deprecated in v0.5-5.

stopifnot(is.logical(check), length(check) == 1, !is.na(check))
if (check && stri_install_check(TRUE)) {
message("icudt is already installed.")
message("icudt has already been installed.")
return(invisible(TRUE))
}

@@ -240,10 +206,10 @@ stri_install_icudt <- function(check = TRUE, outpath = NULL, inpath = NULL, icu_

stopifnot(is.character(outpath), length(outpath) == 1, file.exists(outpath))

message("decompressing icudt archive ", icudtzipfname, " to: ", outpath)
message("decompressing icudt ", icudtzipfname, " to: ", outpath)
res <- unzip(icudtzipfname, exdir = outpath, overwrite = TRUE)
if (!is.character(res) || length(res) <= 0) {
message("error decompressing icudt archive")
message("error decompressing icudt")
return(invisible(FALSE))
}
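
The change to `stri_download_icudt()` replaces the fixed mirror list with URLs built from the requested bundle version. The sketch below shows how that lookup appears to work. Only the URL template and the 69.1 file names and checksums are taken from the diff. The helper `md5_matches()` is hypothetical, and checking the download with `tools::md5sum()` is an assumption about code not shown in these hunks.

```r
# Tables copied from the diff (69.1 entries only).
icudt_fname <- c(
    little69 = "icu4c-69_1-data-bin-l.zip",
    big69    = "icu4c-69_1-data-bin-b.zip"
)
icudt_md5ex <- c(
    little69 = "58ecd3e72e9d96ea2876dd89627afeb8",
    big69    = "e86eba75d1f39be63713569dc0dc9524"
)

icu_bundle_version <- 69L

# The table key combines the platform byte order with the bundle version.
key   <- paste0(.Platform$endian, icu_bundle_version)
fname <- icudt_fname[[key]]
md5ex <- icudt_md5ex[[key]]

# sprintf() recycles over the two schemes, giving one URL per scheme.
mirrors <- sprintf(
    "%s://raw.githubusercontent.com/gagolews/stringi/master/src/icu%d/data/",
    c("https", "http"),
    icu_bundle_version
)
urls <- paste0(mirrors, fname)

# Hypothetical helper: compare a file's MD5 digest with the expected one.
md5_matches <- function(path, expected) {
    identical(unname(tools::md5sum(path)), expected)
}
```

Deriving the mirrors from `icu_bundle_version` means that a request for one bundle version is no longer tried against the data directories of the other versions.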

21 changes: 11 additions & 10 deletions README.md
@@ -30,7 +30,7 @@ Available features include:
* pattern searching (e.g., with Java-like regular expressions),
* collation and sorting,
* random string generation,
* case mapping,
* case mapping and folding,
* string transliteration,
* Unicode normalisation,
* date-time formatting and parsing,
@@ -44,9 +44,10 @@ and many more.
**Authors and Contributors**: [Marek Gagolewski](https://www.gagolewski.com/),
with contributions from Bartłomiej Tartanus and many others.

The package's API was inspired by
Hadley Wickham's [stringr](https://stringr.tidyverse.org/)
package (and since 2015 *stringr* powered by *stringi*).
The package's API was inspired by that of the early (pre-tidyverse; v0.6.2)
version of Hadley Wickham's
[stringr](https://cran.r-project.org/web/packages/stringr/)
package (and since the 2015 v1.0.0 *stringr* is powered by *stringi*).



@@ -56,26 +57,26 @@ package (and since 2015 *stringr* powered by *stringi*).

[How to access the stringi C++ API from within an Rcpp-based R package](https://github.com/gagolews/ExampleRcppStringi)

**System Requirements**: *R >= 2.14*, *ICU4C >= 55* (refer to the
**System Requirements**: *R >= 3.1*, *ICU4C >= 55* (refer to the
[INSTALL](https://raw.githubusercontent.com/gagolews/stringi/master/INSTALL)
file for more details)

**License**: *stringi*'s source code is licensed under the open source
BSD-3-clause, for more details see the
[LICENSE](https://raw.githubusercontent.com/gagolews/stringi/master/LICENSE) file.

This *git* repository also contains a custom subset of *ICU4C 55.1*
and *ICU4C 61.1* source code which is copyrighted by Unicode and others.
This *git* repository also contains a custom subset of *ICU4C* source code
which is copyrighted by Unicode, Inc. and others.
A binary version of the Unicode Character Database is included.
For more details on copyright holders see the
[LICENSE](https://raw.githubusercontent.com/gagolews/stringi/master/LICENSE) file.
The *ICU* project is covered by the
[ICU license](http://source.icu-project.org/repos/icu/icu/trunk/LICENSE)
[Unicode license](https://github.com/unicode-org/icu/blob/main/icu4c/LICENSE)
a simple, permissive non-copyleft free software license, compatible with
the GNU GPL. The *ICU* license
is [intended](http://userguide.icu-project.org/icufaq#TOC-How-is-the-ICU-licensed-)
to allow *ICU* to be included both in free software projects
and in proprietary or commercial products.
to allow *ICU* to be included in free software projects as well as
in proprietary or commercial products.

**Changes**: see the
[NEWS](https://raw.githubusercontent.com/gagolews/stringi/master/NEWS) file.
2 changes: 1 addition & 1 deletion configure.win
@@ -1,5 +1,5 @@
# `stringi` configure.win
# (C) 2015-2018 Marek Gagolewski
# Copyright (c) 2013-2021, Marek Gagolewski <https://www.gagolewski.com>

# this is an architecture-independent configure.win file

9 changes: 5 additions & 4 deletions devel/sphinx/index.rst
@@ -21,7 +21,7 @@ It gives you a multitude of functions for:
* pattern searching (e.g., with ICU Java-like regular expressions),
* collation and sorting,
* random string generation,
* case mapping,
* case mapping and folding,
* string transliteration,
* Unicode normalisation,
* date-time formatting and parsing,
@@ -45,9 +45,10 @@ by calling:
It has been released under the open source BSD-3-clause
`license <https://raw.githubusercontent.com/gagolews/stringi/master/LICENSE>`_.

The package's API was inspired by Hadley Wickham's
`stringr <https://stringr.tidyverse.org/>`_ package
(and since 2015 `stringr` is powered by `stringi`).
The package's API was inspired by that of the early (pre-tidyverse; v0.6.2)
version of Hadley Wickham's
`stringr <https://cran.r-project.org/web/packages/stringr/>`_
package (and since the 2015 v1.0.0 `stringr` is powered by `stringi`).
Moreover, Hadley suggested quite a few new package features.
The contributions from Bartłomiej Tartanus and
`many others <https://github.com/gagolews/stringi/graphs/contributors>`_