diff --git a/.devel/sphinx/rapi/about_arguments.md b/.devel/sphinx/rapi/about_arguments.md deleted file mode 100644 index 53772c62..00000000 --- a/.devel/sphinx/rapi/about_arguments.md +++ /dev/null @@ -1,45 +0,0 @@ -# about_arguments: - -## Description - -Below we explain how stringi deals with its functions\' arguments. - -If some function violates one of the following rules (for a very important reason), this is clearly indicated in its documentation (with discussion). - -## Coercion of Arguments - -When a character vector argument is expected, factors and other vectors coercible to character vectors are silently converted with [`as.character`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/character.html), otherwise an error is generated. Coercion from a list which does not consist of length-1 atomic vectors issues a warning. - -When a logical, numeric, or integer vector argument is expected, factors are converted with `as.*(as.character(...))`, and other coercible vectors are converted with `as.*`, otherwise an error is generated. - -## Vectorization - -Almost all functions are vectorized with respect to all their arguments, and the recycling rule is applied whenever necessary. Due to this property, you may, for instance, search for one pattern in each given string, search for each pattern in one given string, and search for the i-th pattern within the i-th string. - -We of course paid great attention to performance: e.g., in regular expression searching, regex matchers are reused from iteration to iteration whenever possible. - -Functions with some non-vectorized arguments are rare: e.g., a regular expression matcher\'s settings are established once per call. - -Some functions assume that a vector with one element is given as an argument (like `collapse` in [`stri_join`](stri_join.md)).
In such cases, if an empty vector is given, you will get an error, and for vectors with more than one element, a warning will be generated (only the first element will be used). - -You may find details on vectorization behavior in the man page of each particular function of interest. - -## Handling Missing Values (`NA`s) - -stringi handles missing values consistently. For any vectorized operation, if at least one of the input elements used to compute a given result is missing, then that result is also set to `NA`. - -## Preserving Object Attributes - -Generally, all our functions drop input objects\' attributes (e.g., [`names`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/names.html), [`dim`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/dim.html), etc.). This is due to deep vectorization as well as efficiency reasons. If attributes need to be preserved, the important ones can be copied manually. Alternatively, the notation `x[] <- stri_...(x, ...)` can sometimes be used too.
- - ## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other stringi_general_topics: [`about_encoding`](about_encoding.md), [`about_locale`](about_locale.md), [`about_search_boundaries`](about_search_boundaries.md), [`about_search_charclass`](about_search_charclass.md), [`about_search_coll`](about_search_coll.md), [`about_search_fixed`](about_search_fixed.md), [`about_search_regex`](about_search_regex.md), [`about_search`](about_search.md), [`about_stringi`](about_stringi.md) diff --git a/.devel/sphinx/rapi/about_encoding.md b/.devel/sphinx/rapi/about_encoding.md deleted file mode 100644 index 3b49c8c5..00000000 --- a/.devel/sphinx/rapi/about_encoding.md +++ /dev/null @@ -1,107 +0,0 @@ -# about_encoding: - -## Description - -This manual page explains how stringi deals with character strings in various encodings. - -In particular, note that: - -- **R** lets strings in ASCII, UTF-8, and your platform\'s native encoding coexist. A character vector printed on the console by calling [`print`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/print.html) or [`cat`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/cat.html) is silently re-encoded to the native encoding. - -- Functions in stringi process each string internally in Unicode, the most universal character encoding ever. Even if a string is given in the native encoding, i.e., your platform\'s default one, it will be converted to Unicode (more precisely, to UTF-8 or UTF-16). - -- Most stringi functions return UTF-8-encoded strings, regardless of the input encoding.
What is more, the functions have been optimized for UTF-8/ASCII input (they have competitive, if not better, performance, especially when performing more complex operations like string comparison, sorting, and even concatenation). Thus, it is best to rely solely on cascading calls to stringi functions. - -## Details - -Quoting the ICU User Guide, \'Hundreds of encodings have been developed over the years, each for small groups of languages and for special purposes. As a result, the interpretation of text, input, sorting, display, and storage depends on the knowledge of all the different types of character sets and their encodings. Programs have been written to handle either one single encoding at a time and switch between them, or to convert between external and internal encodings.\' - -\'Unicode provides a single character set that covers the major languages of the world, and a small number of machine-friendly encoding forms and schemes to fit the needs of existing applications and protocols. It is designed for best interoperability with both ASCII and ISO-8859-1 (the most widely used character sets) to make it easier for Unicode to be used in almost all applications and protocols\' (see the ICU User Guide). - -The Unicode Standard determines the way to map any possible character to a numeric value -- a so-called code point. Such code points, however, have to be stored somehow in a computer\'s memory. The Unicode Standard encodes characters in the range U+0000..U+10FFFF, which amounts to a 21-bit code space. Depending on the encoding form (UTF-8, UTF-16, or UTF-32), each character will then be represented either as a sequence of one to four 8-bit bytes, one or two 16-bit code units, or a single 32-bit integer (compare the ICU FAQ). - -Unicode can be thought of as a superset of the spectrum of characters supported by any given code page.
- - ## UTF-8 and UTF-16 - -For portability reasons, the UTF-8 encoding is the most natural choice for representing Unicode character strings in **R**. UTF-8 has ASCII as its subset (code points 1--127 represent the same characters in both of them). Code points larger than 127 are represented by multi-byte sequences (from 2 to 4 bytes; note that not all sequences of bytes are valid UTF-8, compare [`stri_enc_isutf8`](stri_enc_isutf8.md)). - -Most of the computations in stringi are performed internally using either UTF-8 or UTF-16 encodings (this depends on the type of service you request: some ICU services are designed to work only with UTF-16). Due to such a choice, with stringi you get the same result on each platform, which is -- unfortunately -- not the case with base **R**\'s functions (for instance, it is known that performing a regular expression search on some texts under Linux may give results different from those obtained under Windows). We really had portability in mind while developing our package! - -We have observed that **R** correctly handles UTF-8 strings regardless of your platform\'s native encoding (see below). Therefore, we decided that most functions in stringi will output their results in UTF-8 -- this speeds up computations on cascading calls to our functions: the strings do not have to be re-encoded each time. - -Note that some Unicode characters may have an ambiguous representation. For example, "a with ogonek" (one character) and "a"+"ogonek" (two code points) are semantically the same. stringi provides functions to normalize character sequences, see [`stri_trans_nfc`](stri_trans_nf.md) for discussion. However, denormalized strings appear very rarely in typical string processing activities. - -Additionally, note that stringi silently removes byte order marks (BOMs; they may incidentally appear in a string read from a text file) from UTF-8-encoded strings, see [`stri_enc_toutf8`](stri_enc_toutf8.md).
- - ## Character Encodings in **R** - -Data in memory are just bytes (small integer values) -- an en*coding* is a way to represent characters with such numbers; it is a semantic \'key\' to understanding a given byte sequence. For example, in ISO-8859-2 (Central European), the value 177 represents Polish "a with ogonek", and in ISO-8859-1 (Western European), the same value denotes the "plus-minus" sign. Thus, a character encoding is a translation scheme: we need to communicate with **R** somehow, relying on how it represents strings. - -Overall, **R** has a very simple encoding marking mechanism, see [`stri_enc_mark`](stri_enc_mark.md). There is an implicit assumption that your platform\'s default (native) encoding always extends ASCII -- stringi checks this whenever your native encoding is detected automatically on ICU\'s initialization and each time you change it manually by calling [`stri_enc_set`](stri_enc_set.md). - -Character strings in **R** can (internally) be declared to be in: - -- `UTF-8`; - -- `latin1`, i.e., either ISO-8859-1 (Western European on Linux, OS X, and other Unixes) or WINDOWS-1252 (Windows); - -- `bytes` -- for strings that should be manipulated as sequences of bytes. - -Moreover, there are two other cases: - -- ASCII -- for strings consisting only of byte codes not greater than 127; - -- `native` (a.k.a. `unknown` in [`Encoding`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/Encoding.html); quite a misleading name: no explicit encoding mark) -- for strings that are assumed to be in your platform\'s native (default) encoding. For example, this may be UTF-8 on OS X or some 8-bit code page on Windows. The native encoding used by **R** may be determined by examining the LC_CTYPE category, see [`Sys.getlocale`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/locales.html). - -Intuitively, "native" strings result from reading a string from stdin (e.g., keyboard input).
This makes sense: your operating system works in some encoding and provides **R** with some data. - -Each time a stringi function encounters a string declared in the native encoding, it assumes that the input data should be translated from the default encoding, i.e., the one returned by [`stri_enc_get`](stri_enc_set.md) (unless you know what you are doing, the default encoding should only be changed if the automatic encoding detection process fails on stringi load). - -Functions that allow `'bytes'` encoding markings are very rare in stringi and were carefully selected. These are: [`stri_enc_toutf8`](stri_enc_toutf8.md) (with argument `is_unknown_8bit=TRUE`), [`stri_enc_toascii`](stri_enc_toascii.md), and [`stri_encode`](stri_encode.md). - -Finally, note that **R** lets strings in ASCII, UTF-8, and your platform\'s native encoding coexist. A character vector printed with [`print`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/print.html), [`cat`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/cat.html), etc., is silently re-encoded so that it can be properly shown, e.g., on the console. - -## Encoding Conversion - -Apart from automatic conversion from the native encoding, you may re-encode a string manually, for example when you read it from a file created on a different platform. Call [`stri_enc_list`](stri_enc_list.md) for the list of encodings supported by ICU. Note that converter names are case-insensitive and ICU tries to normalize the encoding specifiers. Leading zeroes are ignored in sequences of digits (if further digits follow), and all non-alphanumeric characters are ignored. Thus, the strings \'UTF-8\', \'utf_8\', \'u\*Tf08\' and \'Utf 8\' are equivalent. - -The [`stri_encode`](stri_encode.md) function allows you to convert between any given encodings (in some cases you will obtain `bytes`-marked strings, or even lists of raw vectors, e.g., for UTF-16).
There are also some more specialized functions, like [`stri_enc_toutf32`](stri_enc_toutf32.md) (converts a character vector to a list of integers, where one code point is exactly one numeric value) or [`stri_enc_toascii`](stri_enc_toascii.md) (substitutes all non-ASCII bytes with the SUBSTITUTE CHARACTER, which plays a role similar to **R**\'s `NA` value). - -There are also some routines for automated encoding detection, see, e.g., [`stri_enc_detect`](stri_enc_detect.md). - -## Encoding Detection - -Given a text file, one has to know how to interpret (decode) raw data in order to obtain meaningful information. - -Encoding detection is always an imprecise operation and needs a considerable amount of data. However, in the case of some encodings (like UTF-8, ASCII, or UTF-32) a "false positive" byte sequence is quite rare (statistically speaking). - -Check out [`stri_enc_detect`](stri_enc_detect.md) (among others) for a useful function in this category. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## References - -*Unicode Basics* -- ICU User Guide, - -*Conversion* -- ICU User Guide, - -*Converters* -- ICU User Guide, (technical details) - -*UTF-8, UTF-16, UTF-32 & BOM* -- ICU FAQ, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other stringi_general_topics: [`about_arguments`](about_arguments.md), [`about_locale`](about_locale.md), [`about_search_boundaries`](about_search_boundaries.md), [`about_search_charclass`](about_search_charclass.md), [`about_search_coll`](about_search_coll.md), [`about_search_fixed`](about_search_fixed.md), [`about_search_regex`](about_search_regex.md), [`about_search`](about_search.md), [`about_stringi`](about_stringi.md) - -Other encoding_management:
[`stri_enc_info()`](stri_enc_info.md), [`stri_enc_list()`](stri_enc_list.md), [`stri_enc_mark()`](stri_enc_mark.md), [`stri_enc_set()`](stri_enc_set.md) - -Other encoding_detection: [`stri_enc_detect2()`](stri_enc_detect2.md), [`stri_enc_detect()`](stri_enc_detect.md), [`stri_enc_isascii()`](stri_enc_isascii.md), [`stri_enc_isutf16be()`](stri_enc_isutf16.md), [`stri_enc_isutf8()`](stri_enc_isutf8.md) - -Other encoding_conversion: [`stri_enc_fromutf32()`](stri_enc_fromutf32.md), [`stri_enc_toascii()`](stri_enc_toascii.md), [`stri_enc_tonative()`](stri_enc_tonative.md), [`stri_enc_toutf32()`](stri_enc_toutf32.md), [`stri_enc_toutf8()`](stri_enc_toutf8.md), [`stri_encode()`](stri_encode.md) diff --git a/.devel/sphinx/rapi/about_locale.md b/.devel/sphinx/rapi/about_locale.md deleted file mode 100644 index 89f15d5b..00000000 --- a/.devel/sphinx/rapi/about_locale.md +++ /dev/null @@ -1,59 +0,0 @@ -# about_locale: - -## Description - -In this section we explain how we specify locales in stringi. The locale is a fundamental concept in ICU. It identifies a specific user community, i.e., a group of users who have similar culture and language expectations for human-computer interaction. - -## Details - -Because a locale is just an identifier of a region, no validity check is performed when you specify a locale. ICU is implemented as a set of services. If you want to verify whether particular resources are available in the locale you asked for, you must query those resources. Note: when you ask for a resource for a particular locale, you get back the best available match, not necessarily precisely the one you requested. - -## Locale Identifiers - -ICU services are parametrized by locale to deliver culturally correct results. Locales are identified by character strings of the form `Language` code, `Language_Country` code, or `Language_Country_Variant` code, e.g., \'en_US\'.
- -The two-letter `Language` code uses the ISO-639-1 standard, e.g., \'en\' stands for English, \'pl\' for Polish, \'fr\' for French, and \'de\' for German. - -`Country` is a two-letter code following the ISO-3166 standard. It reflects differing conventions within the same language, for example, in US English (\'en_US\') and Australian English (\'en_AU\'). - -Differences may also appear in language conventions used within the same country. For example, the Euro currency may be used in several European countries while the individual country\'s currency is still in circulation. In such a case, ICU `Variant` \'\_EURO\' could be used for selecting locales that support the Euro currency. - -The final (optional) element of a locale is a list of keywords together with their values. Keywords must be unique. Their order is not significant. Unknown keywords are ignored. The handling of keywords depends on the specific services that utilize them. Currently, the following keywords are recognized: `calendar`, `collation`, `currency`, and `numbers`, e.g., `fr@collation=phonebook;calendar=islamic-civil` is a valid French locale specifier together with keyword arguments. For more information, refer to the ICU user guide. - -For a list of locales that are recognized by ICU, call [`stri_locale_list`](stri_locale_list.md).
All locale-sensitive functions may request any desired locale per call (by specifying the `locale` argument), i.e., without referring to the default locale. During many tests, however, we did not observe any improper behavior of stringi while using a modified default locale. - -## Locale-Sensitive Functions in stringi - -One of many examples of locale-dependent services is the Collator, which performs a locale-aware string comparison. It is used for string comparison, ordering, sorting, and searching. See [`stri_opts_collator`](stri_opts_collator.md) for a description of how to tune its settings, and its `locale` argument in particular. - -Other locale-sensitive functions include, e.g., [`stri_trans_tolower`](stri_trans_casemap.md) (which performs character case mapping). - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## References - -*Locale* -- ICU User Guide, - -*ISO 639: Language Codes*, - -*ISO 3166: Country Codes*, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other locale_management: [`stri_locale_info()`](stri_locale_info.md), [`stri_locale_list()`](stri_locale_list.md), [`stri_locale_set()`](stri_locale_set.md) - -Other locale_sensitive: [`%s<%()`](+25s+3C+25.md), [`about_search_boundaries`](about_search_boundaries.md), [`about_search_coll`](about_search_coll.md), [`stri_compare()`](stri_compare.md), [`stri_count_boundaries()`](stri_count_boundaries.md), [`stri_duplicated()`](stri_duplicated.md), [`stri_enc_detect2()`](stri_enc_detect2.md), [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_opts_collator()`](stri_opts_collator.md), [`stri_order()`](stri_order.md), [`stri_rank()`](stri_rank.md),
[`stri_sort_key()`](stri_sort_key.md), [`stri_sort()`](stri_sort.md), [`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_trans_tolower()`](stri_trans_casemap.md), [`stri_unique()`](stri_unique.md), [`stri_wrap()`](stri_wrap.md) - -Other stringi_general_topics: [`about_arguments`](about_arguments.md), [`about_encoding`](about_encoding.md), [`about_search_boundaries`](about_search_boundaries.md), [`about_search_charclass`](about_search_charclass.md), [`about_search_coll`](about_search_coll.md), [`about_search_fixed`](about_search_fixed.md), [`about_search_regex`](about_search_regex.md), [`about_search`](about_search.md), [`about_stringi`](about_stringi.md) diff --git a/.devel/sphinx/rapi/about_search.md b/.devel/sphinx/rapi/about_search.md deleted file mode 100644 index 6f400d05..00000000 --- a/.devel/sphinx/rapi/about_search.md +++ /dev/null @@ -1,73 +0,0 @@ -# about_search: String Searching - -## Description - -This man page explains how to perform string search-based operations in stringi. - -## Details - -The following independent string searching engines are available in stringi. - -- `stri_*_regex` -- ICU\'s regular expressions (regexes), see [about_search_regex](about_search_regex.md), - -- `stri_*_fixed` -- locale-independent byte-wise pattern matching, see [about_search_fixed](about_search_fixed.md), - -- `stri_*_coll` -- ICU\'s `StringSearch`, locale-sensitive, Collator-based pattern search, useful for natural language processing tasks, see [about_search_coll](about_search_coll.md), - -- `stri_*_charclass` -- character classes search, e.g., Unicode General Categories or Binary Properties, see [about_search_charclass](about_search_charclass.md), - -- `stri_*_boundaries` -- text boundary analysis, see [about_search_boundaries](about_search_boundaries.md) - -Each search engine is able to perform many search-based operations. 
These may include: - -- `stri_detect_*` - detect if a pattern occurs in a string, see, e.g., [`stri_detect`](stri_detect.md), - -- `stri_count_*` - count the number of pattern occurrences, see, e.g., [`stri_count`](stri_count.md), - -- `stri_locate_*` - locate all, first, or last occurrences of a pattern, see, e.g., [`stri_locate`](stri_locate.md), - -- `stri_extract_*` - extract all, first, or last occurrences of a pattern, see, e.g., [`stri_extract`](stri_extract.md) and, in case of regexes, [`stri_match`](stri_match.md), - -- `stri_replace_*` - replace all, first, or last occurrences of a pattern, see, e.g., [`stri_replace`](stri_replace.md) and also [`stri_trim`](stri_trim.md), - -- `stri_split_*` - split a string into chunks indicated by occurrences of a pattern, see, e.g., [`stri_split`](stri_split.md), - -- `stri_startswith_*` and `stri_endswith_*` detect if a string starts or ends with a pattern match, see, e.g., [`stri_startswith`](stri_startsendswith.md), - -- `stri_subset_*` - return a subset of a character vector with strings that match a given pattern, see, e.g., [`stri_subset`](stri_subset.md). 
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other text_boundaries: [`about_search_boundaries`](about_search_boundaries.md), [`stri_count_boundaries()`](stri_count_boundaries.md), [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_opts_brkiter()`](stri_opts_brkiter.md), [`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_split_lines()`](stri_split_lines.md), [`stri_trans_tolower()`](stri_trans_casemap.md), [`stri_wrap()`](stri_wrap.md) - -Other search_regex: [`about_search_regex`](about_search_regex.md), [`stri_opts_regex()`](stri_opts_regex.md) - -Other search_fixed: [`about_search_fixed`](about_search_fixed.md), [`stri_opts_fixed()`](stri_opts_fixed.md) - -Other search_coll: [`about_search_coll`](about_search_coll.md), [`stri_opts_collator()`](stri_opts_collator.md) - -Other search_charclass: [`about_search_charclass`](about_search_charclass.md), [`stri_trim_both()`](stri_trim.md) - -Other search_detect: [`stri_detect()`](stri_detect.md), [`stri_startswith()`](stri_startsendswith.md) - -Other search_count: [`stri_count_boundaries()`](stri_count_boundaries.md), [`stri_count()`](stri_count.md) - -Other search_locate: [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_locate_all()`](stri_locate.md) - -Other search_replace: [`stri_replace_all()`](stri_replace.md), [`stri_replace_rstr()`](stri_replace_rstr.md), [`stri_trim_both()`](stri_trim.md) - -Other search_split: [`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_split_lines()`](stri_split_lines.md), [`stri_split()`](stri_split.md) - -Other search_subset: [`stri_subset()`](stri_subset.md) - 
-Other search_extract: [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_extract_all()`](stri_extract.md), [`stri_match_all()`](stri_match.md) - -Other stringi_general_topics: [`about_arguments`](about_arguments.md), [`about_encoding`](about_encoding.md), [`about_locale`](about_locale.md), [`about_search_boundaries`](about_search_boundaries.md), [`about_search_charclass`](about_search_charclass.md), [`about_search_coll`](about_search_coll.md), [`about_search_fixed`](about_search_fixed.md), [`about_search_regex`](about_search_regex.md), [`about_stringi`](about_stringi.md) diff --git a/.devel/sphinx/rapi/about_search_boundaries.md b/.devel/sphinx/rapi/about_search_boundaries.md deleted file mode 100644 index 4b619c33..00000000 --- a/.devel/sphinx/rapi/about_search_boundaries.md +++ /dev/null @@ -1,53 +0,0 @@ -# about_search_boundaries: - -## Description - -Text boundary analysis is the process of locating linguistic boundaries while formatting and handling text. - -## Details - -Examples of the boundary analysis process include: - -- Locating positions to word-wrap text to fit within specific margins while displaying or printing, see [`stri_wrap`](stri_wrap.md) and [`stri_split_boundaries`](stri_split_boundaries.md). - -- Counting characters, words, sentences, or paragraphs, see [`stri_count_boundaries`](stri_count_boundaries.md). - -- Making a list of the unique words in a document, see [`stri_extract_all_words`](stri_extract_boundaries.md) and then [`stri_unique`](stri_unique.md). - -- Capitalizing the first letter of each word or sentence, see also [`stri_trans_totitle`](stri_trans_casemap.md). - -- Locating a particular unit of the text (for example, finding the third word in the document), see [`stri_locate_all_boundaries`](stri_locate_boundaries.md). - -Generally, text boundary analysis is a locale-dependent operation. 
For example, in Japanese and Chinese one does not separate words with spaces; a line break can occur even in the middle of a word. These languages have punctuation and diacritical marks that cannot start or end a line, so this must also be taken into account. - -stringi uses ICU\'s `BreakIterator` to locate specific text boundaries. Note that the `BreakIterator`\'s behavior may be controlled in some cases, see [`stri_opts_brkiter`](stri_opts_brkiter.md). - -- The `character` boundary iterator tries to match what a user would think of as a "character" -- a basic unit of a writing system for a language -- which may be more than just a single Unicode code point. - -- The `word` boundary iterator locates the boundaries of words, for purposes such as "Find whole words" operations. - -- The `line_break` iterator locates positions that would be appropriate to wrap lines when displaying the text. - -- The break iterator of type `sentence` locates sentence boundaries. - -For technical details on different classes of text boundaries, refer to the ICU User Guide (see References below).
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## References - -*Boundary Analysis* -- ICU User Guide, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other locale_sensitive: [`%s<%()`](+25s+3C+25.md), [`about_locale`](about_locale.md), [`about_search_coll`](about_search_coll.md), [`stri_compare()`](stri_compare.md), [`stri_count_boundaries()`](stri_count_boundaries.md), [`stri_duplicated()`](stri_duplicated.md), [`stri_enc_detect2()`](stri_enc_detect2.md), [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_opts_collator()`](stri_opts_collator.md), [`stri_order()`](stri_order.md), [`stri_rank()`](stri_rank.md), [`stri_sort_key()`](stri_sort_key.md), [`stri_sort()`](stri_sort.md), [`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_trans_tolower()`](stri_trans_casemap.md), [`stri_unique()`](stri_unique.md), [`stri_wrap()`](stri_wrap.md) - -Other text_boundaries: [`about_search`](about_search.md), [`stri_count_boundaries()`](stri_count_boundaries.md), [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_opts_brkiter()`](stri_opts_brkiter.md), [`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_split_lines()`](stri_split_lines.md), [`stri_trans_tolower()`](stri_trans_casemap.md), [`stri_wrap()`](stri_wrap.md) - -Other stringi_general_topics: [`about_arguments`](about_arguments.md), [`about_encoding`](about_encoding.md), [`about_locale`](about_locale.md), [`about_search_charclass`](about_search_charclass.md), [`about_search_coll`](about_search_coll.md), [`about_search_fixed`](about_search_fixed.md), 
[`about_search_regex`](about_search_regex.md), [`about_search`](about_search.md), [`about_stringi`](about_stringi.md) diff --git a/.devel/sphinx/rapi/about_search_charclass.md b/.devel/sphinx/rapi/about_search_charclass.md deleted file mode 100644 index ece5a376..00000000 --- a/.devel/sphinx/rapi/about_search_charclass.md +++ /dev/null @@ -1,453 +0,0 @@ -# about_search_charclass: - -## Description - -Here we describe how character classes (sets) can be specified in the stringi package. These are useful for defining search patterns (note that the ICU regex engine uses the same scheme for denoting character classes) or, e.g., generating random code points with [`stri_rand_strings`](stri_rand_strings.md). - -## Details - -All `stri_*_charclass` functions in stringi perform single-character (i.e., Unicode code point) search-based operations. You may obtain the same results using regular expressions, see [about_search_regex](about_search_regex.md). However, these functions aim to be faster. - -Character classes are defined using ICU\'s `UnicodeSet` patterns. Below we briefly summarize their syntax. For more details, refer to the References below. - -## `UnicodeSet` patterns - -A `UnicodeSet` represents a subset of Unicode code points (recall that stringi converts strings in your native encoding to Unicode automatically). Legal code points are U+0000 to U+10FFFF, inclusive. - -Patterns consist either of a series of characters bounded by square brackets (such patterns follow a syntax similar to that employed by regular expression character classes) or of Perl-like Unicode property set specifiers. - -`[]` denotes an empty set, `[a]` -- a set consisting of character "a", `[\u0105]` -- a set with character U+0105, and `[abc]` -- a set with "a", "b", and "c". - -`[a-z]` denotes a set consisting of characters "a" through "z", inclusive, in Unicode code point order. - -Some set-theoretic operations are available.
`^` denotes the complement, e.g., `[^a-z]` contains all characters but "a" through "z". Moreover, `[[pat1][pat2]]`, `[[pat1]&[pat2]]`, and `[[pat1]-[pat2]]` denote union, intersection, and asymmetric difference of sets specified by `pat1` and `pat2`, respectively. - -Note that all whitespace is ignored unless it is quoted or backslash-escaped (whitespace can be freely used for clarity, as `[a c d-f m]` means the same as `[acd-fm]`). stringi does not allow including multi-character strings (see the `UnicodeSet` API documentation). Also, empty string patterns are disallowed. - -Any character may be preceded by a backslash in order to remove its special meaning. - -A malformed pattern always results in an error. - -Set expressions at a glance (according to ): - -Some examples: - -`[abc]` - -: Match any of the characters a, b, or c. - -`[^abc]` - -: Negation -- match any character except a, b, or c. - -`[A-M]` - -: Range -- match any character from A to M. The characters to include are determined by Unicode code point ordering. - -`[\u0000-\U0010ffff]` - -: Range -- match all characters. - -`[\p{Letter}]` or `[\p{General_Category=Letter}]` or `[\p{L}]` - -: Characters with Unicode Category = Letter. All forms shown are equivalent. - -`[\P{Letter}]` - -: Negated property (note the uppercase `\P`) -- match everything except Letters. - -`[\p{numeric_value=9}]` - -: Match all numbers with a numeric value of 9. Any Unicode Property may be used in set expressions. - -`[\p{Letter}&\p{script=cyrillic}]` - -: Set intersection -- match the set of all Cyrillic letters. - -`[\p{Letter}-\p{script=latin}]` - -: Set difference -- match all non-Latin letters. - -`[[a-z][A-Z][0-9]]` or `[a-zA-Z0-9]` - -: Implicit union of sets -- match ASCII letters and digits (the two forms are equivalent). - -`[:script=Greek:]` - -: Alternative POSIX-like syntax for properties -- equivalent to `\p{script=Greek}`.
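A few of the patterns above can be tried directly. The following is a minimal sketch, assuming the stringi package is installed; it uses `stri_detect_charclass()` and `stri_count_charclass()` from the `stri_*_charclass` family. The sample strings are made up for illustration, and the last element is the Cyrillic word "жук" written with `\u` escapes.

```r
library(stringi)

x <- c("abc", "ABC123", "\u0436\u0443\u043a")  # the last element is Cyrillic

# Set intersection: does the string contain any Cyrillic letter?
stri_detect_charclass(x, "[\\p{Letter}&\\p{script=cyrillic}]")
# FALSE FALSE TRUE

# Implicit union of nested sets: count ASCII letters and digits
stri_count_charclass(x, "[[a-z][A-Z][0-9]]")
# 3 6 0

# Whitespace inside a UnicodeSet pattern is ignored,
# so these two patterns denote the same set
stri_count_charclass("abcdefm", "[a c d-f m]") ==
  stri_count_charclass("abcdefm", "[acd-fm]")
# TRUE
```

Note the doubled backslashes. They are needed because the pattern is written as an ordinary R string literal.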
- - ## Unicode properties - - Unicode property sets are specified with a POSIX-like syntax, e.g., `[:Letter:]`, or with an (extended) Perl-style syntax, e.g., `\p{L}`. The complements of the above sets are `[:^Letter:]` and `\P{L}`, respectively. - - The names are normalized before matching (for example, the match is case-insensitive). Moreover, many names have short aliases. - - Among predefined Unicode properties we find, e.g.: - - - Unicode General Categories, e.g., `Lu` for uppercase letters, - - - Unicode Binary Properties, e.g., `WHITE_SPACE`, - - and many more (including Unicode scripts). - - Each property provides access to the large and comprehensive Unicode Character Database. Generally, the list of properties available in ICU is not well-documented. Please refer to the References section for some links. - - Please note that some classes might overlap. However, e.g., General Category `Z` (some space) and Binary Property `WHITE_SPACE` match different character sets. - - ## Unicode General Categories - - The Unicode General Category property of a code point provides the most general classification of that code point. Each code point falls into one and only one Category. - - `Cc` - - : a C0 or C1 control code. - - `Cf` - - : a format control character. - - `Cn` - - : a reserved unassigned code point or a non-character. - - `Co` - - : a private-use character. - - `Cs` - - : a surrogate code point. - - `Lc` - - : the union of Lu, Ll, Lt. - - `Ll` - - : a lowercase letter. - - `Lm` - - : a modifier letter. - - `Lo` - - : other letters, including syllables and ideographs. - - `Lt` - - : a digraphic character, with the first part uppercase. - - `Lu` - - : an uppercase letter. - - `Mc` - - : a spacing combining mark (positive advance width). - - `Me` - - : an enclosing combining mark. - - `Mn` - - : a non-spacing combining mark (zero advance width). - - `Nd` - - : a decimal digit. - - `Nl` - - : a letter-like numeric character. - - `No` - - : a numeric character of other type.
- -`Pd` - -: a dash or hyphen punctuation mark. - -`Ps` - -: an opening punctuation mark (of a pair). - -`Pe` - -: a closing punctuation mark (of a pair). - -`Pc` - -: a connecting punctuation mark, like a tie. - -`Po` - -: a punctuation mark of other type. - -`Pi` - -: an initial quotation mark. - -`Pf` - -: a final quotation mark. - -`Sm` - -: a symbol of mathematical use. - -`Sc` - -: a currency sign. - -`Sk` - -: a non-letter-like modifier symbol. - -`So` - -: a symbol of other type. - -`Zs` - -: a space character (of non-zero width). - -`Zl` - -: U+2028 LINE SEPARATOR only. - -`Zp` - -: U+2029 PARAGRAPH SEPARATOR only. - -`C` - -: the union of Cc, Cf, Cs, Co, Cn. - -`L` - -: the union of Lu, Ll, Lt, Lm, Lo. - -`M` - -: the union of Mn, Mc, Me. - -`N` - -: the union of Nd, Nl, No. - -`P` - -: the union of Pc, Pd, Ps, Pe, Pi, Pf, Po. - -`S` - -: the union of Sm, Sc, Sk, So. - -`Z` - -: the union of Zs, Zl, Zp - -## Unicode Binary Properties - -Each character may follow many Binary Properties at a time. - -Here is a comprehensive list of supported Binary Properties: - -`ALPHABETIC` - -: alphabetic character. - -`ASCII_HEX_DIGIT` - -: a character matching the `[0-9A-Fa-f]` charclass. - -`BIDI_CONTROL` - -: a format control which have specific functions in the Bidi (bidirectional text) Algorithm. - -`BIDI_MIRRORED` - -: a character that may change display in right-to-left text. - -`DASH` - -: a kind of a dash character. - -`DEFAULT_IGNORABLE_CODE_POINT` - -: characters that are ignorable in most text processing activities, e.g., \<2060..206F, FFF0..FFFB, E0000..E0FFF\>. - -`DEPRECATED` - -: a deprecated character according to the current Unicode standard (the usage of deprecated characters is strongly discouraged). - -`DIACRITIC` - -: a character that linguistically modifies the meaning of another character to which it applies. - -`EXTENDER` - -: a character that extends the value or shape of a preceding alphabetic character, e.g., a length and iteration mark. 
- -`HEX_DIGIT` - -: a character commonly used for hexadecimal numbers, see also `ASCII_HEX_DIGIT`. - -`HYPHEN` - -: a dash used to mark connections between pieces of words, plus the Katakana middle dot. - -`ID_CONTINUE` - -: a character that can continue an identifier, `ID_START`+`Mn`+`Mc`+`Nd`+`Pc`. - -`ID_START` - -: a character that can start an identifier, `Lu`+`Ll`+`Lt`+`Lm`+`Lo`+`Nl`. - -`IDEOGRAPHIC` - -: a CJKV (Chinese-Japanese-Korean-Vietnamese) ideograph. - -`LOWERCASE` - -: \... - -`MATH` - -: \... - -`NONCHARACTER_CODE_POINT` - -: \... - -`QUOTATION_MARK` - -: \... - -`SOFT_DOTTED` - -: a character with a "soft dot", like i or j, such that an accent placed on this character causes the dot to disappear. - -`TERMINAL_PUNCTUATION` - -: a punctuation character that generally marks the end of textual units. - -`UPPERCASE` - -: \... - -`WHITE_SPACE` - -: a space character or TAB or CR or LF or ZWSP or ZWNBSP. - -`CASE_SENSITIVE` - -: \... - -`POSIX_ALNUM` - -: \... - -`POSIX_BLANK` - -: \... - -`POSIX_GRAPH` - -: \... - -`POSIX_PRINT` - -: \... - -`POSIX_XDIGIT` - -: \... - -`CASED` - -: \... - -`CASE_IGNORABLE` - -: \... - -`CHANGES_WHEN_LOWERCASED` - -: \... - -`CHANGES_WHEN_UPPERCASED` - -: \... - -`CHANGES_WHEN_TITLECASED` - -: \... - -`CHANGES_WHEN_CASEFOLDED` - -: \... - -`CHANGES_WHEN_CASEMAPPED` - -: \... - -`CHANGES_WHEN_NFKC_CASEFOLDED` - -: \... - -`EMOJI` - -: Since ICU 57 - -`EMOJI_PRESENTATION` - -: Since ICU 57 - -`EMOJI_MODIFIER` - -: Since ICU 57 - -`EMOJI_MODIFIER_BASE` - -: Since ICU 57 - -## POSIX Character Classes - -Avoid using POSIX character classes, e.g., `[:punct:]`. The ICU User Guide (see below) states that in general they are not well-defined, so you may end up with something different than you expect. - -In particular, in POSIX-like regex engines, `[:punct:]` stands for the character class corresponding to the `ispunct()` classification function (check out `man 3 ispunct` on UNIX-like systems). 
According to ISO/IEC 9899:1990 (ISO C90), the `ispunct()` function tests for any printing character except for space or a character for which `isalnum()` is true. However, in a POSIX setting, the details of which characters belong to which class depend on the current locale. So the `[:punct:]` class does not lead to portable code (again, in POSIX-like regex engines). - - Therefore, a POSIX flavor of `[:punct:]` is more like `[\p{P}\p{S}]` in ICU. You have been warned. - - ## Author(s) - - [Marek Gagolewski](https://www.gagolewski.com/) and other contributors - - ## References - - *The Unicode Character Database* -- Unicode Standard Annex #44, - - *UnicodeSet* -- ICU User Guide, - - *Properties* -- ICU User Guide, - - *C/POSIX Migration* -- ICU User Guide, - - *Unicode Script Data*, - - *icu::UnicodeSet Class Reference* -- ICU4C API Documentation, - - ## See Also - - The official online manual of stringi at - - Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - - Other search_charclass: [`about_search`](about_search.md), [`stri_trim_both()`](stri_trim.md) - - Other stringi_general_topics: [`about_arguments`](about_arguments.md), [`about_encoding`](about_encoding.md), [`about_locale`](about_locale.md), [`about_search_boundaries`](about_search_boundaries.md), [`about_search_coll`](about_search_coll.md), [`about_search_fixed`](about_search_fixed.md), [`about_search_regex`](about_search_regex.md), [`about_search`](about_search.md), [`about_stringi`](about_stringi.md) diff --git a/.devel/sphinx/rapi/about_search_coll.md b/.devel/sphinx/rapi/about_search_coll.md deleted file mode 100644 index ec3eafff..00000000 --- a/.devel/sphinx/rapi/about_search_coll.md +++ /dev/null @@ -1,37 +0,0 @@ -# about_search_coll: - -## Description - -String searching facilities described here provide a way to locate a specific piece of text.
Interestingly, locale-sensitive searching, especially in non-English text, is a much more complex process than it seems at first glance. - - ## Locale-Aware String Search Engine - - All `stri_*_coll` functions in stringi use ICU\'s `StringSearch` engine, which implements a locale-sensitive string search algorithm. The matches are defined by using the notion of "canonical equivalence" between strings. - - Tuning the Collator\'s parameters allows you to perform correct matching that properly takes into account accented letters, conjoined letters, ignorable punctuation, and letter case. - - For more information on ICU\'s Collator and the search engine and how to tune it up in stringi, refer to [`stri_opts_collator`](stri_opts_collator.md). - - Please note that ICU\'s `StringSearch`-based functions are often much slower than those that perform fixed-pattern searches. - - ## Author(s) - - [Marek Gagolewski](https://www.gagolewski.com/) and other contributors - - ## References - - *ICU String Search Service* -- ICU User Guide, - - L.
Werner, *Efficient Text Searching in Java*, 1999, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other search_coll: [`about_search`](about_search.md), [`stri_opts_collator()`](stri_opts_collator.md) - -Other locale_sensitive: [`%s<%()`](+25s+3C+25.md), [`about_locale`](about_locale.md), [`about_search_boundaries`](about_search_boundaries.md), [`stri_compare()`](stri_compare.md), [`stri_count_boundaries()`](stri_count_boundaries.md), [`stri_duplicated()`](stri_duplicated.md), [`stri_enc_detect2()`](stri_enc_detect2.md), [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_opts_collator()`](stri_opts_collator.md), [`stri_order()`](stri_order.md), [`stri_rank()`](stri_rank.md), [`stri_sort_key()`](stri_sort_key.md), [`stri_sort()`](stri_sort.md), [`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_trans_tolower()`](stri_trans_casemap.md), [`stri_unique()`](stri_unique.md), [`stri_wrap()`](stri_wrap.md) - -Other stringi_general_topics: [`about_arguments`](about_arguments.md), [`about_encoding`](about_encoding.md), [`about_locale`](about_locale.md), [`about_search_boundaries`](about_search_boundaries.md), [`about_search_charclass`](about_search_charclass.md), [`about_search_fixed`](about_search_fixed.md), [`about_search_regex`](about_search_regex.md), [`about_search`](about_search.md), [`about_stringi`](about_stringi.md) diff --git a/.devel/sphinx/rapi/about_search_fixed.md b/.devel/sphinx/rapi/about_search_fixed.md deleted file mode 100644 index 3ba2044d..00000000 --- a/.devel/sphinx/rapi/about_search_fixed.md +++ /dev/null @@ -1,37 +0,0 @@ -# about_search_fixed: - -## Description - -String searching facilities described here provide a way to locate a specific sequence of 
bytes in a string. The search engine\'s settings may be tuned up (for example, to perform a case-insensitive search) via a call to the [`stri_opts_fixed`](stri_opts_fixed.md) function. - - ## Byte Compare - - The fast Knuth-Morris-Pratt search algorithm, with worst-case time complexity of O(n+p) (`n == length(str)`, `p == length(pattern)`), is implemented (with some tweaks for very short search patterns). - - Be aware that, for natural language processing, fixed pattern searching might not be what you actually require. This is because a bitwise match will not give correct results in cases of: - - 1. accented letters; - - 2. conjoined letters; - - 3. ignorable punctuation; - - 4. ignorable case, - - see also [about_search_coll](about_search_coll.md). - - Note that the conversion of input data to Unicode is done as usual. - - ## Author(s) - - [Marek Gagolewski](https://www.gagolewski.com/) and other contributors - - ## See Also - - The official online manual of stringi at - - Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - - Other search_fixed: [`about_search`](about_search.md), [`stri_opts_fixed()`](stri_opts_fixed.md) - - Other stringi_general_topics: [`about_arguments`](about_arguments.md), [`about_encoding`](about_encoding.md), [`about_locale`](about_locale.md), [`about_search_boundaries`](about_search_boundaries.md), [`about_search_charclass`](about_search_charclass.md), [`about_search_coll`](about_search_coll.md), [`about_search_regex`](about_search_regex.md), [`about_search`](about_search.md), [`about_stringi`](about_stringi.md) diff --git a/.devel/sphinx/rapi/about_search_regex.md b/.devel/sphinx/rapi/about_search_regex.md deleted file mode 100644 index 0bb96fcf..00000000 --- a/.devel/sphinx/rapi/about_search_regex.md +++ /dev/null @@ -1,371 +0,0 @@ -# about_search_regex: - -## Description - -A regular expression is a pattern describing,
possibly in a very abstract way, a text fragment. With so many regex functions in stringi, regular expressions may be a very powerful tool for tasks such as string searching, substring extraction, and string splitting. - - ## Details - - All `stri_*_regex` functions in stringi use the ICU regex engine. Its settings may be tuned up (for example, to perform a case-insensitive search) via the [`stri_opts_regex`](stri_opts_regex.md) function. - - Regular expression patterns in ICU are quite similar in form and behavior to Perl\'s regexes. Their implementation is loosely inspired by JDK 1.4 `java.util.regex`. ICU Regular Expressions conform to the Unicode Technical Standard #18 (see the References section), and their features are summarized in the ICU User Guide (see below). A good general introduction to regexes is (Friedl, 2002). Some general topics are also covered in the **R** manual, see [regex](https://stat.ethz.ch/R-manual/R-devel/library/base/html/regex.html). - - ## ICU Regex Operators at a Glance - - Here is a list of operators provided by the ICU User Guide on regexes. - - `|` - - : Alternation. `A|B` matches either A or B. - - `*` - - : Match 0 or more times. Match as many times as possible. - - `+` - - : Match 1 or more times. Match as many times as possible. - - `?` - - : Match zero or one times. Prefer one. - - `{n}` - - : Match exactly n times. - - `{n,}` - - : Match at least n times. Match as many times as possible. - - `{n,m}` - - : Match between n and m times. Match as many times as possible, but not more than m. - - `*?` - - : Match 0 or more times. Match as few times as possible. - - `+?` - - : Match 1 or more times. Match as few times as possible. - - `??` - - : Match zero or one times. Prefer zero. - - `{n}?` - - : Match exactly n times. - - `{n,}?` - - : Match at least n times, but no more than required for an overall pattern match. - - `{n,m}?` - - : Match between n and m times. Match as few times as possible, but not less than n. - - `*+` - - : Match 0 or more times.
Match as many times as possible when first encountered; do not retry with fewer even if the overall match fails (possessive match). - - `++` - - : Match 1 or more times. Possessive match. - - `?+` - - : Match zero or one times. Possessive match. - - `{n}+` - - : Match exactly n times. - - `{n,}+` - - : Match at least n times. Possessive match. - - `{n,m}+` - - : Match between n and m times. Possessive match. - - `(...)` - - : Capturing parentheses. The range of input that matched the parenthesized sub-expression is available after the match, see [`stri_match`](stri_match.md). - - `(?:...)` - - : Non-capturing parentheses. Groups the included pattern, but does not provide capturing of matching text. Somewhat more efficient than capturing parentheses. - - `(?>...)` - - : Atomic-match parentheses. The first match of the parenthesized sub-expression is the only one tried; if it does not lead to an overall pattern match, back up the search for a match to a position before the `(?>`. - - `(?#...)` - - : Free-format comment `(?# comment )`. - - `(?=...)` - - : Look-ahead assertion. True if the parenthesized pattern matches at the current input position, but does not advance the input position. - - `(?!...)` - - : Negative look-ahead assertion. True if the parenthesized pattern does not match at the current input position. Does not advance the input position. - - `(?<=...)` - - : Look-behind assertion. True if the parenthesized pattern matches text preceding the current input position, with the last character of the match being the input character just before the current position. Does not alter the input position. The length of possible strings matched by the look-behind pattern must not be unbounded (no `*` or `+` operators). - - `(?<name>...)` - - : Named capture group, where `name` (enclosed within the angle brackets) is a sequence like `[A-Za-z][A-Za-z0-9]*`. - - `(?ismwx-ismwx:...)` - - : Flag settings.
Evaluate the parenthesized expression with the specified flags enabled, or disabled if they are preceded by `-`; see also [`stri_opts_regex`](stri_opts_regex.md). - - `(?ismwx-ismwx)` - - : Flag settings. Change the flag settings. Changes apply to the portion of the pattern following the setting. For example, `(?i)` changes to a case-insensitive match; see also [`stri_opts_regex`](stri_opts_regex.md). - - ## ICU Regex Meta-characters at a Glance - - Here is a list of meta-characters provided by the ICU User Guide on regexes. - - `\a` - - : Match a BELL, `\u0007`. - - `\A` - - : Match at the beginning of the input. Differs from `^` in that `\A` will not match after a new line within the input. - - `\b` - - : Match if the current position is a word boundary. Boundaries occur at the transitions between word (`\w`) and non-word (`\W`) characters, with combining marks ignored. For better word boundaries, see ICU Boundary Analysis, e.g., [`stri_extract_all_words`](stri_extract_boundaries.md). - - `\B` - - : Match if the current position is not a word boundary. - - `\cX` - - : Match a control-`X` character. - - `\d` - - : Match any character with the Unicode General Category of `Nd` (Number, Decimal Digit). - - `\D` - - : Match any character that is not a decimal digit. - - `\e` - - : Match an ESCAPE, `\u001B`. - - `\E` - - : Terminates a `\Q` \... `\E` quoted sequence. - - `\f` - - : Match a FORM FEED, `\u000C`. - - `\G` - - : Match if the current position is at the end of the previous match. - - `\h` - - : Match a Horizontal White Space character. They are characters with Unicode General Category of Space_Separator plus the ASCII tab, `\u0009`. \[Since ICU 55\] - - `\H` - - : Match a non-Horizontal White Space character. \[Since ICU 55\] - - `\k<name>` - - : Named Capture Back Reference. \[Since ICU 55\] - - `\n` - - : Match a LINE FEED, `\u000A`. - - `\N{UNICODE CHARACTER NAME}` - - : Match the named character. - - `\p{UNICODE PROPERTY NAME}` - - : Match any character with the specified Unicode Property.
- - `\P{UNICODE PROPERTY NAME}` - - : Match any character not having the specified Unicode Property. - - `\Q` - - : Quotes all following characters until `\E`. - - `\r` - - : Match a CARRIAGE RETURN, `\u000D`. - - `\s` - - : Match a white space character. White space is defined as `[\t\n\f\r\p{Z}]`. - - `\S` - - : Match a non-white space character. - - `\t` - - : Match a HORIZONTAL TABULATION, `\u0009`. - - `\uhhhh` - - : Match the character with the hex value `hhhh`. - - `\Uhhhhhhhh` - - : Match the character with the hex value `hhhhhhhh`. Exactly eight hex digits must be provided, even though the largest Unicode code point is `\U0010ffff`. - - `\w` - - : Match a word character. Word characters are `[\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\u200c\u200d]`. - - `\W` - - : Match a non-word character. - - `\x{hhhh}` - - : Match the character with hex value hhhh. From one to six hex digits may be supplied. - - `\xhh` - - : Match the character with the two-digit hex value hh. - - `\X` - - : Match a Grapheme Cluster. - - `\Z` - - : Match if the current position is at the end of input, but before the final line terminator, if one exists. - - `\z` - - : Match if the current position is at the end of input. - - `\n` - - : Back Reference. Match whatever the nth capturing group matched. n must be a number between 1 and the total number of capture groups in the pattern. - - `\0ooo` - - : Match an Octal character. `'ooo'` is from one to three octal digits. 0377 is the largest allowed Octal character. The leading zero is required; it distinguishes Octal constants from back references. - - `[pattern]` - - : Match any one character from the set. - - `.` - - : Match any character except, by default, a newline; compare [`stri_opts_regex`](stri_opts_regex.md). - - `^` - - : Match at the beginning of a line. - - `$` - - : Match at the end of a line. - - `\` - - : \[outside of sets\] Quotes the following character. Characters that must be quoted to be treated as literals are `* ? + [ ( ) { } ^ $ | \ .`.
- - `\` - - : \[inside sets\] Quotes the following character. Characters that must be quoted to be treated as literals are `[ ] \`; characters that may need to be quoted, depending on the context, are `- &`. - - ## Character Classes - - The syntax is similar to, but not 100% compatible with, the one described in [about_search_charclass](about_search_charclass.md). In particular, whitespace is not ignored and set-theoretic operations are denoted slightly differently. Other than that, [about_search_charclass](about_search_charclass.md) is a good reference on the capabilities offered. - - The ICU User Guide on regexes lists the following examples. - - `[abc]` - - : Match any of the characters a, b, or c - - `[^abc]` - - : Negation -- match any character except a, b, or c - - `[A-M]` - - : Range -- match any character from A to M (based on Unicode code point ordering) - - `[\p{L}]`, `[\p{Letter}]`, `[\p{General_Category=Letter}]`, `[:letter:]` - - : Characters with Unicode Category = Letter (4 equivalent forms) - - `[\P{Letter}]` - - : Negated property -- match everything except Letters - - `[\p{numeric_value=9}]` - - : Match all numbers with a numeric value of 9 - - `[\p{Letter}&&\p{script=cyrillic}]` - - : Intersection; match the set of all Cyrillic letters - - `[\p{Letter}--\p{script=latin}]` - - : Set difference; match all non-Latin letters - - `[[a-z][A-Z][0-9]]`, `[a-zA-Z0-9]` - - : Union; match ASCII letters and digits (2 equivalent forms) - - ## Regex Functions in stringi - - Note that if a given regex `pattern` is empty, then all the functions in stringi return `NA` and generate a warning. On a syntax error, a fairly informative error message is shown. - - If you wish to search for a fixed pattern, refer to [about_search_coll](about_search_coll.md) or [about_search_fixed](about_search_fixed.md). They allow for a locale-aware text lookup or a very fast exact-byte search, respectively.
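Several of the operators listed above can be seen in action in the following minimal sketch. It assumes stringi is installed, and the input strings are made up for illustration. It contrasts a greedy quantifier with a possessive one, uses a look-ahead assertion, and reads capture groups from the matrix returned by `stri_match_first_regex()`.

```r
library(stringi)

# Greedy `a*` can give back one "a" so that the final `a` matches;
# possessive `a*+` never gives characters back, so no match is found
stri_detect_regex("aaa", "a*a")   # TRUE
stri_detect_regex("aaa", "a*+a")  # FALSE

# Look-ahead: letters followed by at least two digits
# (the digits themselves are not part of the match)
stri_extract_all_regex("a1 b22 c333", "\\p{L}(?=\\d{2,})")
# list(c("b", "c"))

# Capturing parentheses: column 1 holds the whole match,
# followed by one column per capture group
m <- stri_match_first_regex("key=value", "(\\w+)=(\\w+)")
m[1, 2]  # "key"
m[1, 3]  # "value"
```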
- - ## Author(s) - - [Marek Gagolewski](https://www.gagolewski.com/) and other contributors - - ## References - - *Regular expressions* -- ICU User Guide, - - J.E.F. Friedl, *Mastering Regular Expressions*, O\'Reilly, 2002 - - *Unicode Regular Expressions* -- Unicode Technical Standard #18, - - *Unicode Regular Expressions* -- Regex tutorial, - - ## See Also - - The official online manual of stringi at - - Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - - Other search_regex: [`about_search`](about_search.md), [`stri_opts_regex()`](stri_opts_regex.md) - - Other stringi_general_topics: [`about_arguments`](about_arguments.md), [`about_encoding`](about_encoding.md), [`about_locale`](about_locale.md), [`about_search_boundaries`](about_search_boundaries.md), [`about_search_charclass`](about_search_charclass.md), [`about_search_coll`](about_search_coll.md), [`about_search_fixed`](about_search_fixed.md), [`about_search`](about_search.md), [`about_stringi`](about_stringi.md) diff --git a/.devel/sphinx/rapi/about_stringi.md b/.devel/sphinx/rapi/about_stringi.md deleted file mode 100644 index d140d0cd..00000000 --- a/.devel/sphinx/rapi/about_stringi.md +++ /dev/null @@ -1,89 +0,0 @@ -# about_stringi: Fast and Portable Character String Processing in R - - ## Description - - stringi is THE R package for fast, correct, consistent, and convenient string/text manipulation. It gives predictable results on every platform, in each locale, and under any native character encoding. - - **Keywords**: R, text processing, character strings, internationalization, localization, ICU, ICU4C, i18n, l10n, Unicode. - - **Homepage**: - - **License**: The BSD-3-clause license for the package code, the ICU license for the accompanying ICU4C distribution, and the UCD license for the Unicode Character Database. See the COPYRIGHTS and LICENSE files for more details.
- - ## Details - - Manual pages on general topics: - - - [about_encoding](about_encoding.md) -- character encoding issues, including information on encoding management in stringi, as well as on encoding detection and conversion. - - - [about_locale](about_locale.md) -- locale issues, including locale management and specification in stringi, and the list of locale-sensitive operations. In particular, see [`stri_opts_collator`](stri_opts_collator.md) for a description of the string collation algorithm, which is used for string comparing, ordering, ranking, sorting, case-folding, and searching. - - - [about_arguments](about_arguments.md) -- information on how stringi handles the arguments passed to its functions. - - ## Facilities available - - Refer to the following: - - - [about_search](about_search.md) for string searching facilities; these include pattern searching, matching, string splitting, and so on. The following independent search engines are provided: - - - [about_search_regex](about_search_regex.md) -- with ICU (Java-like) regular expressions, - - - [about_search_fixed](about_search_fixed.md) -- fast, locale-independent, byte-wise pattern matching, - - - [about_search_coll](about_search_coll.md) -- locale-aware pattern matching for natural language processing tasks, - - - [about_search_charclass](about_search_charclass.md) -- seeking elements of particular character classes, like "all whitespaces" or "all digits", - - - [about_search_boundaries](about_search_boundaries.md) -- text boundary analysis. - - - [`stri_datetime_format`](stri_datetime_format.md) for date/time formatting and parsing. Also refer to the links therein for other date-, time-, and time-zone-related operations. - - - [`stri_stats_general`](stri_stats_general.md) and [`stri_stats_latex`](stri_stats_latex.md) for gathering some fancy statistics on a character vector\'s contents.
- - - [`stri_join`](stri_join.md), [`stri_dup`](stri_dup.md), [`%s+%`](+25s+2B+25.md), and [`stri_flatten`](stri_flatten.md) for concatenation-based operations. - - - [`stri_sub`](stri_sub.md) for extracting and replacing substrings, and [`stri_reverse`](stri_reverse.md), a joyful function that reverses all code points in a string. - - - [`stri_length`](stri_length.md) (among others) for determining the number of code points in a string. See also [`stri_count_boundaries`](stri_count_boundaries.md) for counting the number of Unicode characters and [`stri_width`](stri_width.md) for approximating the width of a string. - - - [`stri_trim`](stri_trim.md) (among others) for trimming characters from the beginning and/or end of a string (see also [about_search_charclass](about_search_charclass.md)), and [`stri_pad`](stri_pad.md) for padding strings so that they are of the same width. Additionally, [`stri_wrap`](stri_wrap.md) wraps text into lines. - - - [`stri_trans_tolower`](stri_trans_casemap.md) (among others) for case mapping, i.e., conversion to lower, UPPER, or Title Case, [`stri_trans_nfc`](stri_trans_nf.md) (among others) for Unicode normalization, [`stri_trans_char`](stri_trans_char.md) for translating individual code points, and [`stri_trans_general`](stri_trans_general.md) for other universal text transforms, including transliteration. - - - [`stri_cmp`](stri_compare.md), [`%s<%`](+25s+3C+25.md), [`stri_order`](stri_order.md), [`stri_sort`](stri_sort.md), [`stri_rank`](stri_rank.md), [`stri_unique`](stri_unique.md), and [`stri_duplicated`](stri_duplicated.md) for collation-based, locale-aware operations; see also [about_locale](about_locale.md). - - - [`stri_split_lines`](stri_split_lines.md) (among others) to split a string into text lines. - - - [`stri_escape_unicode`](stri_escape_unicode.md) (among others) for escaping some code points.
- -- [`stri_rand_strings`](stri_rand_strings.md), [`stri_rand_shuffle`](stri_rand_shuffle.md), and [`stri_rand_lipsum`](stri_rand_lipsum.md) for generating (pseudo)random strings. - -- [`stri_read_raw`](stri_read_raw.md), [`stri_read_lines`](stri_read_lines.md), and [`stri_write_lines`](stri_write_lines.md) for reading and writing text files. - -Note that each man page provides many further links to other interesting facilities and topics. - -## Author(s) - -Marek Gagolewski, with contributions from Bartek Tartanus and many others. ICU4C was developed by IBM, Unicode, Inc., and others. - -## References - -*stringi Package Homepage*, - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -*ICU -- International Components for Unicode*, - -*ICU4C API Documentation*, - -*The Unicode Consortium*, - -*UTF-8, A Transformation Format of ISO 10646* -- RFC 3629, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other stringi_general_topics: [`about_arguments`](about_arguments.md), [`about_encoding`](about_encoding.md), [`about_locale`](about_locale.md), [`about_search_boundaries`](about_search_boundaries.md), [`about_search_charclass`](about_search_charclass.md), [`about_search_coll`](about_search_coll.md), [`about_search_fixed`](about_search_fixed.md), [`about_search_regex`](about_search_regex.md), [`about_search`](about_search.md) diff --git a/.devel/sphinx/rapi/operator_add.md b/.devel/sphinx/rapi/operator_add.md deleted file mode 100644 index dc8d7e4b..00000000 --- a/.devel/sphinx/rapi/operator_add.md +++ /dev/null @@ -1,63 +0,0 @@ -# operator_add: Concatenate Two Character Vectors - -## Description - -Binary operators 
for joining (concatenating) two character vectors, with a typical R look-and-feel. - -## Usage - -``` r -e1 %s+% e2 - -e1 %stri+% e2 -``` - -## Arguments - -| | | -|------|-----------------------------------------------------------------| -| `e1` | a character vector or an object coercible to a character vector | -| `e2` | a character vector or an object coercible to a character vector | - -## Details - -Vectorized over `e1` and `e2`. - -These operators act like a call to [`stri_join(e1, e2, sep='')`](stri_join.md). However, note that joining 3 vectors, e.g., `e1 %s+% e2 %s+% e3` is slower than [`stri_join(e1, e2, e3, sep='')`](stri_join.md), because it creates a new (temporary) result vector each time the operator is applied. - -## Value - -Returns a character vector. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other join: [`stri_dup()`](stri_dup.md), [`stri_flatten()`](stri_flatten.md), [`stri_join_list()`](stri_join_list.md), [`stri_join()`](stri_join.md) - -## Examples - - - - -```r -c('abc', '123', 'xy') %s+% letters[1:6] -``` - -``` -## [1] "abca" "123b" "xyc" "abcd" "123e" "xyf" -``` - -```r -'ID_' %s+% 1:5 -``` - -``` -## [1] "ID_1" "ID_2" "ID_3" "ID_4" "ID_5" -``` diff --git a/.devel/sphinx/rapi/operator_compare.md b/.devel/sphinx/rapi/operator_compare.md deleted file mode 100644 index efe889de..00000000 --- a/.devel/sphinx/rapi/operator_compare.md +++ /dev/null @@ -1,92 +0,0 @@ -# operator_compare: Compare Strings with or without Collation - -## Description - -Relational operators for comparing corresponding strings in two character vectors, with a typical R look-and-feel. 
- -## Usage - -``` r -e1 %s<% e2 - -e1 %s<=% e2 - -e1 %s>% e2 - -e1 %s>=% e2 - -e1 %s==% e2 - -e1 %s!=% e2 - -e1 %s===% e2 - -e1 %s!==% e2 - -e1 %stri<% e2 - -e1 %stri<=% e2 - -e1 %stri>% e2 - -e1 %stri>=% e2 - -e1 %stri==% e2 - -e1 %stri!=% e2 - -e1 %stri===% e2 - -e1 %stri!==% e2 -``` - -## Arguments - -| | | -|------------|-------------------------------------------------------------| -| `e1`, `e2` | character vectors or objects coercible to character vectors | - -## Details - -These functions call [`stri_cmp_le`](stri_compare.md) or its friends, using the default collator options. As a consequence, they are vectorized over `e1` and `e2`. - -`%stri==%` tests for canonical equivalence of strings (see [`stri_cmp_equiv`](stri_compare.md)) and is a locale-dependent operation. - -`%stri===%` performs a locale-independent, code point-based comparison. - -## Value - -All the functions return a logical vector indicating the result of a pairwise comparison. As usual, the elements of shorter vectors are recycled if necessary. 
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other locale_sensitive: [`about_locale`](about_locale.md), [`about_search_boundaries`](about_search_boundaries.md), [`about_search_coll`](about_search_coll.md), [`stri_compare()`](stri_compare.md), [`stri_count_boundaries()`](stri_count_boundaries.md), [`stri_duplicated()`](stri_duplicated.md), [`stri_enc_detect2()`](stri_enc_detect2.md), [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_opts_collator()`](stri_opts_collator.md), [`stri_order()`](stri_order.md), [`stri_rank()`](stri_rank.md), [`stri_sort_key()`](stri_sort_key.md), [`stri_sort()`](stri_sort.md), [`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_trans_tolower()`](stri_trans_casemap.md), [`stri_unique()`](stri_unique.md), [`stri_wrap()`](stri_wrap.md) - -## Examples - - - - -```r -'a' %stri<% 'b' -``` - -``` -## [1] TRUE -``` - -```r -c('a', 'b', 'c') %stri>=% 'b' -``` - -``` -## [1] FALSE TRUE TRUE -``` diff --git a/.devel/sphinx/rapi/operator_dollar.md b/.devel/sphinx/rapi/operator_dollar.md deleted file mode 100644 index 1a7d361c..00000000 --- a/.devel/sphinx/rapi/operator_dollar.md +++ /dev/null @@ -1,108 +0,0 @@ -# operator_dollar: as a Binary Operator - -## Description - -Provides access to [`stri_sprintf`](stri_sprintf.md) in form of a binary operator in a way similar to Python\'s `%` overloaded for strings. - -Missing values and empty vectors are propagated as usual. 
- -## Usage - -``` r -e1 %s$% e2 - -e1 %stri$% e2 -``` - -## Arguments - -| | | -|------|------------------------------------------------------------------------------------------------------| -| `e1` | format strings, see [`stri_sprintf`](stri_sprintf.md) for syntax | -| `e2` | a list of atomic vectors to be passed to [`stri_sprintf`](stri_sprintf.md) or a single atomic vector | - -## Details - -Vectorized over `e1` and `e2`. - -`e1 %s$% atomic_vector` is equivalent to `e1 %s$% list(atomic_vector)`. - -## Value - -Returns a character vector. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other length: [`stri_isempty()`](stri_isempty.md), [`stri_length()`](stri_length.md), [`stri_numbytes()`](stri_numbytes.md), [`stri_pad_both()`](stri_pad.md), [`stri_sprintf()`](stri_sprintf.md), [`stri_width()`](stri_width.md) - -## Examples - - - - -```r -"value='%d'" %s$% 3 -``` - -``` -## [1] "value='3'" -``` - -```r -"value='%d'" %s$% 1:3 -``` - -``` -## [1] "value='1'" "value='2'" "value='3'" -``` - -```r -"%s='%d'" %s$% list("value", 3) -``` - -``` -## [1] "value='3'" -``` - -```r -"%s='%d'" %s$% list("value", 1:3) -``` - -``` -## [1] "value='1'" "value='2'" "value='3'" -``` - -```r -"%s='%d'" %s$% list(c("a", "b", "c"), 1) -``` - -``` -## [1] "a='1'" "b='1'" "c='1'" -``` - -```r -"%s='%d'" %s$% list(c("a", "b", "c"), 1:3) -``` - -``` -## [1] "a='1'" "b='2'" "c='3'" -``` - -```r -x <- c("abcd", "\u00DF\u00B5\U0001F970", "abcdef") -cat("[%6s]" %s$% x, sep="\n") # width used, not the number of bytes -``` - -``` -## [ abcd] -## [ ßµ🥰] -## [abcdef] -``` diff --git a/.devel/sphinx/rapi/stri_compare.md b/.devel/sphinx/rapi/stri_compare.md deleted file mode 100644 index 
4b93fbdb..00000000 --- a/.devel/sphinx/rapi/stri_compare.md +++ /dev/null @@ -1,200 +0,0 @@ -# stri_compare: Compare Strings with or without Collation - -## Description - -These functions may be used to determine if two strings are equal, canonically equivalent (this is performed in a much more clever fashion than when testing for equality), or to check whether they are in a specific lexicographic order. - -## Usage - -``` r -stri_compare(e1, e2, ..., opts_collator = NULL) - -stri_cmp(e1, e2, ..., opts_collator = NULL) - -stri_cmp_eq(e1, e2) - -stri_cmp_neq(e1, e2) - -stri_cmp_equiv(e1, e2, ..., opts_collator = NULL) - -stri_cmp_nequiv(e1, e2, ..., opts_collator = NULL) - -stri_cmp_lt(e1, e2, ..., opts_collator = NULL) - -stri_cmp_gt(e1, e2, ..., opts_collator = NULL) - -stri_cmp_le(e1, e2, ..., opts_collator = NULL) - -stri_cmp_ge(e1, e2, ..., opts_collator = NULL) -``` - -## Arguments - -| | | -|-----------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `e1`, `e2` | character vectors or objects coercible to character vectors | -| `...` | additional settings for `opts_collator` | -| `opts_collator` | a named list with ICU Collator\'s options, see [`stri_opts_collator`](stri_opts_collator.md), `NULL` for the default collation options. | - -## Details - -All the functions listed here are vectorized over `e1` and `e2`. - -`stri_cmp_eq` tests whether two corresponding strings consist of exactly the same code points, while `stri_cmp_neq` allows to check whether there is any difference between them. These are locale-independent operations: for natural language processing, where the notion of canonical equivalence is more valid, this might not be exactly what you are looking for, see Examples. 
Please note that stringi always silently removes UTF-8 BOMs from input strings, therefore, e.g., `stri_cmp_eq` does not take BOMs into account while comparing strings. - -`stri_cmp_equiv` tests for canonical equivalence of two strings and is locale-dependent. Additionally, the ICU\'s Collator may be tuned up so that, e.g., the comparison is case-insensitive. To test whether two strings are not canonically equivalent, call `stri_cmp_nequiv`. - -`stri_cmp_le` tests whether the elements in the first vector are less than or equal to the corresponding elements in the second vector, `stri_cmp_ge` tests whether they are greater or equal, `stri_cmp_lt` if less, and `stri_cmp_gt` if greater, see also, e.g., [`%s<%`](+25s+3C+25.md). - -`stri_compare` is an alias to `stri_cmp`. They both perform exactly the same locale-dependent operation. Both functions provide a C library\'s `strcmp()` look-and-feel, see Value for details. - -For more information on ICU\'s Collator and how to tune its settings refer to [`stri_opts_collator`](stri_opts_collator.md). Note that different locale settings may lead to different results (see the examples below). - -## Value - -The `stri_cmp` and `stri_compare` functions return an integer vector representing the comparison results: `-1` if `e1[...] < e2[...]`, `0` if they are canonically equivalent, and `1` if greater. - -All the other functions return a logical vector that indicates whether a given relation holds between two corresponding elements in `e1` and `e2`. 
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## References - -*Collation* -- ICU User Guide, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other locale_sensitive: [`%s<%()`](+25s+3C+25.md), [`about_locale`](about_locale.md), [`about_search_boundaries`](about_search_boundaries.md), [`about_search_coll`](about_search_coll.md), [`stri_count_boundaries()`](stri_count_boundaries.md), [`stri_duplicated()`](stri_duplicated.md), [`stri_enc_detect2()`](stri_enc_detect2.md), [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_opts_collator()`](stri_opts_collator.md), [`stri_order()`](stri_order.md), [`stri_rank()`](stri_rank.md), [`stri_sort_key()`](stri_sort_key.md), [`stri_sort()`](stri_sort.md), [`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_trans_tolower()`](stri_trans_casemap.md), [`stri_unique()`](stri_unique.md), [`stri_wrap()`](stri_wrap.md) - -## Examples - - - - -```r -# in Polish, ch < h: -stri_cmp_lt('hladny', 'chladny', locale='pl_PL') -``` - -``` -## [1] FALSE -``` - -```r -# in Slovak, ch > h: -stri_cmp_lt('hladny', 'chladny', locale='sk_SK') -``` - -``` -## [1] TRUE -``` - -```r -# < or > (depends on locale): -stri_cmp('hladny', 'chladny') -``` - -``` -## [1] 1 -``` - -```r -# ignore case differences: -stri_cmp_equiv('hladny', 'HLADNY', strength=2) -``` - -``` -## [1] TRUE -``` - -```r -# also ignore diacritical differences: -stri_cmp_equiv('hladn\u00FD', 'hladny', strength=1, locale='sk_SK') -``` - -``` -## [1] TRUE -``` - -```r -marios <- c('Mario', 'mario', 'M\\u00e1rio', 'm\\u00e1rio') -stri_cmp_equiv(marios, 'mario', case_level=TRUE, strength=2L) -``` - -``` -## [1] FALSE TRUE FALSE FALSE -``` - -```r 
-stri_cmp_equiv(marios, 'mario', case_level=TRUE, strength=1L) -``` - -``` -## [1] FALSE TRUE FALSE FALSE -``` - -```r -stri_cmp_equiv(marios, 'mario', strength=1L) -``` - -``` -## [1] TRUE TRUE FALSE FALSE -``` - -```r -stri_cmp_equiv(marios, 'mario', strength=2L) -``` - -``` -## [1] TRUE TRUE FALSE FALSE -``` - -```r -# non-Unicode-normalized vs normalized string: -stri_cmp_equiv(stri_trans_nfkd('\u0105'), '\u105') -``` - -``` -## [1] TRUE -``` - -```r -# note the difference: -stri_cmp_eq(stri_trans_nfkd('\u0105'), '\u105') -``` - -``` -## [1] FALSE -``` - -```r -# ligatures: -stri_cmp_equiv('\ufb00', 'ff', strength=2) -``` - -``` -## [1] TRUE -``` - -```r -# phonebook collation -stri_cmp_equiv('G\u00e4rtner', 'Gaertner', locale='de_DE@collation=phonebook', strength=1L) -``` - -``` -## [1] TRUE -``` - -```r -stri_cmp_equiv('G\u00e4rtner', 'Gaertner', locale='de_DE', strength=1L) -``` - -``` -## [1] FALSE -``` diff --git a/.devel/sphinx/rapi/stri_count.md b/.devel/sphinx/rapi/stri_count.md deleted file mode 100644 index e9fbf42f..00000000 --- a/.devel/sphinx/rapi/stri_count.md +++ /dev/null @@ -1,187 +0,0 @@ -# stri_count: Count the Number of Pattern Occurrences - -## Description - -These functions count the number of occurrences of a pattern in a string. 
- -## Usage - -``` r -stri_count(str, ..., regex, fixed, coll, charclass) - -stri_count_charclass(str, pattern) - -stri_count_coll(str, pattern, ..., opts_collator = NULL) - -stri_count_fixed(str, pattern, ..., opts_fixed = NULL) - -stri_count_regex(str, pattern, ..., opts_regex = NULL) -``` - -## Arguments - -| | | -|--------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `str` | character vector; strings to search in | -| `...` | supplementary arguments passed to the underlying functions, including additional settings for `opts_collator`, `opts_regex`, `opts_fixed`, and so on | -| `pattern`, `regex`, `fixed`, `coll`, `charclass` | character vector; search patterns; for more details refer to [stringi-search](about_search.md) | -| `opts_collator`, `opts_fixed`, `opts_regex` | a named list used to tune up the search engine\'s settings; see [`stri_opts_collator`](stri_opts_collator.md), [`stri_opts_fixed`](stri_opts_fixed.md), and [`stri_opts_regex`](stri_opts_regex.md), respectively; `NULL` for the defaults | - -## Details - -Vectorized over `str` and `pattern` (with recycling of the elements in the shorter vector if necessary). This allows to, for instance, search for one pattern in each given string, search for each pattern in one given string, and search for the i-th pattern within the i-th string. - -If `pattern` is empty, then the result is `NA` and a warning is generated. - -`stri_count` is a convenience function. It calls either `stri_count_regex`, `stri_count_fixed`, `stri_count_coll`, or `stri_count_charclass`, depending on the argument used. - -## Value - -All the functions return an integer vector. 
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other search_count: [`about_search`](about_search.md), [`stri_count_boundaries()`](stri_count_boundaries.md) - -## Examples - - - - -```r -s <- 'Lorem ipsum dolor sit amet, consectetur adipisicing elit.' -stri_count(s, fixed='dolor') -``` - -``` -## [1] 1 -``` - -```r -stri_count(s, regex='\\p{L}+') -``` - -``` -## [1] 8 -``` - -```r -stri_count_fixed(s, ' ') -``` - -``` -## [1] 7 -``` - -```r -stri_count_fixed(s, 'o') -``` - -``` -## [1] 4 -``` - -```r -stri_count_fixed(s, 'it') -``` - -``` -## [1] 2 -``` - -```r -stri_count_fixed(s, letters) -``` - -``` -## [1] 2 0 3 2 5 0 1 0 7 0 0 2 3 2 4 2 0 3 4 5 2 0 0 0 0 0 -``` - -```r -stri_count_fixed('babab', 'b') -``` - -``` -## [1] 3 -``` - -```r -stri_count_fixed(c('stringi', '123'), 'string') -``` - -``` -## [1] 1 0 -``` - -```r -stri_count_charclass(c('stRRRingi', 'STrrrINGI', '123'), - c('\\p{Ll}', '\\p{Lu}', '\\p{Zs}')) -``` - -``` -## [1] 6 6 0 -``` - -```r -stri_count_charclass(' \t\n', '\\p{WHITE_SPACE}') # white space - binary property -``` - -``` -## [1] 3 -``` - -```r -stri_count_charclass(' \t\n', '\\p{Z}') # white-space - general category (note the difference) -``` - -``` -## [1] 1 -``` - -```r -stri_count_regex(s, '(s|el)it') -``` - -``` -## [1] 2 -``` - -```r -stri_count_regex(s, 'i.i') -``` - -``` -## [1] 2 -``` - -```r -stri_count_regex(s, '.it') -``` - -``` -## [1] 2 -``` - -```r -stri_count_regex('bab baab baaab', c('b.*?b', 'b.b')) -``` - -``` -## [1] 3 2 -``` - -```r -stri_count_regex(c('stringi', '123'), '^(s|1)') -``` - -``` -## [1] 1 1 -``` diff --git a/.devel/sphinx/rapi/stri_count_boundaries.md b/.devel/sphinx/rapi/stri_count_boundaries.md deleted 
file mode 100644 index 4833569c..00000000 --- a/.devel/sphinx/rapi/stri_count_boundaries.md +++ /dev/null @@ -1,117 +0,0 @@ -# stri_count_boundaries: Count the Number of Text Boundaries - -## Description - -These functions determine the number of text boundaries (like character, word, line, or sentence boundaries) in a string. - -## Usage - -``` r -stri_count_boundaries(str, ..., opts_brkiter = NULL) - -stri_count_words(str, locale = NULL) -``` - -## Arguments - -| | | -|----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `str` | character vector or an object coercible to | -| `...` | additional settings for `opts_brkiter` | -| `opts_brkiter` | a named list with ICU BreakIterator\'s settings, see [`stri_opts_brkiter`](stri_opts_brkiter.md); `NULL` for the default break iterator, i.e., `line_break` | -| `locale` | `NULL` or `''` for text boundary analysis following the conventions of the default locale, or a single string with locale identifier, see [stringi-locale](about_locale.md) | - -## Details - -Vectorized over `str`. - -For more information on text boundary analysis performed by ICU\'s `BreakIterator`, see [stringi-search-boundaries](about_search_boundaries.md). - -In case of `stri_count_words`, just like in [`stri_extract_all_words`](stri_extract_boundaries.md) and [`stri_locate_all_words`](stri_locate_boundaries.md), ICU\'s word `BreakIterator` iterator is used to locate the word boundaries, and all non-word characters (`UBRK_WORD_NONE` rule status) are ignored. This function is equivalent to a call to [`stri_count_boundaries(str, type='word', skip_word_none=TRUE, locale=locale)`](stri_count_boundaries.md). - -Note that a `BreakIterator` of type `character` may be used to count the number of *Unicode characters* in a string. 
The [`stri_length`](stri_length.md) function, which aims to count the number of *Unicode code points*, might report different results. - -Moreover, a `BreakIterator` of type `sentence` may be used to count the number of sentences in a text piece. - -## Value - -Both functions return an integer vector. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other search_count: [`about_search`](about_search.md), [`stri_count()`](stri_count.md) - -Other locale_sensitive: [`%s<%()`](+25s+3C+25.md), [`about_locale`](about_locale.md), [`about_search_boundaries`](about_search_boundaries.md), [`about_search_coll`](about_search_coll.md), [`stri_compare()`](stri_compare.md), [`stri_duplicated()`](stri_duplicated.md), [`stri_enc_detect2()`](stri_enc_detect2.md), [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_opts_collator()`](stri_opts_collator.md), [`stri_order()`](stri_order.md), [`stri_rank()`](stri_rank.md), [`stri_sort_key()`](stri_sort_key.md), [`stri_sort()`](stri_sort.md), [`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_trans_tolower()`](stri_trans_casemap.md), [`stri_unique()`](stri_unique.md), [`stri_wrap()`](stri_wrap.md) - -Other text_boundaries: [`about_search_boundaries`](about_search_boundaries.md), [`about_search`](about_search.md), [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_opts_brkiter()`](stri_opts_brkiter.md), [`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_split_lines()`](stri_split_lines.md), [`stri_trans_tolower()`](stri_trans_casemap.md), 
[`stri_wrap()`](stri_wrap.md) - -## Examples - - - - -```r -test <- 'The\u00a0above-mentioned features are very useful. Spam, spam, eggs, bacon, and spam.' -stri_count_boundaries(test, type='word') -``` - -``` -## [1] 31 -``` - -```r -stri_count_boundaries(test, type='sentence') -``` - -``` -## [1] 2 -``` - -```r -stri_count_boundaries(test, type='character') -``` - -``` -## [1] 83 -``` - -```r -stri_count_words(test) -``` - -``` -## [1] 13 -``` - -```r -test2 <- stri_trans_nfkd('\u03c0\u0153\u0119\u00a9\u00df\u2190\u2193\u2192') -stri_count_boundaries(test2, type='character') -``` - -``` -## [1] 8 -``` - -```r -stri_length(test2) -``` - -``` -## [1] 9 -``` - -```r -stri_numbytes(test2) -``` - -``` -## [1] 20 -``` diff --git a/.devel/sphinx/rapi/stri_datetime_add.md b/.devel/sphinx/rapi/stri_datetime_add.md deleted file mode 100644 index f5a63b4c..00000000 --- a/.devel/sphinx/rapi/stri_datetime_add.md +++ /dev/null @@ -1,104 +0,0 @@ -# stri_datetime_add: Date and Time Arithmetic - -## Description - -Modifies a date-time object by adding a specific amount of time units. 
- -## Usage - -``` r -stri_datetime_add( - time, - value = 1L, - units = "seconds", - tz = NULL, - locale = NULL -) - -stri_datetime_add(time, units = "seconds", tz = NULL, locale = NULL) <- value -``` - -## Arguments - -| | | -|----------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `time` | an object of class [`POSIXct`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/DateTimeClasses.html) (`as.POSIXct` will be called on character vectors and objects of class `POSIXlt`, `Date`, and `factor`) | -| `value` | integer vector; signed number of units to add to `time` | -| `units` | single string; one of `'years'`, `'months'`, `'weeks'`, `'days'`, `'hours'`, `'minutes'`, `'seconds'`, or `'milliseconds'` | -| `tz` | `NULL` or `''` for the default time zone or a single string with a timezone identifier, | -| `locale` | `NULL` or `''` for default locale, or a single string with locale identifier; a non-Gregorian calendar may be specified by setting the `@calendar=name` keyword | - -## Details - -Vectorized over `time` and `value`. - -Note that, e.g., January, 31 + 1 month = February, 28 or 29. - -## Value - -Both functions return an object of class [`POSIXct`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/DateTimeClasses.html). - -The replacement version of `stri_datetime_add` modifies the state of the `time` object. 
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## References - -*Calendar Classes* - ICU User Guide, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other datetime: [`stri_datetime_create()`](stri_datetime_create.md), [`stri_datetime_fields()`](stri_datetime_fields.md), [`stri_datetime_format()`](stri_datetime_format.md), [`stri_datetime_fstr()`](stri_datetime_fstr.md), [`stri_datetime_now()`](stri_datetime_now.md), [`stri_datetime_symbols()`](stri_datetime_symbols.md), [`stri_timezone_get()`](stri_timezone_set.md), [`stri_timezone_info()`](stri_timezone_info.md), [`stri_timezone_list()`](stri_timezone_list.md) - -## Examples - - - - -```r -x <- stri_datetime_now() -stri_datetime_add(x, units='months') <- 2 -print(x) -``` - -``` -## [1] "2024-01-05 14:58:29 AEDT" -``` - -```r -stri_datetime_add(x, -2, units='months') -``` - -``` -## [1] "2023-11-05 14:58:29 AEDT" -``` - -```r -stri_datetime_add(stri_datetime_create(2014, 4, 20), 1, units='years') -``` - -``` -## [1] "2015-04-20 12:00:00 AEST" -``` - -```r -stri_datetime_add(stri_datetime_create(2014, 4, 20), 1, units='years', locale='@calendar=hebrew') -``` - -``` -## [1] "2015-04-09 12:00:00 AEST" -``` - -```r -stri_datetime_add(stri_datetime_create(2016, 1, 31), 1, units='months') -``` - -``` -## [1] "2016-02-29 12:00:00 AEDT" -``` diff --git a/.devel/sphinx/rapi/stri_datetime_create.md b/.devel/sphinx/rapi/stri_datetime_create.md deleted file mode 100644 index f81215db..00000000 --- a/.devel/sphinx/rapi/stri_datetime_create.md +++ /dev/null @@ -1,92 +0,0 @@ -# stri_datetime_create: Create a Date-Time Object - -## Description - -Constructs date-time objects from numeric representations. 
- -## Usage - -``` r -stri_datetime_create( - year, - month, - day, - hour = 12L, - minute = 0L, - second = 0, - lenient = FALSE, - tz = NULL, - locale = NULL -) -``` - -## Arguments - -| | | -|-----------|-------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `year` | integer vector; 0 is 1BC, -1 is 2BC, etc. | -| `month` | integer vector; months are 1-based | -| `day` | integer vector | -| `hour` | integer vector | -| `minute` | integer vector | -| `second` | numeric vector; fractional seconds are allowed | -| `lenient` | single logical value; should the operation be lenient? | -| `tz` | `NULL` or `''` for the default time zone or a single string with time zone identifier, see [`stri_timezone_list`](stri_timezone_list.md) | -| `locale` | `NULL` or `''` for default locale, or a single string with locale identifier; a non-Gregorian calendar may be specified by setting `@calendar=name` keyword | - -## Details - -Vectorized over `year`, `month`, `day`, `hour`, `hour`, `minute`, and `second`. - -## Value - -Returns an object of class [`POSIXct`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/DateTimeClasses.html). 
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other datetime: [`stri_datetime_add()`](stri_datetime_add.md), [`stri_datetime_fields()`](stri_datetime_fields.md), [`stri_datetime_format()`](stri_datetime_format.md), [`stri_datetime_fstr()`](stri_datetime_fstr.md), [`stri_datetime_now()`](stri_datetime_now.md), [`stri_datetime_symbols()`](stri_datetime_symbols.md), [`stri_timezone_get()`](stri_timezone_set.md), [`stri_timezone_info()`](stri_timezone_info.md), [`stri_timezone_list()`](stri_timezone_list.md) - -## Examples - - - - -```r -stri_datetime_create(2015, 12, 31, 23, 59, 59.999) -``` - -``` -## [1] "2015-12-31 23:59:59 AEDT" -``` - -```r -stri_datetime_create(5775, 8, 1, locale='@calendar=hebrew') # 1 Nisan 5775 -> 2015-03-21 -``` - -``` -## [1] "2015-03-21 12:00:00 AEDT" -``` - -```r -stri_datetime_create(2015, 02, 29) -``` - -``` -## [1] NA -``` - -```r -stri_datetime_create(2015, 02, 29, lenient=TRUE) -``` - -``` -## [1] "2015-03-01 12:00:00 AEDT" -``` diff --git a/.devel/sphinx/rapi/stri_datetime_fields.md b/.devel/sphinx/rapi/stri_datetime_fields.md deleted file mode 100644 index e6dab43c..00000000 --- a/.devel/sphinx/rapi/stri_datetime_fields.md +++ /dev/null @@ -1,104 +0,0 @@ -# stri_datetime_fields: Get Values for Date and Time Fields - -## Description - -Computes and returns values for all date and time fields. 
- -## Usage - -``` r -stri_datetime_fields(time, tz = attr(time, "tzone"), locale = NULL) -``` - -## Arguments - -| | | -|----------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `time` | an object of class [`POSIXct`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/DateTimeClasses.html) (`as.POSIXct` will be called on character vectors and objects of class `POSIXlt`, `Date`, and `factor`) | -| `tz` | `NULL` or `''` for the default time zone or a single string with time zone identifier, see [`stri_timezone_list`](stri_timezone_list.md) | -| `locale` | `NULL` or `''` for the current default locale, or a single string with a locale identifier; a non-Gregorian calendar may be specified by setting `@calendar=name` keyword | - -## Details - -Vectorized over `time`. - -## Value - -Returns a data frame with the following columns: - -1. Year (0 is 1BC, -1 is 2BC, etc.) - -2. Month (1-based, i.e., 1 stands for the first month, e.g., January; note that the number of months depends on the selected calendar, see [`stri_datetime_symbols`](stri_datetime_symbols.md)) - -3. Day - -4. Hour (24-h clock) - -5. Minute - -6. Second - -7. Millisecond - -8. WeekOfYear (this is locale-dependent) - -9. WeekOfMonth (this is locale-dependent) - -10. DayOfYear - -11. DayOfWeek (1-based, 1 denotes Sunday; see [`stri_datetime_symbols`](stri_datetime_symbols.md)) - -12. Hour12 (12-h clock) - -13. AmPm (see [`stri_datetime_symbols`](stri_datetime_symbols.md)) - -14. 
Era (see [`stri_datetime_symbols`](stri_datetime_symbols.md)) - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other datetime: [`stri_datetime_add()`](stri_datetime_add.md), [`stri_datetime_create()`](stri_datetime_create.md), [`stri_datetime_format()`](stri_datetime_format.md), [`stri_datetime_fstr()`](stri_datetime_fstr.md), [`stri_datetime_now()`](stri_datetime_now.md), [`stri_datetime_symbols()`](stri_datetime_symbols.md), [`stri_timezone_get()`](stri_timezone_set.md), [`stri_timezone_info()`](stri_timezone_info.md), [`stri_timezone_list()`](stri_timezone_list.md) - -## Examples - - - - -```r -stri_datetime_fields(stri_datetime_now()) -``` - -``` -## Year Month Day Hour Minute Second Millisecond WeekOfYear WeekOfMonth -## 1 2023 11 5 14 58 29 415 45 1 -## DayOfYear DayOfWeek Hour12 AmPm Era -## 1 309 1 2 2 2 -``` - -```r -stri_datetime_fields(stri_datetime_now(), locale='@calendar=hebrew') -``` - -``` -## Year Month Day Hour Minute Second Millisecond WeekOfYear WeekOfMonth -## 1 5784 2 21 14 58 29 420 8 3 -## DayOfYear DayOfWeek Hour12 AmPm Era -## 1 51 1 2 2 1 -``` - -```r -stri_datetime_symbols(locale='@calendar=hebrew')$Month[ - stri_datetime_fields(stri_datetime_now(), locale='@calendar=hebrew')$Month -] -``` - -``` -## [1] "Heshvan" -``` diff --git a/.devel/sphinx/rapi/stri_datetime_format.md b/.devel/sphinx/rapi/stri_datetime_format.md deleted file mode 100644 index e4bcc5c0..00000000 --- a/.devel/sphinx/rapi/stri_datetime_format.md +++ /dev/null @@ -1,225 +0,0 @@ -# stri_datetime_format: Date and Time Formatting and Parsing - -## Description - -These functions convert a given date/time object to a character vector, or vice versa. 
- -## Usage - -``` r -stri_datetime_format( - time, - format = "uuuu-MM-dd HH:mm:ss", - tz = NULL, - locale = NULL -) - -stri_datetime_parse( - str, - format = "uuuu-MM-dd HH:mm:ss", - lenient = FALSE, - tz = NULL, - locale = NULL -) -``` - -## Arguments - -| | | -|-----------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `time` | an object of class [`POSIXct`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/DateTimeClasses.html) (`as.POSIXct` will be called on character vectors and objects of class `POSIXlt`, `Date`, and `factor`) | -| `format` | character vector, see Details; see also [`stri_datetime_fstr`](stri_datetime_fstr.md) | -| `tz` | `NULL` or `''` for the default time zone or a single string with a timezone identifier, see [`stri_timezone_get`](stri_timezone_set.md) and [`stri_timezone_list`](stri_timezone_list.md) | -| `locale` | `NULL` or `''` for the default locale, or a single string with locale identifier; a non-Gregorian calendar may be specified by setting the `@calendar=name` keyword | -| `str` | character vector | -| `lenient` | single logical value; should date/time parsing be lenient? | - -## Details - -Vectorized over `format` and `time` or `str`. - -By default, `stri_datetime_format` (for the sake of compatibility with the [`strftime`](https://stat.ethz.ch/R-manual/R-devel/library/base/help/strftime.html) function) formats a date/time object using the current default time zone. - -Unspecified fields (e.g., seconds where only hours and minutes are given) are filled with the ones based on current date and time. - -`format` may be one of `DT_STYLE` or `DT_relative_STYLE`, where `DT` is equal to `date`, `time`, or `datetime`, and `STYLE` is equal to `full`, `long`, `medium`, or `short`. This gives a locale-dependent date and/or time format. 
Note that ICU does not currently support `relative` `time` formats, so this flag is ignored in such a context. - -Otherwise, `format` is a pattern: a string where specific sequences of characters are replaced with date/time data from a calendar when formatting or used to generate data for a calendar when parsing. For example, `y` stands for \'year\'. Characters may be used multiple times: `yy` might produce `99`, whereas `yyyy` yields `1999`. For most numerical fields, the number of characters specifies the field width. For example, if `h` is the hour, `h` might produce `5`, but `hh` yields `05`. For some characters, the count specifies whether an abbreviated or full form should be used. - -Two single quotes represent a literal single quote, either inside or outside single quotes. Text within single quotes is not interpreted in any way (except for two adjacent single quotes). Otherwise, all ASCII letters from `a` to `z` and `A` to `Z` are reserved as syntax characters, and require quoting if they are to represent literal characters. In addition, certain ASCII punctuation characters may become available in the future (e.g., `:` being interpreted as the time separator and `/` as a date separator, and replaced by respective locale-sensitive characters in display).
- -| | | | | -|:-----------|:-------------------------------------------------------|:---------------|:---------------------------------| -| **Symbol** | **Meaning** | **Example(s)** | **Output** | -| G | era designator | G, GG, or GGG | AD | -| | | GGGG | Anno Domini | -| | | GGGGG | A | -| y | year | yy | 96 | -| | | y or yyyy | 1996 | -| u | extended year | u | 4601 | -| U | cyclic year name, as in Chinese lunar calendar | U | | -| r | related Gregorian year | r | 1996 | -| Q | quarter | Q or QQ | 02 | -| | | QQQ | Q2 | -| | | QQQQ | 2nd quarter | -| | | QQQQQ | 2 | -| q | Stand Alone quarter | q or qq | 02 | -| | | qqq | Q2 | -| | | qqqq | 2nd quarter | -| | | qqqqq | 2 | -| M | month in year | M or MM | 09 | -| | | MMM | Sep | -| | | MMMM | September | -| | | MMMMM | S | -| L | Stand Alone month in year | L or LL | 09 | -| | | LLL | Sep | -| | | LLLL | September | -| | | LLLLL | S | -| w | week of year | w or ww | 27 | -| W | week of month | W | 2 | -| d | day in month | d | 2 | -| | | dd | 02 | -| D | day of year | D | 189 | -| F | day of week in month | F | 2 (2nd Wed in July) | -| g | modified Julian day | g | 2451334 | -| E | day of week | E, EE, or EEE | Tue | -| | | EEEE | Tuesday | -| | | EEEEE | T | -| | | EEEEEE | Tu | -| e | local day of week | e or ee | 2 | -| | example: if Monday is 1st day, Tuesday is 2nd ) | eee | Tue | -| | | eeee | Tuesday | -| | | eeeee | T | -| | | eeeeee | Tu | -| c | Stand Alone local day of week | c or cc | 2 | -| | | ccc | Tue | -| | | cccc | Tuesday | -| | | ccccc | T | -| | | cccccc | Tu | -| a | am/pm marker | a | pm | -| h | hour in am/pm (1\~12) | h | 7 | -| | | hh | 07 | -| H | hour in day (0\~23) | H | 0 | -| | | HH | 00 | -| k | hour in day (1\~24) | k | 24 | -| | | kk | 24 | -| K | hour in am/pm (0\~11) | K | 0 | -| | | KK | 00 | -| m | minute in hour | m | 4 | -| | | mm | 04 | -| s | second in minute | s | 5 | -| | | ss | 05 | -| S | fractional second - truncates (like other time fields) | S | 2 | -| | to the 
count of letters when formatting. Appends | SS | 23 | -| | zeros if more than 3 letters specified. Truncates at | SSS | 235 | -| | three significant digits when parsing. | SSSS | 2350 | -| A | milliseconds in day | A | 61201235 | -| z | Time Zone: specific non-location | z, zz, or zzz | PDT | -| | | zzzz | Pacific Daylight Time | -| Z | Time Zone: ISO8601 basic hms? / RFC 822 | Z, ZZ, or ZZZ | -0800 | -| | Time Zone: long localized GMT (=OOOO) | ZZZZ | GMT-08:00 | -| | Time Zone: ISO8601 extended hms? (=XXXXX) | ZZZZZ | -08:00, -07:52:58, Z | -| O | Time Zone: short localized GMT | O | GMT-8 | -| | Time Zone: long localized GMT (=ZZZZ) | OOOO | GMT-08:00 | -| v | Time Zone: generic non-location | v | PT | -| | (falls back first to VVVV) | vvvv | Pacific Time or Los Angeles Time | -| V | Time Zone: short time zone ID | V | uslax | -| | Time Zone: long time zone ID | VV | America/Los_Angeles | -| | Time Zone: time zone exemplar city | VVV | Los Angeles | -| | Time Zone: generic location (falls back to OOOO) | VVVV | Los Angeles Time | -| X | Time Zone: ISO8601 basic hm?, with Z for 0 | X | -08, +0530, Z | -| | Time Zone: ISO8601 basic hm, with Z | XX | -0800, Z | -| | Time Zone: ISO8601 extended hm, with Z | XXX | -08:00, Z | -| | Time Zone: ISO8601 basic hms?, with Z | XXXX | -0800, -075258, Z | -| | Time Zone: ISO8601 extended hms?, with Z | XXXXX | -08:00, -07:52:58, Z | -| x | Time Zone: ISO8601 basic hm?, without Z for 0 | x | -08, +0530 | -| | Time Zone: ISO8601 basic hm, without Z | xx | -0800 | -| | Time Zone: ISO8601 extended hm, without Z | xxx | -08:00 | -| | Time Zone: ISO8601 basic hms?, without Z | xxxx | -0800, -075258 | -| | Time Zone: ISO8601 extended hms?, without Z | xxxxx | -08:00, -07:52:58 | -| \' | escape for text | \' | (nothing) | -| \' \' | two single quotes produce one | \' \' | \' | - -Note that any characters in the pattern that are not in the ranges of `[a-z]` and `[A-Z]` will be treated as quoted text. 
For instance, characters like `:`, `.`, ` ` (a space), `#`, and `@` will appear in the resulting time text even if they are not enclosed within single quotes. The single quote is used to "escape" the letters. Two single quotes in a row, inside or outside a quoted sequence, represent a "real" single quote. - -A few examples: - -| | | -|:-------------------------------|:--------------------------------------------------| -| **Example Pattern** | **Result** | -| yyyy.MM.dd \'at\' HH:mm:ss zzz | 2015.12.31 at 23:59:59 GMT+1 | -| EEE, MMM d, \'\'yy | czw., gru 31, \'15 | -| h:mm a | 11:59 PM | -| hh \'o\'\'clock\' a, zzzz | 11 o\'clock PM, GMT+01:00 | -| K:mm a, z | 11:59 PM, GMT+1 | -| yyyyy.MMMM.dd GGG hh:mm aaa | 2015.grudnia.31 n.e. 11:59 PM | -| uuuu-MM-dd\'T\'HH:mm:ssZ | 2015-12-31T23:59:59+0100 (the ISO 8601 guideline) | -| | | - -## Value - -`stri_datetime_format` returns a character vector. - -`stri_datetime_parse` returns an object of class [`POSIXct`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/DateTimeClasses.html).
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## References - -*Formatting Dates and Times* -- ICU User Guide, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other datetime: [`stri_datetime_add()`](stri_datetime_add.md), [`stri_datetime_create()`](stri_datetime_create.md), [`stri_datetime_fields()`](stri_datetime_fields.md), [`stri_datetime_fstr()`](stri_datetime_fstr.md), [`stri_datetime_now()`](stri_datetime_now.md), [`stri_datetime_symbols()`](stri_datetime_symbols.md), [`stri_timezone_get()`](stri_timezone_set.md), [`stri_timezone_info()`](stri_timezone_info.md), [`stri_timezone_list()`](stri_timezone_list.md) - -## Examples - - - - -```r -x <- c('2015-02-28', '2015-02-29') -stri_datetime_parse(x, 'yyyy-MM-dd') -``` - -``` -## [1] "2015-02-28 14:58:29 AEDT" NA -``` - -```r -stri_datetime_parse(x, 'yyyy-MM-dd', lenient=TRUE) -``` - -``` -## [1] "2015-02-28 14:58:29 AEDT" "2015-03-01 14:58:29 AEDT" -``` - -```r -stri_datetime_parse(x %s+% " 00:00:00", "yyyy-MM-dd HH:mm:ss") -``` - -``` -## [1] "2015-02-28 00:00:00 AEDT" NA -``` - -```r -stri_datetime_parse('19 lipca 2015', 'date_long', locale='pl_PL') -``` - -``` -## [1] "2015-07-19 14:58:29 AEST" -``` - -```r -stri_datetime_format(stri_datetime_now(), 'datetime_relative_medium') -``` - -``` -## [1] "today, 2:58:29 pm" -``` diff --git a/.devel/sphinx/rapi/stri_datetime_fstr.md b/.devel/sphinx/rapi/stri_datetime_fstr.md deleted file mode 100644 index 193c55de..00000000 --- a/.devel/sphinx/rapi/stri_datetime_fstr.md +++ /dev/null @@ -1,53 +0,0 @@ -# stri_datetime_fstr: -Style Format Strings - -## Description - -This function converts [`strptime`](https://stat.ethz.ch/R-manual/R-devel/library/base/help/strptime.html) or 
[`strftime`](https://stat.ethz.ch/R-manual/R-devel/library/base/help/strftime.html)-style format strings to ICU format strings that may be used in [`stri_datetime_parse`](stri_datetime_format.md) and [`stri_datetime_format`](stri_datetime_format.md) functions. - -## Usage - -``` r -stri_datetime_fstr(x, ignore_special = TRUE) -``` - -## Arguments - -| | | -|------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------| -| `x` | character vector of date/time format strings | -| `ignore_special` | if `FALSE`, special identifiers like `"datetime_full"` or `date_relative_short` (see [`stri_datetime_format`](stri_datetime_format.md)) are left as-is | - -## Details - -For more details on conversion specifiers please refer to the manual page of [`strptime`](https://stat.ethz.ch/R-manual/R-devel/library/base/help/strptime.html). Most of the formatters of the form `%x`, where `x` is a letter, are supported. Moreover, each `%%` is replaced with `%`. - -Warnings are given in the case of `%x`, `%X`, `%u`, `%w`, `%g`, `%G`, `%c`, `%U`, and `%W` as in such circumstances either ICU does not support the functionality requested using the string format API or there are some inconsistencies between base R and ICU. - -## Value - -Returns a character vector. 
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other datetime: [`stri_datetime_add()`](stri_datetime_add.md), [`stri_datetime_create()`](stri_datetime_create.md), [`stri_datetime_fields()`](stri_datetime_fields.md), [`stri_datetime_format()`](stri_datetime_format.md), [`stri_datetime_now()`](stri_datetime_now.md), [`stri_datetime_symbols()`](stri_datetime_symbols.md), [`stri_timezone_get()`](stri_timezone_set.md), [`stri_timezone_info()`](stri_timezone_info.md), [`stri_timezone_list()`](stri_timezone_list.md) - -## Examples - - - - -```r -stri_datetime_fstr('%Y-%m-%d %H:%M:%S') -``` - -``` -## [1] "yyyy'-'MM'-'dd' 'HH':'mm':'ss" -``` diff --git a/.devel/sphinx/rapi/stri_datetime_now.md b/.devel/sphinx/rapi/stri_datetime_now.md deleted file mode 100644 index 4b2eefa4..00000000 --- a/.devel/sphinx/rapi/stri_datetime_now.md +++ /dev/null @@ -1,31 +0,0 @@ -# stri_datetime_now: Get Current Date and Time - -## Description - -Returns the current date and time. - -## Usage - -``` r -stri_datetime_now() -``` - -## Details - -The current date and time in stringi is represented as the (signed) number of seconds since 1970-01-01 00:00:00 UTC. UTC leap seconds are ignored. - -## Value - -Returns an object of class [`POSIXct`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/DateTimeClasses.html). 
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other datetime: [`stri_datetime_add()`](stri_datetime_add.md), [`stri_datetime_create()`](stri_datetime_create.md), [`stri_datetime_fields()`](stri_datetime_fields.md), [`stri_datetime_format()`](stri_datetime_format.md), [`stri_datetime_fstr()`](stri_datetime_fstr.md), [`stri_datetime_symbols()`](stri_datetime_symbols.md), [`stri_timezone_get()`](stri_timezone_set.md), [`stri_timezone_info()`](stri_timezone_info.md), [`stri_timezone_list()`](stri_timezone_list.md) diff --git a/.devel/sphinx/rapi/stri_datetime_symbols.md b/.devel/sphinx/rapi/stri_datetime_symbols.md deleted file mode 100644 index 4903c0cf..00000000 --- a/.devel/sphinx/rapi/stri_datetime_symbols.md +++ /dev/null @@ -1,486 +0,0 @@ -# stri_datetime_symbols: List Localizable Date-Time Formatting Data - -## Description - -Returns a list of all localizable date-time formatting data, including month and weekday names, localized AM/PM strings, etc. - -## Usage - -``` r -stri_datetime_symbols(locale = NULL, context = "standalone", width = "wide") -``` - -## Arguments - -| | | -|-----------|------------------------------------------------------------------------------| -| `locale` | `NULL` or `''` for default locale, or a single string with locale identifier | -| `context` | single string; one of: `'format'`, `'standalone'` | -| `width` | single string; one of: `'abbreviated'`, `'wide'`, `'narrow'` | - -## Details - -`context` stands for a selector for date formatting context and `width` - for date formatting width. - -## Value - -Returns a list with the following named components: - -1. `Month` - month names, - -2. `Weekday` - weekday names, - -3. 
`Quarter` - quarter names, - -4. `AmPm` - AM/PM names, - -5. `Era` - era names. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## References - -*Calendar* - ICU User Guide, - -*DateFormatSymbols* class -- ICU API Documentation, - -*Formatting Dates and Times* -- ICU User Guide, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other datetime: [`stri_datetime_add()`](stri_datetime_add.md), [`stri_datetime_create()`](stri_datetime_create.md), [`stri_datetime_fields()`](stri_datetime_fields.md), [`stri_datetime_format()`](stri_datetime_format.md), [`stri_datetime_fstr()`](stri_datetime_fstr.md), [`stri_datetime_now()`](stri_datetime_now.md), [`stri_timezone_get()`](stri_timezone_set.md), [`stri_timezone_info()`](stri_timezone_info.md), [`stri_timezone_list()`](stri_timezone_list.md) - -## Examples - - - - -```r -stri_datetime_symbols() # uses the Gregorian calendar in most locales -``` - -``` -## $Month -## [1] "January" "February" "March" "April" "May" "June" -## [7] "July" "August" "September" "October" "November" "December" -## -## $Weekday -## [1] "Sunday" "Monday" "Tuesday" "Wednesday" "Thursday" "Friday" -## [7] "Saturday" -## -## $Quarter -## [1] "1st quarter" "2nd quarter" "3rd quarter" "4th quarter" -## -## $AmPm -## [1] "am" "pm" -## -## $Era -## [1] "Before Christ" "Anno Domini" -``` - -```r -stri_datetime_symbols('@calendar=hebrew') -``` - -``` -## $Month -## [1] "Tishri" "Heshvan" "Kislev" "Tevet" "Shevat" "Adar I" "Adar" -## [8] "Nisan" "Iyar" "Sivan" "Tamuz" "Av" "Elul" "Adar II" -## -## $Weekday -## [1] "Sunday" "Monday" "Tuesday" "Wednesday" "Thursday" "Friday" -## [7] "Saturday" -## -## $Quarter -## [1] "1st quarter" "2nd quarter" "3rd quarter" "4th quarter" -## -## $AmPm -## [1] "am" "pm" -## -## 
$Era -## [1] "AM" -``` - -```r -stri_datetime_symbols('he_IL@calendar=hebrew') -``` - -``` -## $Month -## [1] "תשרי" "חשוון" "כסלו" "טבת" "שבט" "אדר א׳" "אדר" "ניסן" -## [9] "אייר" "סיוון" "תמוז" "אב" "אלול" "אדר ב׳" -## -## $Weekday -## [1] "יום ראשון" "יום שני" "יום שלישי" "יום רביעי" "יום חמישי" "יום שישי" -## [7] "יום שבת" -## -## $Quarter -## [1] "רבעון 1" "רבעון 2" "רבעון 3" "רבעון 4" -## -## $AmPm -## [1] "לפנה״צ" "אחה״צ" -## -## $Era -## [1] "לבריאת העולם" -``` - -```r -stri_datetime_symbols('@calendar=islamic') -``` - -``` -## $Month -## [1] "Muharram" "Safar" "Rabiʻ I" "Rabiʻ II" "Jumada I" -## [6] "Jumada II" "Rajab" "Shaʻban" "Ramadan" "Shawwal" -## [11] "Dhuʻl-Qiʻdah" "Dhuʻl-Hijjah" -## -## $Weekday -## [1] "Sunday" "Monday" "Tuesday" "Wednesday" "Thursday" "Friday" -## [7] "Saturday" -## -## $Quarter -## [1] "1st quarter" "2nd quarter" "3rd quarter" "4th quarter" -## -## $AmPm -## [1] "am" "pm" -## -## $Era -## [1] "AH" -``` - -```r -stri_datetime_symbols('@calendar=persian') -``` - -``` -## $Month -## [1] "Farvardin" "Ordibehesht" "Khordad" "Tir" "Mordad" -## [6] "Shahrivar" "Mehr" "Aban" "Azar" "Dey" -## [11] "Bahman" "Esfand" -## -## $Weekday -## [1] "Sunday" "Monday" "Tuesday" "Wednesday" "Thursday" "Friday" -## [7] "Saturday" -## -## $Quarter -## [1] "1st quarter" "2nd quarter" "3rd quarter" "4th quarter" -## -## $AmPm -## [1] "am" "pm" -## -## $Era -## [1] "AP" -``` - -```r -stri_datetime_symbols('@calendar=indian') -``` - -``` -## $Month -## [1] "Chaitra" "Vaisakha" "Jyaistha" "Asadha" "Sravana" -## [6] "Bhadra" "Asvina" "Kartika" "Agrahayana" "Pausa" -## [11] "Magha" "Phalguna" -## -## $Weekday -## [1] "Sunday" "Monday" "Tuesday" "Wednesday" "Thursday" "Friday" -## [7] "Saturday" -## -## $Quarter -## [1] "1st quarter" "2nd quarter" "3rd quarter" "4th quarter" -## -## $AmPm -## [1] "am" "pm" -## -## $Era -## [1] "Saka" -``` - -```r -stri_datetime_symbols('@calendar=coptic') -``` - -``` -## $Month -## [1] "Tout" "Baba" "Hator" "Kiahk" "Toba" 
"Amshir" -## [7] "Baramhat" "Baramouda" "Bashans" "Paona" "Epep" "Mesra" -## [13] "Nasie" -## -## $Weekday -## [1] "Sunday" "Monday" "Tuesday" "Wednesday" "Thursday" "Friday" -## [7] "Saturday" -## -## $Quarter -## [1] "1st quarter" "2nd quarter" "3rd quarter" "4th quarter" -## -## $AmPm -## [1] "am" "pm" -## -## $Era -## [1] "ERA0" "ERA1" -``` - -```r -stri_datetime_symbols('@calendar=japanese') -``` - -``` -## $Month -## [1] "January" "February" "March" "April" "May" "June" -## [7] "July" "August" "September" "October" "November" "December" -## -## $Weekday -## [1] "Sunday" "Monday" "Tuesday" "Wednesday" "Thursday" "Friday" -## [7] "Saturday" -## -## $Quarter -## [1] "1st quarter" "2nd quarter" "3rd quarter" "4th quarter" -## -## $AmPm -## [1] "am" "pm" -## -## $Era -## [1] "Taika (645–650)" "Hakuchi (650–671)" "Hakuhō (672–686)" -## [4] "Shuchō (686–701)" "Taihō (701–704)" "Keiun (704–708)" -## [7] "Wadō (708–715)" "Reiki (715–717)" "Yōrō (717–724)" -## [10] "Jinki (724–729)" "Tenpyō (729–749)" "Tenpyō-kampō (749–749)" -## [13] "Tenpyō-shōhō (749–757)" "Tenpyō-hōji (757–765)" "Tenpyō-jingo (765–767)" -## [16] "Jingo-keiun (767–770)" "Hōki (770–780)" "Ten-ō (781–782)" -## [19] "Enryaku (782–806)" "Daidō (806–810)" "Kōnin (810–824)" -## [22] "Tenchō (824–834)" "Jōwa (834–848)" "Kajō (848–851)" -## [25] "Ninju (851–854)" "Saikō (854–857)" "Ten-an (857–859)" -## [28] "Jōgan (859–877)" "Gangyō (877–885)" "Ninna (885–889)" -## [31] "Kanpyō (889–898)" "Shōtai (898–901)" "Engi (901–923)" -## [34] "Enchō (923–931)" "Jōhei (931–938)" "Tengyō (938–947)" -## [37] "Tenryaku (947–957)" "Tentoku (957–961)" "Ōwa (961–964)" -## [40] "Kōhō (964–968)" "Anna (968–970)" "Tenroku (970–973)" -## [43] "Ten’en (973–976)" "Jōgen (976–978)" "Tengen (978–983)" -## [46] "Eikan (983–985)" "Kanna (985–987)" "Eien (987–989)" -## [49] "Eiso (989–990)" "Shōryaku (990–995)" "Chōtoku (995–999)" -## [52] "Chōhō (999–1004)" "Kankō (1004–1012)" "Chōwa (1012–1017)" -## [55] "Kannin (1017–1021)" "Jian 
(1021–1024)" "Manju (1024–1028)" -## [58] "Chōgen (1028–1037)" "Chōryaku (1037–1040)" "Chōkyū (1040–1044)" -## [61] "Kantoku (1044–1046)" "Eishō (1046–1053)" "Tengi (1053–1058)" -## [64] "Kōhei (1058–1065)" "Jiryaku (1065–1069)" "Enkyū (1069–1074)" -## [67] "Shōho (1074–1077)" "Shōryaku (1077–1081)" "Eihō (1081–1084)" -## [70] "Ōtoku (1084–1087)" "Kanji (1087–1094)" "Kahō (1094–1096)" -## [73] "Eichō (1096–1097)" "Jōtoku (1097–1099)" "Kōwa (1099–1104)" -## [76] "Chōji (1104–1106)" "Kashō (1106–1108)" "Tennin (1108–1110)" -## [79] "Ten-ei (1110–1113)" "Eikyū (1113–1118)" "Gen’ei (1118–1120)" -## [82] "Hōan (1120–1124)" "Tenji (1124–1126)" "Daiji (1126–1131)" -## [85] "Tenshō (1131–1132)" "Chōshō (1132–1135)" "Hōen (1135–1141)" -## [88] "Eiji (1141–1142)" "Kōji (1142–1144)" "Ten’yō (1144–1145)" -## [91] "Kyūan (1145–1151)" "Ninpei (1151–1154)" "Kyūju (1154–1156)" -## [94] "Hōgen (1156–1159)" "Heiji (1159–1160)" "Eiryaku (1160–1161)" -## [97] "Ōho (1161–1163)" "Chōkan (1163–1165)" "Eiman (1165–1166)" -## [100] "Nin’an (1166–1169)" "Kaō (1169–1171)" "Shōan (1171–1175)" -## [103] "Angen (1175–1177)" "Jishō (1177–1181)" "Yōwa (1181–1182)" -## [106] "Juei (1182–1184)" "Genryaku (1184–1185)" "Bunji (1185–1190)" -## [109] "Kenkyū (1190–1199)" "Shōji (1199–1201)" "Kennin (1201–1204)" -## [112] "Genkyū (1204–1206)" "Ken’ei (1206–1207)" "Jōgen (1207–1211)" -## [115] "Kenryaku (1211–1213)" "Kenpō (1213–1219)" "Jōkyū (1219–1222)" -## [118] "Jōō (1222–1224)" "Gennin (1224–1225)" "Karoku (1225–1227)" -## [121] "Antei (1227–1229)" "Kanki (1229–1232)" "Jōei (1232–1233)" -## [124] "Tenpuku (1233–1234)" "Bunryaku (1234–1235)" "Katei (1235–1238)" -## [127] "Ryakunin (1238–1239)" "En’ō (1239–1240)" "Ninji (1240–1243)" -## [130] "Kangen (1243–1247)" "Hōji (1247–1249)" "Kenchō (1249–1256)" -## [133] "Kōgen (1256–1257)" "Shōka (1257–1259)" "Shōgen (1259–1260)" -## [136] "Bun’ō (1260–1261)" "Kōchō (1261–1264)" "Bun’ei (1264–1275)" -## [139] "Kenji (1275–1278)" "Kōan (1278–1288)" "Shōō 
(1288–1293)" -## [142] "Einin (1293–1299)" "Shōan (1299–1302)" "Kengen (1302–1303)" -## [145] "Kagen (1303–1306)" "Tokuji (1306–1308)" "Enkyō (1308–1311)" -## [148] "Ōchō (1311–1312)" "Shōwa (1312–1317)" "Bunpō (1317–1319)" -## [151] "Genō (1319–1321)" "Genkō (1321–1324)" "Shōchū (1324–1326)" -## [154] "Karyaku (1326–1329)" "Gentoku (1329–1331)" "Genkō (1331–1334)" -## [157] "Kenmu (1334–1336)" "Engen (1336–1340)" "Kōkoku (1340–1346)" -## [160] "Shōhei (1346–1370)" "Kentoku (1370–1372)" "Bunchū (1372–1375)" -## [163] "Tenju (1375–1379)" "Kōryaku (1379–1381)" "Kōwa (1381–1384)" -## [166] "Genchū (1384–1392)" "Meitoku (1384–1387)" "Kakei (1387–1389)" -## [169] "Kōō (1389–1390)" "Meitoku (1390–1394)" "Ōei (1394–1428)" -## [172] "Shōchō (1428–1429)" "Eikyō (1429–1441)" "Kakitsu (1441–1444)" -## [175] "Bun’an (1444–1449)" "Hōtoku (1449–1452)" "Kyōtoku (1452–1455)" -## [178] "Kōshō (1455–1457)" "Chōroku (1457–1460)" "Kanshō (1460–1466)" -## [181] "Bunshō (1466–1467)" "Ōnin (1467–1469)" "Bunmei (1469–1487)" -## [184] "Chōkyō (1487–1489)" "Entoku (1489–1492)" "Meiō (1492–1501)" -## [187] "Bunki (1501–1504)" "Eishō (1504–1521)" "Taiei (1521–1528)" -## [190] "Kyōroku (1528–1532)" "Tenbun (1532–1555)" "Kōji (1555–1558)" -## [193] "Eiroku (1558–1570)" "Genki (1570–1573)" "Tenshō (1573–1592)" -## [196] "Bunroku (1592–1596)" "Keichō (1596–1615)" "Genna (1615–1624)" -## [199] "Kan’ei (1624–1644)" "Shōho (1644–1648)" "Keian (1648–1652)" -## [202] "Jōō (1652–1655)" "Meireki (1655–1658)" "Manji (1658–1661)" -## [205] "Kanbun (1661–1673)" "Enpō (1673–1681)" "Tenna (1681–1684)" -## [208] "Jōkyō (1684–1688)" "Genroku (1688–1704)" "Hōei (1704–1711)" -## [211] "Shōtoku (1711–1716)" "Kyōhō (1716–1736)" "Genbun (1736–1741)" -## [214] "Kanpō (1741–1744)" "Enkyō (1744–1748)" "Kan’en (1748–1751)" -## [217] "Hōreki (1751–1764)" "Meiwa (1764–1772)" "An’ei (1772–1781)" -## [220] "Tenmei (1781–1789)" "Kansei (1789–1801)" "Kyōwa (1801–1804)" -## [223] "Bunka (1804–1818)" "Bunsei (1818–1830)" 
"Tenpō (1830–1844)" -## [226] "Kōka (1844–1848)" "Kaei (1848–1854)" "Ansei (1854–1860)" -## [229] "Man’en (1860–1861)" "Bunkyū (1861–1864)" "Genji (1864–1865)" -## [232] "Keiō (1865–1868)" "Meiji" "Taishō" -## [235] "Shōwa" "Heisei" "Reiwa" -``` - -```r -stri_datetime_symbols('ja_JP_TRADITIONAL') # uses the Japanese calendar by default -``` - -``` -## $Month -## [1] "1月" "2月" "3月" "4月" "5月" "6月" "7月" "8月" "9月" "10月" -## [11] "11月" "12月" -## -## $Weekday -## [1] "日曜日" "月曜日" "火曜日" "水曜日" "木曜日" "金曜日" "土曜日" -## -## $Quarter -## [1] "第1四半期" "第2四半期" "第3四半期" "第4四半期" -## -## $AmPm -## [1] "午前" "午後" -## -## $Era -## [1] "紀元前" "西暦" -``` - -```r -stri_datetime_symbols('th_TH_TRADITIONAL') # uses the Buddhist calendar -``` - -``` -## $Month -## [1] "มกราคม" "กุมภาพันธ์" "มีนาคม" "เมษายน" "พฤษภาคม" "มิถุนายน" -## [7] "กรกฎาคม" "สิงหาคม" "กันยายน" "ตุลาคม" "พฤศจิกายน" "ธันวาคม" -## -## $Weekday -## [1] "วันอาทิตย์" "วันจันทร์" "วันอังคาร" "วันพุธ" "วันพฤหัสบดี" "วันศุกร์" "วันเสาร์" -## -## $Quarter -## [1] "ไตรมาส 1" "ไตรมาส 2" "ไตรมาส 3" "ไตรมาส 4" -## -## $AmPm -## [1] "ก่อนเที่ยง" "หลังเที่ยง" -## -## $Era -## [1] "ปีก่อนคริสตกาล" "คริสต์ศักราช" -``` - -```r -stri_datetime_symbols('pl_PL', context='format') -``` - -``` -## $Month -## [1] "stycznia" "lutego" "marca" "kwietnia" "maja" -## [6] "czerwca" "lipca" "sierpnia" "września" "października" -## [11] "listopada" "grudnia" -## -## $Weekday -## [1] "niedziela" "poniedziałek" "wtorek" "środa" "czwartek" -## [6] "piątek" "sobota" -## -## $Quarter -## [1] "I kwartał" "II kwartał" "III kwartał" "IV kwartał" -## -## $AmPm -## [1] "AM" "PM" -## -## $Era -## [1] "przed naszą erą" "naszej ery" -``` - -```r -stri_datetime_symbols('pl_PL', context='standalone') -``` - -``` -## $Month -## [1] "styczeń" "luty" "marzec" "kwiecień" "maj" -## [6] "czerwiec" "lipiec" "sierpień" "wrzesień" "październik" -## [11] "listopad" "grudzień" -## -## $Weekday -## [1] "niedziela" "poniedziałek" "wtorek" "środa" "czwartek" -## [6] "piątek" "sobota" -## 
-## $Quarter -## [1] "I kwartał" "II kwartał" "III kwartał" "IV kwartał" -## -## $AmPm -## [1] "AM" "PM" -## -## $Era -## [1] "przed naszą erą" "naszej ery" -``` - -```r -stri_datetime_symbols(width='wide') -``` - -``` -## $Month -## [1] "January" "February" "March" "April" "May" "June" -## [7] "July" "August" "September" "October" "November" "December" -## -## $Weekday -## [1] "Sunday" "Monday" "Tuesday" "Wednesday" "Thursday" "Friday" -## [7] "Saturday" -## -## $Quarter -## [1] "1st quarter" "2nd quarter" "3rd quarter" "4th quarter" -## -## $AmPm -## [1] "am" "pm" -## -## $Era -## [1] "Before Christ" "Anno Domini" -``` - -```r -stri_datetime_symbols(width='abbreviated') -``` - -``` -## $Month -## [1] "Jan" "Feb" "Mar" "Apr" "May" "June" "July" "Aug" "Sept" "Oct" -## [11] "Nov" "Dec" -## -## $Weekday -## [1] "Sun" "Mon" "Tue" "Wed" "Thu" "Fri" "Sat" -## -## $Quarter -## [1] "Q1" "Q2" "Q3" "Q4" -## -## $AmPm -## [1] "am" "pm" -## -## $Era -## [1] "BC" "AD" -``` - -```r -stri_datetime_symbols(width='narrow') -``` - -``` -## $Month -## [1] "J" "F" "M" "A" "M" "J" "J" "A" "S" "O" "N" "D" -## -## $Weekday -## [1] "S" "M" "T" "W" "T" "F" "S" -## -## $Quarter -## [1] "1" "2" "3" "4" -## -## $AmPm -## [1] "am" "pm" -## -## $Era -## [1] "B" "A" -``` diff --git a/.devel/sphinx/rapi/stri_detect.md b/.devel/sphinx/rapi/stri_detect.md deleted file mode 100644 index 568628fa..00000000 --- a/.devel/sphinx/rapi/stri_detect.md +++ /dev/null @@ -1,176 +0,0 @@ -# stri_detect: Detect Pattern Occurrences - -## Description - -These functions determine, for each string in `str`, if there is at least one match to a corresponding `pattern`. 
- -## Usage - -``` r -stri_detect(str, ..., regex, fixed, coll, charclass) - -stri_detect_fixed( - str, - pattern, - negate = FALSE, - max_count = -1, - ..., - opts_fixed = NULL -) - -stri_detect_charclass(str, pattern, negate = FALSE, max_count = -1) - -stri_detect_coll( - str, - pattern, - negate = FALSE, - max_count = -1, - ..., - opts_collator = NULL -) - -stri_detect_regex( - str, - pattern, - negate = FALSE, - max_count = -1, - ..., - opts_regex = NULL -) -``` - -## Arguments - -| | | -|--------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `str` | character vector; strings to search in | -| `...` | supplementary arguments passed to the underlying functions, including additional settings for `opts_collator`, `opts_regex`, `opts_fixed`, and so on | -| `pattern`, `regex`, `fixed`, `coll`, `charclass` | character vector; search patterns; for more details refer to [stringi-search](about_search.md) | -| `negate` | single logical value; whether a no-match to a pattern is rather of interest | -| `max_count` | single integer; allows to stop searching once a given number of occurrences is detected; `-1` (the default) inspects all elements | -| `opts_collator`, `opts_fixed`, `opts_regex` | a named list used to tune up the search engine\'s settings; see [`stri_opts_collator`](stri_opts_collator.md), [`stri_opts_fixed`](stri_opts_fixed.md), and [`stri_opts_regex`](stri_opts_regex.md), respectively; `NULL` for the defaults | - -## Details - -Vectorized over `str` and `pattern` (with recycling of the elements in the shorter vector if necessary). This allows to, for instance, search for one pattern in each given string, search for each pattern in one given string, and search for the i-th pattern within the i-th string. 
- -If `pattern` is empty, then the result is `NA` and a warning is generated. - -`stri_detect` is a convenience function. It calls either `stri_detect_regex`, `stri_detect_fixed`, `stri_detect_coll`, or `stri_detect_charclass`, depending on the argument used. - -See also [`stri_startswith`](stri_startsendswith.md) and [`stri_endswith`](stri_startsendswith.md) for testing whether a string starts or ends with a match to a given pattern. Moreover, see [`stri_subset`](stri_subset.md) for subsetting character vectors. - -If `max_count` is negative, then all strings are examined. Otherwise, searching terminates once `max_count` matches (or, if `negate` is `TRUE`, no-matches) are detected. The uninspected cases are marked as missing in the return vector. Be aware that, unless `pattern` is a singleton, the elements in `str` might be inspected in a non-consecutive order. - -## Value - -Each function returns a logical vector. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other search_detect: [`about_search`](about_search.md), [`stri_startswith()`](stri_startsendswith.md) - -## Examples - - - - -```r -stri_detect_fixed(c('stringi R', 'R STRINGI', '123'), c('i', 'R', '0')) -``` - -``` -## [1] TRUE TRUE FALSE -``` - -```r -stri_detect_fixed(c('stringi R', 'R STRINGI', '123'), 'R') -``` - -``` -## [1] TRUE TRUE FALSE -``` - -```r -stri_detect_charclass(c('stRRRingi','R STRINGI', '123'), - c('\\p{Ll}', '\\p{Lu}', '\\p{Zs}')) -``` - -``` -## [1] TRUE TRUE FALSE -``` - -```r -stri_detect_regex(c('stringi R', 'R STRINGI', '123'), 'R.') -``` - -``` -## [1] FALSE TRUE FALSE -``` - -```r -stri_detect_regex(c('stringi R', 'R STRINGI', '123'), '[[:alpha:]]*?') -``` - -``` -## [1] TRUE
TRUE TRUE -``` - -```r -stri_detect_regex(c('stringi R', 'R STRINGI', '123'), '[a-zC1]') -``` - -``` -## [1] TRUE FALSE TRUE -``` - -```r -stri_detect_regex(c('stringi R', 'R STRINGI', '123'), '( R|RE)') -``` - -``` -## [1] TRUE FALSE FALSE -``` - -```r -stri_detect_regex('stringi', 'STRING.', case_insensitive=TRUE) -``` - -``` -## [1] TRUE -``` - -```r -stri_detect_regex(c('abc', 'def', '123', 'ghi', '456', '789', 'jkl'), - '^[0-9]+$', max_count=1) -``` - -``` -## [1] FALSE FALSE TRUE NA NA NA NA -``` - -```r -stri_detect_regex(c('abc', 'def', '123', 'ghi', '456', '789', 'jkl'), - '^[0-9]+$', max_count=2) -``` - -``` -## [1] FALSE FALSE TRUE FALSE TRUE NA NA -``` - -```r -stri_detect_regex(c('abc', 'def', '123', 'ghi', '456', '789', 'jkl'), - '^[0-9]+$', negate=TRUE, max_count=3) -``` - -``` -## [1] TRUE TRUE FALSE TRUE NA NA NA -``` diff --git a/.devel/sphinx/rapi/stri_dup.md b/.devel/sphinx/rapi/stri_dup.md deleted file mode 100644 index e59f3d26..00000000 --- a/.devel/sphinx/rapi/stri_dup.md +++ /dev/null @@ -1,81 +0,0 @@ -# stri_dup: Duplicate Strings - -## Description - -Duplicates each `str`(`e1`) string `times`(`e2`) times and concatenates the results. - -## Usage - -``` r -stri_dup(str, times) - -e1 %s*% e2 - -e1 %stri*% e2 -``` - -## Arguments - -| | | -|---------------|----------------------------------------------------------------------| -| `str`, `e1` | a character vector of strings to be duplicated | -| `times`, `e2` | an integer vector with the numbers of times to duplicate each string | - -## Details - -Vectorized over all arguments. - -`e1 %s*% e2` and `e1 %stri*% e2` are synonyms for `stri_dup(e1, e2)` - -## Value - -Returns a character vector. 
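Since `stri_dup` is vectorized over both arguments, recycling works the same way as elsewhere in stringi. The sketch below is illustrative and not from the original page: it assumes stringi is installed and uses made-up inputs. It also compares the result against base R's `strrep()`, which returns the same strings for these ASCII inputs.

```r
library(stringi)

# Both arguments are vectorized and recycled
reps <- stri_dup(c("ab", "x"), c(2, 3))       # "abab" "xxx"
same_times <- stri_dup(c("a", "b", "c"), 2)   # "aa" "bb" "cc"

# Operator forms: %s*% and %stri*% are synonyms for stri_dup()
rule <- "-" %s*% 10                           # "----------"
alias_ok <- identical("ab" %s*% 3, "ab" %stri*% 3)

# Base R's strrep() gives the same result for these inputs
base_ok <- identical(reps, strrep(c("ab", "x"), c(2, 3)))
```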
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other join: [`%s+%()`](+25s+2B+25.md), [`stri_flatten()`](stri_flatten.md), [`stri_join_list()`](stri_join_list.md), [`stri_join()`](stri_join.md) - -## Examples - - - - -```r -stri_dup('a', 1:5) -``` - -``` -## [1] "a" "aa" "aaa" "aaaa" "aaaaa" -``` - -```r -stri_dup(c('a', NA, 'ba'), 4) -``` - -``` -## [1] "aaaa" NA "babababa" -``` - -```r -stri_dup(c('abc', 'pqrst'), c(4, 2)) -``` - -``` -## [1] "abcabcabcabc" "pqrstpqrst" -``` - -```r -"a" %s*% 5 -``` - -``` -## [1] "aaaaa" -``` diff --git a/.devel/sphinx/rapi/stri_duplicated.md b/.devel/sphinx/rapi/stri_duplicated.md deleted file mode 100644 index eb010301..00000000 --- a/.devel/sphinx/rapi/stri_duplicated.md +++ /dev/null @@ -1,131 +0,0 @@ -# stri_duplicated: Determine Duplicated Elements - -## Description - -`stri_duplicated()` determines which strings in a character vector are duplicates of other elements. - -`stri_duplicated_any()` determines if there are any duplicated strings in a character vector. 
- -## Usage - -``` r -stri_duplicated( - str, - from_last = FALSE, - fromLast = from_last, - ..., - opts_collator = NULL -) - -stri_duplicated_any( - str, - from_last = FALSE, - fromLast = from_last, - ..., - opts_collator = NULL -) -``` - -## Arguments - -| | | -|-----------------|------------------------------------------------------------------------------------------------------------------------------------| -| `str` | a character vector | -| `from_last` | a single logical value; indicates whether the search should be performed from the last to the first string | -| `fromLast` | \[DEPRECATED\] alias of `from_last` | -| `...` | additional settings for `opts_collator` | -| `opts_collator` | a named list with ICU Collator\'s options, see [`stri_opts_collator`](stri_opts_collator.md), `NULL` for default collation options | - -## Details - -Missing values are regarded as equal. - -Unlike [`duplicated`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/duplicated.html) and [`anyDuplicated`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/duplicated.html), these functions test for canonical equivalence of strings (and not whether the strings are just bytewise equal). Such operations are locale-dependent. Hence, `stri_duplicated` and `stri_duplicated_any` are significantly slower (but much better suited for natural language processing) than their base R counterparts. - -See also [`stri_unique`](stri_unique.md) for extracting unique elements. - -## Value - -`stri_duplicated()` returns a logical vector of the same length as `str`. Each of its elements indicates whether a canonically equivalent string was already found in `str`. - -`stri_duplicated_any()` returns a single non-negative integer. A value of 0 indicates that all the elements in `str` are unique. Otherwise, it gives the index of the first non-unique element.
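The return values described above can be checked on a small, made-up vector that contains a repeated string and two missing values. This is an illustrative sketch, assuming stringi is installed. It also shows the common idiom of keeping only the first occurrence of each string by negating `stri_duplicated()`.

```r
library(stringi)

x <- c("x", "y", "x", NA, NA)

dup       <- stri_duplicated(x)                    # FALSE FALSE TRUE FALSE TRUE (NAs compare equal)
dup_rev   <- stri_duplicated(x, from_last = TRUE)  # TRUE FALSE FALSE TRUE FALSE
first_dup <- stri_duplicated_any(x)                # 3, index of the first repeated element
none      <- stri_duplicated_any(c("x", "y"))      # 0, all elements are unique

# Keep only the first occurrence of each string
kept <- x[!stri_duplicated(x)]                     # "x" "y" NA
```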
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## References - -*Collation* - ICU User Guide, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other locale_sensitive: [`%s<%()`](+25s+3C+25.md), [`about_locale`](about_locale.md), [`about_search_boundaries`](about_search_boundaries.md), [`about_search_coll`](about_search_coll.md), [`stri_compare()`](stri_compare.md), [`stri_count_boundaries()`](stri_count_boundaries.md), [`stri_enc_detect2()`](stri_enc_detect2.md), [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_opts_collator()`](stri_opts_collator.md), [`stri_order()`](stri_order.md), [`stri_rank()`](stri_rank.md), [`stri_sort_key()`](stri_sort_key.md), [`stri_sort()`](stri_sort.md), [`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_trans_tolower()`](stri_trans_casemap.md), [`stri_unique()`](stri_unique.md), [`stri_wrap()`](stri_wrap.md) - -## Examples - - - - -```r -# In the following examples, we have 3 duplicated values, -# 'a' - 2 times, NA - 1 time -stri_duplicated(c('a', 'b', 'a', NA, 'a', NA)) -``` - -``` -## [1] FALSE FALSE TRUE FALSE TRUE TRUE -``` - -```r -stri_duplicated(c('a', 'b', 'a', NA, 'a', NA), from_last=TRUE) -``` - -``` -## [1] TRUE FALSE TRUE TRUE FALSE FALSE -``` - -```r -stri_duplicated_any(c('a', 'b', 'a', NA, 'a', NA)) -``` - -``` -## [1] 3 -``` - -```r -# compare the results: -stri_duplicated(c('\u0105', stri_trans_nfkd('\u0105'))) -``` - -``` -## [1] FALSE TRUE -``` - -```r -duplicated(c('\u0105', stri_trans_nfkd('\u0105'))) -``` - -``` -## [1] FALSE FALSE -``` - -```r -stri_duplicated(c('gro\u00df', 'GROSS', 'Gro\u00df', 'Gross'), strength=1) -``` - -``` -## [1] FALSE TRUE TRUE TRUE -``` - 
-```r -duplicated(c('gro\u00df', 'GROSS', 'Gro\u00df', 'Gross')) -``` - -``` -## [1] FALSE FALSE FALSE FALSE -``` diff --git a/.devel/sphinx/rapi/stri_enc_detect.md b/.devel/sphinx/rapi/stri_enc_detect.md deleted file mode 100644 index b3c348d0..00000000 --- a/.devel/sphinx/rapi/stri_enc_detect.md +++ /dev/null @@ -1,108 +0,0 @@ -# stri_enc_detect: Detect Character Set and Language - -## Description - -This function uses the ICU engine to determine the character set, or encoding, of character data in an unknown format. - -## Usage - -``` r -stri_enc_detect(str, filter_angle_brackets = FALSE) -``` - -## Arguments - -| | | -|-------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `str` | character vector, a raw vector, or a list of `raw` vectors | -| `filter_angle_brackets` | logical; If filtering is enabled, text within angle brackets (\'\<\' and \'\>\') will be removed before detection, which will remove most HTML or XML markup. | - -## Details - -Vectorized over `str` and `filter_angle_brackets`. - -For a character vector input, merging all text lines via [`stri_flatten(str, collapse='\n')`](stri_flatten.md) might be needed if `str` has been obtained via a call to `readLines` and in fact represents an image of a single text file. - -This is, at best, an imprecise operation using statistics and heuristics. Because of this, detection works best if you supply at least a few hundred bytes of character data that is mostly in a single language. However, because the detection only looks at a limited amount of the input data, some of the returned character sets may fail to handle all of the input data. Note that in some cases, the language can be determined along with the encoding. - -Several different techniques are used for character set detection. For multi-byte encodings, the sequence of bytes is checked for legible patterns. 
The detected characters are also checked against a list of frequently used characters in that encoding. For single byte encodings, the data is checked against a list of the most commonly occurring three letter groups for each language that can be written using that encoding. - -The detection process can be configured to optionally ignore HTML or XML style markup (using ICU\'s internal facilities), which can interfere with the detection process by changing the statistics. - -This function should most often be used for byte-marked input strings, especially after loading them from text files and before the main conversion with [`stri_encode`](stri_encode.md). The input encoding is of course not taken into account here, even if marked. - -The following table shows all the encodings that can be detected: - -| | | -|:------------------|:--------------------------------------------------------------------------------| -| **Character_Set** | **Languages** | -| UTF-8 | \-- | -| UTF-16BE | \-- | -| UTF-16LE | \-- | -| UTF-32BE | \-- | -| UTF-32LE | \-- | -| Shift_JIS | Japanese | -| ISO-2022-JP | Japanese | -| ISO-2022-CN | Simplified Chinese | -| ISO-2022-KR | Korean | -| GB18030 | Chinese | -| Big5 | Traditional Chinese | -| EUC-JP | Japanese | -| EUC-KR | Korean | -| ISO-8859-1 | Danish, Dutch, English, French, German, Italian, Norwegian, Portuguese, Swedish | -| ISO-8859-2 | Czech, Hungarian, Polish, Romanian | -| ISO-8859-5 | Russian | -| ISO-8859-6 | Arabic | -| ISO-8859-7 | Greek | -| ISO-8859-8 | Hebrew | -| ISO-8859-9 | Turkish | -| windows-1250 | Czech, Hungarian, Polish, Romanian | -| windows-1251 | Russian | -| windows-1252 | Danish, Dutch, English, French, German, Italian, Norwegian, Portuguese, Swedish | -| windows-1253 | Greek | -| windows-1254 | Turkish | -| windows-1255 | Hebrew | -| windows-1256 | Arabic | -| KOI8-R | Russian | -| IBM420 | Arabic | -| IBM424 | Hebrew | -| | | - -## Value - -Returns a list of length equal to the length of `str`. 
Each list element is a data frame with the following three named vectors representing all the guesses: - -- `Encoding` -- string; guessed encodings; `NA` on failure, - -- `Language` -- string; guessed languages; `NA` if the language could not be determined (e.g., in case of UTF-8), - -- `Confidence` -- numeric in \[0,1\]; the higher the value, the more confidence there is in the match; `NA` on failure. - -The guesses are ordered by decreasing confidence. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## References - -*Character Set Detection* -- ICU User Guide, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other encoding_detection: [`about_encoding`](about_encoding.md), [`stri_enc_detect2()`](stri_enc_detect2.md), [`stri_enc_isascii()`](stri_enc_isascii.md), [`stri_enc_isutf16be()`](stri_enc_isutf16.md), [`stri_enc_isutf8()`](stri_enc_isutf8.md) - -## Examples - - - - -```r -## Not run: -## f <- rawToChar(readBin('test.txt', 'raw', 100000)) -## stri_enc_detect(f) -``` diff --git a/.devel/sphinx/rapi/stri_enc_detect2.md b/.devel/sphinx/rapi/stri_enc_detect2.md deleted file mode 100644 index 52f70bcc..00000000 --- a/.devel/sphinx/rapi/stri_enc_detect2.md +++ /dev/null @@ -1,56 +0,0 @@ -# stri_enc_detect2: \[DEPRECATED\] Detect Locale-Sensitive Character Encoding - -## Description - -This function tries to detect character encoding in case the language of text is known. 
- -## Usage - -``` r -stri_enc_detect2(str, locale = NULL) -``` - -## Arguments - -| | | -|----------|-------------------------------------------------------------------------------------------------------------------------| -| `str` | character vector, a raw vector, or a list of `raw` vectors | -| `locale` | `NULL` or `''` for default locale, `NA` for just checking the UTF-\* family, or a single string with locale identifier. | - -## Details - -Vectorized over `str`. - -First, the text is checked whether it is valid UTF-32BE, UTF-32LE, UTF-16BE, UTF-16LE, UTF-8 (as in [`stri_enc_detect`](stri_enc_detect.md), this is roughly inspired by ICU\'s `i18n/csrucode.cpp`) or ASCII. - -If `locale` is not `NA` and the above fails, the text is checked for the number of occurrences of language-specific code points (data provided by the ICU library) converted to all possible 8-bit encodings that fully cover the indicated language. The encoding is selected based on the greatest number of total byte hits. - -The guess is of course imprecise, as it is obtained using statistics and heuristics. Because of this, detection works best if you supply at least a few hundred bytes of character data that is in a single language. - -If you have no initial guess on the language and encoding, try with [`stri_enc_detect`](stri_enc_detect.md) (uses ICU facilities). - -## Value - -Just like [`stri_enc_detect`](stri_enc_detect.md), this function returns a list of length equal to the length of `str`. Each list element is a data frame with the following three named components: - -- `Encoding` -- string; guessed encodings; `NA` on failure (if and only if `encodings` is empty), - -- `Language` -- always `NA`, - -- `Confidence` -- numeric in \[0,1\]; the higher the value, the more confidence there is in the match; `NA` on failure. - -The guesses are ordered by decreasing confidence. 
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other locale_sensitive: [`%s<%()`](+25s+3C+25.md), [`about_locale`](about_locale.md), [`about_search_boundaries`](about_search_boundaries.md), [`about_search_coll`](about_search_coll.md), [`stri_compare()`](stri_compare.md), [`stri_count_boundaries()`](stri_count_boundaries.md), [`stri_duplicated()`](stri_duplicated.md), [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_opts_collator()`](stri_opts_collator.md), [`stri_order()`](stri_order.md), [`stri_rank()`](stri_rank.md), [`stri_sort_key()`](stri_sort_key.md), [`stri_sort()`](stri_sort.md), [`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_trans_tolower()`](stri_trans_casemap.md), [`stri_unique()`](stri_unique.md), [`stri_wrap()`](stri_wrap.md) - -Other encoding_detection: [`about_encoding`](about_encoding.md), [`stri_enc_detect()`](stri_enc_detect.md), [`stri_enc_isascii()`](stri_enc_isascii.md), [`stri_enc_isutf16be()`](stri_enc_isutf16.md), [`stri_enc_isutf8()`](stri_enc_isutf8.md) diff --git a/.devel/sphinx/rapi/stri_enc_fromutf32.md b/.devel/sphinx/rapi/stri_enc_fromutf32.md deleted file mode 100644 index dc04a5a5..00000000 --- a/.devel/sphinx/rapi/stri_enc_fromutf32.md +++ /dev/null @@ -1,43 +0,0 @@ -# stri_enc_fromutf32: Convert From UTF-32 - -## Description - -This function converts integer vectors, representing sequences of UTF-32 code points, to UTF-8 strings. 
- -## Usage - -``` r -stri_enc_fromutf32(vec) -``` - -## Arguments - -| | | -|-------|------------------------------------------------------------------------------------------------------------------------------------------| -| `vec` | a list of integer vectors (or objects coercible to such vectors) or `NULL`s. For convenience, a single integer vector can also be given. | - -## Details - -UTF-32 is a 32-bit encoding where each Unicode code point corresponds to exactly one integer value. - -This function is a vectorized version of [`intToUtf8`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/utf8Conversion.html). As usual in stringi, it returns character strings in UTF-8. See [`stri_enc_toutf32`](stri_enc_toutf32.md) for a dual operation. - -If an ill-defined code point is given, a warning is generated and the corresponding string is set to `NA`. Note that `0`s are not allowed in `vec`, as they are used internally to mark the end of a string (in the C API). - -See also [`stri_encode`](stri_encode.md) for decoding arbitrary byte sequences from any given encoding. - -## Value - -Returns a character vector (in UTF-8). `NULL`s in the input list are converted to `NA_character_`. 
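A round trip through UTF-32 code points illustrates both the list input and the `NULL` handling described above. This is a sketch, not from the original page: it assumes stringi is installed and pairs the function with its dual, `stri_enc_toutf32`, and with base R's `intToUtf8()`.

```r
library(stringi)

codes <- stri_enc_toutf32("Hi")[[1]]                # 72 105
roundtrip <- stri_enc_fromutf32(list(codes, NULL))  # "Hi" NA

# A single integer vector is also accepted
ogonek <- stri_enc_fromutf32(c(0x105L, 0x104L))     # "\u0105\u0104"
base_ok <- ogonek == intToUtf8(c(0x105L, 0x104L))   # TRUE
```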
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other encoding_conversion: [`about_encoding`](about_encoding.md), [`stri_enc_toascii()`](stri_enc_toascii.md), [`stri_enc_tonative()`](stri_enc_tonative.md), [`stri_enc_toutf32()`](stri_enc_toutf32.md), [`stri_enc_toutf8()`](stri_enc_toutf8.md), [`stri_encode()`](stri_encode.md) diff --git a/.devel/sphinx/rapi/stri_enc_info.md b/.devel/sphinx/rapi/stri_enc_info.md deleted file mode 100644 index cc3aeda1..00000000 --- a/.devel/sphinx/rapi/stri_enc_info.md +++ /dev/null @@ -1,53 +0,0 @@ -# stri_enc_info: Query a Character Encoding - -## Description - -Gets basic information on a character encoding. - -## Usage - -``` r -stri_enc_info(enc = NULL) -``` - -## Arguments - -| | | -|-------|--------------------------------------------------------------------------------| -| `enc` | `NULL` or `''` for the default encoding, or a single string with encoding name | - -## Details - -An error is raised if the provided encoding is unknown to ICU (see [`stri_enc_list`](stri_enc_list.md) for more details). 
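Because an unknown encoding name raises an error, a lookup based on user-supplied input may need to be guarded. In this sketch (assuming stringi is installed), `"no-such-encoding"` is a made-up name used only to trigger that error. The first call merely checks that some of the components listed under Value are present for UTF-8.

```r
library(stringi)

info <- stri_enc_info("UTF-8")
fields <- c("Name.friendly", "Name.ICU", "ASCII.subset", "CharSize.min", "CharSize.max")
has_fields <- all(fields %in% names(info))

# "no-such-encoding" is a hypothetical name, chosen so that ICU does not know it
lookup <- tryCatch(stri_enc_info("no-such-encoding"), error = function(e) NULL)
is.null(lookup)  # TRUE when the encoding is unknown to ICU
```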
- -## Value - -Returns a list with the following components: - -- `Name.friendly` -- friendly encoding name: MIME Name or JAVA Name or ICU Canonical Name (the first one available is selected; see below); - -- `Name.ICU` -- encoding name as identified by ICU; - -- `Name.*` -- other standardized encoding names, e.g., `Name.UTR22`, `Name.IBM`, `Name.WINDOWS`, `Name.JAVA`, `Name.IANA`, `Name.MIME` (some of them may not be available for every encoding); - -- `ASCII.subset` -- is ASCII a subset of the given encoding?; - -- `Unicode.1to1` -- for 8-bit encodings only: are all characters translated to exactly one Unicode code point and is the translation scheme reversible?; - -- `CharSize.8bit` -- is this an 8-bit encoding, i.e., do we have `CharSize.min == CharSize.max` and `CharSize.min == 1`?; - -- `CharSize.min` -- minimal number of bytes used to represent a UChar (in UTF-16, this is not the same as UChar32) - -- `CharSize.max` -- maximal number of bytes used to represent a UChar (in UTF-16, this is not the same as UChar32, i.e., does not reflect the maximal code point representation size) - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other encoding_management: [`about_encoding`](about_encoding.md), [`stri_enc_list()`](stri_enc_list.md), [`stri_enc_mark()`](stri_enc_mark.md), [`stri_enc_set()`](stri_enc_set.md) diff --git a/.devel/sphinx/rapi/stri_enc_isascii.md b/.devel/sphinx/rapi/stri_enc_isascii.md deleted file mode 100644 index 96824b03..00000000 --- a/.devel/sphinx/rapi/stri_enc_isascii.md +++ /dev/null @@ -1,58 +0,0 @@ -# stri_enc_isascii: Check If a Data Stream Is Possibly in ASCII - -## Description - -The function checks whether all bytes in a string are
\<= 127. - -## Usage - -``` r -stri_enc_isascii(str) -``` - -## Arguments - -| | | -|-------|------------------------------------------------------------| -| `str` | character vector, a raw vector, or a list of `raw` vectors | - -## Details - -This function is independent of the way **R** marks encodings in character strings (see [Encoding](https://stat.ethz.ch/R-manual/R-devel/library/base/html/Encoding.html) and [stringi-encoding](about_encoding.md)). - -## Value - -Returns a logical vector. The i-th element indicates whether the i-th string corresponds to a valid ASCII byte sequence. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other encoding_detection: [`about_encoding`](about_encoding.md), [`stri_enc_detect2()`](stri_enc_detect2.md), [`stri_enc_detect()`](stri_enc_detect.md), [`stri_enc_isutf16be()`](stri_enc_isutf16.md), [`stri_enc_isutf8()`](stri_enc_isutf8.md) - -## Examples - - - - -```r -stri_enc_isascii(letters[1:3]) -``` - -``` -## [1] TRUE TRUE TRUE -``` - -```r -stri_enc_isascii('\u0105\u0104') -``` - -``` -## [1] FALSE -``` diff --git a/.devel/sphinx/rapi/stri_enc_isutf16.md b/.devel/sphinx/rapi/stri_enc_isutf16.md deleted file mode 100644 index 391a6fd5..00000000 --- a/.devel/sphinx/rapi/stri_enc_isutf16.md +++ /dev/null @@ -1,47 +0,0 @@ -# stri_enc_isutf16: Check If a Data Stream Is Possibly in UTF-16 or UTF-32 - -## Description - -These functions detect whether a given byte stream is valid UTF-16LE, UTF-16BE, UTF-32LE, or UTF-32BE. 
- -## Usage - -``` r -stri_enc_isutf16be(str) - -stri_enc_isutf16le(str) - -stri_enc_isutf32be(str) - -stri_enc_isutf32le(str) -``` - -## Arguments - -| | | -|-------|------------------------------------------------------------| -| `str` | character vector, a raw vector, or a list of `raw` vectors | - -## Details - -These functions are independent of the way **R** marks encodings in character strings (see [Encoding](https://stat.ethz.ch/R-manual/R-devel/library/base/html/Encoding.html) and [stringi-encoding](about_encoding.md)). Most often, these functions act on raw vectors. - -A result of `FALSE` means that a string is certainly not valid UTF-16 or UTF-32. However, false positives are possible. - -Also note that a data stream may sometimes be classified as both valid UTF-16LE and valid UTF-16BE. - -## Value - -Returns a logical vector. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other encoding_detection: [`about_encoding`](about_encoding.md), [`stri_enc_detect2()`](stri_enc_detect2.md), [`stri_enc_detect()`](stri_enc_detect.md), [`stri_enc_isascii()`](stri_enc_isascii.md), [`stri_enc_isutf8()`](stri_enc_isutf8.md) diff --git a/.devel/sphinx/rapi/stri_enc_isutf8.md b/.devel/sphinx/rapi/stri_enc_isutf8.md deleted file mode 100644 index b3f220b9..00000000 --- a/.devel/sphinx/rapi/stri_enc_isutf8.md +++ /dev/null @@ -1,70 +0,0 @@ -# stri_enc_isutf8: Check If a Data Stream Is Possibly in UTF-8 - -## Description - -The function checks whether the given sequences of bytes form proper UTF-8 strings.
- -## Usage - -``` r -stri_enc_isutf8(str) -``` - -## Arguments - -| | | -|-------|------------------------------------------------------------| -| `str` | character vector, a raw vector, or a list of `raw` vectors | - -## Details - -`FALSE` means that a string is certainly not valid UTF-8. However, false positives are possible. For instance, `(c4,85)` represents (\'a with ogonek\') in UTF-8 as well as (\'A umlaut\', \'Ellipsis\') in WINDOWS-1250. Also note that UTF-8, as well as most 8-bit encodings, extends ASCII (hence, whenever [`stri_enc_isascii`](stri_enc_isascii.md) returns `TRUE`, so does [`stri_enc_isutf8`](stri_enc_isutf8.md)). - -However, the longer the sequence, the more likely it is that a `TRUE` result indicates genuine UTF-8, because not all byte sequences are valid UTF-8. - -This function is independent of the way **R** marks encodings in character strings (see [Encoding](https://stat.ethz.ch/R-manual/R-devel/library/base/html/Encoding.html) and [stringi-encoding](about_encoding.md)). - -## Value - -Returns a logical vector. Its i-th element indicates whether the i-th string corresponds to a valid UTF-8 byte sequence.
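Because the check operates on bytes, it is easiest to see on raw vectors. This sketch assumes stringi is installed. It reuses the `(c4,85)` sequence from the details, adds a plain ASCII sequence, and adds a lone lead byte that cannot be complete UTF-8. The final assertion in the test reflects the ASCII-implies-UTF-8 relationship noted above.

```r
library(stringi)

bytes <- list(
  as.raw(c(0x61, 0x62)),  # "ab": plain ASCII
  as.raw(c(0xc4, 0x85)),  # a with ogonek, a valid 2-byte UTF-8 sequence
  as.raw(0xc4)            # a lone lead byte: truncated, hence invalid UTF-8
)

ascii <- stri_enc_isascii(bytes)  # TRUE FALSE FALSE
utf8  <- stri_enc_isutf8(bytes)   # TRUE TRUE FALSE
```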
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other encoding_detection: [`about_encoding`](about_encoding.md), [`stri_enc_detect2()`](stri_enc_detect2.md), [`stri_enc_detect()`](stri_enc_detect.md), [`stri_enc_isascii()`](stri_enc_isascii.md), [`stri_enc_isutf16be()`](stri_enc_isutf16.md) - -## Examples - - - - -```r -stri_enc_isutf8(letters[1:3]) -``` - -``` -## [1] TRUE TRUE TRUE -``` - -```r -stri_enc_isutf8('\u0105\u0104') -``` - -``` -## [1] TRUE -``` - -```r -stri_enc_isutf8('\u1234\u0222') -``` - -``` -## [1] TRUE -``` diff --git a/.devel/sphinx/rapi/stri_enc_list.md b/.devel/sphinx/rapi/stri_enc_list.md deleted file mode 100644 index 6dc6e7c9..00000000 --- a/.devel/sphinx/rapi/stri_enc_list.md +++ /dev/null @@ -1,2217 +0,0 @@ -# stri_enc_list: List Known Character Encodings - -## Description - -Gives the list of encodings that are supported by ICU. - -## Usage - -``` r -stri_enc_list(simplify = TRUE) -``` - -## Arguments - -| | | -|------------|---------------------------------------------------------------------------------| -| `simplify` | single logical value; return a character vector or a list of character vectors? | - -## Details - -Apart from given encoding identifiers and their aliases, some other specifiers might additionally be available. This is due to the fact that ICU tries to normalize converter names. For instance, `'UTF8'` is also valid, see [stringi-encoding](about_encoding.md) for more information. - -## Value - -If `simplify` is `FALSE`, a list of character vectors is returned. Each list element represents a unique character encoding. The `name` attribute gives the ICU Canonical Name of an encoding family. 
The elements (character vectors) are its aliases. - -If `simplify` is `TRUE` (the default), then the resulting list is coerced to a character vector, sorted, and returned with duplicate entries removed. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other encoding_management: [`about_encoding`](about_encoding.md), [`stri_enc_info()`](stri_enc_info.md), [`stri_enc_mark()`](stri_enc_mark.md), [`stri_enc_set()`](stri_enc_set.md) - -## Examples - - - - -```r -stri_enc_list() -``` - -``` -## [1] "037" -## [2] "273" -## [3] "277" -## [4] "278" -## [5] "280" -## [6] "284" -## [7] "285" -## [8] "297" -## [9] "420" -## [10] "424" -## [11] "437" -## [12] "500" -## [13] "646" -## [14] "737" -## [15] "775" -## [16] "813" -## [17] "819" -## [18] "838" -## [19] "850" -## [20] "851" -## [21] "852" -## [22] "855" -## [23] "856" -## [24] "857" -## [25] "860" -## [26] "861" -## [27] "862" -## [28] "863" -## [29] "865" -## [30] "866" -## [31] "868" -## [32] "869" -## [33] "871" -## [34] "875" -## [35] "912" -## [36] "913" -## [37] "914" -## [38] "915" -## [39] "916" -## [40] "920" -## [41] "921" -## [42] "922" -## [43] "923" -## [44] "930" -## [45] "933" -## [46] "935" -## [47] "937" -## [48] "939" -## [49] "943" -## [50] "949" -## [51] "950" -## [52] "964" -## [53] "970" -## [54] "1006" -## [55] "1025" -## [56] "1026" -## [57] "1047" -## [58] "1089" -## [59] "1097" -## [60] "1098" -## [61] "1112" -## [62] "1122" -## [63] "1123" -## [64] "1124" -## [65] "1383" -## [66] "5601" -## [67] "8859_1" -## [68] "8859_2" -## [69] "8859_3" -## [70] "8859_4" -## [71] "8859_5" -## [72] "8859_6" -## [73] "8859_7" -## [74] "8859_8" -## [75] "8859_9" -## [76] "8859_13" -## [77] "8859_15" -## [78]
"33722" -## [79] "Adobe-Standard-Encoding" -## [80] "ANSI_X3.4-1968" -## [81] "ANSI_X3.4-1986" -## [82] "ANSI1251" -## [83] "arabic" -## [84] "ASCII" -## [85] "ascii7" -## [86] "ASMO-708" -## [87] "Big5" -## [88] "Big5-HKSCS" -## [89] "big5-hkscs:unicode3.0" -## [90] "big5hk" -## [91] "BOCU-1" -## [92] "CCSID00858" -## [93] "CCSID01140" -## [94] "CCSID01141" -## [95] "CCSID01142" -## [96] "CCSID01143" -## [97] "CCSID01144" -## [98] "CCSID01145" -## [99] "CCSID01146" -## [100] "CCSID01147" -## [101] "CCSID01148" -## [102] "CCSID01149" -## [103] "CESU-8" -## [104] "chinese" -## [105] "cns11643" -## [106] "COMPOUND_TEXT" -## [107] "cp-ar" -## [108] "cp-gr" -## [109] "cp-is" -## [110] "cp037" -## [111] "cp37" -## [112] "CP273" -## [113] "cp277" -## [114] "cp278" -## [115] "CP280" -## [116] "CP284" -## [117] "CP285" -## [118] "cp290" -## [119] "cp297" -## [120] "cp367" -## [121] "cp420" -## [122] "cp424" -## [123] "cp437" -## [124] "CP500" -## [125] "cp737" -## [126] "cp775" -## [127] "cp803" -## [128] "cp813" -## [129] "cp819" -## [130] "cp838" -## [131] "cp850" -## [132] "cp851" -## [133] "cp852" -## [134] "cp855" -## [135] "cp856" -## [136] "cp857" -## [137] "CP00858" -## [138] "cp858" -## [139] "cp860" -## [140] "cp861" -## [141] "cp862" -## [142] "cp863" -## [143] "cp864" -## [144] "cp865" -## [145] "cp866" -## [146] "CP868" -## [147] "cp869" -## [148] "CP870" -## [149] "CP871" -## [150] "cp874" -## [151] "cp875" -## [152] "cp878" -## [153] "cp912" -## [154] "cp913" -## [155] "cp914" -## [156] "cp915" -## [157] "cp916" -## [158] "CP918" -## [159] "cp920" -## [160] "cp921" -## [161] "cp922" -## [162] "cp923" -## [163] "cp930" -## [164] "cp932" -## [165] "cp933" -## [166] "cp935" -## [167] "CP936" -## [168] "cp937" -## [169] "cp939" -## [170] "cp943" -## [171] "cp943c" -## [172] "cp949" -## [173] "cp949c" -## [174] "cp950" -## [175] "cp964" -## [176] "cp970" -## [177] "cp1006" -## [178] "cp1025" -## [179] "CP1026" -## [180] "cp1047" -## [181] "cp1089" -## [182] 
"cp1097" -## [183] "cp1098" -## [184] "cp1112" -## [185] "cp1122" -## [186] "cp1123" -## [187] "cp1124" -## [188] "cp1125" -## [189] "cp1131" -## [190] "CP01140" -## [191] "cp1140" -## [192] "CP01141" -## [193] "cp1141" -## [194] "CP01142" -## [195] "cp1142" -## [196] "CP01143" -## [197] "cp1143" -## [198] "CP01144" -## [199] "cp1144" -## [200] "CP01145" -## [201] "cp1145" -## [202] "CP01146" -## [203] "cp1146" -## [204] "CP01147" -## [205] "cp1147" -## [206] "CP01148" -## [207] "cp1148" -## [208] "CP01149" -## [209] "cp1149" -## [210] "cp1200" -## [211] "cp1201" -## [212] "cp1208" -## [213] "cp1250" -## [214] "cp1251" -## [215] "cp1252" -## [216] "cp1253" -## [217] "cp1254" -## [218] "cp1255" -## [219] "cp1256" -## [220] "cp1257" -## [221] "cp1258" -## [222] "cp1363" -## [223] "cp1383" -## [224] "cp1386" -## [225] "cp33722" -## [226] "cpibm37" -## [227] "cpibm284" -## [228] "cpibm285" -## [229] "cpibm297" -## [230] "csAdobeStandardEncoding" -## [231] "csASCII" -## [232] "csBig5" -## [233] "csBOCU-1" -## [234] "csEUCKR" -## [235] "csEUCPkdFmtJapanese" -## [236] "csGB2312" -## [237] "csHPRoman8" -## [238] "csIBM037" -## [239] "csIBM273" -## [240] "csIBM277" -## [241] "csIBM278" -## [242] "csIBM280" -## [243] "csIBM284" -## [244] "csIBM285" -## [245] "csIBM290" -## [246] "csIBM297" -## [247] "csIBM420" -## [248] "csIBM424" -## [249] "csIBM500" -## [250] "csIBM855" -## [251] "csIBM857" -## [252] "csIBM860" -## [253] "csIBM861" -## [254] "csIBM863" -## [255] "csIBM864" -## [256] "csIBM865" -## [257] "csIBM866" -## [258] "csIBM868" -## [259] "csIBM869" -## [260] "csIBM870" -## [261] "csIBM871" -## [262] "csIBM918" -## [263] "csIBM1026" -## [264] "csIBMThai" -## [265] "csISO58GB231280" -## [266] "csISO2022CN" -## [267] "csISO2022JP" -## [268] "csISO2022JP2" -## [269] "csISO2022KR" -## [270] "csisolatin0" -## [271] "csISOLatin1" -## [272] "csISOLatin2" -## [273] "csISOLatin3" -## [274] "csISOLatin4" -## [275] "csISOLatin5" -## [276] "csISOLatin6" -## [277] "csisolatin9" 
-## [278] "csISOLatinArabic" -## [279] "csISOLatinCyrillic" -## [280] "csISOLatinGreek" -## [281] "csISOLatinHebrew" -## [282] "csJISEncoding" -## [283] "csKOI8R" -## [284] "csKSC56011987" -## [285] "csMacintosh" -## [286] "csPC8CodePage437" -## [287] "csPC775Baltic" -## [288] "csPC850Multilingual" -## [289] "csPC851" -## [290] "csPC862LatinHebrew" -## [291] "csPCp852" -## [292] "csPCp855" -## [293] "csShiftJIS" -## [294] "csUCS4" -## [295] "csUnicode" -## [296] "csWindows31J" -## [297] "cyrillic" -## [298] "DOS-720" -## [299] "DOS-862" -## [300] "ebcdic-ar" -## [301] "ebcdic-cp-ar1" -## [302] "ebcdic-cp-ar2" -## [303] "ebcdic-cp-be" -## [304] "ebcdic-cp-ca" -## [305] "ebcdic-cp-ch" -## [306] "EBCDIC-CP-DK" -## [307] "ebcdic-cp-es" -## [308] "ebcdic-cp-fi" -## [309] "ebcdic-cp-fr" -## [310] "ebcdic-cp-gb" -## [311] "ebcdic-cp-he" -## [312] "ebcdic-cp-is" -## [313] "ebcdic-cp-it" -## [314] "ebcdic-cp-nl" -## [315] "EBCDIC-CP-NO" -## [316] "ebcdic-cp-roece" -## [317] "ebcdic-cp-se" -## [318] "ebcdic-cp-us" -## [319] "ebcdic-cp-wt" -## [320] "ebcdic-cp-yu" -## [321] "ebcdic-de" -## [322] "ebcdic-de-273+euro" -## [323] "ebcdic-dk" -## [324] "ebcdic-dk-277+euro" -## [325] "ebcdic-es-284+euro" -## [326] "ebcdic-fi-278+euro" -## [327] "ebcdic-fr-297+euro" -## [328] "ebcdic-gb" -## [329] "ebcdic-gb-285+euro" -## [330] "ebcdic-he" -## [331] "ebcdic-international-500+euro" -## [332] "ebcdic-is" -## [333] "ebcdic-is-871+euro" -## [334] "ebcdic-it-280+euro" -## [335] "EBCDIC-JP-kana" -## [336] "ebcdic-no-277+euro" -## [337] "ebcdic-se-278+euro" -## [338] "ebcdic-sv" -## [339] "ebcdic-us-37+euro" -## [340] "ebcdic-xml-us" -## [341] "ECMA-114" -## [342] "ECMA-118" -## [343] "ECMA-128" -## [344] "ELOT_928" -## [345] "EUC-CN" -## [346] "EUC-JP" -## [347] "euc-jp-2007" -## [348] "EUC-KR" -## [349] "EUC-TW" -## [350] "euc-tw-2014" -## [351] "eucjis" -## [352] "eucTH" -## [353] "Extended_UNIX_Code_Packed_Format_for_Japanese" -## [354] "GB_2312-80" -## [355] "GB2312" -## [356] 
"gb2312-1980" -## [357] "GB2312.1980-0" -## [358] "GB18030" -## [359] "gb18030-2022" -## [360] "GBK" -## [361] "greek" -## [362] "greek8" -## [363] "gsm-03.38-2009" -## [364] "GSM0338" -## [365] "hebrew" -## [366] "hebrew8" -## [367] "hkbig5" -## [368] "HKSCS-BIG5" -## [369] "hp-roman8" -## [370] "hp15CN" -## [371] "HZ" -## [372] "HZ-GB-2312" -## [373] "ibm-37" -## [374] "ibm-037" -## [375] "ibm-37_P100-1995" -## [376] "ibm-37_P100-1995,swaplfnl" -## [377] "ibm-37-s390" -## [378] "ibm-273" -## [379] "ibm-273_P100-1995" -## [380] "ibm-277" -## [381] "ibm-277_P100-1995" -## [382] "ibm-278" -## [383] "ibm-278_P100-1995" -## [384] "ibm-280" -## [385] "ibm-280_P100-1995" -## [386] "ibm-284" -## [387] "ibm-284_P100-1995" -## [388] "ibm-285" -## [389] "ibm-285_P100-1995" -## [390] "ibm-290" -## [391] "ibm-290_P100-1995" -## [392] "ibm-297" -## [393] "ibm-297_P100-1995" -## [394] "ibm-367" -## [395] "ibm-420" -## [396] "ibm-420_X120-1999" -## [397] "ibm-424" -## [398] "ibm-424_P100-1995" -## [399] "ibm-437" -## [400] "ibm-437_P100-1995" -## [401] "ibm-500" -## [402] "ibm-500_P100-1995" -## [403] "ibm-720" -## [404] "ibm-720_P100-1997" -## [405] "ibm-737" -## [406] "ibm-737_P100-1997" -## [407] "ibm-775" -## [408] "ibm-775_P100-1996" -## [409] "ibm-803" -## [410] "ibm-803_P100-1999" -## [411] "ibm-813" -## [412] "ibm-813_P100-1995" -## [413] "ibm-819" -## [414] "ibm-838" -## [415] "ibm-838_P100-1995" -## [416] "ibm-850" -## [417] "ibm-850_P100-1995" -## [418] "ibm-851" -## [419] "ibm-851_P100-1995" -## [420] "ibm-852" -## [421] "ibm-852_P100-1995" -## [422] "ibm-855" -## [423] "ibm-855_P100-1995" -## [424] "ibm-856" -## [425] "ibm-856_P100-1995" -## [426] "ibm-857" -## [427] "ibm-857_P100-1995" -## [428] "ibm-858" -## [429] "ibm-858_P100-1997" -## [430] "ibm-860" -## [431] "ibm-860_P100-1995" -## [432] "ibm-861" -## [433] "ibm-861_P100-1995" -## [434] "ibm-862" -## [435] "ibm-862_P100-1995" -## [436] "ibm-863" -## [437] "ibm-863_P100-1995" -## [438] "ibm-864" -## [439] 
"ibm-864_X110-1999" -## [440] "ibm-865" -## [441] "ibm-865_P100-1995" -## [442] "ibm-866" -## [443] "ibm-866_P100-1995" -## [444] "ibm-867" -## [445] "ibm-867_P100-1998" -## [446] "ibm-868" -## [447] "ibm-868_P100-1995" -## [448] "ibm-869" -## [449] "ibm-869_P100-1995" -## [450] "ibm-870" -## [451] "ibm-870_P100-1995" -## [452] "ibm-871" -## [453] "ibm-871_P100-1995" -## [454] "ibm-874" -## [455] "ibm-874_P100-1995" -## [456] "ibm-875" -## [457] "ibm-875_P100-1995" -## [458] "ibm-878" -## [459] "ibm-878_P100-1996" -## [460] "ibm-901" -## [461] "ibm-901_P100-1999" -## [462] "ibm-902" -## [463] "ibm-902_P100-1999" -## [464] "ibm-912" -## [465] "ibm-912_P100-1995" -## [466] "ibm-913" -## [467] "ibm-913_P100-2000" -## [468] "ibm-914" -## [469] "ibm-914_P100-1995" -## [470] "ibm-915" -## [471] "ibm-915_P100-1995" -## [472] "ibm-916" -## [473] "ibm-916_P100-1995" -## [474] "ibm-918" -## [475] "ibm-918_P100-1995" -## [476] "ibm-920" -## [477] "ibm-920_P100-1995" -## [478] "ibm-921" -## [479] "ibm-921_P100-1995" -## [480] "ibm-922" -## [481] "ibm-922_P100-1999" -## [482] "ibm-923" -## [483] "ibm-923_P100-1998" -## [484] "ibm-930" -## [485] "ibm-930_P120-1999" -## [486] "ibm-931" -## [487] "ibm-932" -## [488] "ibm-932_VSUB_VPUA" -## [489] "ibm-933" -## [490] "ibm-933_P110-1995" -## [491] "ibm-935" -## [492] "ibm-935_P110-1999" -## [493] "ibm-937" -## [494] "ibm-937_P110-1999" -## [495] "ibm-939" -## [496] "ibm-939_P120-1999" -## [497] "ibm-942" -## [498] "ibm-942_P12A-1999" -## [499] "ibm-942_VSUB_VPUA" -## [500] "ibm-943" -## [501] "ibm-943_P15A-2003" -## [502] "ibm-943_P130-1999" -## [503] "ibm-943_VASCII_VSUB_VPUA" -## [504] "ibm-943_VSUB_VPUA" -## [505] "IBM-943C" -## [506] "ibm-949" -## [507] "ibm-949_P11A-1999" -## [508] "ibm-949_P110-1999" -## [509] "ibm-949_VASCII_VSUB_VPUA" -## [510] "ibm-949_VSUB_VPUA" -## [511] "IBM-949C" -## [512] "ibm-950" -## [513] "ibm-950_P110-1999" -## [514] "ibm-954" -## [515] "ibm-954_P101-2007" -## [516] "ibm-964" -## [517] 
"ibm-964_P110-1999" -## [518] "ibm-964_VPUA" -## [519] "ibm-970" -## [520] "ibm-970_P110_P110-2006_U2" -## [521] "ibm-970_VPUA" -## [522] "ibm-971" -## [523] "ibm-971_P100-1995" -## [524] "ibm-971_VPUA" -## [525] "ibm-1006" -## [526] "ibm-1006_P100-1995" -## [527] "ibm-1025" -## [528] "ibm-1025_P100-1995" -## [529] "ibm-1026" -## [530] "ibm-1026_P100-1995" -## [531] "ibm-1047" -## [532] "ibm-1047_P100-1995" -## [533] "ibm-1047_P100-1995,swaplfnl" -## [534] "ibm-1047-s390" -## [535] "ibm-1051" -## [536] "ibm-1051_P100-1995" -## [537] "ibm-1089" -## [538] "ibm-1089_P100-1995" -## [539] "ibm-1097" -## [540] "ibm-1097_P100-1995" -## [541] "ibm-1098" -## [542] "ibm-1098_P100-1995" -## [543] "ibm-1112" -## [544] "ibm-1112_P100-1995" -## [545] "ibm-1122" -## [546] "ibm-1122_P100-1999" -## [547] "ibm-1123" -## [548] "ibm-1123_P100-1995" -## [549] "ibm-1124" -## [550] "ibm-1124_P100-1996" -## [551] "ibm-1125" -## [552] "ibm-1125_P100-1997" -## [553] "ibm-1129" -## [554] "ibm-1129_P100-1997" -## [555] "ibm-1130" -## [556] "ibm-1130_P100-1997" -## [557] "ibm-1131" -## [558] "ibm-1131_P100-1997" -## [559] "ibm-1132" -## [560] "ibm-1132_P100-1998" -## [561] "ibm-1133" -## [562] "ibm-1133_P100-1997" -## [563] "ibm-1137" -## [564] "ibm-1137_P100-1999" -## [565] "ibm-1140" -## [566] "ibm-1140_P100-1997" -## [567] "ibm-1140_P100-1997,swaplfnl" -## [568] "ibm-1140-s390" -## [569] "ibm-1141" -## [570] "ibm-1141_P100-1997" -## [571] "ibm-1141_P100-1997,swaplfnl" -## [572] "ibm-1141-s390" -## [573] "ibm-1142" -## [574] "ibm-1142_P100-1997" -## [575] "ibm-1142_P100-1997,swaplfnl" -## [576] "ibm-1142-s390" -## [577] "ibm-1143" -## [578] "ibm-1143_P100-1997" -## [579] "ibm-1143_P100-1997,swaplfnl" -## [580] "ibm-1143-s390" -## [581] "ibm-1144" -## [582] "ibm-1144_P100-1997" -## [583] "ibm-1144_P100-1997,swaplfnl" -## [584] "ibm-1144-s390" -## [585] "ibm-1145" -## [586] "ibm-1145_P100-1997" -## [587] "ibm-1145_P100-1997,swaplfnl" -## [588] "ibm-1145-s390" -## [589] "ibm-1146" -## [590] 
"ibm-1146_P100-1997" -## [591] "ibm-1146_P100-1997,swaplfnl" -## [592] "ibm-1146-s390" -## [593] "ibm-1147" -## [594] "ibm-1147_P100-1997" -## [595] "ibm-1147_P100-1997,swaplfnl" -## [596] "ibm-1147-s390" -## [597] "ibm-1148" -## [598] "ibm-1148_P100-1997" -## [599] "ibm-1148_P100-1997,swaplfnl" -## [600] "ibm-1148-s390" -## [601] "ibm-1149" -## [602] "ibm-1149_P100-1997" -## [603] "ibm-1149_P100-1997,swaplfnl" -## [604] "ibm-1149-s390" -## [605] "ibm-1153" -## [606] "ibm-1153_P100-1999" -## [607] "ibm-1153_P100-1999,swaplfnl" -## [608] "ibm-1153-s390" -## [609] "ibm-1154" -## [610] "ibm-1154_P100-1999" -## [611] "ibm-1155" -## [612] "ibm-1155_P100-1999" -## [613] "ibm-1156" -## [614] "ibm-1156_P100-1999" -## [615] "ibm-1157" -## [616] "ibm-1157_P100-1999" -## [617] "ibm-1158" -## [618] "ibm-1158_P100-1999" -## [619] "ibm-1160" -## [620] "ibm-1160_P100-1999" -## [621] "ibm-1162" -## [622] "ibm-1162_P100-1999" -## [623] "ibm-1164" -## [624] "ibm-1164_P100-1999" -## [625] "ibm-1168" -## [626] "ibm-1168_P100-2002" -## [627] "ibm-1200" -## [628] "ibm-1201" -## [629] "ibm-1202" -## [630] "ibm-1203" -## [631] "ibm-1204" -## [632] "ibm-1205" -## [633] "ibm-1208" -## [634] "ibm-1209" -## [635] "ibm-1212" -## [636] "ibm-1213" -## [637] "ibm-1214" -## [638] "ibm-1215" -## [639] "ibm-1232" -## [640] "ibm-1233" -## [641] "ibm-1234" -## [642] "ibm-1235" -## [643] "ibm-1236" -## [644] "ibm-1237" -## [645] "ibm-1250" -## [646] "ibm-1250_P100-1995" -## [647] "ibm-1251" -## [648] "ibm-1251_P100-1995" -## [649] "ibm-1252" -## [650] "ibm-1252_P100-2000" -## [651] "ibm-1253" -## [652] "ibm-1253_P100-1995" -## [653] "ibm-1254" -## [654] "ibm-1254_P100-1995" -## [655] "ibm-1255" -## [656] "ibm-1255_P100-1995" -## [657] "ibm-1256" -## [658] "ibm-1256_P110-1997" -## [659] "ibm-1257" -## [660] "ibm-1257_P100-1995" -## [661] "ibm-1258" -## [662] "ibm-1258_P100-1997" -## [663] "ibm-1276" -## [664] "ibm-1276_P100-1995" -## [665] "ibm-1363" -## [666] "ibm-1363_P11B-1998" -## [667] 
"ibm-1363_P110-1997" -## [668] "ibm-1363_VASCII_VSUB_VPUA" -## [669] "ibm-1363_VSUB_VPUA" -## [670] "ibm-1364" -## [671] "ibm-1364_P110-2007" -## [672] "ibm-1371" -## [673] "ibm-1371_P100-1999" -## [674] "ibm-1373" -## [675] "ibm-1373_P100-2002" -## [676] "ibm-1375" -## [677] "ibm-1375_P100-2008" -## [678] "ibm-1383" -## [679] "ibm-1383_P110-1999" -## [680] "ibm-1383_VPUA" -## [681] "ibm-1386" -## [682] "ibm-1386_P100-2001" -## [683] "ibm-1386_VSUB_VPUA" -## [684] "ibm-1388" -## [685] "ibm-1388_P103-2001" -## [686] "ibm-1390" -## [687] "ibm-1390_P110-2003" -## [688] "ibm-1392" -## [689] "ibm-1399" -## [690] "ibm-1399_P110-2003" -## [691] "ibm-4517" -## [692] "ibm-4517_P100-2005" -## [693] "ibm-4899" -## [694] "ibm-4899_P100-1998" -## [695] "ibm-4902" -## [696] "ibm-4909" -## [697] "ibm-4909_P100-1999" -## [698] "ibm-4971" -## [699] "ibm-4971_P100-1999" -## [700] "ibm-5012" -## [701] "ibm-5012_P100-1999" -## [702] "ibm-5026" -## [703] "ibm-5035" -## [704] "ibm-5050" -## [705] "ibm-5054" -## [706] "ibm-5123" -## [707] "ibm-5123_P100-1999" -## [708] "ibm-5304" -## [709] "ibm-5305" -## [710] "ibm-5346" -## [711] "ibm-5346_P100-1998" -## [712] "ibm-5347" -## [713] "ibm-5347_P100-1998" -## [714] "ibm-5348" -## [715] "ibm-5348_P100-1997" -## [716] "ibm-5349" -## [717] "ibm-5349_P100-1998" -## [718] "ibm-5350" -## [719] "ibm-5350_P100-1998" -## [720] "ibm-5351" -## [721] "ibm-5351_P100-1998" -## [722] "ibm-5352" -## [723] "ibm-5352_P100-1998" -## [724] "ibm-5353" -## [725] "ibm-5353_P100-1998" -## [726] "ibm-5354" -## [727] "ibm-5354_P100-1998" -## [728] "ibm-5471" -## [729] "ibm-5471_P100-2006" -## [730] "ibm-5478" -## [731] "ibm-5478_P100-1995" -## [732] "ibm-8482" -## [733] "ibm-8482_P100-1999" -## [734] "ibm-9005" -## [735] "ibm-9005_X110-2007" -## [736] "ibm-9030" -## [737] "ibm-9066" -## [738] "ibm-9067" -## [739] "ibm-9067_X100-2005" -## [740] "ibm-9400" -## [741] "ibm-9424" -## [742] "ibm-9447" -## [743] "ibm-9447_P100-2002" -## [744] "ibm-9448" -## [745] 
"ibm-9448_X100-2005" -## [746] "ibm-9449" -## [747] "ibm-9449_P100-2002" -## [748] "ibm-9580" -## [749] "ibm-12712" -## [750] "ibm-12712_P100-1998" -## [751] "ibm-12712_P100-1998,swaplfnl" -## [752] "ibm-12712-s390" -## [753] "ibm-13488" -## [754] "ibm-13489" -## [755] "ibm-13490" -## [756] "ibm-13491" -## [757] "ibm-13496" -## [758] "ibm-13497" -## [759] "ibm-16684" -## [760] "ibm-16684_P110-2003" -## [761] "ibm-16804" -## [762] "ibm-16804_X110-1999" -## [763] "ibm-16804_X110-1999,swaplfnl" -## [764] "ibm-16804-s390" -## [765] "ibm-17584" -## [766] "ibm-17585" -## [767] "ibm-17586" -## [768] "ibm-17587" -## [769] "ibm-17592" -## [770] "ibm-17593" -## [771] "ibm-20780" -## [772] "ibm-21680" -## [773] "ibm-21681" -## [774] "ibm-21682" -## [775] "ibm-21683" -## [776] "ibm-25546" -## [777] "ibm-25776" -## [778] "ibm-25777" -## [779] "ibm-25778" -## [780] "ibm-25779" -## [781] "ibm-29872" -## [782] "ibm-29873" -## [783] "ibm-29874" -## [784] "ibm-29875" -## [785] "ibm-33722" -## [786] "ibm-33722_P12A_P12A-2009_U2" -## [787] "ibm-33722_P120-1999" -## [788] "ibm-33722_VASCII_VPUA" -## [789] "ibm-33722_VPUA" -## [790] "ibm-61955" -## [791] "ibm-61956" -## [792] "ibm-65025" -## [793] "ibm-eucCN" -## [794] "IBM-eucJP" -## [795] "ibm-eucKR" -## [796] "ibm-eucTW" -## [797] "IBM-Thai" -## [798] "IBM037" -## [799] "IBM273" -## [800] "IBM277" -## [801] "IBM278" -## [802] "IBM280" -## [803] "IBM284" -## [804] "IBM285" -## [805] "IBM290" -## [806] "IBM297" -## [807] "IBM367" -## [808] "IBM420" -## [809] "IBM424" -## [810] "IBM437" -## [811] "IBM500" -## [812] "IBM737" -## [813] "IBM775" -## [814] "IBM819" -## [815] "IBM838" -## [816] "IBM850" -## [817] "IBM851" -## [818] "IBM852" -## [819] "IBM855" -## [820] "IBM856" -## [821] "IBM857" -## [822] "IBM00858" -## [823] "IBM860" -## [824] "IBM861" -## [825] "IBM862" -## [826] "IBM863" -## [827] "IBM864" -## [828] "IBM865" -## [829] "IBM866" -## [830] "IBM868" -## [831] "IBM869" -## [832] "IBM870" -## [833] "IBM871" -## [834] "IBM875" 
-## [835] "IBM918" -## [836] "IBM922" -## [837] "IBM930" -## [838] "IBM939" -## [839] "IBM1006" -## [840] "IBM1026" -## [841] "IBM1047" -## [842] "IBM1047_LF" -## [843] "IBM1098" -## [844] "IBM01140" -## [845] "IBM01141" -## [846] "IBM1141_LF" -## [847] "IBM01142" -## [848] "IBM01143" -## [849] "IBM01144" -## [850] "IBM01145" -## [851] "IBM01146" -## [852] "IBM01147" -## [853] "IBM01148" -## [854] "IBM01149" -## [855] "IBM1153" -## [856] "IMAP-mailbox-name" -## [857] "iscii-bng" -## [858] "iscii-dev" -## [859] "iscii-guj" -## [860] "iscii-gur" -## [861] "iscii-knd" -## [862] "iscii-mlm" -## [863] "iscii-ori" -## [864] "iscii-tlg" -## [865] "iscii-tml" -## [866] "ISCII,version=0" -## [867] "ISCII,version=1" -## [868] "ISCII,version=2" -## [869] "ISCII,version=3" -## [870] "ISCII,version=4" -## [871] "ISCII,version=5" -## [872] "ISCII,version=6" -## [873] "ISCII,version=7" -## [874] "ISCII,version=8" -## [875] "iso_646.irv:1983" -## [876] "ISO_646.irv:1991" -## [877] "ISO_2022,locale=ja,version=0" -## [878] "ISO_2022,locale=ja,version=1" -## [879] "ISO_2022,locale=ja,version=2" -## [880] "ISO_2022,locale=ja,version=3" -## [881] "ISO_2022,locale=ja,version=4" -## [882] "ISO_2022,locale=ko,version=0" -## [883] "ISO_2022,locale=ko,version=1" -## [884] "ISO_2022,locale=zh,version=0" -## [885] "ISO_2022,locale=zh,version=1" -## [886] "ISO_2022,locale=zh,version=2" -## [887] "ISO_8859-1:1987" -## [888] "ISO_8859-2:1987" -## [889] "ISO_8859-3:1988" -## [890] "ISO_8859-4:1988" -## [891] "ISO_8859-5:1988" -## [892] "ISO_8859-6:1987" -## [893] "ISO_8859-7:1987" -## [894] "ISO_8859-8:1988" -## [895] "ISO_8859-9:1989" -## [896] "ISO_8859-10:1992" -## [897] "ISO_8859-14:1998" -## [898] "ISO-2022-CN" -## [899] "ISO-2022-CN-CNS" -## [900] "ISO-2022-CN-EXT" -## [901] "ISO-2022-JP" -## [902] "ISO-2022-JP-1" -## [903] "ISO-2022-JP-2" -## [904] "ISO-2022-KR" -## [905] "iso-8859_10-1998" -## [906] "iso-8859_11-2001" -## [907] "iso-8859_14-1998" -## [908] "ISO-8859-1" -## [909] 
"ISO-8859-2" -## [910] "ISO-8859-3" -## [911] "ISO-8859-4" -## [912] "ISO-8859-5" -## [913] "ISO-8859-6" -## [914] "ISO-8859-6-E" -## [915] "ISO-8859-6-I" -## [916] "ISO-8859-7" -## [917] "ISO-8859-8" -## [918] "ISO-8859-8-E" -## [919] "ISO-8859-8-I" -## [920] "ISO-8859-9" -## [921] "ISO-8859-10" -## [922] "ISO-8859-11" -## [923] "ISO-8859-13" -## [924] "ISO-8859-14" -## [925] "ISO-8859-15" -## [926] "ISO-10646-UCS-2" -## [927] "ISO-10646-UCS-4" -## [928] "iso-celtic" -## [929] "iso-ir-6" -## [930] "iso-ir-58" -## [931] "iso-ir-100" -## [932] "iso-ir-101" -## [933] "iso-ir-109" -## [934] "iso-ir-110" -## [935] "iso-ir-126" -## [936] "iso-ir-127" -## [937] "iso-ir-138" -## [938] "iso-ir-144" -## [939] "iso-ir-148" -## [940] "iso-ir-149" -## [941] "iso-ir-157" -## [942] "iso-ir-199" -## [943] "ISO646-US" -## [944] "iso8859_15_fdis" -## [945] "JIS" -## [946] "JIS_Encoding" -## [947] "JIS7" -## [948] "JIS8" -## [949] "koi8" -## [950] "KOI8-R" -## [951] "KOI8-U" -## [952] "korean" -## [953] "KS_C_5601-1987" -## [954] "KS_C_5601-1989" -## [955] "ksc" -## [956] "KSC_5601" -## [957] "l1" -## [958] "l2" -## [959] "l3" -## [960] "l4" -## [961] "l5" -## [962] "l6" -## [963] "l8" -## [964] "l9" -## [965] "Latin-9" -## [966] "latin0" -## [967] "latin1" -## [968] "latin2" -## [969] "latin3" -## [970] "latin4" -## [971] "latin5" -## [972] "latin6" -## [973] "latin8" -## [974] "lmbcs" -## [975] "LMBCS-1" -## [976] "mac" -## [977] "mac-cyrillic" -## [978] "macce" -## [979] "maccentraleurope" -## [980] "maccy" -## [981] "macgr" -## [982] "macintosh" -## [983] "macos-0_2-10.2" -## [984] "macos-6_2-10.4" -## [985] "macos-7_3-10.2" -## [986] "macos-29-10.2" -## [987] "macos-35-10.2" -## [988] "macroman" -## [989] "mactr" -## [990] "MS_Kanji" -## [991] "MS874" -## [992] "ms932" -## [993] "MS936" -## [994] "ms949" -## [995] "ms950" -## [996] "MS950_HKSCS" -## [997] "PC-Multilingual-850+euro" -## [998] "pck" -## [999] "r8" -## [1000] "roman8" -## [1001] "SCSU" -## [1002] "Shift_JIS" -## 
[1003] "shift_jis78" -## [1004] "sjis" -## [1005] "sjis78" -## [1006] "sun_eu_greek" -## [1007] "thai8" -## [1008] "TIS-620" -## [1009] "tis620.2533" -## [1010] "turkish" -## [1011] "turkish8" -## [1012] "ucs-2" -## [1013] "ucs-4" -## [1014] "ujis" -## [1015] "unicode" -## [1016] "unicode-1-1-utf-7" -## [1017] "unicode-1-1-utf-8" -## [1018] "unicode-2-0-utf-7" -## [1019] "unicode-2-0-utf-8" -## [1020] "UnicodeBig" -## [1021] "UnicodeBigUnmarked" -## [1022] "UnicodeLittle" -## [1023] "UnicodeLittleUnmarked" -## [1024] "us" -## [1025] "US-ASCII" -## [1026] "UTF-7" -## [1027] "UTF-8" -## [1028] "UTF-16" -## [1029] "UTF-16,version=1" -## [1030] "UTF-16,version=2" -## [1031] "UTF-16BE" -## [1032] "UTF-16BE,version=1" -## [1033] "UTF-16LE" -## [1034] "UTF-16LE,version=1" -## [1035] "UTF-32" -## [1036] "UTF-32BE" -## [1037] "UTF-32LE" -## [1038] "UTF16_BigEndian" -## [1039] "UTF16_LittleEndian" -## [1040] "UTF16_OppositeEndian" -## [1041] "UTF16_PlatformEndian" -## [1042] "UTF32_BigEndian" -## [1043] "UTF32_LittleEndian" -## [1044] "UTF32_OppositeEndian" -## [1045] "UTF32_PlatformEndian" -## [1046] "windows-31j" -## [1047] "windows-437" -## [1048] "windows-720" -## [1049] "windows-737" -## [1050] "windows-775" -## [1051] "windows-850" -## [1052] "windows-852" -## [1053] "windows-855" -## [1054] "windows-857" -## [1055] "windows-858" -## [1056] "windows-861" -## [1057] "windows-862" -## [1058] "windows-866" -## [1059] "windows-869" -## [1060] "windows-874" -## [1061] "windows-874-2000" -## [1062] "windows-932" -## [1063] "windows-936" -## [1064] "windows-936-2000" -## [1065] "windows-949" -## [1066] "windows-949-2000" -## [1067] "windows-950" -## [1068] "windows-950-2000" -## [1069] "windows-1200" -## [1070] "windows-1201" -## [1071] "windows-1250" -## [1072] "windows-1251" -## [1073] "windows-1252" -## [1074] "windows-1253" -## [1075] "windows-1254" -## [1076] "windows-1255" -## [1077] "windows-1256" -## [1078] "windows-1257" -## [1079] "windows-1258" -## [1080] 
"windows-10000" -## [1081] "windows-10006" -## [1082] "windows-10007" -## [1083] "windows-10029" -## [1084] "windows-10081" -## [1085] "windows-20127" -## [1086] "windows-20866" -## [1087] "windows-21866" -## [1088] "windows-28592" -## [1089] "windows-28593" -## [1090] "windows-28594" -## [1091] "windows-28595" -## [1092] "windows-28596" -## [1093] "windows-28597" -## [1094] "windows-28598" -## [1095] "windows-28599" -## [1096] "windows-28603" -## [1097] "windows-28605" -## [1098] "windows-51949" -## [1099] "windows-54936" -## [1100] "windows-57002" -## [1101] "windows-57003" -## [1102] "windows-57004" -## [1103] "windows-57005" -## [1104] "windows-57006" -## [1105] "windows-57007" -## [1106] "windows-57008" -## [1107] "windows-57009" -## [1108] "windows-57010" -## [1109] "windows-57011" -## [1110] "windows-65000" -## [1111] "windows-65001" -## [1112] "x-big5" -## [1113] "x-compound-text" -## [1114] "X-EUC-JP" -## [1115] "x-IBM720" -## [1116] "x-IBM737" -## [1117] "x-IBM856" -## [1118] "x-IBM867" -## [1119] "x-IBM874" -## [1120] "x-IBM875" -## [1121] "x-IBM921" -## [1122] "x-IBM922" -## [1123] "x-IBM930" -## [1124] "x-IBM930A" -## [1125] "x-IBM933" -## [1126] "x-IBM935" -## [1127] "x-IBM937" -## [1128] "x-IBM939" -## [1129] "x-IBM939A" -## [1130] "x-IBM942" -## [1131] "x-IBM942C" -## [1132] "x-IBM943" -## [1133] "x-IBM949" -## [1134] "x-IBM949C" -## [1135] "x-IBM950" -## [1136] "x-IBM954" -## [1137] "x-IBM954C" -## [1138] "x-IBM964" -## [1139] "x-IBM970" -## [1140] "x-IBM971" -## [1141] "x-IBM1006" -## [1142] "x-IBM1025" -## [1143] "x-IBM1097" -## [1144] "x-IBM1098" -## [1145] "x-IBM1112" -## [1146] "x-IBM1122" -## [1147] "x-IBM1123" -## [1148] "x-IBM1124" -## [1149] "x-IBM1153" -## [1150] "x-IBM1363" -## [1151] "x-IBM1363C" -## [1152] "x-IBM1364" -## [1153] "x-IBM1371" -## [1154] "x-IBM1388" -## [1155] "x-IBM1390" -## [1156] "x-IBM1399" -## [1157] "x-IBM33722" -## [1158] "x-IBM33722A" -## [1159] "x-IBM33722C" -## [1160] "x-iscii-as" -## [1161] "x-iscii-be" -## 
[1162] "x-iscii-de" -## [1163] "x-iscii-gu" -## [1164] "x-iscii-ka" -## [1165] "x-iscii-ma" -## [1166] "x-iscii-or" -## [1167] "x-iscii-pa" -## [1168] "x-iscii-ta" -## [1169] "x-iscii-te" -## [1170] "x-ISCII91" -## [1171] "x-ISO-2022-CN-CNS" -## [1172] "x-ISO-2022-CN-GB" -## [1173] "x-ISO-8859-6S" -## [1174] "x-iso-8859-11" -## [1175] "x-JISAutoDetect" -## [1176] "x-KSC5601" -## [1177] "x-mac-ce" -## [1178] "x-mac-centraleurroman" -## [1179] "x-mac-cyrillic" -## [1180] "x-mac-greek" -## [1181] "x-mac-turkish" -## [1182] "x-MacCentralEurope" -## [1183] "x-MacCyrillic" -## [1184] "x-MacGreek" -## [1185] "x-macroman" -## [1186] "x-MacTurkish" -## [1187] "x-MacUkraine" -## [1188] "x-ms-cp932" -## [1189] "x-MS932_0213" -## [1190] "x-MS950-HKSCS" -## [1191] "x-roman8" -## [1192] "x-sjis" -## [1193] "x-UTF_8J" -## [1194] "x-utf-16be" -## [1195] "x-utf-16le" -## [1196] "x-UTF-16LE-BOM" -## [1197] "x-windows-874" -## [1198] "x-windows-950" -## [1199] "x-windows-1256S" -## [1200] "x-windows-50220" -## [1201] "x-windows-50221" -## [1202] "x-windows-iso2022jp" -## [1203] "x11-compound-text" -``` - -```r -stri_enc_list(FALSE) -``` - -``` -## $`BOCU-1` -## [1] "BOCU-1" "csBOCU-1" "ibm-1214" "ibm-1215" -## -## $`CESU-8` -## [1] "CESU-8" "ibm-9400" -## -## $`ebcdic-xml-us` -## [1] "ebcdic-xml-us" -## -## $`euc-jp-2007` -## [1] "csEUCPkdFmtJapanese" -## [2] "EUC-JP" -## [3] "euc-jp-2007" -## [4] "eucjis" -## [5] "Extended_UNIX_Code_Packed_Format_for_Japanese" -## [6] "ujis" -## [7] "X-EUC-JP" -## -## $`euc-tw-2014` -## [1] "EUC-TW" "euc-tw-2014" -## -## $`gb18030-2022` -## [1] "GB18030" "gb18030-2022" "ibm-1392" "windows-54936" -## -## $`gsm-03.38-2009` -## [1] "gsm-03.38-2009" "GSM0338" -## -## $HZ -## [1] "HZ" "HZ-GB-2312" -## -## $`ibm-37_P100-1995` -## [1] "037" "cp037" "cp37" "cpibm37" -## [5] "csIBM037" "ebcdic-cp-ca" "ebcdic-cp-nl" "ebcdic-cp-us" -## [9] "ebcdic-cp-wt" "ibm-37" "ibm-037" "ibm-37_P100-1995" -## [13] "IBM037" -## -## $`ibm-37_P100-1995,swaplfnl` -## [1] 
"ibm-37_P100-1995,swaplfnl" "ibm-37-s390" -## -## $`ibm-273_P100-1995` -## [1] "273" "CP273" "csIBM273" -## [4] "ebcdic-de" "ibm-273" "ibm-273_P100-1995" -## [7] "IBM273" -## -## $`ibm-277_P100-1995` -## [1] "277" "cp277" "csIBM277" -## [4] "EBCDIC-CP-DK" "EBCDIC-CP-NO" "ebcdic-dk" -## [7] "ibm-277" "ibm-277_P100-1995" "IBM277" -## -## $`ibm-278_P100-1995` -## [1] "278" "cp278" "csIBM278" -## [4] "ebcdic-cp-fi" "ebcdic-cp-se" "ebcdic-sv" -## [7] "ibm-278" "ibm-278_P100-1995" "IBM278" -## -## $`ibm-280_P100-1995` -## [1] "280" "CP280" "csIBM280" -## [4] "ebcdic-cp-it" "ibm-280" "ibm-280_P100-1995" -## [7] "IBM280" -## -## $`ibm-284_P100-1995` -## [1] "284" "CP284" "cpibm284" -## [4] "csIBM284" "ebcdic-cp-es" "ibm-284" -## [7] "ibm-284_P100-1995" "IBM284" -## -## $`ibm-285_P100-1995` -## [1] "285" "CP285" "cpibm285" -## [4] "csIBM285" "ebcdic-cp-gb" "ebcdic-gb" -## [7] "ibm-285" "ibm-285_P100-1995" "IBM285" -## -## $`ibm-290_P100-1995` -## [1] "cp290" "csIBM290" "EBCDIC-JP-kana" -## [4] "ibm-290" "ibm-290_P100-1995" "IBM290" -## -## $`ibm-297_P100-1995` -## [1] "297" "cp297" "cpibm297" -## [4] "csIBM297" "ebcdic-cp-fr" "ibm-297" -## [7] "ibm-297_P100-1995" "IBM297" -## -## $`ibm-420_X120-1999` -## [1] "420" "cp420" "csIBM420" -## [4] "ebcdic-cp-ar1" "ibm-420" "ibm-420_X120-1999" -## [7] "IBM420" -## -## $`ibm-424_P100-1995` -## [1] "424" "cp424" "csIBM424" -## [4] "ebcdic-cp-he" "ibm-424" "ibm-424_P100-1995" -## [7] "IBM424" -## -## $`ibm-437_P100-1995` -## [1] "437" "cp437" "csPC8CodePage437" -## [4] "ibm-437" "ibm-437_P100-1995" "IBM437" -## [7] "windows-437" -## -## $`ibm-500_P100-1995` -## [1] "500" "CP500" "csIBM500" -## [4] "ebcdic-cp-be" "ebcdic-cp-ch" "ibm-500" -## [7] "ibm-500_P100-1995" "IBM500" -## -## $`ibm-720_P100-1997` -## [1] "DOS-720" "ibm-720" "ibm-720_P100-1997" -## [4] "windows-720" "x-IBM720" -## -## $`ibm-737_P100-1997` -## [1] "737" "cp737" "ibm-737" -## [4] "ibm-737_P100-1997" "IBM737" "windows-737" -## [7] "x-IBM737" -## -## 
$`ibm-775_P100-1996` -## [1] "775" "cp775" "csPC775Baltic" -## [4] "ibm-775" "ibm-775_P100-1996" "IBM775" -## [7] "windows-775" -## -## $`ibm-803_P100-1999` -## [1] "cp803" "ibm-803" "ibm-803_P100-1999" -## -## $`ibm-813_P100-1995` -## [1] "813" "cp813" "ibm-813" -## [4] "ibm-813_P100-1995" -## -## $`ibm-838_P100-1995` -## [1] "838" "cp838" "csIBMThai" -## [4] "ibm-838" "ibm-838_P100-1995" "ibm-9030" -## [7] "IBM-Thai" "IBM838" -## -## $`ibm-850_P100-1995` -## [1] "850" "cp850" "csPC850Multilingual" -## [4] "ibm-850" "ibm-850_P100-1995" "IBM850" -## [7] "windows-850" -## -## $`ibm-851_P100-1995` -## [1] "851" "cp851" "csPC851" -## [4] "ibm-851" "ibm-851_P100-1995" "IBM851" -## -## $`ibm-852_P100-1995` -## [1] "852" "cp852" "csPCp852" -## [4] "ibm-852" "ibm-852_P100-1995" "IBM852" -## [7] "windows-852" -## -## $`ibm-855_P100-1995` -## [1] "855" "cp855" "csIBM855" -## [4] "csPCp855" "ibm-855" "ibm-855_P100-1995" -## [7] "IBM855" "windows-855" -## -## $`ibm-856_P100-1995` -## [1] "856" "cp856" "ibm-856" -## [4] "ibm-856_P100-1995" "IBM856" "x-IBM856" -## -## $`ibm-857_P100-1995` -## [1] "857" "cp857" "csIBM857" -## [4] "ibm-857" "ibm-857_P100-1995" "IBM857" -## [7] "windows-857" -## -## $`ibm-858_P100-1997` -## [1] "CCSID00858" "CP00858" -## [3] "cp858" "ibm-858" -## [5] "ibm-858_P100-1997" "IBM00858" -## [7] "PC-Multilingual-850+euro" "windows-858" -## -## $`ibm-860_P100-1995` -## [1] "860" "cp860" "csIBM860" -## [4] "ibm-860" "ibm-860_P100-1995" "IBM860" -## -## $`ibm-861_P100-1995` -## [1] "861" "cp-is" "cp861" -## [4] "csIBM861" "ibm-861" "ibm-861_P100-1995" -## [7] "IBM861" "windows-861" -## -## $`ibm-862_P100-1995` -## [1] "862" "cp862" "csPC862LatinHebrew" -## [4] "DOS-862" "ibm-862" "ibm-862_P100-1995" -## [7] "IBM862" "windows-862" -## -## $`ibm-863_P100-1995` -## [1] "863" "cp863" "csIBM863" -## [4] "ibm-863" "ibm-863_P100-1995" "IBM863" -## -## $`ibm-864_X110-1999` -## [1] "cp864" "csIBM864" "ibm-864" -## [4] "ibm-864_X110-1999" "IBM864" -## -## 
$`ibm-865_P100-1995` -## [1] "865" "cp865" "csIBM865" -## [4] "ibm-865" "ibm-865_P100-1995" "IBM865" -## -## $`ibm-866_P100-1995` -## [1] "866" "cp866" "csIBM866" -## [4] "ibm-866" "ibm-866_P100-1995" "IBM866" -## [7] "windows-866" -## -## $`ibm-867_P100-1998` -## [1] "ibm-867" "ibm-867_P100-1998" "x-IBM867" -## -## $`ibm-868_P100-1995` -## [1] "868" "cp-ar" "CP868" -## [4] "csIBM868" "ibm-868" "ibm-868_P100-1995" -## [7] "IBM868" -## -## $`ibm-869_P100-1995` -## [1] "869" "cp-gr" "cp869" -## [4] "csIBM869" "ibm-869" "ibm-869_P100-1995" -## [7] "IBM869" "windows-869" -## -## $`ibm-870_P100-1995` -## [1] "CP870" "csIBM870" "ebcdic-cp-roece" -## [4] "ebcdic-cp-yu" "ibm-870" "ibm-870_P100-1995" -## [7] "IBM870" -## -## $`ibm-871_P100-1995` -## [1] "871" "CP871" "csIBM871" -## [4] "ebcdic-cp-is" "ebcdic-is" "ibm-871" -## [7] "ibm-871_P100-1995" "IBM871" -## -## $`ibm-874_P100-1995` -## [1] "cp874" "eucTH" "ibm-874" -## [4] "ibm-874_P100-1995" "ibm-9066" "TIS-620" -## [7] "tis620.2533" "x-IBM874" -## -## $`ibm-875_P100-1995` -## [1] "875" "cp875" "ibm-875" -## [4] "ibm-875_P100-1995" "IBM875" "x-IBM875" -## -## $`ibm-878_P100-1996` -## [1] "cp878" "csKOI8R" "ibm-878" -## [4] "ibm-878_P100-1996" "koi8" "KOI8-R" -## [7] "windows-20866" -## -## $`ibm-901_P100-1999` -## [1] "ibm-901" "ibm-901_P100-1999" -## -## $`ibm-902_P100-1999` -## [1] "ibm-902" "ibm-902_P100-1999" -## -## $`ibm-912_P100-1995` -## [1] "912" "8859_2" "cp912" -## [4] "csISOLatin2" "ibm-912" "ibm-912_P100-1995" -## [7] "ISO_8859-2:1987" "ISO-8859-2" "iso-ir-101" -## [10] "l2" "latin2" "windows-28592" -## -## $`ibm-913_P100-2000` -## [1] "913" "8859_3" "cp913" -## [4] "csISOLatin3" "ibm-913" "ibm-913_P100-2000" -## [7] "ISO_8859-3:1988" "ISO-8859-3" "iso-ir-109" -## [10] "l3" "latin3" "windows-28593" -## -## $`ibm-914_P100-1995` -## [1] "914" "8859_4" "cp914" -## [4] "csISOLatin4" "ibm-914" "ibm-914_P100-1995" -## [7] "ISO_8859-4:1988" "ISO-8859-4" "iso-ir-110" -## [10] "l4" "latin4" "windows-28594" -## -## 
$`ibm-915_P100-1995` -## [1] "915" "8859_5" "cp915" -## [4] "csISOLatinCyrillic" "cyrillic" "ibm-915" -## [7] "ibm-915_P100-1995" "ISO_8859-5:1988" "ISO-8859-5" -## [10] "iso-ir-144" "windows-28595" -## -## $`ibm-916_P100-1995` -## [1] "916" "cp916" "ibm-916" -## [4] "ibm-916_P100-1995" -## -## $`ibm-918_P100-1995` -## [1] "CP918" "csIBM918" "ebcdic-cp-ar2" -## [4] "ibm-918" "ibm-918_P100-1995" "IBM918" -## -## $`ibm-920_P100-1995` -## [1] "920" "8859_9" "cp920" -## [4] "csISOLatin5" "ECMA-128" "ibm-920" -## [7] "ibm-920_P100-1995" "ISO_8859-9:1989" "ISO-8859-9" -## [10] "iso-ir-148" "l5" "latin5" -## [13] "turkish" "turkish8" "windows-28599" -## -## $`ibm-921_P100-1995` -## [1] "921" "8859_13" "cp921" -## [4] "ibm-921" "ibm-921_P100-1995" "ISO-8859-13" -## [7] "windows-28603" "x-IBM921" -## -## $`ibm-922_P100-1999` -## [1] "922" "cp922" "ibm-922" -## [4] "ibm-922_P100-1999" "IBM922" "x-IBM922" -## -## $`ibm-923_P100-1998` -## [1] "923" "8859_15" "cp923" -## [4] "csisolatin0" "csisolatin9" "ibm-923" -## [7] "ibm-923_P100-1998" "ISO-8859-15" "iso8859_15_fdis" -## [10] "l9" "Latin-9" "latin0" -## [13] "windows-28605" -## -## $`ibm-930_P120-1999` -## [1] "930" "cp930" "ibm-930" -## [4] "ibm-930_P120-1999" "ibm-5026" "IBM930" -## [7] "x-IBM930" "x-IBM930A" -## -## $`ibm-933_P110-1995` -## [1] "933" "cp933" "ibm-933" -## [4] "ibm-933_P110-1995" "x-IBM933" -## -## $`ibm-935_P110-1999` -## [1] "935" "cp935" "ibm-935" -## [4] "ibm-935_P110-1999" "x-IBM935" -## -## $`ibm-937_P110-1999` -## [1] "937" "cp937" "ibm-937" -## [4] "ibm-937_P110-1999" "x-IBM937" -## -## $`ibm-939_P120-1999` -## [1] "939" "cp939" "ibm-931" -## [4] "ibm-939" "ibm-939_P120-1999" "ibm-5035" -## [7] "IBM939" "x-IBM939" "x-IBM939A" -## -## $`ibm-942_P12A-1999` -## [1] "cp932" "ibm-932" "ibm-932_VSUB_VPUA" -## [4] "ibm-942" "ibm-942_P12A-1999" "ibm-942_VSUB_VPUA" -## [7] "shift_jis78" "sjis78" "x-IBM942" -## [10] "x-IBM942C" -## -## $`ibm-943_P15A-2003` -## [1] "cp932" "cp943c" "csShiftJIS" -## [4] 
"csWindows31J" "ibm-943" "ibm-943_P15A-2003" -## [7] "ibm-943_VSUB_VPUA" "IBM-943C" "MS_Kanji" -## [10] "ms932" "pck" "Shift_JIS" -## [13] "sjis" "windows-31j" "windows-932" -## [16] "x-JISAutoDetect" "x-ms-cp932" "x-MS932_0213" -## [19] "x-sjis" -## -## $`ibm-943_P130-1999` -## [1] "943" "cp943" -## [3] "ibm-943" "ibm-943_P130-1999" -## [5] "ibm-943_VASCII_VSUB_VPUA" "Shift_JIS" -## [7] "x-IBM943" -## -## $`ibm-949_P11A-1999` -## [1] "cp949c" "ibm-949" "ibm-949_P11A-1999" -## [4] "ibm-949_VSUB_VPUA" "IBM-949C" "x-IBM949C" -## -## $`ibm-949_P110-1999` -## [1] "949" "cp949" -## [3] "ibm-949" "ibm-949_P110-1999" -## [5] "ibm-949_VASCII_VSUB_VPUA" "x-IBM949" -## -## $`ibm-950_P110-1999` -## [1] "950" "cp950" "ibm-950" -## [4] "ibm-950_P110-1999" "x-IBM950" -## -## $`ibm-954_P101-2007` -## [1] "ibm-954" "ibm-954_P101-2007" "x-IBM954" -## [4] "x-IBM954C" -## -## $`ibm-964_P110-1999` -## [1] "964" "cns11643" "cp964" -## [4] "ibm-964" "ibm-964_P110-1999" "ibm-964_VPUA" -## [7] "ibm-eucTW" "x-IBM964" -## -## $`ibm-970_P110_P110-2006_U2` -## [1] "970" "5601" -## [3] "cp970" "csEUCKR" -## [5] "EUC-KR" "ibm-970" -## [7] "ibm-970_P110_P110-2006_U2" "ibm-970_VPUA" -## [9] "ibm-eucKR" "KS_C_5601-1987" -## [11] "KSC_5601" "windows-51949" -## [13] "x-IBM970" -## -## $`ibm-971_P100-1995` -## [1] "ibm-971" "ibm-971_P100-1995" "ibm-971_VPUA" -## [4] "x-IBM971" -## -## $`ibm-1006_P100-1995` -## [1] "1006" "cp1006" "ibm-1006" -## [4] "ibm-1006_P100-1995" "IBM1006" "x-IBM1006" -## -## $`ibm-1025_P100-1995` -## [1] "1025" "cp1025" "ibm-1025" -## [4] "ibm-1025_P100-1995" "x-IBM1025" -## -## $`ibm-1026_P100-1995` -## [1] "1026" "CP1026" "csIBM1026" -## [4] "ibm-1026" "ibm-1026_P100-1995" "IBM1026" -## -## $`ibm-1047_P100-1995` -## [1] "1047" "cp1047" "ibm-1047" -## [4] "ibm-1047_P100-1995" "IBM1047" -## -## $`ibm-1047_P100-1995,swaplfnl` -## [1] "ibm-1047_P100-1995,swaplfnl" "ibm-1047-s390" -## [3] "IBM1047_LF" -## -## $`ibm-1051_P100-1995` -## [1] "csHPRoman8" "hp-roman8" "ibm-1051" -## 
[4] "ibm-1051_P100-1995" "r8" "roman8" -## [7] "x-roman8" -## -## $`ibm-1089_P100-1995` -## [1] "1089" "8859_6" "arabic" -## [4] "ASMO-708" "cp1089" "csISOLatinArabic" -## [7] "ECMA-114" "ibm-1089" "ibm-1089_P100-1995" -## [10] "ISO_8859-6:1987" "ISO-8859-6" "ISO-8859-6-E" -## [13] "ISO-8859-6-I" "iso-ir-127" "windows-28596" -## [16] "x-ISO-8859-6S" -## -## $`ibm-1097_P100-1995` -## [1] "1097" "cp1097" "ibm-1097" -## [4] "ibm-1097_P100-1995" "x-IBM1097" -## -## $`ibm-1098_P100-1995` -## [1] "1098" "cp1098" "ibm-1098" -## [4] "ibm-1098_P100-1995" "IBM1098" "x-IBM1098" -## -## $`ibm-1112_P100-1995` -## [1] "1112" "cp1112" "ibm-1112" -## [4] "ibm-1112_P100-1995" "x-IBM1112" -## -## $`ibm-1122_P100-1999` -## [1] "1122" "cp1122" "ibm-1122" -## [4] "ibm-1122_P100-1999" "x-IBM1122" -## -## $`ibm-1123_P100-1995` -## [1] "1123" "cp1123" "ibm-1123" -## [4] "ibm-1123_P100-1995" "x-IBM1123" -## -## $`ibm-1124_P100-1996` -## [1] "1124" "cp1124" "ibm-1124" -## [4] "ibm-1124_P100-1996" "x-IBM1124" -## -## $`ibm-1125_P100-1997` -## [1] "cp1125" "ibm-1125" "ibm-1125_P100-1997" -## -## $`ibm-1129_P100-1997` -## [1] "ibm-1129" "ibm-1129_P100-1997" -## -## $`ibm-1130_P100-1997` -## [1] "ibm-1130" "ibm-1130_P100-1997" -## -## $`ibm-1131_P100-1997` -## [1] "cp1131" "ibm-1131" "ibm-1131_P100-1997" -## -## $`ibm-1132_P100-1998` -## [1] "ibm-1132" "ibm-1132_P100-1998" -## -## $`ibm-1133_P100-1997` -## [1] "ibm-1133" "ibm-1133_P100-1997" -## -## $`ibm-1137_P100-1999` -## [1] "ibm-1137" "ibm-1137_P100-1999" -## -## $`ibm-1140_P100-1997` -## [1] "CCSID01140" "CP01140" "cp1140" -## [4] "ebcdic-us-37+euro" "ibm-1140" "ibm-1140_P100-1997" -## [7] "IBM01140" -## -## $`ibm-1140_P100-1997,swaplfnl` -## [1] "ibm-1140_P100-1997,swaplfnl" "ibm-1140-s390" -## -## $`ibm-1141_P100-1997` -## [1] "CCSID01141" "CP01141" "cp1141" -## [4] "ebcdic-de-273+euro" "ibm-1141" "ibm-1141_P100-1997" -## [7] "IBM01141" -## -## $`ibm-1141_P100-1997,swaplfnl` -## [1] "ibm-1141_P100-1997,swaplfnl" "ibm-1141-s390" -## [3] 
"IBM1141_LF" -## -## $`ibm-1142_P100-1997` -## [1] "CCSID01142" "CP01142" "cp1142" -## [4] "ebcdic-dk-277+euro" "ebcdic-no-277+euro" "ibm-1142" -## [7] "ibm-1142_P100-1997" "IBM01142" -## -## $`ibm-1142_P100-1997,swaplfnl` -## [1] "ibm-1142_P100-1997,swaplfnl" "ibm-1142-s390" -## -## $`ibm-1143_P100-1997` -## [1] "CCSID01143" "CP01143" "cp1143" -## [4] "ebcdic-fi-278+euro" "ebcdic-se-278+euro" "ibm-1143" -## [7] "ibm-1143_P100-1997" "IBM01143" -## -## $`ibm-1143_P100-1997,swaplfnl` -## [1] "ibm-1143_P100-1997,swaplfnl" "ibm-1143-s390" -## -## $`ibm-1144_P100-1997` -## [1] "CCSID01144" "CP01144" "cp1144" -## [4] "ebcdic-it-280+euro" "ibm-1144" "ibm-1144_P100-1997" -## [7] "IBM01144" -## -## $`ibm-1144_P100-1997,swaplfnl` -## [1] "ibm-1144_P100-1997,swaplfnl" "ibm-1144-s390" -## -## $`ibm-1145_P100-1997` -## [1] "CCSID01145" "CP01145" "cp1145" -## [4] "ebcdic-es-284+euro" "ibm-1145" "ibm-1145_P100-1997" -## [7] "IBM01145" -## -## $`ibm-1145_P100-1997,swaplfnl` -## [1] "ibm-1145_P100-1997,swaplfnl" "ibm-1145-s390" -## -## $`ibm-1146_P100-1997` -## [1] "CCSID01146" "CP01146" "cp1146" -## [4] "ebcdic-gb-285+euro" "ibm-1146" "ibm-1146_P100-1997" -## [7] "IBM01146" -## -## $`ibm-1146_P100-1997,swaplfnl` -## [1] "ibm-1146_P100-1997,swaplfnl" "ibm-1146-s390" -## -## $`ibm-1147_P100-1997` -## [1] "CCSID01147" "CP01147" "cp1147" -## [4] "ebcdic-fr-297+euro" "ibm-1147" "ibm-1147_P100-1997" -## [7] "IBM01147" -## -## $`ibm-1147_P100-1997,swaplfnl` -## [1] "ibm-1147_P100-1997,swaplfnl" "ibm-1147-s390" -## -## $`ibm-1148_P100-1997` -## [1] "CCSID01148" "CP01148" -## [3] "cp1148" "ebcdic-international-500+euro" -## [5] "ibm-1148" "ibm-1148_P100-1997" -## [7] "IBM01148" -## -## $`ibm-1148_P100-1997,swaplfnl` -## [1] "ibm-1148_P100-1997,swaplfnl" "ibm-1148-s390" -## -## $`ibm-1149_P100-1997` -## [1] "CCSID01149" "CP01149" "cp1149" -## [4] "ebcdic-is-871+euro" "ibm-1149" "ibm-1149_P100-1997" -## [7] "IBM01149" -## -## $`ibm-1149_P100-1997,swaplfnl` -## [1] 
"ibm-1149_P100-1997,swaplfnl" "ibm-1149-s390" -## -## $`ibm-1153_P100-1999` -## [1] "ibm-1153" "ibm-1153_P100-1999" "IBM1153" -## [4] "x-IBM1153" -## -## $`ibm-1153_P100-1999,swaplfnl` -## [1] "ibm-1153_P100-1999,swaplfnl" "ibm-1153-s390" -## -## $`ibm-1154_P100-1999` -## [1] "ibm-1154" "ibm-1154_P100-1999" -## -## $`ibm-1155_P100-1999` -## [1] "ibm-1155" "ibm-1155_P100-1999" -## -## $`ibm-1156_P100-1999` -## [1] "ibm-1156" "ibm-1156_P100-1999" -## -## $`ibm-1157_P100-1999` -## [1] "ibm-1157" "ibm-1157_P100-1999" -## -## $`ibm-1158_P100-1999` -## [1] "ibm-1158" "ibm-1158_P100-1999" -## -## $`ibm-1160_P100-1999` -## [1] "ibm-1160" "ibm-1160_P100-1999" -## -## $`ibm-1162_P100-1999` -## [1] "ibm-1162" "ibm-1162_P100-1999" -## -## $`ibm-1164_P100-1999` -## [1] "ibm-1164" "ibm-1164_P100-1999" -## -## $`ibm-1168_P100-2002` -## [1] "ibm-1168" "ibm-1168_P100-2002" "KOI8-U" -## [4] "windows-21866" -## -## $`ibm-1250_P100-1995` -## [1] "ibm-1250" "ibm-1250_P100-1995" "windows-1250" -## -## $`ibm-1251_P100-1995` -## [1] "ibm-1251" "ibm-1251_P100-1995" "windows-1251" -## -## $`ibm-1252_P100-2000` -## [1] "ibm-1252" "ibm-1252_P100-2000" "windows-1252" -## -## $`ibm-1253_P100-1995` -## [1] "ibm-1253" "ibm-1253_P100-1995" "windows-1253" -## -## $`ibm-1254_P100-1995` -## [1] "ibm-1254" "ibm-1254_P100-1995" "windows-1254" -## -## $`ibm-1255_P100-1995` -## [1] "ibm-1255" "ibm-1255_P100-1995" -## -## $`ibm-1256_P110-1997` -## [1] "ibm-1256" "ibm-1256_P110-1997" -## -## $`ibm-1257_P100-1995` -## [1] "ibm-1257" "ibm-1257_P100-1995" -## -## $`ibm-1258_P100-1997` -## [1] "ibm-1258" "ibm-1258_P100-1997" "windows-1258" -## -## $`ibm-1276_P100-1995` -## [1] "Adobe-Standard-Encoding" "csAdobeStandardEncoding" -## [3] "ibm-1276" "ibm-1276_P100-1995" -## -## $`ibm-1363_P11B-1998` -## [1] "5601" "cp1363" "csKSC56011987" -## [4] "ibm-1363" "ibm-1363_P11B-1998" "ibm-1363_VSUB_VPUA" -## [7] "iso-ir-149" "korean" "KS_C_5601-1987" -## [10] "KS_C_5601-1989" "ksc" "KSC_5601" -## [13] "windows-949" 
"x-IBM1363C" -## -## $`ibm-1363_P110-1997` -## [1] "ibm-1363" "ibm-1363_P110-1997" -## [3] "ibm-1363_VASCII_VSUB_VPUA" "x-IBM1363" -## -## $`ibm-1364_P110-2007` -## [1] "ibm-1364" "ibm-1364_P110-2007" "x-IBM1364" -## -## $`ibm-1371_P100-1999` -## [1] "ibm-1371" "ibm-1371_P100-1999" "x-IBM1371" -## -## $`ibm-1373_P100-2002` -## [1] "ibm-1373" "ibm-1373_P100-2002" "windows-950" -## -## $`ibm-1375_P100-2008` -## [1] "Big5-HKSCS" "big5hk" "HKSCS-BIG5" -## [4] "ibm-1375" "ibm-1375_P100-2008" -## -## $`ibm-1383_P110-1999` -## [1] "1383" "cp1383" "csGB2312" -## [4] "EUC-CN" "GB2312" "hp15CN" -## [7] "ibm-1383" "ibm-1383_P110-1999" "ibm-1383_VPUA" -## [10] "ibm-eucCN" -## -## $`ibm-1386_P100-2001` -## [1] "cp1386" "ibm-1386" "ibm-1386_P100-2001" -## [4] "ibm-1386_VSUB_VPUA" "windows-936" -## -## $`ibm-1388_P103-2001` -## [1] "ibm-1388" "ibm-1388_P103-2001" "ibm-9580" -## [4] "x-IBM1388" -## -## $`ibm-1390_P110-2003` -## [1] "ibm-1390" "ibm-1390_P110-2003" "x-IBM1390" -## -## $`ibm-1399_P110-2003` -## [1] "ibm-1399" "ibm-1399_P110-2003" "x-IBM1399" -## -## $`ibm-4517_P100-2005` -## [1] "ibm-4517" "ibm-4517_P100-2005" -## -## $`ibm-4899_P100-1998` -## [1] "ibm-4899" "ibm-4899_P100-1998" -## -## $`ibm-4909_P100-1999` -## [1] "ibm-4909" "ibm-4909_P100-1999" -## -## $`ibm-4971_P100-1999` -## [1] "ibm-4971" "ibm-4971_P100-1999" -## -## $`ibm-5012_P100-1999` -## [1] "8859_8" "csISOLatinHebrew" "hebrew" -## [4] "hebrew8" "ibm-5012" "ibm-5012_P100-1999" -## [7] "ISO_8859-8:1988" "ISO-8859-8" "ISO-8859-8-E" -## [10] "ISO-8859-8-I" "iso-ir-138" "windows-28598" -## -## $`ibm-5123_P100-1999` -## [1] "ibm-5123" "ibm-5123_P100-1999" -## -## $`ibm-5346_P100-1998` -## [1] "cp1250" "ibm-5346" "ibm-5346_P100-1998" -## [4] "windows-1250" -## -## $`ibm-5347_P100-1998` -## [1] "ANSI1251" "cp1251" "ibm-5347" -## [4] "ibm-5347_P100-1998" "windows-1251" -## -## $`ibm-5348_P100-1997` -## [1] "cp1252" "ibm-5348" "ibm-5348_P100-1997" -## [4] "windows-1252" -## -## $`ibm-5349_P100-1998` -## [1] 
"cp1253" "ibm-5349" "ibm-5349_P100-1998" -## [4] "windows-1253" -## -## $`ibm-5350_P100-1998` -## [1] "cp1254" "ibm-5350" "ibm-5350_P100-1998" -## [4] "windows-1254" -## -## $`ibm-5351_P100-1998` -## [1] "ibm-5351" "ibm-5351_P100-1998" "windows-1255" -## -## $`ibm-5352_P100-1998` -## [1] "ibm-5352" "ibm-5352_P100-1998" "windows-1256" -## -## $`ibm-5353_P100-1998` -## [1] "ibm-5353" "ibm-5353_P100-1998" "windows-1257" -## -## $`ibm-5354_P100-1998` -## [1] "cp1258" "ibm-5354" "ibm-5354_P100-1998" -## [4] "windows-1258" -## -## $`ibm-5471_P100-2006` -## [1] "Big5-HKSCS" "big5-hkscs:unicode3.0" "hkbig5" -## [4] "ibm-5471" "ibm-5471_P100-2006" "MS950_HKSCS" -## [7] "x-MS950-HKSCS" -## -## $`ibm-5478_P100-1995` -## [1] "chinese" "csISO58GB231280" "GB_2312-80" -## [4] "gb2312-1980" "GB2312.1980-0" "ibm-5478" -## [7] "ibm-5478_P100-1995" "iso-ir-58" -## -## $`ibm-8482_P100-1999` -## [1] "ibm-8482" "ibm-8482_P100-1999" -## -## $`ibm-9005_X110-2007` -## [1] "8859_7" "csISOLatinGreek" "ECMA-118" -## [4] "ELOT_928" "greek" "greek8" -## [7] "ibm-9005" "ibm-9005_X110-2007" "ISO_8859-7:1987" -## [10] "ISO-8859-7" "iso-ir-126" "sun_eu_greek" -## [13] "windows-28597" -## -## $`ibm-9067_X100-2005` -## [1] "ibm-9067" "ibm-9067_X100-2005" -## -## $`ibm-9447_P100-2002` -## [1] "cp1255" "ibm-9447" "ibm-9447_P100-2002" -## [4] "windows-1255" -## -## $`ibm-9448_X100-2005` -## [1] "cp1256" "ibm-9448" "ibm-9448_X100-2005" -## [4] "windows-1256" "x-windows-1256S" -## -## $`ibm-9449_P100-2002` -## [1] "cp1257" "ibm-9449" "ibm-9449_P100-2002" -## [4] "windows-1257" -## -## $`ibm-12712_P100-1998` -## [1] "ebcdic-he" "ibm-12712" "ibm-12712_P100-1998" -## -## $`ibm-12712_P100-1998,swaplfnl` -## [1] "ibm-12712_P100-1998,swaplfnl" "ibm-12712-s390" -## -## $`ibm-16684_P110-2003` -## [1] "ibm-16684" "ibm-16684_P110-2003" "ibm-20780" -## -## $`ibm-16804_X110-1999` -## [1] "ebcdic-ar" "ibm-16804" "ibm-16804_X110-1999" -## -## $`ibm-16804_X110-1999,swaplfnl` -## [1] "ibm-16804_X110-1999,swaplfnl" 
"ibm-16804-s390" -## -## $`ibm-33722_P12A_P12A-2009_U2` -## [1] "ibm-5050" "ibm-33722" -## [3] "ibm-33722_P12A_P12A-2009_U2" "ibm-33722_VPUA" -## [5] "IBM-eucJP" -## -## $`ibm-33722_P120-1999` -## [1] "33722" "cp33722" "ibm-5050" -## [4] "ibm-33722" "ibm-33722_P120-1999" "ibm-33722_VASCII_VPUA" -## [7] "x-IBM33722" "x-IBM33722A" "x-IBM33722C" -## -## $`IMAP-mailbox-name` -## [1] "IMAP-mailbox-name" -## -## $`ISCII,version=0` -## [1] "ibm-4902" "iscii-dev" "ISCII,version=0" "windows-57002" -## [5] "x-iscii-de" "x-ISCII91" -## -## $`ISCII,version=1` -## [1] "iscii-bng" "ISCII,version=1" "windows-57003" "windows-57006" -## [5] "x-iscii-as" "x-iscii-be" -## -## $`ISCII,version=2` -## [1] "iscii-gur" "ISCII,version=2" "windows-57011" "x-iscii-pa" -## -## $`ISCII,version=3` -## [1] "iscii-guj" "ISCII,version=3" "windows-57010" "x-iscii-gu" -## -## $`ISCII,version=4` -## [1] "iscii-ori" "ISCII,version=4" "windows-57007" "x-iscii-or" -## -## $`ISCII,version=5` -## [1] "iscii-tml" "ISCII,version=5" "windows-57004" "x-iscii-ta" -## -## $`ISCII,version=6` -## [1] "iscii-tlg" "ISCII,version=6" "windows-57005" "x-iscii-te" -## -## $`ISCII,version=7` -## [1] "iscii-knd" "ISCII,version=7" "windows-57008" "x-iscii-ka" -## -## $`ISCII,version=8` -## [1] "iscii-mlm" "ISCII,version=8" "windows-57009" "x-iscii-ma" -## -## $`ISO_2022,locale=ja,version=0` -## [1] "csISO2022JP" "ISO_2022,locale=ja,version=0" -## [3] "ISO-2022-JP" "x-windows-50220" -## [5] "x-windows-iso2022jp" -## -## $`ISO_2022,locale=ja,version=1` -## [1] "csJISEncoding" "ibm-5054" -## [3] "ISO_2022,locale=ja,version=1" "ISO-2022-JP-1" -## [5] "JIS" "JIS_Encoding" -## [7] "x-windows-50221" -## -## $`ISO_2022,locale=ja,version=2` -## [1] "csISO2022JP2" "ISO_2022,locale=ja,version=2" -## [3] "ISO-2022-JP-2" -## -## $`ISO_2022,locale=ja,version=3` -## [1] "ISO_2022,locale=ja,version=3" "JIS7" -## -## $`ISO_2022,locale=ja,version=4` -## [1] "ISO_2022,locale=ja,version=4" "JIS8" -## -## $`ISO_2022,locale=ko,version=0` -## 
[1] "csISO2022KR" "ISO_2022,locale=ko,version=0" -## [3] "ISO-2022-KR" -## -## $`ISO_2022,locale=ko,version=1` -## [1] "ibm-25546" "ISO_2022,locale=ko,version=1" -## -## $`ISO_2022,locale=zh,version=0` -## [1] "csISO2022CN" "ISO_2022,locale=zh,version=0" -## [3] "ISO-2022-CN" "x-ISO-2022-CN-GB" -## -## $`ISO_2022,locale=zh,version=1` -## [1] "ISO_2022,locale=zh,version=1" "ISO-2022-CN-EXT" -## -## $`ISO_2022,locale=zh,version=2` -## [1] "ISO_2022,locale=zh,version=2" "ISO-2022-CN-CNS" -## [3] "x-ISO-2022-CN-CNS" -## -## $`iso-8859_10-1998` -## [1] "csISOLatin6" "ISO_8859-10:1992" "iso-8859_10-1998" "ISO-8859-10" -## [5] "iso-ir-157" "l6" "latin6" -## -## $`iso-8859_11-2001` -## [1] "iso-8859_11-2001" "ISO-8859-11" "thai8" "x-iso-8859-11" -## -## $`iso-8859_14-1998` -## [1] "ISO_8859-14:1998" "iso-8859_14-1998" "ISO-8859-14" "iso-celtic" -## [5] "iso-ir-199" "l8" "latin8" -## -## $`ISO-8859-1` -## [1] "819" "8859_1" "cp819" "csISOLatin1" -## [5] "ibm-819" "IBM819" "ISO_8859-1:1987" "ISO-8859-1" -## [9] "iso-ir-100" "l1" "latin1" -## -## $`LMBCS-1` -## [1] "ibm-65025" "lmbcs" "LMBCS-1" -## -## $`macos-0_2-10.2` -## [1] "csMacintosh" "mac" "macintosh" "macos-0_2-10.2" -## [5] "macroman" "windows-10000" "x-macroman" -## -## $`macos-6_2-10.4` -## [1] "macgr" "macos-6_2-10.4" "windows-10006" "x-mac-greek" -## [5] "x-MacGreek" -## -## $`macos-7_3-10.2` -## [1] "mac-cyrillic" "maccy" "macos-7_3-10.2" "windows-10007" -## [5] "x-mac-cyrillic" "x-MacCyrillic" "x-MacUkraine" -## -## $`macos-29-10.2` -## [1] "macce" "maccentraleurope" "macos-29-10.2" -## [4] "windows-10029" "x-mac-ce" "x-mac-centraleurroman" -## [7] "x-MacCentralEurope" -## -## $`macos-35-10.2` -## [1] "macos-35-10.2" "mactr" "windows-10081" "x-mac-turkish" -## [5] "x-MacTurkish" -## -## $SCSU -## [1] "ibm-1212" "ibm-1213" "SCSU" -## -## $`US-ASCII` -## [1] "646" "ANSI_X3.4-1968" "ANSI_X3.4-1986" "ASCII" -## [5] "ascii7" "cp367" "csASCII" "ibm-367" -## [9] "IBM367" "iso_646.irv:1983" "ISO_646.irv:1991" 
"iso-ir-6" -## [13] "ISO646-US" "us" "US-ASCII" "windows-20127" -## -## $`UTF-7` -## [1] "unicode-1-1-utf-7" "unicode-2-0-utf-7" "UTF-7" -## [4] "windows-65000" -## -## $`UTF-8` -## [1] "cp1208" "ibm-1208" "ibm-1209" -## [4] "ibm-5304" "ibm-5305" "ibm-13496" -## [7] "ibm-13497" "ibm-17592" "ibm-17593" -## [10] "unicode-1-1-utf-8" "unicode-2-0-utf-8" "UTF-8" -## [13] "windows-65001" "x-UTF_8J" -## -## $`UTF-16` -## [1] "csUnicode" "ibm-1204" "ibm-1205" "ISO-10646-UCS-2" -## [5] "ucs-2" "unicode" "UTF-16" -## -## $`UTF-16,version=1` -## [1] "UTF-16,version=1" -## -## $`UTF-16,version=2` -## [1] "UTF-16,version=2" -## -## $`UTF-16BE` -## [1] "cp1200" "cp1201" "ibm-1200" -## [4] "ibm-1201" "ibm-13488" "ibm-13489" -## [7] "ibm-17584" "ibm-17585" "ibm-21680" -## [10] "ibm-21681" "ibm-25776" "ibm-25777" -## [13] "ibm-29872" "ibm-29873" "ibm-61955" -## [16] "ibm-61956" "UnicodeBigUnmarked" "UTF-16BE" -## [19] "UTF16_BigEndian" "windows-1201" "x-utf-16be" -## -## $`UTF-16BE,version=1` -## [1] "UnicodeBig" "UTF-16BE,version=1" -## -## $`UTF-16LE` -## [1] "ibm-1202" "ibm-1203" "ibm-13490" -## [4] "ibm-13491" "ibm-17586" "ibm-17587" -## [7] "ibm-21682" "ibm-21683" "ibm-25778" -## [10] "ibm-25779" "ibm-29874" "ibm-29875" -## [13] "UnicodeLittleUnmarked" "UTF-16LE" "UTF16_LittleEndian" -## [16] "windows-1200" "x-utf-16le" -## -## $`UTF-16LE,version=1` -## [1] "UnicodeLittle" "UTF-16LE,version=1" "x-UTF-16LE-BOM" -## -## $`UTF-32` -## [1] "csUCS4" "ibm-1236" "ibm-1237" "ISO-10646-UCS-4" -## [5] "ucs-4" "UTF-32" -## -## $`UTF-32BE` -## [1] "ibm-1232" "ibm-1233" "ibm-9424" "UTF-32BE" -## [5] "UTF32_BigEndian" -## -## $`UTF-32LE` -## [1] "ibm-1234" "ibm-1235" "UTF-32LE" -## [4] "UTF32_LittleEndian" -## -## $UTF16_OppositeEndian -## [1] "UTF16_OppositeEndian" -## -## $UTF16_PlatformEndian -## [1] "UTF16_PlatformEndian" -## -## $UTF32_OppositeEndian -## [1] "UTF32_OppositeEndian" -## -## $UTF32_PlatformEndian -## [1] "UTF32_PlatformEndian" -## -## $`windows-874-2000` -## [1] "MS874" 
"TIS-620" "windows-874" "windows-874-2000" -## [5] "x-windows-874" -## -## $`windows-936-2000` -## [1] "CP936" "GBK" "MS936" "windows-936" -## [5] "windows-936-2000" -## -## $`windows-949-2000` -## [1] "csKSC56011987" "iso-ir-149" "korean" "KS_C_5601-1987" -## [5] "KS_C_5601-1989" "KSC_5601" "ms949" "windows-949" -## [9] "windows-949-2000" "x-KSC5601" -## -## $`windows-950-2000` -## [1] "Big5" "csBig5" "ms950" "windows-950" -## [5] "windows-950-2000" "x-big5" "x-windows-950" -## -## $`x11-compound-text` -## [1] "COMPOUND_TEXT" "x-compound-text" "x11-compound-text" -``` diff --git a/.devel/sphinx/rapi/stri_enc_mark.md b/.devel/sphinx/rapi/stri_enc_mark.md deleted file mode 100644 index 0836f72e..00000000 --- a/.devel/sphinx/rapi/stri_enc_mark.md +++ /dev/null @@ -1,43 +0,0 @@ -# stri_enc_mark: Get Declared Encodings of Each String - -## Description - -Reads declared encodings for each string in a character vector as seen by stringi. - -## Usage - -``` r -stri_enc_mark(str) -``` - -## Arguments - -| | | -|-------|---------------------------------------------------------------| -| `str` | character vector or an object coercible to a character vector | - -## Details - -According to [`Encoding`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/Encoding.html), **R** has a simple encoding marking mechanism: strings can be declared to be in `latin1`, `UTF-8` or `bytes`. - -Moreover, we may check (via the R/C API) whether a string is in ASCII (**R** assumes that this holds if and only if all bytes in a string are not greater than 127, so there is an implicit assumption that your platform uses an encoding that extends ASCII) or in the system\'s default (a.k.a. `unknown` in [`Encoding`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/Encoding.html)) encoding. - -Intuitively, the default encoding should be equivalent to the one you use on `stdin` (e.g., your \'keyboard\'). 
In stringi we assume that such an encoding is equivalent to the one returned by [`stri_enc_get`](stri_enc_set.md). It is automatically detected by ICU to match -- by default -- the encoding part of the `LC_CTYPE` category as given by [`Sys.getlocale`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/locales.html). - -## Value - -Returns a character vector of the same length as `str`. Unlike in the [`Encoding`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/Encoding.html) function, here the possible encodings are: `ASCII`, `latin1`, `bytes`, `native`, and `UTF-8`. Additionally, missing values are handled properly. - -This gives exactly the same data that is used by all the functions in stringi to re-encode their inputs. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other encoding_management: [`about_encoding`](about_encoding.md), [`stri_enc_info()`](stri_enc_info.md), [`stri_enc_list()`](stri_enc_list.md), [`stri_enc_set()`](stri_enc_set.md) diff --git a/.devel/sphinx/rapi/stri_enc_set.md b/.devel/sphinx/rapi/stri_enc_set.md deleted file mode 100644 index d94c8e4e..00000000 --- a/.devel/sphinx/rapi/stri_enc_set.md +++ /dev/null @@ -1,47 +0,0 @@ -# stri_enc_set: - -## Description - -`stri_enc_set` sets the encoding used to re-encode strings internally (i.e., by **R**) declared to be in native encoding, see [stringi-encoding](about_encoding.md) and [`stri_enc_mark`](stri_enc_mark.md). `stri_enc_get` returns the currently used default encoding. 
- -## Usage - -``` r -stri_enc_set(enc) - -stri_enc_get() -``` - -## Arguments - -| | | -|-------|----------------------------------------------------------------------------------------------------------------------| -| `enc` | single string; character encoding name, see [`stri_enc_list`](stri_enc_list.md) for the list of supported encodings. | - -## Details - -`stri_enc_get` is the same as [`stri_enc_info(NULL)$Name.friendly`](stri_enc_info.md). - -Note that changing the default encoding may have undesired consequences. Unless you are an expert user and you know what you are doing, `stri_enc_set` should only be used if ICU fails to detect your system\'s encoding correctly (while testing stringi we only encountered such a situation on a very old Solaris machine). Note that ICU tries to match the encoding part of the `LC_CTYPE` category as given by [`Sys.getlocale`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/locales.html). - -If you set a default encoding that is neither a superset of ASCII, nor an 8-bit encoding, a warning will be generated, see [stringi-encoding](about_encoding.md) for discussion. - -`stri_enc_set` has no effect if the system ICU assumes that the default charset is always UTF-8 (i.e., where the internal `U_CHARSET_IS_UTF8` is defined and set to 1), see [`stri_info`](stri_info.md). - -## Value - -`stri_enc_set` returns a string with previously used character encoding, invisibly. - -`stri_enc_get` returns a string with current default character encoding. 
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other encoding_management: [`about_encoding`](about_encoding.md), [`stri_enc_info()`](stri_enc_info.md), [`stri_enc_list()`](stri_enc_list.md), [`stri_enc_mark()`](stri_enc_mark.md) diff --git a/.devel/sphinx/rapi/stri_enc_toascii.md b/.devel/sphinx/rapi/stri_enc_toascii.md deleted file mode 100644 index da5c73d8..00000000 --- a/.devel/sphinx/rapi/stri_enc_toascii.md +++ /dev/null @@ -1,41 +0,0 @@ -# stri_enc_toascii: Convert To ASCII - -## Description - -This function converts input strings to ASCII, i.e., to character strings consisting of bytes not greater than 127. - -## Usage - -``` r -stri_enc_toascii(str) -``` - -## Arguments - -| | | -|-------|------------------------------------| -| `str` | a character vector to be converted | - -## Details - -All code points greater than 127 are replaced with the ASCII SUBSTITUTE CHARACTER (0x1A). **R** encoding declarations are always used to determine which encoding is assumed for each input, see [`stri_enc_mark`](stri_enc_mark.md). If ill-formed byte sequences are found in UTF-8 byte streams, a warning is generated. - -A `bytes`-marked string is assumed to be in an 8-bit encoding extending the ASCII map (a common assumption in **R** itself). - -Note that the SUBSTITUTE CHARACTER (`\x1a == \032`) may be interpreted as the ASCII missing value for single characters. - -## Value - -Returns a character vector. 
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other encoding_conversion: [`about_encoding`](about_encoding.md), [`stri_enc_fromutf32()`](stri_enc_fromutf32.md), [`stri_enc_tonative()`](stri_enc_tonative.md), [`stri_enc_toutf32()`](stri_enc_toutf32.md), [`stri_enc_toutf8()`](stri_enc_toutf8.md), [`stri_encode()`](stri_encode.md) diff --git a/.devel/sphinx/rapi/stri_enc_tonative.md b/.devel/sphinx/rapi/stri_enc_tonative.md deleted file mode 100644 index 0c8f5fd6..00000000 --- a/.devel/sphinx/rapi/stri_enc_tonative.md +++ /dev/null @@ -1,39 +0,0 @@ -# stri_enc_tonative: Convert Strings To Native Encoding - -## Description - -Converts character strings with declared encodings to the current native encoding. - -## Usage - -``` r -stri_enc_tonative(str) -``` - -## Arguments - -| | | -|-------|------------------------------------| -| `str` | a character vector to be converted | - -## Details - -This function just calls [`stri_encode(str, NULL, NULL)`](stri_encode.md). The current native encoding can be read with [`stri_enc_get`](stri_enc_set.md). Character strings declared to be in `bytes` encoding will fail here. - -Note that if working in a UTF-8 environment, resulting strings will be marked with `UTF-8` and not `native`, see [`stri_enc_mark`](stri_enc_mark.md). - -## Value - -Returns a character vector. 
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other encoding_conversion: [`about_encoding`](about_encoding.md), [`stri_enc_fromutf32()`](stri_enc_fromutf32.md), [`stri_enc_toascii()`](stri_enc_toascii.md), [`stri_enc_toutf32()`](stri_enc_toutf32.md), [`stri_enc_toutf8()`](stri_enc_toutf8.md), [`stri_encode()`](stri_encode.md) diff --git a/.devel/sphinx/rapi/stri_enc_toutf32.md b/.devel/sphinx/rapi/stri_enc_toutf32.md deleted file mode 100644 index 6ad89946..00000000 --- a/.devel/sphinx/rapi/stri_enc_toutf32.md +++ /dev/null @@ -1,41 +0,0 @@ -# stri_enc_toutf32: Convert Strings To UTF-32 - -## Description - -UTF-32 is a 32-bit encoding where each Unicode code point corresponds to exactly one integer value. This function converts a character vector to a list of integer vectors so that, e.g., individual code points may be easily accessed, changed, etc. - -## Usage - -``` r -stri_enc_toutf32(str) -``` - -## Arguments - -| | | -|-------|----------------------------------------------------------------| -| `str` | a character vector (or an object coercible to) to be converted | - -## Details - -See [`stri_enc_fromutf32`](stri_enc_fromutf32.md) for a dual operation. - -This function is roughly equivalent to a vectorized call to [`utf8ToInt(enc2utf8(str))`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/utf8Conversion.html). If you want a list of raw vectors on output, use [`stri_encode`](stri_encode.md). - -Unlike `utf8ToInt`, if ill-formed UTF-8 byte sequences are detected, a corresponding element is set to NULL and a warning is generated. To deal with such issues, use, e.g., [`stri_enc_toutf8`](stri_enc_toutf8.md). 
- -## Value - -Returns a list of integer vectors. Missing values are converted to `NULL`s. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other encoding_conversion: [`about_encoding`](about_encoding.md), [`stri_enc_fromutf32()`](stri_enc_fromutf32.md), [`stri_enc_toascii()`](stri_enc_toascii.md), [`stri_enc_tonative()`](stri_enc_tonative.md), [`stri_enc_toutf8()`](stri_enc_toutf8.md), [`stri_encode()`](stri_encode.md) diff --git a/.devel/sphinx/rapi/stri_enc_toutf8.md b/.devel/sphinx/rapi/stri_enc_toutf8.md deleted file mode 100644 index 96dfb7aa..00000000 --- a/.devel/sphinx/rapi/stri_enc_toutf8.md +++ /dev/null @@ -1,47 +0,0 @@ -# stri_enc_toutf8: Convert Strings To UTF-8 - -## Description - -Converts character strings with declared marked encodings to UTF-8 strings. - -## Usage - -``` r -stri_enc_toutf8(str, is_unknown_8bit = FALSE, validate = FALSE) -``` - -## Arguments - -| | | -|-------------------|---------------------------------------------------| -| `str` | a character vector to be converted | -| `is_unknown_8bit` | a single logical value, see Details | -| `validate` | a single logical value (can be `NA`), see Details | - -## Details - -If `is_unknown_8bit` is set to `FALSE` (the default), then R encoding marks are used, see [`stri_enc_mark`](stri_enc_mark.md). Bytes-marked strings will cause the function to fail. - -If a string is in UTF-8 and has a byte order mark (BOM), then the BOM will be silently removed from the output string. - -If the default encoding is UTF-8, see [`stri_enc_get`](stri_enc_set.md), then strings marked with `native` are -- for efficiency reasons -- returned as-is, i.e., with unchanged markings. 
A similar behavior is observed when calling [`enc2utf8`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/Encoding.html). - -For `is_unknown_8bit=TRUE`, if a string is declared to be neither in ASCII nor in UTF-8, then all byte codes \> 127 are replaced with the Unicode REPLACEMENT CHARACTER (\\Ufffd). Note that the REPLACEMENT CHARACTER may be interpreted as Unicode missing value for single characters. Here a `bytes`-marked string is assumed to use an 8-bit encoding that extends the ASCII map. - -What is more, setting `validate` to `TRUE` or `NA` in both cases validates the resulting UTF-8 byte stream. If `validate=TRUE`, then in case of any incorrect byte sequences, they will be replaced with the REPLACEMENT CHARACTER. This option may be used in a case where you want to fix an invalid UTF-8 byte sequence. For `NA`, a bogus string will be replaced with a missing value. - -## Value - -Returns a character vector. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other encoding_conversion: [`about_encoding`](about_encoding.md), [`stri_enc_fromutf32()`](stri_enc_fromutf32.md), [`stri_enc_toascii()`](stri_enc_toascii.md), [`stri_enc_tonative()`](stri_enc_tonative.md), [`stri_enc_toutf32()`](stri_enc_toutf32.md), [`stri_encode()`](stri_encode.md) diff --git a/.devel/sphinx/rapi/stri_encode.md b/.devel/sphinx/rapi/stri_encode.md deleted file mode 100644 index 8411f6d5..00000000 --- a/.devel/sphinx/rapi/stri_encode.md +++ /dev/null @@ -1,62 +0,0 @@ -# stri_encode: Convert Strings Between Given Encodings - -## Description - -These functions convert strings between encodings. 
They aim to serve as a more portable and faster replacement for **R**\'s own [`iconv`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/iconv.html). - -## Usage - -``` r -stri_encode(str, from = NULL, to = NULL, to_raw = FALSE) - -stri_conv(str, from = NULL, to = NULL, to_raw = FALSE) -``` - -## Arguments - -| | | -|----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `str` | a character vector, a raw vector, or a list of `raw` vectors to be converted | -| `from` | input encoding: `NULL` or `''` for the default encoding or internal encoding marks\' usage (see Details); otherwise, a single string with encoding name, see [`stri_enc_list`](stri_enc_list.md) | -| `to` | target encoding: `NULL` or `''` for default encoding (see [`stri_enc_get`](stri_enc_set.md)), or a single string with encoding name | -| `to_raw` | a single logical value; indicates whether a list of raw vectors rather than a character vector should be returned | - -## Details - -`stri_conv` is an alias for `stri_encode`. - -Refer to [`stri_enc_list`](stri_enc_list.md) for the list of supported encodings and [stringi-encoding](about_encoding.md) for a general discussion. - -If `from` is either missing, `''`, or `NULL`, and if `str` is a character vector then the marked encodings are used (see [`stri_enc_mark`](stri_enc_mark.md)) -- in such a case `bytes`-declared strings are disallowed. Otherwise, i.e., if `str` is a `raw`-type vector or a list of raw vectors, we assume that the input encoding is the current default encoding as given by [`stri_enc_get`](stri_enc_set.md). - -However, if `from` is given explicitly, the internal encoding declarations are always ignored. 
- -For `to_raw=FALSE`, the output strings always have their encodings marked according to the target converter used (as specified by `to`) and the current default encoding (`ASCII`, `latin1`, `UTF-8`, `native`, or `bytes` in all other cases). - -Note that some issues might occur if `to` indicates, e.g., UTF-16 or UTF-32, as the output strings may have embedded NULs. In such cases, please use `to_raw=TRUE` and consider specifying a byte order mark (BOM) for portability reasons (e.g., set `UTF-16` or `UTF-32`, which automatically add the BOM). - -Note that `stri_encode(as.raw(data), 'encodingname')` is a clever substitute for [`rawToChar`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/rawConversion.html). - -In the current version of stringi, if an incorrect code point is found on input, it is replaced with the default (for that target encoding) \'missing/erroneous\' character (with a warning), e.g., the SUBSTITUTE character (U+001A) or the REPLACEMENT CHARACTER (U+FFFD). Occurrences thereof can be located in the output string to diagnose the problematic sequences, e.g., by calling `stri_locate_all_regex(converted_string, '[\ufffd\u001a]')`. - -Because of the way this function is currently implemented, the maximal size of a single string to be converted cannot exceed \~0.67 GB. - -## Value - -If `to_raw` is `FALSE`, then a character vector with encoded strings (and appropriate encoding marks) is returned. Otherwise, a list of raw vectors is produced.
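The `from`/`to`/`to_raw` rules above can be illustrated with a minimal sketch. It assumes stringi is installed and that the default encoding reported by `stri_enc_get()` is UTF-8; the byte values and the choice of the `ISO-8859-1` and `UTF-16BE` converters are illustrative, not taken from the original manual.

```r
library(stringi)

# Raw input with an explicit `from`: the bytes are decoded as ISO-8859-1
# and, since `to` is NULL, converted to the default encoding.
latin1_bytes <- as.raw(c(0x63, 0x61, 0x66, 0xe9))  # "café" in ISO-8859-1
decoded <- stri_encode(latin1_bytes, from = "ISO-8859-1")
print(decoded)

# UTF-16 output contains NUL bytes, so request raw vectors instead of strings.
# UTF-16BE is used here because, unlike plain UTF-16, it does not prepend a BOM.
utf16 <- stri_encode("abc", from = "", to = "UTF-16BE", to_raw = TRUE)
print(utf16[[1]])
```

Requesting raw output sidesteps the embedded-NUL problem entirely; the result can be written to a binary connection as-is.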
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## References - -*Conversion* -- ICU User Guide, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other encoding_conversion: [`about_encoding`](about_encoding.md), [`stri_enc_fromutf32()`](stri_enc_fromutf32.md), [`stri_enc_toascii()`](stri_enc_toascii.md), [`stri_enc_tonative()`](stri_enc_tonative.md), [`stri_enc_toutf32()`](stri_enc_toutf32.md), [`stri_enc_toutf8()`](stri_enc_toutf8.md) diff --git a/.devel/sphinx/rapi/stri_escape_unicode.md b/.devel/sphinx/rapi/stri_escape_unicode.md deleted file mode 100644 index ab4cd76b..00000000 --- a/.devel/sphinx/rapi/stri_escape_unicode.md +++ /dev/null @@ -1,52 +0,0 @@ -# stri_escape_unicode: Escape Unicode Code Points - -## Description - -Escapes all Unicode (not ASCII-printable) code points. - -## Usage - -``` r -stri_escape_unicode(str) -``` - -## Arguments - -| | | -|-------|------------------| -| `str` | character vector | - -## Details - -For non-printable and certain special (well-known, see also R man page [Quotes](https://stat.ethz.ch/R-manual/R-devel/library/base/html/Quotes.html)) ASCII characters the following (also recognized in R) convention is used. We get `\a`, `\b`, `\t`, `\n`, `\v`, `\f`, `\r`, `\"`, `\'`, `\\` or either `\uXXXX` (4 hex digits) or `\UXXXXXXXX` (8 hex digits) otherwise. - -As usual, any input string is converted to Unicode before executing the escape process. - -## Value - -Returns a character vector. 
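A few more inputs, chosen here purely for illustration, show the escape conventions listed under Details. This is a sketch assuming stringi is installed; this page does not state whether the hex digits in `\uXXXX` are upper- or lowercase, so none is assumed here.

```r
library(stringi)

# Well-known control characters and quotes use R's own escape sequences
nl <- stri_escape_unicode("a\nb")
qt <- stri_escape_unicode('say "hi"')

# Non-ASCII code points in the BMP become \uXXXX (4 hex digits);
# the letter case of the hex digits is not assumed here
ne <- stri_escape_unicode("caf\u00e9")

cat(nl, qt, ne, sep = "\n")
```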
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other escape: [`stri_unescape_unicode()`](stri_unescape_unicode.md) - -## Examples - - - - -```r -stri_escape_unicode('a\u0105!') -``` - -``` -## [1] "a\\u0105!" -``` diff --git a/.devel/sphinx/rapi/stri_extract.md b/.devel/sphinx/rapi/stri_extract.md deleted file mode 100644 index be5241a6..00000000 --- a/.devel/sphinx/rapi/stri_extract.md +++ /dev/null @@ -1,395 +0,0 @@ -# stri_extract: Extract Pattern Occurrences - -## Description - -These functions extract all substrings matching a given pattern. - -`stri_extract_all_*` extracts all the matches. `stri_extract_first_*` and `stri_extract_last_*` yield the first or the last matches, respectively. 
- -## Usage - -``` r -stri_extract_all(str, ..., regex, fixed, coll, charclass) - -stri_extract_first(str, ..., regex, fixed, coll, charclass) - -stri_extract_last(str, ..., regex, fixed, coll, charclass) - -stri_extract( - str, - ..., - regex, - fixed, - coll, - charclass, - mode = c("first", "all", "last") -) - -stri_extract_all_charclass( - str, - pattern, - merge = TRUE, - simplify = FALSE, - omit_no_match = FALSE -) - -stri_extract_first_charclass(str, pattern) - -stri_extract_last_charclass(str, pattern) - -stri_extract_all_coll( - str, - pattern, - simplify = FALSE, - omit_no_match = FALSE, - ..., - opts_collator = NULL -) - -stri_extract_first_coll(str, pattern, ..., opts_collator = NULL) - -stri_extract_last_coll(str, pattern, ..., opts_collator = NULL) - -stri_extract_all_regex( - str, - pattern, - simplify = FALSE, - omit_no_match = FALSE, - ..., - opts_regex = NULL -) - -stri_extract_first_regex(str, pattern, ..., opts_regex = NULL) - -stri_extract_last_regex(str, pattern, ..., opts_regex = NULL) - -stri_extract_all_fixed( - str, - pattern, - simplify = FALSE, - omit_no_match = FALSE, - ..., - opts_fixed = NULL -) - -stri_extract_first_fixed(str, pattern, ..., opts_fixed = NULL) - -stri_extract_last_fixed(str, pattern, ..., opts_fixed = NULL) -``` - -## Arguments - -| | | -|--------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `str` | character vector; strings to search in | -| `...` | supplementary arguments passed to the underlying functions, including additional settings for `opts_collator`, `opts_regex`, and so on | -| `mode` | single string; one of: `'first'` (the default), `'all'`, `'last'` | -| `pattern`, `regex`, `fixed`, `coll`, `charclass` | character vector; search patterns; for more details refer to 
[stringi-search](about_search.md) | -| `merge` | single logical value; indicates whether consecutive pattern matches will be merged into one string; `stri_extract_all_charclass` only | -| `simplify` | single logical value; if `TRUE` or `NA`, then a character matrix is returned; otherwise (the default), a list of character vectors is given, see Value; `stri_extract_all_*` only | -| `omit_no_match` | single logical value; if `FALSE`, then a missing value will indicate that there was no match; `stri_extract_all_*` only | -| `opts_collator`, `opts_fixed`, `opts_regex` | a named list to tune up the search engine\'s settings; see [`stri_opts_collator`](stri_opts_collator.md), [`stri_opts_fixed`](stri_opts_fixed.md), and [`stri_opts_regex`](stri_opts_regex.md), respectively; `NULL` for the defaults | - -## Details - -Vectorized over `str` and `pattern` (with recycling of the elements in the shorter vector if necessary). This allows one to, for instance, search for one pattern in each given string, search for each pattern in one given string, and search for the i-th pattern within the i-th string. - -Check out [`stri_match`](stri_match.md) for the extraction of matches to individual regex capture groups. - -`stri_extract`, `stri_extract_all`, `stri_extract_first`, and `stri_extract_last` are convenience functions. They merely call `stri_extract_*_*`, depending on the arguments used. - -## Value - -For `stri_extract_all*`, if `simplify=FALSE` (the default), then a list of character vectors is returned. Each list element represents the results of a different search scenario. If a pattern is not found and `omit_no_match=FALSE`, then a character vector of length 1 with a single `NA` value will be generated. - -Otherwise, i.e., if `simplify` is not `FALSE`, then [`stri_list2matrix`](stri_list2matrix.md) with the `byrow=TRUE` argument is called on the resulting object.
In such a case, the function yields a character matrix with an appropriate number of rows (according to the length of `str`, `pattern`, etc.). Note that [`stri_list2matrix`](stri_list2matrix.md)\'s `fill` argument is set either to an empty string or `NA`, depending on whether `simplify` is `TRUE` or `NA`, respectively. - -`stri_extract_first*` and `stri_extract_last*` return a character vector. A `NA` element indicates a no-match. - -Note that `stri_extract_last_regex` searches from start to end, but skips overlapping matches, see the example below. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other search_extract: [`about_search`](about_search.md), [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_match_all()`](stri_match.md) - -## Examples - - - - -```r -stri_extract_all('XaaaaX', regex=c('\\p{Ll}', '\\p{Ll}+', '\\p{Ll}{2,3}', '\\p{Ll}{2,3}?')) -``` - -``` -## [[1]] -## [1] "a" "a" "a" "a" -## -## [[2]] -## [1] "aaaa" -## -## [[3]] -## [1] "aaa" -## -## [[4]] -## [1] "aa" "aa" -``` - -```r -stri_extract_all('Bartolini', coll='i') -``` - -``` -## [[1]] -## [1] "i" "i" -``` - -```r -stri_extract_all('stringi is so good!', charclass='\\p{Zs}') # all white-spaces -``` - -``` -## [[1]] -## [1] " " " " " " -``` - -```r -stri_extract_all_charclass(c('AbcdeFgHijK', 'abc', 'ABC'), '\\p{Ll}') -``` - -``` -## [[1]] -## [1] "bcde" "g" "ij" -## -## [[2]] -## [1] "abc" -## -## [[3]] -## [1] NA -``` - -```r -stri_extract_all_charclass(c('AbcdeFgHijK', 'abc', 'ABC'), '\\p{Ll}', merge=FALSE) -``` - -``` -## [[1]] -## [1] "b" "c" "d" "e" "g" "i" "j" -## -## [[2]] -## [1] "a" "b" "c" -## -## [[3]] -## [1] NA -``` - -```r -stri_extract_first_charclass('AaBbCc', 
'\\p{Ll}') -``` - -``` -## [1] "a" -``` - -```r -stri_extract_last_charclass('AaBbCc', '\\p{Ll}') -``` - -``` -## [1] "c" -``` - -```r -## Not run: -# emoji support available since ICU 57 -stri_extract_all_charclass(stri_enc_fromutf32(32:55200), '\\p{EMOJI}') -``` - -``` -## [[1]] -## [1] "#" "*" -## [3] "0123456789" "©" -## [5] "®" "‼" -## [7] "⁉" "™" -## [9] "ℹ" "↔↕↖↗↘↙" -## [11] "↩↪" "⌚⌛" -## [13] "⌨" "⏏" -## [15] "⏩⏪⏫⏬⏭⏮⏯⏰⏱⏲⏳" "⏸⏹⏺" -## [17] "Ⓜ" "▪▫" -## [19] "▶" "◀" -## [21] "◻◼◽◾" "☀☁☂☃☄" -## [23] "☎" "☑" -## [25] "☔☕" "☘" -## [27] "☝" "☠" -## [29] "☢☣" "☦" -## [31] "☪" "☮☯" -## [33] "☸☹☺" "♀" -## [35] "♂" "♈♉♊♋♌♍♎♏♐♑♒♓" -## [37] "♟♠" "♣" -## [39] "♥♦" "♨" -## [41] "♻" "♾♿" -## [43] "⚒⚓⚔⚕⚖⚗" "⚙" -## [45] "⚛⚜" "⚠⚡" -## [47] "⚧" "⚪⚫" -## [49] "⚰⚱" "⚽⚾" -## [51] "⛄⛅" "⛈" -## [53] "⛎⛏" "⛑" -## [55] "⛓⛔" "⛩⛪" -## [57] "⛰⛱⛲⛳⛴⛵" "⛷⛸⛹⛺" -## [59] "⛽" "✂" -## [61] "✅" "✈✉✊✋✌✍" -## [63] "✏" "✒" -## [65] "✔" "✖" -## [67] "✝" "✡" -## [69] "✨" "✳✴" -## [71] "❄" "❇" -## [73] "❌" "❎" -## [75] "❓❔❕" "❗" -## [77] "❣❤" "➕➖➗" -## [79] "➡" "➰" -## [81] "➿" "⤴⤵" -## [83] "⬅⬆⬇" "⬛⬜" -## [85] "⭐" "⭕" -## [87] "〰" "〽" -## [89] "㊗" "㊙" -``` - -```r -## End(Not run) - -stri_extract_all_coll(c('AaaaaaaA', 'AAAA'), 'a') -``` - -``` -## [[1]] -## [1] "a" "a" "a" "a" "a" "a" -## -## [[2]] -## [1] NA -``` - -```r -stri_extract_first_coll(c('Yy\u00FD', 'AAA'), 'y', strength=2, locale='sk_SK') -``` - -``` -## [1] "Y" NA -``` - -```r -stri_extract_last_coll(c('Yy\u00FD', 'AAA'), 'y', strength=1, locale='sk_SK') -``` - -``` -## [1] "ý" NA -``` - -```r -stri_extract_all_regex('XaaaaX', c('\\p{Ll}', '\\p{Ll}+', '\\p{Ll}{2,3}', '\\p{Ll}{2,3}?')) -``` - -``` -## [[1]] -## [1] "a" "a" "a" "a" -## -## [[2]] -## [1] "aaaa" -## -## [[3]] -## [1] "aaa" -## -## [[4]] -## [1] "aa" "aa" -``` - -```r -stri_extract_first_regex('XaaaaX', c('\\p{Ll}', '\\p{Ll}+', '\\p{Ll}{2,3}', '\\p{Ll}{2,3}?')) -``` - -``` -## [1] "a" "aaaa" "aaa" "aa" -``` - -```r -stri_extract_last_regex('XaaaaX', c('\\p{Ll}', '\\p{Ll}+', 
'\\p{Ll}{2,3}', '\\p{Ll}{2,3}?')) -``` - -``` -## [1] "a" "aaaa" "aaa" "aa" -``` - -```r -stri_list2matrix(stri_extract_all_regex('XaaaaX', c('\\p{Ll}', '\\p{Ll}+'))) -``` - -``` -## [,1] [,2] -## [1,] "a" "aaaa" -## [2,] "a" NA -## [3,] "a" NA -## [4,] "a" NA -``` - -```r -stri_extract_all_regex('XaaaaX', c('\\p{Ll}', '\\p{Ll}+'), simplify=TRUE) -``` - -``` -## [,1] [,2] [,3] [,4] -## [1,] "a" "a" "a" "a" -## [2,] "aaaa" "" "" "" -``` - -```r -stri_extract_all_regex('XaaaaX', c('\\p{Ll}', '\\p{Ll}+'), simplify=NA) -``` - -``` -## [,1] [,2] [,3] [,4] -## [1,] "a" "a" "a" "a" -## [2,] "aaaa" NA NA NA -``` - -```r -stri_extract_all_fixed('abaBAba', 'Aba', case_insensitive=TRUE) -``` - -``` -## [[1]] -## [1] "aba" "Aba" -``` - -```r -stri_extract_all_fixed('abaBAba', 'Aba', case_insensitive=TRUE, overlap=TRUE) -``` - -``` -## [[1]] -## [1] "aba" "aBA" "Aba" -``` - -```r -# Searching for the last occurrence: -# Note the difference - regex searches left to right, with no overlaps. -stri_extract_last_fixed("agAGA", "aga", case_insensitive=TRUE) -``` - -``` -## [1] "AGA" -``` - -```r -stri_extract_last_regex("agAGA", "aga", case_insensitive=TRUE) -``` - -``` -## [1] "agA" -``` diff --git a/.devel/sphinx/rapi/stri_extract_boundaries.md b/.devel/sphinx/rapi/stri_extract_boundaries.md deleted file mode 100644 index 5966da29..00000000 --- a/.devel/sphinx/rapi/stri_extract_boundaries.md +++ /dev/null @@ -1,90 +0,0 @@ -# stri_extract_boundaries: Extract Data Between Text Boundaries - -## Description - -These functions extract data between text boundaries. 
- -## Usage - -``` r -stri_extract_all_boundaries( - str, - simplify = FALSE, - omit_no_match = FALSE, - ..., - opts_brkiter = NULL -) - -stri_extract_last_boundaries(str, ..., opts_brkiter = NULL) - -stri_extract_first_boundaries(str, ..., opts_brkiter = NULL) - -stri_extract_all_words( - str, - simplify = FALSE, - omit_no_match = FALSE, - locale = NULL -) - -stri_extract_first_words(str, locale = NULL) - -stri_extract_last_words(str, locale = NULL) -``` - -## Arguments - -| | | -|-----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `str` | character vector or an object coercible to | -| `simplify` | single logical value; if `TRUE` or `NA`, then a character matrix is returned; otherwise (the default), a list of character vectors is given, see Value | -| `omit_no_match` | single logical value; if `FALSE`, then a missing value will indicate that there are no words | -| `...` | additional settings for `opts_brkiter` | -| `opts_brkiter` | a named list with ICU BreakIterator\'s settings, see [`stri_opts_brkiter`](stri_opts_brkiter.md); `NULL` for the default break iterator, i.e., `line_break` | -| `locale` | `NULL` or `''` for text boundary analysis following the conventions of the default locale, or a single string with locale identifier, see [stringi-locale](about_locale.md) | - -## Details - -Vectorized over `str`. - -For more information on text boundary analysis performed by ICU\'s `BreakIterator`, see [stringi-search-boundaries](about_search_boundaries.md). - -In case of `stri_extract_*_words`, just like in [`stri_count_words`](stri_count_boundaries.md), ICU\'s word `BreakIterator` iterator is used to locate the word boundaries, and all non-word characters (`UBRK_WORD_NONE` rule status) are ignored. 
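The word-boundary rules just described can be seen in a hedged sketch, assuming stringi is installed and the default locale follows the standard ICU word-break rules. Punctuation and spaces are skipped, and a string without any words yields `NA` or, with `omit_no_match=TRUE`, an empty vector. The input strings are arbitrary examples.

```r
library(stringi)

x <- c("Hello, world!", "  ...  ")

# Punctuation and white space (UBRK_WORD_NONE) never form a "word"
first_w <- stri_extract_first_words(x)
last_w  <- stri_extract_last_words(x)

# With omit_no_match = TRUE, a string without words gives character(0)
all_w <- stri_extract_all_words(x, omit_no_match = TRUE)

print(first_w)
print(last_w)
str(all_w)
```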
- -## Value - -For `stri_extract_all_*`, if `simplify=FALSE` (the default), then a list of character vectors is returned. Each string consists of a separate word. In case of `omit_no_match=FALSE` and if there are no words or if a string is missing, a single `NA` is provided on output. - -Otherwise, [`stri_list2matrix`](stri_list2matrix.md) with `byrow=TRUE` argument is called on the resulting object. In such a case, a character matrix with `length(str)` rows is returned. Note that [`stri_list2matrix`](stri_list2matrix.md)\'s `fill` argument is set to an empty string and `NA`, for `simplify` `TRUE` and `NA`, respectively. - -For `stri_extract_first_*` and `stri_extract_last_*`, a character vector is returned. A `NA` element indicates a no-match. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other search_extract: [`about_search`](about_search.md), [`stri_extract_all()`](stri_extract.md), [`stri_match_all()`](stri_match.md) - -Other locale_sensitive: [`%s<%()`](+25s+3C+25.md), [`about_locale`](about_locale.md), [`about_search_boundaries`](about_search_boundaries.md), [`about_search_coll`](about_search_coll.md), [`stri_compare()`](stri_compare.md), [`stri_count_boundaries()`](stri_count_boundaries.md), [`stri_duplicated()`](stri_duplicated.md), [`stri_enc_detect2()`](stri_enc_detect2.md), [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_opts_collator()`](stri_opts_collator.md), [`stri_order()`](stri_order.md), [`stri_rank()`](stri_rank.md), [`stri_sort_key()`](stri_sort_key.md), [`stri_sort()`](stri_sort.md), [`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_trans_tolower()`](stri_trans_casemap.md), [`stri_unique()`](stri_unique.md), 
[`stri_wrap()`](stri_wrap.md) - -Other text_boundaries: [`about_search_boundaries`](about_search_boundaries.md), [`about_search`](about_search.md), [`stri_count_boundaries()`](stri_count_boundaries.md), [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_opts_brkiter()`](stri_opts_brkiter.md), [`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_split_lines()`](stri_split_lines.md), [`stri_trans_tolower()`](stri_trans_casemap.md), [`stri_wrap()`](stri_wrap.md) - -## Examples - - - - -```r -stri_extract_all_words('stringi: THE string processing package 123.48...') -``` - -``` -## [[1]] -## [1] "stringi" "THE" "string" "processing" "package" -## [6] "123.48" -``` diff --git a/.devel/sphinx/rapi/stri_flatten.md b/.devel/sphinx/rapi/stri_flatten.md deleted file mode 100644 index 76abfd30..00000000 --- a/.devel/sphinx/rapi/stri_flatten.md +++ /dev/null @@ -1,89 +0,0 @@ -# stri_flatten: Flatten a String - -## Description - -Joins the elements of a character vector into one string. - -## Usage - -``` r -stri_flatten(str, collapse = "", na_empty = FALSE, omit_empty = FALSE) -``` - -## Arguments - -| | | -|--------------|----------------------------------------------------------------------------------------------------------------------------| -| `str` | a vector of strings to be coerced to character | -| `collapse` | a single string denoting the separator | -| `na_empty` | single logical value; should missing values in `str` be treated as empty strings (`TRUE`) or be omitted whatsoever (`NA`)? | -| `omit_empty` | single logical value; should empty strings in `str` be omitted? | - -## Details - -The `stri_flatten(str, collapse='XXX')` call is equivalent to [`paste(str, collapse='XXX', sep='')`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/paste.html). - -If you wish to use some more fancy (e.g., differing) separators between flattened strings, call [`stri_join(str, separators, collapse='')`](stri_join.md). 
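To make the difference concrete, here is a small sketch (the vectors `x` and `seps` are illustrative, and stringi is assumed to be installed): `stri_flatten()` uses one separator throughout, while `stri_join()` pairs each string with its own separator before collapsing.

```r
library(stringi)

x <- c("a", "b", "c")

# One separator everywhere; equivalent to paste(x, collapse = "-")
same_sep <- stri_flatten(x, collapse = "-")

# A different separator after each element: join element-wise, then collapse
seps <- c("-", "+", "")
mixed_sep <- stri_join(x, seps, collapse = "")

print(same_sep)
print(mixed_sep)
```

Giving the last element an empty separator avoids a trailing separator in the collapsed result.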
- -If `str` is not empty, then a single string is returned. If `collapse` has length \> 1, then only the first string will be used. - -## Value - -Returns a single string, i.e., a character vector of length 1. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other join: [`%s+%()`](+25s+2B+25.md), [`stri_dup()`](stri_dup.md), [`stri_join_list()`](stri_join_list.md), [`stri_join()`](stri_join.md) - -## Examples - - - - -```r -stri_flatten(LETTERS) -``` - -``` -## [1] "ABCDEFGHIJKLMNOPQRSTUVWXYZ" -``` - -```r -stri_flatten(LETTERS, collapse=',') -``` - -``` -## [1] "A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z" -``` - -```r -stri_flatten(stri_dup(letters[1:6], 1:3)) -``` - -``` -## [1] "abbcccdeefff" -``` - -```r -stri_flatten(c(NA, '', 'A', '', 'B', NA, 'C'), collapse=',', na_empty=TRUE, omit_empty=TRUE) -``` - -``` -## [1] "A,B,C" -``` - -```r -stri_flatten(c(NA, '', 'A', '', 'B', NA, 'C'), collapse=',', na_empty=NA) -``` - -``` -## [1] ",A,,B,C" -``` diff --git a/.devel/sphinx/rapi/stri_info.md b/.devel/sphinx/rapi/stri_info.md deleted file mode 100644 index 4503b68f..00000000 --- a/.devel/sphinx/rapi/stri_info.md +++ /dev/null @@ -1,47 +0,0 @@ -# stri_info: - -## Description - -Gives the current default settings used by the ICU library. 
- -## Usage - -``` r -stri_info(short = FALSE) -``` - -## Arguments - -| | | -|---------|-------------------------------------------------------------------------------------------| -| `short` | logical; whether or not the results should be given in a concise form; defaults to `FALSE` | - -## Value - -If `short` is `TRUE`, then a single string providing information on the default character encoding, locale, and Unicode as well as ICU version is returned. - -Otherwise, a list with the following components is returned: - -- `Unicode.version` -- version of Unicode supported by the ICU library; - -- `ICU.version` -- ICU library version used; - -- `Locale` -- contains information on the default locale, as returned by [`stri_locale_info`](stri_locale_info.md); - -- `Charset.internal` -- fixed at `c('UTF-8', 'UTF-16')`; - -- `Charset.native` -- information on the default encoding, as returned by [`stri_enc_info`](stri_enc_info.md); - -- `ICU.system` -- logical; `TRUE` indicates that the system ICU libs are used, otherwise ICU was built together with stringi; - -- `ICU.UTF8` -- logical; `TRUE` if the internal `U_CHARSET_IS_UTF8` flag is defined and set. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) diff --git a/.devel/sphinx/rapi/stri_isempty.md b/.devel/sphinx/rapi/stri_isempty.md deleted file mode 100644 index 98a446e1..00000000 --- a/.devel/sphinx/rapi/stri_isempty.md +++ /dev/null @@ -1,66 +0,0 @@ -# stri_isempty: Determine if a String is of Length Zero - -## Description - -This is the fastest way to find out whether the elements of a character vector are empty strings.
- -## Usage - -``` r -stri_isempty(str) -``` - -## Arguments - -| | | -|-------|--------------------------------------------| -| `str` | character vector or an object coercible to | - -## Details - -Missing values are handled properly. - -## Value - -Returns a logical vector of the same length as `str`. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other length: [`%s$%()`](+25s+24+25.md), [`stri_length()`](stri_length.md), [`stri_numbytes()`](stri_numbytes.md), [`stri_pad_both()`](stri_pad.md), [`stri_sprintf()`](stri_sprintf.md), [`stri_width()`](stri_width.md) - -## Examples - - - - -```r -stri_isempty(letters[1:3]) -``` - -``` -## [1] FALSE FALSE FALSE -``` - -```r -stri_isempty(c(',', '', 'abc', '123', '\u0105\u0104')) -``` - -``` -## [1] FALSE TRUE FALSE FALSE FALSE -``` - -```r -stri_isempty(character(1)) -``` - -``` -## [1] TRUE -``` diff --git a/.devel/sphinx/rapi/stri_join.md b/.devel/sphinx/rapi/stri_join.md deleted file mode 100644 index ba020c6b..00000000 --- a/.devel/sphinx/rapi/stri_join.md +++ /dev/null @@ -1,109 +0,0 @@ -# stri_join: Concatenate Character Vectors - -## Description - -These are the stringi\'s equivalents of the built-in [`paste`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/paste.html) function. `stri_c` and `stri_paste` are aliases for `stri_join`. 
- -## Usage - -``` r -stri_join(..., sep = "", collapse = NULL, ignore_null = FALSE) - -stri_c(..., sep = "", collapse = NULL, ignore_null = FALSE) - -stri_paste(..., sep = "", collapse = NULL, ignore_null = FALSE) -``` - -## Arguments - -| | | -|---------------|-------------------------------------------------------------------------------------------------------------------| -| `...` | character vectors (or objects coercible to character vectors) whose corresponding elements are to be concatenated | -| `sep` | a single string; separates terms | -| `collapse` | a single string or `NULL`; an optional results separator | -| `ignore_null` | a single logical value; if `TRUE`, then empty vectors provided via `...` are silently ignored | - -## Details - -Vectorized over each atomic vector in \'`...`\'. - -Unless `collapse` is `NULL`, the result will be a single string. Otherwise, you get a character vector of length equal to the length of the longest argument. - -If any of the arguments in \'`...`\' is a vector of length 0 (not to be confused with vectors of empty strings) and `ignore_null` is `FALSE`, then you will get a 0-length character vector as a result. - -If `collapse` or `sep` has length greater than 1, then only the first string will be used. - -If any of the input vectors has a missing value, the corresponding output element is set to `NA`. Note that this behavior is different from [`paste`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/paste.html), which treats missing values as ordinary strings like `'NA'`. Moreover, as usual in stringi, the resulting strings are always in UTF-8. - -## Value - -Returns a character vector.
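The missing-value and zero-length rules above differ from those of base R's `paste()`. The sketch below (inputs chosen purely for illustration, stringi assumed to be installed) contrasts the two.

```r
library(stringi)

# A missing value makes the corresponding result missing...
joined_na <- stri_join("a", NA)
# ...whereas paste0() turns it into the string "NA"
pasted_na <- paste0("a", NA)

# A zero-length argument yields a zero-length result...
empty_res <- stri_join("a", character(0))
# ...unless such arguments are skipped with ignore_null = TRUE
kept_res <- stri_join("a", character(0), ignore_null = TRUE)

print(c(joined_na, pasted_na))
print(empty_res)
print(kept_res)
```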
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other join: [`%s+%()`](+25s+2B+25.md), [`stri_dup()`](stri_dup.md), [`stri_flatten()`](stri_flatten.md), [`stri_join_list()`](stri_join_list.md) - -## Examples - - - - -```r -stri_join(1:13, letters) -``` - -``` -## [1] "1a" "2b" "3c" "4d" "5e" "6f" "7g" "8h" "9i" "10j" "11k" "12l" -## [13] "13m" "1n" "2o" "3p" "4q" "5r" "6s" "7t" "8u" "9v" "10w" "11x" -## [25] "12y" "13z" -``` - -```r -stri_join(1:13, letters, sep=',') -``` - -``` -## [1] "1,a" "2,b" "3,c" "4,d" "5,e" "6,f" "7,g" "8,h" "9,i" "10,j" -## [11] "11,k" "12,l" "13,m" "1,n" "2,o" "3,p" "4,q" "5,r" "6,s" "7,t" -## [21] "8,u" "9,v" "10,w" "11,x" "12,y" "13,z" -``` - -```r -stri_join(1:13, letters, collapse='; ') -``` - -``` -## [1] "1a; 2b; 3c; 4d; 5e; 6f; 7g; 8h; 9i; 10j; 11k; 12l; 13m; 1n; 2o; 3p; 4q; 5r; 6s; 7t; 8u; 9v; 10w; 11x; 12y; 13z" -``` - -```r -stri_join(1:13, letters, sep=',', collapse='; ') -``` - -``` -## [1] "1,a; 2,b; 3,c; 4,d; 5,e; 6,f; 7,g; 8,h; 9,i; 10,j; 11,k; 12,l; 13,m; 1,n; 2,o; 3,p; 4,q; 5,r; 6,s; 7,t; 8,u; 9,v; 10,w; 11,x; 12,y; 13,z" -``` - -```r -stri_join(c('abc', '123', 'xyz'),'###', 1:6, sep=',') -``` - -``` -## [1] "abc,###,1" "123,###,2" "xyz,###,3" "abc,###,4" "123,###,5" "xyz,###,6" -``` - -```r -stri_join(c('abc', '123', 'xyz'),'###', 1:6, sep=',', collapse='; ') -``` - -``` -## [1] "abc,###,1; 123,###,2; xyz,###,3; abc,###,4; 123,###,5; xyz,###,6" -``` diff --git a/.devel/sphinx/rapi/stri_join_list.md b/.devel/sphinx/rapi/stri_join_list.md deleted file mode 100644 index 9251aad6..00000000 --- a/.devel/sphinx/rapi/stri_join_list.md +++ /dev/null @@ -1,100 +0,0 @@ -# stri_join_list: Concatenate Strings in a List - -## 
Description - -These functions concatenate all the strings in each character vector in a given list. `stri_c_list` and `stri_paste_list` are aliases for `stri_join_list`. - -## Usage - -``` r -stri_join_list(x, sep = "", collapse = NULL) - -stri_c_list(x, sep = "", collapse = NULL) - -stri_paste_list(x, sep = "", collapse = NULL) -``` - -## Arguments - -| | | -|------------|----------------------------------------------------------------------------| -| `x` | a list consisting of character vectors | -| `sep` | a single string; separates strings in each of the character vectors in `x` | -| `collapse` | a single string or `NULL`; an optional results separator | - -## Details - -Unless `collapse` is `NULL`, the result will be a single string. Otherwise, you get a character vector of length equal to the length of `x`. - -Vectors in `x` of length 0 are silently ignored. - -If `collapse` or `sep` has length greater than 1, then only the first string will be used. - -## Value - -Returns a character vector. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other join: [`%s+%()`](+25s+2B+25.md), [`stri_dup()`](stri_dup.md), [`stri_flatten()`](stri_flatten.md), [`stri_join()`](stri_join.md) - -## Examples - - - - -```r -stri_join_list( - stri_extract_all_words(c('Lorem ipsum dolor sit amet.', - 'Spam spam bacon sausage and spam.')), -sep=', ') -``` - -``` -## [1] "Lorem, ipsum, dolor, sit, amet" -## [2] "Spam, spam, bacon, sausage, and, spam" -``` - -```r -stri_join_list( - stri_extract_all_words(c('Lorem ipsum dolor sit amet.', - 'Spam spam bacon sausage and spam.')), -sep=', ', collapse='. ') -``` - -``` -## [1] "Lorem, ipsum, dolor, sit, amet. 
Spam, spam, bacon, sausage, and, spam" -``` - -```r -stri_join_list( - stri_extract_all_regex( - c('spam spam bacon', '123 456', 'spam 789 sausage'), '\\p{L}+' - ), -sep=',') -``` - -``` -## [1] "spam,spam,bacon" NA "spam,sausage" -``` - -```r -stri_join_list( - stri_extract_all_regex( - c('spam spam bacon', '123 456', 'spam 789 sausage'), '\\p{L}+', - omit_no_match=TRUE - ), -sep=',', collapse='; ') -``` - -``` -## [1] "spam,spam,bacon; spam,sausage" -``` diff --git a/.devel/sphinx/rapi/stri_length.md b/.devel/sphinx/rapi/stri_length.md deleted file mode 100644 index c0e745e7..00000000 --- a/.devel/sphinx/rapi/stri_length.md +++ /dev/null @@ -1,102 +0,0 @@ -# stri_length: Count the Number of Code Points - -## Description - -This function returns the number of code points in each string. - -## Usage - -``` r -stri_length(str) -``` - -## Arguments - -| | | -|-------|--------------------------------------------| -| `str` | character vector or an object coercible to | - -## Details - -Note that the number of code points is not the same as the \'width\' of the string when printed on the console. - -If a given string is in UTF-8 and has not been properly normalized (e.g., by [`stri_trans_nfc`](stri_trans_nf.md)), the returned counts may sometimes be misleading. See [`stri_count_boundaries`](stri_count_boundaries.md) for a method to count *Unicode characters*. Moreover, if an incorrect UTF-8 byte sequence is detected, then a warning is generated and the corresponding output element is set to `NA`, see also [`stri_enc_toutf8`](stri_enc_toutf8.md) for a method to deal with such cases. - -Missing values are handled properly. For \'byte\' encodings we get, as usual, an error. - -## Value - -Returns an integer vector of the same length as `str`. 
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other length: [`%s$%()`](+25s+24+25.md), [`stri_isempty()`](stri_isempty.md), [`stri_numbytes()`](stri_numbytes.md), [`stri_pad_both()`](stri_pad.md), [`stri_sprintf()`](stri_sprintf.md), [`stri_width()`](stri_width.md) - -## Examples - - - - -```r -stri_length(LETTERS) -``` - -``` -## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 -``` - -```r -stri_length(c('abc', '123', '\u0105\u0104')) -``` - -``` -## [1] 3 3 2 -``` - -```r -stri_length('\u0105') # length is one, but... -``` - -``` -## [1] 1 -``` - -```r -stri_numbytes('\u0105') # 2 bytes are used -``` - -``` -## [1] 2 -``` - -```r -stri_numbytes(stri_trans_nfkd('\u0105')) # 3 bytes here but... -``` - -``` -## [1] 3 -``` - -```r -stri_length(stri_trans_nfkd('\u0105')) # ...two code points (!) -``` - -``` -## [1] 2 -``` - -```r -stri_count_boundaries(stri_trans_nfkd('\u0105'), type='character') # ...and one Unicode character -``` - -``` -## [1] 1 -``` diff --git a/.devel/sphinx/rapi/stri_list2matrix.md b/.devel/sphinx/rapi/stri_list2matrix.md deleted file mode 100644 index 74cc31db..00000000 --- a/.devel/sphinx/rapi/stri_list2matrix.md +++ /dev/null @@ -1,134 +0,0 @@ -# stri_list2matrix: Convert a List to a Character Matrix - -## Description - -This function converts a given list of atomic vectors to a character matrix. 
- -## Usage - -``` r -stri_list2matrix( - x, - byrow = FALSE, - fill = NA_character_, - n_min = 0, - by_row = byrow -) -``` - -## Arguments - -| | | -|----------|----------------------------------------------------------------------------------------------------------------| -| `x` | a list of atomic vectors | -| `byrow` | a single logical value; should the resulting matrix be transposed? | -| `fill` | a single string, see Details | -| `n_min` | a single integer value; minimal number of rows (`byrow==FALSE`) or columns (otherwise) in the resulting matrix | -| `by_row` | alias of `byrow` | - -## Details - -This function is similar to the built-in [`simplify2array`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/lapply.html) function. However, it always returns a character matrix, even if each element in `x` is of length 1 or if elements in `x` are not of the same lengths. Moreover, the elements in `x` are always coerced to character vectors. - -If `byrow` is `FALSE`, then a matrix with `length(x)` columns is returned. The number of rows is the length of the longest vector in `x`, but no less than `n_min`. Basically, we have `result[i,j] == x[[j]][i]` if `i <= length(x[[j]])` and `result[i,j] == fill` otherwise, see Examples. - -If `byrow` is `TRUE`, then the resulting matrix is a transposition of the above-described one. - -This function may be useful, e.g., in connection with [`stri_split`](stri_split.md) and [`stri_extract_all`](stri_extract.md). - -## Value - -Returns a character matrix. 
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other utils: [`stri_na2empty()`](stri_na2empty.md), [`stri_remove_empty()`](stri_remove_empty.md), [`stri_replace_na()`](stri_replace_na.md) - -## Examples - - - - -```r -simplify2array(list(c('a', 'b'), c('c', 'd'), c('e', 'f'))) -``` - -``` -## [,1] [,2] [,3] -## [1,] "a" "c" "e" -## [2,] "b" "d" "f" -``` - -```r -stri_list2matrix(list(c('a', 'b'), c('c', 'd'), c('e', 'f'))) -``` - -``` -## [,1] [,2] [,3] -## [1,] "a" "c" "e" -## [2,] "b" "d" "f" -``` - -```r -stri_list2matrix(list(c('a', 'b'), c('c', 'd'), c('e', 'f')), byrow=TRUE) -``` - -``` -## [,1] [,2] -## [1,] "a" "b" -## [2,] "c" "d" -## [3,] "e" "f" -``` - -```r -simplify2array(list('a', c('b', 'c'))) -``` - -``` -## [[1]] -## [1] "a" -## -## [[2]] -## [1] "b" "c" -``` - -```r -stri_list2matrix(list('a', c('b', 'c'))) -``` - -``` -## [,1] [,2] -## [1,] "a" "b" -## [2,] NA "c" -``` - -```r -stri_list2matrix(list('a', c('b', 'c')), fill='') -``` - -``` -## [,1] [,2] -## [1,] "a" "b" -## [2,] "" "c" -``` - -```r -stri_list2matrix(list('a', c('b', 'c')), fill='', n_min=5) -``` - -``` -## [,1] [,2] -## [1,] "a" "b" -## [2,] "" "c" -## [3,] "" "" -## [4,] "" "" -## [5,] "" "" -``` diff --git a/.devel/sphinx/rapi/stri_locale_info.md b/.devel/sphinx/rapi/stri_locale_info.md deleted file mode 100644 index 57e82ce3..00000000 --- a/.devel/sphinx/rapi/stri_locale_info.md +++ /dev/null @@ -1,80 +0,0 @@ -# stri_locale_info: Query Given Locale - -## Description - -Provides some basic information on a given locale identifier. 
- -## Usage - -``` r -stri_locale_info(locale = NULL) -``` - -## Arguments - -| | | -|----------|-------------------------------------------------------------------------------| -| `locale` | `NULL` or `''` for default locale, or a single string with locale identifier. | - -## Details - -With this function you may obtain some basic information on any provided locale identifier, even if it is unsupported by ICU or if you pass a malformed locale identifier (the one that is not, e.g., of the form Language_Country). See [stringi-locale](about_locale.md) for discussion. - -This function does not do anything really complicated. In many cases it is similar to a call to [`as.list(stri_split_fixed(locale, '_', 3L)[[1]])`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/list.html), with `locale` case mapped. It may be used, however, to get insight on how ICU understands a given locale identifier. - -## Value - -Returns a list with the following named character strings: `Language`, `Country`, `Variant`, and `Name`, being their underscore separated combination. 
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other locale_management: [`about_locale`](about_locale.md), [`stri_locale_list()`](stri_locale_list.md), [`stri_locale_set()`](stri_locale_set.md) - -## Examples - - - - -```r -stri_locale_info('pl_PL') -``` - -``` -## $Language -## [1] "pl" -## -## $Country -## [1] "PL" -## -## $Variant -## [1] "" -## -## $Name -## [1] "pl_PL" -``` - -```r -stri_locale_info('Pl_pL') # the same result -``` - -``` -## $Language -## [1] "pl" -## -## $Country -## [1] "PL" -## -## $Variant -## [1] "" -## -## $Name -## [1] "pl_PL" -``` diff --git a/.devel/sphinx/rapi/stri_locale_list.md b/.devel/sphinx/rapi/stri_locale_list.md deleted file mode 100644 index d71bcbcb..00000000 --- a/.devel/sphinx/rapi/stri_locale_list.md +++ /dev/null @@ -1,216 +0,0 @@ -# stri_locale_list: List Available Locales - -## Description - -Creates a character vector with all available locale identifies. - -## Usage - -``` r -stri_locale_list() -``` - -## Details - -Note that some of the services may be unavailable in some locales. Querying for locale-specific services is always performed during the resource request. - -See [stringi-locale](about_locale.md) for more information. - -## Value - -Returns a character vector with locale identifiers that are known to ICU. 
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other locale_management: [`about_locale`](about_locale.md), [`stri_locale_info()`](stri_locale_info.md), [`stri_locale_set()`](stri_locale_set.md) - -## Examples - - - - -```r -stri_locale_list() -``` - -``` -## [1] "af" "af_NA" "af_ZA" "agq" "agq_CM" -## [6] "ak" "ak_GH" "am" "am_ET" "ar" -## [11] "ar_001" "ar_AE" "ar_BH" "ar_DJ" "ar_DZ" -## [16] "ar_EG" "ar_EH" "ar_ER" "ar_IL" "ar_IQ" -## [21] "ar_JO" "ar_KM" "ar_KW" "ar_LB" "ar_LY" -## [26] "ar_MA" "ar_MR" "ar_OM" "ar_PS" "ar_QA" -## [31] "ar_SA" "ar_SD" "ar_SO" "ar_SS" "ar_SY" -## [36] "ar_TD" "ar_TN" "ar_YE" "as" "as_IN" -## [41] "asa" "asa_TZ" "ast" "ast_ES" "az" -## [46] "az_Cyrl" "az_Cyrl_AZ" "az_Latn" "az_Latn_AZ" "bas" -## [51] "bas_CM" "be" "be_BY" "bem" "bem_ZM" -## [56] "bez" "bez_TZ" "bg" "bg_BG" "bgc" -## [61] "bgc_IN" "bho" "bho_IN" "blo" "blo_BJ" -## [66] "bm" "bm_ML" "bn" "bn_BD" "bn_IN" -## [71] "bo" "bo_CN" "bo_IN" "br" "br_FR" -## [76] "brx" "brx_IN" "bs" "bs_Cyrl" "bs_Cyrl_BA" -## [81] "bs_Latn" "bs_Latn_BA" "ca" "ca_AD" "ca_ES" -## [86] "ca_FR" "ca_IT" "ccp" "ccp_BD" "ccp_IN" -## [91] "ce" "ce_RU" "ceb" "ceb_PH" "cgg" -## [96] "cgg_UG" "chr" "chr_US" "ckb" "ckb_IQ" -## [101] "ckb_IR" "cs" "cs_CZ" "csw" "csw_CA" -## [106] "cv" "cv_RU" "cy" "cy_GB" "da" -## [111] "da_DK" "da_GL" "dav" "dav_KE" "de" -## [116] "de_AT" "de_BE" "de_CH" "de_DE" "de_IT" -## [121] "de_LI" "de_LU" "dje" "dje_NE" "doi" -## [126] "doi_IN" "dsb" "dsb_DE" "dua" "dua_CM" -## [131] "dyo" "dyo_SN" "dz" "dz_BT" "ebu" -## [136] "ebu_KE" "ee" "ee_GH" "ee_TG" "el" -## [141] "el_CY" "el_GR" "en" "en_001" "en_150" -## [146] "en_AE" "en_AG" "en_AI" "en_AS" "en_AT" -## [151] "en_AU" 
"en_BB" "en_BE" "en_BI" "en_BM" -## [156] "en_BS" "en_BW" "en_BZ" "en_CA" "en_CC" -## [161] "en_CH" "en_CK" "en_CM" "en_CX" "en_CY" -## [166] "en_DE" "en_DG" "en_DK" "en_DM" "en_ER" -## [171] "en_FI" "en_FJ" "en_FK" "en_FM" "en_GB" -## [176] "en_GD" "en_GG" "en_GH" "en_GI" "en_GM" -## [181] "en_GU" "en_GY" "en_HK" "en_ID" "en_IE" -## [186] "en_IL" "en_IM" "en_IN" "en_IO" "en_JE" -## [191] "en_JM" "en_KE" "en_KI" "en_KN" "en_KY" -## [196] "en_LC" "en_LR" "en_LS" "en_MG" "en_MH" -## [201] "en_MO" "en_MP" "en_MS" "en_MT" "en_MU" -## [206] "en_MV" "en_MW" "en_MY" "en_NA" "en_NF" -## [211] "en_NG" "en_NL" "en_NR" "en_NU" "en_NZ" -## [216] "en_PG" "en_PH" "en_PK" "en_PN" "en_PR" -## [221] "en_PW" "en_RW" "en_SB" "en_SC" "en_SD" -## [226] "en_SE" "en_SG" "en_SH" "en_SI" "en_SL" -## [231] "en_SS" "en_SX" "en_SZ" "en_TC" "en_TK" -## [236] "en_TO" "en_TT" "en_TV" "en_TZ" "en_UG" -## [241] "en_UM" "en_US" "en_US_POSIX" "en_VC" "en_VG" -## [246] "en_VI" "en_VU" "en_WS" "en_ZA" "en_ZM" -## [251] "en_ZW" "eo" "eo_001" "es" "es_419" -## [256] "es_AR" "es_BO" "es_BR" "es_BZ" "es_CL" -## [261] "es_CO" "es_CR" "es_CU" "es_DO" "es_EA" -## [266] "es_EC" "es_ES" "es_GQ" "es_GT" "es_HN" -## [271] "es_IC" "es_MX" "es_NI" "es_PA" "es_PE" -## [276] "es_PH" "es_PR" "es_PY" "es_SV" "es_US" -## [281] "es_UY" "es_VE" "et" "et_EE" "eu" -## [286] "eu_ES" "ewo" "ewo_CM" "fa" "fa_AF" -## [291] "fa_IR" "ff" "ff_Adlm" "ff_Adlm_BF" "ff_Adlm_CM" -## [296] "ff_Adlm_GH" "ff_Adlm_GM" "ff_Adlm_GN" "ff_Adlm_GW" "ff_Adlm_LR" -## [301] "ff_Adlm_MR" "ff_Adlm_NE" "ff_Adlm_NG" "ff_Adlm_SL" "ff_Adlm_SN" -## [306] "ff_Latn" "ff_Latn_BF" "ff_Latn_CM" "ff_Latn_GH" "ff_Latn_GM" -## [311] "ff_Latn_GN" "ff_Latn_GW" "ff_Latn_LR" "ff_Latn_MR" "ff_Latn_NE" -## [316] "ff_Latn_NG" "ff_Latn_SL" "ff_Latn_SN" "fi" "fi_FI" -## [321] "fil" "fil_PH" "fo" "fo_DK" "fo_FO" -## [326] "fr" "fr_BE" "fr_BF" "fr_BI" "fr_BJ" -## [331] "fr_BL" "fr_CA" "fr_CD" "fr_CF" "fr_CG" -## [336] "fr_CH" "fr_CI" "fr_CM" "fr_DJ" "fr_DZ" -## [341] 
"fr_FR" "fr_GA" "fr_GF" "fr_GN" "fr_GP" -## [346] "fr_GQ" "fr_HT" "fr_KM" "fr_LU" "fr_MA" -## [351] "fr_MC" "fr_MF" "fr_MG" "fr_ML" "fr_MQ" -## [356] "fr_MR" "fr_MU" "fr_NC" "fr_NE" "fr_PF" -## [361] "fr_PM" "fr_RE" "fr_RW" "fr_SC" "fr_SN" -## [366] "fr_SY" "fr_TD" "fr_TG" "fr_TN" "fr_VU" -## [371] "fr_WF" "fr_YT" "fur" "fur_IT" "fy" -## [376] "fy_NL" "ga" "ga_GB" "ga_IE" "gd" -## [381] "gd_GB" "gl" "gl_ES" "gsw" "gsw_CH" -## [386] "gsw_FR" "gsw_LI" "gu" "gu_IN" "guz" -## [391] "guz_KE" "gv" "gv_IM" "ha" "ha_GH" -## [396] "ha_NE" "ha_NG" "haw" "haw_US" "he" -## [401] "he_IL" "hi" "hi_IN" "hi_Latn" "hi_Latn_IN" -## [406] "hr" "hr_BA" "hr_HR" "hsb" "hsb_DE" -## [411] "hu" "hu_HU" "hy" "hy_AM" "ia" -## [416] "ia_001" "id" "id_ID" "ie" "ie_EE" -## [421] "ig" "ig_NG" "ii" "ii_CN" "is" -## [426] "is_IS" "it" "it_CH" "it_IT" "it_SM" -## [431] "it_VA" "ja" "ja_JP" "jgo" "jgo_CM" -## [436] "jmc" "jmc_TZ" "jv" "jv_ID" "ka" -## [441] "ka_GE" "kab" "kab_DZ" "kam" "kam_KE" -## [446] "kde" "kde_TZ" "kea" "kea_CV" "kgp" -## [451] "kgp_BR" "khq" "khq_ML" "ki" "ki_KE" -## [456] "kk" "kk_KZ" "kkj" "kkj_CM" "kl" -## [461] "kl_GL" "kln" "kln_KE" "km" "km_KH" -## [466] "kn" "kn_IN" "ko" "ko_CN" "ko_KP" -## [471] "ko_KR" "kok" "kok_IN" "ks" "ks_Arab" -## [476] "ks_Arab_IN" "ks_Deva" "ks_Deva_IN" "ksb" "ksb_TZ" -## [481] "ksf" "ksf_CM" "ksh" "ksh_DE" "ku" -## [486] "ku_TR" "kw" "kw_GB" "kxv" "kxv_Deva" -## [491] "kxv_Deva_IN" "kxv_Latn" "kxv_Latn_IN" "kxv_Orya" "kxv_Orya_IN" -## [496] "kxv_Telu" "kxv_Telu_IN" "ky" "ky_KG" "lag" -## [501] "lag_TZ" "lb" "lb_LU" "lg" "lg_UG" -## [506] "lij" "lij_IT" "lkt" "lkt_US" "lmo" -## [511] "lmo_IT" "ln" "ln_AO" "ln_CD" "ln_CF" -## [516] "ln_CG" "lo" "lo_LA" "lrc" "lrc_IQ" -## [521] "lrc_IR" "lt" "lt_LT" "lu" "lu_CD" -## [526] "luo" "luo_KE" "luy" "luy_KE" "lv" -## [531] "lv_LV" "mai" "mai_IN" "mas" "mas_KE" -## [536] "mas_TZ" "mer" "mer_KE" "mfe" "mfe_MU" -## [541] "mg" "mg_MG" "mgh" "mgh_MZ" "mgo" -## [546] "mgo_CM" "mi" "mi_NZ" "mk" "mk_MK" -## 
[551] "ml" "ml_IN" "mn" "mn_MN" "mni" -## [556] "mni_Beng" "mni_Beng_IN" "mr" "mr_IN" "ms" -## [561] "ms_BN" "ms_ID" "ms_MY" "ms_SG" "mt" -## [566] "mt_MT" "mua" "mua_CM" "my" "my_MM" -## [571] "mzn" "mzn_IR" "naq" "naq_NA" "nb" -## [576] "nb_NO" "nb_SJ" "nd" "nd_ZW" "nds" -## [581] "nds_DE" "nds_NL" "ne" "ne_IN" "ne_NP" -## [586] "nl" "nl_AW" "nl_BE" "nl_BQ" "nl_CW" -## [591] "nl_NL" "nl_SR" "nl_SX" "nmg" "nmg_CM" -## [596] "nn" "nn_NO" "nnh" "nnh_CM" "no" -## [601] "nqo" "nqo_GN" "nus" "nus_SS" "nyn" -## [606] "nyn_UG" "oc" "oc_ES" "oc_FR" "om" -## [611] "om_ET" "om_KE" "or" "or_IN" "os" -## [616] "os_GE" "os_RU" "pa" "pa_Arab" "pa_Arab_PK" -## [621] "pa_Guru" "pa_Guru_IN" "pcm" "pcm_NG" "pl" -## [626] "pl_PL" "prg" "prg_PL" "ps" "ps_AF" -## [631] "ps_PK" "pt" "pt_AO" "pt_BR" "pt_CH" -## [636] "pt_CV" "pt_GQ" "pt_GW" "pt_LU" "pt_MO" -## [641] "pt_MZ" "pt_PT" "pt_ST" "pt_TL" "qu" -## [646] "qu_BO" "qu_EC" "qu_PE" "raj" "raj_IN" -## [651] "rm" "rm_CH" "rn" "rn_BI" "ro" -## [656] "ro_MD" "ro_RO" "rof" "rof_TZ" "ru" -## [661] "ru_BY" "ru_KG" "ru_KZ" "ru_MD" "ru_RU" -## [666] "ru_UA" "rw" "rw_RW" "rwk" "rwk_TZ" -## [671] "sa" "sa_IN" "sah" "sah_RU" "saq" -## [676] "saq_KE" "sat" "sat_Olck" "sat_Olck_IN" "sbp" -## [681] "sbp_TZ" "sc" "sc_IT" "sd" "sd_Arab" -## [686] "sd_Arab_PK" "sd_Deva" "sd_Deva_IN" "se" "se_FI" -## [691] "se_NO" "se_SE" "seh" "seh_MZ" "ses" -## [696] "ses_ML" "sg" "sg_CF" "shi" "shi_Latn" -## [701] "shi_Latn_MA" "shi_Tfng" "shi_Tfng_MA" "si" "si_LK" -## [706] "sk" "sk_SK" "sl" "sl_SI" "smn" -## [711] "smn_FI" "sn" "sn_ZW" "so" "so_DJ" -## [716] "so_ET" "so_KE" "so_SO" "sq" "sq_AL" -## [721] "sq_MK" "sq_XK" "sr" "sr_Cyrl" "sr_Cyrl_BA" -## [726] "sr_Cyrl_ME" "sr_Cyrl_RS" "sr_Cyrl_XK" "sr_Latn" "sr_Latn_BA" -## [731] "sr_Latn_ME" "sr_Latn_RS" "sr_Latn_XK" "su" "su_Latn" -## [736] "su_Latn_ID" "sv" "sv_AX" "sv_FI" "sv_SE" -## [741] "sw" "sw_CD" "sw_KE" "sw_TZ" "sw_UG" -## [746] "syr" "syr_IQ" "syr_SY" "szl" "szl_PL" -## [751] "ta" "ta_IN" "ta_LK" 
"ta_MY" "ta_SG" -## [756] "te" "te_IN" "teo" "teo_KE" "teo_UG" -## [761] "tg" "tg_TJ" "th" "th_TH" "ti" -## [766] "ti_ER" "ti_ET" "tk" "tk_TM" "to" -## [771] "to_TO" "tok" "tok_001" "tr" "tr_CY" -## [776] "tr_TR" "tt" "tt_RU" "twq" "twq_NE" -## [781] "tzm" "tzm_MA" "ug" "ug_CN" "uk" -## [786] "uk_UA" "ur" "ur_IN" "ur_PK" "uz" -## [791] "uz_Arab" "uz_Arab_AF" "uz_Cyrl" "uz_Cyrl_UZ" "uz_Latn" -## [796] "uz_Latn_UZ" "vai" "vai_Latn" "vai_Latn_LR" "vai_Vaii" -## [801] "vai_Vaii_LR" "vec" "vec_IT" "vi" "vi_VN" -## [806] "vmw" "vmw_MZ" "vun" "vun_TZ" "wae" -## [811] "wae_CH" "wo" "wo_SN" "xh" "xh_ZA" -## [816] "xnr" "xnr_IN" "xog" "xog_UG" "yav" -## [821] "yav_CM" "yi" "yi_UA" "yo" "yo_BJ" -## [826] "yo_NG" "yrl" "yrl_BR" "yrl_CO" "yrl_VE" -## [831] "yue" "yue_Hans" "yue_Hans_CN" "yue_Hant" "yue_Hant_HK" -## [836] "za" "za_CN" "zgh" "zgh_MA" "zh" -## [841] "zh_Hans" "zh_Hans_CN" "zh_Hans_HK" "zh_Hans_MO" "zh_Hans_SG" -## [846] "zh_Hant" "zh_Hant_HK" "zh_Hant_MO" "zh_Hant_TW" "zu" -## [851] "zu_ZA" -``` diff --git a/.devel/sphinx/rapi/stri_locale_set.md b/.devel/sphinx/rapi/stri_locale_set.md deleted file mode 100644 index 976b25f0..00000000 --- a/.devel/sphinx/rapi/stri_locale_set.md +++ /dev/null @@ -1,73 +0,0 @@ -# stri_locale_set: - -## Description - -`stri_locale_set` changes the default locale for all the functions in the stringi package, i.e., establishes the meaning of the "`NULL` locale" argument of locale-sensitive functions. `stri_locale_get` gives the current default locale. - -## Usage - -``` r -stri_locale_set(locale) - -stri_locale_get() -``` - -## Arguments - -| | | -|----------|----------------------------------------------------------------------------------------------------------------------------------------------------------| -| `locale` | single string of the form `Language`, `Language_Country`, or `Language_Country_Variant`, e.g., `'en_US'`, see [`stri_locale_list`](stri_locale_list.md). 
| - -## Details - -See [stringi-locale](about_locale.md) for more information on the effect of changing the default locale. - -`stri_locale_get` is the same as [`stri_locale_info(NULL)$Name`](stri_locale_info.md). - -## Value - -`stri_locale_set` returns a string with previously used locale, invisibly. - -`stri_locale_get` returns a string of the form `Language`, `Language_Country`, or `Language_Country_Variant`, e.g., `'en_US'`. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other locale_management: [`about_locale`](about_locale.md), [`stri_locale_info()`](stri_locale_info.md), [`stri_locale_list()`](stri_locale_list.md) - -## Examples - - - - -```r -## Not run: -oldloc <- stri_locale_set('pt_BR') -``` - -``` -## You are now working with stringi_1.7.9003 (pt_BR.UTF-8; ICU4C 74.1 [bundle]; Unicode 15.1) -``` - -```r -# ... some locale-dependent operations -# ... note that you may always modify a locale per-call -# ... changing the default locale is convenient if you perform -# ... many operations -stri_locale_set(oldloc) # restore the previous default locale -``` - -``` -## You are now working with stringi_1.7.9003 (en_AU.UTF-8; ICU4C 74.1 [bundle]; Unicode 15.1) -``` - -```r -## End(Not run) -``` diff --git a/.devel/sphinx/rapi/stri_locate.md b/.devel/sphinx/rapi/stri_locate.md deleted file mode 100644 index 39b4f0d1..00000000 --- a/.devel/sphinx/rapi/stri_locate.md +++ /dev/null @@ -1,395 +0,0 @@ -# stri_locate: Locate Pattern Occurrences - -## Description - -These functions find the indexes (positions) where there is a match to some pattern. The functions `stri_locate_all_*` locate all the matches. 
`stri_locate_first_*` and `stri_locate_last_*` give the first and the last matches, respectively. - -## Usage - -``` r -stri_locate_all(str, ..., regex, fixed, coll, charclass) - -stri_locate_first(str, ..., regex, fixed, coll, charclass) - -stri_locate_last(str, ..., regex, fixed, coll, charclass) - -stri_locate( - str, - ..., - regex, - fixed, - coll, - charclass, - mode = c("first", "all", "last") -) - -stri_locate_all_charclass( - str, - pattern, - merge = TRUE, - omit_no_match = FALSE, - get_length = FALSE -) - -stri_locate_first_charclass(str, pattern, get_length = FALSE) - -stri_locate_last_charclass(str, pattern, get_length = FALSE) - -stri_locate_all_coll( - str, - pattern, - omit_no_match = FALSE, - get_length = FALSE, - ..., - opts_collator = NULL -) - -stri_locate_first_coll( - str, - pattern, - get_length = FALSE, - ..., - opts_collator = NULL -) - -stri_locate_last_coll( - str, - pattern, - get_length = FALSE, - ..., - opts_collator = NULL -) - -stri_locate_all_regex( - str, - pattern, - omit_no_match = FALSE, - capture_groups = FALSE, - get_length = FALSE, - ..., - opts_regex = NULL -) - -stri_locate_first_regex( - str, - pattern, - capture_groups = FALSE, - get_length = FALSE, - ..., - opts_regex = NULL -) - -stri_locate_last_regex( - str, - pattern, - capture_groups = FALSE, - get_length = FALSE, - ..., - opts_regex = NULL -) - -stri_locate_all_fixed( - str, - pattern, - omit_no_match = FALSE, - get_length = FALSE, - ..., - opts_fixed = NULL -) - -stri_locate_first_fixed( - str, - pattern, - get_length = FALSE, - ..., - opts_fixed = NULL -) - -stri_locate_last_fixed( - str, - pattern, - get_length = FALSE, - ..., - opts_fixed = NULL -) -``` - -## Arguments - -| | | 
-|--------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `str` | character vector; strings to search in | -| `...` | supplementary arguments passed to the underlying functions, including additional settings for `opts_collator`, `opts_regex`, `opts_fixed`, and so on | -| `mode` | single string; one of: `'first'` (the default), `'all'`, `'last'` | -| `pattern`, `regex`, `fixed`, `coll`, `charclass` | character vector; search patterns; for more details refer to [stringi-search](about_search.md) | -| `merge` | single logical value; indicates whether consecutive sequences of indexes in the resulting matrix should be merged; `stri_locate_all_charclass` only | -| `omit_no_match` | single logical value; if `TRUE`, a no-match will be indicated by a matrix with 0 rows `stri_locate_all_*` only | -| `get_length` | single logical value; if `FALSE` (default), generate *from-to* matrices; otherwise, output *from-length* ones | -| `opts_collator`, `opts_fixed`, `opts_regex` | named list used to tune up the selected search engine\'s settings; see [`stri_opts_collator`](stri_opts_collator.md), [`stri_opts_fixed`](stri_opts_fixed.md), and [`stri_opts_regex`](stri_opts_regex.md), respectively; `NULL` for the defaults | -| `capture_groups` | single logical value; whether positions of matches to parenthesized subexpressions should be returned too (as `capture_groups` attribute); `stri_locate_*_regex` only | - -## Details - -Vectorized over `str` and `pattern` (with recycling of the elements in the shorter vector if necessary). This allows to, for instance, search for one pattern in each string, search for each pattern in one string, and search for the i-th pattern within the i-th string. 
- -The matches may be extracted by calling [`stri_sub`](stri_sub.md) or [`stri_sub_all`](stri_sub_all.md). Alternatively, you may call [`stri_extract`](stri_extract.md) directly. - -`stri_locate`, `stri_locate_all`, `stri_locate_first`, and `stri_locate_last` are convenience functions. They just call `stri_locate_*_*`, depending on the arguments used. - -## Value - -For `stri_locate_all_*`, a list of integer matrices is returned. Each list element represents the results of a separate search scenario. The first column gives the start positions of the matches, and the second column gives the end positions. Moreover, two `NA`s in a row denote `NA` arguments or a no-match (the latter only if `omit_no_match` is `FALSE`). - -`stri_locate_first_*` and `stri_locate_last_*` return an integer matrix with two columns, giving the start and end positions of the first or the last matches, respectively, and two `NA`s if and only if they are not found. - -For `stri_locate_*_regex`, if the match is of zero length, `end` will be one character less than `start`. Note that `stri_locate_last_regex` searches from start to end, but skips overlapping matches, see the example below. - -Setting `get_length=TRUE` results in the 2nd column representing the length of the match instead of the end position. In this case, negative length denotes a no-match. - -If `capture_groups=TRUE`, then the outputs are equipped with the `capture_groups` attribute, which is a list of matrices giving the start-end positions of matches to parenthesized subexpressions. Similarly to `stri_match_regex`, capture group names are extracted unless looking for first/last occurrences of many different patterns. 
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other search_locate: [`about_search`](about_search.md), [`stri_locate_all_boundaries()`](stri_locate_boundaries.md) - -Other indexing: [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_sub_all()`](stri_sub_all.md), [`stri_sub()`](stri_sub.md) - -## Examples - - - - -```r -stri_locate_all('stringi', fixed='i') -``` - -``` -## [[1]] -## start end -## [1,] 4 4 -## [2,] 7 7 -``` - -```r -stri_locate_first_coll('hladn\u00FD', 'HLADNY', strength=1, locale='sk_SK') -``` - -``` -## start end -## [1,] 1 6 -``` - -```r -stri_locate_all_regex( - c('breakfast=eggs;lunch=pizza', 'breakfast=spam', 'no food here'), - '(?\\w+)=(?\\w+)', - capture_groups=TRUE -) # named capture groups -``` - -``` -## [[1]] -## start end -## [1,] 1 14 -## [2,] 16 26 -## attr(,"capture_groups") -## attr(,"capture_groups")$when -## start end -## [1,] 1 9 -## [2,] 16 20 -## -## attr(,"capture_groups")$what -## start end -## [1,] 11 14 -## [2,] 22 26 -## -## -## [[2]] -## start end -## [1,] 1 14 -## attr(,"capture_groups") -## attr(,"capture_groups")$when -## start end -## [1,] 1 9 -## -## attr(,"capture_groups")$what -## start end -## [1,] 11 14 -## -## -## [[3]] -## start end -## [1,] NA NA -## attr(,"capture_groups") -## attr(,"capture_groups")$when -## start end -## [1,] NA NA -## -## attr(,"capture_groups")$what -## start end -## [1,] NA NA -``` - -```r -stri_locate_all_fixed("abababa", "ABA", case_insensitive=TRUE, overlap=TRUE) -``` - -``` -## [[1]] -## start end -## [1,] 1 3 -## [2,] 3 5 -## [3,] 5 7 -``` - -```r -stri_locate_first_fixed("ababa", "aba") -``` - -``` -## start end -## [1,] 1 3 -``` - -```r 
-stri_locate_last_fixed("ababa", "aba") # starts from end -``` - -``` -## start end -## [1,] 3 5 -``` - -```r -stri_locate_last_regex("ababa", "aba") # no overlaps, from left to right -``` - -``` -## start end -## [1,] 1 3 -``` - -```r -x <- c("yes yes", "no", NA) -stri_locate_all_fixed(x, "yes") -``` - -``` -## [[1]] -## start end -## [1,] 1 3 -## [2,] 5 7 -## -## [[2]] -## start end -## [1,] NA NA -## -## [[3]] -## start end -## [1,] NA NA -``` - -```r -stri_locate_all_fixed(x, "yes", omit_no_match=TRUE) -``` - -``` -## [[1]] -## start end -## [1,] 1 3 -## [2,] 5 7 -## -## [[2]] -## start end -## -## [[3]] -## start end -## [1,] NA NA -``` - -```r -stri_locate_all_fixed(x, "yes", get_length=TRUE) -``` - -``` -## [[1]] -## start length -## [1,] 1 3 -## [2,] 5 3 -## -## [[2]] -## start length -## [1,] -1 -1 -## -## [[3]] -## start length -## [1,] NA NA -``` - -```r -stri_locate_all_fixed(x, "yes", get_length=TRUE, omit_no_match=TRUE) -``` - -``` -## [[1]] -## start length -## [1,] 1 3 -## [2,] 5 3 -## -## [[2]] -## start length -## -## [[3]] -## start length -## [1,] NA NA -``` - -```r -stri_locate_first_fixed(x, "yes") -``` - -``` -## start end -## [1,] 1 3 -## [2,] NA NA -## [3,] NA NA -``` - -```r -stri_locate_first_fixed(x, "yes", get_length=TRUE) -``` - -``` -## start length -## [1,] 1 3 -## [2,] -1 -1 -## [3,] NA NA -``` - -```r -# Use regex positive-lookahead to locate overlapping pattern matches: -stri_locate_all_regex('ACAGAGACTTTAGATAGAGAAGA', '(?=AGA)') -``` - -``` -## [[1]] -## start end -## [1,] 3 2 -## [2,] 5 4 -## [3,] 12 11 -## [4,] 16 15 -## [5,] 18 17 -## [6,] 21 20 -``` - -```r -# note that start > end here (match of length zero) -``` diff --git a/.devel/sphinx/rapi/stri_locate_boundaries.md b/.devel/sphinx/rapi/stri_locate_boundaries.md deleted file mode 100644 index 4db6ee11..00000000 --- a/.devel/sphinx/rapi/stri_locate_boundaries.md +++ /dev/null @@ -1,116 +0,0 @@ -# stri_locate_boundaries: Locate Text Boundaries - -## Description - -These 
functions locate text boundaries (like character, word, line, or sentence boundaries). Use `stri_locate_all_*` to locate all the matches. `stri_locate_first_*` and `stri_locate_last_*` give the first or the last matches, respectively. - -## Usage - -``` r -stri_locate_all_boundaries( - str, - omit_no_match = FALSE, - get_length = FALSE, - ..., - opts_brkiter = NULL -) - -stri_locate_last_boundaries(str, get_length = FALSE, ..., opts_brkiter = NULL) - -stri_locate_first_boundaries(str, get_length = FALSE, ..., opts_brkiter = NULL) - -stri_locate_all_words( - str, - omit_no_match = FALSE, - locale = NULL, - get_length = FALSE -) - -stri_locate_last_words(str, locale = NULL, get_length = FALSE) - -stri_locate_first_words(str, locale = NULL, get_length = FALSE) -``` - -## Arguments - -| | | -|-----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `str` | character vector or an object coercible to | -| `omit_no_match` | single logical value; if `TRUE`, a no-match will be indicated by a matrix with 0 rows `stri_locate_all_*` only | -| `get_length` | single logical value; if `FALSE` (default), generate *from-to* matrices; otherwise, output *from-length* ones | -| `...` | additional settings for `opts_brkiter` | -| `opts_brkiter` | named list with ICU BreakIterator\'s settings, see [`stri_opts_brkiter`](stri_opts_brkiter.md); `NULL` for default break iterator, i.e., `line_break` | -| `locale` | `NULL` or `''` for text boundary analysis following the conventions of the default locale, or a single string with locale identifier, see [stringi-locale](about_locale.md) | - -## Details - -Vectorized over `str`. - -For more information on text boundary analysis performed by ICU\'s `BreakIterator`, see [stringi-search-boundaries](about_search_boundaries.md). 
- -For `stri_locate_*_words`, just like in [`stri_extract_all_words`](stri_extract_boundaries.md) and [`stri_count_words`](stri_count_boundaries.md), ICU\'s word `BreakIterator` iterator is used to locate the word boundaries, and all non-word characters (`UBRK_WORD_NONE` rule status) are ignored. This function is equivalent to a call to `stri_locate_*_boundaries(str, type='word', skip_word_none=TRUE, locale=locale)` - -## Value - -`stri_locate_all_*` yields a list of `length(str)` integer matrices. `stri_locate_first_*` and `stri_locate_last_*` generate return an integer matrix. See [`stri_locate`](stri_locate.md) for more details. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other search_locate: [`about_search`](about_search.md), [`stri_locate_all()`](stri_locate.md) - -Other indexing: [`stri_locate_all()`](stri_locate.md), [`stri_sub_all()`](stri_sub_all.md), [`stri_sub()`](stri_sub.md) - -Other locale_sensitive: [`%s<%()`](+25s+3C+25.md), [`about_locale`](about_locale.md), [`about_search_boundaries`](about_search_boundaries.md), [`about_search_coll`](about_search_coll.md), [`stri_compare()`](stri_compare.md), [`stri_count_boundaries()`](stri_count_boundaries.md), [`stri_duplicated()`](stri_duplicated.md), [`stri_enc_detect2()`](stri_enc_detect2.md), [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_opts_collator()`](stri_opts_collator.md), [`stri_order()`](stri_order.md), [`stri_rank()`](stri_rank.md), [`stri_sort_key()`](stri_sort_key.md), [`stri_sort()`](stri_sort.md), [`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_trans_tolower()`](stri_trans_casemap.md), [`stri_unique()`](stri_unique.md), [`stri_wrap()`](stri_wrap.md) - 
-Other text_boundaries: [`about_search_boundaries`](about_search_boundaries.md), [`about_search`](about_search.md), [`stri_count_boundaries()`](stri_count_boundaries.md), [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_opts_brkiter()`](stri_opts_brkiter.md), [`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_split_lines()`](stri_split_lines.md), [`stri_trans_tolower()`](stri_trans_casemap.md), [`stri_wrap()`](stri_wrap.md) - -## Examples - - - - -```r -test <- 'The\u00a0above-mentioned features are very useful. Spam, spam, eggs, bacon, and spam.' -stri_locate_all_words(test) -``` - -``` -## [[1]] -## start end -## [1,] 1 3 -## [2,] 5 9 -## [3,] 11 19 -## [4,] 24 31 -## [5,] 33 35 -## [6,] 37 40 -## [7,] 42 47 -## [8,] 50 53 -## [9,] 56 59 -## [10,] 62 65 -## [11,] 68 72 -## [12,] 75 77 -## [13,] 79 82 -``` - -```r -stri_locate_all_boundaries( - 'Mr. Jones and Mrs. Brown are very happy. So am I, Prof. Smith.', - type='sentence', - locale='en_US@ss=standard' # ICU >= 56 only -) -``` - -``` -## [[1]] -## start end -## [1,] 1 41 -## [2,] 42 62 -``` diff --git a/.devel/sphinx/rapi/stri_match.md b/.devel/sphinx/rapi/stri_match.md deleted file mode 100644 index 6dc67b2c..00000000 --- a/.devel/sphinx/rapi/stri_match.md +++ /dev/null @@ -1,232 +0,0 @@ -# stri_match: Extract Regex Pattern Matches, Together with Capture Groups - -## Description - -These functions extract substrings in `str` that match a given regex `pattern`. Additionally, they extract matches to every *capture group*, i.e., to all the sub-patterns given in round parentheses. 
- -## Usage - -``` r -stri_match_all(str, ..., regex) - -stri_match_first(str, ..., regex) - -stri_match_last(str, ..., regex) - -stri_match(str, ..., regex, mode = c("first", "all", "last")) - -stri_match_all_regex( - str, - pattern, - omit_no_match = FALSE, - cg_missing = NA_character_, - ..., - opts_regex = NULL -) - -stri_match_first_regex( - str, - pattern, - cg_missing = NA_character_, - ..., - opts_regex = NULL -) - -stri_match_last_regex( - str, - pattern, - cg_missing = NA_character_, - ..., - opts_regex = NULL -) -``` - -## Arguments - -| | | -|--------------------|-----------------------------------------------------------------------------------------------------------------------------------------| -| `str` | character vector; strings to search in | -| `...` | supplementary arguments passed to the underlying functions, including additional settings for `opts_regex` | -| `mode` | single string; one of: `'first'` (the default), `'all'`, `'last'` | -| `pattern`, `regex` | character vector; search patterns; for more details refer to [stringi-search](about_search.md) | -| `omit_no_match` | single logical value; if `FALSE`, then a row with missing values will indicate that there was no match; `stri_match_all_*` only | -| `cg_missing` | single string to be used if a capture group match is unavailable | -| `opts_regex` | a named list with ICU Regex settings, see [`stri_opts_regex`](stri_opts_regex.md); `NULL` for default settings | - -## Details - -Vectorized over `str` and `pattern` (with recycling of the elements in the shorter vector if necessary). This allows to, for instance, search for one pattern in each given string, search for each pattern in one given string, and search for the i-th pattern within the i-th string. - -If no pattern match is detected and `omit_no_match=FALSE`, then `NA`s are included in the resulting matrix (matrices), see Examples. - -`stri_match`, `stri_match_all`, `stri_match_first`, and `stri_match_last` are convenience functions. 
They merely call `stri_match_*_regex` and are provided for consistency with other string searching functions\' wrappers, see, among others, [`stri_extract`](stri_extract.md). - -## Value - -For `stri_match_all*`, a list of character matrices is returned. Each list element represents the results of a different search scenario. - -For `stri_match_first*` and `stri_match_last*` a character matrix is returned. Each row corresponds to a different search result. - -The first matrix column gives the whole match. The second one corresponds to the first capture group, the third -- the second capture group, and so on. - -If regular expressions feature a named capture group, the matrix columns will be named accordingly. However, for `stri_match_first*` and `stri_match_last*` this will only be the case if there is a single pattern. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other search_extract: [`about_search`](about_search.md), [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_extract_all()`](stri_extract.md) - -## Examples - - - - -```r -stri_match_all_regex('breakfast=eggs, lunch=pizza, dessert=icecream', - '(\\w+)=(\\w+)') -``` - -``` -## [[1]] -## [,1] [,2] [,3] -## [1,] "breakfast=eggs" "breakfast" "eggs" -## [2,] "lunch=pizza" "lunch" "pizza" -## [3,] "dessert=icecream" "dessert" "icecream" -``` - -```r -stri_match_all_regex(c('breakfast=eggs', 'lunch=pizza', 'no food here'), - '(\\w+)=(\\w+)') -``` - -``` -## [[1]] -## [,1] [,2] [,3] -## [1,] "breakfast=eggs" "breakfast" "eggs" -## -## [[2]] -## [,1] [,2] [,3] -## [1,] "lunch=pizza" "lunch" "pizza" -## -## [[3]] -## [,1] [,2] [,3] -## [1,] NA NA NA -``` - -```r 
-stri_match_all_regex(c('breakfast=eggs;lunch=pizza', - 'breakfast=bacon;lunch=spaghetti', 'no food here'), - '(\\w+)=(\\w+)') -``` - -``` -## [[1]] -## [,1] [,2] [,3] -## [1,] "breakfast=eggs" "breakfast" "eggs" -## [2,] "lunch=pizza" "lunch" "pizza" -## -## [[2]] -## [,1] [,2] [,3] -## [1,] "breakfast=bacon" "breakfast" "bacon" -## [2,] "lunch=spaghetti" "lunch" "spaghetti" -## -## [[3]] -## [,1] [,2] [,3] -## [1,] NA NA NA -``` - -```r -stri_match_all_regex(c('breakfast=eggs;lunch=pizza', - 'breakfast=bacon;lunch=spaghetti', 'no food here'), - '(?\\w+)=(?\\w+)') # named capture groups -``` - -``` -## [[1]] -## when what -## [1,] "breakfast=eggs" "breakfast" "eggs" -## [2,] "lunch=pizza" "lunch" "pizza" -## -## [[2]] -## when what -## [1,] "breakfast=bacon" "breakfast" "bacon" -## [2,] "lunch=spaghetti" "lunch" "spaghetti" -## -## [[3]] -## when what -## [1,] NA NA NA -``` - -```r -stri_match_first_regex(c('breakfast=eggs;lunch=pizza', - 'breakfast=bacon;lunch=spaghetti', 'no food here'), - '(\\w+)=(\\w+)') -``` - -``` -## [,1] [,2] [,3] -## [1,] "breakfast=eggs" "breakfast" "eggs" -## [2,] "breakfast=bacon" "breakfast" "bacon" -## [3,] NA NA NA -``` - -```r -stri_match_last_regex(c('breakfast=eggs;lunch=pizza', - 'breakfast=bacon;lunch=spaghetti', 'no food here'), - '(\\w+)=(\\w+)') -``` - -``` -## [,1] [,2] [,3] -## [1,] "lunch=pizza" "lunch" "pizza" -## [2,] "lunch=spaghetti" "lunch" "spaghetti" -## [3,] NA NA NA -``` - -```r -stri_match_first_regex(c('abcd', ':abcd', ':abcd:'), '^(:)?([^:]*)(:)?$') -``` - -``` -## [,1] [,2] [,3] [,4] -## [1,] "abcd" NA "abcd" NA -## [2,] ":abcd" ":" "abcd" NA -## [3,] ":abcd:" ":" "abcd" ":" -``` - -```r -stri_match_first_regex(c('abcd', ':abcd', ':abcd:'), '^(:)?([^:]*)(:)?$', cg_missing='') -``` - -``` -## [,1] [,2] [,3] [,4] -## [1,] "abcd" "" "abcd" "" -## [2,] ":abcd" ":" "abcd" "" -## [3,] ":abcd:" ":" "abcd" ":" -``` - -```r -# Match all the pattern of the form XYX, including overlapping matches: 
-stri_match_all_regex('ACAGAGACTTTAGATAGAGAAGA', '(?=(([ACGT])[ACGT]\\2))')[[1]][,2] -``` - -``` -## [1] "ACA" "AGA" "GAG" "AGA" "TTT" "AGA" "ATA" "AGA" "GAG" "AGA" "AGA" -``` - -```r -# Compare the above to: -stri_extract_all_regex('ACAGAGACTTTAGATAGAGAAGA', '([ACGT])[ACGT]\\1') -``` - -``` -## [[1]] -## [1] "ACA" "GAG" "TTT" "AGA" "AGA" "AGA" -``` diff --git a/.devel/sphinx/rapi/stri_na2empty.md b/.devel/sphinx/rapi/stri_na2empty.md deleted file mode 100644 index 8a7be8f8..00000000 --- a/.devel/sphinx/rapi/stri_na2empty.md +++ /dev/null @@ -1,46 +0,0 @@ -# stri_na2empty: Replace NAs with Empty Strings - -## Description - -This function replaces all missing values with empty strings. See [`stri_replace_na`](stri_replace_na.md) for a generalization. - -## Usage - -``` r -stri_na2empty(x) -``` - -## Arguments - -| | | -|-----|--------------------| -| `x` | a character vector | - -## Value - -Returns a character vector. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other utils: [`stri_list2matrix()`](stri_list2matrix.md), [`stri_remove_empty()`](stri_remove_empty.md), [`stri_replace_na()`](stri_replace_na.md) - -## Examples - - - - -```r -stri_na2empty(c('a', NA, '', 'b')) -``` - -``` -## [1] "a" "" "" "b" -``` diff --git a/.devel/sphinx/rapi/stri_numbytes.md b/.devel/sphinx/rapi/stri_numbytes.md deleted file mode 100644 index 81e94752..00000000 --- a/.devel/sphinx/rapi/stri_numbytes.md +++ /dev/null @@ -1,81 +0,0 @@ -# stri_numbytes: Count the Number of Bytes - -## Description - -Counts the number of bytes needed to store each string in the computer\'s memory. 
- -## Usage - -``` r -stri_numbytes(str) -``` - -## Arguments - -| | | -|-------|--------------------------------------------| -| `str` | character vector or an object coercible to | - -## Details - -Often, this is not the function you would normally use in your string processing activities. See [`stri_length`](stri_length.md) instead. - -For 8-bit encoded strings, this is the same as [`stri_length`](stri_length.md). For UTF-8 strings, the returned values may be greater than the number of code points, as UTF-8 is not a fixed-byte encoding: one code point may be encoded by 1-4 bytes (according to the current Unicode standard). - -Missing values are handled properly. - -The strings do not need to be re-encoded to perform this operation. - -The returned values do not include the trailing NUL bytes, which are used internally to mark the end of string data (in C). - -## Value - -Returns an integer vector of the same length as `str`. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other length: [`%s$%()`](+25s+24+25.md), [`stri_isempty()`](stri_isempty.md), [`stri_length()`](stri_length.md), [`stri_pad_both()`](stri_pad.md), [`stri_sprintf()`](stri_sprintf.md), [`stri_width()`](stri_width.md) - -## Examples - - - - -```r -stri_numbytes(letters) -``` - -``` -## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 -``` - -```r -stri_numbytes(c('abc', '123', '\u0105\u0104')) -``` - -``` -## [1] 3 3 4 -``` - -```r -## Not run: -# this used to fail on Windows, where there were no native support -# for 4-bytes Unicode characters; see, however, stri_unescape_unicode(): -stri_numbytes('\U001F600') # compare stri_length('\U001F600') -``` - -``` -## [1] 4 -``` - -```r -## End(Not 
run) -``` diff --git a/.devel/sphinx/rapi/stri_opts_brkiter.md b/.devel/sphinx/rapi/stri_opts_brkiter.md deleted file mode 100644 index 79747583..00000000 --- a/.devel/sphinx/rapi/stri_opts_brkiter.md +++ /dev/null @@ -1,69 +0,0 @@ -# stri_opts_brkiter: Generate a List with BreakIterator Settings - -## Description - -A convenience function to tune the ICU `BreakIterator`\'s behavior in some text boundary analysis functions, see [stringi-search-boundaries](about_search_boundaries.md). - -## Usage - -``` r -stri_opts_brkiter( - type, - locale, - skip_word_none, - skip_word_number, - skip_word_letter, - skip_word_kana, - skip_word_ideo, - skip_line_soft, - skip_line_hard, - skip_sentence_term, - skip_sentence_sep, - ... -) -``` - -## Arguments - -| | | -|----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `type` | single string; either the break iterator type, one of `character`, `line_break`, `sentence`, `word`, or a custom set of ICU break iteration rules; see [stringi-search-boundaries](about_search_boundaries.md) | -| `locale` | single string, `NULL` or `''` for default locale | -| `skip_word_none` | logical; perform no action for \'words\' that do not fit into any other categories | -| `skip_word_number` | logical; perform no action for words that appear to be numbers | -| `skip_word_letter` | logical; perform no action for words that contain letters, excluding hiragana, katakana, or ideographic characters | -| `skip_word_kana` | logical; perform no action for words containing kana characters | -| `skip_word_ideo` | logical; perform no action for words containing ideographic characters | -| `skip_line_soft` | logical; perform no action for soft line breaks, i.e., positions where a line break is acceptable but not required | -| `skip_line_hard` | logical; perform no action for hard, 
or mandatory line breaks | -| `skip_sentence_term` | logical; perform no action for sentences ending with a sentence terminator (\'`.`\', \'`,`\', \'`?`\', \'`!`\'), possibly followed by a hard separator (`CR`, `LF`, `PS`, etc.) | -| `skip_sentence_sep` | logical; perform no action for sentences that do not contain an ending sentence terminator, but are ended by a hard separator or end of input | -| `...` | \[DEPRECATED\] any other arguments passed to this function generate a warning; this argument will be removed in the future | - -## Details - -The `skip_*` family of settings may be used to prevent performing any special actions on particular types of text boundaries, e.g., in case of the [`stri_locate_all_boundaries`](stri_locate_boundaries.md) and [`stri_split_boundaries`](stri_split_boundaries.md) functions. - -Note that custom break iterator rules (advanced users only) should be specified as a single string. For a detailed description of the syntax of RBBI rules, please refer to the ICU User Guide on Boundary Analysis. - -## Value - -Returns a named list object. Omitted `skip_*` values act as they have been set to `FALSE`. 
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## References - -*`ubrk.h` File Reference* -- ICU4C API Documentation, - -*Boundary Analysis* -- ICU User Guide, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other text_boundaries: [`about_search_boundaries`](about_search_boundaries.md), [`about_search`](about_search.md), [`stri_count_boundaries()`](stri_count_boundaries.md), [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_split_lines()`](stri_split_lines.md), [`stri_trans_tolower()`](stri_trans_casemap.md), [`stri_wrap()`](stri_wrap.md) diff --git a/.devel/sphinx/rapi/stri_opts_collator.md b/.devel/sphinx/rapi/stri_opts_collator.md deleted file mode 100644 index b3158cff..00000000 --- a/.devel/sphinx/rapi/stri_opts_collator.md +++ /dev/null @@ -1,125 +0,0 @@ -# stri_opts_collator: Generate a List with Collator Settings - -## Description - -A convenience function to tune the ICU Collator\'s behavior, e.g., in [`stri_compare`](stri_compare.md), [`stri_order`](stri_order.md), [`stri_unique`](stri_unique.md), [`stri_duplicated`](stri_duplicated.md), as well as [`stri_detect_coll`](stri_detect.md) and other [stringi-search-coll](about_search_coll.md) functions. - -## Usage - -``` r -stri_opts_collator( - locale = NULL, - strength = 3L, - alternate_shifted = FALSE, - french = FALSE, - uppercase_first = NA, - case_level = FALSE, - normalization = FALSE, - normalisation = normalization, - numeric = FALSE, - ... 
-) - -stri_coll( - locale = NULL, - strength = 3L, - alternate_shifted = FALSE, - french = FALSE, - uppercase_first = NA, - case_level = FALSE, - normalization = FALSE, - normalisation = normalization, - numeric = FALSE, - ... -) -``` - -## Arguments - -| | | -|---------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `locale` | single string, `NULL` or `''` for default locale | -| `strength` | single integer in {1,2,3,4}, which defines collation strength; `1` for the most permissive collation rules, `4` for the strictest ones | -| `alternate_shifted` | single logical value; `FALSE` treats all the code points with non-ignorable primary weights in the same way, `TRUE` causes code points with primary weights that are equal or below the variable top value to be ignored on primary level and moved to the quaternary level | -| `french` | single logical value; used in Canadian French; `TRUE` results in secondary weights being considered backwards | -| `uppercase_first` | single logical value; `NA` orders upper and lower case letters in accordance to their tertiary weights, `TRUE` forces upper case letters to sort before lower case letters, `FALSE` does the opposite | -| `case_level` | single logical value; controls whether an extra case level (positioned before the third level) is generated or not | -| `normalization` | single logical value; if `TRUE`, then incremental check is performed to see whether the input data is in the FCD form. 
If the data is not in the FCD form, incremental NFD normalization is performed | -| `normalisation` | alias of `normalization` | -| `numeric` | single logical value; when turned on, this attribute generates a collation key for the numeric value of substrings of digits; this is a way to get \'100\' to sort AFTER \'2\'; note that negative or non-integer numbers will not be ordered properly | -| `...` | \[DEPRECATED\] any other arguments passed to this function generate a warning; this argument will be removed in the future | - -## Details - -ICU\'s *collator* performs a locale-aware, natural-language alike string comparison. This is a more reliable way of establishing relationships between strings than the one provided by base **R**, and definitely one that is more complex and appropriate than ordinary bytewise comparison. - -## Value - -Returns a named list object; missing settings are left with default values. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## References - -*Collation* -- ICU User Guide, - -*ICU Collation Service Architecture* -- ICU User Guide, - -*`icu::Collator` Class Reference* -- ICU4C API Documentation, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other locale_sensitive: [`%s<%()`](+25s+3C+25.md), [`about_locale`](about_locale.md), [`about_search_boundaries`](about_search_boundaries.md), [`about_search_coll`](about_search_coll.md), [`stri_compare()`](stri_compare.md), [`stri_count_boundaries()`](stri_count_boundaries.md), [`stri_duplicated()`](stri_duplicated.md), [`stri_enc_detect2()`](stri_enc_detect2.md), [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_order()`](stri_order.md), 
[`stri_rank()`](stri_rank.md), [`stri_sort_key()`](stri_sort_key.md), [`stri_sort()`](stri_sort.md), [`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_trans_tolower()`](stri_trans_casemap.md), [`stri_unique()`](stri_unique.md), [`stri_wrap()`](stri_wrap.md) - -Other search_coll: [`about_search_coll`](about_search_coll.md), [`about_search`](about_search.md) - -## Examples - - - - -```r -stri_cmp('number100', 'number2') -``` - -``` -## [1] -1 -``` - -```r -stri_cmp('number100', 'number2', opts_collator=stri_opts_collator(numeric=TRUE)) -``` - -``` -## [1] 1 -``` - -```r -stri_cmp('number100', 'number2', numeric=TRUE) # equivalent -``` - -``` -## [1] 1 -``` - -```r -stri_cmp('above mentioned', 'above-mentioned') -``` - -``` -## [1] -1 -``` - -```r -stri_cmp('above mentioned', 'above-mentioned', alternate_shifted=TRUE) -``` - -``` -## [1] 0 -``` diff --git a/.devel/sphinx/rapi/stri_opts_fixed.md b/.devel/sphinx/rapi/stri_opts_fixed.md deleted file mode 100644 index 583454eb..00000000 --- a/.devel/sphinx/rapi/stri_opts_fixed.md +++ /dev/null @@ -1,74 +0,0 @@ -# stri_opts_fixed: Generate a List with Fixed Pattern Search Engine\'s Settings - -## Description - -A convenience function used to tune up the behavior of `stri_*_fixed` functions, see [stringi-search-fixed](about_search_fixed.md). - -## Usage - -``` r -stri_opts_fixed(case_insensitive = FALSE, overlap = FALSE, ...) -``` - -## Arguments - -| | | -|--------------------|----------------------------------------------------------------------------------------------------------------------------| -| `case_insensitive` | logical; enable simple case insensitive matching | -| `overlap` | logical; enable overlapping matches\' detection | -| `...` | \[DEPRECATED\] any other arguments passed to this function generate a warning; this argument will be removed in the future | - -## Details - -Case-insensitive matching uses a simple, single-code point case mapping (via ICU\'s `u_toupper()` function). 
Full case mappings should be used whenever possible because they produce better results by working on whole strings. They also take into account the string context and the language, see [stringi-search-coll](about_search_coll.md). - -Searching for overlapping pattern matches is available in [`stri_extract_all_fixed`](stri_extract.md), [`stri_locate_all_fixed`](stri_locate.md), and [`stri_count_fixed`](stri_count.md) functions. - -## Value - -Returns a named list object. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## References - -*C/POSIX Migration* -- ICU User Guide, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other search_fixed: [`about_search_fixed`](about_search_fixed.md), [`about_search`](about_search.md) - -## Examples - - - - -```r -stri_detect_fixed('ala', 'ALA') # case-sensitive by default -``` - -``` -## [1] FALSE -``` - -```r -stri_detect_fixed('ala', 'ALA', opts_fixed=stri_opts_fixed(case_insensitive=TRUE)) -``` - -``` -## [1] TRUE -``` - -```r -stri_detect_fixed('ala', 'ALA', case_insensitive=TRUE) # equivalent -``` - -``` -## [1] TRUE -``` diff --git a/.devel/sphinx/rapi/stri_opts_regex.md b/.devel/sphinx/rapi/stri_opts_regex.md deleted file mode 100644 index f69464c2..00000000 --- a/.devel/sphinx/rapi/stri_opts_regex.md +++ /dev/null @@ -1,106 +0,0 @@ -# stri_opts_regex: Generate a List with Regex Matcher Settings - -## Description - -A convenience function to tune the ICU regular expressions matcher\'s behavior, e.g., in [`stri_count_regex`](stri_count.md) and other [stringi-search-regex](about_search_regex.md) functions. 
- -## Usage - -``` r -stri_opts_regex( - case_insensitive, - comments, - dotall, - dot_all = dotall, - literal, - multiline, - multi_line = multiline, - unix_lines, - uword, - error_on_unknown_escapes, - time_limit = 0L, - stack_limit = 0L, - ... -) -``` - -## Arguments - -| | | -|----------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `case_insensitive` | logical; enables case insensitive matching \[regex flag `(?i)`\] | -| `comments` | logical; allows white space and comments within patterns \[regex flag `(?x)`\] | -| `dotall` | logical; if set, \'`.`\' matches line terminators, otherwise matching of \'`.`\' stops at a line end \[regex flag `(?s)`\] | -| `dot_all` | alias of `dotall` | -| `literal` | logical; if set, treat the entire pattern as a literal string: metacharacters or escape sequences in the input sequence will be given no special meaning; note that in most cases you would rather use the [stringi-search-fixed](about_search_fixed.md) facilities in this case | -| `multiline` | logical; controls the behavior of \'`$`\' and \'`^`\'. If set, recognize line terminators within a string, otherwise, match only at start and end of input string \[regex flag `(?m)`\] | -| `multi_line` | alias of `multiline` | -| `unix_lines` | logical; Unix-only line endings; when enabled, only `U+000a` is recognized as a line ending by \'`.`\', \'`$`\', and \'`^`\'. | -| `uword` | logical; Unicode word boundaries; if set, uses the Unicode TR 29 definition of word boundaries; warning: Unicode word boundaries are quite different from traditional regex word boundaries. 
\[regex flag `(?w)`\] See | -| `error_on_unknown_escapes` | logical; whether to generate an error on unrecognized backslash escapes; if set, fail with an error on patterns that contain backslash-escaped ASCII letters without a known special meaning; otherwise, these escaped letters represent themselves | -| `time_limit` | integer; processing time limit, in \~milliseconds (but not precisely so, depends on the CPU speed), for match operations; setting a limit is desirable if poorly written regexes are expected on input; 0 for no limit | -| `stack_limit` | integer; maximal size, in bytes, of the heap storage available for the match backtracking stack; setting a limit is desirable if poorly written regexes are expected on input; 0 for no limit | -| `...` | \[DEPRECATED\] any other arguments passed to this function generate a warning; this argument will be removed in the future | - -## Details - -Note that some regex settings may be changed using ICU regex flags inside regexes. For example, `'(?i)pattern'` performs a case-insensitive match of a given pattern, see the ICU User Guide entry on Regular Expressions in the References section or [stringi-search-regex](about_search_regex.md). - -## Value - -Returns a named list object; missing settings are left with default values. 
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## References - -*`enum URegexpFlag`: Constants for Regular Expression Match Modes* -- ICU4C API Documentation, - -*Regular Expressions* -- ICU User Guide, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other search_regex: [`about_search_regex`](about_search_regex.md), [`about_search`](about_search.md) - -## Examples - - - - -```r -stri_detect_regex('ala', 'ALA') # case-sensitive by default -``` - -``` -## [1] FALSE -``` - -```r -stri_detect_regex('ala', 'ALA', opts_regex=stri_opts_regex(case_insensitive=TRUE)) -``` - -``` -## [1] TRUE -``` - -```r -stri_detect_regex('ala', 'ALA', case_insensitive=TRUE) # equivalent -``` - -``` -## [1] TRUE -``` - -```r -stri_detect_regex('ala', '(?i)ALA') # equivalent -``` - -``` -## [1] TRUE -``` diff --git a/.devel/sphinx/rapi/stri_order.md b/.devel/sphinx/rapi/stri_order.md deleted file mode 100644 index fa248dfa..00000000 --- a/.devel/sphinx/rapi/stri_order.md +++ /dev/null @@ -1,96 +0,0 @@ -# stri_order: Ordering Permutation - -## Description - -This function finds a permutation which rearranges the strings in a given character vector into the ascending or descending locale-dependent lexicographic order. - -## Usage - -``` r -stri_order(str, decreasing = FALSE, na_last = TRUE, ..., opts_collator = NULL) -``` - -## Arguments - -| | | -|-----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `str` | a character vector | -| `decreasing` | a single logical value; should the sort order be nondecreasing (`FALSE`, default) or nonincreasing (`TRUE`)? 
| -| `na_last` | a single logical value; controls the treatment of `NA`s in `str`. If `TRUE`, then missing values in `str` are put at the end; if `FALSE`, they are put at the beginning; if `NA`, then they are removed from the output | -| `...` | additional settings for `opts_collator` | -| `opts_collator` | a named list with ICU Collator\'s options, see [`stri_opts_collator`](stri_opts_collator.md), `NULL` for default collation options | - -## Details - -For more information on ICU\'s Collator and how to tune it up in stringi, refer to [`stri_opts_collator`](stri_opts_collator.md). - -As usual in stringi, non-character inputs are coerced to strings, see an example below for a somewhat non-intuitive behavior of lexicographic sorting on numeric inputs. - -This function uses a stable sort algorithm (STL\'s `stable_sort`), which performs up to $N*log^2(N)$ element comparisons, where $N$ is the length of `str`. - -For ordering with regards to multiple criteria (such as sorting data frames by more than 1 column), see [`stri_rank`](stri_rank.md). - -## Value - -The function yields an integer vector that gives the sort order. 
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## References - -*Collation* - ICU User Guide, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other locale_sensitive: [`%s<%()`](+25s+3C+25.md), [`about_locale`](about_locale.md), [`about_search_boundaries`](about_search_boundaries.md), [`about_search_coll`](about_search_coll.md), [`stri_compare()`](stri_compare.md), [`stri_count_boundaries()`](stri_count_boundaries.md), [`stri_duplicated()`](stri_duplicated.md), [`stri_enc_detect2()`](stri_enc_detect2.md), [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_opts_collator()`](stri_opts_collator.md), [`stri_rank()`](stri_rank.md), [`stri_sort_key()`](stri_sort_key.md), [`stri_sort()`](stri_sort.md), [`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_trans_tolower()`](stri_trans_casemap.md), [`stri_unique()`](stri_unique.md), [`stri_wrap()`](stri_wrap.md) - -## Examples - - - - -```r -stri_order(c('hladny', 'chladny'), locale='pl_PL') -``` - -``` -## [1] 2 1 -``` - -```r -stri_order(c('hladny', 'chladny'), locale='sk_SK') -``` - -``` -## [1] 1 2 -``` - -```r -stri_order(c(1, 100, 2, 101, 11, 10)) # lexicographic order -``` - -``` -## [1] 1 6 2 4 5 3 -``` - -```r -stri_order(c(1, 100, 2, 101, 11, 10), numeric=TRUE) # OK for integers -``` - -``` -## [1] 1 3 6 5 2 4 -``` - -```r -stri_order(c(0.25, 0.5, 1, -1, -2, -3), numeric=TRUE) # incorrect -``` - -``` -## [1] 4 5 6 2 1 3 -``` diff --git a/.devel/sphinx/rapi/stri_pad.md b/.devel/sphinx/rapi/stri_pad.md deleted file mode 100644 index d403b13e..00000000 --- a/.devel/sphinx/rapi/stri_pad.md +++ /dev/null @@ -1,120 +0,0 @@ -# stri_pad: Pad (Center/Left/Right Align) a String - -## 
Description - -Add multiple `pad` characters at the given `side`(s) of each string so that each output string is of total width of at least `width`. These functions may be used to center or left/right-align each string. - -## Usage - -``` r -stri_pad_both( - str, - width = floor(0.9 * getOption("width")), - pad = " ", - use_length = FALSE -) - -stri_pad_left( - str, - width = floor(0.9 * getOption("width")), - pad = " ", - use_length = FALSE -) - -stri_pad_right( - str, - width = floor(0.9 * getOption("width")), - pad = " ", - use_length = FALSE -) - -stri_pad( - str, - width = floor(0.9 * getOption("width")), - side = c("left", "right", "both"), - pad = " ", - use_length = FALSE -) -``` - -## Arguments - -| | | -|--------------|-------------------------------------------------------------------------------------------------------------------------------------------| -| `str` | character vector | -| `width` | integer vector giving minimal output string lengths | -| `pad` | character vector giving padding code points | -| `use_length` | single logical value; should the number of code points be used instead of the total code point width (see [`stri_width`](stri_width.md))? | -| `side` | \[`stri_pad` only\] single character string; sides on which padding character is added (`left` (default), `right`, or `both`) | - -## Details - -Vectorized over `str`, `width`, and `pad`. Each string in `pad` should consist of a code points of total width equal to 1 or, if `use_length` is `TRUE`, exactly one code point. - -`stri_pad` is a convenience function, which dispatches to `stri_pad_*`. - -Note that Unicode code points may have various widths when printed on the console and that, by default, the function takes that into account. By changing the state of the `use_length` argument, this function starts acting like each code point was of width 1. This feature should rather be used with text in Latin script. 
- -See [`stri_trim_left`](stri_trim.md) (among others) for reverse operation. Also check out [`stri_wrap`](stri_wrap.md) for line wrapping. - -## Value - -These functions return a character vector. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other length: [`%s$%()`](+25s+24+25.md), [`stri_isempty()`](stri_isempty.md), [`stri_length()`](stri_length.md), [`stri_numbytes()`](stri_numbytes.md), [`stri_sprintf()`](stri_sprintf.md), [`stri_width()`](stri_width.md) - -## Examples - - - - -```r -stri_pad_left('stringi', 10, pad='#') -``` - -``` -## [1] "###stringi" -``` - -```r -stri_pad_both('stringi', 8:12, pad='*') -``` - -``` -## [1] "stringi*" "*stringi*" "*stringi**" "**stringi**" "**stringi***" -``` - -```r -# center on screen: -cat(stri_pad_both(c('the', 'string', 'processing', 'package'), - getOption('width')*0.9), sep='\n') -``` - -``` -## the -## string -## processing -## package -``` - -```r -cat(stri_pad_both(c('\ud6c8\ubbfc\uc815\uc74c', # takes width into account - stri_trans_nfkd('\ud6c8\ubbfc\uc815\uc74c'), 'abcd'), - width=10), sep='\n') -``` - -``` -## 훈민정음 -## 훈민정음 -## abcd -``` diff --git a/.devel/sphinx/rapi/stri_rand_lipsum.md b/.devel/sphinx/rapi/stri_rand_lipsum.md deleted file mode 100644 index 6d213ca7..00000000 --- a/.devel/sphinx/rapi/stri_rand_lipsum.md +++ /dev/null @@ -1,171 +0,0 @@ -# stri_rand_lipsum: A Lorem Ipsum Generator - -## Description - -Generates (pseudo)random *lorem ipsum* text consisting of a given number of text paragraphs. 
- -## Usage - -``` r -stri_rand_lipsum(n_paragraphs, start_lipsum = TRUE, nparagraphs = n_paragraphs) -``` - -## Arguments - -| | | -|----------------|------------------------------------------------------------------------------------------| -| `n_paragraphs` | single integer, number of paragraphs to generate | -| `start_lipsum` | single logical value; should the resulting text start with *Lorem ipsum dolor sit amet*? | -| `nparagraphs` | deprecated alias of `n_paragraphs` | - -## Details - -*Lorem ipsum* is dummy text often used as a source of data for string processing and display/layout exercises. - -The current implementation is very simple: words are selected randomly from a Zipf distribution (based on a set of ca. 190 predefined Latin words). The numbers of words per sentence and of sentences per paragraph follow a discretized, truncated normal distribution. There is no Markov chain modeling, just i.i.d. word selection. - -## Value - -Returns a character vector of length `n_paragraphs`. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other random: [`stri_rand_shuffle()`](stri_rand_shuffle.md), [`stri_rand_strings()`](stri_rand_strings.md) - -## Examples - - - - -```r -cat(sapply( - stri_wrap(stri_rand_lipsum(10), 80, simplify=FALSE), - stri_flatten, collapse='\n'), sep='\n\n') -``` - -``` -## Lorem ipsum dolor sit amet, tincidunt fermentum erat, penatibus parturient -## porta quis mauris volutpat nunc, urna aliquet! Nec, eros diam molestie sociosqu -## etiam phasellus dis arcu. Varius donec ligula sed tempor semper sed ut, nec. -## Pulvinar sodales ridiculus, quam ut tristique facilisis eu. Erat mauris in erat -## in mauris.
In lacinia vestibulum et ut dignissim quisque cursus facilisi et et -## ultricies pretium. Sed, mollis porta elementum dolor nec sed lacus, augue. Velit -## quam. Iaculis in egestas, curabitur proin vitae ligula vivamus morbi vestibulum. -## -## Non, imperdiet et sed platea sed, donec. Himenaeos luctus id feugiat proin. -## Tincidunt augue efficitur maecenas malesuada adipiscing leo. Ac, tortor mauris -## sem sapien. Cubilia nisl a porttitor eu parturient. Arcu nec porttitor curae -## lacinia magna! Aliquam proin non. Sit, fames pellentesque nibh pretium vel -## sed eros dolor justo, turpis. Molestie, lacus libero natoque condimentum at -## tincidunt penatibus. Massa finibus sapien pulvinar pharetra. -## -## Himenaeos sit nulla et at sociis vestibulum fermentum aliquet et vitae nunc. -## Et ac dictumst curae eu aptent varius velit est. Imperdiet ut donec dapibus -## aliquam convallis at. Neque nulla sit dis aliquam risus sed faucibus malesuada -## blandit aliquam. Per auctor, pellentesque nisl, nec bibendum magnis felis ipsum -## hac a. Nisi ac sem et nec nulla massa scelerisque nec molestie. Aenean finibus -## non egestas, phasellus tortor ligula vitae in a. Sollicitudin mattis vulputate -## nec eu sociis mi quam nec massa. Nunc a commodo nulla mattis et euismod enim. -## Quisque nullam purus auctor sed mauris imperdiet. At, viverra pellentesque -## commodo torquent ac eu. Accumsan enim proin penatibus ut lorem. Elit, ut -## habitasse eget in eleifend aliquet. Ligula nibh id ut. Nibh amet libero tempor -## primis turpis quam, ut praesent, velit sodales, amet lacus pulvinar in. Viverra -## pellentesque nibh tincidunt sed metus accumsan aptent, sed dictumst pellentesque -## netus. -## -## Eleifend, est cursus in feugiat. Ligula venenatis libero nunc ultricies -## et convallis. Nulla quisque natoque ut morbi curabitur nisl. Ipsum at odio -## sollicitudin urna tellus consequat urna dui sed. In taciti pulvinar vel, -## tristique ullamcorper velit mattis. 
Tempus hendrerit, lectus aptent tellus -## justo dis aenean leo! Sed odio sem dignissim, viverra morbi nibh fringilla. -## Non nulla consequat, adipiscing massa tortor in penatibus. Ac et dignissim, dui -## donec fames sed, vitae eleifend mauris aliquam, amet. Ultricies non rutrum ipsum -## sapien elementum et. Sed habitasse massa, pretium per quisque adipiscing in -## aptent molestie condimentum ante in. -## -## Nibh nunc integer. Nibh pellentesque facilisis sagittis lorem porta et mauris -## magnis dictum. Cursus magna volutpat ultricies, sollicitudin nisl et auctor. -## Vulputate ut vestibulum nisi quisque inceptos risus, odio, sodales? Vivamus -## class in tempor ligula sagittis gravida ac, iaculis. A ut habitant nec tristique -## amet. Sed metus ut nulla magna tellus gravida. Vel lorem est scelerisque -## iaculis. Convallis hendrerit magnis faucibus tortor. Amet risus eget in ex -## pharetra non id massa. Nec et enim egestas sagittis quis sed bibendum donec. -## -## Elementum tempus ante sit enim elementum metus nullam. Porta sem nisl sed, sed, -## hac ac magna nam laoreet fringilla! Maximus facilisis cras nisi posuere sed -## magna fringilla tristique sociosqu amet tincidunt curabitur dictumst. Aliquam -## sed habitasse non in blandit aliquam. Urna suscipit, ut duis, dis nulla eget nec -## ut suspendisse. Sem augue sollicitudin sed vel arcu a orci dolor odio non. Non -## leo dapibus ullamcorper, inceptos viverra. Accumsan gravida eu eget ipsum eros. -## Ex gravida quis euismod sed ullamcorper mattis lorem, vel sed nulla! Himenaeos -## habitant tempus mauris, sem ultricies eros. Et nulla egestas quis. Diam nibh ac -## in quis parturient sem, risus vulputate. Lacinia in duis, nibh etiam condimentum -## eu vestibulum. Eu volutpat, felis commodo tincidunt, lobortis dictumst laoreet. -## Ut est in donec scelerisque sed rhoncus quam consectetur. Consequat orci -## imperdiet, ultrices id et nascetur. -## -## Sodales, nisi ac faucibus, quis potenti sed. 
A eu ipsum fermentum, habitasse -## nam, tempor mi dis. Ac, ut sollicitudin justo in tristique, diam luctus nunc -## nec ac. Dui, faucibus non amet finibus, urna praesent phasellus sed. Duis -## per elementum ac litora phasellus non. Vulputate primis magna vestibulum quis -## mauris, felis facilisis lacinia tempus mattis. Facilisis sed ante in suscipit -## nostra, tempus integer massa lacinia dui. Finibus aptent euismod ut, sed in -## molestie varius tincidunt mus. Volutpat urna nisl aliquam gravida in nibh -## vivamus, efficitur. Eleifend ligula lectus eu aliquet hendrerit. Vel rhoncus -## blandit mus nec, tortor fringilla semper sed sociis sem, velit. Magna et nec -## eros turpis magna. -## -## Pellentesque suspendisse nec montes in, sapien nascetur malesuada in leo justo, -## dui est porttitor, eu. Class odio faucibus ac finibus risus pretium in euismod -## nunc nulla malesuada cum. Sed ligula, magna lorem iaculis, litora auctor. Ad -## facilisis eu non sit. Enim ut mauris orci erat felis. Convallis maecenas velit, -## aenean ac nunc volutpat nec morbi. Cras risus rhoncus vestibulum in purus lorem. -## -## Turpis sit dui sed rhoncus suspendisse maecenas. Diam nulla lorem posuere -## tellus. Velit mattis aliquam, massa lacus nunc lectus a. Tempus est, eu -## porttitor faucibus non. Suspendisse justo est. Proin consectetur lacus metus -## vitae ut velit. Sed molestie habitasse aenean venenatis per pharetra lectus -## nulla ultrices vitae. Id nam porta amet pellentesque sapien. Ut iaculis faucibus -## eu ridiculus felis congue cras, fusce ultricies. -## -## Maecenas auctor nunc. Sed magna egestas velit amet, aliquam leo facilisis. Nunc -## ac sed gravida dolor gravida ut ac eu feugiat. Facilisis habitasse porttitor -## id vel ultricies porta mauris laoreet. Molestie urna blandit netus dis nullam -## ut venenatis. Risus velit vestibulum vitae justo netus. Quis odio vel sit nam. -## Tincidunt, eu in, torquent odio. 
Ac felis pharetra euismod elit odio consectetur -## dictum. Ante id urna quis convallis. -``` - -```r -cat(stri_rand_lipsum(10), sep='\n\n') -``` - -``` -## Lorem ipsum dolor sit amet, laoreet ut urna ac, accumsan in suscipit nullam. Vitae nec in sed proin quis, in ligula. Varius curabitur turpis eu rhoncus fusce curae. Ullamcorper eget maecenas. Est vel, lobortis sociis vel mi donec sed et magna in. Sed amet ante tellus donec augue dictum amet sagittis, aliquam. A tincidunt congue eget nostra non mauris auctor quis. Imperdiet laoreet quis, orci inceptos aenean, sagittis, litora. Vestibulum tristique. Non morbi, consequat scelerisque tincidunt lacinia quam tristique aliquet ad. Urna nec nunc eu vitae fermentum auctor, lacus ex, urna. Ipsum rhoncus ex condimentum amet. Erat nascetur ante ut urna dis vivamus faucibus consequat neque nostra, et. -## -## Neque pretium semper ad mattis non porta facilisis nullam class. Velit montes, lacus vel volutpat nec metus leo venenatis. Felis lacus sit diam, a. Tincidunt molestie purus mi diam proin sed tincidunt ut rutrum. Diam libero. Velit lacus ac sed, sed ut egestas finibus laoreet. Et nascetur non dolor felis torquent euismod libero nisl tempus. Suscipit auctor, et sit placerat, risus. Condimentum cursus vestibulum a luctus sapien hendrerit eu, nec. Maecenas sit consequat arcu. Laoreet donec ac eros. Tempus feugiat amet tellus neque habitant conubia non pellentesque. Efficitur, finibus, dui a, nec tincidunt. In lacinia vulputate, eget quam in et potenti nisi. Maecenas lorem nulla. Primis sed. -## -## Amet penatibus ultricies platea quis massa ut. Curabitur dignissim sollicitudin, sed vitae. Lobortis nibh aenean in ultrices a nunc scelerisque, amet nisl eleifend. Magna pharetra, lacus eu sed nec ultricies non, ut. Sed magna morbi ipsum purus leo ligula taciti. Taciti nulla porta, mauris, senectus in! Curae at aliquam ac, massa ultrices hac vel cursus luctus cubilia purus fermentum ut. 
Elementum at litora mattis vivamus cursus magna adipiscing neque. -## -## Neque turpis, ut class mauris vestibulum, ultrices odio penatibus et, tempus, inceptos fringilla aptent. Mollis nec ac ac enim condimentum aptent, justo mattis quam accumsan. Facilisis dapibus cras tincidunt sit et sed, ad suscipit, ut. Et amet urna sodales, ac et. Lectus id purus ac, nostra scelerisque lorem phasellus id consequat sapien lacinia leo iaculis nulla. Feugiat, orci sed, nibh purus eros tempus bibendum ornare in. Ac tincidunt pellentesque scelerisque non adipiscing. Morbi massa, potenti sed pretium class ac. Quam phasellus quam fusce erat odio ullamcorper id per, eu, suspendisse eget ut finibus. -## -## Lobortis donec conubia volutpat cum ad nisl nam venenatis himenaeos eu. In curae velit aenean tortor. Nisl nam. Dignissim ac nibh congue at luctus ante lobortis, felis duis viverra lacus. Sapien rutrum arcu laoreet integer purus! Eros integer porta nisi elit vulputate quis. Nulla, vel. Primis nunc neque sed ultricies eros lorem torquent velit, vulputate pretium at in. Ipsum eleifend pulvinar ullamcorper habitant sed, ante, malesuada. Et sed, non. Vulputate morbi ipsum, nunc vitae montes neque duis himenaeos maximus quis litora. Iaculis in elementum morbi ac magnis in amet. Sed velit aenean, ultrices, sem. Ut parturient suspendisse, nam. -## -## Nulla curabitur auctor class erat pellentesque scelerisque duis. Mi gravida, pellentesque himenaeos elementum condimentum nam nullam elit sit blandit pulvinar. A commodo, et sodales primis, consequat, consequat. Magna eu diam et quisque nibh. Odio nisl vel libero nisl bibendum et in, nec habitant purus. Phasellus sed diam in luctus bibendum vel nulla bibendum sed pretium. Lacus ut, class sit, vulputate enim lacinia sit, etiam, in mauris quis. Pulvinar nisl nulla ut neque lacus curae non proin. Sed mollis a eu sit. Mauris habitant, dictumst tincidunt eu nullam massa turpis viverra. -## -## Nulla sagittis, eu praesent ut mi dapibus. 
At primis ante auctor sit ultrices. Sodales, augue semper convallis cubilia consequat malesuada in sit. Sed cras justo cursus non eros consectetur in ipsum condimentum placerat. Semper magnis nisl dis imperdiet justo velit at quisque. Purus lacus ut nibh fames tempus sed nisl ac. Nulla consectetur ante ex, neque gravida massa amet. Netus in, ut placerat magna, nam sodales curae nascetur non. -## -## Consequat sed augue congue. Ante cum nullam enim vehicula curabitur justo ipsum? Felis, praesent, tellus malesuada pulvinar duis. Molestie et maximus vitae, diam id litora erat felis ut. Primis sodales risus sit. Scelerisque nisi in in lacus sit augue facilisi mauris iaculis non tortor venenatis. Eu euismod vitae, et nec consequat accumsan eu adipiscing non senectus elit. Odio, vitae turpis placerat nostra. Arcu sollicitudin imperdiet justo vestibulum natoque eu dapibus euismod mauris volutpat nec. Eros vitae urna lacinia. Class donec vitae tincidunt a ac nunc. Nostra in tempor posuere vehicula varius vitae massa. Eu dolor et pellentesque accumsan at velit curabitur a ut mollis. -## -## Lacinia dui at sit maximus in a rutrum vestibulum sed. Nascetur ut vulputate. Vehicula sed morbi faucibus donec, ipsum. Maecenas eu tincidunt massa facilisis faucibus commodo. Sapien class sociis ornare sed tortor ultricies at nisl tempus. Laoreet donec, ipsum justo, nulla auctor in lobortis magna in habitasse auctor. Vehicula erat magnis, ante euismod sed. Fringilla ut cursus tellus dapibus iaculis nulla phasellus vestibulum. Odio in ligula neque urna viverra dui cras volutpat. Pharetra auctor id odio platea litora pulvinar. Sagittis nostra interdum nibh vulputate dolor sed, phasellus. Eros, magnis eu ante, potenti curabitur. Tempor sed eget pellentesque nullam et natoque molestie. Id mi turpis ligula pellentesque himenaeos id molestie cursus sit a consequat quam. Ac non, aliquam lobortis cum justo sociosqu et. Molestie quis id nec, sed et ac. Vel convallis sed nisl himenaeos. 
Sed tincidunt enim felis, litora arcu feugiat luctus at dui. In condimentum vel vivamus arcu eget viverra ut sed blandit. -## -## Cras lacus tempor at nam libero in rutrum eget metus ipsum ac integer porttitor cras eu. Lacinia maximus cum massa pellentesque habitant, sagittis justo. Sem pulvinar eget mattis euismod magna inceptos, ut, mattis sem pellentesque luctus. Aliquam ullamcorper inceptos odio ex magna suscipit bibendum nibh. Sem fames quam at accumsan sodales ullamcorper at nascetur sed ad, et. Vehicula lectus lobortis mattis quis in eu. Et amet nascetur vel ipsum, duis duis id vestibulum! Enim nisl class libero lobortis tempor tempor justo. Mauris aliquam, vitae sed mauris penatibus enim natoque eget. Eu felis enim bibendum inceptos, luctus placerat. Id potenti non non. Varius hac libero eu amet condimentum tristique lectus. Vulputate turpis velit ligula porttitor, nec vitae. Cum curae libero in interdum consectetur massa tortor ligula ut eu! Nibh interdum vitae sed tincidunt ut eget. Non sapien non odio tempus nec primis auctor. In amet lacinia class viverra purus nec nec risus erat posuere. Pulvinar erat elementum leo nibh luctus montes risus luctus id. -``` diff --git a/.devel/sphinx/rapi/stri_rand_shuffle.md b/.devel/sphinx/rapi/stri_rand_shuffle.md deleted file mode 100644 index 1c057ff6..00000000 --- a/.devel/sphinx/rapi/stri_rand_shuffle.md +++ /dev/null @@ -1,65 +0,0 @@ -# stri_rand_shuffle: Randomly Shuffle Code Points in Each String - -## Description - -Generates a (pseudo)random permutation of the code points in each string. - -## Usage - -``` r -stri_rand_shuffle(str) -``` - -## Arguments - -| | | -|-------|------------------| -| `str` | character vector | - -## Details - -This operation may result in non-Unicode-normalized strings and may give peculiar outputs in case of bidirectional strings. - -See also [`stri_reverse`](stri_reverse.md) for reversing the order of code points. - -## Value - -Returns a character vector. 
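The note about non-normalized output shows up with decomposed characters. A minimal sketch, assuming stringi is installed: under NFD, U+0105 (a with ogonek) becomes two code points, the letter `a` followed by the combining ogonek U+0328. A shuffle can move the combining mark to the front, where it no longer attaches to the letter.

```r
library(stringi)

x <- stri_trans_nfd("\u0105")  # "a" followed by the combining ogonek U+0328
stri_length(x)                  # 2 code points

y <- stri_rand_shuffle(rep(x, 20))
# each result is one of the two permutations; "\u{0328}a" starts with a
# dangling combining mark, so it no longer renders as a single "a with ogonek"
unique(y)
```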
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other random: [`stri_rand_lipsum()`](stri_rand_lipsum.md), [`stri_rand_strings()`](stri_rand_strings.md) - -## Examples - - - - -```r -stri_rand_shuffle(c('abcdefghi', '0123456789')) -``` - -``` -## [1] "cheidfbag" "5096873241" -``` - -```r -# you can do better than this with stri_rand_strings: -stri_rand_shuffle(rep(stri_paste(letters, collapse=''), 10)) -``` - -``` -## [1] "bjyxtszeufhpogcwdrvmaliqkn" "kafgjolxiqwmtpnhyucdbrzevs" -## [3] "vumbrtgqlpfhniwkxeazjdocsy" "irjhvgpqsobzayneumlfdkcxtw" -## [5] "yplrcekozfjnvmawxgqhtisbud" "afwijgkuxrqonshelmcvdpbyzt" -## [7] "flircxuthpsygadwkjvmnzqebo" "zqynmsjreatfhcloipvubdwgkx" -## [9] "kvyjzutiprsbclgfqonhmaedwx" "eqjtmzfuaidpkxbchygsrlownv" -``` diff --git a/.devel/sphinx/rapi/stri_rand_strings.md b/.devel/sphinx/rapi/stri_rand_strings.md deleted file mode 100644 index 1476334c..00000000 --- a/.devel/sphinx/rapi/stri_rand_strings.md +++ /dev/null @@ -1,91 +0,0 @@ -# stri_rand_strings: Generate Random Strings - -## Description - -Generates (pseudo)random strings of desired lengths. - -## Usage - -``` r -stri_rand_strings(n, length, pattern = "[A-Za-z0-9]") -``` - -## Arguments - -| | | -|-----------|--------------------------------------------------------------------------------------------------------------------------------| -| `n` | single integer, number of observations | -| `length` | integer vector, desired string lengths | -| `pattern` | character vector specifying character classes to draw elements from, see [stringi-search-charclass](about_search_charclass.md) | - -## Details - -Vectorized over `length` and `pattern`. 
If length of `length` or `pattern` is greater than `n`, then redundant elements are ignored. Otherwise, these vectors are recycled if necessary. - -This operation may result in non-Unicode-normalized strings and may give peculiar outputs for bidirectional strings. - -Sampling of code points from the set specified by `pattern` is always done with replacement and each code point appears with equal probability. - -## Value - -Returns a character vector. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other random: [`stri_rand_lipsum()`](stri_rand_lipsum.md), [`stri_rand_shuffle()`](stri_rand_shuffle.md) - -## Examples - - - - -```r -stri_rand_strings(5, 10) # 5 strings of length 10 -``` - -``` -## [1] "HmPsw2WtYS" "xSgZ6tF2Kx" "tgdzehXaH9" "xtgn1TlDJE" "8PPM98ESGr" -``` - -```r -stri_rand_strings(5, sample(1:10, 5, replace=TRUE)) # 5 strings of random lengths -``` - -``` -## [1] "tNf5N" "HoRoonR" "kdi0T" "DNbL6F" "fPm6QztsA" -``` - -```r -stri_rand_strings(10, 5, '[\\p{script=latin}&\\p{Ll}]') # small letters from the Latin script -``` - -``` -## [1] "ŏặɹẽɧ" "ưꝵęᵬᶏ" "ṯɰᵽ𝼁ᵹ" "ꭔfflṻʬũ" "nəòwⱹ" "šḡ𝼁ṙʨ" "ắɧɝǧʌ" -## [8] "𝼙ųĕšữ" "ẋꭕổꜳᶖ" "ềꞹꝸ𝼕ᴒ" -``` - -```r -# generate n random passwords of length in [8, 14] -# consisting of at least one digit, small and big ASCII letter: -n <- 10 -stri_rand_shuffle(stri_paste( - stri_rand_strings(n, 1, '[0-9]'), - stri_rand_strings(n, 1, '[a-z]'), - stri_rand_strings(n, 1, '[A-Z]'), - stri_rand_strings(n, sample(5:11, 5, replace=TRUE), '[a-zA-Z0-9]') -)) -``` - -``` -## [1] "3hGsaJNqZTaGw" "wJGmtzJcuPS4" "k0MsQHEx9bOZeV" "FTAJ1Dgf2A" -## [5] "8LxJPujoHhc" "uNkX3Ygc2QThO" "O9oTfpCn3X2G" "aidxJ0jyFFDfOA" -## [9] "TKGrmAlP2W" "sffGLWZ7vKw" -``` 
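A quick check of the recycling rule and of sampling from a character class, assuming stringi is installed. Only string lengths and the character set are inspected, because the strings themselves are random.

```r
library(stringi)

# `length` is recycled to n = 4 elements: lengths 1, 2, 1, 2
s1 <- stri_rand_strings(4, c(1, 2))
stri_length(s1)

# `length` has more elements than n = 2: the extra ones are ignored
s2 <- stri_rand_strings(2, 1:5)
stri_length(s2)

# each code point is drawn (with replacement) from the given class
s3 <- stri_rand_strings(3, 6, "[ACGT]")
all(stri_detect_regex(s3, "^[ACGT]{6}$"))
```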
diff --git a/.devel/sphinx/rapi/stri_rank.md b/.devel/sphinx/rapi/stri_rank.md deleted file mode 100644 index a1262e54..00000000 --- a/.devel/sphinx/rapi/stri_rank.md +++ /dev/null @@ -1,108 +0,0 @@ -# stri_rank: Ranking - -## Description - -This function ranks each string in a character vector according to a locale-dependent lexicographic order. It is a portable replacement for the base `xtfrm` function. - -## Usage - -``` r -stri_rank(str, ..., opts_collator = NULL) -``` - -## Arguments - -| | | -|-----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `str` | a character vector | -| `...` | additional settings for `opts_collator` | -| `opts_collator` | a named list with ICU Collator\'s options, see [`stri_opts_collator`](stri_opts_collator.md), `NULL` for default collation options | - -## Details - -Missing values result in missing ranks and tied observations receive the same ranks (based on min). - -For more information on ICU\'s Collator and how to tune it up in stringi, refer to [`stri_opts_collator`](stri_opts_collator.md). - -## Value - -The result is a vector of ranks corresponding to each string in `str`. 
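A small sketch of the tie and missing-value rules, assuming stringi is installed. For these plain ASCII letters, the result should agree with base R's `rank(x, ties.method = "min", na.last = "keep")`.

```r
library(stringi)

x <- c("b", "a", "b", NA, "c")
r <- stri_rank(x)
r  # 2 1 2 NA 4: both "b"s share the minimal rank of their tie; NA stays NA
```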
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## References - -*Collation* -- ICU User Guide, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other locale_sensitive: [`%s<%()`](+25s+3C+25.md), [`about_locale`](about_locale.md), [`about_search_boundaries`](about_search_boundaries.md), [`about_search_coll`](about_search_coll.md), [`stri_compare()`](stri_compare.md), [`stri_count_boundaries()`](stri_count_boundaries.md), [`stri_duplicated()`](stri_duplicated.md), [`stri_enc_detect2()`](stri_enc_detect2.md), [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_opts_collator()`](stri_opts_collator.md), [`stri_order()`](stri_order.md), [`stri_sort_key()`](stri_sort_key.md), [`stri_sort()`](stri_sort.md), [`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_trans_tolower()`](stri_trans_casemap.md), [`stri_unique()`](stri_unique.md), [`stri_wrap()`](stri_wrap.md) - -## Examples - - - - -```r -stri_rank(c('hladny', 'chladny'), locale='pl_PL') -``` - -``` -## [1] 2 1 -``` - -```r -stri_rank(c('hladny', 'chladny'), locale='sk_SK') -``` - -``` -## [1] 1 2 -``` - -```r -stri_rank("a" %s+% c(1, 100, 2, 101, 11, 10)) # lexicographic order -``` - -``` -## [1] 1 3 6 4 5 2 -``` - -```r -stri_rank("a" %s+% c(1, 100, 2, 101, 11, 10), numeric=TRUE) # OK -``` - -``` -## [1] 1 5 2 6 4 3 -``` - -```r -stri_rank("a" %s+% c(0.25, 0.5, 1, -1, -2, -3), numeric=TRUE) # incorrect -``` - -``` -## [1] 5 4 6 1 2 3 -``` - -```r -# Ordering a data frame with respect to two criteria: -X <- data.frame(a=c("b", NA, "b", "b", NA, "a", "a", "c"), b=runif(8)) -X[order(stri_rank(X$a), X$b), ] -``` - -``` -## a b -## 6 a 0.0455565 -## 7 a 0.5281055 -## 1 b 0.2875775 
-## 3 b 0.4089769 -## 4 b 0.8830174 -## 8 c 0.8924190 -## 2 <NA> 0.7883051 -## 5 <NA> 0.9404673 -``` diff --git a/.devel/sphinx/rapi/stri_read_lines.md b/.devel/sphinx/rapi/stri_read_lines.md deleted file mode 100644 index ec121e44..00000000 --- a/.devel/sphinx/rapi/stri_read_lines.md +++ /dev/null @@ -1,44 +0,0 @@ -# stri_read_lines: Read Text Lines from a Text File - -## Description - -Reads a text file in its entirety, re-encodes it, and splits it into text lines. - -## Usage - -``` r -stri_read_lines(con, encoding = NULL, fname = con, fallback_encoding = NULL) -``` - -## Arguments - -| | | -|---------------------|---------------------------------------------------------------------------------| -| `con` | name of the input file or a connection object (opened in the binary mode) | -| `encoding` | single string; input encoding; `NULL` or `''` for the current default encoding. | -| `fname` | deprecated alias of `con` | -| `fallback_encoding` | deprecated argument, no longer used | - -## Details - -This aims to be a substitute for the [`readLines`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/readLines.html) function, with the ability to re-encode the input file in a much more robust way, and split the text into lines with [`stri_split_lines1`](stri_split_lines.md) (which conforms to the Unicode guidelines for newline markers). - -The function calls [`stri_read_raw`](stri_read_raw.md), [`stri_encode`](stri_encode.md), and [`stri_split_lines1`](stri_split_lines.md), in this order. - -Because of the way this function is currently implemented, the maximal file size cannot exceed \~0.67 GB. - -## Value - -Returns a character vector in which each text line is a separate string. The output is always marked as UTF-8.
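A minimal round-trip sketch, assuming stringi is installed and the temporary directory is writable. Polish text is written as ISO-8859-2 bytes (without a trailing newline) and read back with `stri_read_lines()`, which re-encodes it to UTF-8.

```r
library(stringi)

txt <- c("za\u017c\u00f3\u0142\u0107", "g\u0119\u015bl\u0105", "ja\u017a\u0144")
f <- tempfile(fileext = ".txt")

# encode the lines as ISO-8859-2 bytes, separated by "\n", and write them as-is
bytes <- stri_encode(stri_flatten(txt, collapse = "\n"),
                     from = "UTF-8", to = "ISO-8859-2", to_raw = TRUE)[[1]]
writeBin(bytes, f)

# read the bytes back, convert ISO-8859-2 -> UTF-8, and split into lines
res <- stri_read_lines(f, encoding = "ISO-8859-2")
res
unlink(f)
```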
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other files: [`stri_read_raw()`](stri_read_raw.md), [`stri_write_lines()`](stri_write_lines.md) diff --git a/.devel/sphinx/rapi/stri_read_raw.md b/.devel/sphinx/rapi/stri_read_raw.md deleted file mode 100644 index 77f9d3c0..00000000 --- a/.devel/sphinx/rapi/stri_read_raw.md +++ /dev/null @@ -1,38 +0,0 @@ -# stri_read_raw: Read Text File as Raw - -## Description - -Reads a text file as-is, with no conversion or text line splitting. - -## Usage - -``` r -stri_read_raw(con, fname = con) -``` - -## Arguments - -| | | -|---------|----------------------------------------------------------------------------| -| `con` | name of the input file or a connection object (opened in the binary mode) | -| `fname` | deprecated alias of `con` | - -## Details - -Once a text file is read into memory, encoding detection (see [`stri_enc_detect`](stri_enc_detect.md)), conversion (see [`stri_encode`](stri_encode.md)), and/or splitting of text into lines (see [`stri_split_lines1`](stri_split_lines.md)) can be performed. - -## Value - -Returns a vector of type `raw`.
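The detection, conversion, and splitting steps can also be chained by hand. This is a sketch, assuming stringi is installed. The UTF-16LE file is created only for illustration. `stri_enc_detect()` returns heuristic guesses with confidence scores, so its output should be checked before being trusted.

```r
library(stringi)

f <- tempfile(fileext = ".txt")
txt <- "first line\nsecond line\r\nthird line"
writeBin(stri_encode(txt, from = "UTF-8", to = "UTF-16LE", to_raw = TRUE)[[1]], f)

raw_bytes <- stri_read_raw(f)            # the file's bytes, unconverted
print(stri_enc_detect(raw_bytes)[[1]])   # candidate encodings (heuristic guess)

utf8 <- stri_encode(raw_bytes, from = "UTF-16LE", to = "UTF-8")
lines <- stri_split_lines1(utf8)         # "\n" and "\r\n" both end a line
lines
unlink(f)
```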
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other files: [`stri_read_lines()`](stri_read_lines.md), [`stri_write_lines()`](stri_write_lines.md) diff --git a/.devel/sphinx/rapi/stri_remove_empty.md b/.devel/sphinx/rapi/stri_remove_empty.md deleted file mode 100644 index f0d36144..00000000 --- a/.devel/sphinx/rapi/stri_remove_empty.md +++ /dev/null @@ -1,85 +0,0 @@ -# stri_remove_empty: Remove All Empty Strings from a Character Vector - -## Description - -`stri_remove_empty` (alias `stri_omit_empty`) removes all empty strings from a character vector, and, if `na_empty` is `TRUE`, also gets rid of all missing values. - -`stri_remove_empty_na` (alias `stri_omit_empty_na`) removes both empty strings and missing values. - -`stri_remove_na` (alias `stri_omit_na`) returns a version of `x` with missing values removed. - -## Usage - -``` r -stri_remove_empty(x, na_empty = FALSE) - -stri_omit_empty(x, na_empty = FALSE) - -stri_remove_empty_na(x) - -stri_omit_empty_na(x) - -stri_remove_na(x) - -stri_omit_na(x) -``` - -## Arguments - -| | | -|------------|----------------------------------------------------| -| `x` | a character vector | -| `na_empty` | should missing values be treated as empty strings? | - -## Value - -Returns a character vector. 
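The three variants differ only in what they drop. This sketch, assuming stringi is installed, applies each of them to the same input.

```r
library(stringi)

x <- c("a", NA, "", "b")
r_na    <- stri_remove_na(x)        # drops NA, keeps the empty string
r_empty <- stri_remove_empty(x)     # drops the empty string, keeps NA
r_both  <- stri_remove_empty_na(x)  # drops both
list(r_na, r_empty, r_both)
```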
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other utils: [`stri_list2matrix()`](stri_list2matrix.md), [`stri_na2empty()`](stri_na2empty.md), [`stri_replace_na()`](stri_replace_na.md) - -## Examples - - - - -```r -stri_remove_empty(stri_na2empty(c('a', NA, '', 'b'))) -``` - -``` -## [1] "a" "b" -``` - -```r -stri_remove_empty(c('a', NA, '', 'b')) -``` - -``` -## [1] "a" NA "b" -``` - -```r -stri_remove_empty(c('a', NA, '', 'b'), TRUE) -``` - -``` -## [1] "a" "b" -``` - -```r -stri_omit_empty_na(c('a', NA, '', 'b')) -``` - -``` -## [1] "a" "b" -``` diff --git a/.devel/sphinx/rapi/stri_replace.md b/.devel/sphinx/rapi/stri_replace.md deleted file mode 100644 index ed55c2ab..00000000 --- a/.devel/sphinx/rapi/stri_replace.md +++ /dev/null @@ -1,296 +0,0 @@ -# stri_replace: Replace Pattern Occurrences - -## Description - -These functions replace, with the given replacement string, every/first/last substring of the input that matches the specified `pattern`. 
- -## Usage - -``` r -stri_replace_all(str, replacement, ..., regex, fixed, coll, charclass) - -stri_replace_first(str, replacement, ..., regex, fixed, coll, charclass) - -stri_replace_last(str, replacement, ..., regex, fixed, coll, charclass) - -stri_replace( - str, - replacement, - ..., - regex, - fixed, - coll, - charclass, - mode = c("first", "all", "last") -) - -stri_replace_all_charclass( - str, - pattern, - replacement, - merge = FALSE, - vectorize_all = TRUE, - vectorise_all = vectorize_all -) - -stri_replace_first_charclass(str, pattern, replacement) - -stri_replace_last_charclass(str, pattern, replacement) - -stri_replace_all_coll( - str, - pattern, - replacement, - vectorize_all = TRUE, - vectorise_all = vectorize_all, - ..., - opts_collator = NULL -) - -stri_replace_first_coll(str, pattern, replacement, ..., opts_collator = NULL) - -stri_replace_last_coll(str, pattern, replacement, ..., opts_collator = NULL) - -stri_replace_all_fixed( - str, - pattern, - replacement, - vectorize_all = TRUE, - vectorise_all = vectorize_all, - ..., - opts_fixed = NULL -) - -stri_replace_first_fixed(str, pattern, replacement, ..., opts_fixed = NULL) - -stri_replace_last_fixed(str, pattern, replacement, ..., opts_fixed = NULL) - -stri_replace_all_regex( - str, - pattern, - replacement, - vectorize_all = TRUE, - vectorise_all = vectorize_all, - ..., - opts_regex = NULL -) - -stri_replace_first_regex(str, pattern, replacement, ..., opts_regex = NULL) - -stri_replace_last_regex(str, pattern, replacement, ..., opts_regex = NULL) -``` - -## Arguments - -| | | -|--------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `str` | character vector; strings to search in | -| `replacement` | character vector with replacements for matched patterns | -| 
`...` | supplementary arguments passed to the underlying functions, including additional settings for `opts_collator`, `opts_regex`, `opts_fixed`, and so on | -| `mode` | single string; one of: `'first'` (the default), `'all'`, `'last'` | -| `pattern`, `regex`, `fixed`, `coll`, `charclass` | character vector; search patterns; for more details refer to [stringi-search](about_search.md) | -| `merge` | single logical value; should consecutive matches be merged into one string; `stri_replace_all_charclass` only | -| `vectorize_all` | single logical value; should each occurrence of a pattern in every string be replaced by a corresponding replacement string?; `stri_replace_all_*` only | -| `vectorise_all` | alias of `vectorize_all` | -| `opts_collator`, `opts_fixed`, `opts_regex` | a named list used to tune up the search engine\'s settings; see [`stri_opts_collator`](stri_opts_collator.md), [`stri_opts_fixed`](stri_opts_fixed.md), and [`stri_opts_regex`](stri_opts_regex.md), respectively; `NULL` for the defaults | - -## Details - -By default, all the functions are vectorized over `str`, `pattern`, `replacement` (with recycling of the elements in the shorter vector if necessary). Input that is not part of any match is left unchanged; each match is replaced in the result by the replacement string. - -However, for `stri_replace_all*`, if `vectorize_all` is `FALSE`, then each substring matching any of the supplied `pattern`s is replaced by a corresponding `replacement` string. In such a case, the vectorization is over `str`, and - independently - over `pattern` and `replacement`. In other words, this is equivalent to something like `for (i in 1:npatterns) str <- stri_replace_all(str, pattern[i], replacement[i])`. Note that you must set `length(pattern) >= length(replacement)`. - -In case of `stri_replace_*_regex`, the replacement string may contain references to capture groups (in round parentheses).
References are of the form `$n`, where `n` is the number of the capture group (`$1` denotes the first group). For the literal `$`, escape it with a backslash. Moreover, `${name}` are used for named capture groups. - -Note that `stri_replace_last_regex` searches from start to end, but skips overlapping matches, see the example below. - -`stri_replace`, `stri_replace_all`, `stri_replace_first`, and `stri_replace_last` are convenience functions; they just call `stri_replace_*_*` variants, depending on the arguments used. - -If you wish to remove white-spaces from the start or end of a string, see [`stri_trim`](stri_trim.md). - -## Value - -All the functions return a character vector. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other search_replace: [`about_search`](about_search.md), [`stri_replace_rstr()`](stri_replace_rstr.md), [`stri_trim_both()`](stri_trim.md) - -## Examples - - - - -```r -stri_replace_all_charclass('aaaa', '[a]', 'b', merge=c(TRUE, FALSE)) -``` - -``` -## Warning in stri_replace_all_charclass("aaaa", "[a]", "b", merge = c(TRUE, : -## argument `merge` should be a single logical value; only the first element is -## used -``` - -``` -## [1] "b" -``` - -```r -stri_replace_all_charclass('a\nb\tc d', '\\p{WHITE_SPACE}', ' ') -``` - -``` -## [1] "a b c d" -``` - -```r -stri_replace_all_charclass('a\nb\tc d', '\\p{WHITE_SPACE}', ' ', merge=TRUE) -``` - -``` -## [1] "a b c d" -``` - -```r -s <- 'Lorem ipsum dolor sit amet, consectetur adipisicing elit.' -stri_replace_all_fixed(s, ' ', '#') -``` - -``` -## [1] "Lorem#ipsum#dolor#sit#amet,#consectetur#adipisicing#elit." 
-``` - -```r -stri_replace_all_fixed(s, 'o', '0') -``` - -``` -## [1] "L0rem ipsum d0l0r sit amet, c0nsectetur adipisicing elit." -``` - -```r -stri_replace_all_fixed(c('1', 'NULL', '3'), 'NULL', NA) -``` - -``` -## [1] "1" NA "3" -``` - -```r -stri_replace_all_regex(s, ' .*? ', '#') -``` - -``` -## [1] "Lorem#dolor#amet,#adipisicing elit." -``` - -```r -stri_replace_all_regex(s, '(el|s)it', '1234') -``` - -``` -## [1] "Lorem ipsum dolor 1234 amet, consectetur adipisicing 1234." -``` - -```r -stri_replace_all_regex('abaca', 'a', c('!', '*')) -``` - -``` -## [1] "!b!c!" "*b*c*" -``` - -```r -stri_replace_all_regex('123|456|789', '(\\p{N}).(\\p{N})', '$2-$1') -``` - -``` -## [1] "3-1|6-4|9-7" -``` - -```r -stri_replace_all_regex(c('stringi R', 'REXAMINE', '123'), '( R|R.)', ' r ') -``` - -``` -## [1] "stringi r " " r XAMINE" "123" -``` - -```r -# named capture groups are available since ICU 55 -## Not run: -stri_replace_all_regex('words 123 and numbers 456', - '(?<numbers>[0-9]+)', '!${numbers}!') -``` - -``` -## [1] "words !123! and numbers !456!" -``` - -```r -## End(Not run) - -# Compare the results: -stri_replace_all_fixed('The quick brown fox jumped over the lazy dog.', - c('quick', 'brown', 'fox'), c('slow', 'black', 'bear'), vectorize_all=TRUE) -``` - -``` -## [1] "The slow brown fox jumped over the lazy dog." -## [2] "The quick black fox jumped over the lazy dog." -## [3] "The quick brown bear jumped over the lazy dog." -``` - -```r -stri_replace_all_fixed('The quick brown fox jumped over the lazy dog.', - c('quick', 'brown', 'fox'), c('slow', 'black', 'bear'), vectorize_all=FALSE) -``` - -``` -## [1] "The slow black bear jumped over the lazy dog." -``` - -```r -# Compare the results: -stri_replace_all_fixed('The quicker brown fox jumped over the lazy dog.', - c('quick', 'brown', 'fox'), c('slow', 'black', 'bear'), vectorize_all=FALSE) -``` - -``` -## [1] "The slower black bear jumped over the lazy dog."
-``` - -```r -stri_replace_all_regex('The quicker brown fox jumped over the lazy dog.', - '\\b'%s+%c('quick', 'brown', 'fox')%s+%'\\b', c('slow', 'black', 'bear'), vectorize_all=FALSE) -``` - -``` -## [1] "The quicker black bear jumped over the lazy dog." -``` - -```r -# Searching for the last occurrence: -# Note the difference - regex searches left to right, with no overlaps. -stri_replace_last_fixed("agAGA", "aga", "*", case_insensitive=TRUE) -``` - -``` -## [1] "ag*" -``` - -```r -stri_replace_last_regex("agAGA", "aga", "*", case_insensitive=TRUE) -``` - -``` -## [1] "*GA" -``` diff --git a/.devel/sphinx/rapi/stri_replace_na.md b/.devel/sphinx/rapi/stri_replace_na.md deleted file mode 100644 index a385396b..00000000 --- a/.devel/sphinx/rapi/stri_replace_na.md +++ /dev/null @@ -1,68 +0,0 @@ -# stri_replace_na: Replace Missing Values in a Character Vector - -## Description - -This function gives a convenient way to replace each missing (`NA`) value with a given string. - -## Usage - -``` r -stri_replace_na(str, replacement = "NA") -``` - -## Arguments - -| | | -|---------------|--------------------------------------------| -| `str` | character vector or an object coercible to | -| `replacement` | single string | - -## Details - -This function is roughly equivalent to `str2 <- stri_enc_toutf8(str); str2[is.na(str2)] <- stri_enc_toutf8(replacement); str2`. It may be used, e.g., wherever the \'plain R\' `NA` handling is desired, see Examples. - -## Value - -Returns a character vector. 
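The "roughly equivalent" formulation given in Details can be checked directly. This is a minimal sketch, assuming stringi is installed; the vector `x` and the replacement `'<missing>'` are illustrative choices. The UTF-8 conversion step is left out because it changes nothing for these ASCII inputs.

```r
library(stringi)

x <- c('test', NA, 'b')

# stringi helper: each NA is replaced with the given string
via_stringi <- stri_replace_na(x, replacement = '<missing>')

# Manual counterpart from Details (UTF-8 conversion omitted for ASCII input)
via_base <- x
via_base[is.na(via_base)] <- '<missing>'

print(via_stringi)
print(identical(via_stringi, via_base))

# With the default replacement, NA becomes the two-letter string "NA"
print(stri_replace_na(x))
```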
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other utils: [`stri_list2matrix()`](stri_list2matrix.md), [`stri_na2empty()`](stri_na2empty.md), [`stri_remove_empty()`](stri_remove_empty.md) - -## Examples - - - - -```r -x <- c('test', NA) -stri_paste(x, 1:2) # 'test1' NA -``` - -``` -## [1] "test1" NA -``` - -```r -paste(x, 1:2) # 'test 1' 'NA 2' -``` - -``` -## [1] "test 1" "NA 2" -``` - -```r -stri_paste(stri_replace_na(x), 1:2, sep=' ') # 'test 1' 'NA 2' -``` - -``` -## [1] "test 1" "NA 2" -``` diff --git a/.devel/sphinx/rapi/stri_replace_rstr.md b/.devel/sphinx/rapi/stri_replace_rstr.md deleted file mode 100644 index fcc11050..00000000 --- a/.devel/sphinx/rapi/stri_replace_rstr.md +++ /dev/null @@ -1,33 +0,0 @@ -# stri_replace_rstr: Convert gsub-Style Replacement Strings - -## Description - -Converts [`gsub`](https://stat.ethz.ch/R-manual/R-devel/library/base/help/gsub.html)-style replacement strings to ones that can be used in [`stri_replace`](stri_replace.md). In particular, `$` becomes `\$` and `\1` becomes `$1`. - -## Usage - -``` r -stri_replace_rstr(x) -``` - -## Arguments - -| | | -|-----|------------------| -| `x` | character vector | - -## Value - -Returns a character vector.
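To illustrate the conversion described above (`\1` to `$1`, `$` to `\$`), the sketch below translates a base-R replacement string and checks that `gsub` and `stri_replace_all_regex` then agree. It assumes stringi is installed; the inputs and the `(b)` pattern are illustrative only.

```r
library(stringi)

x <- c('abc', 'xbz')

# A gsub()-style replacement: \1 refers to the first capture group
r_style <- '<\\1>'

# The same replacement in stringi's (ICU) convention: $1 instead of \1
icu_style <- stri_replace_rstr(r_style)
print(icu_style)

base_result <- gsub('(b)', r_style, x)
stri_result <- stri_replace_all_regex(x, '(b)', icu_style)
print(identical(base_result, stri_result))

# A literal dollar sign gets escaped, so ICU does not read it as a reference
dollar <- stri_replace_all_regex('b', '(b)', stri_replace_rstr('\\1 costs $5'))
print(dollar)
```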
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other search_replace: [`about_search`](about_search.md), [`stri_replace_all()`](stri_replace.md), [`stri_trim_both()`](stri_trim.md) diff --git a/.devel/sphinx/rapi/stri_reverse.md b/.devel/sphinx/rapi/stri_reverse.md deleted file mode 100644 index b808fde0..00000000 --- a/.devel/sphinx/rapi/stri_reverse.md +++ /dev/null @@ -1,66 +0,0 @@ -# stri_reverse: Reverse Each String - -## Description - -Reverses the order of the code points in every string. - -## Usage - -``` r -stri_reverse(str) -``` - -## Arguments - -| | | -|-------|------------------| -| `str` | character vector | - -## Details - -Note that this operation may result in non-Unicode-normalized strings and may give peculiar outputs for bidirectional strings. - -See also [`stri_rand_shuffle`](stri_rand_shuffle.md) for a random permutation of code points. - -## Value - -Returns a character vector. 
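Because `stri_reverse` works on code points, a combining mark can end up attached to the wrong base character. The sketch below contrasts that with reversing grapheme clusters. It assumes stringi is installed. `reverse_graphemes` is a hypothetical helper written for this illustration and is not part of stringi. It does not handle empty strings, because `vapply` expects exactly one result per input.

```r
library(stringi)

# Hypothetical helper, not part of stringi: reverse the order of
# grapheme clusters (user-perceived characters) instead of code points
reverse_graphemes <- function(str) {
    pieces <- stri_split_boundaries(str, type = 'character')
    vapply(pieces, function(g) stri_flatten(rev(g)), character(1))
}

# NFD form of U+0105 followed by 'b': 'a', COMBINING OGONEK, 'b'
x <- stri_trans_nfd('\u0105b')

# Code-point reversal puts the combining mark right after the 'b'
print(stri_escape_unicode(stri_reverse(x)))

# Grapheme reversal keeps 'a' and its ogonek together
print(stri_escape_unicode(reverse_graphemes(x)))
```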
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -## Examples - - - - -```r -stri_reverse(c('123', 'abc d e f')) -``` - -``` -## [1] "321" "f e d cba" -``` - -```r -stri_reverse('ZXY (\u0105\u0104123$^).') -``` - -``` -## [1] ".)^$321Ąą( YXZ" -``` - -```r -stri_reverse(stri_trans_nfd('\u0105')) == stri_trans_nfd('\u0105') # A, ogonek -> agonek, A -``` - -``` -## [1] FALSE -``` diff --git a/.devel/sphinx/rapi/stri_sort.md b/.devel/sphinx/rapi/stri_sort.md deleted file mode 100644 index 19ed2596..00000000 --- a/.devel/sphinx/rapi/stri_sort.md +++ /dev/null @@ -1,103 +0,0 @@ -# stri_sort: String Sorting - -## Description - -This function sorts a character vector according to a locale-dependent lexicographic order. - -## Usage - -``` r -stri_sort(str, decreasing = FALSE, na_last = NA, ..., opts_collator = NULL) -``` - -## Arguments - -| | | -|-----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `str` | a character vector | -| `decreasing` | a single logical value; should the sort order be nondecreasing (`FALSE`, default, i.e., weakly increasing) or nonincreasing (`TRUE`)? | -| `na_last` | a single logical value; controls the treatment of `NA`s in `str`. 
If `TRUE`, then missing values in `str` are put at the end; if `FALSE`, they are put at the beginning; if `NA`, then they are removed from the output | -| `...` | additional settings for `opts_collator` | -| `opts_collator` | a named list with ICU Collator\'s options, see [`stri_opts_collator`](stri_opts_collator.md), `NULL` for default collation options | - -## Details - -For more information on ICU\'s Collator and how to tune it up in stringi, refer to [`stri_opts_collator`](stri_opts_collator.md). - -As usual in stringi, non-character inputs are coerced to strings, see an example below for a somewhat non-intuitive behavior of lexicographic sorting on numeric inputs. - -This function uses a stable sort algorithm (STL\'s `stable_sort`), which performs up to $N*log^2(N)$ element comparisons, where $N$ is the length of `str`. - -## Value - -The result is a sorted version of `str`, i.e., a character vector. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## References - -*Collation* - ICU User Guide, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other locale_sensitive: [`%s<%()`](+25s+3C+25.md), [`about_locale`](about_locale.md), [`about_search_boundaries`](about_search_boundaries.md), [`about_search_coll`](about_search_coll.md), [`stri_compare()`](stri_compare.md), [`stri_count_boundaries()`](stri_count_boundaries.md), [`stri_duplicated()`](stri_duplicated.md), [`stri_enc_detect2()`](stri_enc_detect2.md), [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_opts_collator()`](stri_opts_collator.md), [`stri_order()`](stri_order.md), [`stri_rank()`](stri_rank.md), [`stri_sort_key()`](stri_sort_key.md), 
[`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_trans_tolower()`](stri_trans_casemap.md), [`stri_unique()`](stri_unique.md), [`stri_wrap()`](stri_wrap.md) - -## Examples - - - - -```r -stri_sort(c('hladny', 'chladny'), locale='pl_PL') -``` - -``` -## [1] "chladny" "hladny" -``` - -```r -stri_sort(c('hladny', 'chladny'), locale='sk_SK') -``` - -``` -## [1] "hladny" "chladny" -``` - -```r -stri_sort(sample(LETTERS)) -``` - -``` -## [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" -## [20] "T" "U" "V" "W" "X" "Y" "Z" -``` - -```r -stri_sort(c(1, 100, 2, 101, 11, 10)) # lexicographic order -``` - -``` -## [1] "1" "10" "100" "101" "11" "2" -``` - -```r -stri_sort(c(1, 100, 2, 101, 11, 10), numeric=TRUE) # OK for integers -``` - -``` -## [1] "1" "2" "10" "11" "100" "101" -``` - -```r -stri_sort(c(0.25, 0.5, 1, -1, -2, -3), numeric=TRUE) # incorrect -``` - -``` -## [1] "-1" "-2" "-3" "0.5" "0.25" "1" -``` diff --git a/.devel/sphinx/rapi/stri_sort_key.md b/.devel/sphinx/rapi/stri_sort_key.md deleted file mode 100644 index d4ccb531..00000000 --- a/.devel/sphinx/rapi/stri_sort_key.md +++ /dev/null @@ -1,66 +0,0 @@ -# stri_sort_key: Sort Keys - -## Description - -This function computes a locale-dependent sort key, which is an alternative character representation of the string that, when ordered in the C locale (which orders using the underlying bytes directly), will give an equivalent ordering to the original string. It is useful for enhancing algorithms that sort only in the C locale (e.g., the `strcmp` function in libc) with the ability to be locale-aware. 
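The property described above means that comparing sort keys byte by byte, as `strcmp` would in the C locale, should agree with the locale-aware collator. This is a minimal sketch, assuming stringi is installed with its bundled ICU locale data. `bytes_less` is a hypothetical helper written for this illustration, and the strings are the ones used in the Examples.

```r
library(stringi)

x <- c('hladny', 'chladny')

# Hypothetical helper, not part of stringi: compare two strings byte by
# byte, i.e., the way strcmp() orders them in the C locale
bytes_less <- function(a, b) {
    ra <- as.integer(charToRaw(a))
    rb <- as.integer(charToRaw(b))
    n <- min(length(ra), length(rb))
    for (i in seq_len(n)) {
        if (ra[i] != rb[i]) return(ra[i] < rb[i])
    }
    length(ra) < length(rb)  # a proper prefix sorts first
}

for (loc in c('pl_PL', 'sk_SK')) {
    keys <- stri_sort_key(x, locale = loc)
    # For each locale, both logical values printed below should agree
    cat(loc, ': keys ', bytes_less(keys[1], keys[2]),
        ', collator ', stri_cmp_lt(x[1], x[2], locale = loc), '\n', sep = '')
}
```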
- -## Usage - -``` r -stri_sort_key(str, ..., opts_collator = NULL) -``` - -## Arguments - -| | | -|-----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `str` | a character vector | -| `...` | additional settings for `opts_collator` | -| `opts_collator` | a named list with ICU Collator\'s options, see [`stri_opts_collator`](stri_opts_collator.md), `NULL` for default collation options | - -## Details - -For more information on ICU\'s Collator and how to tune it up in stringi, refer to [`stri_opts_collator`](stri_opts_collator.md). - -See also [`stri_rank`](stri_rank.md) for ranking strings with a single character vector, i.e., generating relative sort keys. - -## Value - -The result is a character vector with the same length as `str` that contains the sort keys. The output is marked as `bytes`-encoded. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## References - -*Collation* - ICU User Guide, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other locale_sensitive: [`%s<%()`](+25s+3C+25.md), [`about_locale`](about_locale.md), [`about_search_boundaries`](about_search_boundaries.md), [`about_search_coll`](about_search_coll.md), [`stri_compare()`](stri_compare.md), [`stri_count_boundaries()`](stri_count_boundaries.md), [`stri_duplicated()`](stri_duplicated.md), [`stri_enc_detect2()`](stri_enc_detect2.md), [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_opts_collator()`](stri_opts_collator.md), [`stri_order()`](stri_order.md), [`stri_rank()`](stri_rank.md), [`stri_sort()`](stri_sort.md), 
[`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_trans_tolower()`](stri_trans_casemap.md), [`stri_unique()`](stri_unique.md), [`stri_wrap()`](stri_wrap.md) - -## Examples - - - - -```r -stri_sort_key(c('hladny', 'chladny'), locale='pl_PL') -``` - -``` -## [1] "8@*0DZ\001\n\001\n" ".8@*0DZ\001\v\001\v" -``` - -```r -stri_sort_key(c('hladny', 'chladny'), locale='sk_SK') -``` - -``` -## [1] "8@*0DZ\001\n\001\n" "9\002@*0DZ\001\n\001\n" -``` diff --git a/.devel/sphinx/rapi/stri_split.md b/.devel/sphinx/rapi/stri_split.md deleted file mode 100644 index 479350b8..00000000 --- a/.devel/sphinx/rapi/stri_split.md +++ /dev/null @@ -1,331 +0,0 @@ -# stri_split: Split a String By Pattern Matches - -## Description - -These functions split each element in `str` into substrings. `pattern` defines the delimiters that separate the inputs into tokens. The input data between the matches become the fields themselves. - -## Usage - -``` r -stri_split(str, ..., regex, fixed, coll, charclass) - -stri_split_fixed( - str, - pattern, - n = -1L, - omit_empty = FALSE, - tokens_only = FALSE, - simplify = FALSE, - ..., - opts_fixed = NULL -) - -stri_split_regex( - str, - pattern, - n = -1L, - omit_empty = FALSE, - tokens_only = FALSE, - simplify = FALSE, - ..., - opts_regex = NULL -) - -stri_split_coll( - str, - pattern, - n = -1L, - omit_empty = FALSE, - tokens_only = FALSE, - simplify = FALSE, - ..., - opts_collator = NULL -) - -stri_split_charclass( - str, - pattern, - n = -1L, - omit_empty = FALSE, - tokens_only = FALSE, - simplify = FALSE -) -``` - -## Arguments - -| | | -|--------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `str` | character vector; strings to search in | -| `...` | supplementary arguments passed to the underlying functions, 
including additional settings for `opts_collator`, `opts_regex`, `opts_fixed`, and so on | -| `pattern`, `regex`, `fixed`, `coll`, `charclass` | character vector; search patterns; for more details refer to [stringi-search](about_search.md) | -| `n` | integer vector, maximal number of strings to return, and, at the same time, maximal number of text boundaries to look for | -| `omit_empty` | logical vector; determines whether empty tokens should be removed from the result (`TRUE` or `FALSE`) or replaced with `NA`s (`NA`) | -| `tokens_only` | single logical value; may affect the result if `n` is positive, see Details | -| `simplify` | single logical value; if `TRUE` or `NA`, then a character matrix is returned; otherwise (the default), a list of character vectors is given, see Value | -| `opts_collator`, `opts_fixed`, `opts_regex` | a named list used to tune up the search engine\'s settings; see [`stri_opts_collator`](stri_opts_collator.md), [`stri_opts_fixed`](stri_opts_fixed.md), and [`stri_opts_regex`](stri_opts_regex.md), respectively; `NULL` for the defaults | - -## Details - -Vectorized over `str`, `pattern`, `n`, and `omit_empty` (with recycling of the elements in the shorter vector if necessary). - -If `n` is negative, then all pieces are extracted. Otherwise, if `tokens_only` is `FALSE` (which is the default), then `n-1` tokens are extracted (if possible) and the `n`-th string gives the remainder (see Examples). On the other hand, if `tokens_only` is `TRUE`, then only full tokens (up to `n` pieces) are extracted. - -`omit_empty` is applied during the split process: if it is set to `TRUE`, then tokens of zero length are ignored. Thus, empty strings will never appear in the resulting vector. On the other hand, if `omit_empty` is `NA`, then empty tokens are substituted with missing strings. - -Empty search patterns are not supported. 
If you wish to split a string into individual characters, use, e.g., [`stri_split_boundaries(str, type='character')`](stri_split_boundaries.md) for THE Unicode way. - -`stri_split` is a convenience function. It calls either `stri_split_regex`, `stri_split_fixed`, `stri_split_coll`, or `stri_split_charclass`, depending on the argument used. - -## Value - -If `simplify=FALSE` (the default), then the functions return a list of character vectors. - -Otherwise, [`stri_list2matrix`](stri_list2matrix.md) with `byrow=TRUE` and `n_min=n` arguments is called on the resulting object. In such a case, a character matrix with an appropriate number of rows (according to the length of `str`, `pattern`, etc.) is returned. Note that [`stri_list2matrix`](stri_list2matrix.md)\'s `fill` argument is set to an empty string and `NA`, for `simplify` equal to `TRUE` and `NA`, respectively. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other search_split: [`about_search`](about_search.md), [`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_split_lines()`](stri_split_lines.md) - -## Examples - - - - -```r -stri_split_fixed('a_b_c_d', '_') -``` - -``` -## [[1]] -## [1] "a" "b" "c" "d" -``` - -```r -stri_split_fixed('a_b_c__d', '_') -``` - -``` -## [[1]] -## [1] "a" "b" "c" "" "d" -``` - -```r -stri_split_fixed('a_b_c__d', '_', omit_empty=TRUE) -``` - -``` -## [[1]] -## [1] "a" "b" "c" "d" -``` - -```r -stri_split_fixed('a_b_c__d', '_', n=2, tokens_only=FALSE) # 'a' & remainder -``` - -``` -## [[1]] -## [1] "a" "b_c__d" -``` - -```r -stri_split_fixed('a_b_c__d', '_', n=2, tokens_only=TRUE) # 'a' & 'b' only -``` - -``` -## [[1]] -## [1] "a" "b" -``` - -```r 
-stri_split_fixed('a_b_c__d', '_', n=4, omit_empty=TRUE, tokens_only=TRUE) -``` - -``` -## [[1]] -## [1] "a" "b" "c" "d" -``` - -```r -stri_split_fixed('a_b_c__d', '_', n=4, omit_empty=FALSE, tokens_only=TRUE) -``` - -``` -## [[1]] -## [1] "a" "b" "c" "" -``` - -```r -stri_split_fixed('a_b_c__d', '_', omit_empty=NA) -``` - -``` -## [[1]] -## [1] "a" "b" "c" NA "d" -``` - -```r -stri_split_fixed(c('ab_c', 'd_ef_g', 'h', ''), '_', n=1, tokens_only=TRUE, omit_empty=TRUE) -``` - -``` -## [[1]] -## [1] "ab" -## -## [[2]] -## [1] "d" -## -## [[3]] -## [1] "h" -## -## [[4]] -## character(0) -``` - -```r -stri_split_fixed(c('ab_c', 'd_ef_g', 'h', ''), '_', n=2, tokens_only=TRUE, omit_empty=TRUE) -``` - -``` -## [[1]] -## [1] "ab" "c" -## -## [[2]] -## [1] "d" "ef" -## -## [[3]] -## [1] "h" -## -## [[4]] -## character(0) -``` - -```r -stri_split_fixed(c('ab_c', 'd_ef_g', 'h', ''), '_', n=3, tokens_only=TRUE, omit_empty=TRUE) -``` - -``` -## [[1]] -## [1] "ab" "c" -## -## [[2]] -## [1] "d" "ef" "g" -## -## [[3]] -## [1] "h" -## -## [[4]] -## character(0) -``` - -```r -stri_list2matrix(stri_split_fixed(c('ab,c', 'd,ef,g', ',h', ''), ',', omit_empty=TRUE)) -``` - -``` -## [,1] [,2] [,3] [,4] -## [1,] "ab" "d" "h" NA -## [2,] "c" "ef" NA NA -## [3,] NA "g" NA NA -``` - -```r -stri_split_fixed(c('ab,c', 'd,ef,g', ',h', ''), ',', omit_empty=FALSE, simplify=TRUE) -``` - -``` -## [,1] [,2] [,3] -## [1,] "ab" "c" "" -## [2,] "d" "ef" "g" -## [3,] "" "h" "" -## [4,] "" "" "" -``` - -```r -stri_split_fixed(c('ab,c', 'd,ef,g', ',h', ''), ',', omit_empty=NA, simplify=TRUE) -``` - -``` -## [,1] [,2] [,3] -## [1,] "ab" "c" "" -## [2,] "d" "ef" "g" -## [3,] NA "h" "" -## [4,] NA "" "" -``` - -```r -stri_split_fixed(c('ab,c', 'd,ef,g', ',h', ''), ',', omit_empty=TRUE, simplify=TRUE) -``` - -``` -## [,1] [,2] [,3] -## [1,] "ab" "c" "" -## [2,] "d" "ef" "g" -## [3,] "h" "" "" -## [4,] "" "" "" -``` - -```r -stri_split_fixed(c('ab,c', 'd,ef,g', ',h', ''), ',', omit_empty=NA, simplify=NA) -``` 
- -``` -## [,1] [,2] [,3] -## [1,] "ab" "c" NA -## [2,] "d" "ef" "g" -## [3,] NA "h" NA -## [4,] NA NA NA -``` - -```r -stri_split_regex(c('ab,c', 'd,ef , g', ', h', ''), - '\\p{WHITE_SPACE}*,\\p{WHITE_SPACE}*', omit_empty=NA, simplify=TRUE) -``` - -``` -## [,1] [,2] [,3] -## [1,] "ab" "c" "" -## [2,] "d" "ef" "g" -## [3,] NA "h" "" -## [4,] NA "" "" -``` - -```r -stri_split_charclass('Lorem ipsum dolor sit amet', '\\p{WHITE_SPACE}') -``` - -``` -## [[1]] -## [1] "Lorem" "ipsum" "dolor" "sit" "amet" -``` - -```r -stri_split_charclass(' Lorem ipsum dolor', '\\p{WHITE_SPACE}', n=3, - omit_empty=c(FALSE, TRUE)) -``` - -``` -## [[1]] -## [1] "" "Lorem" " ipsum dolor" -## -## [[2]] -## [1] "Lorem" "ipsum" "dolor" -``` - -```r -stri_split_regex('Lorem ipsum dolor sit amet', - '\\p{Z}+') # see also stri_split_charclass -``` - -``` -## [[1]] -## [1] "Lorem" "ipsum" "dolor" "sit" "amet" -``` diff --git a/.devel/sphinx/rapi/stri_split_boundaries.md b/.devel/sphinx/rapi/stri_split_boundaries.md deleted file mode 100644 index 6f5609e8..00000000 --- a/.devel/sphinx/rapi/stri_split_boundaries.md +++ /dev/null @@ -1,172 +0,0 @@ -# stri_split_boundaries: Split a String at Text Boundaries - -## Description - -This function locates text boundaries (like character, word, line, or sentence boundaries) and splits strings at the indicated positions. 
- -## Usage - -``` r -stri_split_boundaries( - str, - n = -1L, - tokens_only = FALSE, - simplify = FALSE, - ..., - opts_brkiter = NULL -) -``` - -## Arguments - -| | | -|----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `str` | character vector or an object coercible to | -| `n` | integer vector, maximal number of strings to return | -| `tokens_only` | single logical value; may affect the result if `n` is positive, see Details | -| `simplify` | single logical value; if `TRUE` or `NA`, then a character matrix is returned; otherwise (the default), a list of character vectors is given, see Value | -| `...` | additional settings for `opts_brkiter` | -| `opts_brkiter` | a named list with ICU BreakIterator\'s settings, see [`stri_opts_brkiter`](stri_opts_brkiter.md); `NULL` for the default break iterator, i.e., `line_break` | - -## Details - -Vectorized over `str` and `n`. - -If `n` is negative (the default), then all text pieces are extracted. - -Otherwise, if `tokens_only` is `FALSE` (which is the default), then `n-1` tokens are extracted (if possible) and the `n`-th string gives the (non-split) remainder (see Examples). On the other hand, if `tokens_only` is `TRUE`, then only full tokens (up to `n` pieces) are extracted. - -For more information on text boundary analysis performed by ICU\'s `BreakIterator`, see [stringi-search-boundaries](about_search_boundaries.md). - -## Value - -If `simplify=FALSE` (the default), then the functions return a list of character vectors. - -Otherwise, [`stri_list2matrix`](stri_list2matrix.md) with `byrow=TRUE` and `n_min=n` arguments is called on the resulting object. In such a case, a character matrix with `length(str)` rows is returned. 
Note that [`stri_list2matrix`](stri_list2matrix.md)\'s `fill` argument is set to an empty string and `NA`, for `simplify` equal to `TRUE` and `NA`, respectively. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other search_split: [`about_search`](about_search.md), [`stri_split_lines()`](stri_split_lines.md), [`stri_split()`](stri_split.md) - -Other locale_sensitive: [`%s<%()`](+25s+3C+25.md), [`about_locale`](about_locale.md), [`about_search_boundaries`](about_search_boundaries.md), [`about_search_coll`](about_search_coll.md), [`stri_compare()`](stri_compare.md), [`stri_count_boundaries()`](stri_count_boundaries.md), [`stri_duplicated()`](stri_duplicated.md), [`stri_enc_detect2()`](stri_enc_detect2.md), [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_opts_collator()`](stri_opts_collator.md), [`stri_order()`](stri_order.md), [`stri_rank()`](stri_rank.md), [`stri_sort_key()`](stri_sort_key.md), [`stri_sort()`](stri_sort.md), [`stri_trans_tolower()`](stri_trans_casemap.md), [`stri_unique()`](stri_unique.md), [`stri_wrap()`](stri_wrap.md) - -Other text_boundaries: [`about_search_boundaries`](about_search_boundaries.md), [`about_search`](about_search.md), [`stri_count_boundaries()`](stri_count_boundaries.md), [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_opts_brkiter()`](stri_opts_brkiter.md), [`stri_split_lines()`](stri_split_lines.md), [`stri_trans_tolower()`](stri_trans_casemap.md), [`stri_wrap()`](stri_wrap.md) - -## Examples - - - - -```r -test <- 'The\u00a0above-mentioned features are very useful. 
' %s+% - 'Spam, spam, eggs, bacon, and spam. 123 456 789' -stri_split_boundaries(test, type='line') -``` - -``` -## [[1]] -## [1] "The above-" "mentioned " "features " "are " -## [5] "very " "useful. " "Spam, " "spam, " -## [9] "eggs, " "bacon, " "and " "spam. " -## [13] "123 " "456 " "789" -``` - -```r -stri_split_boundaries(test, type='word') -``` - -``` -## [[1]] -## [1] "The" " " "above" "-" "mentioned" " " -## [7] "features" " " "are" " " "very" " " -## [13] "useful" "." " " "Spam" "," " " -## [19] "spam" "," " " "eggs" "," " " -## [25] "bacon" "," " " "and" " " "spam" -## [31] "." " " "123" " " "456" " " -## [37] "789" -``` - -```r -stri_split_boundaries(test, type='word', skip_word_none=TRUE) -``` - -``` -## [[1]] -## [1] "The" "above" "mentioned" "features" "are" "very" -## [7] "useful" "Spam" "spam" "eggs" "bacon" "and" -## [13] "spam" "123" "456" "789" -``` - -```r -stri_split_boundaries(test, type='word', skip_word_none=TRUE, skip_word_letter=TRUE) -``` - -``` -## [[1]] -## [1] "123" "456" "789" -``` - -```r -stri_split_boundaries(test, type='word', skip_word_none=TRUE, skip_word_number=TRUE) -``` - -``` -## [[1]] -## [1] "The" "above" "mentioned" "features" "are" "very" -## [7] "useful" "Spam" "spam" "eggs" "bacon" "and" -## [13] "spam" -``` - -```r -stri_split_boundaries(test, type='sentence') -``` - -``` -## [[1]] -## [1] "The above-mentioned features are very useful. " -## [2] "Spam, spam, eggs, bacon, and spam. " -## [3] "123 456 789" -``` - -```r -stri_split_boundaries(test, type='sentence', skip_sentence_sep=TRUE) -``` - -``` -## [[1]] -## [1] "The above-mentioned features are very useful. " -## [2] "Spam, spam, eggs, bacon, and spam. " -``` - -```r -stri_split_boundaries(test, type='character') -``` - -``` -## [[1]] -## [1] "T" "h" "e" " " "a" "b" "o" "v" "e" "-" "m" "e" "n" "t" "i" "o" "n" "e" "d" -## [20] " " " " " " " " "f" "e" "a" "t" "u" "r" "e" "s" " " "a" "r" "e" " " "v" "e" -## [39] "r" "y" " " "u" "s" "e" "f" "u" "l" "." 
" " "S" "p" "a" "m" "," " " "s" "p" -## [58] "a" "m" "," " " "e" "g" "g" "s" "," " " "b" "a" "c" "o" "n" "," " " "a" "n" -## [77] "d" " " "s" "p" "a" "m" "." " " "1" "2" "3" " " "4" "5" "6" " " "7" "8" "9" -``` - -```r -# a filtered break iterator with the new ICU: -stri_split_boundaries('Mr. Jones and Mrs. Brown are very happy. -So am I, Prof. Smith.', type='sentence', locale='en_US@ss=standard') # ICU >= 56 only -``` - -``` -## [[1]] -## [1] "Mr. Jones and Mrs. Brown are very happy.\n" -## [2] "So am I, Prof. Smith." -``` diff --git a/.devel/sphinx/rapi/stri_split_lines.md b/.devel/sphinx/rapi/stri_split_lines.md deleted file mode 100644 index c365720a..00000000 --- a/.devel/sphinx/rapi/stri_split_lines.md +++ /dev/null @@ -1,56 +0,0 @@ -# stri_split_lines: Split a String Into Text Lines - -## Description - -These functions split each character string in a given vector into text lines. - -## Usage - -``` r -stri_split_lines(str, omit_empty = FALSE) - -stri_split_lines1(str) -``` - -## Arguments - -| | | -|--------------|----------------------------------------------------------------------------------------------------------------| -| `str` | character vector (`stri_split_lines`) or a single string (`stri_split_lines1`) | -| `omit_empty` | logical vector; determines whether empty strings should be removed from the result \[`stri_split_lines` only\] | - -## Details - -Vectorized over `str` and `omit_empty`. - -`omit_empty` is applied when splitting. If set to `TRUE`, then empty strings will never appear in the resulting vector. - -Newlines are represented with the Carriage Return (CR, 0x0D), Line Feed (LF, 0x0A), CRLF, or Next Line (NEL, 0x85) characters, depending on the platform. Moreover, the Unicode Standard defines two unambiguous separator characters, the Paragraph Separator (PS, 0x2029) and the Line Separator (LS, 0x2028). Sometimes also the Vertical Tab (VT, 0x0B) and the Form Feed (FF, 0x0C) are used for this purpose. 
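A minimal sketch, assuming stringi is installed: one string mixing several of the newline conventions listed above (CRLF, LF, CR, NEL, and the Line Separator) is split into lines, and `omit_empty` is toggled. The variable names are illustrative only.

```r
library(stringi)

# One string mixing CRLF, LF, CR, NEL (U+0085) and LS (U+2028) separators;
# the CRLF pair counts as a single separator
mixed <- "a\r\nb\nc\rd\u0085e\u2028f"
parts <- stri_split_lines1(mixed)
print(parts)  # "a" "b" "c" "d" "e" "f"

# Consecutive newlines yield empty lines unless omit_empty = TRUE
with_empty    <- stri_split_lines("a\n\nb")[[1]]                     # "a" ""  "b"
without_empty <- stri_split_lines("a\n\nb", omit_empty = TRUE)[[1]]  # "a" "b"
```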
- -These stringi functions follow UTR#18 rules, where a newline sequence corresponds to the following regular expression: `(?:\u{D A}|(?!\u{D A})[\u{A}-\u{D}\u{85}\u{2028}\u{2029}])`. Each match serves as a text line separator. - -## Value - -`stri_split_lines` returns a list of character vectors. If any input string is `NA`, then the corresponding list element is a single `NA` string. - -`stri_split_lines1(str)` is equivalent to `stri_split_lines(str[1])[[1]]` (with default parameters), therefore it returns a character vector. Moreover, if the input string ends with a newline sequence, the trailing empty string is omitted from the result. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## References - -*Unicode Newline Guidelines* -- Unicode Technical Report #13, - -*Unicode Regular Expressions* -- Unicode Technical Standard #18, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other search_split: [`about_search`](about_search.md), [`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_split()`](stri_split.md) - -Other text_boundaries: [`about_search_boundaries`](about_search_boundaries.md), [`about_search`](about_search.md), [`stri_count_boundaries()`](stri_count_boundaries.md), [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_opts_brkiter()`](stri_opts_brkiter.md), [`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_trans_tolower()`](stri_trans_casemap.md), [`stri_wrap()`](stri_wrap.md) diff --git a/.devel/sphinx/rapi/stri_sprintf.md b/.devel/sphinx/rapi/stri_sprintf.md deleted file mode 100644 index 8e98e364..00000000 --- a/.devel/sphinx/rapi/stri_sprintf.md +++ /dev/null @@ -1,233 +0,0 @@
-# stri_sprintf: Format Strings - -## Description - -`stri_sprintf` (synonym: `stri_string_format`) is a Unicode-aware replacement for and enhancement of the built-in [`sprintf`](https://stat.ethz.ch/R-manual/R-devel/library/base/help/sprintf.html) function. Moreover, `stri_printf` prints formatted strings. - -## Usage - -``` r -stri_sprintf( - format, - ..., - na_string = NA_character_, - inf_string = "Inf", - nan_string = "NaN", - use_length = FALSE -) - -stri_string_format( - format, - ..., - na_string = NA_character_, - inf_string = "Inf", - nan_string = "NaN", - use_length = FALSE -) - -stri_printf( - format, - ..., - file = "", - sep = "\n", - append = FALSE, - na_string = "NA", - inf_string = "Inf", - nan_string = "NaN", - use_length = FALSE -) -``` - -## Arguments - -| | | -|--------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `format` | character vector of format strings | -| `...` | vectors (coercible to integer, real, or character) | -| `na_string` | single string to represent missing values; if `NA`, missing values in `...` result in the corresponding outputs being missing too; use `"NA"` for compatibility with base R | -| `inf_string` | single string to represent the (unsigned) infinity (`NA` allowed) | -| `nan_string` | single string to represent the not-a-number (`NA` allowed) | -| `use_length` | single logical value; should the number of code points be used when applying modifiers such as `%20s` instead of the total code point width? | -| `file` | see [`cat`](https://stat.ethz.ch/R-manual/R-devel/library/base/help/cat.html) | -| `sep` | see [`cat`](https://stat.ethz.ch/R-manual/R-devel/library/base/help/cat.html) | -| `append` | see [`cat`](https://stat.ethz.ch/R-manual/R-devel/library/base/help/cat.html) | - -## Details - -Vectorized over `format` and all vectors passed via `...`.
- -Unicode code points may have various widths when printed on the console (compare [`stri_width`](stri_width.md)). These functions, by default (see the `use_length` argument), take this into account. - -These functions are not locale-sensitive. For instance, numbers are always formatted in the \"POSIX\" style, e.g., `-123456.789` (no thousands separator, dot as a fractional separator). Such a feature might be added at a later date, though. - -All arguments passed via `...` are evaluated. If some of them are unused, a warning is generated. Too few arguments result in an error. - -Note that `stri_printf` treats missing values in `...` as `"NA"` strings by default. - -All format specifiers supported by [`sprintf`](https://stat.ethz.ch/R-manual/R-devel/library/base/help/sprintf.html) are also available here. For the formatting of integers and floating-point values, currently the system `std::snprintf()` is called, but this may change in the future. Format specifiers are normalized and necessary sanity checks are performed. - -Supported conversion specifiers: `dioxX` (integers), `feEgGaA` (floats), and `s` (character strings). Supported flags: `-` (left-align), `+` (force output sign or blank when `NaN` or `NA`; numeric only), `' '` (a space: output minus or space for a sign; numeric only), `0` (pad with 0s; numeric only), `#` (alternative output of some numerics).
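A short sketch of the flags and of width-aware padding, assuming stringi is installed. The inputs are arbitrary. U+4E2D is used only because it is an East Asian wide character that occupies two console columns.

```r
library(stringi)

# Each flag applied to the integer 42 (and 255 for the '#' alternative form)
flags <- stri_sprintf("%+d|% d|%05d|%-5d|%#x", 42L, 42L, 42L, 42L, 255L)
# "+42| 42|00042|42   |0xff"

# Vectorization: the format and the arguments are recycled element-wise
pairs <- stri_sprintf("%s=%d", c("a", "b", "c"), 1:3)  # "a=1" "b=2" "c=3"

# By default, padding counts display width (2 for U+4E2D), not code points
by_width  <- stri_sprintf("[%4s]", "\u4e2d")                     # two pad spaces
by_length <- stri_sprintf("[%4s]", "\u4e2d", use_length = TRUE)  # three pad spaces
```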
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## References - -`printf` in `glibc`, - -`printf` format strings -- Wikipedia, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other length: [`%s$%()`](+25s+24+25.md), [`stri_isempty()`](stri_isempty.md), [`stri_length()`](stri_length.md), [`stri_numbytes()`](stri_numbytes.md), [`stri_pad_both()`](stri_pad.md), [`stri_width()`](stri_width.md) - -## Examples - - - - -```r -stri_printf("%4s=%.3f", c("e", "e\u00b2", "\u03c0", "\u03c0\u00b2"), - c(exp(1), exp(2), pi, pi^2)) -``` - -``` -## e=2.718 -## e²=7.389 -## π=3.142 -## π²=9.870 -``` - -```r -x <- c( - "xxabcd", - "xx\u0105\u0106\u0107\u0108", - stri_paste( - "\u200b\u200b\u200b\u200b", - "\U0001F3F4\U000E0067\U000E0062\U000E0073\U000E0063\U000E0074\U000E007F", - "abcd" - )) -stri_printf("[%10s]", x) # minimum width = 10 -``` - -``` -## [ xxabcd] -## [ xxąĆćĈ] -## [ ​​​​🏴󠁧󠁢󠁳󠁣󠁴󠁿abcd] -``` - -```r -stri_printf("[%-10.3s]", x) # output of max width = 3, but pad to width of 10 -``` - -``` -## [xxa ] -## [xxą ] -## [​​​​🏴󠁧󠁢󠁳󠁣󠁴󠁿a ] -``` - -```r -stri_printf("[%10s]", x, use_length=TRUE) # minimum number of Unicode code points = 10 -``` - -``` -## [ xxabcd] -## [ xxąĆćĈ] -## [​​​​🏴󠁧󠁢󠁳󠁣󠁴󠁿abcd] -``` - -```r -# vectorization wrt all arguments: -p <- runif(10) -stri_sprintf(ifelse(p > 0.5, "P(Y=1)=%1$.2f", "P(Y=0)=%2$.2f"), p, 1-p) -``` - -``` -## [1] "P(Y=0)=0.71" "P(Y=1)=0.79" "P(Y=0)=0.59" "P(Y=1)=0.88" "P(Y=1)=0.94" -## [6] "P(Y=0)=0.95" "P(Y=1)=0.53" "P(Y=1)=0.89" "P(Y=1)=0.55" "P(Y=0)=0.54" -``` - -```r -# using a "preformatted" logical vector: -x <- c(TRUE, FALSE, FALSE, NA, TRUE, FALSE) -stri_sprintf("%s) %s", letters[seq_along(x)], c("\u2718", "\u2713")[x+1]) -``` - -``` -## [1] "a) ✓" "b) ✘" "c) ✘" NA "e) ✓" "f) 
✘" -``` - -```r -# custom NA/Inf/NaN strings: -stri_printf("%+10.3f", c(-Inf, -0, 0, Inf, NaN, NA_real_), - na_string="", nan_string="\U0001F4A9", inf_string="\u221E") -``` - -``` -## -∞ -## -0.000 -## +0.000 -## +∞ -## 💩 -## -``` - -```r -stri_sprintf("UNIX time %1$f is %1$s.", Sys.time()) -``` - -``` -## [1] "UNIX time 1699156720.181697 is 2023-11-05 14:58:40.181697." -``` - -```r -# the following do not work in sprintf() -stri_sprintf("%1$#- *2$.*3$f", 1.23456, 10, 3) # two asterisks -``` - -``` -## [1] " 1.235 " -``` - -```r -stri_sprintf(c("%s", "%f"), pi) # re-coercion needed -``` - -``` -## [1] "3.14159265358979" "3.141593" -``` - -```r -stri_sprintf("%1$s is %1$f UNIX time.", Sys.time()) # re-coercion needed -``` - -``` -## [1] "2023-11-05 14:58:40.183604 is 1699156720.183604 UNIX time." -``` - -```r -stri_sprintf(c("%d", "%s"), factor(11:12)) # re-coercion needed -``` - -``` -## [1] "1" "12" -``` - -```r -stri_sprintf(c("%s", "%d"), factor(11:12)) # re-coercion needed -``` - -``` -## [1] "11" "2" -``` diff --git a/.devel/sphinx/rapi/stri_startsendswith.md b/.devel/sphinx/rapi/stri_startsendswith.md deleted file mode 100644 index 01b2f8c3..00000000 --- a/.devel/sphinx/rapi/stri_startsendswith.md +++ /dev/null @@ -1,162 +0,0 @@ -# stri_startsendswith: Determine if the Start or End of a String Matches a Pattern - -## Description - -These functions check if a string starts or ends with a match to a given pattern. Also, it is possible to check if there is a match at a specific position. 
- -## Usage - -``` r -stri_startswith(str, ..., fixed, coll, charclass) - -stri_endswith(str, ..., fixed, coll, charclass) - -stri_startswith_fixed( - str, - pattern, - from = 1L, - negate = FALSE, - ..., - opts_fixed = NULL -) - -stri_endswith_fixed( - str, - pattern, - to = -1L, - negate = FALSE, - ..., - opts_fixed = NULL -) - -stri_startswith_charclass(str, pattern, from = 1L, negate = FALSE) - -stri_endswith_charclass(str, pattern, to = -1L, negate = FALSE) - -stri_startswith_coll( - str, - pattern, - from = 1L, - negate = FALSE, - ..., - opts_collator = NULL -) - -stri_endswith_coll( - str, - pattern, - to = -1L, - negate = FALSE, - ..., - opts_collator = NULL -) -``` - -## Arguments - -| | | -|-----------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `str` | character vector | -| `...` | supplementary arguments passed to the underlying functions, including additional settings for `opts_collator`, `opts_fixed`, and so on. | -| `pattern`, `fixed`, `coll`, `charclass` | character vector defining search patterns; for more details refer to [stringi-search](about_search.md) | -| `from` | integer vector | -| `negate` | single logical value; whether a no-match to a pattern is rather of interest | -| `to` | integer vector | -| `opts_collator`, `opts_fixed` | a named list used to tune up the search engine\'s settings; see [`stri_opts_collator`](stri_opts_collator.md) and [`stri_opts_fixed`](stri_opts_fixed.md), respectively; `NULL` for the defaults | - -## Details - -Vectorized over `str`, `pattern`, and `from` or `to` (with recycling of the elements in the shorter vector if necessary). - -If `pattern` is empty, then the result is `NA` and a warning is generated. - -Argument `from` gives the position in `str` at which a match to `pattern` must start; `to` gives the position at which it must end.
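A minimal sketch of how `from` and `to` anchor the match, assuming stringi is installed. The string `"ababa"` is an arbitrary test input, and negative positions count back from the end of the string.

```r
library(stringi)

s <- "ababa"
# `from` sets where the match must begin
a1 <- stri_startswith_fixed(s, "ba", from = 2)   # TRUE: "baba" starts with "ba"
a2 <- stri_startswith_fixed(s, "ba", from = -2)  # TRUE: the last two code points are "ba"

# `to` sets where the match must end
b1 <- stri_endswith_fixed(s, "ab")               # FALSE: "ababa" ends with "ba"
b2 <- stri_endswith_fixed(s, "ab", to = -2)      # TRUE: "abab" ends with "ab"

# `from` is vectorized, so every start position can be tested at once
c1 <- stri_startswith_fixed(s, "ab", from = 1:5) # TRUE FALSE TRUE FALSE FALSE
```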
- -Indexes given by `from` or `to` are of course 1-based, i.e., an index 1 denotes the first character in a string. This gives a typical R look-and-feel. - -For negative indexes in `from` or `to`, counting starts at the end of the string. For instance, index -1 denotes the last code point in the string. - -If you wish to test for a pattern match at an arbitrary position in `str`, use [`stri_detect`](stri_detect.md). - -`stri_startswith` and `stri_endswith` are convenience functions. They call either `stri_*_fixed`, `stri_*_coll`, or `stri_*_charclass`, depending on the argument used. Relying on these underlying functions directly will make your code run slightly faster. - -Note that testing for a pattern match at the start or end of a string has not been implemented separately for regex patterns. For that you may use the \'`^`\' and \'`$`\' meta-characters, see [stringi-search-regex](about_search_regex.md). - -## Value - -Each function returns a logical vector. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other search_detect: [`about_search`](about_search.md), [`stri_detect()`](stri_detect.md) - -## Examples - - - - -```r -stri_startswith_charclass(' trim me! 
', '\\p{WSpace}') -``` - -``` -## [1] TRUE -``` - -```r -stri_startswith_fixed(c('a1', 'a2', 'b3', 'a4', 'c5'), 'a') -``` - -``` -## [1] TRUE TRUE FALSE TRUE FALSE -``` - -```r -stri_detect_regex(c('a1', 'a2', 'b3', 'a4', 'c5'), '^a') -``` - -``` -## [1] TRUE TRUE FALSE TRUE FALSE -``` - -```r -stri_startswith_fixed('ababa', 'ba') -``` - -``` -## [1] FALSE -``` - -```r -stri_startswith_fixed('ababa', 'ba', from=2) -``` - -``` -## [1] TRUE -``` - -```r -stri_startswith_coll(c('a1', 'A2', 'b3', 'A4', 'C5'), 'a', strength=1) -``` - -``` -## [1] TRUE TRUE FALSE TRUE FALSE -``` - -```r -pat <- stri_paste('\u0635\u0644\u0649 \u0627\u0644\u0644\u0647 ', - '\u0639\u0644\u064a\u0647 \u0648\u0633\u0644\u0645XYZ') -stri_endswith_coll('\ufdfa\ufdfa\ufdfaXYZ', pat, strength=1) -``` - -``` -## [1] TRUE -``` diff --git a/.devel/sphinx/rapi/stri_stats_general.md b/.devel/sphinx/rapi/stri_stats_general.md deleted file mode 100644 index 9293355d..00000000 --- a/.devel/sphinx/rapi/stri_stats_general.md +++ /dev/null @@ -1,67 +0,0 @@ -# stri_stats_general: General Statistics for a Character Vector - -## Description - -This function gives general statistics for a character vector, e.g., obtained by loading a text file with the [`readLines`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/readLines.html) or [`stri_read_lines`](stri_read_lines.md) function, where each text line is represented by a separate string. - -## Usage - -``` r -stri_stats_general(str) -``` - -## Arguments - -| | | -|-------|-----------------------------------| -| `str` | character vector to be aggregated | - -## Details - -None of the strings may contain `\r` or `\n` characters, otherwise you will get an error. - -Below, by \'white space\' we mean the Unicode binary property `WHITE_SPACE`, see `stringi-search-charclass`. - -## Value - -Returns an integer vector with the following named elements: - -1. `Lines` - number of lines (number of non-missing strings in the vector); - -2.
`LinesNEmpty` - number of lines with at least one non-`WHITE_SPACE` character; - -3. `Chars` - total number of Unicode code points detected; - -4. `CharsNWhite` - number of Unicode code points that are not `WHITE_SPACE`s; - -5. \... (Other stuff that may appear in future releases of stringi). - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other stats: [`stri_stats_latex()`](stri_stats_latex.md) - -## Examples - - - - -```r -s <- c('Lorem ipsum dolor sit amet, consectetur adipisicing elit.', - 'nibh augue, suscipit a, scelerisque sed, lacinia in, mi.', - 'Cras vel lorem. Etiam pellentesque aliquet tellus.', - '') -stri_stats_general(s) -``` - -``` -## Lines LinesNEmpty Chars CharsNWhite -## 4 3 163 142 -``` diff --git a/.devel/sphinx/rapi/stri_stats_latex.md b/.devel/sphinx/rapi/stri_stats_latex.md deleted file mode 100644 index 665532ef..00000000 --- a/.devel/sphinx/rapi/stri_stats_latex.md +++ /dev/null @@ -1,70 +0,0 @@ -# stri_stats_latex: Statistics for a Character Vector Containing LaTeX Commands - -## Description - -This function gives LaTeX-oriented statistics for a character vector, e.g., obtained by loading a text file with the [`readLines`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/readLines.html) function, where each text line is represented by a separate string. - -## Usage - -``` r -stri_stats_latex(str) -``` - -## Arguments - -| | | -|-------|-----------------------------------| -| `str` | character vector to be aggregated | - -## Details - -We use a slightly modified LaTeX Word Count algorithm implemented in Kile 2.1.3, see for the original contributors. - -## Value - -Returns an integer vector with the following named elements: - -1. 
`CharsWord` - number of word characters; - -2. `CharsCmdEnvir` - command and words characters; - -3. `CharsWhite` - LaTeX white spaces, including { and } in some contexts; - -4. `Words` - number of words; - -5. `Cmds` - number of commands; - -6. `Envirs` - number of environments; - -7. \... (Other stuff that may appear in future releases of stringi). - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other stats: [`stri_stats_general()`](stri_stats_general.md) - -## Examples - - - - -```r -s <- c('Lorem \\textbf{ipsum} dolor sit \\textit{amet}, consectetur adipisicing elit.', - '\\begin{small}Proin nibh augue,\\end{small} suscipit a, scelerisque sed, lacinia in, mi.', - '') -stri_stats_latex(s) -``` - -``` -## CharsWord CharsCmdEnvir CharsWhite Words Cmds -## 96 38 27 18 2 -## Envirs -## 1 -``` diff --git a/.devel/sphinx/rapi/stri_sub.md b/.devel/sphinx/rapi/stri_sub.md deleted file mode 100644 index 1ccb4928..00000000 --- a/.devel/sphinx/rapi/stri_sub.md +++ /dev/null @@ -1,142 +0,0 @@ -# stri_sub: Extract a Substring From or Replace a Substring In a Character Vector - -## Description - -`stri_sub` extracts particular substrings at the provided code point-based index ranges. Its replacement version allows one to substitute (in-place) parts of a string with given replacement strings. `stri_sub_replace` is its forward pipe operator-friendly variant that returns a copy of the input vector. - -For extracting/replacing multiple substrings from/within each string, see [`stri_sub_all`](stri_sub_all.md).
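A minimal sketch contrasting extraction, the copy-returning `stri_sub_replace`, and the in-place replacement function, assuming stringi is installed. The string `"stringi"` is an arbitrary example.

```r
library(stringi)

s <- "stringi"
p1 <- stri_sub(s, 2, 4)           # "tri"  (from = 2, to = 4, inclusive)
p2 <- stri_sub(s, -3)             # "ngi"  (negative index counts from the end)
p3 <- stri_sub(s, 1, length = 3)  # "str"
p4 <- stri_sub(s, 5, 2)           # ""     (from > to gives an empty string)

# The pipe-friendly variant returns a modified copy and leaves `s` untouched...
r1 <- stri_sub_replace(s, 1, 3, replacement = "STR")  # "STRingi"
# ...while the replacement function modifies `s` in place
stri_sub(s, -3, -1) <- "NG"                          # s is now "striNG"
```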
- -## Usage - -``` r -stri_sub( - str, - from = 1L, - to = -1L, - length, - use_matrix = TRUE, - ignore_negative_length = FALSE -) - -stri_sub(str, from = 1L, to = -1L, length, omit_na = FALSE, use_matrix = TRUE) <- value - -stri_sub_replace(..., replacement, value = replacement) -``` - -## Arguments - -| | | -|--------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `str` | character vector | -| `from` | integer vector giving the start indexes; alternatively, if `use_matrix=TRUE`, a two-column matrix of type `cbind(from, to)` (unnamed columns or the 2nd column named other than `length`) or `cbind(from, length=length)` (2nd column named `length`) | -| `to` | integer vector giving the end indexes; mutually exclusive with `length` and `from` being a matrix | -| `length` | integer vector giving the substring lengths; mutually exclusive with `to` and `from` being a matrix | -| `use_matrix` | single logical value; see `from` | -| `ignore_negative_length` | single logical value; whether negative lengths should be ignored or result in missing values | -| `omit_na` | single logical value; indicates whether missing values in any of the indexes or in `value` leave the corresponding input string unchanged \[replacement function only\] | -| `value` | a character vector defining the replacement strings \[replacement function only\] | -| `...` | arguments to be passed to `stri_sub<-` | -| `replacement` | alias of `value` \[wherever applicable\] | - -## Details - -Vectorized over `str`, \[`value`\], `from` and (`to` or `length`). Parameters `to` and `length` are mutually exclusive. - -Indexes are 1-based, i.e., the start of a string is at index 1. For negative indexes in `from` or `to`, counting starts at the end of the string. 
For instance, index -1 denotes the last code point in the string. A zero `length` gives an empty string. - -Argument `from` gives the start of a substring to extract. Argument `to` defines the last index of a substring, inclusive. Alternatively, its `length` may be provided. - -If `from` is a two-column matrix, then these two columns are used as `from` and `to`, respectively, unless the second column is named `length`. In such a case, anything passed explicitly as `to` or `length` is ignored. Such types of index matrices are generated by [`stri_locate_first`](stri_locate.md) and [`stri_locate_last`](stri_locate.md). If extraction based on [`stri_locate_all`](stri_locate.md) is needed, see [`stri_sub_all`](stri_sub_all.md). - -In `stri_sub`, out-of-bound indexes are silently corrected. If `from` \> `to`, then an empty string is returned. By default, negative `length` results in the corresponding output being `NA`, see `ignore_negative_length`, though. - -In `stri_sub<-`, some configurations of indexes may work as substring \'injection\' at the front, back, or in the middle. Negative `length` does not alter the corresponding input string. - -If both `to` and `length` are provided, `length` has priority over `to`. - -Note that for some Unicode strings, the extracted substrings might not be well-formed, especially if input strings are not normalized (see [`stri_trans_nfc`](stri_trans_nf.md)), include byte order marks, bidirectional text marks, and so on. Handle with care. - -## Value - -`stri_sub` and `stri_sub_replace` return a character vector. `stri_sub<-` changes the `str` object \'in-place\'.
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other indexing: [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_locate_all()`](stri_locate.md), [`stri_sub_all()`](stri_sub_all.md) - -## Examples - - - - -```r -s <- c("spam, spam, bacon, and spam", "eggs and spam") -stri_sub(s, from=-4) -``` - -``` -## [1] "spam" "spam" -``` - -```r -stri_sub(s, from=1, length=c(10, 4)) -``` - -``` -## [1] "spam, spam" "eggs" -``` - -```r -(stri_sub(s, 1, 4) <- 'stringi') -``` - -``` -## [1] "stringi" -``` - -```r -x <- c('12 3456 789', 'abc', '', NA, '667') -stri_sub(x, stri_locate_first_regex(x, '[0-9]+')) # see stri_extract_first -``` - -``` -## [1] "12" NA NA NA "667" -``` - -```r -stri_sub(x, stri_locate_last_regex(x, '[0-9]+')) # see stri_extract_last -``` - -``` -## [1] "789" NA NA NA "667" -``` - -```r -stri_sub_replace(x, stri_locate_first_regex(x, '[0-9]+'), - omit_na=TRUE, replacement='***') # see stri_replace_first -``` - -``` -## [1] "*** 3456 789" "abc" "" NA "***" -``` - -```r -stri_sub_replace(x, stri_locate_last_regex(x, '[0-9]+'), - omit_na=TRUE, replacement='***') # see stri_replace_last -``` - -``` -## [1] "12 3456 ***" "abc" "" NA "***" -``` - -```r -## Not run: x |> stri_sub_replace(1, 5, replacement='new_substring') -``` diff --git a/.devel/sphinx/rapi/stri_sub_all.md b/.devel/sphinx/rapi/stri_sub_all.md deleted file mode 100644 index 8e17af9c..00000000 --- a/.devel/sphinx/rapi/stri_sub_all.md +++ /dev/null @@ -1,141 +0,0 @@ -# stri_sub_all: Extract or Replace Multiple Substrings - -## Description - -`stri_sub_all` extracts multiple substrings from each string. 
Its replacement version substitutes (in-place) multiple substrings with the corresponding replacement strings. `stri_sub_replace_all` (alias `stri_sub_all_replace`) is its forward pipe operator-friendly variant, returning a copy of the input vector. - -For extracting/replacing single substrings from/within each string, see [`stri_sub`](stri_sub.md). - -## Usage - -``` r -stri_sub_all( - str, - from = list(1L), - to = list(-1L), - length, - use_matrix = TRUE, - ignore_negative_length = TRUE -) - -stri_sub_all( - str, - from = list(1L), - to = list(-1L), - length, - omit_na = FALSE, - use_matrix = TRUE -) <- value - -stri_sub_replace_all(..., replacement, value = replacement) - -stri_sub_all_replace(..., replacement, value = replacement) -``` - -## Arguments - -| | | -|--------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `str` | character vector | -| `from` | list of integer vector giving the start indexes; alternatively, if `use_matrix=TRUE`, a list of two-column matrices of type `cbind(from, to)` (unnamed columns or the 2nd column named other than `length`) or `cbind(from, length=length)` (2nd column named `length`) | -| `to` | list of integer vectors giving the end indexes | -| `length` | list of integer vectors giving the substring lengths | -| `use_matrix` | single logical value; see `from` | -| `ignore_negative_length` | single logical value; whether negative lengths should be ignored or result in missing values | -| `omit_na` | single logical value; indicates whether missing values in any of the indexes or in `value` leave the part of the corresponding input string unchanged \[replacement function only\] | -| `value` | a list of character vectors defining the replacement strings \[replacement function only\] | -| `...` | 
arguments to be passed to `stri_sub_all<-` | -| `replacement` | alias of `value` \[wherever applicable\] | - -## Details - -Vectorized over `str`, \[`value`\], `from` and (`to` or `length`). Just like in [`stri_sub`](stri_sub.md), parameters `to` and `length` are mutually exclusive. - -In one of the simplest scenarios, `stri_sub_all(str, from, to)`, the i-th element of the resulting list is generated as `stri_sub(str[i], from[[i]], to[[i]])`. As usual, if one of the inputs is shorter than the others, the recycling rule is applied. - -If any of `from`, `to`, `length`, or `value` is not a list, it is wrapped into a list. - -If `from` consists of a two-column matrix, then these two columns are used as `from` and `to`, respectively, unless the second column is named `length`. Such types of index matrices are generated by [`stri_locate_all`](stri_locate.md). If extraction or replacement based on [`stri_locate_first`](stri_locate.md) or [`stri_locate_last`](stri_locate.md) is needed, see [`stri_sub`](stri_sub.md). - -In the replacement function, the index ranges must be sorted with respect to `from` and must be mutually disjoint. Negative `length` leaves the corresponding input string unaltered. On the other hand, in `stri_sub_all`, a negative `length` causes the corresponding chunk to be ignored by default; see `ignore_negative_length`. - -## Value - -`stri_sub_all` returns a list of character vectors. Its replacement versions modify the input \'in-place\'.
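A minimal sketch of extracting and replacing several chunks of one string, assuming stringi is installed. The first call passes the indexes as lists. The second passes plain vectors, which are wrapped into lists. The replaced ranges are sorted and disjoint, as the replacement function requires.

```r
library(stringi)

# Two (from, length) chunks taken from a single string
chunks <- stri_sub_all("abcdef", from = list(c(1, 4)), length = list(c(2, 3)))
# list(c("ab", "def"))

# Replace the ranges 1..2 and 4..6 in one call (plain vectors get wrapped)
replaced <- stri_sub_replace_all("abcdef", c(1, 4), c(2, 6),
                                 replacement = c("X", "Y"))
# "XcY"
```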
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other indexing: [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_locate_all()`](stri_locate.md), [`stri_sub()`](stri_sub.md) - -## Examples - - - - -```r -x <- c('12 3456 789', 'abc', '', NA, '667') -stri_sub_all(x, stri_locate_all_regex(x, '[0-9]+')) # see stri_extract_all -``` - -``` -## [[1]] -## [1] "12" "3456" "789" -## -## [[2]] -## [1] NA -## -## [[3]] -## [1] NA -## -## [[4]] -## [1] NA -## -## [[5]] -## [1] "667" -``` - -```r -stri_sub_all(x, stri_locate_all_regex(x, '[0-9]+', omit_no_match=TRUE)) -``` - -``` -## [[1]] -## [1] "12" "3456" "789" -## -## [[2]] -## character(0) -## -## [[3]] -## character(0) -## -## [[4]] -## [1] NA -## -## [[5]] -## [1] "667" -``` - -```r -stri_sub_all(x, stri_locate_all_regex(x, '[0-9]+', omit_no_match=TRUE)) <- '***' -print(x) -``` - -``` -## [1] "*** *** ***" "abc" "" NA "***" -``` - -```r -stri_sub_replace_all('a b c', c(1, 3, 5), c(1, 3, 5), replacement=c('A', 'B', 'C')) -``` - -``` -## [1] "A B C" -``` diff --git a/.devel/sphinx/rapi/stri_subset.md b/.devel/sphinx/rapi/stri_subset.md deleted file mode 100644 index 5fad8508..00000000 --- a/.devel/sphinx/rapi/stri_subset.md +++ /dev/null @@ -1,117 +0,0 @@ -# stri_subset: Select Elements that Match a Given Pattern - -## Description - -These functions return or modify a sub-vector where there is a match to a given pattern. In other words, they are roughly equivalent (but faster and easier to use) to a call to `str[stri_detect(str, ...)]` or `str[stri_detect(str, ...)] <- value`. 
- -## Usage - -``` r -stri_subset(str, ..., regex, fixed, coll, charclass) - -stri_subset(str, ..., regex, fixed, coll, charclass) <- value - -stri_subset_fixed( - str, - pattern, - omit_na = FALSE, - negate = FALSE, - ..., - opts_fixed = NULL -) - -stri_subset_fixed(str, pattern, negate=FALSE, ..., opts_fixed=NULL) <- value - -stri_subset_charclass(str, pattern, omit_na = FALSE, negate = FALSE) - -stri_subset_charclass(str, pattern, negate=FALSE) <- value - -stri_subset_coll( - str, - pattern, - omit_na = FALSE, - negate = FALSE, - ..., - opts_collator = NULL -) - -stri_subset_coll(str, pattern, negate=FALSE, ..., opts_collator=NULL) <- value - -stri_subset_regex( - str, - pattern, - omit_na = FALSE, - negate = FALSE, - ..., - opts_regex = NULL -) - -stri_subset_regex(str, pattern, negate=FALSE, ..., opts_regex=NULL) <- value -``` - -## Arguments - -| | | -|--------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `str` | character vector; strings to search within | -| `...` | supplementary arguments passed to the underlying functions, including additional settings for `opts_collator`, `opts_regex`, `opts_fixed`, and so on | -| `value` | non-empty character vector of replacement strings; replacement function only | -| `pattern`, `regex`, `fixed`, `coll`, `charclass` | character vector; search patterns (no more than the length of `str`); for more details refer to [stringi-search](about_search.md) | -| `omit_na` | single logical value; should missing values be excluded from the result? 
| -| `negate` | single logical value; whether to select the elements that do *not* match the pattern instead | -| `opts_collator`, `opts_fixed`, `opts_regex` | a named list used to tune the search engine\'s settings; see [`stri_opts_collator`](stri_opts_collator.md), [`stri_opts_fixed`](stri_opts_fixed.md), and [`stri_opts_regex`](stri_opts_regex.md), respectively; `NULL` for the defaults | - -## Details - -Vectorized over `str` as well as partially over `pattern` and `value`, with recycling of the elements in the shorter vector if necessary. As the aim here is to subset `str`, `pattern` cannot be longer than the former. Moreover, if the number of items to replace is not a multiple of the length of `value`, a warning is emitted and the unused elements are ignored. Hence, the replacement functions always return a vector of the same length as `str`. - -`stri_subset` and `stri_subset<-` are convenience functions. They call either `stri_subset_regex`, `stri_subset_fixed`, `stri_subset_coll`, or `stri_subset_charclass`, depending on the argument used. - -## Value - -The `stri_subset_*` functions return a character vector. As usual, the output encoding is UTF-8. - -The `stri_subset_*<-` functions modify `str` \'in-place\'.
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other search_subset: [`about_search`](about_search.md) - -## Examples - - - - -```r -stri_subset_regex(c('stringi R', '123', 'ID456', ''), '^[0-9]+$') -``` - -``` -## [1] "123" -``` - -```r -x <- c('stringi R', '123', 'ID456', '') -`stri_subset_regex<-`(x, '[0-9]+$', negate=TRUE, value=NA) # returns a copy -``` - -``` -## [1] NA "123" "ID456" NA -``` - -```r -stri_subset_regex(x, '[0-9]+$') <- NA # modifies `x` in-place -print(x) -``` - -``` -## [1] "stringi R" NA NA "" -``` diff --git a/.devel/sphinx/rapi/stri_timezone_info.md b/.devel/sphinx/rapi/stri_timezone_info.md deleted file mode 100644 index 705f2749..00000000 --- a/.devel/sphinx/rapi/stri_timezone_info.md +++ /dev/null @@ -1,145 +0,0 @@ -# stri_timezone_info: Query a Given Time Zone - -## Description - -Provides some basic information on a given time zone identifier. - -## Usage - -``` r -stri_timezone_info(tz = NULL, locale = NULL, display_type = "long") -``` - -## Arguments - -| | | -|----------------|-----------------------------------------------------------------------------------------------------------------------------------------------| -| `tz` | `NULL` or `''` for default time zone, or a single string with time zone ID otherwise | -| `locale` | `NULL` or `''` for default locale, or a single string with locale identifier | -| `display_type` | single string; one of `'short'`, `'long'`, `'generic_short'`, `'generic_long'`, `'gmt_short'`, `'gmt_long'`, `'common'`, `'generic_location'` | - -## Details - -Used to fetch basic information on any supported time zone. 
- -For more information on time zone representation in ICU, see [`stri_timezone_list`](stri_timezone_list.md). - -## Value - -Returns a list with the following named components: - -1. `ID` (time zone identifier), - -2. `Name` (localized human-readable time zone name), - -3. `Name.Daylight` (localized human-readable time zone name when DST is used, if available), - -4. `Name.Windows` (Windows time zone ID, if available), - -5. `RawOffset` (raw GMT offset, in hours, before taking daylight savings into account), and - -6. `UsesDaylightTime` (states whether a time zone uses daylight savings time in the current Gregorian calendar year). - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other datetime: [`stri_datetime_add()`](stri_datetime_add.md), [`stri_datetime_create()`](stri_datetime_create.md), [`stri_datetime_fields()`](stri_datetime_fields.md), [`stri_datetime_format()`](stri_datetime_format.md), [`stri_datetime_fstr()`](stri_datetime_fstr.md), [`stri_datetime_now()`](stri_datetime_now.md), [`stri_datetime_symbols()`](stri_datetime_symbols.md), [`stri_timezone_get()`](stri_timezone_set.md), [`stri_timezone_list()`](stri_timezone_list.md) - -Other timezone: [`stri_timezone_get()`](stri_timezone_set.md), [`stri_timezone_list()`](stri_timezone_list.md) - -## Examples - - - - -```r -stri_timezone_info() -``` - -``` -## $ID -## [1] "Australia/Melbourne" -## -## $Name -## [1] "Australian Eastern Standard Time" -## -## $Name.Daylight -## [1] "Australian Eastern Daylight Time" -## -## $Name.Windows -## [1] "AUS Eastern Standard Time" -## -## $RawOffset -## [1] 10 -## -## $UsesDaylightTime -## [1] TRUE -``` - -```r -stri_timezone_info(locale='sk_SK') -``` - -``` -## $ID -## 
[1] "Australia/Melbourne" -## -## $Name -## [1] "východoaustrálsky štandardný čas" -## -## $Name.Daylight -## [1] "východoaustrálsky letný čas" -## -## $Name.Windows -## [1] "AUS Eastern Standard Time" -## -## $RawOffset -## [1] 10 -## -## $UsesDaylightTime -## [1] TRUE -``` - -```r -sapply(c('short', 'long', 'generic_short', 'generic_long', - 'gmt_short', 'gmt_long', 'common', 'generic_location'), - function(e) stri_timezone_info('Europe/London', display_type=e)) -``` - -``` -## short long -## ID "Europe/London" "Europe/London" -## Name "GMT" "Greenwich Mean Time" -## Name.Daylight "GMT+1" "British Summer Time" -## Name.Windows "GMT Standard Time" "GMT Standard Time" -## RawOffset 0 0 -## UsesDaylightTime TRUE TRUE -## generic_short generic_long -## ID "Europe/London" "Europe/London" -## Name "United Kingdom Time" "United Kingdom Time" -## Name.Daylight "United Kingdom Time" "United Kingdom Time" -## Name.Windows "GMT Standard Time" "GMT Standard Time" -## RawOffset 0 0 -## UsesDaylightTime TRUE TRUE -## gmt_short gmt_long common -## ID "Europe/London" "Europe/London" "Europe/London" -## Name "+0000" "GMT" "GMT" -## Name.Daylight "+0100" "GMT+01:00" "GMT+1" -## Name.Windows "GMT Standard Time" "GMT Standard Time" "GMT Standard Time" -## RawOffset 0 0 0 -## UsesDaylightTime TRUE TRUE TRUE -## generic_location -## ID "Europe/London" -## Name "United Kingdom Time" -## Name.Daylight "United Kingdom Time" -## Name.Windows "GMT Standard Time" -## RawOffset 0 -## UsesDaylightTime TRUE -``` diff --git a/.devel/sphinx/rapi/stri_timezone_list.md b/.devel/sphinx/rapi/stri_timezone_list.md deleted file mode 100644 index 91ddb1ba..00000000 --- a/.devel/sphinx/rapi/stri_timezone_list.md +++ /dev/null @@ -1,2370 +0,0 @@ -# stri_timezone_list: List Available Time Zone Identifiers - -## Description - -Returns a list of available time zone identifiers. 
- -## Usage - - ``` r -stri_timezone_list(region = NA_character_, offset = NA_integer_) -``` - -## Arguments - -| | | -|----------|----------------------------------------------------------------------------------------------------------| -| `region` | single string; an ISO 3166 two-letter country code or UN M.49 three-digit area code; `NA` for all regions | -| `offset` | single numeric value; a given raw offset from GMT, in hours; `NA` for all offsets | - -## Details - -If `offset` and `region` are `NA` (the default), then all time zones are returned. Otherwise, only the identifiers of time zones with the given raw offset from GMT and/or corresponding to the given region are returned. Note that the effect of daylight saving time is ignored. - -A time zone represents an offset applied to Greenwich Mean Time (GMT) to obtain local time (Coordinated Universal Time, or UTC, is similar, but not precisely identical, to GMT; in ICU the two terms are used interchangeably since ICU does not concern itself with either leap seconds or historical behavior). The offset may vary throughout the year if daylight saving time (DST) is used, or may stay the same all year long. Typically, regions closer to the equator do not use DST. If DST is in use, then specific rules define the point where the offset changes and the amount by which it changes. - -If DST is observed, then three additional pieces of information are needed: - -1. The precise date and time during the year when DST begins. In the northern hemisphere this falls in the first half of the year; in the southern hemisphere, in the second half. - -2. The precise date and time during the year when DST ends. In the southern hemisphere this falls in the first half of the year; in the northern hemisphere, in the second half. - -3. The amount by which the GMT offset changes when DST is in effect. This is almost always one hour. - -## Value - -Returns a character vector.
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## References - -*TimeZone* class -- ICU API Documentation, - -*ICU TimeZone classes* -- ICU User Guide, - -*Date/Time Services* -- ICU User Guide, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other datetime: [`stri_datetime_add()`](stri_datetime_add.md), [`stri_datetime_create()`](stri_datetime_create.md), [`stri_datetime_fields()`](stri_datetime_fields.md), [`stri_datetime_format()`](stri_datetime_format.md), [`stri_datetime_fstr()`](stri_datetime_fstr.md), [`stri_datetime_now()`](stri_datetime_now.md), [`stri_datetime_symbols()`](stri_datetime_symbols.md), [`stri_timezone_get()`](stri_timezone_set.md), [`stri_timezone_info()`](stri_timezone_info.md) - -Other timezone: [`stri_timezone_get()`](stri_timezone_set.md), [`stri_timezone_info()`](stri_timezone_info.md) - -## Examples - - - - -```r -stri_timezone_list() -``` - -``` -## [1] "ACT" "AET" -## [3] "Africa/Abidjan" "Africa/Accra" -## [5] "Africa/Addis_Ababa" "Africa/Algiers" -## [7] "Africa/Asmara" "Africa/Asmera" -## [9] "Africa/Bamako" "Africa/Bangui" -## [11] "Africa/Banjul" "Africa/Bissau" -## [13] "Africa/Blantyre" "Africa/Brazzaville" -## [15] "Africa/Bujumbura" "Africa/Cairo" -## [17] "Africa/Casablanca" "Africa/Ceuta" -## [19] "Africa/Conakry" "Africa/Dakar" -## [21] "Africa/Dar_es_Salaam" "Africa/Djibouti" -## [23] "Africa/Douala" "Africa/El_Aaiun" -## [25] "Africa/Freetown" "Africa/Gaborone" -## [27] "Africa/Harare" "Africa/Johannesburg" -## [29] "Africa/Juba" "Africa/Kampala" -## [31] "Africa/Khartoum" "Africa/Kigali" -## [33] "Africa/Kinshasa" "Africa/Lagos" -## [35] "Africa/Libreville" "Africa/Lome" -## [37] "Africa/Luanda" "Africa/Lubumbashi" -## [39] "Africa/Lusaka" "Africa/Malabo" -## [41] 
"Africa/Maputo" "Africa/Maseru" -## [43] "Africa/Mbabane" "Africa/Mogadishu" -## [45] "Africa/Monrovia" "Africa/Nairobi" -## [47] "Africa/Ndjamena" "Africa/Niamey" -## [49] "Africa/Nouakchott" "Africa/Ouagadougou" -## [51] "Africa/Porto-Novo" "Africa/Sao_Tome" -## [53] "Africa/Timbuktu" "Africa/Tripoli" -## [55] "Africa/Tunis" "Africa/Windhoek" -## [57] "AGT" "America/Adak" -## [59] "America/Anchorage" "America/Anguilla" -## [61] "America/Antigua" "America/Araguaina" -## [63] "America/Argentina/Buenos_Aires" "America/Argentina/Catamarca" -## [65] "America/Argentina/ComodRivadavia" "America/Argentina/Cordoba" -## [67] "America/Argentina/Jujuy" "America/Argentina/La_Rioja" -## [69] "America/Argentina/Mendoza" "America/Argentina/Rio_Gallegos" -## [71] "America/Argentina/Salta" "America/Argentina/San_Juan" -## [73] "America/Argentina/San_Luis" "America/Argentina/Tucuman" -## [75] "America/Argentina/Ushuaia" "America/Aruba" -## [77] "America/Asuncion" "America/Atikokan" -## [79] "America/Atka" "America/Bahia" -## [81] "America/Bahia_Banderas" "America/Barbados" -## [83] "America/Belem" "America/Belize" -## [85] "America/Blanc-Sablon" "America/Boa_Vista" -## [87] "America/Bogota" "America/Boise" -## [89] "America/Buenos_Aires" "America/Cambridge_Bay" -## [91] "America/Campo_Grande" "America/Cancun" -## [93] "America/Caracas" "America/Catamarca" -## [95] "America/Cayenne" "America/Cayman" -## [97] "America/Chicago" "America/Chihuahua" -## [99] "America/Ciudad_Juarez" "America/Coral_Harbour" -## [101] "America/Cordoba" "America/Costa_Rica" -## [103] "America/Creston" "America/Cuiaba" -## [105] "America/Curacao" "America/Danmarkshavn" -## [107] "America/Dawson" "America/Dawson_Creek" -## [109] "America/Denver" "America/Detroit" -## [111] "America/Dominica" "America/Edmonton" -## [113] "America/Eirunepe" "America/El_Salvador" -## [115] "America/Ensenada" "America/Fort_Nelson" -## [117] "America/Fort_Wayne" "America/Fortaleza" -## [119] "America/Glace_Bay" "America/Godthab" 
-## [121] "America/Goose_Bay" "America/Grand_Turk" -## [123] "America/Grenada" "America/Guadeloupe" -## [125] "America/Guatemala" "America/Guayaquil" -## [127] "America/Guyana" "America/Halifax" -## [129] "America/Havana" "America/Hermosillo" -## [131] "America/Indiana/Indianapolis" "America/Indiana/Knox" -## [133] "America/Indiana/Marengo" "America/Indiana/Petersburg" -## [135] "America/Indiana/Tell_City" "America/Indiana/Vevay" -## [137] "America/Indiana/Vincennes" "America/Indiana/Winamac" -## [139] "America/Indianapolis" "America/Inuvik" -## [141] "America/Iqaluit" "America/Jamaica" -## [143] "America/Jujuy" "America/Juneau" -## [145] "America/Kentucky/Louisville" "America/Kentucky/Monticello" -## [147] "America/Knox_IN" "America/Kralendijk" -## [149] "America/La_Paz" "America/Lima" -## [151] "America/Los_Angeles" "America/Louisville" -## [153] "America/Lower_Princes" "America/Maceio" -## [155] "America/Managua" "America/Manaus" -## [157] "America/Marigot" "America/Martinique" -## [159] "America/Matamoros" "America/Mazatlan" -## [161] "America/Mendoza" "America/Menominee" -## [163] "America/Merida" "America/Metlakatla" -## [165] "America/Mexico_City" "America/Miquelon" -## [167] "America/Moncton" "America/Monterrey" -## [169] "America/Montevideo" "America/Montreal" -## [171] "America/Montserrat" "America/Nassau" -## [173] "America/New_York" "America/Nipigon" -## [175] "America/Nome" "America/Noronha" -## [177] "America/North_Dakota/Beulah" "America/North_Dakota/Center" -## [179] "America/North_Dakota/New_Salem" "America/Nuuk" -## [181] "America/Ojinaga" "America/Panama" -## [183] "America/Pangnirtung" "America/Paramaribo" -## [185] "America/Phoenix" "America/Port_of_Spain" -## [187] "America/Port-au-Prince" "America/Porto_Acre" -## [189] "America/Porto_Velho" "America/Puerto_Rico" -## [191] "America/Punta_Arenas" "America/Rainy_River" -## [193] "America/Rankin_Inlet" "America/Recife" -## [195] "America/Regina" "America/Resolute" -## [197] "America/Rio_Branco" 
"America/Rosario" -## [199] "America/Santa_Isabel" "America/Santarem" -## [201] "America/Santiago" "America/Santo_Domingo" -## [203] "America/Sao_Paulo" "America/Scoresbysund" -## [205] "America/Shiprock" "America/Sitka" -## [207] "America/St_Barthelemy" "America/St_Johns" -## [209] "America/St_Kitts" "America/St_Lucia" -## [211] "America/St_Thomas" "America/St_Vincent" -## [213] "America/Swift_Current" "America/Tegucigalpa" -## [215] "America/Thule" "America/Thunder_Bay" -## [217] "America/Tijuana" "America/Toronto" -## [219] "America/Tortola" "America/Vancouver" -## [221] "America/Virgin" "America/Whitehorse" -## [223] "America/Winnipeg" "America/Yakutat" -## [225] "America/Yellowknife" "Antarctica/Casey" -## [227] "Antarctica/Davis" "Antarctica/DumontDUrville" -## [229] "Antarctica/Macquarie" "Antarctica/Mawson" -## [231] "Antarctica/McMurdo" "Antarctica/Palmer" -## [233] "Antarctica/Rothera" "Antarctica/South_Pole" -## [235] "Antarctica/Syowa" "Antarctica/Troll" -## [237] "Antarctica/Vostok" "Arctic/Longyearbyen" -## [239] "ART" "Asia/Aden" -## [241] "Asia/Almaty" "Asia/Amman" -## [243] "Asia/Anadyr" "Asia/Aqtau" -## [245] "Asia/Aqtobe" "Asia/Ashgabat" -## [247] "Asia/Ashkhabad" "Asia/Atyrau" -## [249] "Asia/Baghdad" "Asia/Bahrain" -## [251] "Asia/Baku" "Asia/Bangkok" -## [253] "Asia/Barnaul" "Asia/Beirut" -## [255] "Asia/Bishkek" "Asia/Brunei" -## [257] "Asia/Calcutta" "Asia/Chita" -## [259] "Asia/Choibalsan" "Asia/Chongqing" -## [261] "Asia/Chungking" "Asia/Colombo" -## [263] "Asia/Dacca" "Asia/Damascus" -## [265] "Asia/Dhaka" "Asia/Dili" -## [267] "Asia/Dubai" "Asia/Dushanbe" -## [269] "Asia/Famagusta" "Asia/Gaza" -## [271] "Asia/Harbin" "Asia/Hebron" -## [273] "Asia/Ho_Chi_Minh" "Asia/Hong_Kong" -## [275] "Asia/Hovd" "Asia/Irkutsk" -## [277] "Asia/Istanbul" "Asia/Jakarta" -## [279] "Asia/Jayapura" "Asia/Jerusalem" -## [281] "Asia/Kabul" "Asia/Kamchatka" -## [283] "Asia/Karachi" "Asia/Kashgar" -## [285] "Asia/Kathmandu" "Asia/Katmandu" -## [287] 
"Asia/Khandyga" "Asia/Kolkata" -## [289] "Asia/Krasnoyarsk" "Asia/Kuala_Lumpur" -## [291] "Asia/Kuching" "Asia/Kuwait" -## [293] "Asia/Macao" "Asia/Macau" -## [295] "Asia/Magadan" "Asia/Makassar" -## [297] "Asia/Manila" "Asia/Muscat" -## [299] "Asia/Nicosia" "Asia/Novokuznetsk" -## [301] "Asia/Novosibirsk" "Asia/Omsk" -## [303] "Asia/Oral" "Asia/Phnom_Penh" -## [305] "Asia/Pontianak" "Asia/Pyongyang" -## [307] "Asia/Qatar" "Asia/Qostanay" -## [309] "Asia/Qyzylorda" "Asia/Rangoon" -## [311] "Asia/Riyadh" "Asia/Saigon" -## [313] "Asia/Sakhalin" "Asia/Samarkand" -## [315] "Asia/Seoul" "Asia/Shanghai" -## [317] "Asia/Singapore" "Asia/Srednekolymsk" -## [319] "Asia/Taipei" "Asia/Tashkent" -## [321] "Asia/Tbilisi" "Asia/Tehran" -## [323] "Asia/Tel_Aviv" "Asia/Thimbu" -## [325] "Asia/Thimphu" "Asia/Tokyo" -## [327] "Asia/Tomsk" "Asia/Ujung_Pandang" -## [329] "Asia/Ulaanbaatar" "Asia/Ulan_Bator" -## [331] "Asia/Urumqi" "Asia/Ust-Nera" -## [333] "Asia/Vientiane" "Asia/Vladivostok" -## [335] "Asia/Yakutsk" "Asia/Yangon" -## [337] "Asia/Yekaterinburg" "Asia/Yerevan" -## [339] "AST" "Atlantic/Azores" -## [341] "Atlantic/Bermuda" "Atlantic/Canary" -## [343] "Atlantic/Cape_Verde" "Atlantic/Faeroe" -## [345] "Atlantic/Faroe" "Atlantic/Jan_Mayen" -## [347] "Atlantic/Madeira" "Atlantic/Reykjavik" -## [349] "Atlantic/South_Georgia" "Atlantic/St_Helena" -## [351] "Atlantic/Stanley" "Australia/ACT" -## [353] "Australia/Adelaide" "Australia/Brisbane" -## [355] "Australia/Broken_Hill" "Australia/Canberra" -## [357] "Australia/Currie" "Australia/Darwin" -## [359] "Australia/Eucla" "Australia/Hobart" -## [361] "Australia/LHI" "Australia/Lindeman" -## [363] "Australia/Lord_Howe" "Australia/Melbourne" -## [365] "Australia/North" "Australia/NSW" -## [367] "Australia/Perth" "Australia/Queensland" -## [369] "Australia/South" "Australia/Sydney" -## [371] "Australia/Tasmania" "Australia/Victoria" -## [373] "Australia/West" "Australia/Yancowinna" -## [375] "BET" "Brazil/Acre" -## [377] 
"Brazil/DeNoronha" "Brazil/East" -## [379] "Brazil/West" "BST" -## [381] "Canada/Atlantic" "Canada/Central" -## [383] "Canada/East-Saskatchewan" "Canada/Eastern" -## [385] "Canada/Mountain" "Canada/Newfoundland" -## [387] "Canada/Pacific" "Canada/Saskatchewan" -## [389] "Canada/Yukon" "CAT" -## [391] "CET" "Chile/Continental" -## [393] "Chile/EasterIsland" "CNT" -## [395] "CST" "CST6CDT" -## [397] "CTT" "Cuba" -## [399] "EAT" "ECT" -## [401] "EET" "Egypt" -## [403] "Eire" "EST" -## [405] "EST5EDT" "Etc/GMT" -## [407] "Etc/GMT-0" "Etc/GMT-1" -## [409] "Etc/GMT-2" "Etc/GMT-3" -## [411] "Etc/GMT-4" "Etc/GMT-5" -## [413] "Etc/GMT-6" "Etc/GMT-7" -## [415] "Etc/GMT-8" "Etc/GMT-9" -## [417] "Etc/GMT-10" "Etc/GMT-11" -## [419] "Etc/GMT-12" "Etc/GMT-13" -## [421] "Etc/GMT-14" "Etc/GMT+0" -## [423] "Etc/GMT+1" "Etc/GMT+2" -## [425] "Etc/GMT+3" "Etc/GMT+4" -## [427] "Etc/GMT+5" "Etc/GMT+6" -## [429] "Etc/GMT+7" "Etc/GMT+8" -## [431] "Etc/GMT+9" "Etc/GMT+10" -## [433] "Etc/GMT+11" "Etc/GMT+12" -## [435] "Etc/GMT0" "Etc/Greenwich" -## [437] "Etc/UCT" "Etc/Universal" -## [439] "Etc/UTC" "Etc/Zulu" -## [441] "Europe/Amsterdam" "Europe/Andorra" -## [443] "Europe/Astrakhan" "Europe/Athens" -## [445] "Europe/Belfast" "Europe/Belgrade" -## [447] "Europe/Berlin" "Europe/Bratislava" -## [449] "Europe/Brussels" "Europe/Bucharest" -## [451] "Europe/Budapest" "Europe/Busingen" -## [453] "Europe/Chisinau" "Europe/Copenhagen" -## [455] "Europe/Dublin" "Europe/Gibraltar" -## [457] "Europe/Guernsey" "Europe/Helsinki" -## [459] "Europe/Isle_of_Man" "Europe/Istanbul" -## [461] "Europe/Jersey" "Europe/Kaliningrad" -## [463] "Europe/Kiev" "Europe/Kirov" -## [465] "Europe/Kyiv" "Europe/Lisbon" -## [467] "Europe/Ljubljana" "Europe/London" -## [469] "Europe/Luxembourg" "Europe/Madrid" -## [471] "Europe/Malta" "Europe/Mariehamn" -## [473] "Europe/Minsk" "Europe/Monaco" -## [475] "Europe/Moscow" "Europe/Nicosia" -## [477] "Europe/Oslo" "Europe/Paris" -## [479] "Europe/Podgorica" "Europe/Prague" -## 
[481] "Europe/Riga" "Europe/Rome" -## [483] "Europe/Samara" "Europe/San_Marino" -## [485] "Europe/Sarajevo" "Europe/Saratov" -## [487] "Europe/Simferopol" "Europe/Skopje" -## [489] "Europe/Sofia" "Europe/Stockholm" -## [491] "Europe/Tallinn" "Europe/Tirane" -## [493] "Europe/Tiraspol" "Europe/Ulyanovsk" -## [495] "Europe/Uzhgorod" "Europe/Vaduz" -## [497] "Europe/Vatican" "Europe/Vienna" -## [499] "Europe/Vilnius" "Europe/Volgograd" -## [501] "Europe/Warsaw" "Europe/Zagreb" -## [503] "Europe/Zaporozhye" "Europe/Zurich" -## [505] "Factory" "GB" -## [507] "GB-Eire" "GMT" -## [509] "GMT-0" "GMT+0" -## [511] "GMT0" "Greenwich" -## [513] "Hongkong" "HST" -## [515] "Iceland" "IET" -## [517] "Indian/Antananarivo" "Indian/Chagos" -## [519] "Indian/Christmas" "Indian/Cocos" -## [521] "Indian/Comoro" "Indian/Kerguelen" -## [523] "Indian/Mahe" "Indian/Maldives" -## [525] "Indian/Mauritius" "Indian/Mayotte" -## [527] "Indian/Reunion" "Iran" -## [529] "Israel" "IST" -## [531] "Jamaica" "Japan" -## [533] "JST" "Kwajalein" -## [535] "Libya" "MET" -## [537] "Mexico/BajaNorte" "Mexico/BajaSur" -## [539] "Mexico/General" "MIT" -## [541] "MST" "MST7MDT" -## [543] "Navajo" "NET" -## [545] "NST" "NZ" -## [547] "NZ-CHAT" "Pacific/Apia" -## [549] "Pacific/Auckland" "Pacific/Bougainville" -## [551] "Pacific/Chatham" "Pacific/Chuuk" -## [553] "Pacific/Easter" "Pacific/Efate" -## [555] "Pacific/Enderbury" "Pacific/Fakaofo" -## [557] "Pacific/Fiji" "Pacific/Funafuti" -## [559] "Pacific/Galapagos" "Pacific/Gambier" -## [561] "Pacific/Guadalcanal" "Pacific/Guam" -## [563] "Pacific/Honolulu" "Pacific/Johnston" -## [565] "Pacific/Kanton" "Pacific/Kiritimati" -## [567] "Pacific/Kosrae" "Pacific/Kwajalein" -## [569] "Pacific/Majuro" "Pacific/Marquesas" -## [571] "Pacific/Midway" "Pacific/Nauru" -## [573] "Pacific/Niue" "Pacific/Norfolk" -## [575] "Pacific/Noumea" "Pacific/Pago_Pago" -## [577] "Pacific/Palau" "Pacific/Pitcairn" -## [579] "Pacific/Pohnpei" "Pacific/Ponape" -## [581] 
"Pacific/Port_Moresby" "Pacific/Rarotonga" -## [583] "Pacific/Saipan" "Pacific/Samoa" -## [585] "Pacific/Tahiti" "Pacific/Tarawa" -## [587] "Pacific/Tongatapu" "Pacific/Truk" -## [589] "Pacific/Wake" "Pacific/Wallis" -## [591] "Pacific/Yap" "PLT" -## [593] "PNT" "Poland" -## [595] "Portugal" "PRC" -## [597] "PRT" "PST" -## [599] "PST8PDT" "ROC" -## [601] "ROK" "Singapore" -## [603] "SST" "SystemV/AST4" -## [605] "SystemV/AST4ADT" "SystemV/CST6" -## [607] "SystemV/CST6CDT" "SystemV/EST5" -## [609] "SystemV/EST5EDT" "SystemV/HST10" -## [611] "SystemV/MST7" "SystemV/MST7MDT" -## [613] "SystemV/PST8" "SystemV/PST8PDT" -## [615] "SystemV/YST9" "SystemV/YST9YDT" -## [617] "Turkey" "UCT" -## [619] "Universal" "US/Alaska" -## [621] "US/Aleutian" "US/Arizona" -## [623] "US/Central" "US/East-Indiana" -## [625] "US/Eastern" "US/Hawaii" -## [627] "US/Indiana-Starke" "US/Michigan" -## [629] "US/Mountain" "US/Pacific" -## [631] "US/Pacific-New" "US/Samoa" -## [633] "UTC" "VST" -## [635] "W-SU" "WET" -## [637] "Zulu" -``` - -```r -stri_timezone_list(offset=1) -``` - -``` -## [1] "Africa/Algiers" "Africa/Bangui" "Africa/Brazzaville" -## [4] "Africa/Ceuta" "Africa/Douala" "Africa/Kinshasa" -## [7] "Africa/Lagos" "Africa/Libreville" "Africa/Luanda" -## [10] "Africa/Malabo" "Africa/Ndjamena" "Africa/Niamey" -## [13] "Africa/Porto-Novo" "Africa/Tunis" "Arctic/Longyearbyen" -## [16] "Atlantic/Jan_Mayen" "CET" "ECT" -## [19] "Etc/GMT-1" "Europe/Amsterdam" "Europe/Andorra" -## [22] "Europe/Belgrade" "Europe/Berlin" "Europe/Bratislava" -## [25] "Europe/Brussels" "Europe/Budapest" "Europe/Busingen" -## [28] "Europe/Copenhagen" "Europe/Gibraltar" "Europe/Ljubljana" -## [31] "Europe/Luxembourg" "Europe/Madrid" "Europe/Malta" -## [34] "Europe/Monaco" "Europe/Oslo" "Europe/Paris" -## [37] "Europe/Podgorica" "Europe/Prague" "Europe/Rome" -## [40] "Europe/San_Marino" "Europe/Sarajevo" "Europe/Skopje" -## [43] "Europe/Stockholm" "Europe/Tirane" "Europe/Vaduz" -## [46] "Europe/Vatican" 
"Europe/Vienna" "Europe/Warsaw" -## [49] "Europe/Zagreb" "Europe/Zurich" "MET" -## [52] "Poland" -``` - -```r -stri_timezone_list(offset=5.5) -``` - -``` -## [1] "Asia/Calcutta" "Asia/Colombo" "Asia/Kolkata" "IST" -``` - -```r -stri_timezone_list(offset=5.75) -``` - -``` -## [1] "Asia/Kathmandu" "Asia/Katmandu" -``` - -```r -stri_timezone_list(region='PL') -``` - -``` -## [1] "Europe/Warsaw" "Poland" -``` - -```r -stri_timezone_list(region='US', offset=-10) -``` - -``` -## [1] "America/Adak" "America/Atka" "Pacific/Honolulu" "US/Aleutian" -## [5] "US/Hawaii" -``` - -```r -# Fetch information on all time zones -do.call(rbind.data.frame, - lapply(stri_timezone_list(), function(tz) stri_timezone_info(tz))) -``` - -``` -## ID Name -## 1 ACT Australian Central Standard Time -## 2 AET Australian Eastern Standard Time -## 3 Africa/Abidjan Greenwich Mean Time -## 4 Africa/Accra Greenwich Mean Time -## 5 Africa/Addis_Ababa Eastern Africa Time -## 6 Africa/Algiers Central European Standard Time -## 7 Africa/Asmara Eastern Africa Time -## 8 Africa/Asmera Eastern Africa Time -## 9 Africa/Bamako Greenwich Mean Time -## 10 Africa/Bangui West Africa Standard Time -## 11 Africa/Banjul Greenwich Mean Time -## 12 Africa/Bissau Greenwich Mean Time -## 13 Africa/Blantyre Central Africa Time -## 14 Africa/Brazzaville West Africa Standard Time -## 15 Africa/Bujumbura Central Africa Time -## 16 Africa/Cairo Eastern European Standard Time -## 17 Africa/Casablanca GMT -## 18 Africa/Ceuta Central European Standard Time -## 19 Africa/Conakry Greenwich Mean Time -## 20 Africa/Dakar Greenwich Mean Time -## 21 Africa/Dar_es_Salaam Eastern Africa Time -## 22 Africa/Djibouti Eastern Africa Time -## 23 Africa/Douala West Africa Standard Time -## 24 Africa/El_Aaiun GMT -## 25 Africa/Freetown Greenwich Mean Time -## 26 Africa/Gaborone Central Africa Time -## 27 Africa/Harare Central Africa Time -## 28 Africa/Johannesburg South Africa Standard Time -## 29 Africa/Juba Central Africa Time -## 30 
Africa/Kampala Eastern Africa Time -## 31 Africa/Khartoum Central Africa Time -## 32 Africa/Kigali Central Africa Time -## 33 Africa/Kinshasa West Africa Standard Time -## 34 Africa/Lagos West Africa Standard Time -## 35 Africa/Libreville West Africa Standard Time -## 36 Africa/Lome Greenwich Mean Time -## 37 Africa/Luanda West Africa Standard Time -## 38 Africa/Lubumbashi Central Africa Time -## 39 Africa/Lusaka Central Africa Time -## 40 Africa/Malabo West Africa Standard Time -## 41 Africa/Maputo Central Africa Time -## 42 Africa/Maseru South Africa Standard Time -## 43 Africa/Mbabane South Africa Standard Time -## 44 Africa/Mogadishu Eastern Africa Time -## 45 Africa/Monrovia Greenwich Mean Time -## 46 Africa/Nairobi Eastern Africa Time -## 47 Africa/Ndjamena West Africa Standard Time -## 48 Africa/Niamey West Africa Standard Time -## 49 Africa/Nouakchott Greenwich Mean Time -## 50 Africa/Ouagadougou Greenwich Mean Time -## 51 Africa/Porto-Novo West Africa Standard Time -## 52 Africa/Sao_Tome Greenwich Mean Time -## 53 Africa/Timbuktu Greenwich Mean Time -## 54 Africa/Tripoli Eastern European Standard Time -## 55 Africa/Tunis Central European Standard Time -## 56 Africa/Windhoek Central Africa Time -## 57 AGT Argentina Standard Time -## 58 America/Adak Hawaii-Aleutian Standard Time -## 59 America/Anchorage Alaska Standard Time -## 60 America/Anguilla Atlantic Standard Time -## 61 America/Antigua Atlantic Standard Time -## 62 America/Araguaina Brasilia Standard Time -## 63 America/Argentina/Buenos_Aires Argentina Standard Time -## 64 America/Argentina/Catamarca Argentina Standard Time -## 65 America/Argentina/ComodRivadavia Argentina Standard Time -## 66 America/Argentina/Cordoba Argentina Standard Time -## 67 America/Argentina/Jujuy Argentina Standard Time -## 68 America/Argentina/La_Rioja Argentina Standard Time -## 69 America/Argentina/Mendoza Argentina Standard Time -## 70 America/Argentina/Rio_Gallegos Argentina Standard Time -## 71 America/Argentina/Salta 
Argentina Standard Time -## 72 America/Argentina/San_Juan Argentina Standard Time -## 73 America/Argentina/San_Luis Argentina Standard Time -## 74 America/Argentina/Tucuman Argentina Standard Time -## 75 America/Argentina/Ushuaia Argentina Standard Time -## 76 America/Aruba Atlantic Standard Time -## 77 America/Asuncion Paraguay Standard Time -## 78 America/Atikokan Eastern Standard Time -## 79 America/Atka Hawaii-Aleutian Standard Time -## 80 America/Bahia Brasilia Standard Time -## 81 America/Bahia_Banderas Central Standard Time -## 82 America/Barbados Atlantic Standard Time -## 83 America/Belem Brasilia Standard Time -## 84 America/Belize Central Standard Time -## 85 America/Blanc-Sablon Atlantic Standard Time -## 86 America/Boa_Vista Amazon Standard Time -## 87 America/Bogota Colombia Standard Time -## 88 America/Boise Mountain Standard Time -## 89 America/Buenos_Aires Argentina Standard Time -## 90 America/Cambridge_Bay Mountain Standard Time -## 91 America/Campo_Grande Amazon Standard Time -## 92 America/Cancun Eastern Standard Time -## 93 America/Caracas Venezuela Time -## 94 America/Catamarca Argentina Standard Time -## 95 America/Cayenne French Guiana Time -## 96 America/Cayman Eastern Standard Time -## 97 America/Chicago Central Standard Time -## 98 America/Chihuahua Central Standard Time -## 99 America/Ciudad_Juarez Mountain Standard Time -## 100 America/Coral_Harbour Eastern Standard Time -## 101 America/Cordoba Argentina Standard Time -## 102 America/Costa_Rica Central Standard Time -## 103 America/Creston Mountain Standard Time -## 104 America/Cuiaba Amazon Standard Time -## 105 America/Curacao Atlantic Standard Time -## 106 America/Danmarkshavn Greenwich Mean Time -## 107 America/Dawson Yukon Time -## 108 America/Dawson_Creek Mountain Standard Time -## 109 America/Denver Mountain Standard Time -## 110 America/Detroit Eastern Standard Time -## 111 America/Dominica Atlantic Standard Time -## 112 America/Edmonton Mountain Standard Time -## 113 
America/Eirunepe Acre Standard Time -## 114 America/El_Salvador Central Standard Time -## 115 America/Ensenada Pacific Standard Time -## 116 America/Fort_Nelson Mountain Standard Time -## 117 America/Fort_Wayne Eastern Standard Time -## 118 America/Fortaleza Brasilia Standard Time -## 119 America/Glace_Bay Atlantic Standard Time -## 120 America/Godthab West Greenland Standard Time -## 121 America/Goose_Bay Atlantic Standard Time -## 122 America/Grand_Turk Eastern Standard Time -## 123 America/Grenada Atlantic Standard Time -## 124 America/Guadeloupe Atlantic Standard Time -## 125 America/Guatemala Central Standard Time -## 126 America/Guayaquil Ecuador Time -## 127 America/Guyana Guyana Time -## 128 America/Halifax Atlantic Standard Time -## 129 America/Havana Cuba Standard Time -## 130 America/Hermosillo Mexican Pacific Standard Time -## 131 America/Indiana/Indianapolis Eastern Standard Time -## 132 America/Indiana/Knox Central Standard Time -## 133 America/Indiana/Marengo Eastern Standard Time -## 134 America/Indiana/Petersburg Eastern Standard Time -## 135 America/Indiana/Tell_City Central Standard Time -## 136 America/Indiana/Vevay Eastern Standard Time -## 137 America/Indiana/Vincennes Eastern Standard Time -## 138 America/Indiana/Winamac Eastern Standard Time -## 139 America/Indianapolis Eastern Standard Time -## 140 America/Inuvik Mountain Standard Time -## 141 America/Iqaluit Eastern Standard Time -## 142 America/Jamaica Eastern Standard Time -## 143 America/Jujuy Argentina Standard Time -## 144 America/Juneau Alaska Standard Time -## 145 America/Kentucky/Louisville Eastern Standard Time -## 146 America/Kentucky/Monticello Eastern Standard Time -## 147 America/Knox_IN Central Standard Time -## 148 America/Kralendijk Atlantic Standard Time -## 149 America/La_Paz Bolivia Time -## 150 America/Lima Peru Standard Time -## 151 America/Los_Angeles Pacific Standard Time -## 152 America/Louisville Eastern Standard Time -## 153 America/Lower_Princes Atlantic Standard 
Time -## 154 America/Maceio Brasilia Standard Time -## 155 America/Managua Central Standard Time -## 156 America/Manaus Amazon Standard Time -## 157 America/Marigot Atlantic Standard Time -## 158 America/Martinique Atlantic Standard Time -## 159 America/Matamoros Central Standard Time -## 160 America/Mazatlan Mexican Pacific Standard Time -## 161 America/Mendoza Argentina Standard Time -## 162 America/Menominee Central Standard Time -## 163 America/Merida Central Standard Time -## 164 America/Metlakatla Alaska Standard Time -## 165 America/Mexico_City Central Standard Time -## 166 America/Miquelon St Pierre & Miquelon Standard Time -## 167 America/Moncton Atlantic Standard Time -## 168 America/Monterrey Central Standard Time -## 169 America/Montevideo Uruguay Standard Time -## 170 America/Montreal Eastern Standard Time -## 171 America/Montserrat Atlantic Standard Time -## 172 America/Nassau Eastern Standard Time -## 173 America/New_York Eastern Standard Time -## 174 America/Nipigon Eastern Standard Time -## 175 America/Nome Alaska Standard Time -## 176 America/Noronha Fernando de Noronha Standard Time -## 177 America/North_Dakota/Beulah Central Standard Time -## 178 America/North_Dakota/Center Central Standard Time -## 179 America/North_Dakota/New_Salem Central Standard Time -## 180 America/Nuuk West Greenland Standard Time -## 181 America/Ojinaga Central Standard Time -## 182 America/Panama Eastern Standard Time -## 183 America/Pangnirtung Eastern Standard Time -## 184 America/Paramaribo Suriname Time -## 185 America/Phoenix Mountain Standard Time -## 186 America/Port_of_Spain Atlantic Standard Time -## 187 America/Port-au-Prince Eastern Standard Time -## 188 America/Porto_Acre Acre Standard Time -## 189 America/Porto_Velho Amazon Standard Time -## 190 America/Puerto_Rico Atlantic Standard Time -## 191 America/Punta_Arenas GMT-03:00 -## 192 America/Rainy_River Central Standard Time -## 193 America/Rankin_Inlet Central Standard Time -## 194 America/Recife Brasilia 
Standard Time -## 195 America/Regina Central Standard Time -## 196 America/Resolute Central Standard Time -## 197 America/Rio_Branco Acre Standard Time -## 198 America/Rosario Argentina Standard Time -## 199 America/Santa_Isabel Pacific Standard Time -## 200 America/Santarem Brasilia Standard Time -## 201 America/Santiago Chile Standard Time -## 202 America/Santo_Domingo Atlantic Standard Time -## 203 America/Sao_Paulo Brasilia Standard Time -## 204 America/Scoresbysund East Greenland Standard Time -## 205 America/Shiprock Mountain Standard Time -## 206 America/Sitka Alaska Standard Time -## 207 America/St_Barthelemy Atlantic Standard Time -## 208 America/St_Johns Newfoundland Standard Time -## 209 America/St_Kitts Atlantic Standard Time -## 210 America/St_Lucia Atlantic Standard Time -## 211 America/St_Thomas Atlantic Standard Time -## 212 America/St_Vincent Atlantic Standard Time -## 213 America/Swift_Current Central Standard Time -## 214 America/Tegucigalpa Central Standard Time -## 215 America/Thule Atlantic Standard Time -## 216 America/Thunder_Bay Eastern Standard Time -## 217 America/Tijuana Pacific Standard Time -## 218 America/Toronto Eastern Standard Time -## 219 America/Tortola Atlantic Standard Time -## 220 America/Vancouver Pacific Standard Time -## 221 America/Virgin Atlantic Standard Time -## 222 America/Whitehorse Yukon Time -## 223 America/Winnipeg Central Standard Time -## 224 America/Yakutat Alaska Standard Time -## 225 America/Yellowknife Mountain Standard Time -## 226 Antarctica/Casey Casey Time -## 227 Antarctica/Davis Davis Time -## 228 Antarctica/DumontDUrville Dumont-d’Urville Time -## 229 Antarctica/Macquarie Australian Eastern Standard Time -## 230 Antarctica/Mawson Mawson Time -## 231 Antarctica/McMurdo New Zealand Standard Time -## 232 Antarctica/Palmer GMT-03:00 -## 233 Antarctica/Rothera Rothera Time -## 234 Antarctica/South_Pole New Zealand Standard Time -## 235 Antarctica/Syowa Syowa Time -## 236 Antarctica/Troll Greenwich Mean Time 
-## 237 Antarctica/Vostok Vostok Time -## 238 Arctic/Longyearbyen Central European Standard Time -## 239 ART Eastern European Standard Time -## 240 Asia/Aden Arabia Standard Time -## 241 Asia/Almaty East Kazakhstan Time -## 242 Asia/Amman GMT+03:00 -## 243 Asia/Anadyr Anadyr Standard Time -## 244 Asia/Aqtau West Kazakhstan Time -## 245 Asia/Aqtobe West Kazakhstan Time -## 246 Asia/Ashgabat Turkmenistan Standard Time -## 247 Asia/Ashkhabad Turkmenistan Standard Time -## 248 Asia/Atyrau West Kazakhstan Time -## 249 Asia/Baghdad Arabia Standard Time -## 250 Asia/Bahrain Arabia Standard Time -## 251 Asia/Baku Azerbaijan Standard Time -## 252 Asia/Bangkok Indochina Time -## 253 Asia/Barnaul GMT+07:00 -## 254 Asia/Beirut Eastern European Standard Time -## 255 Asia/Bishkek Kyrgyzstan Time -## 256 Asia/Brunei Brunei Darussalam Time -## 257 Asia/Calcutta India Standard Time -## 258 Asia/Chita Yakutsk Standard Time -## 259 Asia/Choibalsan Ulaanbaatar Standard Time -## 260 Asia/Chongqing China Standard Time -## 261 Asia/Chungking China Standard Time -## 262 Asia/Colombo India Standard Time -## 263 Asia/Dacca Bangladesh Standard Time -## 264 Asia/Damascus GMT+03:00 -## 265 Asia/Dhaka Bangladesh Standard Time -## 266 Asia/Dili East Timor Time -## 267 Asia/Dubai Gulf Standard Time -## 268 Asia/Dushanbe Tajikistan Time -## 269 Asia/Famagusta GMT+02:00 -## 270 Asia/Gaza Eastern European Standard Time -## 271 Asia/Harbin China Standard Time -## 272 Asia/Hebron Eastern European Standard Time -## 273 Asia/Ho_Chi_Minh Indochina Time -## 274 Asia/Hong_Kong Hong Kong Standard Time -## 275 Asia/Hovd Hovd Standard Time -## 276 Asia/Irkutsk Irkutsk Standard Time -## 277 Asia/Istanbul GMT+03:00 -## 278 Asia/Jakarta Western Indonesia Time -## 279 Asia/Jayapura Eastern Indonesia Time -## 280 Asia/Jerusalem Israel Standard Time -## 281 Asia/Kabul Afghanistan Time -## 282 Asia/Kamchatka Petropavlovsk-Kamchatski Standard Time -## 283 Asia/Karachi Pakistan Standard Time -## 284 Asia/Kashgar 
GMT+06:00 -## 285 Asia/Kathmandu Nepal Time -## 286 Asia/Katmandu Nepal Time -## 287 Asia/Khandyga Yakutsk Standard Time -## 288 Asia/Kolkata India Standard Time -## 289 Asia/Krasnoyarsk Krasnoyarsk Standard Time -## 290 Asia/Kuala_Lumpur Malaysia Time -## 291 Asia/Kuching Malaysia Time -## 292 Asia/Kuwait Arabia Standard Time -## 293 Asia/Macao China Standard Time -## 294 Asia/Macau China Standard Time -## 295 Asia/Magadan Magadan Standard Time -## 296 Asia/Makassar Central Indonesia Time -## 297 Asia/Manila Philippine Standard Time -## 298 Asia/Muscat Gulf Standard Time -## 299 Asia/Nicosia Eastern European Standard Time -## 300 Asia/Novokuznetsk Krasnoyarsk Standard Time -## 301 Asia/Novosibirsk Novosibirsk Standard Time -## 302 Asia/Omsk Omsk Standard Time -## 303 Asia/Oral West Kazakhstan Time -## 304 Asia/Phnom_Penh Indochina Time -## 305 Asia/Pontianak Western Indonesia Time -## 306 Asia/Pyongyang Korean Standard Time -## 307 Asia/Qatar Arabia Standard Time -## 308 Asia/Qostanay East Kazakhstan Time -## 309 Asia/Qyzylorda West Kazakhstan Time -## 310 Asia/Rangoon Myanmar Time -## 311 Asia/Riyadh Arabia Standard Time -## 312 Asia/Saigon Indochina Time -## 313 Asia/Sakhalin Sakhalin Standard Time -## 314 Asia/Samarkand Uzbekistan Standard Time -## 315 Asia/Seoul Korean Standard Time -## 316 Asia/Shanghai China Standard Time -## 317 Asia/Singapore Singapore Standard Time -## 318 Asia/Srednekolymsk GMT+11:00 -## 319 Asia/Taipei Taipei Standard Time -## 320 Asia/Tashkent Uzbekistan Standard Time -## 321 Asia/Tbilisi Georgia Standard Time -## 322 Asia/Tehran Iran Standard Time -## 323 Asia/Tel_Aviv Israel Standard Time -## 324 Asia/Thimbu Bhutan Time -## 325 Asia/Thimphu Bhutan Time -## 326 Asia/Tokyo Japan Standard Time -## 327 Asia/Tomsk GMT+07:00 -## 328 Asia/Ujung_Pandang Central Indonesia Time -## 329 Asia/Ulaanbaatar Ulaanbaatar Standard Time -## 330 Asia/Ulan_Bator Ulaanbaatar Standard Time -## 331 Asia/Urumqi GMT+06:00 -## 332 Asia/Ust-Nera Vladivostok 
Standard Time -## 333 Asia/Vientiane Indochina Time -## 334 Asia/Vladivostok Vladivostok Standard Time -## 335 Asia/Yakutsk Yakutsk Standard Time -## 336 Asia/Yangon Myanmar Time -## 337 Asia/Yekaterinburg Yekaterinburg Standard Time -## 338 Asia/Yerevan Armenia Standard Time -## 339 AST Alaska Standard Time -## 340 Atlantic/Azores Azores Standard Time -## 341 Atlantic/Bermuda Atlantic Standard Time -## 342 Atlantic/Canary Western European Standard Time -## 343 Atlantic/Cape_Verde Cape Verde Standard Time -## 344 Atlantic/Faeroe Western European Standard Time -## 345 Atlantic/Faroe Western European Standard Time -## 346 Atlantic/Jan_Mayen Central European Standard Time -## 347 Atlantic/Madeira Western European Standard Time -## 348 Atlantic/Reykjavik Greenwich Mean Time -## 349 Atlantic/South_Georgia South Georgia Time -## 350 Atlantic/St_Helena Greenwich Mean Time -## 351 Atlantic/Stanley Falkland Islands Standard Time -## 352 Australia/ACT Australian Eastern Standard Time -## 353 Australia/Adelaide Australian Central Standard Time -## 354 Australia/Brisbane Australian Eastern Standard Time -## 355 Australia/Broken_Hill Australian Central Standard Time -## 356 Australia/Canberra Australian Eastern Standard Time -## 357 Australia/Currie Australian Eastern Standard Time -## 358 Australia/Darwin Australian Central Standard Time -## 359 Australia/Eucla Australian Central Western Standard Time -## 360 Australia/Hobart Australian Eastern Standard Time -## 361 Australia/LHI Lord Howe Standard Time -## 362 Australia/Lindeman Australian Eastern Standard Time -## 363 Australia/Lord_Howe Lord Howe Standard Time -## 364 Australia/Melbourne Australian Eastern Standard Time -## 365 Australia/North Australian Central Standard Time -## 366 Australia/NSW Australian Eastern Standard Time -## 367 Australia/Perth Australian Western Standard Time -## 368 Australia/Queensland Australian Eastern Standard Time -## 369 Australia/South Australian Central Standard Time -## 370 
Australia/Sydney Australian Eastern Standard Time -## 371 Australia/Tasmania Australian Eastern Standard Time -## 372 Australia/Victoria Australian Eastern Standard Time -## 373 Australia/West Australian Western Standard Time -## 374 Australia/Yancowinna Australian Central Standard Time -## 375 BET Brasilia Standard Time -## 376 Brazil/Acre Acre Standard Time -## 377 Brazil/DeNoronha Fernando de Noronha Standard Time -## 378 Brazil/East Brasilia Standard Time -## 379 Brazil/West Amazon Standard Time -## 380 BST Bangladesh Standard Time -## 381 Canada/Atlantic Atlantic Standard Time -## 382 Canada/Central Central Standard Time -## 383 Canada/East-Saskatchewan Central Standard Time -## 384 Canada/Eastern Eastern Standard Time -## 385 Canada/Mountain Mountain Standard Time -## 386 Canada/Newfoundland Newfoundland Standard Time -## 387 Canada/Pacific Pacific Standard Time -## 388 Canada/Saskatchewan Central Standard Time -## 389 Canada/Yukon Yukon Time -## 390 CAT Central Africa Time -## 391 CET GMT+01:00 -## 392 Chile/Continental Chile Standard Time -## 393 Chile/EasterIsland Easter Island Standard Time -## 394 CNT Newfoundland Standard Time -## 395 CST Central Standard Time -## 396 CST6CDT Central Standard Time -## 397 CTT China Standard Time -## 398 Cuba Cuba Standard Time -## 399 EAT Eastern Africa Time -## 400 ECT Central European Standard Time -## 401 EET GMT+02:00 -## 402 Egypt Eastern European Standard Time -## 403 Eire Greenwich Mean Time -## 404 EST GMT-05:00 -## 405 EST5EDT Eastern Standard Time -## 406 Etc/GMT Greenwich Mean Time -## 407 Etc/GMT-0 Greenwich Mean Time -## 408 Etc/GMT-1 GMT+01:00 -## 409 Etc/GMT-2 GMT+02:00 -## 410 Etc/GMT-3 GMT+03:00 -## 411 Etc/GMT-4 GMT+04:00 -## 412 Etc/GMT-5 GMT+05:00 -## 413 Etc/GMT-6 GMT+06:00 -## 414 Etc/GMT-7 GMT+07:00 -## 415 Etc/GMT-8 GMT+08:00 -## 416 Etc/GMT-9 GMT+09:00 -## 417 Etc/GMT-10 GMT+10:00 -## 418 Etc/GMT-11 GMT+11:00 -## 419 Etc/GMT-12 GMT+12:00 -## 420 Etc/GMT-13 GMT+13:00 -## 421 Etc/GMT-14 GMT+14:00 
-## 422 Etc/GMT+0 Greenwich Mean Time -## 423 Etc/GMT+1 GMT-01:00 -## 424 Etc/GMT+2 GMT-02:00 -## 425 Etc/GMT+3 GMT-03:00 -## 426 Etc/GMT+4 GMT-04:00 -## 427 Etc/GMT+5 GMT-05:00 -## 428 Etc/GMT+6 GMT-06:00 -## 429 Etc/GMT+7 GMT-07:00 -## 430 Etc/GMT+8 GMT-08:00 -## 431 Etc/GMT+9 GMT-09:00 -## 432 Etc/GMT+10 GMT-10:00 -## 433 Etc/GMT+11 GMT-11:00 -## 434 Etc/GMT+12 GMT-12:00 -## 435 Etc/GMT0 Greenwich Mean Time -## 436 Etc/Greenwich Greenwich Mean Time -## 437 Etc/UCT Coordinated Universal Time -## 438 Etc/Universal Coordinated Universal Time -## 439 Etc/UTC Coordinated Universal Time -## 440 Etc/Zulu Coordinated Universal Time -## 441 Europe/Amsterdam Central European Standard Time -## 442 Europe/Andorra Central European Standard Time -## 443 Europe/Astrakhan GMT+04:00 -## 444 Europe/Athens Eastern European Standard Time -## 445 Europe/Belfast Greenwich Mean Time -## 446 Europe/Belgrade Central European Standard Time -## 447 Europe/Berlin Central European Standard Time -## 448 Europe/Bratislava Central European Standard Time -## 449 Europe/Brussels Central European Standard Time -## 450 Europe/Bucharest Eastern European Standard Time -## 451 Europe/Budapest Central European Standard Time -## 452 Europe/Busingen Central European Standard Time -## 453 Europe/Chisinau Eastern European Standard Time -## 454 Europe/Copenhagen Central European Standard Time -## 455 Europe/Dublin Greenwich Mean Time -## 456 Europe/Gibraltar Central European Standard Time -## 457 Europe/Guernsey Greenwich Mean Time -## 458 Europe/Helsinki Eastern European Standard Time -## 459 Europe/Isle_of_Man Greenwich Mean Time -## 460 Europe/Istanbul GMT+03:00 -## 461 Europe/Jersey Greenwich Mean Time -## 462 Europe/Kaliningrad Eastern European Standard Time -## 463 Europe/Kiev Eastern European Standard Time -## 464 Europe/Kirov GMT+03:00 -## 465 Europe/Kyiv Eastern European Standard Time -## 466 Europe/Lisbon Western European Standard Time -## 467 Europe/Ljubljana Central European Standard Time -## 
468 Europe/London Greenwich Mean Time -## 469 Europe/Luxembourg Central European Standard Time -## 470 Europe/Madrid Central European Standard Time -## 471 Europe/Malta Central European Standard Time -## 472 Europe/Mariehamn Eastern European Standard Time -## 473 Europe/Minsk Moscow Standard Time -## 474 Europe/Monaco Central European Standard Time -## 475 Europe/Moscow Moscow Standard Time -## 476 Europe/Nicosia Eastern European Standard Time -## 477 Europe/Oslo Central European Standard Time -## 478 Europe/Paris Central European Standard Time -## 479 Europe/Podgorica Central European Standard Time -## 480 Europe/Prague Central European Standard Time -## 481 Europe/Riga Eastern European Standard Time -## 482 Europe/Rome Central European Standard Time -## 483 Europe/Samara Samara Standard Time -## 484 Europe/San_Marino Central European Standard Time -## 485 Europe/Sarajevo Central European Standard Time -## 486 Europe/Saratov GMT+04:00 -## 487 Europe/Simferopol Moscow Standard Time -## 488 Europe/Skopje Central European Standard Time -## 489 Europe/Sofia Eastern European Standard Time -## 490 Europe/Stockholm Central European Standard Time -## 491 Europe/Tallinn Eastern European Standard Time -## 492 Europe/Tirane Central European Standard Time -## 493 Europe/Tiraspol Eastern European Standard Time -## 494 Europe/Ulyanovsk GMT+04:00 -## 495 Europe/Uzhgorod Eastern European Standard Time -## 496 Europe/Vaduz Central European Standard Time -## 497 Europe/Vatican Central European Standard Time -## 498 Europe/Vienna Central European Standard Time -## 499 Europe/Vilnius Eastern European Standard Time -## 500 Europe/Volgograd Volgograd Standard Time -## 501 Europe/Warsaw Central European Standard Time -## 502 Europe/Zagreb Central European Standard Time -## 503 Europe/Zaporozhye Eastern European Standard Time -## 504 Europe/Zurich Central European Standard Time -## 505 Factory GMT -## 506 GB Greenwich Mean Time -## 507 GB-Eire Greenwich Mean Time -## 508 GMT Greenwich 
Mean Time -## 509 GMT-0 Greenwich Mean Time -## 510 GMT+0 Greenwich Mean Time -## 511 GMT0 Greenwich Mean Time -## 512 Greenwich Greenwich Mean Time -## 513 Hongkong Hong Kong Standard Time -## 514 HST GMT-10:00 -## 515 Iceland Greenwich Mean Time -## 516 IET Eastern Standard Time -## 517 Indian/Antananarivo Eastern Africa Time -## 518 Indian/Chagos Indian Ocean Time -## 519 Indian/Christmas Christmas Island Time -## 520 Indian/Cocos Cocos Islands Time -## 521 Indian/Comoro Eastern Africa Time -## 522 Indian/Kerguelen French Southern & Antarctic Time -## 523 Indian/Mahe Seychelles Time -## 524 Indian/Maldives Maldives Time -## 525 Indian/Mauritius Mauritius Standard Time -## 526 Indian/Mayotte Eastern Africa Time -## 527 Indian/Reunion Réunion Time -## 528 Iran Iran Standard Time -## 529 Israel Israel Standard Time -## 530 IST India Standard Time -## 531 Jamaica Eastern Standard Time -## 532 Japan Japan Standard Time -## 533 JST Japan Standard Time -## 534 Kwajalein Marshall Islands Time -## 535 Libya Eastern European Standard Time -## 536 MET GMT+01:00 -## 537 Mexico/BajaNorte Pacific Standard Time -## 538 Mexico/BajaSur Mexican Pacific Standard Time -## 539 Mexico/General Central Standard Time -## 540 MIT Apia Standard Time -## 541 MST GMT-07:00 -## 542 MST7MDT Mountain Standard Time -## 543 Navajo Mountain Standard Time -## 544 NET Armenia Standard Time -## 545 NST New Zealand Standard Time -## 546 NZ New Zealand Standard Time -## 547 NZ-CHAT Chatham Standard Time -## 548 Pacific/Apia Apia Standard Time -## 549 Pacific/Auckland New Zealand Standard Time -## 550 Pacific/Bougainville GMT+11:00 -## 551 Pacific/Chatham Chatham Standard Time -## 552 Pacific/Chuuk Chuuk Time -## 553 Pacific/Easter Easter Island Standard Time -## 554 Pacific/Efate Vanuatu Standard Time -## 555 Pacific/Enderbury Phoenix Islands Time -## 556 Pacific/Fakaofo Tokelau Time -## 557 Pacific/Fiji Fiji Standard Time -## 558 Pacific/Funafuti Tuvalu Time -## 559 Pacific/Galapagos Galapagos Time 
-## 560 Pacific/Gambier Gambier Time -## 561 Pacific/Guadalcanal Solomon Islands Time -## 562 Pacific/Guam Chamorro Standard Time -## 563 Pacific/Honolulu Hawaii-Aleutian Standard Time -## 564 Pacific/Johnston Hawaii-Aleutian Standard Time -## 565 Pacific/Kanton Phoenix Islands Time -## 566 Pacific/Kiritimati Line Islands Time -## 567 Pacific/Kosrae Kosrae Time -## 568 Pacific/Kwajalein Marshall Islands Time -## 569 Pacific/Majuro Marshall Islands Time -## 570 Pacific/Marquesas Marquesas Time -## 571 Pacific/Midway Samoa Standard Time -## 572 Pacific/Nauru Nauru Time -## 573 Pacific/Niue Niue Time -## 574 Pacific/Norfolk Norfolk Island Standard Time -## 575 Pacific/Noumea New Caledonia Standard Time -## 576 Pacific/Pago_Pago Samoa Standard Time -## 577 Pacific/Palau Palau Time -## 578 Pacific/Pitcairn Pitcairn Time -## 579 Pacific/Pohnpei Ponape Time -## 580 Pacific/Ponape Ponape Time -## 581 Pacific/Port_Moresby Papua New Guinea Time -## 582 Pacific/Rarotonga Cook Island Standard Time -## 583 Pacific/Saipan Chamorro Standard Time -## 584 Pacific/Samoa Samoa Standard Time -## 585 Pacific/Tahiti Tahiti Time -## 586 Pacific/Tarawa Gilbert Islands Time -## 587 Pacific/Tongatapu Tonga Standard Time -## 588 Pacific/Truk Chuuk Time -## 589 Pacific/Wake Wake Island Time -## 590 Pacific/Wallis Wallis & Futuna Time -## 591 Pacific/Yap Chuuk Time -## 592 PLT Pakistan Standard Time -## 593 PNT Mountain Standard Time -## 594 Poland Central European Standard Time -## 595 Portugal Western European Standard Time -## 596 PRC China Standard Time -## 597 PRT Atlantic Standard Time -## 598 PST Pacific Standard Time -## 599 PST8PDT Pacific Standard Time -## 600 ROC Taipei Standard Time -## 601 ROK Korean Standard Time -## 602 Singapore Singapore Standard Time -## 603 SST Solomon Islands Time -## 604 SystemV/AST4 GMT-04:00 -## 605 SystemV/AST4ADT GMT-04:00 -## 606 SystemV/CST6 GMT-06:00 -## 607 SystemV/CST6CDT GMT-06:00 -## 608 SystemV/EST5 GMT-05:00 -## 609 SystemV/EST5EDT GMT-05:00 
-## 610 SystemV/HST10 GMT-10:00 -## 611 SystemV/MST7 GMT-07:00 -## 612 SystemV/MST7MDT GMT-07:00 -## 613 SystemV/PST8 GMT-08:00 -## 614 SystemV/PST8PDT GMT-08:00 -## 615 SystemV/YST9 GMT-09:00 -## 616 SystemV/YST9YDT GMT-09:00 -## 617 Turkey GMT+03:00 -## 618 UCT Coordinated Universal Time -## 619 Universal Coordinated Universal Time -## 620 US/Alaska Alaska Standard Time -## 621 US/Aleutian Hawaii-Aleutian Standard Time -## 622 US/Arizona Mountain Standard Time -## 623 US/Central Central Standard Time -## 624 US/East-Indiana Eastern Standard Time -## 625 US/Eastern Eastern Standard Time -## 626 US/Hawaii Hawaii-Aleutian Standard Time -## 627 US/Indiana-Starke Central Standard Time -## 628 US/Michigan Eastern Standard Time -## 629 US/Mountain Mountain Standard Time -## 630 US/Pacific Pacific Standard Time -## 631 US/Pacific-New Pacific Standard Time -## 632 US/Samoa Samoa Standard Time -## 633 UTC Coordinated Universal Time -## 634 VST Indochina Time -## 635 W-SU Moscow Standard Time -## 636 WET GMT -## 637 Zulu Coordinated Universal Time -## Name.Daylight Name.Windows -## 1 AUS Central Standard Time -## 2 Australian Eastern Daylight Time AUS Eastern Standard Time -## 3 Greenwich Standard Time -## 4 Greenwich Standard Time -## 5 E. Africa Standard Time -## 6 W. Central Africa Standard Time -## 7 E. Africa Standard Time -## 8 E. Africa Standard Time -## 9 Greenwich Standard Time -## 10 W. Central Africa Standard Time -## 11 Greenwich Standard Time -## 12 Greenwich Standard Time -## 13 South Africa Standard Time -## 14 W. Central Africa Standard Time -## 15 South Africa Standard Time -## 16 Eastern European Summer Time Egypt Standard Time -## 17 GMT+01:00 Morocco Standard Time -## 18 Central European Summer Time Romance Standard Time -## 19 Greenwich Standard Time -## 20 Greenwich Standard Time -## 21 E. Africa Standard Time -## 22 E. Africa Standard Time -## 23 W. 
Central Africa Standard Time -## 24 GMT+01:00 Morocco Standard Time -## 25 Greenwich Standard Time -## 26 South Africa Standard Time -## 27 South Africa Standard Time -## 28 South Africa Standard Time -## 29 South Sudan Standard Time -## 30 E. Africa Standard Time -## 31 Sudan Standard Time -## 32 South Africa Standard Time -## 33 W. Central Africa Standard Time -## 34 W. Central Africa Standard Time -## 35 W. Central Africa Standard Time -## 36 Greenwich Standard Time -## 37 W. Central Africa Standard Time -## 38 South Africa Standard Time -## 39 South Africa Standard Time -## 40 W. Central Africa Standard Time -## 41 South Africa Standard Time -## 42 South Africa Standard Time -## 43 South Africa Standard Time -## 44 E. Africa Standard Time -## 45 Greenwich Standard Time -## 46 E. Africa Standard Time -## 47 W. Central Africa Standard Time -## 48 W. Central Africa Standard Time -## 49 Greenwich Standard Time -## 50 Greenwich Standard Time -## 51 W. Central Africa Standard Time -## 52 Sao Tome Standard Time -## 53 Greenwich Standard Time -## 54 Libya Standard Time -## 55 W. 
Central Africa Standard Time -## 56 Namibia Standard Time -## 57 Argentina Standard Time -## 58 Hawaii-Aleutian Daylight Time Aleutian Standard Time -## 59 Alaska Daylight Time Alaskan Standard Time -## 60 SA Western Standard Time -## 61 SA Western Standard Time -## 62 Tocantins Standard Time -## 63 Argentina Standard Time -## 64 Argentina Standard Time -## 65 Argentina Standard Time -## 66 Argentina Standard Time -## 67 Argentina Standard Time -## 68 Argentina Standard Time -## 69 Argentina Standard Time -## 70 Argentina Standard Time -## 71 Argentina Standard Time -## 72 Argentina Standard Time -## 73 Argentina Standard Time -## 74 Argentina Standard Time -## 75 Argentina Standard Time -## 76 SA Western Standard Time -## 77 Paraguay Summer Time Paraguay Standard Time -## 78 SA Pacific Standard Time -## 79 Hawaii-Aleutian Daylight Time Aleutian Standard Time -## 80 Bahia Standard Time -## 81 Central Standard Time (Mexico) -## 82 SA Western Standard Time -## 83 SA Eastern Standard Time -## 84 Central America Standard Time -## 85 SA Western Standard Time -## 86 SA Western Standard Time -## 87 SA Pacific Standard Time -## 88 Mountain Daylight Time Mountain Standard Time -## 89 Argentina Standard Time -## 90 Mountain Daylight Time Mountain Standard Time -## 91 Central Brazilian Standard Time -## 92 Eastern Standard Time (Mexico) -## 93 Venezuela Standard Time -## 94 Argentina Standard Time -## 95 SA Eastern Standard Time -## 96 SA Pacific Standard Time -## 97 Central Daylight Time Central Standard Time -## 98 Central Standard Time (Mexico) -## 99 Mountain Daylight Time Mountain Standard Time -## 100 SA Pacific Standard Time -## 101 Argentina Standard Time -## 102 Central America Standard Time -## 103 US Mountain Standard Time -## 104 Central Brazilian Standard Time -## 105 SA Western Standard Time -## 106 Greenwich Standard Time -## 107 Yukon Standard Time -## 108 US Mountain Standard Time -## 109 Mountain Daylight Time Mountain Standard Time -## 110 Eastern Daylight 
Time Eastern Standard Time -## 111 SA Western Standard Time -## 112 Mountain Daylight Time Mountain Standard Time -## 113 SA Pacific Standard Time -## 114 Central America Standard Time -## 115 Pacific Daylight Time Pacific Standard Time (Mexico) -## 116 US Mountain Standard Time -## 117 Eastern Daylight Time US Eastern Standard Time -## 118 SA Eastern Standard Time -## 119 Atlantic Daylight Time Atlantic Standard Time -## 120 West Greenland Summer Time Greenland Standard Time -## 121 Atlantic Daylight Time Atlantic Standard Time -## 122 Eastern Daylight Time Turks And Caicos Standard Time -## 123 SA Western Standard Time -## 124 SA Western Standard Time -## 125 Central America Standard Time -## 126 SA Pacific Standard Time -## 127 SA Western Standard Time -## 128 Atlantic Daylight Time Atlantic Standard Time -## 129 Cuba Daylight Time Cuba Standard Time -## 130 US Mountain Standard Time -## 131 Eastern Daylight Time US Eastern Standard Time -## 132 Central Daylight Time Central Standard Time -## 133 Eastern Daylight Time US Eastern Standard Time -## 134 Eastern Daylight Time Eastern Standard Time -## 135 Central Daylight Time Central Standard Time -## 136 Eastern Daylight Time US Eastern Standard Time -## 137 Eastern Daylight Time Eastern Standard Time -## 138 Eastern Daylight Time Eastern Standard Time -## 139 Eastern Daylight Time US Eastern Standard Time -## 140 Mountain Daylight Time Mountain Standard Time -## 141 Eastern Daylight Time Eastern Standard Time -## 142 SA Pacific Standard Time -## 143 Argentina Standard Time -## 144 Alaska Daylight Time Alaskan Standard Time -## 145 Eastern Daylight Time Eastern Standard Time -## 146 Eastern Daylight Time Eastern Standard Time -## 147 Central Daylight Time Central Standard Time -## 148 SA Western Standard Time -## 149 SA Western Standard Time -## 150 SA Pacific Standard Time -## 151 Pacific Daylight Time Pacific Standard Time -## 152 Eastern Daylight Time Eastern Standard Time -## 153 SA Western Standard Time -## 
154 SA Eastern Standard Time -## 155 Central America Standard Time -## 156 SA Western Standard Time -## 157 SA Western Standard Time -## 158 SA Western Standard Time -## 159 Central Daylight Time Central Standard Time -## 160 Mountain Standard Time (Mexico) -## 161 Argentina Standard Time -## 162 Central Daylight Time Central Standard Time -## 163 Central Standard Time (Mexico) -## 164 Alaska Daylight Time Alaskan Standard Time -## 165 Central Standard Time (Mexico) -## 166 St Pierre & Miquelon Daylight Time Saint Pierre Standard Time -## 167 Atlantic Daylight Time Atlantic Standard Time -## 168 Central Standard Time (Mexico) -## 169 Montevideo Standard Time -## 170 Eastern Daylight Time Eastern Standard Time -## 171 SA Western Standard Time -## 172 Eastern Daylight Time Eastern Standard Time -## 173 Eastern Daylight Time Eastern Standard Time -## 174 Eastern Daylight Time Eastern Standard Time -## 175 Alaska Daylight Time Alaskan Standard Time -## 176 UTC-02 -## 177 Central Daylight Time Central Standard Time -## 178 Central Daylight Time Central Standard Time -## 179 Central Daylight Time Central Standard Time -## 180 West Greenland Summer Time Greenland Standard Time -## 181 Central Daylight Time Central Standard Time -## 182 SA Pacific Standard Time -## 183 Eastern Daylight Time Eastern Standard Time -## 184 SA Eastern Standard Time -## 185 US Mountain Standard Time -## 186 SA Western Standard Time -## 187 Eastern Daylight Time Haiti Standard Time -## 188 SA Pacific Standard Time -## 189 SA Western Standard Time -## 190 SA Western Standard Time -## 191 Magallanes Standard Time -## 192 Central Daylight Time Central Standard Time -## 193 Central Daylight Time Central Standard Time -## 194 SA Eastern Standard Time -## 195 Canada Central Standard Time -## 196 Central Daylight Time Central Standard Time -## 197 SA Pacific Standard Time -## 198 Argentina Standard Time -## 199 Pacific Daylight Time Pacific Standard Time (Mexico) -## 200 SA Eastern Standard Time -## 
201 Chile Summer Time Pacific SA Standard Time -## 202 SA Western Standard Time -## 203 E. South America Standard Time -## 204 East Greenland Summer Time Azores Standard Time -## 205 Mountain Daylight Time Mountain Standard Time -## 206 Alaska Daylight Time Alaskan Standard Time -## 207 SA Western Standard Time -## 208 Newfoundland Daylight Time Newfoundland Standard Time -## 209 SA Western Standard Time -## 210 SA Western Standard Time -## 211 SA Western Standard Time -## 212 SA Western Standard Time -## 213 Canada Central Standard Time -## 214 Central America Standard Time -## 215 Atlantic Daylight Time Atlantic Standard Time -## 216 Eastern Daylight Time Eastern Standard Time -## 217 Pacific Daylight Time Pacific Standard Time (Mexico) -## 218 Eastern Daylight Time Eastern Standard Time -## 219 SA Western Standard Time -## 220 Pacific Daylight Time Pacific Standard Time -## 221 SA Western Standard Time -## 222 Yukon Standard Time -## 223 Central Daylight Time Central Standard Time -## 224 Alaska Daylight Time Alaskan Standard Time -## 225 Mountain Daylight Time Mountain Standard Time -## 226 Central Pacific Standard Time -## 227 SE Asia Standard Time -## 228 West Pacific Standard Time -## 229 Australian Eastern Daylight Time Tasmania Standard Time -## 230 West Asia Standard Time -## 231 New Zealand Daylight Time New Zealand Standard Time -## 232 SA Eastern Standard Time -## 233 SA Eastern Standard Time -## 234 New Zealand Daylight Time New Zealand Standard Time -## 235 E. Africa Standard Time -## 236 GMT+02:00 -## 237 Central Asia Standard Time -## 238 Central European Summer Time W. 
Europe Standard Time -## 239 Eastern European Summer Time Egypt Standard Time -## 240 Arab Standard Time -## 241 Central Asia Standard Time -## 242 Jordan Standard Time -## 243 Russia Time Zone 11 -## 244 West Asia Standard Time -## 245 West Asia Standard Time -## 246 West Asia Standard Time -## 247 West Asia Standard Time -## 248 West Asia Standard Time -## 249 Arabic Standard Time -## 250 Arab Standard Time -## 251 Azerbaijan Standard Time -## 252 SE Asia Standard Time -## 253 Altai Standard Time -## 254 Eastern European Summer Time Middle East Standard Time -## 255 Central Asia Standard Time -## 256 Singapore Standard Time -## 257 India Standard Time -## 258 Transbaikal Standard Time -## 259 Ulaanbaatar Standard Time -## 260 China Standard Time -## 261 China Standard Time -## 262 Sri Lanka Standard Time -## 263 Bangladesh Standard Time -## 264 Syria Standard Time -## 265 Bangladesh Standard Time -## 266 Tokyo Standard Time -## 267 Arabian Standard Time -## 268 West Asia Standard Time -## 269 GMT+03:00 GTB Standard Time -## 270 Eastern European Summer Time West Bank Standard Time -## 271 China Standard Time -## 272 Eastern European Summer Time West Bank Standard Time -## 273 SE Asia Standard Time -## 274 China Standard Time -## 275 W. 
Mongolia Standard Time -## 276 North Asia East Standard Time -## 277 Turkey Standard Time -## 278 SE Asia Standard Time -## 279 Tokyo Standard Time -## 280 Israel Daylight Time Israel Standard Time -## 281 Afghanistan Standard Time -## 282 Russia Time Zone 11 -## 283 Pakistan Standard Time -## 284 Central Asia Standard Time -## 285 Nepal Standard Time -## 286 Nepal Standard Time -## 287 Yakutsk Standard Time -## 288 India Standard Time -## 289 North Asia Standard Time -## 290 Singapore Standard Time -## 291 Singapore Standard Time -## 292 Arab Standard Time -## 293 China Standard Time -## 294 China Standard Time -## 295 Magadan Standard Time -## 296 Singapore Standard Time -## 297 Singapore Standard Time -## 298 Arabian Standard Time -## 299 Eastern European Summer Time GTB Standard Time -## 300 North Asia Standard Time -## 301 N. Central Asia Standard Time -## 302 Omsk Standard Time -## 303 West Asia Standard Time -## 304 SE Asia Standard Time -## 305 SE Asia Standard Time -## 306 North Korea Standard Time -## 307 Arab Standard Time -## 308 Central Asia Standard Time -## 309 Qyzylorda Standard Time -## 310 Myanmar Standard Time -## 311 Arab Standard Time -## 312 SE Asia Standard Time -## 313 Sakhalin Standard Time -## 314 West Asia Standard Time -## 315 Korea Standard Time -## 316 China Standard Time -## 317 Singapore Standard Time -## 318 Russia Time Zone 10 -## 319 Taipei Standard Time -## 320 West Asia Standard Time -## 321 Georgian Standard Time -## 322 Iran Standard Time -## 323 Israel Daylight Time Israel Standard Time -## 324 Bangladesh Standard Time -## 325 Bangladesh Standard Time -## 326 Tokyo Standard Time -## 327 Tomsk Standard Time -## 328 Singapore Standard Time -## 329 Ulaanbaatar Standard Time -## 330 Ulaanbaatar Standard Time -## 331 Central Asia Standard Time -## 332 Vladivostok Standard Time -## 333 SE Asia Standard Time -## 334 Vladivostok Standard Time -## 335 Yakutsk Standard Time -## 336 Myanmar Standard Time -## 337 Ekaterinburg Standard 
Time -## 338 Caucasus Standard Time -## 339 Alaska Daylight Time Alaskan Standard Time -## 340 Azores Summer Time Azores Standard Time -## 341 Atlantic Daylight Time Atlantic Standard Time -## 342 Western European Summer Time GMT Standard Time -## 343 Cape Verde Standard Time -## 344 Western European Summer Time GMT Standard Time -## 345 Western European Summer Time GMT Standard Time -## 346 Central European Summer Time W. Europe Standard Time -## 347 Western European Summer Time GMT Standard Time -## 348 Greenwich Standard Time -## 349 UTC-02 -## 350 Greenwich Standard Time -## 351 SA Eastern Standard Time -## 352 Australian Eastern Daylight Time AUS Eastern Standard Time -## 353 Australian Central Daylight Time Cen. Australia Standard Time -## 354 E. Australia Standard Time -## 355 Australian Central Daylight Time Cen. Australia Standard Time -## 356 Australian Eastern Daylight Time AUS Eastern Standard Time -## 357 Australian Eastern Daylight Time Tasmania Standard Time -## 358 AUS Central Standard Time -## 359 Aus Central W. Standard Time -## 360 Australian Eastern Daylight Time Tasmania Standard Time -## 361 Lord Howe Daylight Time Lord Howe Standard Time -## 362 E. Australia Standard Time -## 363 Lord Howe Daylight Time Lord Howe Standard Time -## 364 Australian Eastern Daylight Time AUS Eastern Standard Time -## 365 AUS Central Standard Time -## 366 Australian Eastern Daylight Time AUS Eastern Standard Time -## 367 W. Australia Standard Time -## 368 E. Australia Standard Time -## 369 Australian Central Daylight Time Cen. Australia Standard Time -## 370 Australian Eastern Daylight Time AUS Eastern Standard Time -## 371 Australian Eastern Daylight Time Tasmania Standard Time -## 372 Australian Eastern Daylight Time AUS Eastern Standard Time -## 373 W. Australia Standard Time -## 374 Australian Central Daylight Time Cen. Australia Standard Time -## 375 E. South America Standard Time -## 376 SA Pacific Standard Time -## 377 UTC-02 -## 378 E. 
South America Standard Time -## 379 SA Western Standard Time -## 380 Bangladesh Standard Time -## 381 Atlantic Daylight Time Atlantic Standard Time -## 382 Central Daylight Time Central Standard Time -## 383 Canada Central Standard Time -## 384 Eastern Daylight Time Eastern Standard Time -## 385 Mountain Daylight Time Mountain Standard Time -## 386 Newfoundland Daylight Time Newfoundland Standard Time -## 387 Pacific Daylight Time Pacific Standard Time -## 388 Canada Central Standard Time -## 389 Yukon Standard Time -## 390 South Africa Standard Time -## 391 GMT+02:00 -## 392 Chile Summer Time Pacific SA Standard Time -## 393 Easter Island Summer Time Easter Island Standard Time -## 394 Newfoundland Daylight Time Newfoundland Standard Time -## 395 Central Daylight Time Central Standard Time -## 396 Central Daylight Time Central Standard Time -## 397 China Standard Time -## 398 Cuba Daylight Time Cuba Standard Time -## 399 E. Africa Standard Time -## 400 Central European Summer Time Romance Standard Time -## 401 GMT+03:00 -## 402 Eastern European Summer Time Egypt Standard Time -## 403 Irish Standard Time GMT Standard Time -## 404 SA Pacific Standard Time -## 405 Eastern Daylight Time Eastern Standard Time -## 406 UTC -## 407 UTC -## 408 W. Central Africa Standard Time -## 409 South Africa Standard Time -## 410 E. 
Africa Standard Time -## 411 Arabian Standard Time -## 412 West Asia Standard Time -## 413 Central Asia Standard Time -## 414 SE Asia Standard Time -## 415 Singapore Standard Time -## 416 Tokyo Standard Time -## 417 West Pacific Standard Time -## 418 Central Pacific Standard Time -## 419 UTC+12 -## 420 UTC+13 -## 421 Line Islands Standard Time -## 422 UTC -## 423 Cape Verde Standard Time -## 424 UTC-02 -## 425 SA Eastern Standard Time -## 426 SA Western Standard Time -## 427 SA Pacific Standard Time -## 428 Central America Standard Time -## 429 US Mountain Standard Time -## 430 UTC-08 -## 431 UTC-09 -## 432 Hawaiian Standard Time -## 433 UTC-11 -## 434 Dateline Standard Time -## 435 UTC -## 436 UTC -## 437 UTC -## 438 UTC -## 439 UTC -## 440 UTC -## 441 Central European Summer Time W. Europe Standard Time -## 442 Central European Summer Time W. Europe Standard Time -## 443 Astrakhan Standard Time -## 444 Eastern European Summer Time GTB Standard Time -## 445 British Summer Time GMT Standard Time -## 446 Central European Summer Time Central Europe Standard Time -## 447 Central European Summer Time W. Europe Standard Time -## 448 Central European Summer Time Central Europe Standard Time -## 449 Central European Summer Time Romance Standard Time -## 450 Eastern European Summer Time GTB Standard Time -## 451 Central European Summer Time Central Europe Standard Time -## 452 Central European Summer Time W. Europe Standard Time -## 453 Eastern European Summer Time E. Europe Standard Time -## 454 Central European Summer Time Romance Standard Time -## 455 Irish Standard Time GMT Standard Time -## 456 Central European Summer Time W. 
Europe Standard Time -## 457 GMT+01:00 GMT Standard Time -## 458 Eastern European Summer Time FLE Standard Time -## 459 GMT+01:00 GMT Standard Time -## 460 Turkey Standard Time -## 461 GMT+01:00 GMT Standard Time -## 462 Kaliningrad Standard Time -## 463 Eastern European Summer Time FLE Standard Time -## 464 Russian Standard Time -## 465 Eastern European Summer Time FLE Standard Time -## 466 Western European Summer Time GMT Standard Time -## 467 Central European Summer Time Central Europe Standard Time -## 468 British Summer Time GMT Standard Time -## 469 Central European Summer Time W. Europe Standard Time -## 470 Central European Summer Time Romance Standard Time -## 471 Central European Summer Time W. Europe Standard Time -## 472 Eastern European Summer Time FLE Standard Time -## 473 Belarus Standard Time -## 474 Central European Summer Time W. Europe Standard Time -## 475 Russian Standard Time -## 476 Eastern European Summer Time GTB Standard Time -## 477 Central European Summer Time W. Europe Standard Time -## 478 Central European Summer Time Romance Standard Time -## 479 Central European Summer Time Central Europe Standard Time -## 480 Central European Summer Time Central Europe Standard Time -## 481 Eastern European Summer Time FLE Standard Time -## 482 Central European Summer Time W. Europe Standard Time -## 483 Russia Time Zone 3 -## 484 Central European Summer Time W. Europe Standard Time -## 485 Central European Summer Time Central European Standard Time -## 486 Saratov Standard Time -## 487 Russian Standard Time -## 488 Central European Summer Time Central European Standard Time -## 489 Eastern European Summer Time FLE Standard Time -## 490 Central European Summer Time W. Europe Standard Time -## 491 Eastern European Summer Time FLE Standard Time -## 492 Central European Summer Time Central Europe Standard Time -## 493 Eastern European Summer Time E. 
Europe Standard Time -## 494 Astrakhan Standard Time -## 495 Eastern European Summer Time FLE Standard Time -## 496 Central European Summer Time W. Europe Standard Time -## 497 Central European Summer Time W. Europe Standard Time -## 498 Central European Summer Time W. Europe Standard Time -## 499 Eastern European Summer Time FLE Standard Time -## 500 Volgograd Standard Time -## 501 Central European Summer Time Central European Standard Time -## 502 Central European Summer Time Central European Standard Time -## 503 Eastern European Summer Time FLE Standard Time -## 504 Central European Summer Time W. Europe Standard Time -## 505 -## 506 British Summer Time GMT Standard Time -## 507 British Summer Time GMT Standard Time -## 508 UTC -## 509 UTC -## 510 UTC -## 511 UTC -## 512 UTC -## 513 China Standard Time -## 514 Hawaiian Standard Time -## 515 Greenwich Standard Time -## 516 Eastern Daylight Time US Eastern Standard Time -## 517 E. Africa Standard Time -## 518 Central Asia Standard Time -## 519 SE Asia Standard Time -## 520 Myanmar Standard Time -## 521 E. Africa Standard Time -## 522 West Asia Standard Time -## 523 Mauritius Standard Time -## 524 West Asia Standard Time -## 525 Mauritius Standard Time -## 526 E. 
Africa Standard Time -## 527 Mauritius Standard Time -## 528 Iran Standard Time -## 529 Israel Daylight Time Israel Standard Time -## 530 India Standard Time -## 531 SA Pacific Standard Time -## 532 Tokyo Standard Time -## 533 Tokyo Standard Time -## 534 UTC+12 -## 535 Libya Standard Time -## 536 GMT+02:00 -## 537 Pacific Daylight Time Pacific Standard Time (Mexico) -## 538 Mountain Standard Time (Mexico) -## 539 Central Standard Time (Mexico) -## 540 Samoa Standard Time -## 541 US Mountain Standard Time -## 542 Mountain Daylight Time Mountain Standard Time -## 543 Mountain Daylight Time Mountain Standard Time -## 544 Caucasus Standard Time -## 545 New Zealand Daylight Time New Zealand Standard Time -## 546 New Zealand Daylight Time New Zealand Standard Time -## 547 Chatham Daylight Time Chatham Islands Standard Time -## 548 Samoa Standard Time -## 549 New Zealand Daylight Time New Zealand Standard Time -## 550 Bougainville Standard Time -## 551 Chatham Daylight Time Chatham Islands Standard Time -## 552 West Pacific Standard Time -## 553 Easter Island Summer Time Easter Island Standard Time -## 554 Central Pacific Standard Time -## 555 UTC+13 -## 556 UTC+13 -## 557 Fiji Standard Time -## 558 UTC+12 -## 559 Central America Standard Time -## 560 UTC-09 -## 561 Central Pacific Standard Time -## 562 West Pacific Standard Time -## 563 Hawaiian Standard Time -## 564 Hawaiian Standard Time -## 565 UTC+13 -## 566 Line Islands Standard Time -## 567 Central Pacific Standard Time -## 568 UTC+12 -## 569 UTC+12 -## 570 Marquesas Standard Time -## 571 UTC-11 -## 572 UTC+12 -## 573 UTC-11 -## 574 Norfolk Island Daylight Time Norfolk Standard Time -## 575 Central Pacific Standard Time -## 576 UTC-11 -## 577 Tokyo Standard Time -## 578 UTC-08 -## 579 Central Pacific Standard Time -## 580 Central Pacific Standard Time -## 581 West Pacific Standard Time -## 582 Hawaiian Standard Time -## 583 West Pacific Standard Time -## 584 UTC-11 -## 585 Hawaiian Standard Time -## 586 UTC+12 -## 
587 Tonga Standard Time -## 588 West Pacific Standard Time -## 589 UTC+12 -## 590 UTC+12 -## 591 West Pacific Standard Time -## 592 Pakistan Standard Time -## 593 US Mountain Standard Time -## 594 Central European Summer Time Central European Standard Time -## 595 Western European Summer Time GMT Standard Time -## 596 China Standard Time -## 597 SA Western Standard Time -## 598 Pacific Daylight Time Pacific Standard Time -## 599 Pacific Daylight Time Pacific Standard Time -## 600 Taipei Standard Time -## 601 Korea Standard Time -## 602 Singapore Standard Time -## 603 Central Pacific Standard Time -## 604 -## 605 GMT-03:00 -## 606 -## 607 GMT-05:00 -## 608 -## 609 GMT-04:00 -## 610 -## 611 -## 612 GMT-06:00 -## 613 -## 614 GMT-07:00 -## 615 -## 616 GMT-08:00 -## 617 Turkey Standard Time -## 618 UTC -## 619 UTC -## 620 Alaska Daylight Time Alaskan Standard Time -## 621 Hawaii-Aleutian Daylight Time Aleutian Standard Time -## 622 US Mountain Standard Time -## 623 Central Daylight Time Central Standard Time -## 624 Eastern Daylight Time US Eastern Standard Time -## 625 Eastern Daylight Time Eastern Standard Time -## 626 Hawaiian Standard Time -## 627 Central Daylight Time Central Standard Time -## 628 Eastern Daylight Time Eastern Standard Time -## 629 Mountain Daylight Time Mountain Standard Time -## 630 Pacific Daylight Time Pacific Standard Time -## 631 Pacific Daylight Time Pacific Standard Time -## 632 UTC-11 -## 633 UTC -## 634 SE Asia Standard Time -## 635 Russian Standard Time -## 636 GMT+01:00 -## 637 UTC -## RawOffset UsesDaylightTime -## 1 9.50 FALSE -## 2 10.00 TRUE -## 3 0.00 FALSE -## 4 0.00 FALSE -## 5 3.00 FALSE -## 6 1.00 FALSE -## 7 3.00 FALSE -## 8 3.00 FALSE -## 9 0.00 FALSE -## 10 1.00 FALSE -## 11 0.00 FALSE -## 12 0.00 FALSE -## 13 2.00 FALSE -## 14 1.00 FALSE -## 15 2.00 FALSE -## 16 2.00 TRUE -## 17 0.00 TRUE -## 18 1.00 TRUE -## 19 0.00 FALSE -## 20 0.00 FALSE -## 21 3.00 FALSE -## 22 3.00 FALSE -## 23 1.00 FALSE -## 24 0.00 TRUE -## 25 0.00 
FALSE -## 26 2.00 FALSE -## 27 2.00 FALSE -## 28 2.00 FALSE -## 29 2.00 FALSE -## 30 3.00 FALSE -## 31 2.00 FALSE -## 32 2.00 FALSE -## 33 1.00 FALSE -## 34 1.00 FALSE -## 35 1.00 FALSE -## 36 0.00 FALSE -## 37 1.00 FALSE -## 38 2.00 FALSE -## 39 2.00 FALSE -## 40 1.00 FALSE -## 41 2.00 FALSE -## 42 2.00 FALSE -## 43 2.00 FALSE -## 44 3.00 FALSE -## 45 0.00 FALSE -## 46 3.00 FALSE -## 47 1.00 FALSE -## 48 1.00 FALSE -## 49 0.00 FALSE -## 50 0.00 FALSE -## 51 1.00 FALSE -## 52 0.00 FALSE -## 53 0.00 FALSE -## 54 2.00 FALSE -## 55 1.00 FALSE -## 56 2.00 FALSE -## 57 -3.00 FALSE -## 58 -10.00 TRUE -## 59 -9.00 TRUE -## 60 -4.00 FALSE -## 61 -4.00 FALSE -## 62 -3.00 FALSE -## 63 -3.00 FALSE -## 64 -3.00 FALSE -## 65 -3.00 FALSE -## 66 -3.00 FALSE -## 67 -3.00 FALSE -## 68 -3.00 FALSE -## 69 -3.00 FALSE -## 70 -3.00 FALSE -## 71 -3.00 FALSE -## 72 -3.00 FALSE -## 73 -3.00 FALSE -## 74 -3.00 FALSE -## 75 -3.00 FALSE -## 76 -4.00 FALSE -## 77 -4.00 TRUE -## 78 -5.00 FALSE -## 79 -10.00 TRUE -## 80 -3.00 FALSE -## 81 -6.00 FALSE -## 82 -4.00 FALSE -## 83 -3.00 FALSE -## 84 -6.00 FALSE -## 85 -4.00 FALSE -## 86 -4.00 FALSE -## 87 -5.00 FALSE -## 88 -7.00 TRUE -## 89 -3.00 FALSE -## 90 -7.00 TRUE -## 91 -4.00 FALSE -## 92 -5.00 FALSE -## 93 -4.00 FALSE -## 94 -3.00 FALSE -## 95 -3.00 FALSE -## 96 -5.00 FALSE -## 97 -6.00 TRUE -## 98 -6.00 FALSE -## 99 -7.00 TRUE -## 100 -5.00 FALSE -## 101 -3.00 FALSE -## 102 -6.00 FALSE -## 103 -7.00 FALSE -## 104 -4.00 FALSE -## 105 -4.00 FALSE -## 106 0.00 FALSE -## 107 -7.00 FALSE -## 108 -7.00 FALSE -## 109 -7.00 TRUE -## 110 -5.00 TRUE -## 111 -4.00 FALSE -## 112 -7.00 TRUE -## 113 -5.00 FALSE -## 114 -6.00 FALSE -## 115 -8.00 TRUE -## 116 -7.00 FALSE -## 117 -5.00 TRUE -## 118 -3.00 FALSE -## 119 -4.00 TRUE -## 120 -2.00 TRUE -## 121 -4.00 TRUE -## 122 -5.00 TRUE -## 123 -4.00 FALSE -## 124 -4.00 FALSE -## 125 -6.00 FALSE -## 126 -5.00 FALSE -## 127 -4.00 FALSE -## 128 -4.00 TRUE -## 129 -5.00 TRUE -## 130 -7.00 FALSE -## 131 -5.00 
TRUE -## 132 -6.00 TRUE -## 133 -5.00 TRUE -## 134 -5.00 TRUE -## 135 -6.00 TRUE -## 136 -5.00 TRUE -## 137 -5.00 TRUE -## 138 -5.00 TRUE -## 139 -5.00 TRUE -## 140 -7.00 TRUE -## 141 -5.00 TRUE -## 142 -5.00 FALSE -## 143 -3.00 FALSE -## 144 -9.00 TRUE -## 145 -5.00 TRUE -## 146 -5.00 TRUE -## 147 -6.00 TRUE -## 148 -4.00 FALSE -## 149 -4.00 FALSE -## 150 -5.00 FALSE -## 151 -8.00 TRUE -## 152 -5.00 TRUE -## 153 -4.00 FALSE -## 154 -3.00 FALSE -## 155 -6.00 FALSE -## 156 -4.00 FALSE -## 157 -4.00 FALSE -## 158 -4.00 FALSE -## 159 -6.00 TRUE -## 160 -7.00 FALSE -## 161 -3.00 FALSE -## 162 -6.00 TRUE -## 163 -6.00 FALSE -## 164 -9.00 TRUE -## 165 -6.00 FALSE -## 166 -3.00 TRUE -## 167 -4.00 TRUE -## 168 -6.00 FALSE -## 169 -3.00 FALSE -## 170 -5.00 TRUE -## 171 -4.00 FALSE -## 172 -5.00 TRUE -## 173 -5.00 TRUE -## 174 -5.00 TRUE -## 175 -9.00 TRUE -## 176 -2.00 FALSE -## 177 -6.00 TRUE -## 178 -6.00 TRUE -## 179 -6.00 TRUE -## 180 -2.00 TRUE -## 181 -6.00 TRUE -## 182 -5.00 FALSE -## 183 -5.00 TRUE -## 184 -3.00 FALSE -## 185 -7.00 FALSE -## 186 -4.00 FALSE -## 187 -5.00 TRUE -## 188 -5.00 FALSE -## 189 -4.00 FALSE -## 190 -4.00 FALSE -## 191 -3.00 FALSE -## 192 -6.00 TRUE -## 193 -6.00 TRUE -## 194 -3.00 FALSE -## 195 -6.00 FALSE -## 196 -6.00 TRUE -## 197 -5.00 FALSE -## 198 -3.00 FALSE -## 199 -8.00 TRUE -## 200 -3.00 FALSE -## 201 -4.00 TRUE -## 202 -4.00 FALSE -## 203 -3.00 FALSE -## 204 -1.00 TRUE -## 205 -7.00 TRUE -## 206 -9.00 TRUE -## 207 -4.00 FALSE -## 208 -3.50 TRUE -## 209 -4.00 FALSE -## 210 -4.00 FALSE -## 211 -4.00 FALSE -## 212 -4.00 FALSE -## 213 -6.00 FALSE -## 214 -6.00 FALSE -## 215 -4.00 TRUE -## 216 -5.00 TRUE -## 217 -8.00 TRUE -## 218 -5.00 TRUE -## 219 -4.00 FALSE -## 220 -8.00 TRUE -## 221 -4.00 FALSE -## 222 -7.00 FALSE -## 223 -6.00 TRUE -## 224 -9.00 TRUE -## 225 -7.00 TRUE -## 226 11.00 FALSE -## 227 7.00 FALSE -## 228 10.00 FALSE -## 229 10.00 TRUE -## 230 5.00 FALSE -## 231 12.00 TRUE -## 232 -3.00 FALSE -## 233 -3.00 FALSE -## 234 
12.00 TRUE -## 235 3.00 FALSE -## 236 0.00 TRUE -## 237 6.00 FALSE -## 238 1.00 TRUE -## 239 2.00 TRUE -## 240 3.00 FALSE -## 241 6.00 FALSE -## 242 3.00 FALSE -## 243 12.00 FALSE -## 244 5.00 FALSE -## 245 5.00 FALSE -## 246 5.00 FALSE -## 247 5.00 FALSE -## 248 5.00 FALSE -## 249 3.00 FALSE -## 250 3.00 FALSE -## 251 4.00 FALSE -## 252 7.00 FALSE -## 253 7.00 FALSE -## 254 2.00 TRUE -## 255 6.00 FALSE -## 256 8.00 FALSE -## 257 5.50 FALSE -## 258 9.00 FALSE -## 259 8.00 FALSE -## 260 8.00 FALSE -## 261 8.00 FALSE -## 262 5.50 FALSE -## 263 6.00 FALSE -## 264 3.00 FALSE -## 265 6.00 FALSE -## 266 9.00 FALSE -## 267 4.00 FALSE -## 268 5.00 FALSE -## 269 2.00 TRUE -## 270 2.00 TRUE -## 271 8.00 FALSE -## 272 2.00 TRUE -## 273 7.00 FALSE -## 274 8.00 FALSE -## 275 7.00 FALSE -## 276 8.00 FALSE -## 277 3.00 FALSE -## 278 7.00 FALSE -## 279 9.00 FALSE -## 280 2.00 TRUE -## 281 4.50 FALSE -## 282 12.00 FALSE -## 283 5.00 FALSE -## 284 6.00 FALSE -## 285 5.75 FALSE -## 286 5.75 FALSE -## 287 9.00 FALSE -## 288 5.50 FALSE -## 289 7.00 FALSE -## 290 8.00 FALSE -## 291 8.00 FALSE -## 292 3.00 FALSE -## 293 8.00 FALSE -## 294 8.00 FALSE -## 295 11.00 FALSE -## 296 8.00 FALSE -## 297 8.00 FALSE -## 298 4.00 FALSE -## 299 2.00 TRUE -## 300 7.00 FALSE -## 301 7.00 FALSE -## 302 6.00 FALSE -## 303 5.00 FALSE -## 304 7.00 FALSE -## 305 7.00 FALSE -## 306 9.00 FALSE -## 307 3.00 FALSE -## 308 6.00 FALSE -## 309 5.00 FALSE -## 310 6.50 FALSE -## 311 3.00 FALSE -## 312 7.00 FALSE -## 313 11.00 FALSE -## 314 5.00 FALSE -## 315 9.00 FALSE -## 316 8.00 FALSE -## 317 8.00 FALSE -## 318 11.00 FALSE -## 319 8.00 FALSE -## 320 5.00 FALSE -## 321 4.00 FALSE -## 322 3.50 FALSE -## 323 2.00 TRUE -## 324 6.00 FALSE -## 325 6.00 FALSE -## 326 9.00 FALSE -## 327 7.00 FALSE -## 328 8.00 FALSE -## 329 8.00 FALSE -## 330 8.00 FALSE -## 331 6.00 FALSE -## 332 10.00 FALSE -## 333 7.00 FALSE -## 334 10.00 FALSE -## 335 9.00 FALSE -## 336 6.50 FALSE -## 337 5.00 FALSE -## 338 4.00 FALSE -## 339 -9.00 
TRUE -## 340 -1.00 TRUE -## 341 -4.00 TRUE -## 342 0.00 TRUE -## 343 -1.00 FALSE -## 344 0.00 TRUE -## 345 0.00 TRUE -## 346 1.00 TRUE -## 347 0.00 TRUE -## 348 0.00 FALSE -## 349 -2.00 FALSE -## 350 0.00 FALSE -## 351 -3.00 FALSE -## 352 10.00 TRUE -## 353 9.50 TRUE -## 354 10.00 FALSE -## 355 9.50 TRUE -## 356 10.00 TRUE -## 357 10.00 TRUE -## 358 9.50 FALSE -## 359 8.75 FALSE -## 360 10.00 TRUE -## 361 10.50 TRUE -## 362 10.00 FALSE -## 363 10.50 TRUE -## 364 10.00 TRUE -## 365 9.50 FALSE -## 366 10.00 TRUE -## 367 8.00 FALSE -## 368 10.00 FALSE -## 369 9.50 TRUE -## 370 10.00 TRUE -## 371 10.00 TRUE -## 372 10.00 TRUE -## 373 8.00 FALSE -## 374 9.50 TRUE -## 375 -3.00 FALSE -## 376 -5.00 FALSE -## 377 -2.00 FALSE -## 378 -3.00 FALSE -## 379 -4.00 FALSE -## 380 6.00 FALSE -## 381 -4.00 TRUE -## 382 -6.00 TRUE -## 383 -6.00 FALSE -## 384 -5.00 TRUE -## 385 -7.00 TRUE -## 386 -3.50 TRUE -## 387 -8.00 TRUE -## 388 -6.00 FALSE -## 389 -7.00 FALSE -## 390 2.00 FALSE -## 391 1.00 TRUE -## 392 -4.00 TRUE -## 393 -6.00 TRUE -## 394 -3.50 TRUE -## 395 -6.00 TRUE -## 396 -6.00 TRUE -## 397 8.00 FALSE -## 398 -5.00 TRUE -## 399 3.00 FALSE -## 400 1.00 TRUE -## 401 2.00 TRUE -## 402 2.00 TRUE -## 403 0.00 TRUE -## 404 -5.00 FALSE -## 405 -5.00 TRUE -## 406 0.00 FALSE -## 407 0.00 FALSE -## 408 1.00 FALSE -## 409 2.00 FALSE -## 410 3.00 FALSE -## 411 4.00 FALSE -## 412 5.00 FALSE -## 413 6.00 FALSE -## 414 7.00 FALSE -## 415 8.00 FALSE -## 416 9.00 FALSE -## 417 10.00 FALSE -## 418 11.00 FALSE -## 419 12.00 FALSE -## 420 13.00 FALSE -## 421 14.00 FALSE -## 422 0.00 FALSE -## 423 -1.00 FALSE -## 424 -2.00 FALSE -## 425 -3.00 FALSE -## 426 -4.00 FALSE -## 427 -5.00 FALSE -## 428 -6.00 FALSE -## 429 -7.00 FALSE -## 430 -8.00 FALSE -## 431 -9.00 FALSE -## 432 -10.00 FALSE -## 433 -11.00 FALSE -## 434 -12.00 FALSE -## 435 0.00 FALSE -## 436 0.00 FALSE -## 437 0.00 FALSE -## 438 0.00 FALSE -## 439 0.00 FALSE -## 440 0.00 FALSE -## 441 1.00 TRUE -## 442 1.00 TRUE -## 443 4.00 FALSE 
-## 444 2.00 TRUE -## 445 0.00 TRUE -## 446 1.00 TRUE -## 447 1.00 TRUE -## 448 1.00 TRUE -## 449 1.00 TRUE -## 450 2.00 TRUE -## 451 1.00 TRUE -## 452 1.00 TRUE -## 453 2.00 TRUE -## 454 1.00 TRUE -## 455 0.00 TRUE -## 456 1.00 TRUE -## 457 0.00 TRUE -## 458 2.00 TRUE -## 459 0.00 TRUE -## 460 3.00 FALSE -## 461 0.00 TRUE -## 462 2.00 FALSE -## 463 2.00 TRUE -## 464 3.00 FALSE -## 465 2.00 TRUE -## 466 0.00 TRUE -## 467 1.00 TRUE -## 468 0.00 TRUE -## 469 1.00 TRUE -## 470 1.00 TRUE -## 471 1.00 TRUE -## 472 2.00 TRUE -## 473 3.00 FALSE -## 474 1.00 TRUE -## 475 3.00 FALSE -## 476 2.00 TRUE -## 477 1.00 TRUE -## 478 1.00 TRUE -## 479 1.00 TRUE -## 480 1.00 TRUE -## 481 2.00 TRUE -## 482 1.00 TRUE -## 483 4.00 FALSE -## 484 1.00 TRUE -## 485 1.00 TRUE -## 486 4.00 FALSE -## 487 3.00 FALSE -## 488 1.00 TRUE -## 489 2.00 TRUE -## 490 1.00 TRUE -## 491 2.00 TRUE -## 492 1.00 TRUE -## 493 2.00 TRUE -## 494 4.00 FALSE -## 495 2.00 TRUE -## 496 1.00 TRUE -## 497 1.00 TRUE -## 498 1.00 TRUE -## 499 2.00 TRUE -## 500 3.00 FALSE -## 501 1.00 TRUE -## 502 1.00 TRUE -## 503 2.00 TRUE -## 504 1.00 TRUE -## 505 0.00 FALSE -## 506 0.00 TRUE -## 507 0.00 TRUE -## 508 0.00 FALSE -## 509 0.00 FALSE -## 510 0.00 FALSE -## 511 0.00 FALSE -## 512 0.00 FALSE -## 513 8.00 FALSE -## 514 -10.00 FALSE -## 515 0.00 FALSE -## 516 -5.00 TRUE -## 517 3.00 FALSE -## 518 6.00 FALSE -## 519 7.00 FALSE -## 520 6.50 FALSE -## 521 3.00 FALSE -## 522 5.00 FALSE -## 523 4.00 FALSE -## 524 5.00 FALSE -## 525 4.00 FALSE -## 526 3.00 FALSE -## 527 4.00 FALSE -## 528 3.50 FALSE -## 529 2.00 TRUE -## 530 5.50 FALSE -## 531 -5.00 FALSE -## 532 9.00 FALSE -## 533 9.00 FALSE -## 534 12.00 FALSE -## 535 2.00 FALSE -## 536 1.00 TRUE -## 537 -8.00 TRUE -## 538 -7.00 FALSE -## 539 -6.00 FALSE -## 540 13.00 FALSE -## 541 -7.00 FALSE -## 542 -7.00 TRUE -## 543 -7.00 TRUE -## 544 4.00 FALSE -## 545 12.00 TRUE -## 546 12.00 TRUE -## 547 12.75 TRUE -## 548 13.00 FALSE -## 549 12.00 TRUE -## 550 11.00 FALSE -## 551 
12.75 TRUE -## 552 10.00 FALSE -## 553 -6.00 TRUE -## 554 11.00 FALSE -## 555 13.00 FALSE -## 556 13.00 FALSE -## 557 12.00 FALSE -## 558 12.00 FALSE -## 559 -6.00 FALSE -## 560 -9.00 FALSE -## 561 11.00 FALSE -## 562 10.00 FALSE -## 563 -10.00 FALSE -## 564 -10.00 FALSE -## 565 13.00 FALSE -## 566 14.00 FALSE -## 567 11.00 FALSE -## 568 12.00 FALSE -## 569 12.00 FALSE -## 570 -9.50 FALSE -## 571 -11.00 FALSE -## 572 12.00 FALSE -## 573 -11.00 FALSE -## 574 11.00 TRUE -## 575 11.00 FALSE -## 576 -11.00 FALSE -## 577 9.00 FALSE -## 578 -8.00 FALSE -## 579 11.00 FALSE -## 580 11.00 FALSE -## 581 10.00 FALSE -## 582 -10.00 FALSE -## 583 10.00 FALSE -## 584 -11.00 FALSE -## 585 -10.00 FALSE -## 586 12.00 FALSE -## 587 13.00 FALSE -## 588 10.00 FALSE -## 589 12.00 FALSE -## 590 12.00 FALSE -## 591 10.00 FALSE -## 592 5.00 FALSE -## 593 -7.00 FALSE -## 594 1.00 TRUE -## 595 0.00 TRUE -## 596 8.00 FALSE -## 597 -4.00 FALSE -## 598 -8.00 TRUE -## 599 -8.00 TRUE -## 600 8.00 FALSE -## 601 9.00 FALSE -## 602 8.00 FALSE -## 603 11.00 FALSE -## 604 -4.00 FALSE -## 605 -4.00 TRUE -## 606 -6.00 FALSE -## 607 -6.00 TRUE -## 608 -5.00 FALSE -## 609 -5.00 TRUE -## 610 -10.00 FALSE -## 611 -7.00 FALSE -## 612 -7.00 TRUE -## 613 -8.00 FALSE -## 614 -8.00 TRUE -## 615 -9.00 FALSE -## 616 -9.00 TRUE -## 617 3.00 FALSE -## 618 0.00 FALSE -## 619 0.00 FALSE -## 620 -9.00 TRUE -## 621 -10.00 TRUE -## 622 -7.00 FALSE -## 623 -6.00 TRUE -## 624 -5.00 TRUE -## 625 -5.00 TRUE -## 626 -10.00 FALSE -## 627 -6.00 TRUE -## 628 -5.00 TRUE -## 629 -7.00 TRUE -## 630 -8.00 TRUE -## 631 -8.00 TRUE -## 632 -11.00 FALSE -## 633 0.00 FALSE -## 634 7.00 FALSE -## 635 3.00 FALSE -## 636 0.00 TRUE -## 637 0.00 FALSE -``` diff --git a/.devel/sphinx/rapi/stri_timezone_set.md b/.devel/sphinx/rapi/stri_timezone_set.md deleted file mode 100644 index c10c2cb6..00000000 --- a/.devel/sphinx/rapi/stri_timezone_set.md +++ /dev/null @@ -1,65 +0,0 @@ -# stri_timezone_set: - -## Description - -`stri_timezone_set` 
changes the current default time zone for all functions in the stringi package, i.e., establishes the meaning of the "`NULL` time zone" argument to date/time processing functions. - -`stri_timezone_get` gets the current default time zone. - -For more information on time zone representation in ICU and stringi, refer to [`stri_timezone_list`](stri_timezone_list.md). - -## Usage - -``` r -stri_timezone_get() - -stri_timezone_set(tz) -``` - -## Arguments - -| | | -|------|-------------------------------------| -| `tz` | single string; time zone identifier | - -## Details - -Unless the default time zone has already been set using `stri_timezone_set`, the default time zone is determined by querying the OS with methods in ICU\'s internal platform utilities. - -## Value - -`stri_timezone_set` returns a string with previously used timezone, invisibly. - -`stri_timezone_get` returns a single string with the current default time zone. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## References - -*TimeZone* class -- ICU API Documentation, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other datetime: [`stri_datetime_add()`](stri_datetime_add.md), [`stri_datetime_create()`](stri_datetime_create.md), [`stri_datetime_fields()`](stri_datetime_fields.md), [`stri_datetime_format()`](stri_datetime_format.md), [`stri_datetime_fstr()`](stri_datetime_fstr.md), [`stri_datetime_now()`](stri_datetime_now.md), [`stri_datetime_symbols()`](stri_datetime_symbols.md), [`stri_timezone_info()`](stri_timezone_info.md), [`stri_timezone_list()`](stri_timezone_list.md) - -Other timezone: [`stri_timezone_info()`](stri_timezone_info.md), [`stri_timezone_list()`](stri_timezone_list.md) - -## Examples - - - - -```r -## Not run: -oldtz <- 
stri_timezone_set('Europe/Warsaw') -# ... many time zone-dependent operations -stri_timezone_set(oldtz) # restore previous default time zone - -## End(Not run) -``` diff --git a/.devel/sphinx/rapi/stri_trans_casemap.md b/.devel/sphinx/rapi/stri_trans_casemap.md deleted file mode 100644 index 05a67ae1..00000000 --- a/.devel/sphinx/rapi/stri_trans_casemap.md +++ /dev/null @@ -1,137 +0,0 @@ -# stri_trans_casemap: Transform Strings with Case Mapping or Folding - -## Description - -These functions transform strings either to lower case, UPPER CASE, or Title Case or perform case folding. - -## Usage - -``` r -stri_trans_tolower(str, locale = NULL) - -stri_trans_toupper(str, locale = NULL) - -stri_trans_casefold(str) - -stri_trans_totitle(str, ..., opts_brkiter = NULL) -``` - -## Arguments - -| | | -|----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `str` | character vector | -| `locale` | `NULL` or `''` for case mapping following the conventions of the default locale, or a single string with locale identifier, see [stringi-locale](about_locale.md). | -| `...` | additional settings for `opts_brkiter` | -| `opts_brkiter` | a named list with ICU BreakIterator\'s settings, see [`stri_opts_brkiter`](stri_opts_brkiter.md); `NULL` for default break iterator, i.e., `word`; `stri_trans_totitle` only | - -## Details - -Vectorized over `str`. - -ICU implements full Unicode string case mappings. It is worth noting that, generally, case mapping: - -- can change the number of code points and/or code units of a string, - -- is language-sensitive (results may differ depending on the locale), and - -- is context-sensitive (a character in the input string may map differently depending on surrounding characters). 
- -With `stri_trans_totitle`, if `word` `BreakIterator` is used (the default), then the first letter of each word will be capitalized and the rest will be transformed to lower case. With the break iterator of type `sentence`, the first letter of each sentence will be capitalized only. Note that according the ICU User Guide, the string `'one. two. three.'` consists of one sentence. - -Case folding, on the other hand, is locale-independent. Its purpose is to make two pieces of text that differ only in case identical. This may come in handy when comparing strings. - -For more general (but not locale dependent) text transforms refer to [`stri_trans_general`](stri_trans_general.md). - -## Value - -Each function returns a character vector. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## References - -*Case Mappings* -- ICU User Guide, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other locale_sensitive: [`%s<%()`](+25s+3C+25.md), [`about_locale`](about_locale.md), [`about_search_boundaries`](about_search_boundaries.md), [`about_search_coll`](about_search_coll.md), [`stri_compare()`](stri_compare.md), [`stri_count_boundaries()`](stri_count_boundaries.md), [`stri_duplicated()`](stri_duplicated.md), [`stri_enc_detect2()`](stri_enc_detect2.md), [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_opts_collator()`](stri_opts_collator.md), [`stri_order()`](stri_order.md), [`stri_rank()`](stri_rank.md), [`stri_sort_key()`](stri_sort_key.md), [`stri_sort()`](stri_sort.md), [`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_unique()`](stri_unique.md), [`stri_wrap()`](stri_wrap.md) - -Other transform: 
[`stri_trans_char()`](stri_trans_char.md), [`stri_trans_general()`](stri_trans_general.md), [`stri_trans_list()`](stri_trans_list.md), [`stri_trans_nfc()`](stri_trans_nf.md) - -Other text_boundaries: [`about_search_boundaries`](about_search_boundaries.md), [`about_search`](about_search.md), [`stri_count_boundaries()`](stri_count_boundaries.md), [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_opts_brkiter()`](stri_opts_brkiter.md), [`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_split_lines()`](stri_split_lines.md), [`stri_wrap()`](stri_wrap.md) - -## Examples - - - - -```r -stri_trans_toupper('\u00DF', 'de_DE') # small German Eszett / scharfes S -``` - -``` -## [1] "SS" -``` - -```r -stri_cmp_eq(stri_trans_toupper('i', 'en_US'), stri_trans_toupper('i', 'tr_TR')) -``` - -``` -## [1] FALSE -``` - -```r -stri_trans_toupper(c('abc', '123', '\u0105\u0104')) -``` - -``` -## [1] "ABC" "123" "ĄĄ" -``` - -```r -stri_trans_tolower(c('AbC', '123', '\u0105\u0104')) -``` - -``` -## [1] "abc" "123" "ąą" -``` - -```r -stri_trans_totitle(c('AbC', '123', '\u0105\u0104')) -``` - -``` -## [1] "Abc" "123" "Ąą" -``` - -```r -stri_trans_casefold(c('AbC', '123', '\u0105\u0104')) -``` - -``` -## [1] "abc" "123" "ąą" -``` - -```r -stri_trans_totitle('stringi is a FREE R pAcKaGe. WItH NO StrinGS attached.') # word boundary -``` - -``` -## [1] "Stringi Is A Free R Package. With No Strings Attached." -``` - -```r -stri_trans_totitle('stringi is a FREE R pAcKaGe. WItH NO StrinGS attached.', type='sentence') -``` - -``` -## [1] "Stringi is a free r package. With no strings attached." 
-``` diff --git a/.devel/sphinx/rapi/stri_trans_char.md b/.devel/sphinx/rapi/stri_trans_char.md deleted file mode 100644 index df70e75c..00000000 --- a/.devel/sphinx/rapi/stri_trans_char.md +++ /dev/null @@ -1,74 +0,0 @@ -# stri_trans_char: Translate Characters - -## Description - -Translates Unicode code points in each input string. - -## Usage - -``` r -stri_trans_char(str, pattern, replacement) -``` - -## Arguments - -| | | -|---------------|------------------------------------------------------------------| -| `str` | character vector | -| `pattern` | a single character string providing code points to be translated | -| `replacement` | a single character string giving translated code points | - -## Details - -Vectorized over `str` and with respect to each code point in `pattern` and `replacement`. - -If `pattern` and `replacement` consist of a different number of code points, then the extra code points in the longer of the two are ignored, with a warning. - -If code points in a given `pattern` are not unique, the last corresponding replacement code point is used. - -Time complexity for each string in `str` is O(`stri_length(str)*stri_length(pattern)`). - -## Value - -Returns a character vector. 
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other transform: [`stri_trans_general()`](stri_trans_general.md), [`stri_trans_list()`](stri_trans_list.md), [`stri_trans_nfc()`](stri_trans_nf.md), [`stri_trans_tolower()`](stri_trans_casemap.md) - -## Examples - - - - -```r -stri_trans_char('id.123', '.', '_') -``` - -``` -## [1] "id_123" -``` - -```r -stri_trans_char('babaab', 'ab', '01') -``` - -``` -## [1] "101001" -``` - -```r -stri_trans_char('GCUACGGAGCUUCGGAGCUAG', 'ACGT', 'TGCA') -``` - -``` -## [1] "CGUTGCCTCGUUGCCTCGUTC" -``` diff --git a/.devel/sphinx/rapi/stri_trans_general.md b/.devel/sphinx/rapi/stri_trans_general.md deleted file mode 100644 index afb0325a..00000000 --- a/.devel/sphinx/rapi/stri_trans_general.md +++ /dev/null @@ -1,173 +0,0 @@ -# stri_trans_general: General Text Transforms, Including Transliteration - -## Description - -ICU General transforms provide different ways for processing Unicode text. They are useful in handling a variety of different tasks, including: - -- locale-independent upper case, lower case, title case, full/halfwidth conversions, - -- normalization, - -- hex and character name conversions, - -- script to script conversion/transliteration. 
- -## Usage - -``` r -stri_trans_general(str, id, rules = FALSE, forward = TRUE) -``` - -## Arguments - -| | | -|-----------|---------------------------------------------------------------------------------------------------------------------------------| -| `str` | character vector | -| `id` | a single string with transform identifier, see [`stri_trans_list`](stri_trans_list.md), or custom transliteration rules | -| `rules` | if `TRUE`, treat `id` as a string with semicolon-separated transliteration rules (see the ICU manual); | -| `forward` | transliteration direction (`TRUE` for forward, `FALSE` for reverse) | - -## Details - -ICU Transforms were mainly designed to transliterate characters from one script to another (for example, from Greek to Latin, or Japanese Katakana to Latin). However, these services are also capable of handling a much broader range of tasks. In particular, the Transforms include prebuilt transformations for case conversions, for normalization conversions, for the removal of given characters, and also for a variety of language and script transliterations. Transforms can be chained together to perform a series of operations and each step of the process can use a UnicodeSet to restrict the characters that are affected. - -To get the list of available transforms, call [`stri_trans_list`](stri_trans_list.md). - -Note that transliterators are often combined in sequence to achieve a desired transformation. This is analogous to the composition of mathematical functions. For example, given a script that converts lowercase ASCII characters from Latin script to Katakana script, it is convenient to first (1) separate input base characters and accents, and then (2) convert uppercase to lowercase. To achieve this, a compound transform can be specified as follows: `NFKD; Lower; Latin-Katakana;` (with the default `rules=FALSE`). - -Custom rule-based transliteration is also supported, see the ICU manual and below for some examples. 
- -Transliteration is not dependent on the current locale. - -## Value - -Returns a character vector. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## References - -*General Transforms* -- ICU User Guide, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other transform: [`stri_trans_char()`](stri_trans_char.md), [`stri_trans_list()`](stri_trans_list.md), [`stri_trans_nfc()`](stri_trans_nf.md), [`stri_trans_tolower()`](stri_trans_casemap.md) - -## Examples - - - - -```r -stri_trans_general('gro\u00df', 'latin-ascii') -``` - -``` -## [1] "gross" -``` - -```r -stri_trans_general('stringi', 'latin-greek') -``` - -``` -## [1] "στριγγι" -``` - -```r -stri_trans_general('stringi', 'latin-cyrillic') -``` - -``` -## [1] "стринги" -``` - -```r -stri_trans_general('stringi', 'upper') # see stri_trans_toupper -``` - -``` -## [1] "STRINGI" -``` - -```r -stri_trans_general('\u0104', 'nfd; lower') # compound id; see stri_trans_nfd -``` - -``` -## [1] "ą" -``` - -```r -stri_trans_general('Marek G\u0105golewski', 'pl-pl_FONIPA') -``` - -``` -## [1] "marɛk ɡɔŋɡɔlɛfski" -``` - -```r -stri_trans_general('\u2620', 'any-name') # character name -``` - -``` -## [1] "\\N{SKULL AND CROSSBONES}" -``` - -```r -stri_trans_general('\\N{latin small letter a}', 'name-any') # decode name -``` - -``` -## [1] "a" -``` - -```r -stri_trans_general('\u2620', 'hex/c') # to hex -``` - -``` -## [1] "\\u2620" -``` - -```r -stri_trans_general("\u201C\u2026\u201D \u0105\u015B\u0107\u017C", - "NFKD; NFC; [^\\p{L}] latin-ascii") -``` - -``` -## [1] "\"...\" ąśćż" -``` - -```r -x <- "\uC885\uB85C\uAD6C \uC0AC\uC9C1\uB3D9" -stringi::stri_trans_general(x, "Hangul-Latin") -``` - -``` -## [1] "jonglogu sajigdong" -``` - -```r -# Deviate from the ICU rules of 
romanisation of Korean, -# see https://en.wikipedia.org/wiki/Romanization_of_Korean -id <- " - :: NFD; - \u11A8 > k; - \u11AE > t; - \u11B8 > p; - \u1105 > r; - :: Hangul-Latin; -" -stringi::stri_trans_general(x, id, rules=TRUE) -``` - -``` -## [1] "jongrogu sajikdong" -``` diff --git a/.devel/sphinx/rapi/stri_trans_list.md b/.devel/sphinx/rapi/stri_trans_list.md deleted file mode 100644 index 47e79b89..00000000 --- a/.devel/sphinx/rapi/stri_trans_list.md +++ /dev/null @@ -1,799 +0,0 @@ -# stri_trans_list: List Available Text Transforms and Transliterators - -## Description - -Returns a list of available text transform identifiers. Each of them may be used in [`stri_trans_general`](stri_trans_general.md) tasks. - -## Usage - -``` r -stri_trans_list() -``` - -## Value - -Returns a character vector. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## References - -*General Transforms* -- ICU User Guide, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other transform: [`stri_trans_char()`](stri_trans_char.md), [`stri_trans_general()`](stri_trans_general.md), [`stri_trans_nfc()`](stri_trans_nf.md), [`stri_trans_tolower()`](stri_trans_casemap.md) - -## Examples - - - - -```r -stri_trans_list() -``` - -``` -## [1] "Accents-Any" -## [2] "am_Brai-am_Ethi" -## [3] "am_Ethi-am_Brai" -## [4] "am_Ethi-am_Ethi/Geminate" -## [5] "am_Ethi-d0_Morse" -## [6] "am_FONIPA-am" -## [7] "am-am_FONIPA" -## [8] "am-am_Latn/BGN" -## [9] "am-ar" -## [10] "am-chr" -## [11] "am-fa" -## [12] "Amharic-Amharic/Geminate" -## [13] "Amharic-Latin/BGN" -## [14] "Any-Accents" -## [15] "Any-am" -## [16] "Any-am_Brai" -## [17] "Any-am_Ethi/Geminate" -## [18] "Any-am_FONIPA" -## [19] "Any-am_Latn/BGN" -## [20] "Any-Any" -## [21] "Any-ar" -## [22] 
"Any-ar_Latn/BGN" -## [23] "Any-Arab" -## [24] "Any-Arabic" -## [25] "Any-Armenian" -## [26] "Any-Armn" -## [27] "Any-az/BGN" -## [28] "Any-be_Latn/BGN" -## [29] "Any-Beng" -## [30] "Any-Bengali" -## [31] "Any-bg_Latn/BGN" -## [32] "Any-blt_FONIPA" -## [33] "Any-Bopo" -## [34] "Any-Bopomofo" -## [35] "Any-Braille/Amharic" -## [36] "Any-byn_Ethi/Tekie_Alibekit" -## [37] "Any-byn_Ethi/Xaleget" -## [38] "Any-byn_Latn/Tekie_Alibekit" -## [39] "Any-byn_Latn/Xaleget" -## [40] "Any-CanadianAboriginal" -## [41] "Any-Cans" -## [42] "Any-ch_FONIPA" -## [43] "Any-chr" -## [44] "Any-chr_FONIPA" -## [45] "Any-cs_FONIPA" -## [46] "Any-cy_FONIPA" -## [47] "Any-Cyrillic" -## [48] "Any-Cyrl/Gutgarts" -## [49] "Any-Deva" -## [50] "Any-Devanagari" -## [51] "Any-dsb_FONIPA" -## [52] "Any-dv_Latn/BGN" -## [53] "Any-el_Latn/BGN" -## [54] "Any-eo_FONIPA" -## [55] "Any-es_419_FONIPA" -## [56] "Any-es_FONIPA" -## [57] "Any-Ethi/Gutgarts" -## [58] "Any-Ethiopic" -## [59] "Any-Ethiopic/Aethiopica" -## [60] "Any-Ethiopic/ALALOC" -## [61] "Any-Ethiopic/Beta_Metsehaf" -## [62] "Any-Ethiopic/IES_JES_1964" -## [63] "Any-Ethiopic/Lambdin" -## [64] "Any-Ethiopic/SERA" -## [65] "Any-Ethiopic/Tekie_Alibekit" -## [66] "Any-Ethiopic/Williamson" -## [67] "Any-Ethiopic/Xalaget" -## [68] "Any-fa" -## [69] "Any-fa_FONIPA" -## [70] "Any-fa_Latn/BGN" -## [71] "Any-FCC" -## [72] "Any-FCD" -## [73] "Any-Geor" -## [74] "Any-Georgian" -## [75] "Any-Greek" -## [76] "Any-Greek/UNGEGN" -## [77] "Any-Grek" -## [78] "Any-Grek/UNGEGN" -## [79] "Any-Gujarati" -## [80] "Any-Gujr" -## [81] "Any-Gurmukhi" -## [82] "Any-Guru" -## [83] "Any-gz_Ethi" -## [84] "Any-ha_NE" -## [85] "Any-Hang" -## [86] "Any-Hangul" -## [87] "Any-Hans" -## [88] "Any-Hant" -## [89] "Any-he_Latn/BGN" -## [90] "Any-Hebr" -## [91] "Any-Hebrew" -## [92] "Any-Hex" -## [93] "Any-Hex/C" -## [94] "Any-Hex/Java" -## [95] "Any-Hex/Perl" -## [96] "Any-Hex/Unicode" -## [97] "Any-Hex/XML" -## [98] "Any-Hex/XML10" -## [99] "Any-Hira" -## [100] "Any-Hiragana" 
-## [101] "Any-hy_AREVMDA_FONIPA" -## [102] "Any-hy_FONIPA" -## [103] "Any-hy_Latn/BGN" -## [104] "Any-ia_FONIPA" -## [105] "Any-Jamo" -## [106] "Any-ka_Latn/BGN" -## [107] "Any-ka_Latn/BGN_1981" -## [108] "Any-Kana" -## [109] "Any-Kannada" -## [110] "Any-Katakana" -## [111] "Any-kk_FONIPA" -## [112] "Any-kk_Latn/BGN" -## [113] "Any-Knda" -## [114] "Any-ky_FONIPA" -## [115] "Any-ky_Latn/BGN" -## [116] "Any-la_FONIPA" -## [117] "Any-Latin" -## [118] "Any-Latn" -## [119] "Any-Lower" -## [120] "Any-Malayalam" -## [121] "Any-mk_Latn/BGN" -## [122] "Any-Mlym" -## [123] "Any-mn_Latn/BGN" -## [124] "Any-mn_Latn/MNS" -## [125] "Any-my" -## [126] "Any-my_FONIPA" -## [127] "Any-my_Latn" -## [128] "Any-Name" -## [129] "Any-NFC" -## [130] "Any-NFD" -## [131] "Any-NFKC" -## [132] "Any-NFKD" -## [133] "Any-Null" -## [134] "Any-nv_FONIPA" -## [135] "Any-Oriya" -## [136] "Any-Orya" -## [137] "Any-pl_FONIPA" -## [138] "Any-ps_Latn/BGN" -## [139] "Any-Publishing" -## [140] "Any-Remove" -## [141] "Any-rm_FONIPA_SURSILV" -## [142] "Any-ro_FONIPA" -## [143] "Any-ru" -## [144] "Any-ru_Latn/BGN" -## [145] "Any-Sarb" -## [146] "Any-sat_FONIPA" -## [147] "Any-sgw_Ethi/Gurage_2013" -## [148] "Any-si_FONIPA" -## [149] "Any-si_Latn" -## [150] "Any-sk_FONIPA" -## [151] "Any-sr_Latn/BGN" -## [152] "Any-Syrc" -## [153] "Any-Syriac" -## [154] "Any-ta_FONIPA" -## [155] "Any-Tamil" -## [156] "Any-Taml" -## [157] "Any-Telu" -## [158] "Any-Telugu" -## [159] "Any-Thaa" -## [160] "Any-Thaana" -## [161] "Any-Thai" -## [162] "Any-Title" -## [163] "Any-tk/BGN" -## [164] "Any-ug_FONIPA" -## [165] "Any-uk_Latn/BGN" -## [166] "Any-und_FONIPA" -## [167] "Any-und_FONXSAMP" -## [168] "Any-Upper" -## [169] "Any-ur" -## [170] "Any-uz_Cyrl" -## [171] "Any-uz_Latn" -## [172] "Any-uz/BGN" -## [173] "Any-vec_FONIPA" -## [174] "Any-xh_FONIPA" -## [175] "Any-yo_BJ" -## [176] "Any-zh" -## [177] "Any-zu_FONIPA" -## [178] "ar-ar_Latn/BGN" -## [179] "Arab-Latn" -## [180] "Arabic-Latin" -## [181] "Arabic-Latin/BGN" -## 
[182] "Armenian-Latin" -## [183] "Armenian-Latin/BGN" -## [184] "Armn-Latn" -## [185] "ASCII-Latin" -## [186] "az_Cyrl-az/BGN" -## [187] "az-Lower" -## [188] "az-Title" -## [189] "az-Upper" -## [190] "Azerbaijani-Latin/BGN" -## [191] "be-be_Latn/BGN" -## [192] "Belarusian-Latin/BGN" -## [193] "Beng-Arab" -## [194] "Beng-Deva" -## [195] "Beng-Gujr" -## [196] "Beng-Guru" -## [197] "Beng-Knda" -## [198] "Beng-Latn" -## [199] "Beng-Mlym" -## [200] "Beng-Orya" -## [201] "Beng-Taml" -## [202] "Beng-Telu" -## [203] "Beng-ur" -## [204] "Bengali-Arabic" -## [205] "Bengali-Devanagari" -## [206] "Bengali-Gujarati" -## [207] "Bengali-Gurmukhi" -## [208] "Bengali-Kannada" -## [209] "Bengali-Latin" -## [210] "Bengali-Malayalam" -## [211] "Bengali-Oriya" -## [212] "Bengali-Tamil" -## [213] "Bengali-Telugu" -## [214] "bg-bg_Latn/BGN" -## [215] "blt-blt_FONIPA" -## [216] "Bopo-Latn" -## [217] "Bopomofo-Latin" -## [218] "Braille-Ethiopic/Amharic" -## [219] "Bulgarian-Latin/BGN" -## [220] "Burmese-Latin" -## [221] "byn_Ethi-byn_Latn/Tekie_Alibekit" -## [222] "byn_Ethi-byn_Latn/Xaleget" -## [223] "byn_Latn-byn_Ethi/Tekie_Alibekit" -## [224] "byn_Latn-byn_Ethi/Xaleget" -## [225] "CanadianAboriginal-Latin" -## [226] "Cans-Latn" -## [227] "ch-am" -## [228] "ch-ar" -## [229] "ch-ch_FONIPA" -## [230] "ch-chr" -## [231] "ch-fa" -## [232] "chr-chr_FONIPA" -## [233] "cs_FONIPA-ja" -## [234] "cs_FONIPA-ko" -## [235] "cs-am" -## [236] "cs-ar" -## [237] "cs-chr" -## [238] "cs-cs_FONIPA" -## [239] "cs-fa" -## [240] "cs-ja" -## [241] "cs-ko" -## [242] "cy-cy_FONIPA" -## [243] "Cyrillic-Ethiopic/Gutgarts" -## [244] "Cyrillic-Latin" -## [245] "Cyrl-Ethi/Gutgarts" -## [246] "Cyrl-Latn" -## [247] "d0_Morse-am_Ethi" -## [248] "de-ASCII" -## [249] "Deva-Arab" -## [250] "Deva-Beng" -## [251] "Deva-Gujr" -## [252] "Deva-Guru" -## [253] "Deva-Knda" -## [254] "Deva-Latn" -## [255] "Deva-Mlym" -## [256] "Deva-Orya" -## [257] "Deva-Taml" -## [258] "Deva-Telu" -## [259] "Deva-ur" -## [260] "Devanagari-Arabic" 
-## [261] "Devanagari-Bengali" -## [262] "Devanagari-Gujarati" -## [263] "Devanagari-Gurmukhi" -## [264] "Devanagari-Kannada" -## [265] "Devanagari-Latin" -## [266] "Devanagari-Malayalam" -## [267] "Devanagari-Oriya" -## [268] "Devanagari-Tamil" -## [269] "Devanagari-Telugu" -## [270] "Digit-Tone" -## [271] "dsb-dsb_FONIPA" -## [272] "dv-dv_Latn/BGN" -## [273] "el-el_Latn/BGN" -## [274] "el-Lower" -## [275] "el-Title" -## [276] "el-Upper" -## [277] "eo-am" -## [278] "eo-ar" -## [279] "eo-chr" -## [280] "eo-eo_FONIPA" -## [281] "eo-fa" -## [282] "es_419-am" -## [283] "es_419-ar" -## [284] "es_419-chr" -## [285] "es_419-fa" -## [286] "es_419-ja" -## [287] "es_419-zh" -## [288] "es_FONIPA-am" -## [289] "es_FONIPA-es_419_FONIPA" -## [290] "es_FONIPA-ja" -## [291] "es_FONIPA-zh" -## [292] "es-am" -## [293] "es-ar" -## [294] "es-chr" -## [295] "es-es_FONIPA" -## [296] "es-fa" -## [297] "es-ja" -## [298] "es-zh" -## [299] "Ethi-Cyrl/Gutgarts" -## [300] "Ethi-Latn" -## [301] "Ethi-Latn/Aethiopi" -## [302] "Ethi-Latn/Aethiopi_Geminate" -## [303] "Ethi-Latn/ALALOC" -## [304] "Ethi-Latn/ALALOC_Geminate" -## [305] "Ethi-Latn/Beta_Metsehaf" -## [306] "Ethi-Latn/Beta_Metsehaf_Geminate" -## [307] "Ethi-Latn/ES3842" -## [308] "Ethi-Latn/IES_JES_1964" -## [309] "Ethi-Latn/IES_JES_1964_Geminate" -## [310] "Ethi-Latn/Lambdin" -## [311] "Ethi-Latn/SERA" -## [312] "Ethi-Latn/Williamson" -## [313] "Ethi-sgw_Ethi/Gurage_2013" -## [314] "Ethiopic-Braille/Amharic" -## [315] "Ethiopic-Cyrillic/Gutgarts" -## [316] "Ethiopic-Ethiopic/Gurage" -## [317] "Ethiopic-Latin" -## [318] "Ethiopic-Latin/Aethiopica" -## [319] "Ethiopic-Latin/Aethiopica_Geminate" -## [320] "Ethiopic-Latin/ALALOC" -## [321] "Ethiopic-Latin/ALALOC_Geminate" -## [322] "Ethiopic-Latin/Beta_Metsehaf" -## [323] "Ethiopic-Latin/BetaMetsehaf_Geminate" -## [324] "Ethiopic-Latin/ES3842" -## [325] "Ethiopic-Latin/IES_JES_1964" -## [326] "Ethiopic-Latin/IES_JES_1964_Geminate" -## [327] "Ethiopic-Latin/Lambdin" -## [328] 
"Ethiopic-Latin/SERA" -## [329] "Ethiopic-Latin/Tekie_Alibekit" -## [330] "Ethiopic-Latin/Williamson" -## [331] "Ethiopic-Latin/Xaleget" -## [332] "fa-fa_FONIPA" -## [333] "fa-fa_Latn/BGN" -## [334] "Fullwidth-Halfwidth" -## [335] "Geez-Ethiopic" -## [336] "Geez-Musnad" -## [337] "Geor-Latn" -## [338] "Georgian-Latin" -## [339] "Georgian-Latin/BGN" -## [340] "Greek-Latin" -## [341] "Greek-Latin/BGN" -## [342] "Greek-Latin/UNGEGN" -## [343] "Grek-Latn" -## [344] "Grek-Latn/UNGEGN" -## [345] "Gujarati-Arabic" -## [346] "Gujarati-Bengali" -## [347] "Gujarati-Devanagari" -## [348] "Gujarati-Gurmukhi" -## [349] "Gujarati-Kannada" -## [350] "Gujarati-Latin" -## [351] "Gujarati-Malayalam" -## [352] "Gujarati-Oriya" -## [353] "Gujarati-Tamil" -## [354] "Gujarati-Telugu" -## [355] "Gujr-Arab" -## [356] "Gujr-Beng" -## [357] "Gujr-Deva" -## [358] "Gujr-Guru" -## [359] "Gujr-Knda" -## [360] "Gujr-Latn" -## [361] "Gujr-Mlym" -## [362] "Gujr-Orya" -## [363] "Gujr-Taml" -## [364] "Gujr-Telu" -## [365] "Gujr-ur" -## [366] "Gurage-Ethiopic" -## [367] "Gurmukhi-Arabic" -## [368] "Gurmukhi-Bengali" -## [369] "Gurmukhi-Devanagari" -## [370] "Gurmukhi-Gujarati" -## [371] "Gurmukhi-Kannada" -## [372] "Gurmukhi-Latin" -## [373] "Gurmukhi-Malayalam" -## [374] "Gurmukhi-Oriya" -## [375] "Gurmukhi-Tamil" -## [376] "Gurmukhi-Telugu" -## [377] "Guru-Arab" -## [378] "Guru-Beng" -## [379] "Guru-Deva" -## [380] "Guru-Gujr" -## [381] "Guru-Knda" -## [382] "Guru-Latn" -## [383] "Guru-Mlym" -## [384] "Guru-Orya" -## [385] "Guru-Taml" -## [386] "Guru-Telu" -## [387] "Guru-ur" -## [388] "gz_Ethi-Sarb" -## [389] "ha-ha_NE" -## [390] "Halfwidth-Fullwidth" -## [391] "Han-Latin" -## [392] "Han-Latin/Names" -## [393] "Hang-Latn" -## [394] "Hangul-Latin" -## [395] "Hani-Latn" -## [396] "Hans-Hant" -## [397] "Hant-Hans" -## [398] "he-he_Latn/BGN" -## [399] "Hebr-Latn" -## [400] "Hebrew-Latin" -## [401] "Hebrew-Latin/BGN" -## [402] "Hex-Any" -## [403] "Hex-Any/C" -## [404] "Hex-Any/Java" -## [405] 
"Hex-Any/Perl" -## [406] "Hex-Any/Unicode" -## [407] "Hex-Any/XML" -## [408] "Hex-Any/XML10" -## [409] "Hira-Kana" -## [410] "Hira-Latn" -## [411] "Hiragana-Katakana" -## [412] "Hiragana-Latin" -## [413] "hy_AREVMDA-am" -## [414] "hy_AREVMDA-ar" -## [415] "hy_AREVMDA-chr" -## [416] "hy_AREVMDA-fa" -## [417] "hy_AREVMDA-hy_AREVMDA_FONIPA" -## [418] "hy-am" -## [419] "hy-ar" -## [420] "hy-chr" -## [421] "hy-fa" -## [422] "hy-hy_FONIPA" -## [423] "hy-hy_Latn/BGN" -## [424] "ia-am" -## [425] "ia-ar" -## [426] "ia-chr" -## [427] "ia-fa" -## [428] "ia-ia_FONIPA" -## [429] "IPA-XSampa" -## [430] "it-am" -## [431] "it-ja" -## [432] "ja_Hrkt-ja_Latn/BGN" -## [433] "ja_Latn-ko" -## [434] "ja_Latn-ru" -## [435] "Jamo-Latin" -## [436] "Jamo-Latn" -## [437] "ka-ka_Latn/BGN" -## [438] "ka-ka_Latn/BGN_1981" -## [439] "Kana-Hira" -## [440] "Kana-Latn" -## [441] "Kannada-Arabic" -## [442] "Kannada-Bengali" -## [443] "Kannada-Devanagari" -## [444] "Kannada-Gujarati" -## [445] "Kannada-Gurmukhi" -## [446] "Kannada-Latin" -## [447] "Kannada-Malayalam" -## [448] "Kannada-Oriya" -## [449] "Kannada-Tamil" -## [450] "Kannada-Telugu" -## [451] "Katakana-Hiragana" -## [452] "Katakana-Latin" -## [453] "Katakana-Latin/BGN" -## [454] "Kazakh-Latin/BGN" -## [455] "Kirghiz-Latin/BGN" -## [456] "kk-am" -## [457] "kk-ar" -## [458] "kk-chr" -## [459] "kk-fa" -## [460] "kk-kk_FONIPA" -## [461] "kk-kk_Latn/BGN" -## [462] "Knda-Arab" -## [463] "Knda-Beng" -## [464] "Knda-Deva" -## [465] "Knda-Gujr" -## [466] "Knda-Guru" -## [467] "Knda-Latn" -## [468] "Knda-Mlym" -## [469] "Knda-Orya" -## [470] "Knda-Taml" -## [471] "Knda-Telu" -## [472] "Knda-ur" -## [473] "ko-ko_Latn/BGN" -## [474] "Korean-Latin/BGN" -## [475] "ky-am" -## [476] "ky-ar" -## [477] "ky-chr" -## [478] "ky-fa" -## [479] "ky-ky_FONIPA" -## [480] "ky-ky_Latn/BGN" -## [481] "la-la_FONIPA" -## [482] "Latin-Arabic" -## [483] "Latin-Armenian" -## [484] "Latin-ASCII" -## [485] "Latin-Bengali" -## [486] "Latin-Bopomofo" -## [487] 
"Latin-CanadianAboriginal" -## [488] "Latin-Cyrillic" -## [489] "Latin-Devanagari" -## [490] "Latin-Ethiopic" -## [491] "Latin-Ethiopic/Aethiopica" -## [492] "Latin-Ethiopic/ALALOC" -## [493] "Latin-Ethiopic/Beta_Metsehaf" -## [494] "Latin-Ethiopic/IES_JES_1964" -## [495] "Latin-Ethiopic/Lambdin" -## [496] "Latin-Ethiopic/SERA" -## [497] "Latin-Ethiopic/Tekie_Alibekit" -## [498] "Latin-Ethiopic/Williamson" -## [499] "Latin-Ethiopic/Xalaget" -## [500] "Latin-Georgian" -## [501] "Latin-Greek" -## [502] "Latin-Greek/UNGEGN" -## [503] "Latin-Gujarati" -## [504] "Latin-Gurmukhi" -## [505] "Latin-Hangul" -## [506] "Latin-Hebrew" -## [507] "Latin-Hiragana" -## [508] "Latin-Jamo" -## [509] "Latin-Kannada" -## [510] "Latin-Katakana" -## [511] "Latin-Malayalam" -## [512] "Latin-NumericPinyin" -## [513] "Latin-Oriya" -## [514] "Latin-Russian/BGN" -## [515] "Latin-Syriac" -## [516] "Latin-Tamil" -## [517] "Latin-Telugu" -## [518] "Latin-Thaana" -## [519] "Latin-Thai" -## [520] "Latn-Arab" -## [521] "Latn-Armn" -## [522] "Latn-Beng" -## [523] "Latn-Bopo" -## [524] "Latn-Cans" -## [525] "Latn-Cyrl" -## [526] "Latn-Deva" -## [527] "Latn-Ethi" -## [528] "Latn-Ethi/Aethiopi" -## [529] "Latn-Ethi/ALALOC" -## [530] "Latn-Ethi/Beta_Metsehaf" -## [531] "Latn-Ethi/IES_JES_1964" -## [532] "Latn-Ethi/Lambdin" -## [533] "Latn-Ethi/SERA" -## [534] "Latn-Ethi/Williamson" -## [535] "Latn-Geor" -## [536] "Latn-Grek" -## [537] "Latn-Grek/UNGEGN" -## [538] "Latn-Gujr" -## [539] "Latn-Guru" -## [540] "Latn-Hang" -## [541] "Latn-Hebr" -## [542] "Latn-Hira" -## [543] "Latn-Jamo" -## [544] "Latn-Kana" -## [545] "Latn-Knda" -## [546] "Latn-Mlym" -## [547] "Latn-Orya" -## [548] "Latn-Syrc" -## [549] "Latn-Taml" -## [550] "Latn-Telu" -## [551] "Latn-Thaa" -## [552] "Latn-Thai" -## [553] "lt-Lower" -## [554] "lt-Title" -## [555] "lt-Upper" -## [556] "Macedonian-Latin/BGN" -## [557] "Malayalam-Arabic" -## [558] "Malayalam-Bengali" -## [559] "Malayalam-Devanagari" -## [560] "Malayalam-Gujarati" -## [561] 
"Malayalam-Gurmukhi" -## [562] "Malayalam-Kannada" -## [563] "Malayalam-Latin" -## [564] "Malayalam-Oriya" -## [565] "Malayalam-Tamil" -## [566] "Malayalam-Telugu" -## [567] "Maldivian-Latin/BGN" -## [568] "mk-mk_Latn/BGN" -## [569] "Mlym-Arab" -## [570] "Mlym-Beng" -## [571] "Mlym-Deva" -## [572] "Mlym-Gujr" -## [573] "Mlym-Guru" -## [574] "Mlym-Knda" -## [575] "Mlym-Latn" -## [576] "Mlym-Orya" -## [577] "Mlym-Taml" -## [578] "Mlym-Telu" -## [579] "Mlym-ur" -## [580] "mn-mn_Latn/BGN" -## [581] "mn-mn_Latn/MNS" -## [582] "Mongolian-Latin/BGN" -## [583] "my-am" -## [584] "my-ar" -## [585] "my-chr" -## [586] "my-fa" -## [587] "my-my_FONIPA" -## [588] "my-my_Latn" -## [589] "my-Zawgyi" -## [590] "Myanmar-Latin" -## [591] "Name-Any" -## [592] "nl-Title" -## [593] "NumericPinyin-Latin" -## [594] "NumericPinyin-Pinyin" -## [595] "nv-nv_FONIPA" -## [596] "Oriya-Arabic" -## [597] "Oriya-Bengali" -## [598] "Oriya-Devanagari" -## [599] "Oriya-Gujarati" -## [600] "Oriya-Gurmukhi" -## [601] "Oriya-Kannada" -## [602] "Oriya-Latin" -## [603] "Oriya-Malayalam" -## [604] "Oriya-Tamil" -## [605] "Oriya-Telugu" -## [606] "Orya-Arab" -## [607] "Orya-Beng" -## [608] "Orya-Deva" -## [609] "Orya-Gujr" -## [610] "Orya-Guru" -## [611] "Orya-Knda" -## [612] "Orya-Latn" -## [613] "Orya-Mlym" -## [614] "Orya-Taml" -## [615] "Orya-Telu" -## [616] "Orya-ur" -## [617] "Pashto-Latin/BGN" -## [618] "Persian-Latin/BGN" -## [619] "Pinyin-NumericPinyin" -## [620] "pl_FONIPA-ja" -## [621] "pl-am" -## [622] "pl-ar" -## [623] "pl-chr" -## [624] "pl-fa" -## [625] "pl-ja" -## [626] "pl-pl_FONIPA" -## [627] "ps-ps_Latn/BGN" -## [628] "Publishing-Any" -## [629] "rm_SURSILV-am" -## [630] "rm_SURSILV-ar" -## [631] "rm_SURSILV-chr" -## [632] "rm_SURSILV-fa" -## [633] "rm_SURSILV-rm_FONIPA_SURSILV" -## [634] "ro_FONIPA-ja" -## [635] "ro-am" -## [636] "ro-ar" -## [637] "ro-chr" -## [638] "ro-fa" -## [639] "ro-ja" -## [640] "ro-ro_FONIPA" -## [641] "ru_Latn-ru/BGN" -## [642] "ru-ja" -## [643] "ru-ru_Latn/BGN" 
-## [644] "ru-zh" -## [645] "Russian-Latin/BGN" -## [646] "Sarb-gz_Ethi" -## [647] "sat_Olck-sat_FONIPA" -## [648] "sat-am" -## [649] "sat-ar" -## [650] "sat-chr" -## [651] "sat-fa" -## [652] "Serbian-Latin/BGN" -## [653] "sgw_Ethi-Ethi/Gurage_2013" -## [654] "si-am" -## [655] "si-ar" -## [656] "si-chr" -## [657] "si-fa" -## [658] "si-si_FONIPA" -## [659] "si-si_Latn" -## [660] "Simplified-Traditional" -## [661] "sk_FONIPA-ja" -## [662] "sk-am" -## [663] "sk-ar" -## [664] "sk-chr" -## [665] "sk-fa" -## [666] "sk-ja" -## [667] "sk-sk_FONIPA" -## [668] "sr-sr_Latn/BGN" -## [669] "Syrc-Latn" -## [670] "Syriac-Latin" -## [671] "ta-ta_FONIPA" -## [672] "Tamil-Arabic" -## [673] "Tamil-Bengali" -## [674] "Tamil-Devanagari" -## [675] "Tamil-Gujarati" -## [676] "Tamil-Gurmukhi" -## [677] "Tamil-Kannada" -## [678] "Tamil-Latin" -## [679] "Tamil-Malayalam" -## [680] "Tamil-Oriya" -## [681] "Tamil-Telugu" -## [682] "Taml-Arab" -## [683] "Taml-Beng" -## [684] "Taml-Deva" -## [685] "Taml-Gujr" -## [686] "Taml-Guru" -## [687] "Taml-Knda" -## [688] "Taml-Latn" -## [689] "Taml-Mlym" -## [690] "Taml-Orya" -## [691] "Taml-Telu" -## [692] "Taml-ur" -## [693] "Telu-Arab" -## [694] "Telu-Beng" -## [695] "Telu-Deva" -## [696] "Telu-Gujr" -## [697] "Telu-Guru" -## [698] "Telu-Knda" -## [699] "Telu-Latn" -## [700] "Telu-Mlym" -## [701] "Telu-Orya" -## [702] "Telu-Taml" -## [703] "Telu-ur" -## [704] "Telugu-Arabic" -## [705] "Telugu-Bengali" -## [706] "Telugu-Devanagari" -## [707] "Telugu-Gujarati" -## [708] "Telugu-Gurmukhi" -## [709] "Telugu-Kannada" -## [710] "Telugu-Latin" -## [711] "Telugu-Malayalam" -## [712] "Telugu-Oriya" -## [713] "Telugu-Tamil" -## [714] "Thaa-Latn" -## [715] "Thaana-Latin" -## [716] "Thai-Latin" -## [717] "Thai-Latn" -## [718] "tk_Cyrl-tk/BGN" -## [719] "tlh-am" -## [720] "tlh-ar" -## [721] "tlh-chr" -## [722] "tlh-fa" -## [723] "tlh-tlh_FONIPA" -## [724] "Tone-Digit" -## [725] "tr-Lower" -## [726] "tr-Title" -## [727] "tr-Upper" -## [728] 
"Traditional-Simplified" -## [729] "Turkmen-Latin/BGN" -## [730] "ug-ug_FONIPA" -## [731] "uk-uk_Latn/BGN" -## [732] "Ukrainian-Latin/BGN" -## [733] "und_FONIPA-ar" -## [734] "und_FONIPA-chr" -## [735] "und_FONIPA-fa" -## [736] "und_FONIPA-und_FONXSAMP" -## [737] "und_FONXSAMP-und_FONIPA" -## [738] "uz_Cyrl-uz_Latn" -## [739] "uz_Cyrl-uz/BGN" -## [740] "uz_Latn-uz_Cyrl" -## [741] "Uzbek-Latin/BGN" -## [742] "vec-vec_FONIPA" -## [743] "xh-am" -## [744] "xh-ar" -## [745] "xh-chr" -## [746] "xh-fa" -## [747] "xh-xh_FONIPA" -## [748] "XSampa-IPA" -## [749] "yo-yo_BJ" -## [750] "Zawgyi-my" -## [751] "zh_Latn_PINYIN-ru" -## [752] "zu-am" -## [753] "zu-ar" -## [754] "zu-chr" -## [755] "zu-fa" -## [756] "zu-zu_FONIPA" -``` diff --git a/.devel/sphinx/rapi/stri_trans_nf.md b/.devel/sphinx/rapi/stri_trans_nf.md deleted file mode 100644 index e027a683..00000000 --- a/.devel/sphinx/rapi/stri_trans_nf.md +++ /dev/null @@ -1,108 +0,0 @@ -# stri_trans_nf: Perform or Check For Unicode Normalization - -## Description - -These functions convert strings to NFC, NFKC, NFD, NFKD, or NFKC_Casefold Unicode Normalization Form or check whether strings are normalized. - -## Usage - -``` r -stri_trans_nfc(str) - -stri_trans_nfd(str) - -stri_trans_nfkd(str) - -stri_trans_nfkc(str) - -stri_trans_nfkc_casefold(str) - -stri_trans_isnfc(str) - -stri_trans_isnfd(str) - -stri_trans_isnfkd(str) - -stri_trans_isnfkc(str) - -stri_trans_isnfkc_casefold(str) -``` - -## Arguments - -| | | -|-------|--------------------------------| -| `str` | character vector to be encoded | - -## Details - -Unicode Normalization Forms are formally defined normalizations of Unicode strings which, e.g., make possible to determine whether any two strings are equivalent. Essentially, the Unicode Normalization Algorithm puts all combining marks in a specified order, and uses rules for decomposition and composition to transform each string into one of the Unicode Normalization Forms. 
- -The following Normalization Forms (NFs) are supported: - -- NFC (Canonical Decomposition, followed by Canonical Composition), - -- NFD (Canonical Decomposition), - -- NFKC (Compatibility Decomposition, followed by Canonical Composition), - -- NFKD (Compatibility Decomposition), - -- NFKC_Casefold (combination of NFKC, case folding, and removing ignorable characters which was introduced with Unicode 5.2). - -Note that many W3C Specifications recommend using NFC for all content, because this form avoids potential interoperability problems arising from the use of canonically equivalent, yet different, character sequences in document formats on the Web. Thus, you will rather not use these functions in typical string processing activities. Most often you may assume that a string is in NFC, see RFC5198. - -As usual in stringi, if the input character vector is in the native encoding, it will be automatically converted to UTF-8. - -For more general text transforms refer to [`stri_trans_general`](stri_trans_general.md). - -## Value - -The `stri_trans_nf*` functions return a character vector of the same length as input (the output is always in UTF-8). - -`stri_trans_isnf*` return a logical vector. 
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## References - -*Unicode Normalization Forms* -- Unicode Standard Annex #15, - -*Unicode Format for Network Interchange* -- RFC5198, - -*Character Model for the World Wide Web 1.0: Normalization* -- W3C Working Draft, - -*Normalization* -- ICU User Guide, (technical details) - -*Unicode Equivalence* -- Wikipedia, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other transform: [`stri_trans_char()`](stri_trans_char.md), [`stri_trans_general()`](stri_trans_general.md), [`stri_trans_list()`](stri_trans_list.md), [`stri_trans_tolower()`](stri_trans_casemap.md) - -## Examples - - - - -```r -stri_trans_nfd('\u0105') # a with ogonek -> a, ogonek -``` - -``` -## [1] "ą" -``` - -```r -stri_trans_nfkc('\ufdfa') # 1 codepoint -> 18 codepoints -``` - -``` -## [1] "صلى الله عليه وسلم" -``` diff --git a/.devel/sphinx/rapi/stri_trim.md b/.devel/sphinx/rapi/stri_trim.md deleted file mode 100644 index 75309716..00000000 --- a/.devel/sphinx/rapi/stri_trim.md +++ /dev/null @@ -1,100 +0,0 @@ -# stri_trim: Trim Characters from the Left and/or Right Side of a String - -## Description - -These functions may be used, e.g., to remove unnecessary white-spaces from strings. Trimming ends at the first or starts at the last `pattern` match. 
- -## Usage - -``` r -stri_trim_both(str, pattern = "\\P{Wspace}", negate = FALSE) - -stri_trim_left(str, pattern = "\\P{Wspace}", negate = FALSE) - -stri_trim_right(str, pattern = "\\P{Wspace}", negate = FALSE) - -stri_trim( - str, - side = c("both", "left", "right"), - pattern = "\\P{Wspace}", - negate = FALSE -) -``` - -## Arguments - -| | | -|-----------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `str` | a character vector of strings to be trimmed | -| `pattern` | a single pattern, specifying the class of characters (see [stringi-search-charclass](about_search_charclass.md)) to to be preserved (if `negate` is `FALSE`; default) or trimmed (otherwise) | -| `negate` | either `TRUE` or `FALSE`; see `pattern` | -| `side` | character \[`stri_trim` only\]; defaults to `'both'` | - -## Details - -Vectorized over `str` and `pattern`. - -`stri_trim` is a convenience wrapper over `stri_trim_left` and `stri_trim_right`. - -Contrary to many other string processing libraries, our trimming functions are universal. The class of characters to be retained or trimmed can be adjusted. - -For replacing pattern matches with an arbitrary replacement string, see [`stri_replace`](stri_replace.md). - -Trimming can also be used where you would normally rely on regular expressions. For instance, you may get `'23.5'` out of `'total of 23.5 bitcoins'`. - -For trimming white-spaces, please note the difference between Unicode binary property \'`\p{Wspace}`\' (more universal) and general character category \'`\p{Z}`\', see [stringi-search-charclass](about_search_charclass.md). - -## Value - -All functions return a character vector. 
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other search_replace: [`about_search`](about_search.md), [`stri_replace_all()`](stri_replace.md), [`stri_replace_rstr()`](stri_replace_rstr.md) - -Other search_charclass: [`about_search_charclass`](about_search_charclass.md), [`about_search`](about_search.md) - -## Examples - - - - -```r -stri_trim_left(' aaa') -``` - -``` -## [1] "aaa" -``` - -```r -stri_trim_right('r-project.org/', '\\P{P}') -``` - -``` -## [1] "r-project.org" -``` - -```r -stri_trim_both(' Total of 23.5 bitcoins. ', '\\p{N}') -``` - -``` -## [1] "23.5" -``` - -```r -stri_trim_both(' Total of 23.5 bitcoins. ', '\\P{N}', negate=TRUE) -``` - -``` -## [1] "23.5" -``` diff --git a/.devel/sphinx/rapi/stri_unescape_unicode.md b/.devel/sphinx/rapi/stri_unescape_unicode.md deleted file mode 100644 index 3c57cd25..00000000 --- a/.devel/sphinx/rapi/stri_unescape_unicode.md +++ /dev/null @@ -1,56 +0,0 @@ -# stri_unescape_unicode: Un-escape All Escape Sequences - -## Description - -Un-escapes all known escape sequences - -## Usage - -``` r -stri_unescape_unicode(str) -``` - -## Arguments - -| | | -|-------|------------------| -| `str` | character vector | - -## Details - -Uses ICU facilities to un-escape Unicode character sequences. - -The following ASCII standard escapes are recognized: `\a`, `\b`, `\t`, `\n`, `\v`, `\?`, `\e`, `\f`, `\r`, `\"`, `\'`, `\\`. - -Moreover, the function understands the following ones: `\uXXXX` (4 hex digits), `\UXXXXXXXX` (8 hex digits), `\xXX` (1-2 hex digits), `\ooo` (1-3 octal digits), `\cX` (control-X; X is masked with 0x1F). For `\xXX` and `\ooo`, beware of non-valid UTF-8 byte sequences. 
- -Note that some versions of R on Windows cannot handle characters defined with `\UXXXXXXXX`. We are working on that. - -## Value - -Returns a character vector. If an escape sequence is ill-formed, result will be `NA` and a warning will be given. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other escape: [`stri_escape_unicode()`](stri_escape_unicode.md) - -## Examples - - - - -```r -stri_unescape_unicode('a\\u0105!\\u0032\\n') -``` - -``` -## [1] "aą!2\n" -``` diff --git a/.devel/sphinx/rapi/stri_unique.md b/.devel/sphinx/rapi/stri_unique.md deleted file mode 100644 index 23f037f8..00000000 --- a/.devel/sphinx/rapi/stri_unique.md +++ /dev/null @@ -1,75 +0,0 @@ -# stri_unique: Extract Unique Elements - -## Description - -This function returns a character vector like `str`, but with duplicate elements removed. - -## Usage - -``` r -stri_unique(str, ..., opts_collator = NULL) -``` - -## Arguments - -| | | -|-----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `str` | a character vector | -| `...` | additional settings for `opts_collator` | -| `opts_collator` | a named list with ICU Collator\'s options, see [`stri_opts_collator`](stri_opts_collator.md), `NULL` for default collation options | - -## Details - -As usual in stringi, no attributes are copied. Unlike [`unique`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/unique.html), this function tests for canonical equivalence of strings (and not whether the strings are just bytewise equal). Such an operation is locale-dependent. 
Hence, `stri_unique` is significantly slower (but much better suited for natural language processing) than its base R counterpart. - -See also [`stri_duplicated`](stri_duplicated.md) for indicating non-unique elements. - -## Value - -Returns a character vector. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## References - -*Collation* - ICU User Guide, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other locale_sensitive: [`%s<%()`](+25s+3C+25.md), [`about_locale`](about_locale.md), [`about_search_boundaries`](about_search_boundaries.md), [`about_search_coll`](about_search_coll.md), [`stri_compare()`](stri_compare.md), [`stri_count_boundaries()`](stri_count_boundaries.md), [`stri_duplicated()`](stri_duplicated.md), [`stri_enc_detect2()`](stri_enc_detect2.md), [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_opts_collator()`](stri_opts_collator.md), [`stri_order()`](stri_order.md), [`stri_rank()`](stri_rank.md), [`stri_sort_key()`](stri_sort_key.md), [`stri_sort()`](stri_sort.md), [`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_trans_tolower()`](stri_trans_casemap.md), [`stri_wrap()`](stri_wrap.md) - -## Examples - - - - -```r -# normalized and non-Unicode-normalized version of the same code point: -stri_unique(c('\u0105', stri_trans_nfkd('\u0105'))) -``` - -``` -## [1] "ą" -``` - -```r -unique(c('\u0105', stri_trans_nfkd('\u0105'))) -``` - -``` -## [1] "ą" "ą" -``` - -```r -stri_unique(c('gro\u00df', 'GROSS', 'Gro\u00df', 'Gross'), strength=1) -``` - -``` -## [1] "groß" -``` diff --git a/.devel/sphinx/rapi/stri_width.md b/.devel/sphinx/rapi/stri_width.md deleted file mode 100644 index ac4b8a5e..00000000 --- 
a/.devel/sphinx/rapi/stri_width.md +++ /dev/null @@ -1,104 +0,0 @@ -# stri_width: Determine the Width of Code Points - -## Description - -Approximates the number of text columns the \'cat()\' function might use to print a string using a mono-spaced font. - -## Usage - -``` r -stri_width(str) -``` - -## Arguments - -| | | -|-------|--------------------------------------------| -| `str` | character vector or an object coercible to | - -## Details - -The Unicode standard does not formalize the notion of a character width. Roughly based on , , and UAX #11 we proceed as follows. The following code points are of width 0: - -- code points with general category (see [stringi-search-charclass](about_search_charclass.md)) `Me`, `Mn`, and `Cf`), - -- `C0` and `C1` control codes (general category `Cc`) - for compatibility with the [`nchar`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/nchar.html) function, - -- Hangul Jamo medial vowels and final consonants (code points with enumerable property `UCHAR_HANGUL_SYLLABLE_TYPE` equal to `U_HST_VOWEL_JAMO` or `U_HST_TRAILING_JAMO`; note that applying the NFC normalization with [`stri_trans_nfc`](stri_trans_nf.md) is encouraged), - -- ZERO WIDTH SPACE (U+200B), - -Characters with the `UCHAR_EAST_ASIAN_WIDTH` enumerable property equal to `U_EA_FULLWIDTH` or `U_EA_WIDE` are of width 2. - -Most emojis and characters with general category So (other symbols) are of width 2. - -SOFT HYPHEN (U+00AD) (for compatibility with [`nchar`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/nchar.html)) as well as any other characters have width 1. - -## Value - -Returns an integer vector of the same length as `str`. 
- -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## References - -*East Asian Width* -- Unicode Standard Annex #11, - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other length: [`%s$%()`](+25s+24+25.md), [`stri_isempty()`](stri_isempty.md), [`stri_length()`](stri_length.md), [`stri_numbytes()`](stri_numbytes.md), [`stri_pad_both()`](stri_pad.md), [`stri_sprintf()`](stri_sprintf.md) - -## Examples - - - - -```r -stri_width(LETTERS[1:5]) -``` - -``` -## [1] 1 1 1 1 1 -``` - -```r -stri_width(stri_trans_nfkd('\u0105')) -``` - -``` -## [1] 1 -``` - -```r -stri_width(stri_trans_nfkd('\U0001F606')) -``` - -``` -## [1] 2 -``` - -```r -stri_width( # Full-width equivalents of ASCII characters: - stri_enc_fromutf32(as.list(c(0x3000, 0xFF01:0xFF5E))) -) -``` - -``` -## [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 -## [39] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 -## [77] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 -``` - -```r -stri_width(stri_trans_nfkd('\ubc1f')) # includes Hangul Jamo medial vowels and final consonants -``` - -``` -## [1] 2 -``` diff --git a/.devel/sphinx/rapi/stri_wrap.md b/.devel/sphinx/rapi/stri_wrap.md deleted file mode 100644 index bb56c51a..00000000 --- a/.devel/sphinx/rapi/stri_wrap.md +++ /dev/null @@ -1,133 +0,0 @@ -# stri_wrap: Word Wrap Text to Format Paragraphs - -## Description - -This function breaks text paragraphs into lines, of total width (if it is possible) at most given `width`. 
- -## Usage - -``` r -stri_wrap( - str, - width = floor(0.9 * getOption("width")), - cost_exponent = 2, - simplify = TRUE, - normalize = TRUE, - normalise = normalize, - indent = 0, - exdent = 0, - prefix = "", - initial = prefix, - whitespace_only = FALSE, - use_length = FALSE, - locale = NULL -) -``` - -## Arguments - -| | | -|---------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `str` | character vector of strings to reformat | -| `width` | single integer giving the suggested maximal total width/number of code points per line | -| `cost_exponent` | single numeric value, values not greater than zero will select a greedy word-wrapping algorithm; otherwise this value denotes the exponent in the cost function of a (more aesthetic) dynamic programming-based algorithm (values in \[2, 3\] are recommended) | -| `simplify` | single logical value, see Value | -| `normalize` | single logical value, see Details | -| `normalise` | alias of `normalize` | -| `indent` | single non-negative integer; gives the indentation of the first line in each paragraph | -| `exdent` | single non-negative integer; specifies the indentation of subsequent lines in paragraphs | -| `prefix`, `initial` | single strings; `prefix` is used as prefix for each line except the first, for which `initial` is utilized | -| `whitespace_only` | single logical value; allow breaks only at white-spaces? if `FALSE`, ICU\'s line break iterator is used to split text into words, which is suitable for natural language processing | -| `use_length` | single logical value; should the number of code points be used instead of the total code point width (see [`stri_width`](stri_width.md))? 
| -| `locale` | `NULL` or `''` for text boundary analysis following the conventions of the default locale, or a single string with locale identifier, see [stringi-locale](about_locale.md) | - -## Details - -Vectorized over `str`. - -If `whitespace_only` is `FALSE`, then ICU\'s line-`BreakIterator` is used to determine text boundaries where a line break is possible. This is a locale-dependent operation. Otherwise, the breaks are only at white-spaces. - -Note that Unicode code points may have various widths when printed on the console and that this function, by default, takes that into account. By changing the state of the `use_length` argument, this function starts to act as if each code point was of width 1. - -If `normalize` is `FALSE`, then multiple white spaces between the word boundaries are preserved within each wrapped line. In such a case, none of the strings can contain `\r`, `\n`, or other new line characters, otherwise you will get an error. You should split the input text into lines or, for example, substitute line breaks with spaces before applying this function. - -If `normalize` is `TRUE`, then all consecutive white space (ASCII space, horizontal TAB, CR, LF) sequences are replaced with single ASCII spaces before actual string wrapping. Moreover, [`stri_split_lines`](stri_split_lines.md) and [`stri_trans_nfc`](stri_trans_nf.md) is called on the input character vector. This is for compatibility with [`strwrap`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/strwrap.html). - -The greedy algorithm (for `cost_exponent` being non-positive) provides a very simple way for word wrapping. It always puts as many words in each line as possible. This method -- contrary to the dynamic algorithm -- does not minimize the number of space left at the end of every line. The dynamic algorithm (a.k.a. Knuth\'s word wrapping algorithm) is more complex, but it returns text wrapped in a more aesthetic way. 
This method minimizes the squared (by default, see `cost_exponent`) number of spaces (raggedness) at the end of each line, so the text is mode arranged evenly. Note that the cost of printing the last line is always zero. - -## Value - -If `simplify` is `TRUE`, then a character vector is returned. Otherwise, you will get a list of `length(str)` character vectors. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## References - -D.E. Knuth, M.F. Plass, Breaking paragraphs into lines, *Software: Practice and Experience* 11(11), 1981, pp. 1119--1184. - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other locale_sensitive: [`%s<%()`](+25s+3C+25.md), [`about_locale`](about_locale.md), [`about_search_boundaries`](about_search_boundaries.md), [`about_search_coll`](about_search_coll.md), [`stri_compare()`](stri_compare.md), [`stri_count_boundaries()`](stri_count_boundaries.md), [`stri_duplicated()`](stri_duplicated.md), [`stri_enc_detect2()`](stri_enc_detect2.md), [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), [`stri_opts_collator()`](stri_opts_collator.md), [`stri_order()`](stri_order.md), [`stri_rank()`](stri_rank.md), [`stri_sort_key()`](stri_sort_key.md), [`stri_sort()`](stri_sort.md), [`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_trans_tolower()`](stri_trans_casemap.md), [`stri_unique()`](stri_unique.md) - -Other text_boundaries: [`about_search_boundaries`](about_search_boundaries.md), [`about_search`](about_search.md), [`stri_count_boundaries()`](stri_count_boundaries.md), [`stri_extract_all_boundaries()`](stri_extract_boundaries.md), [`stri_locate_all_boundaries()`](stri_locate_boundaries.md), 
[`stri_opts_brkiter()`](stri_opts_brkiter.md), [`stri_split_boundaries()`](stri_split_boundaries.md), [`stri_split_lines()`](stri_split_lines.md), [`stri_trans_tolower()`](stri_trans_casemap.md) - -## Examples - - - - -```r -s <- stri_paste( - 'Lorem ipsum dolor sit amet, consectetur adipisicing elit. Proin ', - 'nibh augue, suscipit a, scelerisque sed, lacinia in, mi. Cras vel ', - 'lorem. Etiam pellentesque aliquet tellus.') -cat(stri_wrap(s, 20, 0.0), sep='\n') # greedy -``` - -``` -## Lorem ipsum dolor -## sit amet, -## consectetur -## adipisicing elit. -## Proin nibh augue, -## suscipit a, -## scelerisque sed, -## lacinia in, mi. Cras -## vel lorem. Etiam -## pellentesque aliquet -## tellus. -``` - -```r -cat(stri_wrap(s, 20, 2.0), sep='\n') # dynamic -``` - -``` -## Lorem ipsum -## dolor sit amet, -## consectetur -## adipisicing elit. -## Proin nibh augue, -## suscipit a, -## scelerisque sed, -## lacinia in, mi. Cras -## vel lorem. Etiam -## pellentesque aliquet -## tellus. -``` - -```r -cat(stri_pad(stri_wrap(s), side='both'), sep='\n') -``` - -``` -## Lorem ipsum dolor sit amet, consectetur adipisicing elit. Proin nibh -## augue, suscipit a, scelerisque sed, lacinia in, mi. Cras vel lorem. -## Etiam pellentesque aliquet tellus. -``` diff --git a/.devel/sphinx/rapi/stri_write_lines.md b/.devel/sphinx/rapi/stri_write_lines.md deleted file mode 100644 index 7c0dad5d..00000000 --- a/.devel/sphinx/rapi/stri_write_lines.md +++ /dev/null @@ -1,49 +0,0 @@ -# stri_write_lines: Write Text Lines to a Text File - -## Description - -Writes a text file is such a way that each element of a given character vector becomes a separate text line. 
- -## Usage - -``` r -stri_write_lines( - str, - con, - encoding = "UTF-8", - sep = ifelse(.Platform$OS.type == "windows", "\r\n", "\n"), - fname = con -) -``` - -## Arguments - -| | | -|------------|----------------------------------------------------------------------------| -| `str` | character vector with data to write | -| `con` | name of the output file or a connection object (opened in the binary mode) | -| `encoding` | output encoding, `NULL` or `''` for the current default one | -| `sep` | newline separator | -| `fname` | deprecated alias of `con` | - -## Details - -It is a substitute for the **R** [`writeLines`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/writeLines.html) function, with the ability to easily re-encode the output. - -We suggest using the UTF-8 encoding for all text files: thus, it is the default one for the output. - -## Value - -This function returns nothing noteworthy. - -## Author(s) - -[Marek Gagolewski](https://www.gagolewski.com/) and other contributors - -## See Also - -The official online manual of stringi at - -Gagolewski M., stringi: Fast and portable character string processing in R, *Journal of Statistical Software* 103(2), 2022, 1-59, [doi:10.18637/jss.v103.i02](https://doi.org/10.18637/jss.v103.i02) - -Other files: [`stri_read_lines()`](stri_read_lines.md), [`stri_read_raw()`](stri_read_raw.md) diff --git a/.devel/tinytest.R b/.devel/tinytest.R index 1dc3425b..43bb5521 100644 --- a/.devel/tinytest.R +++ b/.devel/tinytest.R @@ -26,4 +26,5 @@ if (testWarnings) { } rm(testWarnings) +warnings() cat(stri_info(short=TRUE), "\n") diff --git a/.devel/tinytest/test-count-coll.R b/.devel/tinytest/test-count-coll.R index 59799824..61853f46 100644 --- a/.devel/tinytest/test-count-coll.R +++ b/.devel/tinytest/test-count-coll.R @@ -25,8 +25,25 @@ expect_equivalent(stri_count_coll("bababababaab", "aab"), 1L) # stri_opts_collator tests: -expect_equivalent(stri_count_coll("bababababaab", "aab", opts_collator = 
stri_opts_collator(locale = "UNKNOWN")), +expect_equivalent(suppressWarnings(stri_count_coll("bababababaab", "aab", opts_collator = stri_opts_collator(locale = "UNKNOWN"))), 1L) +expect_warning(stri_count_coll("bababababaab", "aab", opts_collator = stri_opts_collator(locale = "UNKNOWN"))) + +expect_equivalent(stri_count_coll("bababababaab", "aab", opts_collator = stri_opts_collator(locale = "C")), + 1L) + +old_loc <- stri_locale_set("UNKNOWN") +expect_warning(stri_count_coll("bababababaab", "aab")) +stri_locale_set(old_loc) + +expect_equivalent(stri_count_coll("bababababaab", "aab", opts_collator = stri_opts_collator(locale = "C")), + 1L) + +old_loc <- stri_locale_set("C") +expect_equivalent(stri_count_coll("bababababaab", "aab"), 1L) +stri_locale_set(old_loc) + + expect_equivalent(stri_count_coll("bababababaab", "aab", opts_collator = stri_opts_collator(strength = -100)), 1L) expect_error(stri_count_coll("bababababaab", "aab", opts_collator = stri_opts_collator(strength = 100))) diff --git a/.devel/tinytest/test-uloc.R b/.devel/tinytest/test-uloc.R index 54a66bc1..771d2254 100644 --- a/.devel/tinytest/test-uloc.R +++ b/.devel/tinytest/test-uloc.R @@ -4,7 +4,7 @@ library("stringi") expect_true(length(stri_locale_list()) > 0) -suppressMessages(stri_locale_set("XX_YY")) -suppressMessages(expect_true(substr(stri_locale_set("pl_PL"), 1, 5) == "xx_YY")) -suppressMessages(expect_true(substr(stri_locale_set("pl_PL"), 1, 5) == "pl_PL")) - +suppressMessages(old_loc <- stri_locale_set("XX_YY")) +suppressMessages(expect_true(stri_locale_set("pl_PL") == "xx_YY")) +suppressMessages(expect_true(stri_locale_set("C") == "pl_PL")) +suppressMessages(expect_true(stri_locale_set(old_loc) == "en_US_POSIX")) diff --git a/.github/workflows/r-icu-system.yml b/.github/workflows/r-icu-system.yml index 00fc183c..2e282f8a 100644 --- a/.github/workflows/r-icu-system.yml +++ b/.github/workflows/r-icu-system.yml @@ -33,3 +33,5 @@ jobs: - name: Test stringi run: | Rscript -e 
'source(".devel/tinytest.R")' + LC_ALL="C" Rscript -e 'source(".devel/tinytest.R")' + LC_ALL="pl_PL" Rscript -e 'source(".devel/tinytest.R")' diff --git a/DESCRIPTION b/DESCRIPTION index 0acc66cb..d9021b77 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: stringi Version: 1.7.9003 -Date: 2023-11-06 +Date: 2023-11-07 Title: Fast and Portable Character String Processing Facilities Description: A collection of character string/text/natural language processing tools for pattern searching (e.g., with 'Java'-like regular diff --git a/NEWS b/NEWS index bceccb7c..0bbce66c 100644 --- a/NEWS +++ b/NEWS @@ -5,8 +5,12 @@ * [GENERAL] ICU bundle updated to version 74.1 (Unicode 15.1, CLDR 44). +* [NEW FEATURE] #476: A warning is emitted when selecting an unknown locale + for collation as it most likely indicates that a wrong resource is being used. + Furthermore, the `C` locale identifier now resolves to `en_US_POSIX`. + * [BUILD TIME] As per the suggestion of Prof. Brian Ripley, `icudt74l` - (ICU data -- little endian) is now included in the source tarball (compressed + (ICU data - little endian) is now included in the source tarball (compressed with xz to save space). This allows for building *stringi* on systems with no internet access. @@ -25,8 +29,6 @@ * [BUGFIX] TODO.... #469: `stri_datetime_parse` did not reset the `Calendar` object when parsing multiple dates. -* [NEW FEATURE] TODO... #476 U_USING_DEFAULT_ERROR on unknown locales - ## 1.7.12 (2023-01-09) diff --git a/R/locale.R b/R/locale.R index a91658c5..2757dfd8 100644 --- a/R/locale.R +++ b/R/locale.R @@ -1,7 +1,7 @@ # kate: default-dictionary en_US ## This file is part of the 'stringi' package for R. -## Copyright (c) 2013-2021, Marek Gagolewski +## Copyright (c) 2013-2023, Marek Gagolewski ## All rights reserved. 
## ## Redistribution and use in source and binary forms, with or without @@ -87,26 +87,25 @@ #' For a list of locales that are recognized by \pkg{ICU}, #' call \code{\link{stri_locale_list}}. #' +#' Note that in \pkg{stringi}, 'C' is a synonym of `en_US_POSIX`. +#' #' #' @section A Note on Default Locales: #' #' Each locale-sensitive function in \pkg{stringi} #' selects the current default locale if an empty string or \code{NULL} #' is provided as its \code{locale} argument. Default locales are available -#' to all the functions: -#' they are initially set to be the system locale on that platform, -#' and may be changed with \code{\link{stri_locale_set}}, -#' for example, if automatic detection fails to recognize -#' your locale properly. -#' -#' It is suggested that your program should avoid changing -#' the default locale. +#' to all the functions; initially, the system locale on that platform is used, +#' but it may be changed by calling \code{\link{stri_locale_set}}. +#' +#' Your program should avoid changing the default locale. #' All locale-sensitive functions may request #' any desired locale per-call (by specifying the \code{locale} argument), #' i.e., without referencing to the default locale. #' During many tests, however, we did not observe any improper #' behavior of \pkg{stringi} while using a modified default locale. #' +#' #' @section Locale-Sensitive Functions in \pkg{stringi}: #' #' One of many examples of locale-dependent services is the Collator, which @@ -115,6 +114,10 @@ #' for the description on how to tune its settings, and its \code{locale} #' argument in particular. #' +#' When choosing a resource bundle that is not available in the requested +#' locale nor in its more general variants (e.g., `es_ES` vs `es`), +#' a warning is emitted. +#' #' Other locale-sensitive functions include, e.g., #' \code{\link{stri_trans_tolower}} (that does character case mapping). 
#' diff --git a/man/about_locale.Rd b/man/about_locale.Rd index 96360dd9..ff9bffed 100644 --- a/man/about_locale.Rd +++ b/man/about_locale.Rd @@ -57,6 +57,8 @@ more information, refer to the ICU user guide. For a list of locales that are recognized by \pkg{ICU}, call \code{\link{stri_locale_list}}. + +Note that in \pkg{stringi}, 'C' is a synonym of `en_US_POSIX`. } \section{A Note on Default Locales}{ @@ -65,14 +67,10 @@ call \code{\link{stri_locale_list}}. Each locale-sensitive function in \pkg{stringi} selects the current default locale if an empty string or \code{NULL} is provided as its \code{locale} argument. Default locales are available -to all the functions: -they are initially set to be the system locale on that platform, -and may be changed with \code{\link{stri_locale_set}}, -for example, if automatic detection fails to recognize -your locale properly. - -It is suggested that your program should avoid changing -the default locale. +to all the functions; initially, the system locale on that platform is used, +but it may be changed by calling \code{\link{stri_locale_set}}. + +Your program should avoid changing the default locale. All locale-sensitive functions may request any desired locale per-call (by specifying the \code{locale} argument), i.e., without referencing to the default locale. @@ -89,6 +87,10 @@ ordering, sorting, and searching. See \code{\link{stri_opts_collator}} for the description on how to tune its settings, and its \code{locale} argument in particular. +When choosing a resource bundle that is not available in the requested +locale nor in its more general variants (e.g., `es_ES` vs `es`), +a warning is emitted. + Other locale-sensitive functions include, e.g., \code{\link{stri_trans_tolower}} (that does character case mapping). } diff --git a/src/stri_brkiter.cpp b/src/stri_brkiter.cpp index b6954f45..b2d7cd05 100644 --- a/src/stri_brkiter.cpp +++ b/src/stri_brkiter.cpp @@ -1,5 +1,5 @@ /* This file is part of the 'stringi' project. 
- * Copyright (c) 2013-2021, Marek Gagolewski + * Copyright (c) 2013-2023, Marek Gagolewski * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -65,7 +65,7 @@ void StriBrkIterOptions::setType(SEXP opts_brkiter, const char* _default) { SEXP names = Rf_getAttrib(opts_brkiter, R_NamesSymbol); if (names == R_NilValue || LENGTH(names) != narg) Rf_error(MSG__INCORRECT_BRKITER_OPTION_SPEC); // error() allowed here - // search for "locale" option + // search for "type" option for (R_len_t i=0; i + * Copyright (c) 2013-2023, Marek Gagolewski * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -62,7 +62,8 @@ class StriBrkIterOptions { private: - void setEmptyOpts() { + void setEmptyOpts() + { locale = NULL; type = UBRK_CHARACTER; skip_rules = NULL; @@ -136,6 +137,14 @@ class StriUBreakIterator : public StriBrkIterOptions { } } STRI__CHECKICUSTATUS_THROW(status, {/* do nothing special on err */}) + + // NOTE: this is too invasive, there are very few dedicated brkiters! + // if (status == U_USING_DEFAULT_WARNING && uiterator) { + // UErrorCode status2 = U_ZERO_ERROR; + // const char* valid_locale = ubrk_getLocaleByType(uiterator, ULOC_VALID_LOCALE, &status2); + // if (valid_locale && !strcmp(valid_locale, "root")) + // Rf_warning(ICUError::getICUerrorName(status)); + // } } @@ -238,8 +247,17 @@ class StriRuleBasedBreakIterator : public StriBrkIterOptions { default: throw StriException(MSG__INTERNAL_ERROR); } + } STRI__CHECKICUSTATUS_THROW(status, {/* do nothing special on err */}) + + // NOTE: this is too invasive, there are very few dedicated brkiters! 
+ // if (status == U_USING_DEFAULT_WARNING && rbiterator) { + // UErrorCode status2 = U_ZERO_ERROR; + // const char* valid_locale = rbiterator->getLocaleID(ULOC_VALID_LOCALE, status2); + // if (valid_locale && !strcmp(valid_locale, "root")) + // Rf_warning(ICUError::getICUerrorName(status)); + // } } bool ignoreBoundary(); diff --git a/src/stri_collator.cpp b/src/stri_collator.cpp index f3203e3d..28bc46e0 100644 --- a/src/stri_collator.cpp +++ b/src/stri_collator.cpp @@ -1,5 +1,5 @@ /* This file is part of the 'stringi' project. - * Copyright (c) 2013-2021, Marek Gagolewski + * Copyright (c) 2013-2023, Marek Gagolewski * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -65,6 +65,9 @@ * * @version 1.1.6 (Marek Gagolewski, 2017-11-10) * PROTECT STRING_ELT(names, i) + * + * @version 1.8.1 (Marek Gagolewski, 2023-11-07) + * #476: Warn when falling back to the root locale, make C==en_US_POSIX */ UCollator* stri__ucol_open(SEXP opts_collator) { @@ -73,10 +76,22 @@ UCollator* stri__ucol_open(SEXP opts_collator) R_len_t narg = Rf_isNull(opts_collator)?0:LENGTH(opts_collator); + const char* default_locale = stri__prepare_arg_locale(R_NilValue, "locale", true); + if (narg <= 0) { // no custom settings - use default Collator UErrorCode status = U_ZERO_ERROR; - UCollator* col = ucol_open(uloc_getDefault(), &status); + UCollator* col = ucol_open(default_locale, &status); STRI__CHECKICUSTATUS_RFERROR(status, {/* do nothing special on err */}) // error() allowed here + + if (status == U_USING_DEFAULT_WARNING) { + UErrorCode status2 = U_ZERO_ERROR; + const char* valid_locale = ucol_getLocaleByType(col, ULOC_VALID_LOCALE, &status2); + if (valid_locale && !strcmp(valid_locale, "root")) + Rf_warning(ICUError::getICUerrorName(status)); + } + // else if (status == U_USING_FALLBACK_WARNING) // warning on this would be too invasive + // Rf_warning(ICUError::getICUerrorName(status)); + return col; } @@ -94,7 +109,7 @@ UCollator* 
stri__ucol_open(SEXP opts_collator) UColAttributeValue opt_STRENGTH = UCOL_DEFAULT_STRENGTH; UColAttributeValue opt_NUMERIC_COLLATION = UCOL_DEFAULT; // USearchAttributeValue opt_OVERLAP = USEARCH_OFF; - const char* opt_LOCALE = uloc_getDefault(); + const char* opt_LOCALE = default_locale; for (R_len_t i=0; i + * Copyright (c) 2013-2023, Marek Gagolewski * All rights reserved. * * Redistribution and use in source and binary forms, with or without diff --git a/src/stri_exception.cpp b/src/stri_exception.cpp index 5f0ddba6..5c813f7f 100644 --- a/src/stri_exception.cpp +++ b/src/stri_exception.cpp @@ -54,9 +54,9 @@ const char* ICUError::getICUerrorName(UErrorCode status) { switch(status) { case U_USING_FALLBACK_WARNING: - return "A resource bundle lookup returned a fallback result. (not an error)"; + return "A resource bundle lookup returned a result from a fallback (more general) locale."; // (not an error) case U_USING_DEFAULT_WARNING: - return "A resource bundle lookup returned a result from the root locale. (not an error)"; + return "A resource bundle lookup returned a result either from the root or the default locale."; // (not an error) case U_SAFECLONE_ALLOCATED_WARNING: return "A SafeClone operation required allocating memory. (informational only)"; case U_STATE_OLD_WARNING: diff --git a/src/stri_external.h b/src/stri_external.h index 0d0e2595..30e7d286 100644 --- a/src/stri_external.h +++ b/src/stri_external.h @@ -1,5 +1,5 @@ /* This file is part of the 'stringi' project. - * Copyright (c) 2013-2022, Marek Gagolewski + * Copyright (c) 2013-2023, Marek Gagolewski * All rights reserved. 
* * Redistribution and use in source and binary forms, with or without @@ -60,6 +60,7 @@ #include #include #include +#include using namespace icu; // #define USE_RINTERNALS removed 2021-08-12 - do not use anymore diff --git a/src/stri_prepare_arg.cpp b/src/stri_prepare_arg.cpp index 2a372666..882b0b8f 100644 --- a/src/stri_prepare_arg.cpp +++ b/src/stri_prepare_arg.cpp @@ -1455,6 +1455,9 @@ const char* stri__prepare_arg_string_1_notNA(SEXP x, const char* argname) * * @version 1.5.4 (Marek Gagolewski, 2021-04-07) * BUGFIX: locale='' is the default + * + * @version 1.8.1 (Marek Gagolewski, 2023-11-07) + * C is an alias of en_US_POSIX */ const char* stri__prepare_arg_locale( SEXP loc, @@ -1462,8 +1465,12 @@ const char* stri__prepare_arg_locale( bool allowdefault, bool allowna ) { + const char* default_locale = uloc_getDefault(); + if (!strcmp(default_locale, "C") || !strcmp(default_locale, "c")) + default_locale = "en_US_POSIX"; + if (allowdefault && Rf_isNull(loc)) - return uloc_getDefault(); + return default_locale; else { PROTECT(loc = stri__prepare_arg_string_1(loc, argname)); if (STRING_ELT(loc, 0) == NA_STRING) { @@ -1471,11 +1478,15 @@ const char* stri__prepare_arg_locale( if (allowna) return NULL; else Rf_error(MSG__ARG_EXPECTED_NOT_NA, argname); // Rf_error allowed here } - if (strlen((const char*)CHAR(STRING_ELT(loc, 0))) == 0) { + else if (strlen((const char*)CHAR(STRING_ELT(loc, 0))) == 0) { UNPROTECT(1); - if (allowdefault) return uloc_getDefault(); + if (allowdefault) return default_locale; else Rf_error(MSG__LOCALE_INCORRECT_ID); // Rf_error allowed here } + else if (!strcmp(CHAR(STRING_ELT(loc, 0)), "C") || !strcmp(CHAR(STRING_ELT(loc, 0)), "c")) { + UNPROTECT(1); + return "en_US_POSIX"; + } UErrorCode err = U_ZERO_ERROR; char buf[ULOC_FULLNAME_CAPACITY]; @@ -1498,7 +1509,7 @@ const char* stri__prepare_arg_locale( if (ret_n == 0) { UNPROTECT(1); - if (allowdefault) return uloc_getDefault(); + if (allowdefault) return default_locale; else 
Rf_error(MSG__LOCALE_INCORRECT_ID); // Rf_error allowed here } @@ -1508,7 +1519,7 @@ const char* stri__prepare_arg_locale( UNPROTECT(1); Rf_error(MSG__LOCALE_INCORRECT_ID); } - const char* ret_default = uloc_getDefault(); + const char* ret_default = default_locale; R_len_t ret_detault_n = strlen(ret_default); const char* ret_tmp2 = ret; ret = R_alloc(ret_detault_n+ret_n+1, (int)sizeof(char)); @@ -1520,7 +1531,7 @@ const char* stri__prepare_arg_locale( return ret; } - // won't come here anyway + // won't arrive here anyway return NULL; // avoid compiler warning } @@ -1580,7 +1591,7 @@ TimeZone* stri__prepare_arg_timezone(SEXP tz, const char* argname, bool allowdef return ret; } - // won't come here anyway + // won't arrive here anyway return NULL; // avoid compiler warning } diff --git a/src/stri_stringi.h b/src/stri_stringi.h index 2a0068c0..4aae6b51 100644 --- a/src/stri_stringi.h +++ b/src/stri_stringi.h @@ -120,6 +120,7 @@ void stri__locate_set_dimnames_matrix( // date/time void stri__set_class_POSIXct(SEXP x); +Calendar* stri__get_calendar(const char* locale_val); // ------------------------------------------------------------------------ diff --git a/src/stri_time_calendar.cpp b/src/stri_time_calendar.cpp index 6bb7fdfa..d3d667ec 100644 --- a/src/stri_time_calendar.cpp +++ b/src/stri_time_calendar.cpp @@ -1,5 +1,5 @@ /* This file is part of the 'stringi' project. - * Copyright (c) 2013-2021, Marek Gagolewski + * Copyright (c) 2013-2023, Marek Gagolewski * All rights reserved. 
* * Redistribution and use in source and binary forms, with or without @@ -72,6 +72,31 @@ SEXP stri_datetime_now() } +/** Get calendar + * + * @return Calendar + * + * @version 1.8.1 (Marek Gagolewski, 2023-11-07) + */ +Calendar* stri__get_calendar(const char* locale_val) +{ + UErrorCode status = U_ZERO_ERROR; + Calendar* cal = Calendar::createInstance(locale_val, status); + STRI__CHECKICUSTATUS_THROW(status, {/* do nothing special on err */}) + + // NOTE: unfortunately, in ICU 74.1 U_USING_DEFAULT_WARNING is never emitted + // if (status == U_USING_DEFAULT_WARNING && cal) { + // // UErrorCode status2 = U_ZERO_ERROR; + // // const char* valid_locale = cal->getLocaleID(ULOC_VALID_LOCALE, status2); + // // if (valid_locale && !strcmp(valid_locale, "root")) + // Rf_warning(ICUError::getICUerrorName(status)); + // } + + return cal; +} + + + /** Date-time arithmetic * * @param time @@ -83,9 +108,14 @@ SEXP stri_datetime_now() * @return POSIXct * * @version 0.5-1 (Marek Gagolewski, 2014-12-30) + * * @version 0.5-1 (Marek Gagolewski, 2015-03-06) tz arg added + * + * @version 1.8.1 (Marek Gagolewski, 2023-11-07) + * #476: Warn when falling back to the root locale, make C==en_US_POSIX */ -SEXP stri_datetime_add(SEXP time, SEXP value, SEXP units, SEXP tz, SEXP locale) { +SEXP stri_datetime_add(SEXP time, SEXP value, SEXP units, SEXP tz, SEXP locale) +{ PROTECT(time = stri__prepare_arg_POSIXct(time, "time")); PROTECT(value = stri__prepare_arg_integer(value, "value")); if (!Rf_isNull(tz)) PROTECT(tz = stri__prepare_arg_string_1(tz, "tz")); @@ -136,13 +166,13 @@ SEXP stri_datetime_add(SEXP time, SEXP value, SEXP units, SEXP tz, SEXP locale) throw StriException(MSG__INCORRECT_MATCH_OPTION, "units"); } - UErrorCode status = U_ZERO_ERROR; - cal = Calendar::createInstance(locale_val, status); - STRI__CHECKICUSTATUS_THROW(status, {/* do nothing special on err */}) + + cal = stri__get_calendar(locale_val); cal->adoptTimeZone(tz_val); tz_val = NULL; /* The Calendar takes ownership of 
the TimeZone. */ + UErrorCode status = U_ZERO_ERROR; SEXP ret; STRI__PROTECT(ret = Rf_allocVector(REALSXP, vectorize_length)); double* ret_val = REAL(ret); @@ -199,9 +229,14 @@ SEXP stri_datetime_add(SEXP time, SEXP value, SEXP units, SEXP tz, SEXP locale) * @return list * * @version 0.5-1 (Marek Gagolewski, 2015-01-01) + * * @version 0.5-1 (Marek Gagolewski, 2015-03-03) tz arg added + * + * @version 1.8.1 (Marek Gagolewski, 2023-11-07) + * #476: Warn when falling back to the root locale, make C==en_US_POSIX */ -SEXP stri_datetime_fields(SEXP time, SEXP tz, SEXP locale) { +SEXP stri_datetime_fields(SEXP time, SEXP tz, SEXP locale) +{ PROTECT(time = stri__prepare_arg_POSIXct(time, "time")); const char* locale_val = stri__prepare_arg_locale(locale, "locale", true); if (!Rf_isNull(tz)) PROTECT(tz = stri__prepare_arg_string_1(tz, "tz")); @@ -213,13 +248,12 @@ SEXP stri_datetime_fields(SEXP time, SEXP tz, SEXP locale) { R_len_t vectorize_length = LENGTH(time); StriContainerDouble time_cont(time, vectorize_length); - UErrorCode status = U_ZERO_ERROR; - cal = Calendar::createInstance(locale_val, status); - STRI__CHECKICUSTATUS_THROW(status, {/* do nothing special on err */}) + cal = stri__get_calendar(locale_val); cal->adoptTimeZone(tz_val); tz_val = NULL; /* The Calendar takes ownership of the TimeZone. 
*/ + UErrorCode status = U_ZERO_ERROR; SEXP ret; #define STRI__FIELDS_NUM 14 STRI__PROTECT(ret = Rf_allocVector(VECSXP, STRI__FIELDS_NUM)); @@ -349,6 +383,9 @@ SEXP stri_datetime_fields(SEXP time, SEXP tz, SEXP locale) { * @version 0.5-1 (Marek Gagolewski, 2015-01-11) lenient arg added * @version 0.5-1 (Marek Gagolewski, 2015-03-02) tz arg added * @version 1.1.2 (Marek Gagolewski, 2016-09-30) round() is not C++98 + * + * @version 1.8.1 (Marek Gagolewski, 2023-11-07) + * #476: Warn when falling back to the root locale, make C==en_US_POSIX */ SEXP stri_datetime_create(SEXP year, SEXP month, SEXP day, SEXP hour, SEXP minute, SEXP second, SEXP lenient, SEXP tz, SEXP locale) @@ -378,15 +415,14 @@ SEXP stri_datetime_create(SEXP year, SEXP month, SEXP day, SEXP hour, StriContainerInteger minute_cont(minute, vectorize_length); StriContainerDouble second_cont(second, vectorize_length); - UErrorCode status = U_ZERO_ERROR; - cal = Calendar::createInstance(locale_val, status); - STRI__CHECKICUSTATUS_THROW(status, {/* do nothing special on err */}) + cal = stri__get_calendar(locale_val); cal->setLenient(lenient_val); cal->adoptTimeZone(tz_val); tz_val = NULL; /* The Calendar takes ownership of the TimeZone. 
*/ + UErrorCode status = U_ZERO_ERROR; SEXP ret; STRI__PROTECT(ret = Rf_allocVector(REALSXP, vectorize_length)); double* ret_val = REAL(ret); diff --git a/src/stri_time_format.cpp b/src/stri_time_format.cpp index 253e2f7d..8ebdaf52 100644 --- a/src/stri_time_format.cpp +++ b/src/stri_time_format.cpp @@ -149,7 +149,8 @@ DateFormat* stri__get_date_format( * @version 0.5-1 (Marek Gagolewski, 2015-02-22) use tz * @version 1.6.3 (Marek Gagolewski, 2021-05-24) #434: vectorise wrt format */ -SEXP stri_datetime_format(SEXP time, SEXP format, SEXP tz, SEXP locale) { +SEXP stri_datetime_format(SEXP time, SEXP format, SEXP tz, SEXP locale) +{ const char* locale_val = stri__prepare_arg_locale(locale, "locale", true); PROTECT(time = stri__prepare_arg_POSIXct(time, "time")); PROTECT(format = stri__prepare_arg_string(format, "format")); @@ -168,13 +169,12 @@ SEXP stri_datetime_format(SEXP time, SEXP format, SEXP tz, SEXP locale) { StriContainerDouble time_cont(time, vectorize_length); StriContainerUTF8 format_cont(format, vectorize_length); - UErrorCode status = U_ZERO_ERROR; - cal = Calendar::createInstance(locale_val, status); - STRI__CHECKICUSTATUS_THROW(status, {/* do nothing special on err */}) + cal = stri__get_calendar(locale_val); cal->adoptTimeZone(tz_val); tz_val = NULL; /* The Calendar takes ownership of the TimeZone. 
*/ + UErrorCode status = U_ZERO_ERROR; SEXP ret; STRI__PROTECT(ret = Rf_allocVector(STRSXP, vectorize_length)); @@ -265,7 +265,8 @@ SEXP stri_datetime_format(SEXP time, SEXP format, SEXP tz, SEXP locale) { * @version 1.6.3 (Marek Gagolewski, 2021-05-24) #434: vectorise wrt format * @version 1.6.3 (Marek Gagolewski, 2021-06-07) empty retval should have a class too */ -SEXP stri_datetime_parse(SEXP str, SEXP format, SEXP lenient, SEXP tz, SEXP locale) { +SEXP stri_datetime_parse(SEXP str, SEXP format, SEXP lenient, SEXP tz, SEXP locale) +{ const char* locale_val = stri__prepare_arg_locale(locale, "locale", true); PROTECT(str = stri__prepare_arg_string(str, "str")); PROTECT(format = stri__prepare_arg_string(format, "format")); @@ -291,15 +292,14 @@ SEXP stri_datetime_parse(SEXP str, SEXP format, SEXP lenient, SEXP tz, SEXP loca StriContainerUTF16 str_cont(str, vectorize_length); StriContainerUTF8 format_cont(format, vectorize_length); - UErrorCode status = U_ZERO_ERROR; - cal = Calendar::createInstance(locale_val, status); - STRI__CHECKICUSTATUS_THROW(status, {/* do nothing special on err */}) + cal = stri__get_calendar(locale_val); cal->adoptTimeZone(tz_val); tz_val = NULL; /* The Calendar takes ownership of the TimeZone. 
*/ cal->setLenient(lenient_val); + UErrorCode status = U_ZERO_ERROR; SEXP ret; STRI__PROTECT(ret = Rf_allocVector(REALSXP, vectorize_length)); diff --git a/src/stri_time_symbols.cpp b/src/stri_time_symbols.cpp index eb483eb4..5d2bdf02 100644 --- a/src/stri_time_symbols.cpp +++ b/src/stri_time_symbols.cpp @@ -50,7 +50,8 @@ * @version 0.5-1 (Marek Gagolewski, 2015-01-01) * use calendar keyword in locale */ -SEXP stri_datetime_symbols(SEXP locale, SEXP context, SEXP width) { +SEXP stri_datetime_symbols(SEXP locale, SEXP context, SEXP width) +{ const char* qloc = stri__prepare_arg_locale(locale, "locale", true); /* this is R_alloc'ed */ const char* context_str = stri__prepare_arg_string_1_notNA(context, "context"); @@ -67,7 +68,7 @@ SEXP stri_datetime_symbols(SEXP locale, SEXP context, SEXP width) { else Rf_error(MSG__INCORRECT_MATCH_OPTION, "context"); DateFormatSymbols::DtWidthType width_val; - if (width_cur == 0) width_val = DateFormatSymbols::ABBREVIATED; + if (width_cur == 0) width_val = DateFormatSymbols::ABBREVIATED; else if (width_cur == 1) width_val = DateFormatSymbols::WIDE; else if (width_cur == 2) width_val = DateFormatSymbols::NARROW; else Rf_error(MSG__INCORRECT_MATCH_OPTION, "width"); @@ -87,6 +88,14 @@ SEXP stri_datetime_symbols(SEXP locale, SEXP context, SEXP width) { sym = DateFormatSymbols(loc, calendar_type.data(), status); STRI__CHECKICUSTATUS_RFERROR(status, {/* do nothing special on err */}) + if (status == U_USING_DEFAULT_WARNING) { + //UErrorCode status2 = U_ZERO_ERROR; + //const char* valid_locale = sym.getLocale(ULOC_VALID_LOCALE, status2).getBaseName(); + // NOTE! It does not fall back to the "root" locale! 
+ //if (valid_locale && !strcmp(valid_locale, "root")) + Rf_warning(ICUError::getICUerrorName(status)); + } + const R_len_t infosize = 5; SEXP vals; R_len_t j = -1; diff --git a/src/stri_time_zone.cpp b/src/stri_time_zone.cpp index 86a1ed52..3e7c567a 100644 --- a/src/stri_time_zone.cpp +++ b/src/stri_time_zone.cpp @@ -157,7 +157,7 @@ SEXP stri_timezone_set(SEXP tz) { } -/** Get localized time zone info +/** Get localised time zone info * * @param tz single string or NULL * @param locale single string or NULL @@ -169,7 +169,8 @@ SEXP stri_timezone_set(SEXP tz) { * @version 0.5-1 (Marek Gagolewski, 2015-03-01) * new out: WindowsID, NameDaylight, new in: display_type */ -SEXP stri_timezone_info(SEXP tz, SEXP locale, SEXP display_type) { +SEXP stri_timezone_info(SEXP tz, SEXP locale, SEXP display_type) +{ TimeZone* curtz = stri__prepare_arg_timezone(tz, "tz", R_NilValue); const char* qloc = stri__prepare_arg_locale(locale, "locale", true); /* this is R_alloc'ed */ const char* dtype_str = stri__prepare_arg_string_1_notNA(display_type, "display_type"); /* this is R_alloc'ed */ @@ -230,6 +231,11 @@ SEXP stri_timezone_info(SEXP tz, SEXP locale, SEXP display_type) { curtz->getDisplayName(false, dtype, Locale::createFromName(qloc), val_name); SET_VECTOR_ELT(vals, curidx, stri__make_character_vector_UnicodeString_ptr(1, &val_name)); + // TODO: If the display name is not available for the locale, + // then getDisplayName returns a string in the localised GMT offset format + // such as GMT[+-]HH:mm. 
-- we can't check (and warn) whether the locale is valid + // other than by comparing the output to this pattern + ++curidx; if ((bool)curtz->useDaylightTime()) { UnicodeString val_name2; diff --git a/src/stri_trans_casemap.cpp b/src/stri_trans_casemap.cpp index bee5a564..b51235a3 100644 --- a/src/stri_trans_casemap.cpp +++ b/src/stri_trans_casemap.cpp @@ -188,7 +188,7 @@ SEXP stri_trans_casemap(SEXP str, int _type, SEXP locale) const char* qloc = stri__prepare_arg_locale(locale, "locale", true); /* this is R_alloc'ed */ PROTECT(str = stri__prepare_arg_string(str, "str")); // prepare string argument -// version 0.2-1 - Does not work with ICU 4.8 (but we require ICU >= 50) + // version 0.2-1 - Does not work with ICU 4.8 (but we require ICU >= 50) UCaseMap* ucasemap = NULL; STRI__ERROR_HANDLER_BEGIN(1) @@ -196,6 +196,9 @@ SEXP stri_trans_casemap(SEXP str, int _type, SEXP locale) ucasemap = ucasemap_open(qloc, U_FOLD_CASE_DEFAULT, &status); STRI__CHECKICUSTATUS_THROW(status, {/* do nothing special on err */}) + // NOTE: we can't check if the submitted locale is valid, + // because there is no API for it [ULOC_VALID_LOCALE] + R_len_t str_n = LENGTH(str); StriContainerUTF8 str_cont(str, str_n); SEXP ret; @@ -290,7 +293,8 @@ SEXP stri_trans_casemap(SEXP str, int _type, SEXP locale) * @version 0.6-1 (Marek Gagolewski, 2015-07-11) * call stri_trans_casemap */ -SEXP stri_trans_tolower(SEXP str, SEXP locale) { +SEXP stri_trans_tolower(SEXP str, SEXP locale) +{ return stri_trans_casemap(str, STRI_CASEMAP_TOLOWER, locale); } @@ -307,7 +311,8 @@ SEXP stri_trans_tolower(SEXP str, SEXP locale) { * @version 0.6-1 (Marek Gagolewski, 2015-07-11) * call stri_trans_casemap */ -SEXP stri_trans_toupper(SEXP str, SEXP locale) { +SEXP stri_trans_toupper(SEXP str, SEXP locale) +{ return stri_trans_casemap(str, STRI_CASEMAP_TOUPPER, locale); } diff --git a/src/stri_uloc.cpp b/src/stri_uloc.cpp index f3168873..8521391b 100644 --- a/src/stri_uloc.cpp +++ b/src/stri_uloc.cpp @@ -95,7 +95,7
@@ SEXP stri_locale_info(SEXP loc) SET_VECTOR_ELT(vals, i, Rf_ScalarString(NA_STRING)); UErrorCode err = U_ZERO_ERROR; - char buf[ULOC_FULLNAME_CAPACITY]; // this is sufficient + char buf[ULOC_FULLNAME_CAPACITY]; // this is sufficient uloc_getLanguage(qloc, buf, ULOC_FULLNAME_CAPACITY, &err); if (U_FAILURE(err)) err = U_ZERO_ERROR; diff --git a/src/stri_wrap.cpp b/src/stri_wrap.cpp index 61ac89e5..5dfc40cd 100644 --- a/src/stri_wrap.cpp +++ b/src/stri_wrap.cpp @@ -274,6 +274,14 @@ SEXP stri_wrap(SEXP str, SEXP width, SEXP cost_exponent, briter = BreakIterator::createLineInstance(loc, status); STRI__CHECKICUSTATUS_THROW(status, {/* do nothing special on err */}) + // NOTE: this is too invasive, there are very few dedicated brkiters! + // if (status == U_USING_DEFAULT_WARNING) { + // UErrorCode status2 = U_ZERO_ERROR; + // const char* valid_locale = briter->getLocaleID(ULOC_VALID_LOCALE, status2); + // if (valid_locale && !strcmp(valid_locale, "root")) + // Rf_warning(ICUError::getICUerrorName(status)); + // } + R_len_t str_length = LENGTH(str); StriContainerUTF8_indexable str_cont(str, str_length); StriContainerUTF8 prefix_cont(prefix, 1);