De-DOS line-endings (rapidsai#14880)
These are the only two files in the repo (other than the Sphinx make.bat files, which should keep DOS line-endings) that use `\r\n` line-endings. Let's fix that.

Authors:
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Nghia Truong (https://github.com/ttnghia)

URL: rapidsai#14880
wence- authored Jan 25, 2024
1 parent 7535cab commit 35011dd
Showing 2 changed files with 315 additions and 315 deletions.
46 changes: 23 additions & 23 deletions cpp/doxygen/unicode.md
@@ -1,23 +1,23 @@
# Unicode Limitations

The strings column currently supports only UTF-8 characters internally.
For functions that require character testing (e.g. cudf::strings::all_characters_of_type()) or
case conversion (e.g. cudf::strings::capitalize(), etc.), only the 16-bit [Unicode 13.0](http://www.unicode.org/versions/Unicode13.0.0)
character code points (0-65535) are supported.
Case conversion and character testing are not supported for characters above code point 65535.
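To make the 16-bit limit concrete, here is a minimal standalone sketch (not cudf code) that decodes a UTF-8 string into code points and reports whether every code point falls in 0-65535. The helpers `decode_utf8` and `within_16bit_range` are hypothetical names used only for illustration, and the decoder assumes well-formed UTF-8.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: decode UTF-8 into Unicode code points.
// Assumes well-formed input; only lead bytes and truncation are checked.
std::vector<char32_t> decode_utf8(std::string const& s)
{
  std::vector<char32_t> out;
  std::size_t i = 0;
  while (i < s.size()) {
    auto const b    = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;
    char32_t cp     = 0;
    if (b < 0x80) { cp = b; len = 1; }                      // 1-byte (ASCII)
    else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }  // 2-byte sequence
    else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }  // 3-byte sequence
    else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; len = 4; }  // 4-byte: always > 65535
    else { throw std::invalid_argument("invalid UTF-8 lead byte"); }
    if (i + len > s.size()) { throw std::invalid_argument("truncated UTF-8 sequence"); }
    // Append the 6 payload bits of each continuation byte.
    for (std::size_t k = 1; k < len; ++k) {
      cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
    }
    out.push_back(cp);
    i += len;
  }
  return out;
}

// Hypothetical helper: true if every code point is within 0-65535, the range
// this document says case conversion and character testing support.
bool within_16bit_range(std::string const& s)
{
  for (char32_t cp : decode_utf8(s)) {
    if (cp > 0xFFFF) { return false; }
  }
  return true;
}
```

For example, "ß" (U+00DF) passes the check, while an emoji such as U+1F600 (4 bytes in UTF-8) does not, so it falls outside what the case and type APIs handle.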

Context-sensitive case conversions are not supported. Also, case conversions that produce multiple characters are not reversible:
adjacent individual characters are never case-converted back into a single character.
For example, converting the character ß to upper case produces the two characters "SS", but converting "SS" to lower case produces "ss", not "ß".
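The one-way expansion can be sketched with a deliberately tiny, hypothetical pair of case mappings (ASCII letters plus U+00DF only). `toy_to_upper` and `toy_to_lower` are illustrative names, not cudf APIs, and do not reflect cudf's actual tables; they only show why upper-casing then lower-casing does not round-trip.

```cpp
#include <cstddef>
#include <string>

// Hypothetical upper-case mapping: ASCII a-z plus ß -> "SS" (one char becomes two).
std::string toy_to_upper(std::string const& s)
{
  std::string out;
  for (std::size_t i = 0; i < s.size(); ++i) {
    // U+00DF (ß) is encoded in UTF-8 as the two bytes 0xC3 0x9F.
    if (i + 1 < s.size() && s[i] == '\xC3' && s[i + 1] == '\x9F') {
      out += "SS";
      ++i;  // skip the second byte of ß
    } else if (s[i] >= 'a' && s[i] <= 'z') {
      out += static_cast<char>(s[i] - 'a' + 'A');
    } else {
      out += s[i];
    }
  }
  return out;
}

// Hypothetical lower-case mapping: each character is converted independently,
// so the adjacent pair "SS" is never merged back into ß.
std::string toy_to_lower(std::string const& s)
{
  std::string out;
  for (char c : s) {
    out += (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
  }
  return out;
}
```

Applying `toy_to_upper` to "ß" yields "SS", and applying `toy_to_lower` to that result yields "ss" rather than the original "ß".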

Strings case and type APIs affected by these limitations:

- cudf::strings::all_characters_of_type()
- cudf::strings::to_upper()
- cudf::strings::to_lower()
- cudf::strings::capitalize()
- cudf::strings::title()
- cudf::strings::swapcase()

Also, regex patterns that use the shorthand character classes `\d \D \w \W \s \S` include only the appropriate characters with
code points in the range 0-65535.