
Support multi-char case conversion in capitalize function #8647

Merged
1 commit merged into rapidsai:branch-21.08 on Jul 6, 2021

Conversation

davidwendt
Contributor

Closes #8644

This change reuses the multi-character case-conversion support previously added to the strings to_upper and to_lower functions in the capitalize and title functions. For example, converting the single character ʼn to upper case actually produces two distinct characters, ʼN (U+02BC MODIFIER LETTER APOSTROPHE followed by a capital N). This differs from converting a single multi-byte character into another single multi-byte character with a different byte length: here one character is converted into two characters.
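A minimal CPU sketch of the behavior described above, written in Python for brevity. It is not the libcudf or cudf API: capitalize_sketch and utf8_size are hypothetical names, the sketch assumes Python-style capitalize semantics (title-case the first character, lower-case the rest), and it relies on Python's built-in str methods, which apply Unicode's SpecialCasing mappings. It illustrates why the output size cannot simply be copied from the input: capitalizing ʼn yields two characters and one extra UTF-8 byte.

```python
# Illustrative CPU sketch only; not the cudf/libcudf API.
# Python's str methods apply Unicode's SpecialCasing.txt mappings, so one
# character may map to several characters. Context-sensitive rules (such as
# Greek final sigma) are ignored because each character is mapped on its own.


def capitalize_sketch(s: str) -> str:
    """Hypothetical capitalize: title-case the first character and
    lower-case the rest, mapping one character at a time."""
    pieces = []
    for i, ch in enumerate(s):
        # A mapped piece can hold more than one character,
        # e.g. "\u0149".title() == "\u02bcN" (two characters).
        pieces.append(ch.title() if i == 0 else ch.lower())
    return "".join(pieces)


def utf8_size(s: str) -> int:
    """Number of UTF-8 bytes needed to store s (what a size pass would count)."""
    return len(s.encode("utf-8"))


if __name__ == "__main__":
    src = "\u0149"  # one character, 2 UTF-8 bytes
    out = capitalize_sketch(src)
    print(len(src), utf8_size(src), "->", len(out), utf8_size(out))  # 1 2 -> 2 3

    # One-to-one mapping whose byte length changes: U+0250 (2 bytes) -> U+2C6F (3 bytes).
    turned_a = "\u0250"
    print(utf8_size(turned_a), "->", utf8_size(turned_a.upper()))  # 2 -> 3
```

The second print shows the contrasting one-to-one case: U+0250 upper-cases to the single character U+2C6F, which still needs one more UTF-8 byte, whereas U+0149 becomes two separate characters.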

davidwendt added the bug, 3 - Ready for Review, libcudf, strings, and non-breaking labels on Jul 2, 2021
davidwendt self-assigned this on Jul 2, 2021
davidwendt requested a review from a team as a code owner on July 2, 2021 16:41
codecov bot commented Jul 2, 2021

Codecov Report

Merging #8647 (f609933) into branch-21.08 (fba09e6) will increase coverage by 0.01%.
The diff coverage is n/a.

❗ Current head f609933 differs from pull request most recent head 4134489. Consider uploading reports for the commit 4134489 to get more accurate results

@@               Coverage Diff                @@
##           branch-21.08    #8647      +/-   ##
================================================
+ Coverage         10.60%   10.61%   +0.01%     
================================================
  Files               109      109              
  Lines             18280    18645     +365     
================================================
+ Hits               1938     1980      +42     
- Misses            16342    16665     +323     
| Impacted Files | Coverage Δ |
|---|---|
| python/cudf/cudf/io/hdf.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/io/orc.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/_version.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/core/abc.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/api/types.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/io/dlpack.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/core/frame.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/core/index.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/io/feather.py | 0.00% <0.00%> (ø) |
| python/cudf/cudf/io/parquet.py | 0.00% <0.00%> (ø) |
| ... and 44 more | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fba09e6...4134489.

@davidwendt
Contributor Author

@firestarman will you be able to check if this fixes #8644 for you?

@firestarman
Contributor

firestarman commented Jul 5, 2021

@firestarman will you be able to check if this fixes #8644 for you?

Thanks @davidwendt, it fixes issue #8644.

harrism added the 5 - Ready to Merge label and removed the 3 - Ready for Review label on Jul 6, 2021
@harrism
Member

harrism commented Jul 6, 2021

@gpucibot merge

rapids-bot merged commit 3ee264c into rapidsai:branch-21.08 on Jul 6, 2021
davidwendt deleted the multi-char-capitalize branch on July 6, 2021 12:34
Labels
5 - Ready to Merge: Testing and reviews complete, ready to merge
bug: Something isn't working
libcudf: Affects libcudf (C++/CUDA) code.
non-breaking: Non-breaking change
strings: strings issues (C++ and Python)
Development

Successfully merging this pull request may close these issues.

[BUG] capitalize does not work correctly for the character ʼn .
4 participants