feat: add to_lowercase and to_uppercase to PandasExpr.str namespace #455

Merged
merged 11 commits into from
Jul 9, 2024

Conversation

lucianosrp
Member

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

  • Related issue #
  • Closes #

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below.

This feature allows converting strings to their upper- and lowercase variants, similar to what is already possible with polars' Expr.str.to_lowercase() and pandas' Series.str.lower().

Unicode caveat

There is a caveat with the PyArrow backend: it uppercases the character 'ß' to 'ẞ' instead of 'SS'. There are probably a few other Unicode edge cases, but I think the vast majority of users won't be affected. I have documented this in the to_uppercase method and linked the related issue: apache/arrow#34599
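To make the difference concrete, here is a minimal sketch using only Python's built-in string methods. It does not call PyArrow: `one_to_one_upper` is a hypothetical helper that merely imitates the one-character-in, one-character-out behaviour described above, and `full_upper` stands in for backends that follow Python's own `str.upper()` mapping (an assumption for illustration).

```python
CAPITAL_SHARP_S = "\u1e9e"  # 'ẞ', LATIN CAPITAL LETTER SHARP S


def full_upper(text: str) -> str:
    """Uppercase with Python's built-in mapping, which may change the length."""
    return text.upper()


def one_to_one_upper(text: str) -> str:
    """Hypothetical emulation of a per-character uppercase (not PyArrow code).

    'ß' becomes 'ẞ'; any other character whose uppercase form would expand
    to several characters is left unchanged so the length is preserved.
    """
    out = []
    for ch in text:
        if ch == "\u00df":  # 'ß'
            out.append(CAPITAL_SHARP_S)
        else:
            up = ch.upper()
            out.append(up if len(up) == 1 else ch)
    return "".join(out)


word = "straße"
print(full_upper(word))        # STRASSE
print(one_to_one_upper(word))  # STRAẞE
```

The practical consequence is that the two results differ in both content and length, so tests comparing backends on such inputs need backend-specific expectations.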

This was my first contribution here, feedback is always welcomed 🙏

@github-actions github-actions bot added the enhancement New feature or request label Jul 9, 2024
Member

@FBruzzesi FBruzzesi left a comment


Thanks for the PR! This is neat 🙌🏼


@FBruzzesi FBruzzesi changed the title feat: add to_lowercase and to_uppercase to Expr.str namespace feat: add to_lowercase and to_uppercase to PandasExpr.str namespace Jul 9, 2024
@lucianosrp
Member Author

Will fix the doctests! 🫠

Member

@MarcoGorelli MarcoGorelli left a comment


Amazing! Well done @lucianosrp for figuring it all out!

I just have some really minor comments

I think we can probably also support this for pyarrow.Table, right? Fancy changing constructor to constructor_with_pyarrow in the test, and implementing this in narwhals/_arrow/series.py and narwhals/_arrow/expr.py? It should be quite similar to what you've already done

Again, seriously good work here, great to have you on board 🙌

docs/api-reference/expressions_str.md Outdated
narwhals/expression.py Outdated
narwhals/series.py Outdated
tests/expr/str/to_uppercase_to_lowercase_test.py Outdated
lucianosrp and others added 2 commits July 9, 2024 10:49
Co-authored-by: Marco Edward Gorelli <[email protected]>
@lucianosrp
Member Author

@MarcoGorelli

I think we can probably also support this for pyarrow.Table, right? Fancy changing constructor to constructor_with_pyarrow in the test, and implementing this in narwhals/_arrow/series.py and narwhals/_arrow/expr.py? It should be quite similar to what you've already done

Ah, right! I probably missed this. I will add that too.

Comment on lines +2088 to +2091
Notes:
The PyArrow backend will convert 'ß' to 'ẞ' instead of 'SS'.
For more info see [the related issue](https://github.com/apache/arrow/issues/34599).
There may be other unicode-edge-case-related variations across implementations.
Member


Not sure how much we want to expand on this topic, but I would consider adding some additional information, such as mentioning that pyarrow is based on utf8proc, for the sake of completeness.

WDYT?

Member


sure, happy to take this as a follow-up! 👍 for now I'll merge as-is

Member

@MarcoGorelli MarcoGorelli left a comment


amazing, thanks @lucianosrp !

we're really keen to expand pyarrow functionality so any PRs in that direction are extremely welcome!

@MarcoGorelli MarcoGorelli merged commit 86fc814 into narwhals-dev:main Jul 9, 2024
17 checks passed
@lucianosrp
Member Author

My pleasure, @MarcoGorelli
Thanks also for the feedback @FBruzzesi !

This is an awesome tool! 🔥 I am happy to help expand it!

Labels
enhancement New feature or request
3 participants