-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add to_lowercase
and to_uppercase
to PandasExpr.str
namespace
#455
feat: add to_lowercase
and to_uppercase
to PandasExpr.str
namespace
#455
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to_lowercase
and to_uppercase
to Expr.str
namespace to_lowercase
and to_uppercase
to PandasExpr.str
namespace
Will fix the doctests ! π« |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing! Well done @lucianosrp for figuring it all out!
I just have some really minor comments
I think we can probably also support this for pyarrow.Table, right? Fancy changing constructor
to constructor_with_pyarrow
in the test, and implementing this in narwhals/_arrow/series.py
and narwhals/_arrow/expr.py
? It should be quite similar to what you've already done
Again, seriously good work here, great to have you on board π
Co-authored-by: Marco Edward Gorelli <[email protected]>
Co-authored-by: Marco Edward Gorelli <[email protected]>
Ah, right! I probably have missed this..I will add that too |
Notes: | ||
The PyArrow backend will convert 'Γ' to 'αΊ' instead of 'SS'. | ||
For more info see [the related issue](https://github.com/apache/arrow/issues/34599). | ||
There may be other unicode-edge-case-related variations across implementations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how much we want to expand on this topic, but I would consider adding some additional information, such as mentioning that pyarrow is based on utf8proc for sake of completeness.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, happy to take this as a follow-up! π for now I'll merge as-is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
amazing, thanks @lucianosrp !
we're really keen to expand pyarrow functionality so any PRs in that direction are extremely welcome!
My pleasure, @MarcoGorelli This is an awesome tool! π₯ I am happy to help expanding it! |
What type of PR is this? (check all applicable)
Related issues
Checklist
If you have comments or can explain your changes, please do so below.
This feature allows to convert strings to upper and lower case variants, similar to what is already possible with polars'
Expr.str.to_lowercase()
and pandas'Series.str.lower()
Unicode caveat
There is a caveat for PyArrow's backend that converts the character 'Γ' to uppercase 'αΊ' instead of 'SS'. There are probably few others Unicode edgecases but I think the vast majority of users won't be affected. This has been documented in the
to_uppercase
method and linked the issue: apache/arrow#34599This was my first contribution here, feedback is always welcomed π