Add first-class dtype utilities #8308
I don't think there's a way to do this without doing a pass over the entire data, which I'd be -1 for. Sadly, I think the best thing for us to do is return |
👍 on For RE: the string dtype, I agree with @shwina that scanning should be avoided. Maybe since it lives on the host checking the 0'th element wouldn't be terrible? Although I guess that doesn't guarantee they're all strings. I wonder what the plan is for that on the pandas roadmap, if there are any plans? Since they have a true |
Yeah I think we'd have to check the entire Series to be sure that it's all strings, not like some strings and some other stuff. That being the case, we may be stuck with letting that fail downstream for now. Note to self, this PR will need to account for changes in #8332 that start introducing the |
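To make the trade-off in the comments above concrete, here is a minimal sketch in plain Python. A list stands in for a host-side object column, and both helper names are hypothetical (they are not part of cudf or pandas). It only illustrates why inspecting the 0th element is cheap but not a guarantee, while an exact answer needs a full pass.

```python
def first_element_is_str(values):
    """Cheap O(1) heuristic: inspect only the 0th element."""
    return len(values) > 0 and isinstance(values[0], str)


def all_elements_are_str(values):
    """Exact O(n) check: requires a pass over the entire data."""
    return all(isinstance(v, str) for v in values)


# Hypothetical sample data standing in for object-dtype columns.
strings_only = ["a", "b", "c"]
mixed = ["a", 1, None]

# Both checks agree when every element is a string.
print(first_element_is_str(strings_only), all_elements_are_str(strings_only))
# The heuristic is fooled by a column that merely starts with a string.
print(first_element_is_str(mixed), all_elements_are_str(mixed))
```

The second pair of results differs, which is the case where a first-element check would let a mixed column through and the failure would only surface downstream.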
…ere the expected behavior is unclear.
Codecov Report
@@ Coverage Diff @@
## branch-21.08 #8308 +/- ##
===============================================
Coverage ? 82.59%
===============================================
Files ? 109
Lines ? 17865
Branches ? 0
===============================================
Hits ? 14755
Misses ? 3110
Partials ? 0
Looks really good to me. Fantastic work, @vyasr!
@gpucibot merge
rerun tests
rerun tests
Continuation of #8308 that moves all imports of standard dtype utilities to use `cudf.api.types` or `cudf.core.dtypes` rather than `cudf.utils.dtypes`.
Authors:
- Vyas Ramasubramani (https://github.com/vyasr)
Approvers:
- Ashwin Srinath (https://github.com/shwina)
- GALI PREM SAGAR (https://github.com/galipremsagar)
URL: #9011
This PR adds a new `cudf.api.types` module that aims to match `pandas.api.types` while providing the compatibility layers for `cudf` objects that are missing from the corresponding `pandas` APIs. It also replaces most internal uses of `pandas.api.types` in an attempt to centralize all typing logic, so that we have a single place in which to perform any special dtype handling as needed.

This work is intended as a best-effort, first-pass attempt to isolate our dependence on pandas dtype APIs. While it resolves a number of incompatibilities with pandas and other unexpected behaviors, a number of open questions still need to be addressed to completely wrap this up. I've noted a number of TODOs in the code (for instance relating to our nested types or the different kinds of time types, e.g. timedelta). Since getting all of these correct in a single PR would be almost impossible given the pervasive use of dtype utilities throughout our code, this PR is a good first step in that direction that I think we can merge, and then work on incrementally fixing the outstanding issues rather than trying to perfect all our type handling here.
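As a rough illustration of the centralization idea described above, the sketch below wraps one `pandas.api.types` predicate behind a single function that first handles a library-specific dtype. `HypotheticalListDtype`, `is_list_dtype`, and this `is_numeric_dtype` wrapper are assumptions made for the example; they are not the actual `cudf.api.types` implementation.

```python
import numpy as np
import pandas as pd
from pandas.api import types as pd_types


class HypotheticalListDtype:
    """Stand-in for a library-specific nested dtype (not a real cudf class)."""

    def __init__(self, element_type):
        self.element_type = element_type


def is_list_dtype(obj):
    """Return True if obj is, or carries, the hypothetical nested dtype."""
    dtype = getattr(obj, "dtype", obj)
    return isinstance(dtype, HypotheticalListDtype)


def is_numeric_dtype(obj):
    """Single entry point: special-case our own dtypes, then defer to pandas."""
    if is_list_dtype(obj):
        # A nested type is not numeric, whatever its element type is.
        return False
    return pd_types.is_numeric_dtype(obj)


print(is_numeric_dtype(np.dtype("int64")))
print(is_numeric_dtype(pd.Series([1.0, 2.0])))
print(is_numeric_dtype(HypotheticalListDtype(np.dtype("int64"))))
```

Callers then import the predicate from one module only, so any later change in how a special dtype is treated happens in a single place.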