-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Interchange protocol fixes and updates #2150
Conversation
I didn't realise CI has 3.6 and 3.7 jobs. I see #1802, so assume these versions want to be generally maintained for? I have used f-stirngs in this PR, although note the df protocol implementation and tests already used them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really happy to see this PR, sorry for taking so long to review this.
I'll push two small commits, you only need to accept 1 suggestion (if you agree)
fe57acd
to
0049efc
Compare
Forgot to update some refs of Otherwise I think I've addressed your comments now :) As an aside, might I ask if you still need to maintain for out-dated Python versions? From a selfish perspective, removing those jobs would bring CI to green 😅 |
Yeah, Python 3.6 is still being used by clients of a client, so I'm still stuck with that :) |
This seems like a legit issue :/ |
I wasn't even scrolling down the failing jobs heh, didn't notice all win+mac jobs were failing. |
0049efc
to
0c81b76
Compare
Ok, sorry this takes so long, but apart from being very busy, this was difficult to debug. The issue seems to be that In this specific case, we get the Arrow buffers via So the question is, using the |
Looking at the diff again, this seems like it was an underlying issue that's only come up because this PR handles string labels correctly for categoricals? Mind I'm pretty ignorant when it comes to memory management (seems like a nightmare to debug heh). So if this isn't an introduced bug and what was working still works, I'd recommend to just xfail the failing test cases if you don't have the bandwidth to fix this right now. I did introduce tests covering string behaviour generally, which should help prevent regressions when attempting to fix this. |
Yeah, feel free to xfail that test for now, on all platform |
Note categorical columns are still generally erroneous
0c81b76
to
c1f3bc4
Compare
Done! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks ! ❤️
The interchange protocol had some spec changes recently (data-apis/dataframe-api#74), so this PR namely updates vaex to conform with them.
Column.size()
Column.size
returns 0d arrays as opposed to Pythonint
#2093 (was useful so I could test the updates withpandas.interchange
)Column.describe_categorical
describe_categorical
in interchange columns is a tuple, not a dict #2113categories
as opposed tomapping
, which is aColumn
containing the labelsColumn.describe_categorical
behaviourconvert_categorical_column
There's new behaviour with datatimes (NaTs are now sentinel values), but they first require vaex to support interchanging datetimes generally, so would be better served in a future PR.
cc @maartenbreddels