[BUG] pd.api.interchange.from_dataframe fails with simple cuDF dataframe #17282
Comments
Thanks for reporting! I'm looking into this.
Thanks! Note that the PyCapsule Interface works perfectly well here, and outputs
The technical reason this is happening is that pandas is trying to construct dataframe columns around buffers that correspond to GPU data, triggering a segfault. That's what you get when you access the The real reason it's happening, though, is that there's not really a standard spec for whose responsibility it is to move the data to the current memory space (CPU in this case). There will need to be a wider discussion on what to do in situations like these. Hopefully you are able to use
I do not think this is a cudf bug. cudf delivers an object that obeys the dataframe protocol. However, as noted, the protocol is silent on whose responsibility it is to do cross-memory-region copies. Pandas should probably inspect the I note that the pandas implementation deliberately constructs numpy arrays from the raw pointers (rather than going through dlpack, which at the time was not supported). dlpack is now supported in numpy, and if you use that instead you at least get a useful error message from numpy: @MarcoGorelli: I don't think we can fix this here without deciding to eagerly copy everything to host (which we really don't want to do). Is this a bug report because you really want this to work, or to point out that the interchange protocol doesn't handle an obvious use case?
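To illustrate the kind of device inspection suggested above, here is a minimal sketch of a consumer-side guard. It assumes the buffer object exposes `__dlpack_device__()` returning a `(device_type, device_id)` pair, as the interchange protocol's buffer class describes. `HostBuffer`, `GpuBuffer`, and `ensure_host_buffer` are hypothetical names used only for this example. They are not part of pandas or cudf, and this is not how either library currently behaves.

```python
# DLPack device type codes (values of the DLDeviceType enum in the DLPack header).
KDL_CPU = 1
KDL_CUDA = 2


class HostBuffer:
    """Hypothetical stand-in for an interchange buffer in CPU memory."""

    def __dlpack_device__(self):
        return (KDL_CPU, 0)


class GpuBuffer:
    """Hypothetical stand-in for an interchange buffer in GPU memory."""

    def __dlpack_device__(self):
        return (KDL_CUDA, 0)


def ensure_host_buffer(buffer):
    """Return `buffer` unchanged if it is host memory, otherwise raise.

    A consumer could call this before wrapping the buffer's raw pointer,
    so that a device buffer yields a Python exception instead of a segfault.
    """
    device_type, device_id = buffer.__dlpack_device__()
    if int(device_type) != KDL_CPU:
        raise TypeError(
            f"buffer lives on DLPack device type {int(device_type)} "
            f"(device id {device_id}); copy it to host memory first"
        )
    return buffer
```

With a guard like this, a GPU-backed buffer fails with a readable `TypeError` rather than crashing the session. Whether the consumer should instead copy the data to host is exactly the question the protocol leaves open.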
Thanks for your response! I reported this because I was expecting it to work, and was surprised that it didn't when I tested out Plotly with cuDF. At least in plotly/plotly.py#4244 it looks like people were expecting that
OK thanks. Concretely here, because the dlpack interface does work, and the arrow C interchange protocol has gained enough adoption (and has a device version), our plan is to deprecate this interchange format (in what will be released as 25.02), and point people at those interchange options instead. I've marked this one as wontfix, because, per my reading of the spec, cudf does not have a bug here.
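For readers looking for the Arrow route mentioned above, a minimal sketch of a consumer that goes through the Arrow PyCapsule interface might look like the following. It assumes pyarrow 14 or newer (where `pyarrow.table` accepts objects implementing `__arrow_c_stream__`) and assumes the producer implements that method. `to_pandas_via_arrow` is a hypothetical helper name, not an API of pandas, pyarrow, or cudf.

```python
def to_pandas_via_arrow(obj):
    """Convert a dataframe-like object to pandas via the Arrow PyCapsule interface.

    Assumption: `obj` implements `__arrow_c_stream__`, which `pyarrow.table`
    can consume (pyarrow >= 14).
    """
    if not hasattr(obj, "__arrow_c_stream__"):
        raise TypeError(
            f"{type(obj).__name__} does not implement __arrow_c_stream__"
        )
    # Imported lazily so the check above works even without pyarrow installed.
    import pyarrow as pa

    return pa.table(obj).to_pandas()
```

Usage would be along the lines of `to_pandas_via_arrow(gdf)` for a cuDF dataframe `gdf`, provided the installed cudf version exposes `__arrow_c_stream__`.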
thanks for looking into this! agree on deprecating - the nice thing about the interchange protocol was that it brought people from libraries together to collaborate, and that is valuable - but at this point I think it's outlived its usefulness
Describe the bug
pd.api.interchange.from_dataframe fails with simple cuDF dataframe
Steps/Code to reproduce bug
this crashes the session https://colab.research.google.com/drive/1QXtKPcKQONi1g8WY9lI6FPZFhik_VVYg?usp=sharing
Expected behavior
it should convert to pandas dataframe with the same data
Environment overview (please complete the following information)
docker pull & docker run commands used
Environment details
Please run and paste the output of the cudf/print_env.sh script here, to gather any other relevant environment details
colab notebook
Additional context
Add any other context about the problem here.