Fix issues with `_CPackedColumns.serialize()` handling of host and device data #8759
Small Q
python/cudf/cudf/_lib/copying.pyx
@@ -822,12 +823,10 @@ cdef class _CPackedColumns:

     def serialize(self):
         header = {}
-        frames = []
+        frames = [Buffer(self.gpu_data_ptr, self.gpu_data_size, self)]
Would it be better to directly call `Buffer.serialize()` (and subsequently `Buffer.deserialize()` below)?
I think we need to leave this as a `Buffer` so we are able to inherit the additional methods of `Serializable`; see this assertion in `Serializable.device_serialize()`:
cudf/python/cudf/cudf/core/abc.py, lines 98 to 100 in 5b8895d

    assert all(
        (type(f) in [cudf.core.buffer.Buffer, memoryview]) for f in frames
    )
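To illustrate what that check accepts, here is a minimal, self-contained sketch. The `Buffer` class and `check_frames` helper below are hypothetical stand-ins written for this example; they only mimic the quoted assertion and are not cuDF's implementation.

```python
class Buffer:
    """Hypothetical stand-in for cudf.core.buffer.Buffer."""

    def __init__(self, data):
        self.data = data


def check_frames(frames):
    """Mimic the quoted assertion: each frame must be exactly a Buffer or a memoryview."""
    return all(type(f) in [Buffer, memoryview] for f in frames)


# A Buffer for device data and a memoryview for host data both pass.
ok_frames = [Buffer(b"device bytes"), memoryview(b"host bytes")]

# Plain bytes is neither of the accepted types, so the check fails.
bad_frames = [b"raw bytes"]
```

Because the check compares `type(f)` exactly, anything other than those two types in `frames` would trip the assertion.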
Sorry if this is a dumb question, but could you elaborate on why `Buffer.deserialize()` fails to inherit the additional methods?
Ah, my mistake. I was thinking of just passing the output of `Buffer.deserialize()` into `frames`, which wouldn't work because of the assertion above. You are correct: we should be serializing the `Buffer` here and adding the resulting header to the `PackedColumns` header.
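The idea of nesting the buffer's header inside the container's header can be sketched as follows. This is only an illustration under stated assumptions: both classes are hypothetical stand-ins, the header keys (`"data"`, `"metadata"`, `"size"`) are invented for the example, and plain `bytes` play the role of device memory.

```python
class Buffer:
    """Hypothetical stand-in for a device buffer; not cuDF's class."""

    def __init__(self, data):
        self.data = bytes(data)

    def serialize(self):
        # The header describes the frame; the frame carries the payload.
        header = {"size": len(self.data)}
        return header, [memoryview(self.data)]

    @classmethod
    def deserialize(cls, header, frames):
        assert header["size"] == len(frames[0])
        return cls(frames[0])


class PackedColumns:
    """Hypothetical container holding host metadata plus one data Buffer."""

    def __init__(self, metadata, data):
        self.metadata = bytes(metadata)
        self.data = data

    def serialize(self):
        # Serialize the Buffer and store its header under our own header.
        data_header, frames = self.data.serialize()
        header = {"data": data_header, "metadata": self.metadata}
        return header, frames

    @classmethod
    def deserialize(cls, header, frames):
        data = Buffer.deserialize(header["data"], frames)
        return cls(header["metadata"], data)


packed = PackedColumns(b"schema", Buffer(b"\x01\x02\x03"))
header, frames = packed.serialize()
restored = PackedColumns.deserialize(header, frames)
```

Keeping the nested header under its own key lets the container hand it straight back to `Buffer.deserialize()` without knowing what the buffer stored in it.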
Codecov Report
@@              Coverage Diff               @@
##             branch-21.08    #8759   +/-   ##
===============================================
  Coverage                ?   10.52%
===============================================
  Files                   ?      116
  Lines                   ?    18532
  Branches                ?        0
===============================================
  Hits                    ?     1951
  Misses                  ?    16581
  Partials                ?        0
@gpucibot merge
A few changes in here to resolve some test failures when using pack/unpack with DataFrame serialization:
- Use a `Buffer` in `frames` to represent the device data, so that Dask can correctly perform the necessary DtoD transfers when moving a packed columns object between devices
- Handle the case where the host data pointer is `NULL`; since Cython cannot make a memoryview from `NULL`, we now check for this condition before making the host data array
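The `NULL` guard can be illustrated outside Cython with a small Python sketch. It is an analogy only: `host_bytes_from_pointer` is a hypothetical helper, and `ctypes` stands in for the raw host pointer that the Cython code handles.

```python
import ctypes


def host_bytes_from_pointer(address, size):
    """Copy `size` bytes from a raw address, treating NULL (0 or None) as empty."""
    if not address or size == 0:
        # Nothing to read: return an empty payload instead of dereferencing NULL.
        return b""
    return ctypes.string_at(address, size)


# A real allocation: the guard passes and the bytes are copied out.
backing = ctypes.create_string_buffer(b"abc")
copied = host_bytes_from_pointer(ctypes.addressof(backing), 3)

# A NULL pointer: the guard short-circuits before any memory access.
empty = host_bytes_from_pointer(0, 0)
```

Checking the pointer first means the empty case yields a valid zero-length result instead of an error or an invalid memory read.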