Use "dask" serialization to move to/from host #256
Conversation
Force-pushed from 8e4bf0b to d7c2795
Codecov Report
```diff
@@            Coverage Diff             @@
##           branch-0.13     #256      +/-   ##
===============================================
+ Coverage       78.03%    78.09%   +0.06%
===============================================
  Files              14        14
  Lines             979       968      -11
===============================================
- Hits              764       756       -8
+ Misses            215       212       -3
```
Continue to review full report at Codecov.
There are also some conflicts due to the merging of #257.
```diff
@@ -157,9 +157,9 @@ def test_serialize_cupy_collection(collection, length, value):
     if length > 5:
         assert obj.header["serializer"] == "pickle"
     elif length > 0:
-        assert all([h["serializer"] == "cuda" for h in obj.header["sub-headers"]])
+        assert all([h["serializer"] == "dask" for h in obj.header["sub-headers"]])
```
It seems that there are two types of serialization for CUDA objects in Dask now:

- `"cuda"` to serialize CUDA objects and transfer them without necessarily copying them to/from host -- e.g., it can be used to transfer only the `__cuda_array_interface__`, which something like UCX would use to ensure the transfer goes over CUDA IPC;
- `"dask"` to serialize CUDA objects and transfer them to/from host -- useful for spilling, as is the case here.
Is my understanding correct?
Generally yes. `"dask"` serialization already existed before in Dask; we've just now implemented it for CUDA objects. Implementation details with references are in issue #242.
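For reference, here is a rough sketch of the two paths, assuming CuPy is installed and Distributed's CUDA-aware serializers are registered (e.g. by importing `distributed.protocol.cupy`); the variable names are just illustrative:

```python
import cupy
import distributed.protocol.cupy  # registers the CuPy handlers for both families
from distributed.protocol import serialize, deserialize

x = cupy.arange(10)

# "cuda": frames stay on the device (suitable for UCX / CUDA IPC transports).
cuda_header, cuda_frames = serialize(x, serializers=["cuda"])

# "dask": frames are copied into host memory (suitable for spilling).
dask_header, dask_frames = serialize(x, serializers=["dask"])

# Deserializing the "dask" form copies the data back onto the device.
y = deserialize(dask_header, dask_frames)
assert isinstance(y, cupy.ndarray)
```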
@pentschev, any more thoughts here? 🙂
Sorry @jakirkham, I was just testing this now, and I think we have an issue with this PR. The reason it didn't fail is that the tests are marked to xfail, as they would fail randomly in CI, but I could always (or almost always) have them pass on a DGX. I'm sorry that CI masked these errors; in fact, what hinted to me that there could be an issue was the decrease in coverage, as seen in https://codecov.io/gh/rapidsai/dask-cuda/pull/256/changes. Could you check why this is happening, @jakirkham?
@jakirkham I should also mention that you can see the errors on a DGX too if you run the tests there.
Sure, I can take a look. Should these tests not be marked as `xfail`?
Ah, I overlooked that. There may be more than that, though.
I would say this is the goal, but we haven't been able to make the tests deterministic. There's #79 for that, but it has been pretty hard to keep CI always working.
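For context, a minimal illustration (hypothetical test, not from this repo) of why `xfail`-marked tests can hide real failures in CI:

```python
import random

import pytest

@pytest.mark.xfail(reason="fails non-deterministically in CI", strict=False)
def test_flaky_example():
    # With strict=False, the suite stays green whether this passes or fails,
    # so a genuine regression introduced here would go unnoticed.
    assert random.random() < 0.5
```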
I don't think we can drop that; it is used to control when dask-cuda spills. IOW, it uses them to keep track of LRU cache sizes and when to move data to host and back.
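A rough sketch of that idea, using `zict.Buffer` (which also appears in the traceback below). This is not dask-cuda's actual code; the stores, byte budget, and keys are made up:

```python
from zict import Buffer

device_store = {}  # stand-in for device-backed storage
host_store = {}    # stand-in for host-backed storage

spill_buffer = Buffer(
    device_store,
    host_store,
    n=1024,                                # hypothetical byte budget before spilling
    weight=lambda key, value: len(value),  # per-object size tracked by the LRU
)

spill_buffer["a"] = b"x" * 800  # fits within the budget, stays in the fast store
spill_buffer["b"] = b"y" * 800  # exceeds the budget, so "a" is spilled to host
assert "a" in host_store and "b" in device_store

_ = spill_buffer["a"]           # reading "a" moves it back into the fast store,
assert "a" in device_store      # which in turn evicts "b" to the host store
assert "b" in host_store
```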
Force-pushed from 9c263a1 to a8045f3
Have improved this a bit, but there is still an issue. My guess is this has something to do with Dask splitting and compressing frames. Will look more into this next week.

```
Traceback (most recent call last):
  File "/conda/envs/gdf/lib/python3.7/site-packages/distributed/worker.py", line 2164, in release_key
    if key in self.data and key not in self.dep_state:
  File "/conda/envs/gdf/lib/python3.7/_collections_abc.py", line 666, in __contains__
    self[key]
  File "/var/lib/jenkins/workspace/rapidsai/gpuci/dask-cuda/prb/dask-cuda-gpu-build/dask_cuda/device_host_file.py", line 122, in __getitem__
    return self.device_buffer[key]
  File "/conda/envs/gdf/lib/python3.7/site-packages/zict/buffer.py", line 78, in __getitem__
    return self.slow_to_fast(key)
  File "/conda/envs/gdf/lib/python3.7/site-packages/zict/buffer.py", line 65, in slow_to_fast
    value = self.slow[key]
  File "/conda/envs/gdf/lib/python3.7/site-packages/zict/func.py", line 38, in __getitem__
    return self.load(self.d[key])
  File "/var/lib/jenkins/workspace/rapidsai/gpuci/dask-cuda/prb/dask-cuda-gpu-build/dask_cuda/device_host_file.py", line 57, in host_to_device
    return deserialize(s.header, s.frames)
  File "/conda/envs/gdf/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 269, in deserialize
    return loads(header, frames)
  File "/conda/envs/gdf/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 51, in dask_loads
    return loads(header, frames)
  File "/conda/envs/gdf/lib/python3.7/site-packages/distributed/protocol/cupy.py", line 82, in dask_deserialize_cupy_ndarray
    frames = [dask_deserialize_cuda_buffer(header, frames)]
  File "/conda/envs/gdf/lib/python3.7/site-packages/distributed/protocol/rmm.py", line 37, in dask_deserialize_rmm_device_buffer
    (frame,) = frames
ValueError: too many values to unpack (expected 1)
```
Force-pushed from 7bbd1d2 to e34898e
I think I've narrowed it down. It appears there may be a bug in Distributed: it splits frames for compression, but it doesn't merge them back into their original structure afterwards. Have submitted PR dask/distributed#3639 to address this.
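A toy illustration of the failure mode (standalone Python, not Distributed's actual code): one logical frame gets split into shards for compression, and if the shards are never merged back, a deserializer that expects exactly one frame, like the `(frame,) = frames` line in the traceback above, blows up:

```python
def split_frame(frame: bytes, shard_size: int) -> list:
    """Split a single frame into fixed-size shards, as done before compression."""
    return [frame[i : i + shard_size] for i in range(0, len(frame), shard_size)]

def merge_shards(shards: list) -> bytes:
    """Join shards back into the original frame before deserialization."""
    return b"".join(shards)

original = bytes(10_000)
shards = split_frame(original, shard_size=4096)  # -> 3 shards
assert merge_shards(shards) == original          # original structure restored

(frame,) = [merge_shards(shards)]  # exactly one frame again, so unpacking works
```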
Force-pushed from f9d9a27 to 1699d0d
As these serialization function names are important and are meant to be kept around, use proper names and not `_` (which is intended for values not to be kept).
As all `"cuda"` serializable objects are now also `"dask"` serializable, switch to just using `"dask"` serialization to perform device-to-host transfers to spill to host memory. Though continue falling back to `"pickle"` if nothing better can be found (after all that will be on host too). Similarly "unspilling" from host-to-device will happen naturally as part of the deserialization step. Should simplify what Dask-CUDA needs to keep track of/do.
This should ensure the frames can go through `"dask"` serialization when `DeviceSerialized` is serialized.
Note these frames are `bytes`-like. IOW C-contiguous buffers that can be easily loaded back into device memory or spilled to disk. https://docs.python.org/3/glossary.html#term-bytes-like-object
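A quick illustrative check (plain Python, with a stand-in buffer) of what "bytes-like and C-contiguous" buys us here:

```python
frame = memoryview(bytearray(16))  # stand-in for one serialized frame

assert frame.c_contiguous            # contiguous, so it can be copied in one shot
assert bytes(frame) == b"\x00" * 16  # bytes-like, so file/device APIs accept it
```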
Force-pushed from 1699d0d to 3f1577c
```diff
-    frames = [copy_to_host(f) if ic else f for ic, f in zip(is_cuda, frames)]
-    return DeviceSerialized(header, frames, is_cuda)
+    header, frames = serialize(obj, serializers=["dask", "pickle"])
+    frames = [numpy.asarray(f) for f in frames]
```
Just a note here, the `frames` are `memoryview`s. Since we want to serialize `DeviceSerialized` later, which involves serializing these `frames`, and Dask doesn't know how to serialize a `memoryview`, it tries to pickle it or fails. To fix this, we construct a NumPy `ndarray` that merely views the underlying data in the `memoryview`. IOW this is quite fast and zero-copy. Since Dask knows how to serialize NumPy `ndarray`s (and we were using `ndarray`s here before), this should proceed without issues.
xref: dask/distributed#3640
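A small standalone illustration (not dask-cuda code; the buffer contents are made up) of that zero-copy wrapping:

```python
import numpy

buf = memoryview(bytearray(b"spilled-frame-bytes"))
arr = numpy.asarray(buf)

assert not arr.flags["OWNDATA"]        # the array does not own (i.e. copy) the data
assert arr.nbytes == len(buf)          # same underlying bytes
buf[0:1] = b"S"                        # a write to the buffer is visible through
assert arr.tobytes().startswith(b"S")  # the array, confirming it is just a view
```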
Got it, this makes sense. Thanks for the description of how it works; it was really helpful for understanding what's going on here.
Yeah, ideally we wouldn't need this, but for now it's a simple enough workaround 🙂
I've cut this back a bit to a state where it seems to work based on local testing. CI is a bit backed up, but should come through in a bit. Please let me know if anything else is needed.
Thanks @jakirkham, I ran the tests on a DGX again and everything passed now!
Thanks for double checking, Peter! 😄 I'll try adding back the other changes in another PR to see if we can simplify things further.
Fixes #242
As all `"cuda"` serializable objects are now also `"dask"` serializable, switch to just using `"dask"` serialization to perform device-to-host transfers to spill to host memory. Though continue falling back to `"pickle"` if nothing better can be found (after all, that will be on host too). Similarly, "unspilling" from host-to-device will happen naturally as part of the deserialization step. Should simplify what Dask-CUDA needs to keep track of/do.