Deserialization: zero-copy merge subframes when possible #5208

gjoseph92 · 2021-08-12T21:54:10Z

Better implementation of #5112.

The point of this change (copied from original PR; note I tested and the screencap behavior works with this PR too):

When merging sub-frames before deserializing, if all the sub-frames are memoryviews backed by the same underlying buffer, "merge" them without copying by making a new memoryview pointing to the same buffer, instead of always copying.
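To make the copy versus zero-copy distinction concrete, here is a minimal sketch. It assumes the byte offsets of each sub-frame are already known (roughly the offset-passing idea the earlier PR relied on); `merge_by_offsets` is a hypothetical helper for illustration, not a function from `distributed`.

```python
from __future__ import annotations


def merge_by_offsets(base: memoryview, spans: list[tuple[int, int]]) -> memoryview:
    """Hypothetical helper: merge contiguous (start, stop) spans of `base` without copying."""
    if not spans:
        return base[:0]
    # Each span must begin exactly where the previous one ended.
    for (_, prev_stop), (next_start, _) in zip(spans, spans[1:]):
        if prev_stop != next_start:
            raise ValueError("spans are not contiguous")
    # Re-slicing the base view creates a new view over the same memory.
    return base[spans[0][0] : spans[-1][1]]


buf = bytearray(b"abcdefghij")
base = memoryview(buf)
spans = [(2, 5), (5, 8)]
frames = [base[start:stop] for start, stop in spans]

copied = b"".join(frames)               # allocates a new bytes object
merged = merge_by_offsets(base, spans)  # still backed by `buf`

buf[2] = ord("X")  # the write is visible through `merged` but not through `copied`
```

The joined `bytes` object is an independent copy, while the re-sliced view keeps pointing at the original buffer, which is what avoids the extra allocation.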

Running @crusaderky's reproducer from #5107 on main: (screenshot not preserved)

With this PR: (screenshot not preserved)

I'd always wanted to merge the memoryviews by comparing their pointers, but couldn't find an API to access that information, hence the gnarly previous PR passing around byte offsets.

Thanks to @jakirkham on #5112 (comment), I wrote the pointer arithmetic using NumPy to access the buffer interface. Then I found @mattgwwalker's great post explaining how to do exactly what we needed just using ctypes: https://mattgwwalker.wordpress.com/2020/10/15/address-of-a-buffer-in-python/.
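A rough sketch of the ctypes approach follows. It is not the code from this PR: the names `address_of`, `merge_adjacent`, and `_one_byte` are made up for illustration, and the sketch assumes writable, non-empty, one-dimensional byte views.

```python
import ctypes

# A one-element ctypes array type, used only to probe where a view starts.
_one_byte = ctypes.c_byte * 1


def address_of(mv: memoryview) -> int:
    """Address of the first byte of a writable, non-empty memoryview."""
    return ctypes.addressof(_one_byte.from_buffer(mv))


def merge_adjacent(views: list) -> memoryview:
    """Merge byte views that lie back to back in one buffer, without copying."""
    first = views[0]
    start = address_of(first)
    expected = start
    for mv in views:
        if mv.obj is not first.obj:
            raise ValueError("views do not share one underlying buffer")
        if address_of(mv) != expected:
            raise ValueError("views are not contiguous in memory")
        expected += mv.nbytes
    # Translate the absolute start address into an index into the base buffer.
    base = memoryview(first.obj).cast("B")
    offset = start - address_of(base)
    return base[offset : offset + (expected - start)]


buf = bytearray(range(16))
whole = memoryview(buf)
parts = [whole[4:8], whole[8:10], whole[10:12]]
merged = merge_adjacent(parts)  # a view over buf[4:12], no bytes copied
```

Comparing addresses rather than tracked offsets means the caller only needs the views themselves; whether they are adjacent is worked out from where each one points.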

- Closes #5107 (Unnecessary deep copy causes memory flare on network comms)
- Tests added / passed
- Passes `black distributed` / `flake8 distributed` / `isort distributed`

Use NumPy to get the pointers of the memoryviews and do pointer arithmetic. This is way more reliable and self-contained. TODO unit test for `merge_memoryviews` and deal with NumPy situation

It's very annoying: `from_buffer` requires the buffer to be writeable, because ctypes is inconsistently strict about const correctness (see https://bugs.python.org/issue11427#msg148586). I don't think there's any clever pointer-casting we could do as mentioned in that thread, since we need to access through the buffer protocol—we can't just cast `mv.obj` to `c_void_p`, since we need the pointer into the current location within the buffer, not just at the start of the buffer. So for read-only buffers, we try to use NumPy, and if it's not available, we'll end up silently(!) falling back on copying.
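The read-only limitation and the NumPy fallback described above can be sketched roughly as below. `address_or_none` is a hypothetical name, and returning `None` when NumPy is missing is an assumption made for this example to stand in for "fall back to copying".

```python
import ctypes

_one_byte = ctypes.c_byte * 1


def address_or_none(mv: memoryview):
    """Address of the first byte of `mv`, or None if it cannot be determined."""
    try:
        # Works only for writable buffers; read-only ones raise TypeError.
        return ctypes.addressof(_one_byte.from_buffer(mv))
    except TypeError:
        pass
    try:
        import numpy as np
    except ImportError:
        return None  # the caller would have to fall back to copying
    # NumPy exposes the data pointer of any buffer, writable or not.
    return np.asarray(mv).__array_interface__["data"][0]


writable = memoryview(bytearray(b"abcdef"))
readonly = memoryview(b"abcdef")

w0, w2 = address_or_none(writable), address_or_none(writable[2:])
r0, r2 = address_or_none(readonly), address_or_none(readonly[2:])
```

For a sliced view, the reported address reflects the slice's starting position, which is the property the pointer arithmetic depends on.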

distributed/protocol/utils.py

distributed/protocol/serialize.py

jakirkham · 2021-08-16T19:59:06Z

Neat, that's a clever way to access pointers in Python 😄 Will take a closer look (just got back and am catching up 😅)

cc @madsbk (as you may have thoughts on zero-copy frame combination)

cc @quasiben (for general awareness)

gjoseph92 · 2021-09-08T20:18:40Z

@jakirkham do you have any more thoughts on this? @jcrist mentioned he thought frame splitting itself might be unnecessary except for websockets, and should be handled at the transport level instead.

I think this PR has become somewhat low priority because there's discussion around significantly reworking the protocol, but I do have a feeling it would make a difference for users right now.

jakirkham · 2021-09-08T21:48:24Z

I think this is still useful. Have some WIP review comments that I've fallen behind on. Will try to catch back up later this week or next week.

Yeah, at least on the read side we are allocating one big buffer, which we read into. Agree there's some legacy splitting in serialization that would be good to revisit. IIRC compression is one of the use cases for splitting during serialization (not sure if that is affecting us in deserialization as well).

To Jim's point, I think the only frame splitting we do for communication is already baked into the transport layers. One case is websockets. Another is working around a bug in OpenSSL 1.0.2. In either case, neither of these should be in serialization. Jim, if you see other issues please let us know.

fjetter · 2021-09-30T10:50:21Z

I think this is a high impact issue. Are there any major concerns left before we can merge? I get that some parts of the protocol are currently in review and might change but until then this should deliver a significant improvement. The changes are well enough isolated that we can easily revert if necessary.

friendly ping @jakirkham @madsbk @jcrist

If there are no objections within the next 24h I will go ahead and merge.

jcrist · 2021-09-30T15:13:17Z

I've given this a read through and it seems fine to me to merge as is. I still think that splitting large buffers should be done at the transport level instead of here, but for now this seems fine. The test failure is unrelated, so this should be good to go.

jakirkham · 2021-09-30T19:17:15Z

Have some draft review comments above that I haven't had a chance to finish. Can try to finish them, but unfortunately it won't happen for a few more days. That said, feel free to merge anyway if you need this soon, and I can try to follow up later.

gjoseph92 · 2021-10-04T23:53:56Z

@fjetter shall we merge this then? I'd love to see it in the next release.

jakirkham

Apologies for the delay. Had a few minor comments below

distributed/protocol/utils.py

fjetter · 2021-10-07T12:43:59Z

I merged main and fixed a linting issue. Will merge once green-ish.

fjetter · 2021-10-07T16:01:22Z

17 tests broke on the win py3.7 no ci1 build. I'll retrigger. If this persists, we might need to look into it.

gjoseph92 · 2021-10-08T18:01:57Z

@fjetter the tests you just added in #5160 are failing on 3.7 everywhere, and on ubuntu 3.8, with odd TypeErrors related to the mocks:

>       assert fut2.key in mocked_gather.call_args.kwargs["to_gather"]
E       TypeError: tuple indices must be integers or slices, not str

Then there's known flaky #3574 and one distributed/tests/test_asyncprocess.py::test_exit_callback failure that must just be flaky.
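One possible contributor to the `TypeError` quoted above, offered as an assumption rather than a diagnosis confirmed in this thread: the `args` and `kwargs` properties on `unittest.mock` call objects were added in Python 3.8, so on 3.7 `call_args.kwargs` does not yield a dict. This would not by itself explain the 3.8 failure. A version-independent way to read keyword arguments is to treat `call_args` as an `(args, kwargs)` tuple, as in this small sketch (the mock and its arguments are made up for illustration):

```python
from unittest import mock

gather = mock.Mock()
gather("worker-address", to_gather={"key-1": ["tcp://example:1234"]})

# `call_args` behaves as an (args, kwargs) pair on every Python 3 version.
args, kwargs = gather.call_args
found_by_unpacking = "key-1" in kwargs["to_gather"]

# Positional indexing is equivalent to the unpacking above.
found_by_index = "key-1" in gather.call_args[1]["to_gather"]
```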

gjoseph92 · 2021-11-02T04:20:44Z

This is still open. I've merged main and the test failures feel flaky to me, but I'm not sure.

jakirkham · 2021-11-02T04:38:44Z

FWIW merged @pentschev's PR (#5484), which included some test fixes for UCX that should clear out the gpuCI failures seen here. May need to merge main again to get those.

Did see a couple unrelated test failures over there as well. Maybe @jrbourbeau will be able to let us know if these are flaky tests or if they need further investigation

madsbk

Nice work @gjoseph92, clever approach that is useful even if we move the splitting/merging to the transport layer!

pentschev · 2021-11-02T08:39:34Z

rerun tests

pentschev · 2021-11-02T08:44:04Z

gpuCI now passed, thanks for checking this @jakirkham!

gjoseph92 · 2021-11-04T17:24:41Z

This is green besides test_reschedule_concurrent_requests_deadlock failing on all Windows machines. test_aliases_2 also failed on macOS; that seems like an odd test to be flaky, but it definitely doesn't seem related.

xref #5494

jrbourbeau · 2021-11-12T16:05:01Z

Apologies for the delay here. A few folks have approved this PR, so I plan to merge it later today if there are no further comments.

gjoseph92 · 2021-11-12T23:42:44Z

Thanks @jrbourbeau and everyone for reviewing! Very excited to have this in.

gjoseph92 added 8 commits August 11, 2021 19:58

Zero-copy merge subframes with pointer arithmetic

0f90ee7

Use NumPy to get the pointers of the memoryviews and do pointer arithmetic. This is way more reliable and self-contained. TODO unit test for `merge_memoryviews` and deal with NumPy situation

reorg and handle NumPy

c463a45

Use ctypes trick to eliminate NumPy dependency

a2f7243

tests; handle empty slices & base buffer type

cc3999d

ValueErrors instead of asserts

78ca265

don't need merge_subframes really

cc9cee3

skip test if numpy unavailable

22f246d

crusaderky reviewed Aug 16, 2021

gjoseph92 and others added 4 commits August 16, 2021 15:44

Simplify repeated code per Guido's suggestion

439be17

future annotations and collections.abc

0032822

test error on non-memoryview

3315441

Don't disable exception chaining

da8ee57

Co-authored-by: crusaderky

jakirkham mentioned this pull request Aug 18, 2021

Spill to disk may cause data duplication #3756

Closed

jakirkham mentioned this pull request Aug 25, 2021

Thoughts on additional spilling layers #4629

Open

crusaderky approved these changes Sep 2, 2021

jakirkham mentioned this pull request Sep 8, 2021

Memory prioritization on workers #5250

Open

jakirkham reviewed Oct 5, 2021

distributed/protocol/utils.py (outdated)

distributed/protocol/utils.py (outdated)

gjoseph92 and others added 2 commits October 5, 2021 13:03

Update distributed/protocol/utils.py

82ec2ca

Co-authored-by: jakirkham

Move one_byte_carr into global

edaeb7d

jakirkham approved these changes Oct 5, 2021

fjetter added 2 commits October 7, 2021 14:42

Merge remote-tracking branch 'origin/main' into pr/gjoseph92/5208-2

ccea5c4

Fix formatting

0d1e6d1

gjoseph92 added 2 commits October 8, 2021 08:51

more ci

2e0ae4e

Merge remote-tracking branch 'upstream/main' into serialization/zero-…

269ef1f

…copy-merge-memoryviews

gjoseph92 added 2 commits November 1, 2021 16:55

Merge remote-tracking branch 'upstream/main' into serialization/zero-…

6574588

…copy-merge-memoryviews

mypy

f141867

madsbk approved these changes Nov 2, 2021

Merge remote-tracking branch 'upstream/main' into serialization/zero-…

69e5cf2

…copy-merge-memoryviews

This was referenced Nov 4, 2021

test_reschedule_concurrent_requests_deadlock flaky #5494

Open

Shuffle prototype: Feedback (disk usage + workers dying) dask/dask#8294

Open

Use asyncio for TCP/TLS comms #5450

Merged

jrbourbeau changed the title ~~Deserialization: zero-copy merge subframes when possible [better than last PR]~~ Deserialization: zero-copy merge subframes when possible Nov 12, 2021

jrbourbeau merged commit 5a75023 into dask:main Nov 12, 2021

gjoseph92 deleted the serialization/zero-copy-merge-memoryviews branch November 12, 2021 23:37

jrbourbeau assigned gjoseph92 Nov 16, 2021

jakirkham mentioned this pull request Feb 1, 2022

Cythonic SchedulerState (WIP) #5176

Closed

3 tasks

crusaderky mentioned this pull request Dec 23, 2022

Deserialization of compressed data is sluggish and causes memory flares #7433

Open
