Stream spilled data directly from disk to other workers without loading into memory (sendfile-style) #5996
Comments
This would violate LRU time locality. There are two notable exceptions where time locality does not apply, e.g. when data is simply parked on the worker:
I think it would make sense to implement spill-upon-receive as you suggest, but it would only serve this very last, and rather niche, use case. As for implementation, we could add a way to tell the SpillBuffer: "add this key/value pair to the LRU heap, but instead of giving it maximum time weight (i.e. spill it after everything else in data.fast), give it a time weight lower than that of stores triggered by compute (i.e. spill it after everything you got from the AMM, but before anything you got from compute)". This could translate into immediate eviction, much as already happens for values whose individual memory weight is larger than target * memory_limit.
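The comment above can be sketched with a toy eviction structure. This is a hypothetical illustration, not dask's actual SpillBuffer/zict implementation: entries inserted with a lower "weight class" are evicted before normally weighted entries, regardless of how recently they arrived.

```python
import heapq
import itertools


class WeightedLRU:
    """Minimal sketch (not dask's actual SpillBuffer): entries stored
    with a lower weight class are evicted before normal entries,
    regardless of how recently they were inserted."""

    def __init__(self):
        self._data = {}
        self._ticks = itertools.count()
        self._heap = []  # (weight_class, tick, key); smallest = evict first

    def put(self, key, value, weight_class=1):
        # weight_class=0: e.g. data received from a peer or parked by the
        # AMM -- evicted before anything produced by compute (class 1).
        # Within a class, the oldest tick goes first (plain LRU order).
        self._data[key] = value
        heapq.heappush(self._heap, (weight_class, next(self._ticks), key))

    def evict(self):
        # Pop the entry with the lowest (weight_class, tick); skip stale
        # heap entries whose keys were already evicted.
        while self._heap:
            _, _, key = heapq.heappop(self._heap)
            if key in self._data:
                return key, self._data.pop(key)
        raise KeyError("buffer is empty")
```

With this shape, "received" data gets evicted before "computed" data even when it was stored later, which is the ordering the comment describes.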
Would be tempted to just use
Note that dask currently downgrades the pickle protocol dynamically depending on the peer it is talking to.
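That downgrade matters for disk-to-socket streaming: bytes on disk would be sent verbatim, so they must already be in a protocol every potential receiver understands. A minimal sketch of such a negotiation (the `negotiate_protocol` helper is hypothetical, not dask's actual mechanism):

```python
import pickle


def negotiate_protocol(remote_max: int) -> int:
    # Hypothetical helper: each side advertises its highest supported
    # pickle protocol, and both use the smaller of the two.
    return min(pickle.HIGHEST_PROTOCOL, remote_max)


# Serialize with the negotiated protocol so an older peer can read it.
payload = pickle.dumps({"x": 1}, protocol=negotiate_protocol(4))
assert pickle.loads(payload) == {"x": 1}
```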
FYI: I think that with #7217 we'd get much richer information about how much this change would gain us, before implementing it.
New insights in #7351 (comment) suggest that this proposal may offer a poor cost/benefit ratio. Note that the data was generated on a cluster of 5 workers; we should rerun the test at a much larger scale to confirm it.
This has been discussed in a few other places, but I think it's important enough (and a separate enough task) to warrant its own issue for tracking purposes. Related to:
Once #5900 is done, when one worker requests data from another and that data is currently spilled to disk, we should not require slurping all the data into memory just to send it. Ideally, we'd stream the serialized bytes from disk directly over the network socket, without copying into userspace at all, via something like `socket.sendfile` / `sendfile(2)`. If this is not deemed possible, copying into a small userspace buffer and doing incremental sends would still be an improvement (though not ideal).

(As a bonus, it would be nice if workers could also receive data directly to disk. This might let us keep fetching dependencies even when workers are under memory pressure, and might make computations complete faster under memory pressure, but it probably won't make the difference between them succeeding and failing the way the sending side does, so it's less important for now.)
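The sending side of the idea above can be sketched in a few lines. On Linux, `socket.sendfile()` uses the `sendfile(2)` syscall under the hood, so the bytes travel kernel-to-kernel without being copied into userspace; the one-file-per-key layout here is an assumption for illustration, not dask's actual spill layout.

```python
import os
import socket
import tempfile


def send_spilled_key(sock: socket.socket, path: str) -> int:
    """Stream a spilled file straight into a connected socket.

    socket.sendfile() falls back to a plain send() loop on platforms
    where the zero-copy syscall is unavailable, so this is safe to call
    unconditionally. Returns the number of bytes sent.
    """
    with open(path, "rb") as f:
        return sock.sendfile(f)


# Demo over a local socket pair, with a fake spilled value on disk.
a, b = socket.socketpair()
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 1000)
    path = f.name

sent = send_spilled_key(a, path)
a.close()  # signal EOF to the receiving side

received = b""
while chunk := b.recv(4096):
    received += chunk
b.close()
os.unlink(path)

assert sent == 1000 and received == b"x" * 1000
```

Note that `socket.sendfile` requires a blocking `SOCK_STREAM` socket, which is part of why fitting this under an async, message-oriented comms layer is the hard part.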
The biggest challenge is that the comms interface doesn't currently support streams, only individual messages. The `get_data` response can also contain multiple keys, which further complicates the interface.

From a technical perspective, doing `sendfile` over a socket should be easy; the challenge is making that feasible with our comms interface.

I believe this is one of the highest-impact changes we can make to improve memory performance and reduce out-of-memory failures. There are a few flaws currently with spill-to-disk, but to me this is the biggest: transfer-heavy graphs basically work against spilling to disk, because workers are currently un-spilling some or most of the data they just spilled in order to send it to their peers.
cc @fjetter @crusaderky