Skip to content
This repository has been archived by the owner on Feb 11, 2021. It is now read-only.

Executor using dask.distributed dask-scheduler process #11

Closed
iamed2 opened this issue Apr 24, 2017 · 3 comments
Closed

Executor using dask.distributed dask-scheduler process #11

iamed2 opened this issue Apr 24, 2017 · 3 comments

Comments

@iamed2
Copy link
Member

iamed2 commented Apr 24, 2017

Use the ideas here: dask/distributed#586

The components seem to be:

  • the dask-scheduler process
  • a client written in Julia that submits work to the scheduler and queries for state (this would be the part run in the controller process which calls dispatch! on the Executor)
  • a worker written in Julia that accepts instructions from the scheduler, fetches dependencies, executes computations, stores data, and communicates state to the scheduler

In this new Executor we could avoid pre-allocating tasks and allow for fluctuating worker resources. We would also have a stronger guarantee of efficiency.

@shashi
Copy link

shashi commented Apr 27, 2017

@mrocklin had helped me get bootstrapped with this process when I met him. Here's a gist of send_msg and recv_msg routines using MsgPack https://gist.github.com/shashi/e8f37c5f61bab4219555cd3c4fef1dc4.

@mrocklin If I understand correctly, clients (written in Julia) communicate data between themselves directly without going through Python, right? Only scheduler commands and metadata need to pass through python via MsgPack?

edit: pasted the gist link

@mrocklin
Copy link

Workers communicate data between themselves directly. The scheduler will tell them the identifier of the data (what we call a key) and a set of addresses of other workers where they can obtain that data. The worker can then use whatever mechanism it chooses to gather that data. We tend to use the same protocol for inter-worker communication as we do everywhere else, but in principle you could do whatever you wanted, as long as you inform the scheduler with another administrative message when you have obtained the data from the other worker.

The current scheduler is in Python, but no information that passes in to or out of it is Python-specific. All messages to and from the scheduler are purely msgpack and so language agnostic.

@iamed2
Copy link
Member Author

iamed2 commented Sep 7, 2017

@iamed2 iamed2 closed this as completed Sep 7, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants