Skip to content
This repository has been archived by the owner on Oct 14, 2018. It is now read-only.

Add scheduler parameter #44

Merged
merged 5 commits into from
Apr 18, 2017
Merged

Add scheduler parameter #44

merged 5 commits into from
Apr 18, 2017

Conversation

jcrist
Copy link
Member

@jcrist jcrist commented Apr 11, 2017

Allow specifying the scheduler by name instead of passing in the get function directly. Scheduler can be one of:

  • None, to use the global scheduler, falls back to threading if not set (default)
  • A string specifying the scheduler name ("threading", "multiprocessing", or "sync")
  • A scheduler get function
  • Anything that can be passed to dask.distributed.Client.

Not fully set on this yet.

Pros:

  • Easier for less dask-savy users to specify different schedulers, feels a bit more like joblib.

Cons:

  • Doesn't match how schedulers are normally set in dask. Disconnect may feel foreign.

If we go this route, then I'd also add n_jobs as a parameter (matching scikit-learn), which would specify num_workers for the threading and multiprocessing schedulers, and be ignored by the others. Might also make n_jobs=1 for all but distributed result in the synchronous scheduler. Downside of supporting n_jobs here is we'd probably want the default to match what dask does (n_jobs = cpu_count()) instead of what scikit-learn does (n_jobs=1). I'm fine with this, but it is a difference.

If we don't go this route, then I might add a scheduler_kwargs parameter instead, which would be forwarded to the get call. Not sure if any of the other keyword arguments would prove useful for this library though.

Allow specifying the scheduler by name instead of passing in the `get`
function directly
@mrocklin
Copy link
Member

We might also consider adopting a convention like this within Dask. The term get is a pretty bad name. It actually comes all the way from the first commit

jcrist added 3 commits April 11, 2017 13:14
- Copy of fix for pickling masked arrays
- Add distributed to travis
Still needs tests.
- Add tests for n_jobs
- Add tests for scheduler parameter
@jcrist
Copy link
Member Author

jcrist commented Apr 17, 2017

Ok, I've cleaned this up and added support for n_jobs, which has been asked for in the past.

A few potential issues:

  • This makes it a bit tricky to provide good error messages on bad parameters, since we just pass all unrecognized values to Client. I've tried to produce a nice error message from this, but there may be a better.

  • I'm also not 100% pleased with the scheduler names. Valid names are:

    • threading
    • multiprocessing
    • sync

    The first two are good, and match joblib but not the dask docs (we say threaded instead of threading). The third is an abbreviation, which may be undesired. Perhaps synchronous (a bit long though)? The equivalent in joblib is sequential. I don't think that's a good name here, as we don't enforce an ordering of execution (which sequential would seem to indicate).

  • We also have a different default (threading, n_jobs=cpu_count()) than scikit-learn (processes, n_jobs=1). I think this makes sense though.

@mrocklin
Copy link
Member

I think that following joblib over dask names (like threading over threaded) makes sense for this interface. You could also support aliases so that all of sync, sequential, synchronous would map to the same value.

Actually, I like this idea because then we can establish these names in more parts of Dask where we might prefer threaded over threading.

scheduler_types = {
  'sync': 'synchronous',
  'sequential': 'synchronous',  # from joblib
  'threading': 'threaded',
  ...
}

scheduler = scheduler_types.get(scheduler, scheduler)

@mrocklin
Copy link
Member

I would like to give people the option to provide names rather than get= functions.

@jcrist
Copy link
Member Author

jcrist commented Apr 18, 2017

I liked the alias idea and added it. I think this is good to go. Merging.

@jcrist jcrist merged commit 36ef51b into dask:master Apr 18, 2017
@jcrist jcrist deleted the scheduler-param branch April 18, 2017 15:09
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants