Allow specifying the scheduler by name instead of passing in the `get` function directly. The scheduler could be one of the named schedulers or a `dask.distributed.Client`. Not fully set on this yet (a sketch of how a name-based lookup might work follows the pros/cons below).
Pros:
Cons:
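To make the idea concrete, here's a rough sketch of the name-based lookup; the helper name `_normalize_scheduler` and the exact set of accepted names are my assumptions, not a settled API:

```python
import dask
import dask.multiprocessing
import dask.threaded


def _normalize_scheduler(scheduler):
    # Hypothetical helper: map a scheduler name or a dask.distributed.Client
    # to the corresponding ``get`` function.
    if not isinstance(scheduler, str) and hasattr(scheduler, "get"):
        # A distributed Client exposes its own ``get`` method.
        return scheduler.get
    named = {
        "threading": dask.threaded.get,
        "multiprocessing": dask.multiprocessing.get,
        "synchronous": dask.get,
    }
    try:
        return named[scheduler]
    except KeyError:
        raise ValueError("Unknown scheduler: %r" % (scheduler,))
```

Callers could then write something like `scheduler="threading"` rather than importing and passing `dask.threaded.get` themselves (parameter name illustrative).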
If we go this route, then I'd also add `n_jobs` as a parameter (matching scikit-learn), which would specify `num_workers` for the threading and multiprocessing schedulers and be ignored by the others. Might also make `n_jobs=1` result in the synchronous scheduler for everything but distributed. The downside of supporting `n_jobs` here is that we'd probably want the default to match what dask does (`n_jobs = cpu_count()`) instead of what scikit-learn does (`n_jobs=1`). I'm fine with this, but it is a difference.
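To illustrate, here's a minimal sketch of how `n_jobs` could be translated into scheduler arguments; the helper name `_build_get`, the `cpu_count()` default, and the `n_jobs=1` fallback rule are just the options discussed above, not a decided design:

```python
from multiprocessing import cpu_count

import dask
import dask.multiprocessing
import dask.threaded


def _build_get(scheduler="threading", n_jobs=None):
    # Hypothetical: pick a ``get`` function and the keyword arguments that
    # would be forwarded to it.
    if n_jobs is None:
        n_jobs = cpu_count()  # dask-style default, unlike scikit-learn's n_jobs=1
    if scheduler in ("threading", "multiprocessing"):
        if n_jobs == 1:
            # Possible rule: n_jobs=1 falls back to the synchronous scheduler
            # for everything but distributed.
            return dask.get, {}
        get = dask.threaded.get if scheduler == "threading" else dask.multiprocessing.get
        return get, {"num_workers": n_jobs}
    # Other named schedulers ignore n_jobs; a distributed Client would be
    # handled separately.
    return dask.get, {}
```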
If we don't go this route, then I might add a `scheduler_kwargs` parameter instead, which would be forwarded to the `get` call. Not sure if any of the other keyword arguments would prove useful for this library though.
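For reference, forwarding `scheduler_kwargs` would just mean splatting it into the `get` call; a toy example with the threaded scheduler and a made-up graph:

```python
import dask.threaded

# Toy graph standing in for the search's task graph.
graph = {"a": 1, "b": (sum, ["a", "a"])}

scheduler_kwargs = {"num_workers": 2}  # forwarded verbatim to ``get``
result = dask.threaded.get(graph, "b", **scheduler_kwargs)
print(result)  # 2
```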