proposal: autodiscovery of cluster spec files #61
Comments
Yay! I could imagine adding a config option here that could be pointed to a YAML spec. That way it could be configured either in the Dask YAML config or as an environment variable.
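A minimal sketch of what setting that option could look like, assuming a hypothetical `ctl.cluster-spec` key (the real key name would be decided in the implementation):

```python
import dask.config

# Hypothetical config key pointing dask-ctl at a cluster spec file;
# it could equally live in the Dask YAML config or, following Dask's
# config conventions, come from an environment variable such as
# DASK_CTL__CLUSTER_SPEC=/path/to/cluster-spec.yaml
dask.config.set({"ctl.cluster-spec": "/path/to/cluster-spec.yaml"})

print(dask.config.get("ctl.cluster-spec"))
```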
Also thinking more about this, if you create a bare client with distributed and provide no config at all it instantiates a `LocalCluster`:

```python
>>> from dask.distributed import Client
>>> client = Client()  # Leave all config as defaults
>>> type(client.cluster)
<class 'distributed.deploy.local.LocalCluster'>
```

I wonder if we could add a hook there so that, if you have a spec configured, the bare client would create that cluster instead.

@fjetter what do you think about that idea?
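To make the idea concrete, here is a rough sketch (not an existing API) of what such a default-cluster hook could do, assuming the hypothetical `ctl.cluster-spec` key from above and dask-ctl's `create_cluster` helper:

```python
import dask.config
from distributed import LocalCluster
from dask_ctl import create_cluster  # creates a cluster from a spec file


def default_cluster():
    # Hypothetical hook: a bare Client() would call something like this
    # instead of unconditionally building a LocalCluster.
    spec_path = dask.config.get("ctl.cluster-spec", None)
    if spec_path is not None:
        return create_cluster(spec_path)
    return LocalCluster()
```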
This reminds me slightly of dask/distributed#6792 which, to some part, also discusses how to manage hooks to cluster instances and how to simplify UX around this. The path-to-spec bit is surely different but the hook-to-implementation thing sounds similar. Does it make sense to push on that ticket first? Generally speaking I like the suggestion to simplify the user API by "hiding" clusters and offering only clients as a user-facing API, but I don't have a very strong opinion either way.
If a hook is all you need we can add a hook. I'm not too familiar with dask-ctl, but the need to change the "default cluster" has been mentioned frequently, so we should get started on it one way or the other.
My two cents here, with the obvious caveat that I'm not super experienced with dask so am potentially missing obvious reasons this is bad: dask-ctl seems to be the general control plane, and there are scenarios where I would want to have multiple clusters configured, e.g. for gpu/cpu work in an HPC environment. Given that dask-ctl already allows for discovery of existing clusters by name, we could then reference many 'named' yaml specs in the dask-ctl config and create them at will with the API proposed above.

edit: extra thought, if creating from name we would probably want to add a check that the named cluster doesn't already exist (rough sketch below).

@jacobtomlinson as a stepping stone to implementing dask-ctl discovery in dask-jobqueue I just put together dask/dask-jobqueue#604. If you have a bit of time a review over there would be appreciated. The implementation goes against what you suggested in some earlier discussion (dask/dask-jobqueue#543) but is, as far as I can tell, required for autodiscovery in dask-ctl.
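A rough sketch of that existence check, assuming dask-ctl's `get_cluster` and `create_cluster` helpers (the create-from-name behaviour itself is the proposal here, not something dask-ctl does today):

```python
from dask_ctl import get_cluster, create_cluster


def create_cluster_by_name(name):
    # If a cluster with this name is already running, connect to it
    # rather than creating a second one.
    try:
        return get_cluster(name)
    except Exception:
        # Assumed behaviour: get_cluster fails when no running cluster
        # matches the name, in which case we fall through and create it.
        pass
    # "name" would be looked up in the dask-ctl config rather than
    # treated as a path to a spec file.
    return create_cluster(name)
```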
@alisterburt yeah that makes a lot of sense. I think the PR you raised in dask-jobqueue is a good step. We could add a section to the dask-ctl config for cluster templates, where a template is either defined inline or points at a spec file on disk:

```yaml
# ctl.yaml
ctl:
  cluster-templates:
    pbs:
      version: 1
      module: "dask_jobqueue"
      class: "PBSCluster"
      args: []
      kwargs:
        cores: 36
        memory: 100GB
        queue: regular
    custom-pbs-cluster: "/path/to/custom-pbs-cluster.yaml"
```

A named template could then be created like this:

```python
from dask_ctl import create_cluster

cluster = create_cluster("pbs")
client = cluster.get_client()
```
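For illustration, a rough sketch (not dask-ctl's actual implementation) of how a name could be resolved against the proposed cluster-templates section:

```python
import importlib

import dask.config
from dask_ctl import create_cluster


def cluster_from_template(name):
    templates = dask.config.get("ctl.cluster-templates", {})
    template = templates[name]
    if isinstance(template, str):
        # A plain string points at a standalone spec file on disk,
        # e.g. the "custom-pbs-cluster" entry above.
        return create_cluster(template)
    # Otherwise build the cluster class directly from the inline template.
    module = importlib.import_module(template["module"])
    cluster_cls = getattr(module, template["class"])
    return cluster_cls(*template.get("args", []), **template.get("kwargs", {}))
```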
@jacobtomlinson brilliant! I will find some time over the next few days and submit a PR here. I think having the mechanism in dask-jobqueue separately is also useful.
I recently picked the thread up from dask/dask-jobqueue#543 and dask/dask-jobqueue#544 and was super happy to find that arbitrary cluster configuration from a yaml spec is working really well with dask-ctl on the first try now, great work!
What do you think about having `dask_ctl.create_cluster()` autodiscover a yaml spec if one isn't provided? Do you see where this might fit into the existing dask config structure? Thanks!
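For concreteness, one possible autodiscovery fallback when no spec is passed (purely illustrative; the config key name is an assumption, not decided behaviour):

```python
import dask.config
from dask_ctl import create_cluster


def create_cluster_with_autodiscovery(spec_path=None):
    if spec_path is None:
        # Hypothetical config key; an env var like DASK_CTL__CLUSTER_SPEC
        # would map onto the same key via Dask's config machinery.
        spec_path = dask.config.get("ctl.cluster-spec", None)
    if spec_path is None:
        raise ValueError("no cluster spec provided and none found in config")
    return create_cluster(spec_path)
```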