Right now, we support a workflow that looks like this:
We'd like to be able to support this:
This will require Dask-CHTC to be able to do spooled remote submits. There will be networking and security implications for this, including the need to acquire HTCondor IDTOKENs. Presumably we could have a CLI command to help out with that.
This is related to #5, because a goal of this workflow is to be able to gather workers from many different places and hook them all up to your central scheduler.
I think we also need to understand how this affects communication overhead. It's possible that workflows that rely on short tasks will be seriously impeded by running like this.
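As a back-of-the-envelope way to think about the short-task concern, here is a minimal sketch. It assumes a deliberately simple model in which every task pays one fixed scheduler round trip and nothing is pipelined, which real Dask scheduling does not strictly follow. The function name `worker_efficiency` and the 50 ms overhead figure are made up for illustration; they are not measurements from CHTC.

```python
def worker_efficiency(task_seconds, overhead_seconds):
    """Fraction of wall time spent on useful work under a simple model:
    each task pays one fixed communication overhead, with no overlap."""
    if task_seconds <= 0 or overhead_seconds < 0:
        raise ValueError("task_seconds must be > 0 and overhead_seconds >= 0")
    return task_seconds / (task_seconds + overhead_seconds)


# Hypothetical per-task round-trip overhead (an assumption, not a measurement).
ASSUMED_OVERHEAD_SECONDS = 0.05

for task_seconds in (0.01, 0.1, 1.0, 60.0):
    eff = worker_efficiency(task_seconds, ASSUMED_OVERHEAD_SECONDS)
    print(f"task={task_seconds:>6.2f}s  efficiency={eff:.1%}")
```

Under this model, tasks much shorter than the round-trip time spend most of their wall time waiting on the network, while minute-long tasks barely notice it. Real numbers would have to come from measuring an actual deployment.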
It would be very nice if we could support non-Linux platforms as well, but it will be annoying. We might be able to swing Windows with special install instructions, but Mac is a non-starter at the moment. Hopefully we'll get improved distribution mechanisms for the HTCondor Python bindings soon-ish.
We ended up running into a major roadblock with the above plan: my home ISP was blocking incoming TLS connections. We can't really control this, so we need to rethink the approach. Braindump below...
The new plan is to do this:
The client will still be on the user's computer, the scheduler will be in Kubernetes, and it will remote-submit to a CHTC schedd. This will work because all communication between the client and the scheduler is initiated by the client (i.e., these are outgoing connections, which ISPs will not block). The connections between the scheduler and the workers will stay inside the CHTC network and won't be blocked either.
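The outgoing-connection point can be illustrated with plain sockets. This is only a sketch using the Python standard library on localhost, not Dask or HTCondor code; `serve_once` and `client_roundtrip` are hypothetical stand-ins for the scheduler and the client. The thing to notice is that the client never listens for anything: the reply travels back over the connection the client opened.

```python
import socket
import threading


def serve_once(listener):
    """Scheduler stand-in: accept one connection and reply on that same socket."""
    conn, _ = listener.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(1024)
            if not data:  # client finished sending
                break
            chunks.append(data)
        conn.sendall(b"result for " + b"".join(chunks))


def client_roundtrip(address, payload):
    """Client stand-in: dial out, send a request, read the reply on the same socket."""
    with socket.create_connection(address, timeout=5) as sock:
        sock.sendall(payload)
        sock.shutdown(socket.SHUT_WR)  # signal end of request
        chunks = []
        while True:
            data = sock.recv(1024)
            if not data:
                break
            chunks.append(data)
        return b"".join(chunks)


# Only the "scheduler" side binds and listens; port 0 picks a free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

server_thread = threading.Thread(target=serve_once, args=(listener,), daemon=True)
server_thread.start()

reply = client_roundtrip(listener.getsockname(), b"task-1")
server_thread.join(timeout=5)
listener.close()
print(reply)
```

In the real setup the listening side would be the scheduler in Kubernetes, so the user's machine only ever makes outbound connections.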
The scheduler will need a remote-submit-enabled IDTOKEN, presumably generated by us on the user's behalf. I think that leads to the big question, which is: how do users "request" that the remote scheduler "service" be started up for them?