Allow easier installation of extra packages #25
I suspect there is a way to avoid this work, especially because it appears Dask-CHTC uses dask-docker. The daskdev/dask Dockerfile already supports installing extra packages at container startup, so much of this might come for free.
Unfortunately, I don't think we can assume the user is inheriting from daskdev/dask. However, I do like the idea of the entrypoint being able to install extra packages, and there's no reason we can't implement something similar in our own entrypoint script. This is pretty foreign to us in CHTC-land because we usually want people to bake everything into their image up-front, but I like the added flexibility this provides. My instinct is to provide a way to pass the extra packages through the CHTCCluster constructor.
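A sketch of how that entrypoint hook could look, assuming a hypothetical `EXTRA_PIP_PACKAGES` environment variable in the style of the daskdev/dask images (the helper name and the variable are illustrative, not an existing Dask-CHTC interface):

```python
import sys


def pip_install_command(env):
    """Build the pip invocation for packages requested via EXTRA_PIP_PACKAGES.

    `env` is a mapping like os.environ; returns None when nothing was requested.
    The entrypoint script would run this command before starting the worker.
    """
    packages = env.get("EXTRA_PIP_PACKAGES", "").split()
    if not packages:
        return None
    return [sys.executable, "-m", "pip", "install", *packages]
```

The entrypoint would execute the returned command (e.g. via `subprocess.check_call`) before exec-ing the worker process, so users pay the install cost once per container start instead of rebuilding their image.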
That's the main motivation for this issue, and the reason I specified a CPU node in #25 (comment). I've re-titled this issue to more accurately reflect my concern.
Conda environments are just YAML files; if we're doing this from Python, we should support dictionaries (i.e., parsed YAML files) too.
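Since a parsed environment file is just a mapping of lists, accepting a dict is cheap: it can be serialized back to environment.yml text without any extra dependency. A minimal sketch for the flat case (the helper name is hypothetical; a real implementation would use PyYAML to handle nested sections like `pip:`):

```python
def conda_env_to_yaml(env: dict) -> str:
    """Serialize a flat environment.yml-style dict (lists of strings) to YAML text."""
    lines = []
    for key, values in env.items():
        lines.append(f"{key}:")
        lines.extend(f"  - {value}" for value in values)
    return "\n".join(lines) + "\n"
```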
Maybe add arguments for the other install options daskdev/dask supports?

```python
CHTCCluster(
    ...,
    conda_env: Optional[Union[str, Path, dict]] = None,
    pip_packages: Optional[List[str]] = None,
    apt_packages: Optional[List[str]] = None,
    conda_packages: Optional[List[str]] = None,
)
```
Agreed! I'm not sure we can reuse daskdev/dask's mechanism directly, but I suppose that since we're doing the installation in our own wrapper script, we could do the equivalent there.
I was operating under the assumption Dask-CHTC would still use the daskdev/dask Docker images, and install extra packages on top of that image. Is that not what you were thinking? daskdev/dask supports extra pip/conda/apt packages and an environment.yml file (prepare.sh). Can't Dask-CHTC do some preprocessing before worker launch to set the correct environment variables and copy/write the environment file? If not, I'm fine with an environment file or extra pip/conda packages.
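That preprocessing could be as simple as mapping the constructor arguments onto the variables prepare.sh reads. A sketch, assuming the `EXTRA_*_PACKAGES` names used by the daskdev/dask images (the helper function itself is hypothetical):

```python
def extra_package_env_vars(pip_packages=None, conda_packages=None, apt_packages=None):
    """Translate package lists into the environment variables the daskdev/dask
    prepare.sh entrypoint consumes; None/empty arguments are omitted."""
    requested = {
        "EXTRA_PIP_PACKAGES": pip_packages,
        "EXTRA_CONDA_PACKAGES": conda_packages,
        "EXTRA_APT_PACKAGES": apt_packages,
    }
    return {name: " ".join(pkgs) for name, pkgs in requested.items() if pkgs}
```

Dask-CHTC would merge the resulting dict into the worker job's environment before submission.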
So if I want to use GPUs, would it be possible to do this?

```python
CHTCCluster(
    image="pytorch/pytorch:1.5.1-cuda10.1-cudnn7-runtime",
    conda_env={"dependencies": ["pytorch", "torchvision"], "channels": ["conda-forge"]},
)
```

That's pretty much what the GPU example in the docs does: inherits from the PyTorch image, installs the requirements, then launches the worker.
I think it would be very close... I'll need to investigate a little. We might be able to get away with installing tini ourselves and then using it in the entrypoint script.
…Docker image (partial progress on #25)
Per discussion in #31, this is much harder than expected. The current plan is to transparently build Docker images on our build nodes to solve this problem. Using

```python
CHTCCluster(
    image="pytorch/pytorch:1.5.1-cuda10.1-cudnn7-runtime",
    conda_env={"dependencies": ["pytorch", "torchvision"], "channels": ["conda-forge"]},
)
```

would implicitly submit a build job that builds a Docker image with the requested packages installed on top of the given base image.

Since building the image will take some time, something will need to block. The best option is probably (unfortunately) to block during cluster construction.
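The blocking step could be a simple poll of the build job's status. A sketch, with a hypothetical `poll_status` callable standing in for querying the build job's state in the HTCondor queue:

```python
import time


def wait_for_build(poll_status, interval=5.0, timeout=600.0):
    """Block until the image build job completes.

    poll_status() returns "pending", "done", or "failed"; in practice it
    would query the job queue for the build job's state.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = poll_status()
        if status == "done":
            return
        if status == "failed":
            raise RuntimeError("image build job failed")
        time.sleep(interval)
    raise TimeoutError("image build did not finish within the timeout")
```

Cluster construction would call this after submitting the build job and before submitting any workers that reference the freshly built image.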
Let's say I want to create a custom Docker image for CPU workers. The docs say that I should follow a multi-step process.

That's a fair amount of work, especially if the user is unfamiliar with Docker. It'd be nice to avoid that work.