
[feature request] Support several Docker configurations in one spotty.yaml (to better support auxiliary on-demand instances) #44

Closed
vadimkantorov opened this issue Jul 30, 2019 · 13 comments

@vadimkantorov commented Jul 30, 2019

Downloading the dataset onto an EBS volume can take many hours. I don't want to use a GPU machine for that.

Unfortunately, I currently cannot instruct Spotty to skip running a Docker image, or to specify a Docker image per instance (the main Docker image fails to start on the t2.micro machine).

Currently I'm using a separate spotty_preprocess.yaml to achieve this goal.

@vadimkantorov changed the title from "[feature request] Start an instance without Docker" to "[feature request] Start an instance without Docker and with a Dockerless AMI" on Jul 30, 2019
@apls777 (Collaborator) commented Jul 30, 2019

Hi Vadim,

Thank you for this feature request, I actually had a similar problem before. Sometimes you just want to run a CPU instance to do some work that doesn't require GPU, for example, analyze your results with Jupyter notebooks, and then you may need a different image that was compiled for CPU.

I was thinking of extending the configuration file so that you can specify several containers and then use them for the instances. Instead of the container parameter, you would use the containers parameter and define several named containers:

```yaml
containers:
  default:
    image: tensorflow/tensorflow:1.14.0-gpu-py3-jupyter
    # ...

  cpu:
    image: tensorflow/tensorflow:1.14.0-py3-jupyter
    # ...
```

And then in the instance parameters, you can redefine the default container:

```yaml
instances:
  - name: i1
    provider: aws
    parameters:
      container: cpu
      # ...
```
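Here is a minimal Python sketch of how the proposed lookup could work. It assumes the YAML has already been parsed into plain dicts (for example, with a YAML library). An instance uses the container named in `parameters.container` and falls back to the one called `default`. The function `resolve_container` is hypothetical and not part of Spotty.

```python
from typing import Any, Dict

# Hypothetical parsed config, shaped like the proposal above.
Config = Dict[str, Any]


def resolve_container(config: Config, instance_name: str) -> Dict[str, Any]:
    """Return the container config that the named instance should use.

    An instance may pick a container via parameters.container;
    otherwise the container called "default" is used.
    """
    containers = config["containers"]
    for instance in config["instances"]:
        if instance["name"] == instance_name:
            name = instance.get("parameters", {}).get("container", "default")
            if name not in containers:
                raise KeyError(f"instance {instance_name!r} refers to unknown container {name!r}")
            return containers[name]
    raise KeyError(f"no instance named {instance_name!r}")


config = {
    "containers": {
        "default": {"image": "tensorflow/tensorflow:1.14.0-gpu-py3-jupyter"},
        "cpu": {"image": "tensorflow/tensorflow:1.14.0-py3-jupyter"},
    },
    "instances": [
        {"name": "i1", "provider": "aws", "parameters": {"container": "cpu"}},
        {"name": "i2", "provider": "aws", "parameters": {}},
    ],
}

print(resolve_container(config, "i1")["image"])  # tensorflow/tensorflow:1.14.0-py3-jupyter
print(resolve_container(config, "i2")["image"])  # tensorflow/tensorflow:1.14.0-gpu-py3-jupyter
```

Failing loudly on an unknown container name catches typos in the instance section before anything is launched.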

As for your case: by design, you're actually not supposed to do any work from the host OS :). All the work should be done inside your custom environment, a Docker container. The -H flag in the spotty ssh command is mostly for debugging purposes. With a container, you could also define a custom script in the scripts parameter to download your dataset. Then you will never forget where it's stored and which command downloads it again.

I believe (but need to double-check) that even a t2.micro instance can run a lightweight container that contains only Bash and the AWS CLI (or whatever you're using). Of course, in theory, it's possible to add an option to the config file to launch an instance without a container at all. But even in that scenario, I don't see why you would need another AMI without Docker installed.

Did you try using a lightweight container with a t2.micro instance? Or are you using your main container with a more powerful CPU instance? What do you think about the named-container logic I described above?

P.S. I'm on vacation for the next 2 weeks, so I'll only be able to give this proper thought and work on this feature when I come back :).

Best regards,
Oleg

@vadimkantorov (Author)

@apls777 Thanks for the detailed response! I did try using a smaller instance. It even managed to boot up with the ubuntu:18.04 container and the Deep Learning AMI. Unfortunately, it was unbearably slow (too little memory, I guess).

Even with GPU instances, Docker sometimes takes 5-10 minutes to start up (DockerReadyWait event).

For this use case of launching a light CPU instance, an option to launch a regular On-Demand instance would probably be useful (downloading a huge dataset onto EBS can take many hours).
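The sketch below illustrates the Spot versus On-Demand difference at the EC2 API level. Spotty itself launches instances through CloudFormation, so this is not how Spotty does it. The sketch builds keyword arguments for boto3's `run_instances`. A Spot request adds `InstanceMarketOptions`, and leaving that key out launches an On-Demand instance. `build_launch_kwargs` and the AMI ID are placeholders.

```python
from typing import Any, Dict


def build_launch_kwargs(image_id: str, instance_type: str, on_demand: bool) -> Dict[str, Any]:
    """Build keyword arguments for boto3's EC2 run_instances call (hypothetical helper)."""
    kwargs: Dict[str, Any] = {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
    }
    if not on_demand:
        # Request Spot capacity; without InstanceMarketOptions EC2 launches On-Demand.
        kwargs["InstanceMarketOptions"] = {
            "MarketType": "spot",
            "SpotOptions": {"SpotInstanceType": "one-time"},
        }
    return kwargs


# Usage (requires AWS credentials; "ami-..." is a placeholder):
# import boto3
# boto3.client("ec2").run_instances(**build_launch_kwargs("ami-...", "t2.micro", on_demand=True))
```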

A few other points I noticed: for some reason, you do not support Nitro-based instances. I commented out the check and everything still worked OK. Also, the list of instance types that you hardcoded has become out of date; some instance types are missing.

Also, sometimes Docker just wouldn't start (no useful error messages in CloudFormation). As a remedy, I disabled the Docker EBS volume, and then things ran normally (for the CPU instance, it's probably OK).

@apls777 (Collaborator) commented Jul 31, 2019

> It even managed to boot up with the ubuntu:18.04 container and the Deep Learning AMI. Unfortunately, it was unbearably slow (too little memory, I guess).

Was it too slow to work with, or did it take a long time to start up?

> Even with GPU instances, Docker sometimes takes 5-10 minutes to start up (DockerReadyWait event).

I think it's a "bug" that I found recently as well. If you have a lot of files on one of your EBS volumes, it may take a long time to start. When Spotty mounts volumes, it changes the ownership of all files from root to ubuntu, which is a completely useless operation. Please try removing line 212 from the instance CF template: https://github.com/apls777/spotty/blob/185968fa26bae14da9127bbd53c05eee6068ec7b/spotty/providers/aws/deployment/cf_templates/data/instance.yaml#L212

The instance should start up much faster.
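The ownership change hurts because a recursive chown has to visit every file and directory. Its cost therefore scales with the number of files, not with the volume size. The minimal Python sketch below assumes the template's step behaves like `chown -R`, and the helper names are hypothetical. With `dry_run=True` it only counts the paths it would touch.

```python
import os
from typing import Iterator


def iter_tree(root: str) -> Iterator[str]:
    """Yield root and every directory and file below it, as a recursive chown visits them."""
    yield root
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            yield os.path.join(dirpath, name)


def chown_recursive(root: str, uid: int, gid: int, dry_run: bool = True) -> int:
    """Change the owner of every path under root and return how many paths were visited.

    Each path costs one metadata update, so a volume holding millions of small
    files means millions of operations, however few bytes they add up to.
    """
    visited = 0
    for path in iter_tree(root):
        if not dry_run:
            # Change the symlink itself rather than the file it points to.
            os.chown(path, uid, gid, follow_symlinks=False)
        visited += 1
    return visited
```

Running it with `dry_run=True` on a dataset directory shows how many metadata updates the mount step would perform.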

> for some reason, you do not support Nitro-based instances. I commented out the check and everything still worked OK.

Nitro-based instances use different device names for attached EBS volumes: see here vs here. At the moment, a Nitro-based instance would fail to start if you attached any volume, so I decided to disable this functionality entirely for now, but it's on the TODO list.
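The sketch below shows one common way to cope with the naming difference, under stated assumptions. On Nitro instances, EBS volumes show up as NVMe devices (`/dev/nvme1n1`, ...) whose numbering need not follow the requested device name. On many AMIs, udev also creates a stable `/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol...` symlink that contains the volume ID without the dash. The helper `find_ebs_device` is hypothetical. It falls back to the name passed in (for example, `/dev/xvdf` on a Xen-based instance).

```python
import os


def find_ebs_device(volume_id: str, requested_name: str,
                    by_id_dir: str = "/dev/disk/by-id") -> str:
    """Return the block device backing an attached EBS volume.

    Assumes the udev symlink naming seen on Nitro instances:
    nvme-Amazon_Elastic_Block_Store_<volume id without the dash>.
    If no such link exists, the requested device name is returned unchanged.
    """
    link_name = "nvme-Amazon_Elastic_Block_Store_" + volume_id.replace("-", "")
    link = os.path.join(by_id_dir, link_name)
    if os.path.exists(link):
        # Resolve the symlink to the real NVMe device, e.g. /dev/nvme1n1.
        return os.path.realpath(link)
    return requested_name
```

Looking volumes up by ID avoids relying on attach order, which is what makes NVMe numbering unpredictable.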

> Also, the list of instance types that you hardcoded has become out of date; some instance types are missing.

Thanks for pointing that out. I'll try to load this list dynamically using the AWS API; otherwise, it's difficult to keep it up to date.
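The sketch below loads the list dynamically with boto3's `describe_instance_types` paginator. Each entry's `Hypervisor` field is `nitro` or `xen` and is absent for some bare-metal types, so the same call could also replace a hardcoded Nitro check. The function name and the `unknown` placeholder are illustrative. The client is passed in so the function can be exercised without AWS credentials.

```python
from typing import Any, Dict


def list_instance_types(ec2_client: Any) -> Dict[str, str]:
    """Map every instance type offered in the client's region to its hypervisor.

    Pages through DescribeInstanceTypes, so new types appear without editing
    a hardcoded list. Types without a Hypervisor field map to "unknown".
    """
    result: Dict[str, str] = {}
    paginator = ec2_client.get_paginator("describe_instance_types")
    for page in paginator.paginate():
        for info in page["InstanceTypes"]:
            result[info["InstanceType"]] = info.get("Hypervisor", "unknown")
    return result


# Usage (requires AWS credentials):
# import boto3
# types = list_instance_types(boto3.client("ec2", region_name="us-east-1"))
# nitro_types = sorted(t for t, hv in types.items() if hv == "nitro")
```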

> Also, sometimes Docker just wouldn't start (no useful error messages in CloudFormation). As a remedy, I disabled the Docker EBS volume, and then things ran normally (for the CPU instance, it's probably OK).

Could it be the ownership-change issue I described above?

@vadimkantorov (Author)

The t2.micro machine took many minutes to start Docker, and then SSH was super unresponsive. Bigger machines also took a long time to start Docker, but after that there were no problems with SSH.

About Docker not starting, it's hard to say: the CloudFormation log just spits out some "unique error ID", which is ungoogleable without an additional error message.

I'll try commenting out the chown line. Thanks!

@apls777 (Collaborator) commented Jul 31, 2019

Also, do you use a custom Dockerfile or an already built image? If you're using a Dockerfile, Spotty builds the image every time it starts an instance. In that case, you may want to consider caching it on a dedicated EBS volume: see here.

@vadimkantorov (Author) commented Jul 31, 2019

Prebuilt image: just ubuntu:18.04. Somehow, caching it on a separate volume seems to have caused the failing Docker build on some instance types (maybe the 10 GB volume was too small?). But I haven't thoroughly confirmed this hypothesis.
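One way to test the insufficient-space hypothesis is to check the free space on the cache volume before a build. Here is a minimal sketch using Python's `shutil.disk_usage`. The helper name is hypothetical, and so is the mount point in the usage note.

```python
import shutil


def has_free_space(path: str, required_gib: float) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gib` GiB free."""
    usage = shutil.disk_usage(path)
    return usage.free >= required_gib * 1024 ** 3
```

For example, `has_free_space("/docker", 10)` would check a cache volume mounted at a hypothetical `/docker` directory. EBS volume sizes are specified in GiB, hence the 1024-based unit.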

@apls777 (Collaborator) commented Jul 31, 2019

Okay, thanks. I'll check what I can do about micro instances once I come back from vacation.

> Somehow, caching it on a separate volume seems to have caused the failing Docker build

If the instance is launched but the Docker container doesn't start for some reason, try connecting to the host OS using the -H flag and check the CloudFormation logs there: the /var/log/cfn-init-cmd.log and /var/log/cfn-init.log files.
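Once you are connected with -H, a small filter can pull the relevant lines out of those logs. This sketch assumes that error lines carry a bracketed level such as `[ERROR]`, so adjust the markers if the log format differs. `find_errors` is a hypothetical helper.

```python
from typing import List, Sequence


def find_errors(log_path: str, markers: Sequence[str] = ("[ERROR]",)) -> List[str]:
    """Return the lines of a log file that contain any of the given markers."""
    matches: List[str] = []
    # errors="replace" keeps reading if the log contains bytes that are not valid UTF-8.
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            if any(marker in line for marker in markers):
                matches.append(line.rstrip("\n"))
    return matches


# On the host OS, for example:
# for path in ("/var/log/cfn-init-cmd.log", "/var/log/cfn-init.log"):
#     print(path, find_errors(path))
```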

@vadimkantorov (Author)

Thanks! The errors I referred to were from the CloudFormation web console, maybe the local logs have more information.

@vadimkantorov (Author)

It would be super useful if the CloudFormation logs were automatically downloaded and shown to the user (no hassle with manually ssh'ing in, as in #48).
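Here is a rough sketch of what such automation could look like. It fetches the two CloudFormation log files over SSH, using `sudo` in case they are readable only by root. The `ubuntu` user matches the host OS user mentioned earlier in this thread. The helper names and the key-file argument are hypothetical, and `run` is injectable so the function can be tested without a real host.

```python
import subprocess
from typing import Callable, Dict, List

CFN_LOGS = ("/var/log/cfn-init.log", "/var/log/cfn-init-cmd.log")


def build_fetch_command(host: str, key_path: str, log_path: str, user: str = "ubuntu") -> List[str]:
    """Return an ssh command that prints one log file from the host OS."""
    return ["ssh", "-i", key_path, f"{user}@{host}", "sudo", "cat", log_path]


def fetch_cfn_logs(host: str, key_path: str,
                   run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> Dict[str, str]:
    """Fetch each CloudFormation log; files that could not be read map to an empty string."""
    logs: Dict[str, str] = {}
    for path in CFN_LOGS:
        proc = run(build_fetch_command(host, key_path, path), capture_output=True, text=True)
        logs[path] = proc.stdout if proc.returncode == 0 else ""
    return logs
```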

@vadimkantorov (Author) commented Aug 22, 2019

Btw, removing chown was crucial, since my dataset drive contains a terabyte of small audio files and chown'ing them takes forever.

My fork is at https://github.com/vadimkantorov/spotty

@vadimkantorov changed the title from "[feature request] Start an instance without Docker and with a Dockerless AMI" to "[feature request] Support several Docker configurations in one spotty.yaml (to better support auxiliary on-demand instances)" on Aug 31, 2019
@vadimkantorov (Author)

Hi @apls777! Any news about this one?

@apls777 (Collaborator) commented Feb 18, 2020

Hi @vadimkantorov, unfortunately I don't have time to work on this feature at the moment, but maybe I will in April.

@apls777 mentioned this issue on Jun 27, 2020
@apls777 mentioned this issue on Oct 25, 2020
@apls777 (Collaborator) commented Oct 25, 2020

Added support for multiple container configurations.

@apls777 closed this as completed on Oct 25, 2020