Error when trying the default config #66

Closed
anmoljagetia opened this issue Feb 9, 2020 · 3 comments

anmoljagetia commented Feb 9, 2020

Creating IAM role for the instance...
Preparing CloudFormation template...
  - volume "lsma-hw1-i1-workspace" will be created
  - volume "lsma-hw1-i1-docker" will be created
  - availability zone: auto
  - maximum Spot Instance price: on-demand
  - AMI: "Deep Learning AMI (Ubuntu 16.04) Version 26.0" (ami-025ed45832b817a35)
  - Docker data will be stored on the "docker" volume

Volumes:
+-----------+---------------+------------+-----------------+
| Name      | Container Dir | Type       | Deletion Policy |
+===========+===============+============+=================+
| workspace | /workspace    | EBS volume | Retain Volume   |
+-----------+---------------+------------+-----------------+
| docker    | -             | EBS volume | Retain Volume   |
+-----------+---------------+------------+-----------------+

Waiting for the stack to be created...
  - launching the instance...
  - waiting for the Docker container to be ready...
Error:
------
Stack "spotty-instance-lsma-hw1-i1" was not created.
Please, see CloudFormation logs for the details.

What is the recommended image for PyTorch? I can't seem to find any recommendations. I even tried the default TensorFlow one, but I get the same error. I guess it's caused by the following error, which I see in the CloudFormation logs when I run this on the instance:

ubuntu@ip-172-31-84-193:~$ sudo tail /var/log/cfn-init-cmd.log
2020-02-09 01:03:12,195 P1844 [INFO]    + MOUNT_DIRS=("/mnt/lsma-hw1-i1-workspace" "/docker")
2020-02-09 01:03:12,195 P1844 [INFO]    + for i in '${!MOUNT_DIRS[*]}'
2020-02-09 01:03:12,195 P1844 [INFO]    + DEVICE=/dev/xvdf
2020-02-09 01:03:12,195 P1844 [INFO]    + MOUNT_DIR=/mnt/lsma-hw1-i1-workspace
2020-02-09 01:03:12,195 P1844 [INFO]    + blkid -o value -s TYPE /dev/xvdf
2020-02-09 01:03:12,195 P1844 [INFO]    + mkfs -t ext4 /dev/xvdf
2020-02-09 01:03:12,195 P1844 [INFO]    mke2fs 1.42.13 (17-May-2015)
2020-02-09 01:03:12,195 P1844 [INFO]    The file /dev/xvdf does not exist and no size was specified.
2020-02-09 01:03:12,195 P1844 [INFO] ------------------------------------------------------------
2020-02-09 01:03:12,195 P1844 [ERROR] Exited with error code 1

Do you know why this could be happening?

My spotty config is the following:

project:
  name: test-hw1
  syncFilters:
    - exclude:
      - .git/*
      - .idea/*
      - '/_pycache_/'

container:
  projectDir: /workspace/project
  image: tensorflow/tensorflow:latest-gpu-py3-jupyter
  ports: [6006, 8888]
  volumeMounts:
    - name: workspace
      mountPath: /workspace

instances:
  - name: i1
    provider: aws
    parameters:
      region: us-east-1
      instanceType: g4dn.xlarge
      dockerDataRoot: /docker
      volumes:
        - name: workspace
          parameters:
            size: 50
            deletionPolicy: retain
        - name: docker
          parameters:
            size: 10
            mountDir: /docker
            deletionPolicy: retain


scripts:
  jupyter: |
    jupyter notebook --allow-root --ip 0.0.0.0 --notebook-dir=/workspace/project

anmoljagetia (Author) commented

Also, the same config works with a p2.xlarge instance but not with g4dn.xlarge. Do you have a list of which instance types are supported and which are not?

apls777 (Collaborator) commented Feb 9, 2020

There is no recommended image for PyTorch, just use the latest one or whatever suits you.

All G4 instances are Nitro-based, and unfortunately Nitro-based instances are not supported right now (see this issue). Spotty has a hard-coded blacklist of such instance types and is supposed to show you an error, but the G4 instances are new and I haven't added them yet. You can see the full list here: Nitro-based Instances.
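
For illustration, an early check of the kind described here could look like the minimal sketch below. The function names and the family list are hypothetical and are not taken from Spotty's source; the list is deliberately incomplete and only shows the idea of failing fast before any volume is mounted.

```python
# Hypothetical sketch: reject instance types whose family is known to be Nitro-based.
# The set below is illustrative and incomplete; it is NOT Spotty's actual blacklist.
NITRO_FAMILIES = frozenset({"c5", "m5", "r5", "t3", "g4dn"})


def instance_family(instance_type: str) -> str:
    """Return the family part of an EC2 instance type, e.g. 'g4dn' for 'g4dn.xlarge'."""
    family, sep, size = instance_type.partition(".")
    if not sep or not family or not size:
        raise ValueError(f"not a valid instance type: {instance_type!r}")
    return family


def check_instance_type(instance_type: str) -> None:
    """Raise early instead of failing later while formatting and mounting volumes."""
    if instance_family(instance_type) in NITRO_FAMILIES:
        raise ValueError(
            f"{instance_type} is a Nitro-based instance type and is not supported"
        )
```

With this sketch, `check_instance_type("p2.xlarge")` returns silently, while `check_instance_type("g4dn.xlarge")` raises a `ValueError` before the stack is created.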

apls777 mentioned this issue Jun 27, 2020
apls777 mentioned this issue Oct 25, 2020

apls777 (Collaborator) commented Oct 25, 2020

Now Spotty supports Nitro-based instances.
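
As background on why these instances needed special handling: on Nitro-based instances, EBS volumes are exposed as NVMe block devices (for example `/dev/nvme1n1`), so a path such as `/dev/xvdf` from the log above does not exist there. The sketch below shows one general way to resolve a volume to its device path. It assumes the `/dev/disk/by-id` symlinks that Linux creates for NVMe EBS volumes; the helper names and the fallback logic are hypothetical and do not describe how Spotty implements its support.

```python
import os


def nvme_symlink_name(volume_id: str) -> str:
    # Assumption: the symlink is named after the volume ID without its hyphen,
    # e.g. "vol-0123456789abcdef0" -> "nvme-Amazon_Elastic_Block_Store_vol0123456789abcdef0".
    return "nvme-Amazon_Elastic_Block_Store_" + volume_id.replace("-", "")


def resolve_ebs_device(volume_id: str, fallback: str,
                       by_id_dir: str = "/dev/disk/by-id") -> str:
    """Return the NVMe device for the volume if its symlink exists, else the fallback."""
    link = os.path.join(by_id_dir, nvme_symlink_name(volume_id))
    if os.path.exists(link):
        # Follow the symlink to the real device node.
        return os.path.realpath(link)
    # No NVMe symlink found: assume the classic device name is still valid.
    return fallback
```

On an instance where the symlink is missing, `resolve_ebs_device("vol-0123456789abcdef0", "/dev/xvdf")` simply returns `/dev/xvdf`.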

apls777 closed this as completed Oct 25, 2020