Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up and fix CPU issues #8

Merged
merged 11 commits into from
Feb 4, 2019
Merged

Clean up and fix CPU issues #8

merged 11 commits into from
Feb 4, 2019

Conversation

lewfish
Copy link
Contributor

@lewfish lewfish commented Jan 28, 2019

This PR does various cleanup tasks, improve documentation, and fixes some issues re: the CPU resources. I tested this by creating the stack using cloudformation/template.yml following the README (the parameter values I used are posted in the #raster-vision Slack channel), and then using the instructions in azavea/raster-vision#668 (review) with the Vegas example.

After this is merged, I plan to rename raster-vision-aws to raster-vision-aws-terraform, and this repo to raster-vision-aws.

Closes #3
Closes #4
Closes #18
Closes azavea/raster-vision#512
Closes #19

@lewfish lewfish force-pushed the lf/clean-up branch 2 times, most recently from d23dc4d to cdff746 Compare January 29, 2019 00:04
* Add more background to intro
* Add sections on Account Setup and Updating RV config
* More info about picking subnets for specific instance types
* More info about adding a ECR repos for "advanced users"
* Remove Terraform info
* Split table with variables to set into respective sections to avoid
confusion
The CPU instances need a different (lower) setting for these variables.
FYI, if the instance memory for a job def is set higher than what the
instance has, the job will get stuck in Runnable with no error message
(including in the Spot console).
@lewfish lewfish requested review from jeancochrane and jamesmcclain and removed request for jamesmcclain January 30, 2019 17:13
@lewfish
Copy link
Contributor Author

lewfish commented Jan 30, 2019

For review, I was hoping that @jeancochrane could look at the changes, and after it is approved, @jamesmcclain could test it by following the whole README.

@lewfish lewfish changed the title WIP: Clean up Clean up and fix CPU issues Jan 30, 2019
@jamesmcclain
Copy link
Contributor

For review, I was hoping that @jeancochrane could look at the changes, and after it is approved, @jamesmcclain could test it by following the whole README.

Yes

Copy link
Contributor

@jeancochrane jeancochrane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for leading up this effort! The setup process is much clearer now, and I appreciate all of the troubleshooting details you added to the CloudFormation instructions.

Couple small comments on the docs and the template order, mostly nitpicks. I'll leave functionality testing to @jamesmcclain.

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
cloudformation/template.yml Outdated Show resolved Hide resolved
@lewfish
Copy link
Contributor Author

lewfish commented Jan 30, 2019

Thanks; fixed according to suggestions.

@lewfish lewfish requested a review from jamesmcclain January 30, 2019 22:51
Copy link
Contributor

@jamesmcclain jamesmcclain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed the instructions on a fresh virtual machine.

The instructions seem to be out of date: there is no create-ami target in the Makefile. Instead I had to build both the packer-image and create-image targets. I was looking at the right instructions but was on the wrong branch.

The instructions say that AWS_PROFILE environment variable must be set, but perhaps it should be emphasized that forgetting to set it will cause the create-ami build to hang indefinitely and have to be stopped from another terminal.

Also, the packer build fails with the latest Deep Learning AMI (Version 21):

screenshot_2019-01-31_19-16-39

That seems to be a well-known issue (see here and here) in various contexts. Using the Version 11 AMI from the screen shot works (for now), but that may not be true in the future and will not help people in other regions. I was not able to find AMIs for older versions from the UI.

The instructions gave me resources named RasterVisionCpuJobQueue, RasterVisionGpuJobQueue, RasterVisionHostedCpuJobDefinition, RasterVisionHostedGpuJobDefinition. These generic name are likely to collide if more than one user follows the instructions. I see some non-generic names (apparently) been created by the setup, but I did not see how to do that in the instructions (update: I found the option on my second run-through, but it does not seem to be clearly spelled-out in the instructions).

Finally, I have attempted to run the Vegas SpaceNet example, but the first CPU job is stuck in the RUNNABLE state. This could mean that that problem is unresolved, or it may be that I am simply using stale resources (see the previous paragraph).

@lewfish
Copy link
Contributor Author

lewfish commented Feb 1, 2019

Also, the packer build fails with the latest Deep Learning AMI (Version 21):

I just had success running the packer job with Deep Learning AMI (Version 21). This makes me think it's an intermittent problem. I made a new issue for this #9 and made a note of it in the README.

I also added other warnings based on your feedback.

@jamesmcclain
Copy link
Contributor

Also, the packer build fails with the latest Deep Learning AMI (Version 21):

I just had success running the packer job with Deep Learning AMI (Version 21). This makes me think it's an intermittent problem. I made a new issue for this #9 and made a note of it in the README.

I also added other warnings based on your feedback.

Sounds good

@lewfish lewfish merged commit c02a8ae into master Feb 4, 2019
@lewfish lewfish deleted the lf/clean-up branch February 4, 2019 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants