-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean up and fix CPU issues #8
Conversation
d23dc4d
to
cdff746
Compare
* Add more background to intro * Add sections on Account Setup and Updating RV config * More info about picking subnets for specific instance types * More info about adding a ECR repos for "advanced users" * Remove Terraform info * Split table with variables to set into respective sections to avoid confusion
The CPU instances need a different (lower) setting for these variables. FYI, if the instance memory for a job def is set higher than what the instance has, the job will get stuck in Runnable with no error message (including in the Spot console).
For review, I was hoping that @jeancochrane could look at the changes, and after it is approved, @jamesmcclain could test it by following the whole README. |
Yes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for leading up this effort! The setup process is much clearer now, and I appreciate all of the troubleshooting details you added to the CloudFormation instructions.
Couple small comments on the docs and the template order, mostly nitpicks. I'll leave functionality testing to @jamesmcclain.
Thanks; fixed according to suggestions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I followed the instructions on a fresh virtual machine.
The instructions seem to be out of date: there is no I was looking at the right instructions but was on the wrong branch.create-ami
target in the Makefile. Instead I had to build both the packer-image
and create-image
targets.
The instructions say that AWS_PROFILE
environment variable must be set, but perhaps it should be emphasized that forgetting to set it will cause the create-ami
build to hang indefinitely and have to be stopped from another terminal.
Also, the packer build fails with the latest Deep Learning AMI (Version 21):
That seems to be a well-known issue (see here and here) in various contexts. Using the Version 11 AMI from the screen shot works (for now), but that may not be true in the future and will not help people in other regions. I was not able to find AMIs for older versions from the UI.
The instructions gave me resources named RasterVisionCpuJobQueue
, RasterVisionGpuJobQueue
, RasterVisionHostedCpuJobDefinition
, RasterVisionHostedGpuJobDefinition
. These generic name are likely to collide if more than one user follows the instructions. I see some non-generic names (apparently) been created by the setup, but I did not see how to do that in the instructions (update: I found the option on my second run-through, but it does not seem to be clearly spelled-out in the instructions).
Finally, I have attempted to run the Vegas SpaceNet example, but the first CPU job is stuck in the RUNNABLE
state. This could mean that that problem is unresolved, or it may be that I am simply using stale resources (see the previous paragraph).
I just had success running the packer job with Deep Learning AMI (Version 21). This makes me think it's an intermittent problem. I made a new issue for this #9 and made a note of it in the README. I also added other warnings based on your feedback. |
Sounds good |
This PR does various cleanup tasks, improve documentation, and fixes some issues re: the CPU resources. I tested this by creating the stack using
cloudformation/template.yml
following the README (the parameter values I used are posted in the#raster-vision
Slack channel), and then using the instructions in azavea/raster-vision#668 (review) with the Vegas example.After this is merged, I plan to rename
raster-vision-aws
toraster-vision-aws-terraform
, and this repo toraster-vision-aws
.Closes #3
Closes #4
Closes #18
Closes azavea/raster-vision#512
Closes #19