Repeatable containerized experiments #228
Comments
I think a good approach is to use a private fork of Raster Vision to store branches for experimental runs. These branches should never be deleted, and should follow some sensible naming convention.
To support this (and to support other RV users) we should add an option for the repo URI to use when submitting batch jobs.
To make this work with a private repo, we need to pass GitHub credentials to the Batch job so it can check out the code.
We've discussed a different way of doing this that involves creating a Docker image for each experiment. I will write an ADR on it.
Subsumed by #512
When running a job remotely, it is necessary to specify which version of RV you want to run. We currently specify a branch off of this repo using https://github.com/azavea/raster-vision/blob/develop/src/run_script.sh#L8
This makes it impossible to execute forks of this repo. Using branches is also problematic: if you add a commit while a chain of jobs is running, different jobs in that chain will run different versions of RV. A quick way of fixing this is to allow specifying the repo URI and the commit ID.
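As a rough illustration of the repo URI plus commit ID idea, here is a minimal Python sketch. It is not how `run_script.sh` works today; the function names (`is_full_commit_id`, `checkout_commands`, `checkout`) and the requirement of a full 40-character SHA-1 are assumptions made for the example. It only relies on the standard `git clone` and `git checkout --detach` commands.

```python
import re
import subprocess
from typing import List


def is_full_commit_id(commit_id: str) -> bool:
    """Return True only for a full 40-character hex SHA-1 (assumed format)."""
    return re.fullmatch(r"[0-9a-f]{40}", commit_id) is not None


def checkout_commands(repo_uri: str, commit_id: str, dest: str) -> List[List[str]]:
    """Build the git commands that pin a job to one exact commit.

    Hypothetical helper: branch names and short hashes are rejected so that
    every job in a chain resolves to the same code.
    """
    if not is_full_commit_id(commit_id):
        raise ValueError(f"expected a full commit ID, got {commit_id!r}")
    return [
        ["git", "clone", repo_uri, dest],
        # Detached HEAD: the job runs this commit, not whatever a branch points to.
        ["git", "-C", dest, "checkout", "--detach", commit_id],
    ]


def checkout(repo_uri: str, commit_id: str, dest: str) -> None:
    """Run the commands; raises CalledProcessError if git fails."""
    for cmd in checkout_commands(repo_uri, commit_id, dest):
        subprocess.run(cmd, check=True)
```

A launcher could resolve the commit ID once, when the chain is submitted, and pass the same value to every job, so that later commits to the branch do not change what the remaining jobs run.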
This approach is still problematic, though, because you could delete a commit on a branch while the jobs are running. For the sake of repeatability, we need some way of freezing the code that should be run and archiving it. We could do that by creating a zip file of the repo at the point of launching the Batch jobs and storing it along with the files for that run. I've seen this done before in https://github.com/openai/evolution-strategies-starter/blob/master/scripts/launch.py
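The zip-file idea could look something like the sketch below. It is an assumption-laden example, not the approach used in the linked `launch.py`: the function name `snapshot_source` is hypothetical, it uses only Python's standard `zipfile` and `pathlib` modules, and it simply skips the `.git` directory rather than honoring `.gitignore`.

```python
import zipfile
from pathlib import Path
from typing import List


def snapshot_source(repo_dir: str, zip_path: str) -> List[str]:
    """Zip the working tree under repo_dir and return the archived paths.

    Hypothetical helper: skips anything inside a .git directory and the
    output archive itself, and stores paths relative to repo_dir.
    """
    root = Path(repo_dir).resolve()
    out = Path(zip_path).resolve()
    archived = []
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        # Sort for a stable, reproducible entry order.
        for path in sorted(root.rglob("*")):
            if not path.is_file() or path == out:
                continue
            rel = path.relative_to(root)
            if ".git" in rel.parts:
                continue
            zf.write(path, rel.as_posix())
            archived.append(rel.as_posix())
    return archived
```

The resulting archive would then be uploaded next to the other files for the run, and each Batch job would unpack it instead of checking out a branch. Because this snapshots the working tree, uncommitted changes are included, which may or may not be what you want.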