New model upload format #17

mharvan · 2020-08-03T21:23:44Z

Fixes #16

Model format is compatible with
https://docs.aws.amazon.com/deepracer/latest/developerguide/deepracer-troubleshooting-service-migration-errors.html

The current best way for local training seems to be:

1. Upload the model to S3.
2. Import the model from S3 in the AWS DeepRacer console.

adam-aph · 2020-08-06T11:10:34Z

upload-current.sh

+cp -v $MODEL_FILE checkpoint/model/
+cp -v $METADATA_FILE checkpoint/model/
+
+CHECKPOINT_FILES=$MODEL_DIR/${CHECKPOINT}*


It seems that, according to the spec, we need the current checkpoint and the previous one (so 2 checkpoint sets). So either the previous checkpoint should be copied as well, or simply all checkpoints (there are two sets anyway, one for last and one for best), i.e. CHECKPOINT_FILES=$MODEL_DIR/*.ckpt.*

mharvan · 2020-08-06

It works for race submission with a single checkpoint. Does anything break without 2 checkpoints?

If we really need multiple checkpoints, then we should upload all of them. Uploading all checkpoints takes longer, so I would only do that if it is really needed.
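
The two selection strategies discussed in this thread could be sketched as follows. This is only an illustration, assuming checkpoint files carry the checkpoint identifier as a file name prefix (as the CHECKPOINT_FILES=$MODEL_DIR/${CHECKPOINT}* line suggests). The function name select_checkpoint_files and the "all" keyword are hypothetical and not part of upload-current.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the checkpoint files to upload, one per line.
#   $1: directory holding the checkpoints (MODEL_DIR in the script)
#   $2: checkpoint file name prefix, or "all" for every *.ckpt.* file
select_checkpoint_files() {
  local model_dir=$1 checkpoint=$2 f
  if [ "$checkpoint" = "all" ]; then
    # Reviewer's suggestion: take every checkpoint set.
    for f in "$model_dir"/*.ckpt.*; do
      [ -e "$f" ] && printf '%s\n' "$f"
    done
  else
    # Behaviour in the PR: take only the files of one checkpoint.
    for f in "$model_dir"/"$checkpoint"*; do
      [ -e "$f" ] && printf '%s\n' "$f"
    done
  fi
  # An unmatched glob prints nothing; it is not treated as an error here.
  return 0
}
```

Calling select_checkpoint_files "$MODEL_DIR" "$CHECKPOINT" keeps the faster single-checkpoint upload, while select_checkpoint_files "$MODEL_DIR" all trades upload time for completeness.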

adam-aph · 2020-08-06T11:12:23Z

upload-current.sh

-  aws s3 cp $filename s3://$S3_BUCKET/$S3_PREFIX/model/
-done
+# Cleanup upload destination
+aws s3 rm --recursive s3://$S3_BUCKET/$S3_PREFIX/


This removes all the other folders under the prefix, which makes AWS unhappy. Change it to: aws s3 rm s3://$S3_BUCKET/$S3_PREFIX/model --recursive

mharvan · 2020-08-06

I forgot to include the upload-template directory, so some files were missing, and that was causing the issues.

I would always clean up the whole prefix to ensure that only newly uploaded files are present. Otherwise, leftover files from earlier uploads could affect the import and your model.
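
Since the cleanup deletes everything under the prefix, it may be worth validating the target first. The sketch below is one possible guard, not something the PR contains: the helper name cleanup_target is hypothetical, and it only builds the URI so that an unset S3_BUCKET or S3_PREFIX fails loudly instead of producing a malformed path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the S3 URI to clean before an upload.
#   $1: bucket name (S3_BUCKET), $2: key prefix (S3_PREFIX)
cleanup_target() {
  local bucket=$1 prefix=$2
  # Strip trailing slashes so the URI ends in exactly one separator.
  while [ "${prefix%/}" != "$prefix" ]; do
    prefix=${prefix%/}
  done
  if [ -z "$bucket" ] || [ -z "$prefix" ]; then
    echo "refusing to clean up: bucket or prefix is empty" >&2
    return 1
  fi
  printf 's3://%s/%s/\n' "$bucket" "$prefix"
}

# Usage, left commented out because it needs AWS credentials.
# --dryrun only lists what would be deleted:
# target=$(cleanup_target "$S3_BUCKET" "$S3_PREFIX") &&
#   aws s3 rm --recursive --dryrun "$target"
```

Dropping --dryrun after checking the listing performs the actual deletion of the whole prefix, as argued above.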

adam-aph · 2020-08-06T11:13:20Z

upload-current.sh


-tar -czvf ${CHECKPOINT}-checkpoint.tar.gz checkpoint/*
+# Upload files to s3
+aws s3 sync checkpoint/ s3://$S3_BUCKET/$S3_PREFIX/


Here it should be just: aws s3 sync checkpoint/model s3://$S3_BUCKET/$S3_PREFIX/model

mharvan · 2020-08-06

The goal is to always upload a complete model with all required files, not just the model files. That also includes reward_function.py and ip/hyperparameters.json.

The tar is needed to keep an archive of what was uploaded. Once a new best model is found, SageMaker deletes the old best model, so you would no longer have a copy.
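
The two points above (upload a complete model, keep an archive of it) could be combined roughly as follows. This is a sketch under assumptions: the staging directory is taken to be checkpoint/ with reward_function.py and ip/hyperparameters.json directly beneath it, and the function names check_upload_complete and archive_upload are hypothetical. The archive name follows the ${CHECKPOINT}-checkpoint.tar.gz pattern from the diff.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that the staging directory holds the
# non-model files mentioned in this thread. Returns 1 if any is missing.
check_upload_complete() {
  local staging_dir=$1 missing=0 f
  for f in reward_function.py ip/hyperparameters.json; do
    if [ ! -f "$staging_dir/$f" ]; then
      echo "missing required file: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical helper: archive the staging directory and print the
# archive path.
#   $1: staging directory, $2: checkpoint name, $3: output dir (default .)
archive_upload() {
  local staging_dir=$1 checkpoint=$2 out_dir=${3:-.}
  local archive="$out_dir/${checkpoint}-checkpoint.tar.gz"
  # -C stores paths relative to the staging directory itself.
  tar -czf "$archive" -C "$staging_dir" . || return 1
  printf '%s\n' "$archive"
}
```

Running check_upload_complete checkpoint && archive_upload checkpoint "$CHECKPOINT" before the sync would leave a local copy of exactly what was uploaded.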

DarrenBro reviewed Aug 4, 2020

adam-aph reviewed Aug 6, 2020

Template for new model upload format

814565b

Add missing directory with upload template. The template includes a generic reward function and generic hyperparameters.
