This repository has been archived by the owner on Jun 30, 2021. It is now read-only.

User eXperience improvements #12

Merged
merged 25 commits on Jun 18, 2020
Changes from all commits
Commits (25)
649f8c7
Update .gitignore
Maciej-GBS May 26, 2020
60d4311
Suppressed deprecation warning
Maciej-GBS May 28, 2020
361da2d
Symbolic link to custom_files - convenient shortcut
Maciej-GBS May 28, 2020
1101945
Set default image to gpu, more newbie friendly config (local desktop)
Maciej-GBS May 28, 2020
4177731
Convenience program for training files cleanup
Maciej-GBS May 28, 2020
25cdfb9
Moved hyperparameters to special json file
Maciej-GBS May 28, 2020
88d691e
Fixed mismatched track in config.env
Maciej-GBS May 28, 2020
1f3685e
Delete last now removes symlinks as well
Maciej-GBS May 31, 2020
4db83d0
Increased wait time to 30s
Maciej-GBS Jun 2, 2020
b628155
Local copy script for easy exporting from runtime files
Maciej-GBS Jun 2, 2020
5b5d51c
Local backup now copies reward as well
Maciej-GBS Jun 3, 2020
a0a5045
Hyperparams.json now accepts pretrained field
Maciej-GBS Jun 3, 2020
b1e0cc8
Moved hyperparams section lower
Maciej-GBS Jun 3, 2020
28abe84
Dirty config
Maciej-GBS Jun 3, 2020
b6d9e4e
Uploaded local-copy and created mk-model script
Maciej-GBS Jun 4, 2020
b8e28eb
Clean configuration
Maciej-GBS Jun 4, 2020
6bd00e6
Changed default track to reinvent_base
Maciej-GBS Jun 6, 2020
f512d1a
Reduced speed in model_metadata
Maciej-GBS Jun 6, 2020
cadb06f
Hyperparam pretrained is now true/false flag
Maciej-GBS Jun 6, 2020
987f689
Merge branch 'master' into ux-improvements
Maciej-GBS Jun 6, 2020
6523db3
Updated README.md - hyperparams information
Maciej-GBS Jun 6, 2020
a12ea16
Merge branch 'ux-improvements' of github.com:Maciej-GBS/deepracer-loc…
Maciej-GBS Jun 6, 2020
55d84dc
Update README.md - script descriptions
Maciej-GBS Jun 6, 2020
2277739
Resolved track mismatch
Maciej-GBS Jun 6, 2020
7495dc0
Fixed error in mk-model script
Maciej-GBS Jun 14, 2020
5 changes: 4 additions & 1 deletion .gitignore
@@ -1,5 +1,8 @@
data/robomaker
data/minio/bucket/current
data/minio/bucket/rl-deepracer-pretrained
data/minio/bucket/DeepRacer-Metrics
data/minio/.minio.sys
.idea
**/.idea
**/.idea
__pycache__
12 changes: 10 additions & 2 deletions README.md
@@ -30,7 +30,9 @@ Very rough guide for use (details to come):
- run `./start-training.sh` to start training
- view docker logs to see if it's working (automatic if `tmux` is installed)
- run `./stop-training.sh` to stop training.
- run `./delete_last_run.sh` to clear out the buckets for a fresh run.
- run `./delete_last_run.sh` to clear out the buckets for a fresh run. For a convenient version without a sudo prompt, check out `utilities/delete-last.c`.
- run `./local-copy.sh <model_backup_name>` to back up the current model files into the `MODELS` directory (`../models` by default).
- run `./mk-model.sh <model_path>` to create a `.tar.gz` file from your model that can be uploaded to the physical car. (Will be removed in a future update once the file is generated correctly after training.) An example workflow is sketched below.
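
For illustration, a typical backup-and-package sequence might look like this; `my-model-v1` is a placeholder name, and the paths assume the default `MODELS=../models` used by `local-copy.sh`:

```bash
# back up the current run (rl_coach logs, model checkpoints, reward.py)
./local-copy.sh my-model-v1

# package the backed-up model into a .tar.gz for the physical car
./mk-model.sh ../models/my-model-v1
# -> ../models/my-model-v1/output.tar.gz
```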

The first run will likely take quite a while to start as it needs to pull over 10GB of all the docker images.
You can avoid this delay by pulling the images in advance:
@@ -40,13 +42,16 @@ You can avoid this delay by pulling the images in advance:
- `docker pull mattcamp/dr-coach`
- `docker pull minio/minio`

## Modifying parameters
Training hyperparameters are loaded from `src/rl_coach_2020_v2/hyperparams.json`; a shortcut link to this file has been created in the root directory. The available options are exactly the same as before, with one new option, `pretrained`, that simplifies enabling pretrained mode (see the sketch below).
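
For illustration only, a trimmed `hyperparams.json` showing the new flag; note that the loader in `rl_deepracer_coach_robomaker.py` expects every key from the full file (added further down in this diff), so edit that file in place rather than replacing it with a subset like this:

```json
{
  "batch_size": 64,
  "lr": 0.0003,
  "discount_factor": 0.999,
  "pretrained": "true"
}
```

With `pretrained` set to `"true"`, training starts from the model stored under the `rl-deepracer-pretrained` prefix of the bucket.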

## Video stream

The video stream is available either via a web stream or via Kinesis.

Owner:

Please also add some notes regarding the new location of where to edit the hyperparameters and the additional scripts you have added.

Contributor Author:

Should be all right now.

### Web stream:

The web video stream is exposed on port 8888. If you're running a local browser then you should be able to browse directly to http://127.0.0.1:8888/stream_viewer?topic=/racecar/deepracer/kvs_stream once Robomaker has started.
The web video stream is exposed on port 8888. If you're running a local browser then you should be able to browse directly to `http://127.0.0.1:8888/stream_viewer?topic=/racecar/deepracer/kvs_stream` once Robomaker has started.

### Kinesis stream:

@@ -62,6 +67,9 @@ Kinesis video is a stream of approx 1.5Mbps so beware the impact on your AWS cos

Once working the stream should be visible in the Kinesis console.

### VNC
You can enter the running environment with a VNC viewer at `localhost:8080`, as shown below.
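
For example, assuming a TigerVNC- or RealVNC-style client is installed (the double colon tells the viewer to treat 8080 as a TCP port rather than a display number):

```bash
vncviewer localhost::8080
```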

## Known issues:
- Sometimes sagemaker won't start claiming that `/opt/ml/input/config/resourceconfig.json` is missing. Still trying to work out why.
- Stopping training at the wrong time seems to cause a problem where sagemaker will crash next time when trying to load the 'best' model which may not exist properly. This only happens if you start a new training session without clearing out the bucket first. Yet to be seen if this will cause a problem when trying to use pretrained models.
12 changes: 6 additions & 6 deletions config.env
@@ -1,14 +1,14 @@
ALTERNATE_DRIVING_DIRECTION=False
APP_REGION=us-east-1
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
AWS_ACCESS_KEY_ID=minio
AWS_SECRET_ACCESS_KEY=miniokey
CHANGE_START_POSITION=False
GPU_AVAILABLE=True
KINESIS_VIDEO_STREAM_NAME=dr-kvs-local
LOCAL=True
MINIO_ACCESS_KEY=your_aws_access_key
MINIO_SECRET_KEY=your_aws_secret_key
MINIO_ACCESS_KEY=minio
MINIO_SECRET_KEY=miniokey
MODEL_METADATA_FILE_S3_KEY=custom_files/model_metadata.json
MODEL_S3_BUCKET=bucket
MODEL_S3_PREFIX=current
@@ -22,5 +22,5 @@ ENABLE_KINESIS=false
ENABLE_GUI=true
ENABLE_GPU_TRAINING=true
ENABLE_LOCAL_DESKTOP=false
ENABLE_TMUX=true
MIN_EVAL_TRIALS=5
ENABLE_TMUX=false
MIN_EVAL_TRIALS=5
1 change: 1 addition & 0 deletions custom_files
14 changes: 7 additions & 7 deletions data/minio/bucket/custom_files/model_metadata.json
@@ -7,27 +7,27 @@
},
{
"steering_angle": -20,
"speed": 1.3333333333333333,
"speed": 1.2,
"index": 1
},
{
"steering_angle": -10,
"speed": 2,
"speed": 1.2,
"index": 2
},
{
"steering_angle": 0,
"speed": 2.5,
"speed": 1.2,
"index": 3
},
{
"steering_angle": 10,
"speed": 2,
"speed": 1.2,
"index": 4
},
{
"steering_angle": 20,
"speed": 1.3333333333333333,
"speed": 1.2,
"index": 5
},
{
@@ -37,8 +37,8 @@
}
],
"sensor": [
"STEREO_CAMERAS"
"FRONT_FACING_CAMERA"
],
"neural_network": "DEEP_CONVOLUTIONAL_NETWORK_SHALLOW",
"version": "2"
}
}
2 changes: 1 addition & 1 deletion data/minio/bucket/custom_files/reward.py
@@ -42,4 +42,4 @@ def reward_function(params):
# the two aspects above
reward += 1.0 * reward_lane + 4.0 * reward_avoid

return reward
return reward
3 changes: 1 addition & 2 deletions data/minio/bucket/custom_files/training_params.yaml
@@ -20,6 +20,5 @@ METRIC_NAME: "TrainingRewardScore"
CAR_COLOR: "Purple"
TARGET_REWARD_SCORE: "None"
NUMBER_OF_OBSTACLES: "3"
CHANGE_START_POSITION: "true"
Owner:

Why was this param removed?

Contributor Author (Maciej-GBS, Jun 16, 2020):

> Why was this param removed?

It was duplicated inside the file.
I should have marked this change.

Contributor Author:

The param is still there on line 5.
OBSTACLE_TYPE: "BOX"
RANDOMIZE_OBSTACLE_LOCATIONS: "false"
RANDOMIZE_OBSTACLE_LOCATIONS: "false"
4 changes: 2 additions & 2 deletions docker-compose.yml
@@ -21,13 +21,13 @@ services:
env_file: config.env
container_name: coach
volumes:
- '//var/run/docker.sock:/var/run/docker.sock'
- '/var/run/docker.sock:/var/run/docker.sock'
- './src/rl_coach_2020_v2:/deepracer/rl_coach'
- '/robo/container:/robo/container'
depends_on:
- minio
robomaker:
image: awsdeepracercommunity/deepracer-robomaker:cpu
image: awsdeepracercommunity/deepracer-robomaker:cpu-avx2
command: ["${ROBOMAKER_COMMAND}"]
volumes:
- ./data/robomaker:/root/.ros/
1 change: 1 addition & 0 deletions hyperparams.json
16 changes: 16 additions & 0 deletions local-copy.sh
@@ -0,0 +1,16 @@
#!/bin/bash
# USAGE: ./local-copy.sh <model_backup_name>

MODELS=../models

echo "Backup to $MODELS/$1"
echo "..."

mkdir $MODELS/$1

cp data/robomaker/log/rl_coach_* $MODELS/$1/
cp -R data/minio/bucket/current/model $MODELS/$1/
cp data/minio/bucket/custom_files/reward.py $MODELS/$1/

echo "done"
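
A quick usage sketch for the script above; `my-model-v1` is a placeholder name:

```bash
./local-copy.sh my-model-v1
# copies into ../models/my-model-v1/:
#   rl_coach_*   training logs from data/robomaker/log/
#   model/       checkpoints from data/minio/bucket/current/model
#   reward.py    the reward function from data/minio/bucket/custom_files
```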

24 changes: 24 additions & 0 deletions mk-model.sh
@@ -0,0 +1,24 @@
#!/bin/bash
Owner:

The model file will actually be created automatically if the containers are shut down in the correct way and in the correct order. However, I don't think the stop script currently does this correctly, so let's leave this script here for now; it will be removed in the near future.

Contributor Author:

Sounds good.

# create .tar.gz file uploadable to physical deepracer
# USAGE: ./mk-model.sh <model_path>
cd $1
echo $(pwd)

if [ "$1" = "" ]; then
echo "USAGE: $0 <model_path>"
else

NUM=`cut -d '_' -f 1 < model/.coach_checkpoint`

mkdir -p output/agent

cp "model/model_$NUM.pb" output/agent/model.pb
cp model/model_metadata.json output/

cd output
tar -czvf ../output.tar.gz *

echo "done"

fi
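
A usage sketch, continuing the placeholder name from the backup script; the archive ends up inside the model directory itself:

```bash
./mk-model.sh ../models/my-model-v1
# produces ../models/my-model-v1/output.tar.gz containing:
#   agent/model.pb        the checkpoint graph selected via model/.coach_checkpoint
#   model_metadata.json   the action space / sensor configuration
```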

17 changes: 17 additions & 0 deletions src/rl_coach_2020_v2/hyperparams.json
@@ -0,0 +1,17 @@
{
"batch_size": 64,
"beta_entropy": 0.01,
"discount_factor": 0.999,
"e_greedy_value": 0.05,
"epsilon_steps": 10000,
"exploration_type": "categorical",
"loss_type": "mean squared error",
"lr": 0.0003,
"num_episodes_between_training": 20,
"num_epochs": 10,
"stack_size": 1,
"term_cond_avg_score": 100000.0,
"term_cond_max_episodes": 10000,
"pretrained": "false"
}

57 changes: 33 additions & 24 deletions src/rl_coach_2020_v2/rl_deepracer_coach_robomaker.py
@@ -55,7 +55,6 @@
s3_location = "s3://%s/%s" % (s3_bucket, s3_prefix)
print("Uploading to " + s3_location)


metric_definitions = [
# Training> Name=main_level/agent, Worker=0, Episode=19, Total reward=-102.88, Steps=19019, Training iteration=1
{'Name': 'reward-training',
@@ -96,6 +95,38 @@
instance_type = "local_gpu"
image_name = "awsdeepracercommunity/deepracer-sagemaker:gpu"

# Hyperparams
## Here we load hyperparameters from hyperparams.json file
with open('hyperparams.json', 'r', encoding='utf-8') as hp:
hyper = eval(hp.read())
# Create dictionary that will be passed to estimator
# TODO: code can be simplified if we iterate over an array of keys to init dict
hyperparameters = {"s3_bucket": s3_bucket,
"s3_prefix": s3_prefix,
"aws_region": aws_region,
"model_metadata_s3_key": "s3://{}/custom_files/model_metadata.json".format(s3_bucket),
"RLCOACH_PRESET": RLCOACH_PRESET,
"batch_size": hyper["batch_size"],
"beta_entropy": hyper["beta_entropy"],
"discount_factor": hyper["discount_factor"],
"e_greedy_value": hyper["e_greedy_value"],
"epsilon_steps": hyper["epsilon_steps"],
"exploration_type": hyper["exploration_type"],
"loss_type": hyper["loss_type"],
"lr": hyper["lr"],
"num_episodes_between_training": hyper["num_episodes_between_training"],
"num_epochs": hyper["num_epochs"],
"stack_size": hyper["stack_size"],
"term_cond_avg_score": hyper["term_cond_avg_score"],
"term_cond_max_episodes": hyper["term_cond_max_episodes"]
}
# Enable pretrained if setting existed
if hyper["pretrained"].lower() == "true":
hyperparameters.update({
"pretrained_s3_bucket": "{}".format(s3_bucket),
"pretrained_s3_prefix": "rl-deepracer-pretrained"
})

estimator = RLEstimator(entry_point="training_worker.py",
source_dir='src',
dependencies=["common/sagemaker_rl"],
@@ -111,29 +142,7 @@
base_job_name=job_name,
image_name=image_name,
train_max_run=job_duration_in_seconds, # Maximum runtime in seconds
hyperparameters={"s3_bucket": s3_bucket,
"s3_prefix": s3_prefix,
"aws_region": aws_region,
"model_metadata_s3_key": "s3://{}/custom_files/model_metadata.json".format(s3_bucket),
"RLCOACH_PRESET": RLCOACH_PRESET,

"batch_size": 64,
"beta_entropy": 0.01,
"discount_factor": 0.999,
"e_greedy_value": 0.05,
"epsilon_steps": 10000,
"exploration_type": "categorical",
"loss_type": "mean squared error",
"lr": 0.0003,
"num_episodes_between_training": 20,
"num_epochs": 10,
"stack_size": 1,
"term_cond_avg_score": 100000.0,
"term_cond_max_episodes": 100000

#"pretrained_s3_bucket": "{}".format(s3_bucket),
#"pretrained_s3_prefix": "rl-deepracer-pretrained"
},
hyperparameters=hyperparameters,
metric_definitions = metric_definitions,
s3_client=s3Client
#subnets=default_subnets, # Required for VPC mode
8 changes: 4 additions & 4 deletions start-training.sh
@@ -13,15 +13,15 @@ export CURRENT_UID=$(id -u):$(id -g)
docker-compose -f ./docker-compose.yml up -d

if [ "$ENABLE_LOCAL_DESKTOP" = true ] ; then
echo "Starting desktop mode... waiting 20s for Sagemaker container to start"
sleep 20
echo "Starting desktop mode... waiting 30s for Sagemaker container to start"
sleep 30

echo 'Attempting to pull up sagemaker logs...'
SAGEMAKER_ID="$(docker ps | awk ' /sagemaker/ { print $1 }')"

echo 'Attempting to open stream viewer and logs...'
gnome-terminal -x sh -c "echo viewer;x-www-browser -new-window http://localhost:8888/stream_viewer?topic=/racecar/deepracer/kvs_stream;sleep 1;wmctrl -r kvs_stream -b remove,maximized_vert,maximized_horz;sleep 1;wmctrl -r kvs_stream -e 1,100,100,720,640"
gnome-terminal -x sh -c "docker logs -f $SAGEMAKER_ID"
gnome-terminal --tab -- sh -c "echo viewer;x-www-browser -new-window http://localhost:8888/stream_viewer?topic=/racecar/deepracer/kvs_stream;sleep 1;wmctrl -r kvs_stream -b remove,maximized_vert,maximized_horz;sleep 1;wmctrl -r kvs_stream -e 1,100,100,720,640"
gnome-terminal --tab -- sh -c "docker logs -f $SAGEMAKER_ID"
else
echo "Started in headless server mode. Set ENABLE_LOCAL_DESKTOP to true in config.env for desktop mode."
if [ "$ENABLE_TMUX" = true ] ; then
2 changes: 1 addition & 1 deletion tail-sagemaker-logs.sh
@@ -7,4 +7,4 @@
sleep 1
done

docker logs -f $SAGEMAKER_ID
docker logs --follow $SAGEMAKER_ID
3 changes: 3 additions & 0 deletions utilities/.gitignore
@@ -0,0 +1,3 @@
*
!delete-last.c
!.gitignore