updated README, updated base reward func, updated startup/stop scripts, default track now reinvent_base
mattcamp · Jun 18, 2020 · a5ad548
1 parent bd8afbc
commit a5ad548
Showing 7 changed files with 97 additions and 77 deletions.
diff --git a/README.md b/README.md
@@ -2,13 +2,14 @@
 
 Heavily based off work by [Crr0004](https://github.com/crr0004), [AlexSchultz](https://github.com/alexschultz), [Richardfan1126](https://github.com/richardfan1126) and [LarsLL](https://github.com/larsll)
 
-This is a very early upload of Matt's local training setup so that a few people can test. Lots of things probably won't work properly and lots of functionality is still missing. 
+## Prerequisites
 
-Very rough guide for use (details to come):
+This project is designed to run on a linux system, ideally with an nvidia GPU. CPU training is possible but will be very slow. AMD GPUs are not currently supported.
+Ubuntu 18.04 has been extensively tested. 
 
-- install nvidia cuda drivers and tools.
-- install docker and docker-compose
-- set docker-nvidia2 as default runtime in your `/etc/docker/daemon.json`
+1.  install nvidia cuda drivers and tools.
+2.  install docker and docker-compose
+3.  set docker-nvidia2 as default runtime in your `/etc/docker/daemon.json`
 
         {
          "default-runtime": "nvidia",
@@ -20,19 +21,34 @@ Very rough guide for use (details to come):
             }
         }
 
+## Configure training session
 
-- edit reward function and training params in `data/minio/bucket/custom_files`. Note that the track name MUST be the same in both files!
-- tweak any other settings you want in `config.env`
-   - Modify `ENABLE_GPU_TRAINING` for SageMaker runtime: `true` (nvidia runtime) or `false` (CPU runtime). Default is GPU.
-   - If you do not have an nvidia GPU then you will also need to change the tag of the robomaker image inside `docker-compose.yml`
-   - Set `ENABLE_LOCAL_DESKTOP` to `true` if you have a local X-windows install (desktop machine) and want to automatically start the stream viewer and tail sagemaker logs.
-   - Install tmux (`sudo apt install tmux` on Ubuntu Linux) if you want robomaker + sagemaker logs automatically tailed in your terminal session.
-- run `./start-training.sh` to start training
-- view docker logs to see if it's working (automatic if `tmux` is installed)
-- run `./stop-training.sh` to stop training.
-- run `./delete_last_run.sh` to clear out the buckets for a fresh run. For convenient version without sudo prompt check out `utilites/delete-last.c`.
-- run `./local-copy.sh <model_backup_name>` to backup current model files into user specified MODEL directory.
-- run `./mk-model.sh <model_path>` to create physical car uploadable .tar.gz file from your model. (Will be removed in a future update once file gets correctly generated after training)
+1.  Edit the reward function in `data/minio/bucket/custom_files/reward.py`
+2.  Edit the action space in `data/minio/bucket/custom_files/model_metadata.json`
+3.  Edit the training params in `config.env` and `data/minio/bucket/custom_files/training_params.yaml`. Note that the track name MUST be the same in both files!
+
+    Useful options include:
+
+    | option | description |
+    |--------|-------------|
+    |ENABLE_GPU_TRAINING|Enables GPU for SageMaker runtime: `true` (nvidia runtime) or `false` (CPU runtime). Default is GPU|
+    |ENABLE_LOCAL_DESKTOP|Set to `true` if you have a local X-windows install (desktop machine) and want to automatically start the stream viewer and tail sagemaker and robomaker logs.|
+    |ENABLE_TMUX|Enables tmux for automatic log tails in your existing terminal session (good for remote servers)|
+    |ENABLE_GUI|Enables gazebo client. Access via vnc|
+    |WORLD_NAME|The track name, excluding the .world suffix. Tracks are contained within the robomaker container image, built from the [deepracer-simapp community project](https://github.com/aws-deepracer-community/deepracer-simapp/tree/master/bundle/deepracer_simulation_environment/share/deepracer_simulation_environment/worlds).|
+
+    Many other options are available.
+
+4. Edit the training hyperparameters in `src/rl_coach_2020_v2/hyperparams.json` (a shortcut link has been created in the root directory). Available options are exactly the same except for the new option `pretrained`, which simplifies enabling pretrained mode.
+
+    More information on configuring local training can be found at https://wiki.deepracing.io/Customise_Local_Training
+
+## Starting a training session
+Run `./start-training.sh` to start training. 
+
+The current model data dir (defaults to data/minio/bucket/current) must be empty. 
+
+To use a pretrained model as a base for a new training session rename `data/minio/bucket/current` to `data/minio/bucket/rl-deepracer-pretrained` and set `"pretrained": "true"` in hyperparams.json
 
 The first run will likely take quite a while to start as it needs to pull over 10GB of all the docker images.
 You can avoid this delay by pulling the images in advance:
@@ -41,19 +57,30 @@ You can avoid this delay by pulling the images in advance:
    - `docker pull awsdeepracercommunity/deepracer-robomaker:<cpu or gpu>`
    - `docker pull mattcamp/dr-coach`
    - `docker pull minio/minio`
+
+   Note that different flavours of CPU image are available, see https://github.com/aws-deepracer-community/deepracer-simapp for details.
+   `cpu-avx2` is the default.
 
-## Modifying parameters
-Hyperparameters for training are loaded from `hyperparams.json` inside `src/rl_coach_2020_v2/hyperparams.json` - shortcut link has been created in the root directory. Available options are exactly the same except the new option `pretrained` that simplifies enabling pretrained mode.
-
-## Video stream
+## Monitoring training
+- Docker logs should open automatically in new terminal tabs if running with `ENABLE_LOCAL_DESKTOP` enabled, or via tmux in your existing terminal session if `ENABLE_TMUX` is enabled.
+- Logs can be manually viewed using `docker ps` and `docker logs robomaker` or `docker logs <sagemaker_container_id>`
+- The web video stream is available by default on port 8888. If running in desktop mode a browser window should open automatically, otherwise you can try opening a url such as http://127.0.0.1:8888/stream_viewer?topic=/racecar/deepracer/kvs_stream
+- Kinesis video stream can also be enabled; see below for more details. The web video stream usually works better.
+- if `ENABLE_GUI` is enabled then you can connect a vncviewer on port 8080 to view the gazebo client directly.
 
-The video stream is available either via a web stream of via Kinesis. 
+## Stopping training
+Run `./stop-training.sh` to stop training. 
 
-### Web stream:
+If running, sagemaker will be stopped first and then after a 20s delay the rest of the containers will be stopped. This allows Robomaker to create a model.tar.gz file in the current model dir, ready to be loaded onto a physical DeepRacer car.
+
+**NOTE: Sagemaker should not be stopped during the policy training phase, or the model may be corrupted. Only stop training while the video stream status is "Training" and not "Evaluating" (or verify via the sagemaker logs that policy training has completed for the current iteration).**
 
-The web video stream is exposed on port 8888. If you're running a local browser then you should be able to browse directly to `http://127.0.0.1:8888/stream_viewer?topic=/racecar/deepracer/kvs_stream` once Robomaker has started.
+## Model management
+- run `./delete_last_run.sh` to clear out the buckets for a fresh run. For convenient version without sudo prompt check out `utilites/delete-last.c`.
+- run `./local-copy.sh <model_backup_name>` to backup current model files into user specified MODEL directory.
+- run `./mk-model.sh <model_path>` to create physical car uploadable .tar.gz file from your model. (Will be removed in a future update once file gets correctly generated after training)
 
-### Kinesis stream:
+### Kinesis video stream:
 
 Kinesis video currently only works via the real AWS Kinesis service and probably only makes sense if you are training on an EC2 instance.
 
@@ -67,14 +94,11 @@ Kinesis video is a stream of approx 1.5Mbps so beware the impact on your AWS cos
 
 Once working the stream should be visible in the Kinesis console. 
 
-### VNC
-You can enter runnning environment using a vncviewer at localhost:8080.
-
 ## Known issues:
 - Sometimes sagemaker won't start claiming that `/opt/ml/input/config/resourceconfig.json` is missing. Still trying to work out why.
 - Stopping training at the wrong time seems to cause a problem where sagemaker will crash next time when trying to load the 'best' model which may not exist properly. This only happens if you start a new training session without clearing out the bucket first. Yet to be seen if this will cause a problem when trying to use pretrained models.
 - `training_params.yaml` must exist in the target bucket or robomaker will not start. The start-training.sh script will copy it over from custom_files if necessary.
-- Scripts not currently included to handle pretrainined models or uploading to AWS Console or virtual league. 
+- Scripts not currently included to handle uploading to AWS Console or virtual league. 
 - Current sagemaker and robomaker GPU images are built for nvidia GPU only. 
 - The sagemaker and robomakers images are huge (~4.5GB)
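
The README change above states that the track name must be the same in `config.env` and `training_params.yaml`. A minimal sketch of how that could be checked before starting a run is shown here; it is not part of this commit. The helper names `read_world_name` and `track_names_match` are hypothetical, and the parsing assumes the simple `WORLD_NAME=...` and `WORLD_NAME: "..."` forms seen in the two files in this diff.

```python
import re
from pathlib import Path


def read_world_name(text):
    """Return the WORLD_NAME value found in config.env or YAML text.

    Accepts `WORLD_NAME=reinvent_base` and `WORLD_NAME: "reinvent_base"`.
    Returns None when the setting is absent.
    """
    for line in text.splitlines():
        match = re.match(r"""\s*WORLD_NAME\s*[=:]\s*["']?([^"'#\s]+)""", line)
        if match:
            return match.group(1)
    return None


def track_names_match(env_text, yaml_text):
    """True only when both texts set WORLD_NAME to the same track."""
    env_name = read_world_name(env_text)
    yaml_name = read_world_name(yaml_text)
    return env_name is not None and env_name == yaml_name


if __name__ == "__main__":
    # Paths taken from the README; run from the repository root.
    env_path = Path("config.env")
    yaml_path = Path("data/minio/bucket/custom_files/training_params.yaml")
    if env_path.exists() and yaml_path.exists():
        same = track_names_match(env_path.read_text(), yaml_path.read_text())
        print("track names match" if same else "WORLD_NAME differs between the two files")
    else:
        print("config files not found; run from the repository root")
```

With the values in this commit both files set `reinvent_base`, so the check would pass; a stale `LGSWide` in either file would be reported as a mismatch.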
 

diff --git a/config.env b/config.env
@@ -17,10 +17,10 @@ S3_ENDPOINT_URL=http://minio:9000
 S3_YAML_NAME=training_params.yaml
 SAGEMAKER_SHARED_S3_BUCKET=bucket
 SAGEMAKER_SHARED_S3_PREFIX=current
-WORLD_NAME=LGSWide
+WORLD_NAME=reinvent_base
 ENABLE_KINESIS=false
 ENABLE_GUI=true
 ENABLE_GPU_TRAINING=true
-ENABLE_LOCAL_DESKTOP=false
+ENABLE_LOCAL_DESKTOP=true
 ENABLE_TMUX=false
 MIN_EVAL_TRIALS=5
diff --git a/data/minio/bucket/custom_files/reward.py b/data/minio/bucket/custom_files/reward.py
@@ -1,45 +1,25 @@
 def reward_function(params):
     '''
-    Example of rewarding the agent to stay inside two borders
-    and penalizing getting too close to the objects in front
+    Example of rewarding the agent to follow center line
     '''
 
-    all_wheels_on_track = params['all_wheels_on_track']
-    distance_from_center = params['distance_from_center']
+    # Read input parameters
     track_width = params['track_width']
-    objects_distance = params['objects_distance']
-    _, next_object_index = params['closest_objects']
-    objects_left_of_center = params['objects_left_of_center']
-    is_left_of_center = params['is_left_of_center']
-
-    # Initialize reward with a small number but not zero
-    # because zero means off-track or crashed
-    reward = 1e-3
+    distance_from_center = params['distance_from_center']
 
-    # Reward if the agent stays inside the two borders of the track
-    if all_wheels_on_track and (0.5 * track_width - distance_from_center) >= 0.05:
-        reward_lane = 1.0
+    # Calculate 3 markers that are at varying distances away from the center line
+    marker_1 = 0.1 * track_width
+    marker_2 = 0.25 * track_width
+    marker_3 = 0.5 * track_width
+
+    # Give higher reward if the car is closer to center line and vice versa
+    if distance_from_center <= marker_1:
+        reward = 1.0
+    elif distance_from_center <= marker_2:
+        reward = 0.5
+    elif distance_from_center <= marker_3:
+        reward = 0.1
     else:
-        reward_lane = 1e-3
-
-    # Penalize if the agent is too close to the next object
-    reward_avoid = 1.0
-
-    # Distance to the next object
-    distance_closest_object = objects_distance[next_object_index]
-    # Decide if the agent and the next object is on the same lane
-    is_same_lane = objects_left_of_center[next_object_index] == is_left_of_center
-
-    if is_same_lane:
-        if 0.5 <= distance_closest_object < 0.8: 
-            reward_avoid *= 0.5
-        elif 0.3 <= distance_closest_object < 0.5:
-            reward_avoid *= 0.2
-        elif distance_closest_object < 0.3:
-            reward_avoid = 1e-3 # Likely crashed
-
-    # Calculate reward by putting different weights on 
-    # the two aspects above
-    reward += 1.0 * reward_lane + 4.0 * reward_avoid
+        reward = 1e-3  # likely crashed/ close to off track
 
-    return reward
+    return float(reward)
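
For reference, the reward function as it reads once the hunk above is applied can be exercised on its own. The sketch below reassembles the added lines and calls the function with hypothetical inputs (a 1.0 m wide track and four distances from the center line); in training the simulator supplies `params`, so the sample dictionaries are assumptions for illustration only.

```python
def reward_function(params):
    """Center-line reward, reassembled from the added lines in this diff."""
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Three markers at increasing distances from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # Higher reward the closer the car is to the center line
    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # beyond half the track width

    return float(reward)


# Hypothetical sample inputs, one per reward band.
for distance in (0.05, 0.2, 0.4, 0.6):
    sample = {'track_width': 1.0, 'distance_from_center': distance}
    print(distance, reward_function(sample))
```

The four sample distances fall into the four bands and yield 1.0, 0.5, 0.1 and 0.001 respectively.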
diff --git a/data/minio/bucket/custom_files/training_params.yaml b/data/minio/bucket/custom_files/training_params.yaml
@@ -1,5 +1,5 @@
 ---
-WORLD_NAME: "LGSWide"
+WORLD_NAME: "reinvent_base"
 RACE_TYPE: "OBJECT_AVOIDANCE"
 SAGEMAKER_SHARED_S3_PREFIX: "current"
 CHANGE_START_POSITION: "true"
@@ -19,6 +19,6 @@ MODEL_METADATA_FILE_S3_KEY: "custom_files/model_metadata.json"
 METRIC_NAME: "TrainingRewardScore"
 CAR_COLOR: "Purple"
 TARGET_REWARD_SCORE: "None"
-NUMBER_OF_OBSTACLES: "3"
+NUMBER_OF_OBSTACLES: "0"
 OBSTACLE_TYPE: "BOX"
 RANDOMIZE_OBSTACLE_LOCATIONS: "false"
diff --git a/mk-model.sh b/mk-model.sh
@@ -1,5 +1,6 @@
 #!/bin/bash
-# create .tar.gz file uploadable to physical deepracer
+# create .tar.gz file uploadable to physical deepracer.
+# This should not be necessary if sagemaker is stopped before robomaker as the model.tar.gz will automatically be created.
 # USAGE: ./mk-model.sh <model_path>
 cd $1
 echo $(pwd)

diff --git a/start-training.sh b/start-training.sh
@@ -2,6 +2,13 @@
 
 source config.env
 
+if [ -e data/minio/bucket/current/model/deepracer_checkpoints.json ] ; then
+  echo "WARNING: Files were found in the current model directory data/minio/bucket/current/"
+  echo "Please run ./delete_last_run.sh or relocate the current model dir before starting a new training session."
+  echo "You cannot currently restart training of an existing model, instead you should move the current model dir to rl-deepracer-pretrained and enable pretrained in hyperparams.json"
+  exit 1
+fi
+
 if [ ! -e data/minio/bucket/current/training_params.yaml ]; then
     mkdir -p data/minio/bucket/current
     cp data/minio/bucket/custom_files/training_params.yaml data/minio/bucket/current
@@ -22,6 +29,7 @@ if [ "$ENABLE_LOCAL_DESKTOP" = true ] ; then
     echo 'Attempting to open stream viewer and logs...'
     gnome-terminal --tab -- sh -c "echo viewer;x-www-browser -new-window http://localhost:8888/stream_viewer?topic=/racecar/deepracer/kvs_stream;sleep 1;wmctrl -r kvs_stream -b remove,maximized_vert,maximized_horz;sleep 1;wmctrl -r kvs_stream -e 1,100,100,720,640"
     gnome-terminal --tab -- sh -c "docker logs -f $SAGEMAKER_ID"
+    gnome-terminal --tab -- sh -c 'docker logs -f robomaker'
 else
     echo "Started in headless server mode. Set ENABLE_LOCAL_DESKTOP to true in config.env for desktop mode."
     if [ "$ENABLE_TMUX" = true ] ; then

diff --git a/stop-training.sh b/stop-training.sh
@@ -3,18 +3,25 @@
 source config.env
 
 export ROBOMAKER_COMMAND=""
-docker-compose -f ./docker-compose.yml down
 
-docker stop $(docker ps | awk ' /sagemaker/ { print $1 }')
-docker rm $(docker ps -a | awk ' /sagemaker/ { print $1 }')
+SAGEMAKER_ID=$(docker ps | awk ' /sagemaker/ { print $1 }')
+if [ ! -z "${SAGEMAKER_ID}" ]; then
+  echo "Stopping sagemaker and waiting 20s while model.tar.gz is created"
+  docker stop ${SAGEMAKER_ID}
+  sleep 20
+  docker rm ${SAGEMAKER_ID}
+fi
 
+docker-compose -f ./docker-compose.yml down
 
 if [ "$ENABLE_LOCAL_DESKTOP" = true ] ; then
-    wmctrl -c kvs_stream
+    if command -v wmctrl >/dev/null 2>&1 ; then
+      wmctrl -c kvs_stream
+    fi
 fi
 
-if [ ! -z "$(which tmux)" ]
-then
+if [ "$ENABLE_TMUX" = true ] ; then
   tmux kill-session
 fi
 
+
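
The reordering in `stop-training.sh` is easier to see as a sequence: sagemaker is stopped first, the script waits, and only then are the remaining containers brought down. The sketch below models just that ordering as a list of commands without running anything. The function name `plan_stop_commands` is hypothetical, and the `wmctrl` and `tmux` cleanup steps are left out.

```python
def plan_stop_commands(sagemaker_ids, wait_seconds=20):
    """Return, in order, the container commands the revised script runs.

    sagemaker_ids: container IDs matched by the script's `docker ps` filter;
    an empty list means no sagemaker container is running.
    """
    commands = []
    if sagemaker_ids:
        ids = " ".join(sagemaker_ids)
        commands.append(f"docker stop {ids}")
        # The pause gives robomaker time to write model.tar.gz
        commands.append(f"sleep {wait_seconds}")
        commands.append(f"docker rm {ids}")
    # The compose services go down last, after sagemaker has been handled
    commands.append("docker-compose -f ./docker-compose.yml down")
    return commands


# Hypothetical container ID for illustration.
for command in plan_stop_commands(["abc123"]):
    print(command)
```

When no sagemaker container is found, only the `docker-compose ... down` step remains, which matches the guard added around the stop, sleep and rm lines.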