Autonomous Driving with Reinforcement Learning

This project applies the DDPG algorithm to a scaled car, training it on two tasks: constant-velocity tracking and circular path tracking.

Algorithm Description

DDPG stands for Deep Deterministic Policy Gradient. It builds on the deep Q-network (DQN) and deterministic policy gradient (DPG) algorithms, using neural network function approximators to learn action-value functions and policies in continuous observation and action spaces. It also uses a replay buffer and slowly updated target networks to counter the instability of nonlinear function approximation with neural networks. A detailed explanation of DDPG is included in our report. We also refer readers to the original DDPG, DQN, and DPG papers for further insight.
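The two stabilizing ideas mentioned above, bootstrapped critic targets computed from target networks and slow target-network updates, can be sketched in a few lines of NumPy. This is a minimal illustration and not the code in this repository: the function names `soft_update` and `td_targets`, the parameter lists, and the values of `tau` and `gamma` are assumptions chosen for the example.

```python
import numpy as np

def soft_update(target_params, online_params, tau=0.001):
    """Move each target parameter a small step toward its online counterpart.

    Implements theta_target <- tau * theta_online + (1 - tau) * theta_target.
    """
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]

def td_targets(rewards, next_q, dones, gamma=0.99):
    """Critic regression targets y = r + gamma * Q'(s', mu'(s')).

    next_q holds the target critic's value at the target actor's action;
    the bootstrap term is dropped for terminal transitions (dones == 1).
    """
    return rewards + gamma * (1.0 - dones) * next_q

# Toy parameters and batch, for illustration only (hypothetical values).
online = [np.ones((2, 2)), np.zeros(2)]
target = [np.zeros((2, 2)), np.ones(2)]
target = soft_update(target, online, tau=0.1)

y = td_targets(
    rewards=np.array([1.0, 0.5]),
    next_q=np.array([2.0, 4.0]),
    dones=np.array([0.0, 1.0]),
    gamma=0.9,
)
```

A small `tau` keeps the regression targets nearly fixed between updates, which is what damps the feedback loop between the critic and its own bootstrapped targets.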

File Description

Buffer Data

Several tests were run with hand-controlled episodes, and the transition information was recorded as replay buffer data. This buffer data was used to train several policy and critic networks, which are located in the network_data folder.

The buffer files labeled "ref_path..." were used for the task of following the circular path in the room.

Files labeled "ref_vel..." were used in the reference velocity task.
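For readers unfamiliar with replay buffers, the sketch below shows one way transitions could be stored, sampled, and saved to disk with pickle. The actual layout of the files in buffer_data is not documented here, so the `ReplayBuffer` class, the tuple layout `(state, action, reward, next_state, done)`, and the file name `example_buffer.pkl` are all hypothetical.

```python
import os
import pickle
import random
import tempfile
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity=100000):
        # A deque with maxlen silently discards the oldest transition when full.
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the buffer size.
        items = list(self.storage)
        return random.sample(items, min(batch_size, len(items)))

    def save(self, path):
        with open(path, "wb") as f:
            pickle.dump(list(self.storage), f)

    @classmethod
    def load(cls, path, capacity=100000):
        buf = cls(capacity)
        with open(path, "rb") as f:
            buf.storage.extend(pickle.load(f))
        return buf

# Usage with made-up transitions and a temporary file (hypothetical name).
buf = ReplayBuffer(capacity=3)
for i in range(5):
    buf.add([float(i)], [0.0], 0.0, [float(i + 1)], False)

path = os.path.join(tempfile.mkdtemp(), "example_buffer.pkl")
buf.save(path)
restored = ReplayBuffer.load(path, capacity=3)
batch = restored.sample(2)
```

Sampling uniformly from stored transitions breaks the temporal correlation between consecutive steps, which is why hand-controlled episodes recorded earlier can still be reused for offline training.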

Network Data

This folder contains trained network weights. All the existing network files are for the reference velocity task.

Bag Data

These files contain ROS topic data recorded in real time: vehicle states, actions, and critic and policy network outputs.

Buffer Node

This is a Python ROS script that was run on the physical vehicle over SSH to collect replay buffer data in real time. Networks trained offline can also be moved onto the vehicle computer and tested with this script, which can gather online training data as well. Because the script depends on the vehicle hardware, you will need an RC car that runs ROS with Python to run it.

Learning Node

This file was used to train the policy and critic networks using DDPG. It trains on the chosen buffer data. It can also recalculate the rewards stored in the buffer.
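Recalculating buffer rewards means replacing the stored reward of each transition with the value of a new reward function, so that old driving data can be reused when the reward design changes. The sketch below illustrates the idea for a velocity-tracking reward. The reward shape (negative squared velocity error), the tuple layout, and the assumption that the velocity sits at index 0 of the state are hypothetical and are not taken from learning_node2.py.

```python
import numpy as np

def velocity_reward(next_state, v_ref, vel_index=0):
    """Hypothetical reward: negative squared error between velocity and reference."""
    error = next_state[vel_index] - v_ref
    return -float(error ** 2)

def recalculate_rewards(transitions, v_ref):
    """Return new transitions with rewards recomputed from the next state.

    Each transition is assumed to be (state, action, reward, next_state, done);
    states, actions, and done flags are carried over unchanged.
    """
    return [(s, a, velocity_reward(s2, v_ref), s2, d)
            for (s, a, _, s2, d) in transitions]

# Two made-up transitions with placeholder rewards of 0.0.
old = [
    (np.array([0.0]), np.array([0.2]), 0.0, np.array([0.5]), False),
    (np.array([0.5]), np.array([0.2]), 0.0, np.array([1.0]), False),
]
new = recalculate_rewards(old, v_ref=1.0)
```

Because DDPG is off-policy, transitions relabeled this way remain valid training data as long as the new reward depends only on quantities stored in the transition.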

Learning Node Vel

This file is similar to the learning node file but is specific to the velocity task.

Testing

Unfortunately, deploying our project on hardware requires a programmable RC car. The learning_node2.py file can be run to view the critic and actor losses every 10000 learning iterations.

Dependencies

The code was run in Python 3. It depends on the following packages (pickle is part of the Python standard library; the others are external):

numpy
matplotlib
tensorflow
pickle

Results

The following clip demonstrates an RC car tracking a constant speed via DDPG.

Video Presentation

We made a video describing our project. It can be accessed at https://drive.google.com/file/d/10kc1evXaU6gW1XUthezoBVENrDguuzZI/view?usp=sharing

Authors

Contributors are:

Perry Scott, Master's student, Mechanical Engineering Department, The University of Texas at Austin

Bryant Zhou, Master's student, Mechanical Engineering Department, The University of Texas at Austin

Paper

This is the Deep Q-network (DQN) paper:

@article{mnih2015human,
  title={Human-level control through deep reinforcement learning},
  author={Mnih, Volodymyr and Kavukcuoglu, Koray and Silver, David and Rusu, Andrei A and Veness, Joel and Bellemare, Marc G and Graves, Alex and Riedmiller, Martin and Fidjeland, Andreas K and Ostrovski, Georg and others},
  journal={Nature},
  volume={518},
  number={7540},
  pages={529--533},
  year={2015},
  publisher={Nature Publishing Group}
}

This is the Deterministic Policy Gradient (DPG) paper:

@inproceedings{silver2014deterministic,
  title={Deterministic policy gradient algorithms},
  author={Silver, David and Lever, Guy and Heess, Nicolas and Degris, Thomas and Wierstra, Daan and Riedmiller, Martin},
  booktitle={International conference on machine learning},
  pages={387--395},
  year={2014},
  organization={PMLR}
}

This is the Deep Deterministic Policy Gradient (DDPG) paper:

@article{lillicrap2015continuous,
  title={Continuous control with deep reinforcement learning},
  author={Lillicrap, Timothy P and Hunt, Jonathan J and Pritzel, Alexander and Heess, Nicolas and Erez, Tom and Tassa, Yuval and Silver, David and Wierstra, Daan},
  journal={arXiv preprint arXiv:1509.02971},
  year={2015}
}
