Reinforcement learners choose actions to perform based on environmental state and a reward system. They provide an AI agent that can learn to behave optimally in its environment given a policy or task, such as obtaining the reward.
In many scenarios, the state space is so complex and multi-dimensional that neural networks are increasingly used to estimate the Q-function, which approximates the future reward based on the state sequence.
This repository includes Q-learning algorithms in Torch7 and an API in C++ for integrating with applications in robotics, simulation, and elsewhere.
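As a point of reference for what these algorithms optimize, below is a minimal tabular Q-learning update sketched in plain Lua. The state/action counts, learning rate, and example transition are illustrative assumptions and not code from this repo.

```lua
-- Minimal tabular Q-learning sketch (illustrative only, not this repo's DQN code).
-- Q[s][a] is nudged toward the observed reward plus the discounted value of the
-- best action available in the next state:
--   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))

local numStates, numActions = 5, 2     -- hypothetical sizes for a tiny problem
local alpha, gamma = 0.1, 0.9          -- learning rate and discount factor

-- initialize the Q-table to zero
local Q = {}
for s = 1, numStates do
   Q[s] = {}
   for a = 1, numActions do Q[s][a] = 0 end
end

-- apply one Q-learning update for an observed transition (s, a, reward, nextState)
local function update(s, a, reward, nextState)
   local bestNext = math.max(unpack(Q[nextState]))
   Q[s][a] = Q[s][a] + alpha * (reward + gamma * bestNext - Q[s][a])
end

-- example: taking action 1 in state 2 earned a reward of 1.0 and led to state 3
update(2, 1, 1.0, 3)
print(string.format('Q(2,1) = %.3f', Q[2][1]))
```

In this repo the table is replaced by a neural network that estimates Q from raw state input, but the update being learned is the same idea.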
These archives contain a full snapshot of the repo from the indicated date, including pre-built binaries and prerequisites like Torch for the indicated JetPack/L4T release. They are ready to download, extract, and run straight away.
JetPack 2.2 / JetPack 2.2.1 64-bit
L4T R24.1 aarch64
jetson-reinforcement-R241-aarch64-20160803.tar.gz (110MB)
https://drive.google.com/file/d/0BwYxpotGWRNOYlBmM2xMUXZ0eUE/view?usp=sharing
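Once downloaded, the archive can be extracted in place; for example, with the filename listed above:
$ tar -xzvf jetson-reinforcement-R241-aarch64-20160803.tar.gz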
Alternatively, the project and Torch can be built from source, a process which is mostly scripted on supported platforms. If you downloaded one of the archives above, skip ahead to Verifying Lua + Torch.
If you want to incorporate the latest source changes or build on a different release, the project can be built from source relatively easily (but can take a bit of time).
The process is scripted to automatically install dependencies like Torch7 and build the project from source.
You may be required to enter the sudo password at some point.
Note: some versions of JetPack/L4T already have pre-built archives available for download; see above.
First, make sure the build tools git and cmake are installed:
$ sudo apt-get install git cmake
$ git clone http://github.com/dusty-nv/jetson-reinforcement
$ cd jetson-reinforcement
$ mkdir build
$ cd build
$ cmake ../
This will initiate the building of dependencies like Torch and its bindings for CUDA/cuDNN, which can take some time.
$ cd jetson-reinforcement/build    # omit if pwd is already this directory from above
$ make
Depending on architecture, the package will be built to either armhf or aarch64, with the following directory structure:
|-build
   \aarch64    (64-bit)
      \bin        where the application binaries are built to
      \include    where the headers reside
      \lib        where the libraries are built to
   \armhf      (32-bit)
      \bin        where the application binaries are built to
      \include    where the headers reside
      \lib        where the libraries are built to
After either Building from Source or [Downloading the Package](#downloading-the-package), verify the LuaJIT-5.1 / Torch7 scripting environment with these commands:
$ cd aarch64/bin
$ ./deepRL-console hello.lua    # verify the Lua interpreter (consult a Lua tutorial if unfamiliar)
[deepRL] created new lua_State
[deepRL] opened LUA libraries
[deepRL] loading 'hello.lua'
HELLO from LUA!
my variable equals 16
list 1
map.x 10
one
two
3
4
5
6
7
8
9
10
multiply = 200
goodbye!
[deepRL] closing lua_State
This command will test loading Torch7 packages and bindings for CUDA/cuDNN:
$ ./deepRL-console test-packages.lua # load Torch packages and bindings
[deepRL] created new lua_State
[deepRL] opened LUA libraries
[deepRL] loading 'test-packages.lua'
[deepRL] hello from within Torch/Lua environment (time=0.032163)
[deepRL] loading Lua packages...
[deepRL] loading torch...
[deepRL] loading cutorch...
cutorch.hasHalf == false
[deepRL] loading nn...
[deepRL] loading cudnn...
[deepRL] loading math...
[deepRL] loading nnx...
[deepRL] loading optim...
[deepRL] done loading packages. (time=5.234669)
[deepRL] closing lua_State
These scripts should run without errors, verifying that the Lua / Torch environment is sane.
The deepRL-console program can also launch a user's script from the command line (CLI).
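As a rough sketch of the kind of user script deepRL-console can launch, the short Lua file below (hypothetical name my-script.lua, not included in the repo) loads the torch package and prints a few tensor statistics:

```lua
-- my-script.lua : a hypothetical user script for deepRL-console (not part of the repo)
require 'torch'

local x = torch.randn(4, 4)                       -- random 4x4 tensor
print('tensor size : ' .. x:size(1) .. 'x' .. x:size(2))
print('tensor sum  : ' .. x:sum())
print('tensor mean : ' .. x:mean())
```

It would be launched the same way as the bundled scripts above:
$ ./deepRL-console my-script.lua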
Next, to verify that the reinforcement Q-learner is learning as intended, let's play a simple game: half-pong, or catch.
$ ./deepRL-console catchDQN.lua
Launching the script above should set your Jetson playing games of catch and plotting the learning progress in real time:
Each epoch is one game of play, where the ball drops from the top of the screen to the bottom. After a few hundred epochs, the Q-learner should be catching the ball the majority of the time.
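For intuition about what is being learned, here is a minimal sketch of the half-pong/catch dynamics in plain Lua; the grid size, action encoding, and reward values are illustrative assumptions, not the actual catchDQN.lua implementation.

```lua
-- Illustrative catch ("half-pong") dynamics; not the repo's catchDQN.lua.
-- Each epoch is one game: the ball falls one row per step while the agent
-- slides a paddle along the bottom row, earning +1 if it catches the ball.
local gridSize = 16

local ballX   = math.random(1, gridSize)    -- column the ball falls down
local ballY   = 1                           -- ball starts at the top row
local paddleX = math.ceil(gridSize / 2)     -- paddle starts in the middle

-- actions: 1 = move left, 2 = stay, 3 = move right
local function step(action)
   if action == 1 then paddleX = math.max(1, paddleX - 1)
   elseif action == 3 then paddleX = math.min(gridSize, paddleX + 1) end

   ballY = ballY + 1                        -- gravity: the ball falls one row

   if ballY == gridSize then                -- ball has reached the bottom row
      local reward = (ballX == paddleX) and 1 or -1
      return reward, true                   -- reward and end-of-game flag
   end
   return 0, false                          -- game continues, no reward yet
end
```

The DQN's job is to pick the action in each state (the ball and paddle positions) that maximizes the expected reward, which is why the catch rate climbs as the Q-function converges over a few hundred epochs.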