A simple and modular implementation of the Soft Actor Critic algorithm in PyTorch.
This library is deprecated in favor of JaxCQL and CQL, which include up-to-date SAC and CQL implementations in JAX and PyTorch. This codebase will no longer receive updates or bug fixes.
- Install and use the included Anaconda environment
$ conda env create -f environment.yml
$ source activate SimpleSAC
You'll need to get your own MuJoCo key if you want to use MuJoCo.
- Add this repo directory to your PYTHONPATH environment variable.
export PYTHONPATH="$PYTHONPATH:$(pwd)"
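As a quick sanity check after setting PYTHONPATH, a minimal sketch like the following can confirm that the package resolves on the import path. The helper name is_importable is hypothetical and not part of this repo; the only assumption taken from this README is the package name SimpleSAC.

```python
import importlib.util


def is_importable(package_name: str) -> bool:
    """Return True if a top-level package can be found on the current import path."""
    # find_spec returns None when a top-level module cannot be located.
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "SimpleSAC" is the package name used by the commands in this README.
    if is_importable("SimpleSAC"):
        print("SimpleSAC is importable.")
    else:
        print("SimpleSAC not found; check that the repo directory is on PYTHONPATH.")
```

Run it from the same shell session in which PYTHONPATH was exported, since the variable is not inherited by shells that were already open.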
You can run SAC experiments using the following command:
python -m SimpleSAC.sac_main \
--env 'HalfCheetah-v2' \
--logging.output_dir './experiment_output' \
--device='cuda'
To run on CPU only, omit the --device='cuda' option.
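If you launch runs from a script, the optional device flag can be handled as in the sketch below. The function build_sac_command is a hypothetical helper, not part of this codebase; it uses only the module name and flags shown in the command above, and it prints the command rather than executing it.

```python
import shlex
import sys


def build_sac_command(env, output_dir, device=None):
    """Assemble the SAC launch command as an argument list.

    Hypothetical helper: only flags shown in this README are used.
    When `device` is None, no --device flag is added (CPU-only run).
    """
    cmd = [
        sys.executable, "-m", "SimpleSAC.sac_main",
        "--env", env,
        "--logging.output_dir", output_dir,
    ]
    if device is not None:
        cmd.append(f"--device={device}")
    return cmd


if __name__ == "__main__":
    # Print both variants; pass the list to subprocess.run to launch one.
    print(shlex.join(build_sac_command("HalfCheetah-v2", "./experiment_output")))
    print(shlex.join(build_sac_command("HalfCheetah-v2", "./experiment_output", device="cuda")))
```

Building the command as a list avoids shell quoting issues when the environment name or output path contains special characters.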
All available command options are defined in SimpleSAC/sac_main.py and SimpleSAC/sac.py.
You can visualize the experiment metrics with viskit:
python -m viskit './experiment_output'
Then navigate to http://localhost:5000/ in your browser.
This codebase can also log to the W&B online visualization platform. To log to W&B, first set the W&B API key environment variable:
export WANDB_API_KEY='YOUR W&B API KEY HERE'
Then you can run experiments with W&B logging turned on:
python -m SimpleSAC.sac_main \
--env 'HalfCheetah-v2' \
--logging.output_dir './experiment_output' \
--device='cuda' \
--logging.online
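Before starting a run with --logging.online, it can help to confirm that the API key is actually visible to the process. The following is a minimal sketch; wandb_key_is_set is a hypothetical helper, not something provided by this repo or by W&B, and it only checks that the variable is present and non-empty, not that the key is valid.

```python
import os


def wandb_key_is_set(environ=os.environ) -> bool:
    """Return True if WANDB_API_KEY is present and non-empty in `environ`."""
    # A value consisting only of whitespace is treated as unset.
    return bool(environ.get("WANDB_API_KEY", "").strip())


if __name__ == "__main__":
    if wandb_key_is_set():
        print("WANDB_API_KEY is set.")
    else:
        print("WANDB_API_KEY is missing; export it before using --logging.online.")
```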
The project organization is inspired by TD3. The SAC implementation is based on rlkit. The viskit visualization is taken from viskit, which is taken from rllab.