Source code and data for the paper "Testing the Plasticity of Reinforcement Learning Based Systems".
The approach generates the adaptation heatmaps shown below:
The figure shows the adaptation heatmaps of PPO (A), SAC (B) and DQN (C) in the CartPole environment when only the `length` (x axis) and `cart_friction` (y axis) parameters are considered. The ranges of these two parameters are the same for each DRL algorithm, and the colors in each heatmap indicate the adaptation probability, as shown by the color bar in the bottom right corner of the figure. The adaptation frontier is the continuous yellow line between the green (adaptation successful) and the red (adaptation failed) regions of the heatmap, while the black dots indicate the search points sampled by the search procedure of AlphaTest. From the maps we can see that, for all the algorithms, the `length` parameter is more critical for adaptation than the `cart_friction` parameter. In particular, no DRL algorithm adapted when `length = 4.0`, although DQN (C) seems to tolerate the increase of this parameter best, whereas all DRL algorithms adapted when `cart_friction = 51.20`. The adaptation heatmaps are easily interpretable by developers, as they show the regions of the parameter space where the adaptation frontier lies and where a given algorithm can be expected to adapt successfully, or not, when the initial conditions of the environment change.
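For intuition only, the sketch below (not taken from this repository) shows how such a heatmap with a 0.5-probability frontier could be rendered from a grid of adaptation probabilities; the parameter ranges and the probability surface are invented for illustration.

```python
# Illustrative sketch, not the repository code: render an adaptation
# heatmap over (length, cart_friction) and mark the 0.5 contour as an
# approximate adaptation frontier. The ranges and the probability
# surface below are made up for illustration.
import numpy as np
import matplotlib.pyplot as plt

lengths = np.linspace(0.5, 4.0, 50)       # x axis: pole length
frictions = np.linspace(0.0, 51.2, 50)    # y axis: cart friction
L, F = np.meshgrid(lengths, frictions)

# Toy probability surface: adaptation gets harder as length grows.
prob = 1.0 / (1.0 + np.exp(4.0 * (L - 2.5)))

fig, ax = plt.subplots()
im = ax.pcolormesh(L, F, prob, cmap="RdYlGn", vmin=0.0, vmax=1.0, shading="auto")
ax.contour(L, F, prob, levels=[0.5], colors="yellow")  # approximate frontier
ax.set_xlabel("length")
ax.set_ylabel("cart_friction")
fig.colorbar(im, ax=ax, label="adaptation probability")
plt.show()
```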
To install the dependencies to run this project (we currently support MacOS and Ubuntu):
- Download and install Anaconda
- Create directory workspace in your home directory and move there:
mkdir ~/workspace && cd ~/workspace
- Clone this repository:
git clone https://github.com/testingautomated-usi/rl-plasticity-experiments
- Create the environment with the python dependencies:
conda env create -f rl-plasticity-experiments.yml
To run an experiment with alphatest, type the following commands:
- Move to the source code directory:
cd ~/workspace/rl-plasticity-experiments/src
- Activate the python environment:
conda activate rl-plasticity-experiments
- Run the experiment:
python experiments.py --algo_name ppo2 \
--env_name CartPole-v1 \
--num_iterations 13 \
--param_names length,cart_friction \
--runs_for_probability_estimation 3 \
--num_search_iterations 1 \
--logging_level INFO \
--search_type alphatest
The previous command runs alphatest to test the plasticity of the PPO algorithm in the CartPole-v1 environment, varying only two parameters of this environment, i.e. `length` and `cart_friction`. The number of repetitions is 1 (`--num_search_iterations 1`) and the number of runs for probability estimation is 3. If enough CPU cores are available, such runs are carried out in parallel. A hedged sketch of this probability estimation is shown below.
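The following sketch is not the repository code; it only illustrates one plausible reading of `--runs_for_probability_estimation 3`: the adaptation probability of a search point is estimated as the fraction of successful continual-learning runs, executed in parallel. The function `run_adaptation` is a hypothetical stand-in for one training run in the modified environment.

```python
# Illustrative sketch (assumed behavior, not the repository implementation).
from concurrent.futures import ProcessPoolExecutor

def run_adaptation(seed: int) -> bool:
    """Hypothetical placeholder: retrain the agent in the modified
    environment and return True if it reaches adequate performance."""
    return seed % 2 == 0  # dummy outcome for illustration

def adaptation_probability(num_runs: int = 3) -> float:
    # The runs are independent, so they can be carried out in parallel
    # when enough CPU cores are available.
    with ProcessPoolExecutor() as executor:
        outcomes = list(executor.map(run_adaptation, range(num_runs)))
    return sum(outcomes) / num_runs

if __name__ == "__main__":
    print(adaptation_probability(3))
```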
The experiment takes about 20 minutes on an 8-core machine and creates a directory structure `alphatest/CartPole-v1/ppo2/n_iterations_length_cart_friction_8_0` in the root of the project. This directory contains the artifacts of the experiment, including the models for each continual learning run, the frontier pairs found, the search points sampled, and the executions skipped due to the dominance analysis.
The directory scripts contains the scripts for reproducing the experiments reported in the paper. There is a directory for running alphatest experiments and one for running random experiments. Moreover, the scripts are divided by environment. For example, the previous Python command was taken from the cartpole directory of alphatest.
To run alphatest for 5 repetitions in the CartPole-v1 environment for the PPO algorithm, follow these steps:
- Move to the root directory of the project:
cd ~/workspace/rl-plasticity-experiments
- Activate the python environment:
conda activate rl-plasticity-experiments
- Run the experiment:
source scripts/alphatest/CartPole-v1/cartpole_ppo_length_cart_friction.sh
The 5 repetitions should take about 2 hours.
Assuming that the directory `alphatest/CartPole-v1/ppo2/n_iterations_length_cart_friction_8_0` is present in the root of the project (i.e. at least one repetition of alphatest has completed), compute the adaptation and anti-regression volumes and the heatmaps as follows:
- Move to the source code directory:
cd ~/workspace/rl-plasticity-experiments/src
- Activate the python environment:
conda activate rl-plasticity-experiments
- Compute the volume:
python analyze_volume_results.py --dir ~/workspace/rl-plasticity-experiments/alphatest/CartPole-v1/ppo2 \
--algo_name ppo2 \
--grid_granularity_percentage_of_range 1.0 \
--env_name CartPole-v1 \
--plot_file_path ~/workspace/rl-plasticity-experiments/alphatest/CartPole-v1/ppo2 \
--param_names length,cart_friction \
--smooth 2.0 \
--plot_only_approximated \
--max_points_x 50 \
--max_points_y 50 \
--regression_probability \
--logging_level INFO
The previous command should produce a txt file called `analyze_volume_results_adapt_regress_probability_g_1.0.txt` in `alphatest/CartPole-v1/ppo2/n_iterations_length_cart_friction_8_0`, which reports at the end the adaptation volume and the anti-regression volume (called regression volume in the txt file for brevity). The same directory also contains two pdf files, `heatmap_adaptation_probability_iteration_g_1.0_0.pdf` and `heatmap_regression_probability_iteration_g_1.0_0.pdf`, for the adaptation probability and the regression probability respectively. The adaptation probability heatmap is red where the algorithm did not adapt successfully and green otherwise; for the regression heatmap the opposite holds, and the gray region is where the regression probability is not defined. The black points are the search points sampled during the search phase.
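As a rough intuition (this is an assumption, not the script's actual computation), a single adaptation volume number can be obtained from a probability grid by averaging the probabilities over the explored parameter region, i.e. the probability-weighted fraction of the region where adaptation succeeds. The grid below is invented for illustration.

```python
# Illustrative sketch, not the analyze_volume_results.py implementation:
# average a grid of adaptation probabilities to obtain a volume-like score.
import numpy as np

prob_grid = np.array([
    [1.00, 1.00, 0.66, 0.00],
    [1.00, 0.66, 0.33, 0.00],
    [1.00, 0.33, 0.00, 0.00],
])

adaptation_volume = prob_grid.mean()  # fraction of the region where adaptation succeeds
print(f"adaptation volume ~ {adaptation_volume:.2f}")
```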
The results of the experiments for all environments (i.e. CartPole, Pendulum, MountainCar, Acrobot), all RL algorithms (i.e. PPO, SAC, DQN) and all search methods (i.e. alphatest and random) are available for download at this link.
To cite this repository in publications:
@article{DBLP:journals/tosem/BiagiolaT22,
author = {Matteo Biagiola and
Paolo Tonella},
title = {Testing the Plasticity of Reinforcement Learning-based Systems},
journal = {{ACM} Trans. Softw. Eng. Methodol.},
volume = {31},
number = {4},
pages = {80:1--80:46},
year = {2022},
url = {https://doi.org/10.1145/3511701},
doi = {10.1145/3511701},
timestamp = {Mon, 28 Aug 2023 21:25:10 +0200},
biburl = {https://dblp.org/rec/journals/tosem/BiagiolaT22.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}