This repository contains the implementation for the paper "Doubly Non-homogeneous Reinforcement Learning", written in Python (with R used for plotting). The main challenge is that not only can each subject have its own system dynamics, but those dynamics may also evolve over time. In this work, we assume that at each time point, subjects can be grouped into clusters defined by their system dynamics. To elaborate, we provide concrete examples with two subjects and a single change point, covering patterns such as merge, split, promotion, and evolution.
Figure 1: Basic building blocks with two subjects (one in each row) and a single change point. Different dynamics are represented by distinct colors.
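To make the setting concrete, here is a minimal sketch of the "merge" pattern from Figure 1: two subjects follow distinct dynamics before a single change point and share the same dynamics afterwards. The linear-Gaussian dynamics and all parameter values are illustrative assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
T, change_point = 100, 50

def step(s, a, theta):
    # Toy mean dynamics with Gaussian noise: s' = theta * s + a + noise.
    return theta * s + a + rng.normal(scale=0.5)

states = np.zeros((2, T))
thetas_before = [0.8, -0.8]  # distinct dynamics per subject before the change
theta_after = 0.5            # shared dynamics after the change ("merge")

for i in range(2):           # one subject per row, as in Figure 1
    for t in range(T - 1):
        a = rng.integers(0, 2)  # random binary action
        theta = thetas_before[i] if t < change_point else theta_after
        states[i, t + 1] = step(states[i, t], a, theta)
```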
This directory contains utility functions for the numerical experiments, including simulation and data analysis tasks:
- `simu_mean_detect.py`: Implements the proposed change point and cluster detection method for non-homogeneous environments (a simplified sketch of the underlying idea appears after this list).
- `compute_test_statistics_separateA.py`: Computes the optimal policy.
- `evaluation.py`: Implements the evaluation procedure, including functions to estimate the optimal policy and assess its value using fitted-Q evaluation.
- `simulate_data_1d.py`: Generates data based on the provided transition and reward functions.
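As a toy illustration of mean-based change point detection (not the actual method in `simu_mean_detect.py`, which also handles clustering and RL dynamics), the following self-contained scan locates a single mean shift in a 1-D sequence by maximizing a normalized between-segment mean difference:

```python
import numpy as np

def scan_change_point(x):
    """Return the split index maximizing the between-segment mean difference."""
    n = len(x)
    best_t, best_stat = None, -np.inf
    for t in range(2, n - 2):  # require at least two points per segment
        left, right = x[:t], x[t:]
        # Normalized mean difference; larger values indicate a likelier change.
        stat = abs(left.mean() - right.mean()) * np.sqrt(t * (n - t) / n)
        if stat > best_stat:
            best_t, best_stat = t, stat
    return best_t, best_stat

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 60), rng.normal(1.5, 1.0, 40)])
print(scan_change_point(x))  # expected split near t = 60
```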
This directory houses the platform used to analyze the IHS 2020 study, as discussed in Section 3.1 of the paper:
- `realdata.py`: Detects change points and clusters in the training data, and evaluates the trained policies on the testing data.
- `create_realdata.sh`: Creates SLURM jobs to run `realdata.py` (a sketch of how such jobs could be generated is shown after this list).
- `collect_res.py` and `create_collectres.sh`: Collect and summarize results from the real data analysis.
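For readers unfamiliar with SLURM, here is a hedged sketch of how a batch of jobs for `realdata.py` might be generated and submitted programmatically. The resource limits and the `--seed` argument are placeholders, not the settings used by `create_realdata.sh`:

```python
import subprocess
from pathlib import Path

# Placeholder SLURM template; job name, time limit, and memory are assumptions.
TEMPLATE = """#!/bin/bash
#SBATCH --job-name=realdata_{seed}
#SBATCH --time=24:00:00
#SBATCH --mem=4G
python realdata.py --seed {seed}
"""

for seed in range(10):
    script = Path(f"job_realdata_{seed}.slurm")
    script.write_text(TEMPLATE.format(seed=seed))
    subprocess.run(["sbatch", str(script)], check=True)  # submit to SLURM
```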
This directory contains the platform for the IHS simulation described in the paper. It is divided into two subdirectories:
Offline estimation:

- `offline.py`: Simulates 3-dimensional data based on the model fitted to the IHS 2020 data, incorporating the detected change points and clusters.
- `create_offline.sh`: Creates SLURM jobs to run `offline.py`.
- `collect_res.py` and `create_collectres.sh`: Collect and summarize results from the offline estimation.

Online evaluation:

- `run_value.py`: Estimates the value of different policies in a doubly inhomogeneous environment (a generic fitted-Q evaluation sketch follows this list).
- `create_value.sh`: Creates SLURM jobs to run `run_value.py`.
- `collect_res.py` and `create_collectres.sh`: Collect and summarize results from the online evaluation.
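Since the value estimation above relies on fitted-Q evaluation (FQE), the following is a minimal generic FQE sketch for estimating the value of a fixed policy from logged transitions. It conveys the iteration in spirit only; the function name, data layout, and regressor choice are assumptions, and the implementations in `evaluation.py` and `run_value.py` differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fqe(transitions, policy, gamma=0.9, n_iters=30):
    """Fitted-Q evaluation on logged (s, a, r, s') rows with scalar s and a."""
    s, a, r, s_next = (transitions[:, i] for i in range(4))
    X = np.column_stack([s, a])
    a_next = np.array([policy(x) for x in s_next])  # actions the policy would take
    X_next = np.column_stack([s_next, a_next])
    model = None
    for _ in range(n_iters):
        # Bellman target: reward plus discounted Q at the next state-action pair.
        target = r if model is None else r + gamma * model.predict(X_next)
        model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, target)
    # Value estimate: average Q over the logged states, at the policy's actions.
    a0 = np.array([policy(x) for x in s])
    return model.predict(np.column_stack([s, a0])).mean()

rng = np.random.default_rng(0)
fake_logs = rng.normal(size=(500, 4))  # stand-in logged (s, a, r, s') transitions
print(fqe(fake_logs, policy=lambda s: float(s > 0)))
```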