This repository contains the implementation for the paper "Doubly Non-homogeneous Reinforcement Learning", written in Python (with R used for plotting). The main challenge is that not only can each subject have its own system dynamics, but those dynamics may also evolve over time. In this work, we assume that at each time point, subjects can be grouped into clusters defined by their system dynamics. To elaborate, we provide concrete examples with two subjects and a single change point, covering patterns such as merge, split, promotion, and evolution.
Figure 1: Basic building blocks with two subjects (one in each row) and a single change point. Different dynamics are represented by distinct colors.
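To make the setting concrete, here is a minimal sketch of the "merge" pattern from Figure 1: two subjects follow distinct dynamics before a single change point and share the same dynamics afterwards. The linear-Gaussian dynamics and all parameter values are illustrative assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
T, change_point = 100, 50

def step(s, a, theta):
    # Toy mean dynamics with Gaussian noise: s' = theta * s + a + noise.
    return theta * s + a + rng.normal(scale=0.5)

states = np.zeros((2, T))
thetas_before = [0.8, -0.8]  # distinct dynamics per subject before the change
theta_after = 0.5            # shared dynamics after the change ("merge")

for i in range(2):           # one subject per row, as in Figure 1
    for t in range(T - 1):
        a = rng.integers(0, 2)  # random binary action
        theta = thetas_before[i] if t < change_point else theta_after
        states[i, t + 1] = step(states[i, t], a, theta)
```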
This directory contains utility functions for the numerical experiments, including simulation and data analysis tasks:
- `simu_mean_detect.py`: Implements the proposed change point and cluster detection method for non-homogeneous environments (a simplified sketch of the underlying idea appears after this list).
- `compute_test_statistics_separateA.py`: Computes the optimal policy.
- `evaluation.py`: Implements the evaluation procedure, including functions to estimate the optimal policy and assess its value using fitted-Q evaluation.
- `simulate_data_1d.py`: Generates data based on the provided transition and reward functions.
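As a toy illustration of mean-based change point detection (not the actual method in `simu_mean_detect.py`, which also handles clustering and RL dynamics), the following self-contained scan locates a single mean shift in a 1-D sequence by maximizing a normalized between-segment mean difference:

```python
import numpy as np

def scan_change_point(x):
    """Return the split index maximizing the between-segment mean difference."""
    n = len(x)
    best_t, best_stat = None, -np.inf
    for t in range(2, n - 2):  # require at least two points per segment
        left, right = x[:t], x[t:]
        # Normalized mean difference; larger values indicate a likelier change.
        stat = abs(left.mean() - right.mean()) * np.sqrt(t * (n - t) / n)
        if stat > best_stat:
            best_t, best_stat = t, stat
    return best_t, best_stat

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 60), rng.normal(1.5, 1.0, 40)])
print(scan_change_point(x))  # expected split near t = 60
```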
This directory houses the platform used to analyze the IHS 2020 study, as discussed in Section 3.1 of the paper:
- `realdata.py`: Detects change points and clusters in the training data, and evaluates the trained policies on the testing data.
- `create_realdata.sh`: Creates SLURM jobs to run `realdata.py` (a sketch of how such jobs could be generated is shown after this list).
- `collect_res.py` and `create_collectres.sh`: Collect and summarize results from the real data analysis.
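For readers unfamiliar with SLURM, here is a hedged sketch of how a batch of jobs for `realdata.py` might be generated and submitted programmatically. The resource limits and the `--seed` argument are placeholders, not the settings used by `create_realdata.sh`:

```python
import subprocess
from pathlib import Path

# Placeholder SLURM template; job name, time limit, and memory are assumptions.
TEMPLATE = """#!/bin/bash
#SBATCH --job-name=realdata_{seed}
#SBATCH --time=24:00:00
#SBATCH --mem=4G
python realdata.py --seed {seed}
"""

for seed in range(10):
    script = Path(f"job_realdata_{seed}.slurm")
    script.write_text(TEMPLATE.format(seed=seed))
    subprocess.run(["sbatch", str(script)], check=True)  # submit to SLURM
```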
This directory contains the platform for the IHS simulation described in the paper. It is divided into two subdirectories:
Offline estimation:

- `offline.py`: Simulates 3-dimensional data based on the model fitted to the IHS 2020 data, incorporating the detected change points and clusters.
- `create_offline.sh`: Creates SLURM jobs to run `offline.py`.
- `collect_res.py` and `create_collectres.sh`: Collect and summarize results from the offline estimation.

Online evaluation:

- `run_value.py`: Estimates the value of different policies in a doubly inhomogeneous environment (a generic fitted-Q evaluation sketch follows this list).
- `create_value.sh`: Creates SLURM jobs to run `run_value.py`.
- `collect_res.py` and `create_collectres.sh`: Collect and summarize results from the online evaluation.
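Since the value estimation above relies on fitted-Q evaluation (FQE), the following is a minimal generic FQE sketch for estimating the value of a fixed policy from logged transitions. It conveys the iteration in spirit only; the function name, data layout, and regressor choice are assumptions, and the implementations in `evaluation.py` and `run_value.py` differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fqe(transitions, policy, gamma=0.9, n_iters=30):
    """Fitted-Q evaluation on logged (s, a, r, s') rows with scalar s and a."""
    s, a, r, s_next = (transitions[:, i] for i in range(4))
    X = np.column_stack([s, a])
    a_next = np.array([policy(x) for x in s_next])  # actions the policy would take
    X_next = np.column_stack([s_next, a_next])
    model = None
    for _ in range(n_iters):
        # Bellman target: reward plus discounted Q at the next state-action pair.
        target = r if model is None else r + gamma * model.predict(X_next)
        model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, target)
    # Value estimate: average Q over the logged states, at the policy's actions.
    a0 = np.array([policy(x) for x in s])
    return model.predict(np.column_stack([s, a0])).mean()

rng = np.random.default_rng(0)
fake_logs = rng.normal(size=(500, 4))  # stand-in logged (s, a, r, s') transitions
print(fqe(fake_logs, policy=lambda s: float(s > 0)))
```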