zaza0209/DIRL

Doubly Inhomogeneous Reinforcement Learning

This repository contains the Python implementation for the paper "Doubly Inhomogeneous Reinforcement Learning" (R is used for plotting). The main challenge is that system dynamics can differ across subjects and can also change over time within each subject. In this work, we assume that at each time point, subjects can be grouped into clusters defined by their system dynamics. Figure 1 shows concrete examples with two subjects and a single change point (merge, split, promotion, evolution, etc.).

Figure 1: Basic building blocks with two subjects (one in each row) and a single change point. Different dynamics are represented by distinct colors.
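
As a rough illustration of the "merge" building block, the sketch below simulates two subjects that follow different dynamics before a change point and the same dynamics afterwards. It is not taken from this repository: the function name `simulate_merge`, the 1-dimensional linear transition, and all parameter values are assumptions chosen only to make the idea concrete.

```python
import numpy as np

def simulate_merge(T=100, change_point=50, seed=0):
    """Simulate two subjects whose dynamics merge at `change_point`.

    Hypothetical toy model (not the paper's): s' = delta * (2a - 1) * s + noise,
    where delta depends on the regime the subject is in at time t.
    """
    rng = np.random.default_rng(seed)

    # regimes[i, t] is the cluster label of subject i at time t.
    regimes = np.zeros((2, T), dtype=int)
    regimes[1, :change_point] = 1   # subject 1 starts in its own regime
    regimes[1, change_point:] = 0   # ... then merges with subject 0

    delta = np.array([0.5, -0.5])   # assumed regime-specific coefficients
    actions = rng.integers(0, 2, size=(2, T))
    states = np.zeros((2, T + 1))
    states[:, 0] = rng.normal(size=2)

    for t in range(T):
        coef = delta[regimes[:, t]] * (2 * actions[:, t] - 1)
        states[:, t + 1] = coef * states[:, t] + rng.normal(scale=0.1, size=2)
    return states, actions, regimes

states, actions, regimes = simulate_merge()
```

The other building blocks (split, promotion, evolution) differ only in how the `regimes` array is filled in.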

File Overview

functions/ Directory

This directory contains utility functions for the numerical experiments, including simulation and data analysis tasks:

  • simu_mean_detect.py: Implements the proposed change point and cluster detection method for non-homogeneous environments.
  • compute_test_statistics_separateA.py: Computes the optimal policy.
  • evaluation.py: Implements the evaluation procedure, including functions to estimate the optimal policy and assess its value using fitted-Q evaluation.
  • simulate_data_1d.py: Generates data based on the provided transition and reward functions.
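
For readers unfamiliar with fitted-Q evaluation, the following is a minimal sketch of the general technique using a linear Q-function. It does not reproduce the code in evaluation.py: the feature map, the function names `features`, `fitted_q_evaluation`, and `policy_value`, and the default parameters are all assumptions made for illustration.

```python
import numpy as np

def features(s, a):
    """Hypothetical feature map for a 1-D state and a binary action."""
    s = np.asarray(s, dtype=float)
    a = np.asarray(a, dtype=float)
    return np.column_stack([np.ones_like(s), s, a, s * a])

def fitted_q_evaluation(s, a, r, s_next, policy, gamma=0.9, n_iter=200):
    """Estimate Q^pi with a linear model by repeated least-squares regression."""
    X = features(s, a)
    X_next = features(s_next, policy(s_next))  # next action chosen by the target policy
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # Bellman target under the current Q estimate.
        target = r + gamma * X_next @ theta
        theta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return theta

def policy_value(theta, s0, policy):
    """Average estimated Q-value of the policy over initial states s0."""
    return float(np.mean(features(s0, policy(s0)) @ theta))
```

Here `policy` is any function mapping an array of states to an array of 0/1 actions, and the transitions `(s, a, r, s_next)` are 1-D arrays of equal length pooled across subjects and time points.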

realdata_2020/ Directory

This directory houses the platform used to analyze the IHS 2020 study, as discussed in Section 3.1 of the paper:

  • realdata.py: Detects change points and clusters in the training data, and evaluates the trained policies on testing data.
  • create_realdata.sh: Creates SLURM jobs to run realdata.py.
  • collect_res.py and create_collectres.sh: Collect and summarize results from the real data analysis.

semisyn_2020/ Directory

This directory contains the platform for the IHS simulation described in the paper. It is divided into two subdirectories:

offline/ (for Section 5.1)

  • offline.py: Simulates 3-dimensional data based on the fitted model from IHS 2020 data, incorporating the detected change points and clusters.
  • create_offline.sh: Creates SLURM jobs to run offline.py.
  • collect_res.py and create_collectres.sh: Collect and summarize results from the offline estimation.

online_value/ (for Section 5.2)

  • run_value.py: Estimates the value of different policies in a doubly inhomogeneous environment.
  • create_value.sh: Creates SLURM jobs to run run_value.py.
  • collect_res.py and create_collectres.sh: Collect and summarize results from the online evaluation.
