Codebase adapted from DVRL (Igl et al., 2018) to evaluate the DVRL and known-observation belief encoders.

Code for "Reinforcement Learning with State Observation Costs in Action-Contingent Noiselessly Observable Markov Decision Processes" (NeurIPS 2021) by Alex Nam, Scott Fleming, and Emma Brunskill: https://openreview.net/pdf?id=jgze2dDL9y8

List of ACNO-MDP algorithms and commands

Conda packages and versions used to generate the reported results are listed in conda.yml (note: not all of the packages may be needed).

To run the known observation belief encoder

  1. cartpole (the observation cost can be adjusted in the config file; a config-editing sketch follows this list): python known_obs/code/main_known.py -p with environment.config_file=cartpole_ver3.yaml

  2. mountain hike (the observation cost can be adjusted in the config file): python known_obs/code/main_known.py -p with environment.config_file=mountainHike_ver2.yaml
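If you prefer to script an observation-cost sweep instead of editing the YAML by hand, a minimal sketch along these lines works; the obs_cost key name and helper are assumptions for illustration only, so check the actual field names inside cartpole_ver3.yaml / mountainHike_ver2.yaml:

import yaml

# Hypothetical helper: copy a config file with a different observation cost.
# The 'obs_cost' key name is an assumption -- check the actual field in the YAML file.
def write_config_with_cost(src_path, dst_path, cost):
    with open(src_path) as f:
        cfg = yaml.safe_load(f)
    cfg['obs_cost'] = cost  # assumed key name
    with open(dst_path, 'w') as f:
        yaml.safe_dump(cfg, f)

write_config_with_cost('cartpole_ver3.yaml', 'cartpole_cost01.yaml', 0.1)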

To run the default DVRL belief encoder (env_id must also be set manually in make_env in code/envs.py; review the inline comments)

Source code: https://github.com/maximilianigl/DVRL

  1. cartpole (set obs_cost in custom_cartpole/envs/AdvancedCartPole.py; a sketch of how the observation cost enters the environment follows this list): python ./code/main.py -p with environment.config_file=cartpole_ver2.yaml algorithm.use_particle_filter=True log.filename='temp/'

  2. mountain hike (set obs_cost in custom_mountain/envs/hike.py): python ./code/main.py -p with environment.config_file=mountainHike_ver3.yaml algorithm.use_particle_filter=True log.filename='temp/'
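For intuition about what obs_cost controls: in an ACNO-MDP the agent decides at every step whether to pay for a noiseless observation of the state, and the cost is charged only on the steps where it observes. The gym-style wrapper below is an illustrative sketch of that idea, not the repo's implementation; the class name, the (control, observe) action encoding, and the zeroed-out placeholder observation are all assumptions:

import gym
import numpy as np

class ObservationCostWrapper(gym.Wrapper):
    # Illustrative only: augments each action with an observe/skip flag
    # (old 4-tuple gym step API for brevity).
    def __init__(self, env, obs_cost=0.1):
        super().__init__(env)
        self.obs_cost = obs_cost

    def step(self, action):
        control, observe = action
        obs, reward, done, info = self.env.step(control)
        if observe:
            reward -= self.obs_cost   # pay for the measurement
        else:
            obs = np.zeros_like(obs)  # state stays hidden this step
        return obs, reward, done, info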

To run Sepsis with POMCP/MCTS

POMDPy source code: https://github.com/pemami4911/POMDPy

The empirical model built from 1M random interactions is saved in "./POMDPy/examples/sepsis/model_256.obj".
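If you want to inspect that model outside of the POMCP scripts, it can presumably be loaded directly; this sketch assumes the .obj file is an ordinary Python pickle, which is not confirmed here:

import pickle

# Load the saved empirical sepsis model (assumes a standard Python pickle).
with open('./POMDPy/examples/sepsis/model_256.obj', 'rb') as f:
    model = pickle.load(f)
print(type(model))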

  1. Observe-then-Plan (cost can be changed to any value <= 0; init_idx specifies the start patient state)

Observe-then-Plan assumes the transition and reward estimates are learned in advance and uses a copy of the model parameter estimates for planning. The transitions first need to be unzipped from transitions.npy.zip inside the acno_mdp/POMDPy/pomdpy directory (a Python sketch of this step follows below).

Set "observe_then_plan = True" in L183 run_pomcp(self, epoch, eps, temp=None, observe_then_plan=True) in main/POMDPy/pomdpy/agent.py python pomcp.py --init_idx 256 --cost -0.1 --is_mdp 0

  2. ACNO-POMCP (observe while planning)

ACNO-POMCP starts with a uniform initialization of the transition model and updates the transition estimate for every observed tuple (see the count-update sketch below).

Set "observe_then_plan = False" in L183 run_pomcp(self, epoch, eps, temp=None, observe_then_plan=True) in main/POMDPy/pomdpy/agent.py. This will set the transition model parameters to uniform over all possible next states and update the observed tuple counts.

python pomcp.py --init_idx 256 --cost -0.1 --is_mdp 0
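The following is a minimal sketch of that count-based update, not the repo's code; the state/action sizes and the smoothing scheme are placeholders chosen only to make the example self-contained:

import numpy as np

n_states, n_actions = 720, 8  # placeholder sizes, not the actual sepsis dimensions

counts = np.zeros((n_states, n_actions, n_states))
T_hat = np.full((n_states, n_actions, n_states), 1.0 / n_states)  # uniform init

def update_transition(s, a, s_next):
    # Record one fully observed (s, a, s') tuple and re-normalize the row,
    # keeping a small uniform prior so unvisited transitions stay possible.
    counts[s, a, s_next] += 1.0
    T_hat[s, a] = (counts[s, a] + 1.0 / n_states) / (counts[s, a].sum() + 1.0)

update_transition(256, 0, 300)
print(T_hat[256, 0].sum())  # ~1.0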

  3. MCTS (always-observing POMCP; not included in the main paper)

python pomcp.py --init_idx 256 --cost -0.05 --is_mdp 1

  4. To run POMCP with the true model parameters (i.e., stepping actions in the true environment instead of imagining them with the learned model parameters), change L343 in POMDPy/examples/sepsis/sepsis.py to "_true = True" so the actions are executed in a copy of the sepsis environment. Otherwise, use the same command as ACNO-POMCP:

python pomcp.py --init_idx 256 --cost -0.1 --is_mdp 0

  5. DRQN

Source code: https://github.com/Bigpig4396/PyTorch-Deep-Recurrent-Q-Learning-DRQN

python drqn.py
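For reference, the core of DRQN is a recurrent Q-network that consumes one observation per step and carries its hidden state across the episode, which is what lets it act under the partial observability created by skipped observations. Below is a minimal PyTorch sketch of such a network; the layer sizes and dimensions are placeholders, not the hyperparameters used in drqn.py:

import torch
import torch.nn as nn

class DRQN(nn.Module):
    # Minimal recurrent Q-network: observation -> LSTM -> Q-values (placeholder sizes).
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.q_head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, hidden_state=None):
        # obs_seq: (batch, time, obs_dim); hidden_state carries memory between calls.
        x = torch.relu(self.encoder(obs_seq))
        out, hidden_state = self.lstm(x, hidden_state)
        return self.q_head(out), hidden_state

# One greedy step while threading the recurrent state through time.
net = DRQN(obs_dim=4, n_actions=2)
h = None
obs = torch.zeros(1, 1, 4)
q, h = net(obs, h)
action = q[:, -1].argmax(dim=-1).item()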

Plots for the continuous domains are generated with the same code as DVRL; plots for Sepsis can be replicated using sepsis_res/plot.py.
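As a rough illustration of the kind of learning-curve plot involved, the sketch below plots a mean return curve with a one-standard-deviation band; the result file name and its (n_runs, n_epochs) layout are assumptions, not the actual output format of sepsis_res/plot.py:

import numpy as np
import matplotlib.pyplot as plt

returns = np.load('sepsis_res/returns.npy')  # assumed shape: (n_runs, n_epochs)
mean, std = returns.mean(axis=0), returns.std(axis=0)
epochs = np.arange(returns.shape[1])

plt.plot(epochs, mean)
plt.fill_between(epochs, mean - std, mean + std, alpha=0.3)
plt.xlabel('epoch')
plt.ylabel('return')
plt.savefig('sepsis_returns.png')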
