SARSOP for R

library(sarsop)
library(tidyverse) # for plotting

Problem definition

Our problem is defined by a state space, states, representing the true fish stock size (in arbitrary units), and an action space, actions representing the number of fish that will be harvested (or attempted to harvest). For simplicity, we will permit any action from 0 harvest to the maximum possible state size.

A stock recruitment function, f describes the expected future state given the current state. The true future state will be a stochastic draw with this mean.

A reward function determines the value of taking action of harvesting h fish when stock size is x fish; for simplicity this example assumes a fixed price per unit harvest, with no cost on harvesting effort. Future rewards are discounted.

states <- seq(0,1, length=50)
actions <- states
observations <- states
sigma_g <- 0.1
sigma_m <- 0.2
reward_fn <- function(x,h) pmin(x,h) # - .001*h
discount <- 0.95

r <- 1
K <- 0.75
f <- function(x, h){ # ricker
  s <- pmax(x - h, 0)
  s * exp(r * (1 - s / K) )
}

Semi-analytic solution to Deterministic problem

For comparison, we note that an exact solution to the deterministic or low-noise problem comes from Reed 1979, which proves that a constant escapement policy (S^) is optimal, with (\tfrac{df}{dx}|_{x = S^} = 1/\gamma) for discount (\gamma),

S_star <- optimize(function(x) -f(x,0) + x / discount, 
                   c(min(states),max(states)))$minimum
det_policy <- sapply(states, function(x) if(x < S_star) 0 else x - S_star)
det_action <- sapply(det_policy, function(x) which.min(abs(actions - x)))

When the state is observed without error, the problem is a Markov Decision Process (MDP) and can be solved by stochastic dynamic programming (e.g. policy iteration) over the discrete state and action space. To do so, we need matrix representations of the above transition function and reward function.

sarsop provides a convenience function for generating transition, observation, and reward matrices given these parameters for the fisheries management problem:

m <- fisheries_matrices(states, actions, observations, reward_fn, 
                        f, sigma_g, sigma_m, noise = "lognormal")

POMDP Solution

In the POMDP problem, the true state is unknown, but measured imperfectly. We introduce an observation matrix to indicate the probability of observing a particular state (y) given a true state (x). In principle this could depend on the action taken as well, though for simplicity we assume only a log-normal measurement error independent of the action chosen.

Long-running code to actually compute the solution.

log_dir <- "inst/extdata/vignette" 
alpha <- sarsop(m$transition, m$observation, m$reward, discount, 
                log_dir = log_dir,
                precision = .1, timeout = 200) # run much longer for more precise curve

sarsop logs solution files in the specified directory, along with a metadata table. The metadata table makes it convenient to store multiple solutions in a single directory, and load the desired solution later using it’s id or matching metatata. We can read this solution from the log where it is stored.

Given the model matrices and alpha vectors. Start belief with a uniform prior over states, compute & plot policy:

state_prior = rep(1, length(states)) / length(states) # initial belief
df <- compute_policy(alpha, m$transition, m$observation, m$reward,  state_prior)

## append deterministic action
df$det <- det_action

ggplot(df, aes(states[state], states[state] - actions[policy])) + 
  geom_line(col='blue') + 
  geom_line(aes(y = states[state] - actions[det]), col='red')

Simulate management under the POMDP policy:

set.seed(12345)
x0 <- which.min(abs(states - K))
Tmax <- 20
sim <- sim_pomdp(m$transition, m$observation, m$reward, discount, 
                 state_prior, x0 = x0, Tmax = Tmax, alpha = alpha)

Plot simulation data:

sim$df %>%
  select(-value) %>%
  mutate(state = states[state], action = actions[action], obs = observations[obs]) %>%
  gather(variable, stock, -time) %>%
  ggplot(aes(time, stock, color = variable)) + geom_line()  + geom_point()

Plot belief evolution:

sim$state_posterior %>% 
  data.frame(time = 1:Tmax) %>%
  filter(time %in% seq(1,Tmax, by = 2)) %>%
  gather(state, probability, -time, factor_key =TRUE) %>% 
  mutate(state = as.numeric(state)) %>% 
  ggplot(aes(state, probability, group = time, alpha = time)) + geom_line()

Developer Notes

Unfortunately the appl source code is a bit dated and not suitable for using as a shared library. It builds with lot of warnings and on Windows it only builds with MS Visual Studio. This package tries to make things as easy as possible for the user by bundling the appl executables and wrap them with system calls in R. This package also provides higher-level functions for POMDP analysis.

Thanks

The underlying algorithm, SARSOP: Successive Approximations of the Reachable Space under Optimal Policies was designed and described by Hanna Kurniawati, David Hsu, and Wee Sun Lee, in Department of Computer Science, National University of Singapore in 2008, see doi:10.15607/RSS.2008.IV.009. Kurniawati et al acknowledge Yanzhu Du and Xan Huang for helping with the software implementation in C++, which at the time of that publication, was available at http://motion.comp.nus.edu.sg/projects/pomdp/pomdp.html. That implementation has since moved (in 2017, after this R package wrapper was first implemented in 2016) to the GitHub repository: https://github.com/AdaCompNUS/sarsop. The C++ implementation, “APPL” acknowledges contributions based on an early version of ZMDP by Trey Smith (http://www.cs.cmu.edu/~trey/zmdp/). ZMDP in turn uses code from pomdp-solve by Tony Cassandra (http://www.cassandra.org/pomdp/code/index.shtml). The POMDPX parser uses TinyXML by Lee Thomason (http://www.grinninglizard.com/tinyxml/). Part of APPL makes use of code based on an early version of ZMDP released under Apache License 2.0. The POMDPX parser makes use of TinyXML released under zlib License. The rest of APPL is released under GNU General Public License V2.

Jeroen Ooms developed the original R package compilation setup, Mykel Kochenderfer and Markus Herrmann have been helpful in providing windows builds using MS Visual Studio. Milad Memarzadeh wrote the R POMDPX parsers and contributed to pacakge development. Carl Boettiger developed and maintains the R package. The C++ library here is modified from the original only to conform to modern C++ development standards and avoid compilation errors or warnings under gcc-8 and gcc-9, as well as modern clang compilers.

Name		Name	Last commit message	Last commit date
Latest commit History 370 Commits
.github		.github
R		R
articles		articles
docs		docs
inst		inst
man		man
src		src
tests		tests
tools		tools
windows/appl-0.96		windows/appl-0.96
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
.travis.yml		.travis.yml
CRAN-SUBMISSION		CRAN-SUBMISSION
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
NAMESPACE		NAMESPACE
NEWS.md		NEWS.md
README.Rmd		README.Rmd
README.md		README.md
appveyor.yml		appveyor.yml
codecov.yml		codecov.yml
cran-comments.md		cran-comments.md
sarsop.Rproj		sarsop.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SARSOP for R

Problem definition

Semi-analytic solution to Deterministic problem

POMDP Solution

Developer Notes

Thanks

About

Releases 6

Packages

Contributors 4

Languages

License

boettiger-lab/sarsop

Folders and files

Latest commit

History

Repository files navigation

SARSOP for R

Problem definition

Semi-analytic solution to Deterministic problem

POMDP Solution

Developer Notes

Thanks

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 6

Packages 0

Contributors 4

Languages

Packages