Learning Intention Reconsideration Strategies using Reinforcement Learning on Markov Decision Processes

marcvanzee/mdp-plan-revision

mdp-plan-revision

Read a more detailed description of the conceptual underpinnings and experimental results in the following paper:

Intention Reconsideration as Metareasoning (Marc van Zee, Thomas Icard), In Bounded Optimality and Rational Metareasoning NIPS 2015 Workshop, 2015.

Summary

This project implements an agent that is situated in a Markov Decision Process (MDP).

A Markov Decision Process in this software

The agent is able to compute the optimal policy through Value Iteration.
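
As a rough illustration of value iteration, the following sketch computes state values and a greedy policy for a tiny MDP. It is not the project's implementation: the `transitions` encoding, the function name `value_iteration`, the discount `gamma`, the tolerance `tol`, and the three-state example are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding (not taken from the project):
# transitions[state][action] is a list of (probability, next_state, reward).
Transitions = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def q_value(outcomes: List[Tuple[float, str, float]],
            values: Dict[str, float], gamma: float) -> float:
    """Expected return of one action given the current value estimates."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(transitions: Transitions, gamma: float = 0.9,
                    tol: float = 1e-6):
    """Return (values, policy) for a finite MDP; gamma and tol are assumed defaults."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:  # terminal state: no actions, value stays 0
                continue
            best = max(q_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best  # in-place update
        if delta < tol:
            break
    # Extract the greedy policy with respect to the converged values.
    policy = {
        s: max(actions, key=lambda a: q_value(actions[a], values, gamma))
        for s, actions in transitions.items() if actions
    }
    return values, policy


# Illustrative three-state chain: a -> b -> goal, reward 1 on reaching goal.
example_mdp: Transitions = {
    "a": {"right": [(1.0, "b", 0.0)], "stay": [(1.0, "a", 0.0)]},
    "b": {"right": [(1.0, "goal", 1.0)], "left": [(1.0, "a", 0.0)]},
    "goal": {},
}
values, policy = value_iteration(example_mdp)
```

In this example the greedy policy moves right in both non-terminal states, which corresponds to the arrows shown in the policy visualization.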

Optimal policy (green arrows) computed using value iteration

The MDP changes over time, and the agent can respond to each change by either acting (i.e. executing the optimal action according to its current policy) or thinking (i.e. computing a new policy). The task is to learn the best meta-reasoning strategy, i.e. deciding when to think and when to act, based on the characteristics of the environment.
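
The think-or-act trade-off can be sketched with a deliberately tiny simulation. Everything here is hypothetical and far simpler than the project's setup: the world is a one-dimensional corridor, `goal_schedule` lists where the goal is at each time step, thinking costs one time step and refreshes the agent's belief about the goal, and `should_think` is a hand-written strategy rather than a learned one.

```python
from typing import Callable, List


def run_episode(should_think: Callable[[int], bool],
                goal_schedule: List[int], start: int = 0) -> int:
    """Return how many acting steps ended on the actual goal.

    should_think receives the number of actions taken since the last
    replanning step and decides whether to think (True) or act (False).
    """
    position = start
    believed_goal = goal_schedule[0]  # initial plan targets the first goal
    actions_since_think = 0
    score = 0
    for actual_goal in goal_schedule:
        if should_think(actions_since_think):
            # Thinking: replan against the current world; uses up this step.
            believed_goal = actual_goal
            actions_since_think = 0
        else:
            # Acting: move one cell toward the (possibly outdated) goal.
            if position < believed_goal:
                position += 1
            elif position > believed_goal:
                position -= 1
            actions_since_think += 1
            if position == actual_goal:
                score += 1
    return score


# The goal sits at cell 4 for three steps, then moves to cell 1.
schedule = [4] * 3 + [1] * 12

never_think = run_episode(lambda k: False, schedule)      # always acts
always_think = run_episode(lambda k: True, schedule)      # never acts
periodic_think = run_episode(lambda k: k >= 3, schedule)  # replans every 3 actions
```

In this toy scenario both extremes score nothing: the agent that never thinks keeps pursuing an outdated goal, and the agent that always thinks never moves. The periodic strategy does better, which is the kind of trade-off a learned meta-reasoning strategy has to balance.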

This general setup is quite complex, so we have simplified the environment (i.e. the MDP) to the Tileworld environment. This consists of an agent that is situated on a grid. It can move up, down, left, or right and has to fill holes, which means it has to reach specific states in the grid. It cannot move through obstacles.
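
The grid dynamics described above can be sketched as a deterministic transition function. The text encoding of the grid (`A` for the agent, `#` for an obstacle, `H` for a hole, `.` for a free cell), the names `step` and `plan_to_hole`, and the convention that a blocked move leaves the agent in place are assumptions for illustration, not the project's actual representation. The breadth-first search stands in for planning here because the toy grid is deterministic.

```python
from collections import deque
from typing import List, Optional, Tuple

State = Tuple[int, int]  # (row, column)

# Hypothetical text encoding of a small Tileworld.
GRID = [
    "A.#",
    ".#H",
    "...",
]

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def find_agent(grid: List[str]) -> State:
    """Locate the cell marked 'A'."""
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if cell == "A":
                return (r, c)
    raise ValueError("grid has no agent cell")


def step(grid: List[str], state: State, action: str) -> State:
    """Apply one move; walls and obstacles leave the agent where it is (assumed)."""
    dr, dc = MOVES[action]
    r, c = state[0] + dr, state[1] + dc
    inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
    if not inside or grid[r][c] == "#":
        return state
    return (r, c)


def plan_to_hole(grid: List[str], start: State) -> Optional[List[str]]:
    """Breadth-first search for a shortest action sequence to any hole."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, path = frontier.popleft()
        if grid[state[0]][state[1]] == "H":
            return path
        for action in MOVES:
            nxt = step(grid, state, action)
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [action]))
    return None  # no hole is reachable


start = find_agent(GRID)
plan = plan_to_hole(GRID, start)
```

For this grid the only route to the hole goes around the two obstacles along the left column and bottom row.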

Tileworld in MDP representation

In the above screenshot, the agent is represented by a yellow state, obstacles are represented by black states, goal states by larger green states, and "normal" grid states by red states. In order to simplify the Tileworld visualization, we have developed an alternative one:

Tileworld in simplified representation

Note that this is still an MDP: We have only simplified the visualization. This allows us to visualize larger Tileworld scenarios easily:

A larger Tileworld scenario in simplified representation

We then develop several metareasoning strategies that the agent can use. The strategies and the experimental results are described in detail in the paper cited at the top of this document.
