
Aegis

An experiment in extensible transfer reinforcement learning

How does it work?

The core protocol is very simple: you send a reward to a node, and it responds with its current state. So far this seems to work for simple experiments (e.g., a single agent paired with a single CartPole environment). How rewards should be distributed and propagated through networks of agents is still under exploration.

Currently, the protocol is implemented using a single HTTP POST request.
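
To make the exchange concrete, here is a hedged sketch of one reward/state round trip from the client side. The port, endpoint path, and JSON field names are illustrative assumptions, not the documented wire format; check the run scripts for the actual details.

```python
import requests

# Send a reward to a node; the node replies with its current state.
# NOTE: the URL, port, and "reward" field name are assumptions for
# illustration only.
response = requests.post("http://localhost:8080", json={"reward": 1.0})
state = response.json()  # e.g. a list of floats representing the node's state
print(state)
```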

Terminology (subject to change):

Node

  • A single process that accepts rewards and returns states when requested
  • Could be a reinforcement learning agent, a Gym environment, a static pretrained classifier/regressor, an autoencoder, a generative network, etc.
  • Rewards need not be respected; they can simply be ignored or propagated elsewhere.
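
As a rough sketch (not the repository's actual implementation), a node could be as small as a Flask app that accepts a reward and replies with whatever state it currently holds; the route, field name, and the decision to ignore the reward here are all illustrative assumptions:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)
state = [0.0, 0.0, 0.0, 0.0]  # placeholder state vector

@app.route("/", methods=["POST"])
def step():
    # A node may use the incoming reward, ignore it, or forward it elsewhere.
    reward = request.get_json().get("reward", 0.0)
    return jsonify(state)

if __name__ == "__main__":
    app.run(port=8080)
```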

Controller

  • Accumulates rewards and distributes state upon request
  • Calls an engine's update method to train and produce a new state (see the combined sketch after the Engine entry below)

Engine

  • Handles internal update logic such as accumulating input states, training a reinforcement learning agent, producing a new state, etc.
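
A minimal sketch of how a controller and engine might fit together, assuming the engine exposes an update(input_state, reward) method; the actual interface in this repository may differ:

```python
class Controller:
    """Accumulates rewards between requests and asks its engine for a new state."""

    def __init__(self, engine):
        self.engine = engine
        self.accumulated_reward = 0.0

    def add_reward(self, reward):
        self.accumulated_reward += reward

    def get_state(self, input_state):
        # Hand the accumulated reward to the engine, which trains and
        # produces the next state; then reset the accumulator.
        state = self.engine.update(input_state, self.accumulated_reward)
        self.accumulated_reward = 0.0
        return state
```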

Reward proxy

  • Accepts rewards, which are then forwarded to multiple nodes at once
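
In sketch form, a reward proxy just fans an incoming reward out to a set of downstream node URLs. Broadcasting the full reward to every node is an assumption here, since the actual split strategy is not specified:

```python
import requests

def proxy_reward(reward, node_urls):
    """Forward the same reward to every downstream node."""
    for url in node_urls:
        requests.post(url, json={"reward": reward})

# Example: broadcast a reward to two local nodes (hypothetical ports).
proxy_reward(0.5, ["http://localhost:8080", "http://localhost:8081"])
```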

TODO

  • Use python-fire to reduce clutter in run scripts

  • Reduce dependency on start_nodes in run scripts

  • Perhaps clean up run scripts/functions with inspect.getargspec

  • Support environments with discrete observation spaces

  • Add Flask log anti-spam to the Flask controller

  • VAE engine

    • Stores seen inputs and trains on them
  • Multiple input/output channels per node (mostly for VAEs and pretrained SL models)

  • Enable CORS for non-standard Flask engines (reward_proxy)

  • Throttle node

    • Scales data by a factor between 0 and 1
    • Might be useful for slowly reducing dependence on a particular node
  • Monitoring/command API

    • Connect to one node and have it discover upstream connections?
      • How would downstream connections be handled?
    • Config file for saving nodes?
  • REST API for modifying node settings

    • Connect/disconnect
    • Learning rate, gamma/lambda, etc.
  • Add autoencoders with channels so the AE can be reused for (and trained on) multiple inputs

  • "PSET" engine

    • Policy search with eligibility traces
    • Uses parameter search instead of gradients (in contrast to PGET)
