Title | Action Type | Action Shape | Action Values | Observation Shape | Observation Values | Average Total Reward | Import |
---|---|---|---|---|---|---|---|
Reverse | Discrete | (3,) | [(0, 1), (0, 1), (0, base-1)] | (1,) | (0, base) | | `from gym.envs.algorithmic import reverse` |
The goal is to reverse a sequence of symbols on the input tape. We provide a special character `r`
to indicate the end of the sequence. The model must learn to move right multiple times until it hits the `r`
symbol, then move to the left, copying the symbols to the output tape. This task was originally used in the paper *Learning Simple Algorithms from Examples*.
The model has to learn:
- the correspondence between input and output symbols.
- how to execute the move-left and move-right actions on the input tape.
The agent takes a 3-element vector as its action. The action space is `(x, w, v)`, where:
- `x` is used for left/right movement on the input tape. It can take values (0, 1).
- `w` is used to decide whether to write to the output tape. It can take values (0, 1).
- `v` is used to select the value to be written to the output tape.
The observation space size is `(1,)`.
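As a minimal sketch of how these spaces fit together, assuming an older gym release that still ships the algorithmic environments and the legacy `step()` API returning four values:

```python
import gym

# A minimal interaction sketch; assumes a gym version that still includes
# the algorithmic environments (they were removed in later releases).
env = gym.make('Reverse-v0')
obs = env.reset()  # a single symbol: whatever is under the read head

action = env.action_space.sample()  # a 3-tuple (x, w, v)
x, w, v = action
# x: move the read head left/right (0 or 1)
# w: write to the output tape this step? (0 or 1)
# v: the symbol to write if w == 1 (0 .. base-1)
obs, reward, done, info = env.step(action)
```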
Rewards:
Rewards are issued similarly to the other algorithmic environments. Reward schedule:
- write a correct character: +1
- write a wrong character: -0.5
- run out the clock: -1
- otherwise: 0
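A hedged illustration of this schedule is a random-policy rollout that accumulates the per-step rewards (same legacy-gym assumptions as the sketch above):

```python
import gym

# Random-policy episode; illustrates the reward schedule above.
# Assumes the legacy gym API where step() returns (obs, reward, done, info).
env = gym.make('Reverse-v0')
obs = env.reset()
done = False
episode_return = 0.0
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
    episode_return += reward  # +1 correct write, -0.5 wrong write, -1 timeout
print('episode return:', episode_return)
```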
`gym.make('Reverse-v0', base=2)`

`base`: Number of distinct characters to read/write.
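For example, a larger alphabet can be requested through the `base` keyword (a sketch; the printed space representations may vary across gym versions):

```python
import gym

env = gym.make('Reverse-v0', base=3)  # symbols drawn from {0, 1, 2}
print(env.action_space)       # e.g. Tuple(Discrete(2), Discrete(2), Discrete(3))
print(env.observation_space)  # e.g. Discrete(4): the base symbols plus a blank cell
```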
- v0: Initial version release (1.0.0)