
Add Algorithmic Environment documentation #2334

41 changes: 41 additions & 0 deletions docs/algorithmic.md
@@ -0,0 +1,41 @@
# Algorithmic Environments

The unique dependencies for this set of environments can be installed via:

```bash
pip install gym[algorithmic]
```

### Characteristics

Algorithmic environments have the following traits in common:
- A 1-d "input tape" or 2-d "input grid" of characters
- A target string which is a deterministic function of the input characters

Agents control a read head that moves over the input tape. Observations consist
of the single character currently under the read head. The read head may fall
off the end of the tape in any direction. When this happens, agents will observe
a special blank character (with index `env.base`) until the head moves back within bounds.

### Actions
Actions consist of 3 sub-actions:
- Direction to move the read head (left or right, plus up and down for 2-d
envs)
- Whether to write to the output tape
- Which character to write (ignored if the above sub-action is 0)

An episode ends when:
- The agent writes the full target string to the output tape.
- The agent writes an incorrect character.
- The agent runs out of time. (The time limit is fairly conservative.)

Reward schedule:
- write a correct character: +1
- write a wrong character: -0.5
- run out the clock: -1
- otherwise: 0

In the beginning, input strings will be fairly short. After an environment has
been consistently solved over some window of episodes, the environment will
increase the average length of generated strings. Typical env specs require
leveling up many times to reach their reward threshold.
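
The snippet below is a minimal interaction sketch for one of these environments. It assumes the classic `gym` API, in which `reset()` returns an observation and `step()` returns `(observation, reward, done, info)`; `Copy-v0` is used purely as an example.

```python
import gym

env = gym.make('Copy-v0')
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    # Each action is a 3-tuple: (move direction, write flag, character to write).
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    total_reward += reward
env.close()
print(total_reward)
```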
43 changes: 43 additions & 0 deletions docs/algorithmic/copy.md
@@ -0,0 +1,43 @@
Copy
---
|Title|Action Type|Action Shape|Action Values|Observation Shape|Observation Values|Average Total Reward|Import|
| ----------- | -----------| ----------- | -----------| ----------- | -----------| ----------- | -----------|
|Copy|Discrete|(3,)|[(0, 1),(0,1),(0,<a href="#base">base</a>-1)]|(1,)|(0,<a href="#base">base</a>)| |from gym.envs.algorithmic import copy_|
---

This task involves copying content from the input tape to the output tape. This task was originally used in the paper <a href="http://arxiv.org/abs/1511.07275">Learning Simple Algorithms from Examples</a>.

The model has to learn:
- the correspondence between input and output symbols.
- how to execute the move-right action on the input tape.

The agent takes a 3-element vector as its action.
The action space is `(x, w, v)`, where:
- `x` is used for left/right movement. It can take values (0,1).
- `w` is used to decide whether to write to the output tape. It can take values (0,1).
- `v` is the value to be written to the output tape.


The observation space has shape `(1,)`.
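
For illustration, a hand-coded policy that writes the character it currently observes and then moves right solves this task. The sketch below assumes the classic `gym` API (where `step()` returns `(observation, reward, done, info)`) and that the write sub-action is applied before the read head moves.

```python
import gym

# Hand-coded policy sketch for Copy-v0: at every step, write the observed
# character and move right. The episode ends once the full target is written.
env = gym.make('Copy-v0', base=5)
obs = env.reset()
done = False
while not done:
    # action = (move right, write, character to write)
    obs, reward, done, info = env.step((1, 1, obs))
env.close()
```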

**Rewards:**

Rewards are issued similarly to other algorithmic environments. Reward schedule:
- write a correct character: +1
- write a wrong character: -0.5
- run out the clock: -1
- otherwise: 0

### Arguments

```
gym.make('Copy-v0', base=5, chars=True)
```

<a id="base">`base`</a>: Number of distinct characters to read/write.

`chars`: If True, use uppercase alphabet. Otherwise, digits. Only affects rendering.

### Version History

* v0: Initial version release (1.0.0)
43 changes: 43 additions & 0 deletions docs/algorithmic/duplicated_input.md
@@ -0,0 +1,43 @@
Duplicated Input
---
|Title|Action Type|Action Shape|Action Values|Observation Shape|Observation Values|Average Total Reward|Import|
| ----------- | -----------| ----------- | -----------| ----------- | -----------| ----------- | -----------|
|Duplicated Input|Discrete|(3,)|[(0, 1),(0,1),(0,<a href="#base">base</a>-1)]|(1,)|(0,<a href="#base">base</a>)| |from gym.envs.algorithmic import duplicated_input|
---

The task is to return every nth (<a href="#dup">duplication</a>) character from the input tape. This task was originally used in the paper <a href="http://arxiv.org/abs/1511.07275">Learning Simple Algorithms from Examples</a>.
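
As a sketch of the target function (assuming the input presents each target character <a href="#dup">duplication</a> times in a row; the helper name below is illustrative, not part of the environment):

```python
# Illustrative helper: keep one character from each run of `duplication`
# repeated input characters.
def duplicated_input_target(input_chars, duplication=2):
    return input_chars[::duplication]

print(duplicated_input_target([3, 3, 1, 1, 0, 0], duplication=2))  # [3, 1, 0]
```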

The model has to learn:
- the correspondence between input and output symbols.
- how to execute the move-right action on the input tape.

The agent takes a 3-element vector as its action.
The action space is `(x, w, v)`, where:
- `x` is used for left/right movement. It can take values (0,1).
- `w` is used to decide whether to write to the output tape. It can take values (0,1).
- `v` is the value to be written to the output tape.


The observation space has shape `(1,)`.

**Rewards:**

Rewards are issued similarly to other algorithmic environments. Reward schedule:
- write a correct character: +1
- write a wrong character: -0.5
- run out the clock: -1
- otherwise: 0

### Arguments

```
gym.make('DuplicatedInput-v0', base=5, duplication=2)
```

<a id="base">`base`</a>: Number of distinct characters to read/write.

<a id="dup">`duplication`</a>: Number of consecutive identical input characters that correspond to a single output character.

### Version History

* v0: Initial version release (1.0.0)
47 changes: 47 additions & 0 deletions docs/algorithmic/repeat_copy.md
@@ -0,0 +1,47 @@
Repeat Copy
---
|Title|Action Type|Action Shape|Action Values|Observation Shape|Observation Values|Average Total Reward|Import|
| ----------- | -----------| ----------- | -----------| ----------- | -----------| ----------- | -----------|
|Repeat Copy|Discrete|(3,)|[(0, 1),(0,1),(0,<a href="#base">base</a>-1)]|(1,)|(0,<a href="#base">base</a>)| |from gym.envs.algorithmic import repeat_copy|
---



This task involves copying content from the input tape to the output tape in normal order, then in reverse order, then in normal order again; for example, for input [x1 x2 … xk] the required output is [x1 x2 … xk xk … x2 x1 x1 x2 … xk]. This task was originally used in the paper <a href="http://arxiv.org/abs/1511.07275">Learning Simple Algorithms from Examples</a>.
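
A small sketch of the target function implied by the description (the helper name is illustrative, not part of the environment):

```python
# Illustrative helper: the required output is the input, then its reverse,
# then the input again.
def repeat_copy_target(input_chars):
    return input_chars + input_chars[::-1] + input_chars

print(repeat_copy_target([0, 1, 2]))  # [0, 1, 2, 2, 1, 0, 0, 1, 2]
```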

The model has to learn:
- the correspondence between input and output symbols.
- how to execute the move-left and move-right actions on the input tape.

The agent takes a 3-element vector as its action.
The action space is `(x, w, v)`, where:
- `x` is used for left/right movement. It can take values (0,1).
- `w` is used to decide whether to write to the output tape. It can take values (0,1).
- `v` is the value to be written to the output tape.


The observation space has shape `(1,)`.

**Rewards:**

Rewards are issued similarly to other algorithmic environments. Reward schedule:
- write a correct character: +1
- write a wrong character: -0.5
- run out the clock: -1
- otherwise: 0



### Arguments

```
gym.make('RepeatCopy-v0', base=5)
```

<a id="base">`base`</a>: Number of distinct characters to read/write.



### Version History

* v0: Initial version release (1.0.0)
41 changes: 41 additions & 0 deletions docs/algorithmic/reverse.md
@@ -0,0 +1,41 @@
Reverse
---
|Title|Action Type|Action Shape|Action Values|Observation Shape|Observation Values|Average Total Reward|Import|
| ----------- | -----------| ----------- | -----------| ----------- | -----------| ----------- | -----------|
|Reverse|Discrete|(3,)|[(0, 1),(0,1),(0,<a href="#base">base</a>-1)]|(1,)|(0,<a href="#base">base</a>)| |from gym.envs.algorithmic import reverse|
---

The goal is to reverse a sequence of symbols on the input tape. We provide a special character `r` to indicate the end of the sequence. The model must learn to move right multiple times until it hits the `r` symbol, then move to the left, copying the symbols to the output tape. This task was originally used in the paper <a href="http://arxiv.org/abs/1511.07275">Learning Simple Algorithms from Examples</a>.

The model has to learn:
- the correspondence between input and output symbols.
- how to execute the move-left and move-right actions on the input tape.

The agent takes a 3-element vector as its action.
The action space is `(x, w, v)`, where:
- `x` is used for left/right movement. It can take values (0,1).
- `w` is used to decide whether to write to the output tape. It can take values (0,1).
- `v` is the value to be written to the output tape.


The observation space has shape `(1,)`.

**Rewards:**

Rewards are issued similarly to other algorithmic environments. Reward schedule:
- write a correct character: +1
- write a wrong character: -0.5
- run out the clock: -1
- otherwise: 0

### Arguments

```
gym.make('Reverse-v0', base=2)
```

<a id="base">`base`</a>: Number of distinct characters to read/write.

### Version History

* v0: Initial version release (1.0.0)
46 changes: 46 additions & 0 deletions docs/algorithmic/reversed_addition.md
@@ -0,0 +1,46 @@
Reversed Addition
---
|Title|Action Type|Action Shape|Action Values|Observation Shape|Observation Values|Average Total Reward|Import|
| ----------- | -----------| ----------- | -----------| ----------- | -----------| ----------- | -----------|
|Reversed Addition|Discrete|(3,)|[(0,1,2,3),(0,1),(0,<a href="#base">base</a>-1)]|(1,)|(0,<a href="#base">base</a>)| |from gym.envs.algorithmic import reversed_addition|
---

The goal is to add <a href="#rows">`rows`</a> multi-digit numbers provided on an input grid. The numbers are given in <a href="#rows">`rows`</a> adjacent rows of the grid, with their right edges aligned. The initial position of the read head is the last digit of the top number (i.e. the upper-right corner). This task was originally used in the paper <a href="http://arxiv.org/abs/1511.07275">Learning Simple Algorithms from Examples</a>.

The model has to:
- memorize an addition table for pairs of digits.
- learn how to move over the input grid.
- discover the concept of a carry.
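
As an illustration of the carry logic the agent must discover, here is a sketch of the target computation, assuming the digits are summed column by column with the least-significant column first (the helper name is illustrative, not part of the environment):

```python
# Illustrative helper: sum the grid column by column with a carry, emitting
# one output digit per column (least-significant digit first).
def reversed_addition_target(columns, base):
    carry, target = 0, []
    for digits in columns:  # each column holds `rows` digits
        total = sum(digits) + carry
        target.append(total % base)
        carry = total // base
    if carry:
        target.append(carry)
    return target

# e.g. adding two base-3 numbers given column-wise:
print(reversed_addition_target([(1, 2), (2, 2), (0, 1)], base=3))  # [0, 2, 2]
```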

The agent takes a 3-element vector as its action.
The action space is `(x, w, v)`, where:
- `x` is used for the direction of movement. It can take values (0,1,2,3).
- `w` is used to decide whether to write to the output tape. It can take values (0,1).
- `v` is the value to be written to the output tape.


The observation space has shape `(1,)`.

**Rewards:**

Rewards are issued similarly to other algorithmic environments. Reward schedule:
- write a correct character: +1
- write a wrong character: -0.5
- run out the clock: -1
- otherwise: 0

### Arguments

```
gym.make('ReversedAddition-v0', rows=2, base=3)   # for ReversedAddition
gym.make('ReversedAddition3-v0', rows=3, base=3)  # for ReversedAddition3
gym.make('ReversedAddition-v0', rows=n, base=3)   # for ReversedAddition with n numbers
```

<a id="rows">`rows`</a>: Number of multi-digit sequences to add at a time.

<a id="base">`base`</a>: Number of distinct characters to read/write.

### Version History

* v0: Initial version release (1.0.0)
60 changes: 60 additions & 0 deletions docs/toy_text/blackjack.md
@@ -0,0 +1,60 @@
Blackjack
---
|Title|Action Type|Action Shape|Action Values|Observation Shape|Observation Values|Average Total Reward|Import|
| ----------- | -----------| ----------- | -----------| ----------- | -----------| ----------- | -----------|
|Blackjack|Discrete|(1,)|(0,1)|(3,)|[(0,31),(0,10),(0,1)]| |from gym.envs.toy_text import blackjack|
---

Blackjack is a card game in which the goal is to obtain cards whose sum is as close to 21 as possible without going over. The player plays against a fixed dealer.

Card Values:

- Face cards (Jack, Queen, King) have a point value of 10.
- Aces can count as either 11 or 1; an ace counted as 11 is called a 'usable ace'.
- Numerical cards (2-9) have a value equal to their number.

This game is played with an infinite deck (i.e. cards are drawn with replacement).
The game starts with the dealer having one face-up and one face-down card, while the player has two face-up cards.

The player can request additional cards (hit, action=1) until they decide to stop
(stick, action=0) or exceed 21 (bust).
After the player sticks, the dealer reveals their facedown card, and draws
until their sum is 17 or greater. If the dealer goes bust the player wins.
If neither player nor dealer busts, the outcome (win, lose, draw) is
decided by whose sum is closer to 21.

The agent takes a 1-element vector as its action.
The action space is `(action)`, where:
- `action` is used to decide stick (0) or hit (1).

The observation is a 3-tuple of: the player's current sum,
the dealer's one showing card (1-10, where 1 is an ace), and whether or not the player holds a usable ace (0 or 1).
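
The sketch below runs one episode under a naive threshold policy (hit below 17, otherwise stick). It assumes the classic `gym` API, in which `step()` returns `(observation, reward, done, info)`.

```python
import gym

env = gym.make('Blackjack-v0')
obs = env.reset()
done = False
while not done:
    player_sum, dealer_card, usable_ace = obs
    action = 1 if player_sum < 17 else 0  # 1 = hit, 0 = stick
    obs, reward, done, info = env.step(action)
env.close()
print(reward)  # +1 win, -1 loss, 0 draw
```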

This environment corresponds to the version of the blackjack problem
described in Example 5.1 in Reinforcement Learning: An Introduction
by Sutton and Barto.
http://incompleteideas.net/book/the-book-2nd.html

**Rewards:**

Reward schedule:
- win game: +1
- lose game: -1
- draw game: 0
- win game with natural blackjack:
  - +1.5 (if <a href="#nat">natural</a> is True)
  - +1 (if <a href="#nat">natural</a> is False)

### Arguments

```
gym.make('Blackjack-v0', natural=False)
```

<a id="nat">`natural`</a>: Whether to give an additional reward for starting with a natural blackjack, i.e. starting with an ace and ten (sum is 21).

### Version History

* v0: Initial version release (1.0.0)
70 changes: 70 additions & 0 deletions docs/toy_text/frozen_lake.md
@@ -0,0 +1,70 @@
Frozen Lake
---
|Title|Action Type|Action Shape|Action Values|Observation Shape|Observation Values|Average Total Reward|Import|
| ----------- | -----------| ----------- | -----------| ----------- | -----------| ----------- | -----------|
|Frozen Lake|Discrete|(1,)|(0,3)|(1,)|(0,nrows*ncolumns)| |from gym.envs.toy_text import frozen_lake|
---


Frozen Lake involves crossing a frozen lake from the start (S) to the goal (G) without falling into any holes (H). The agent may not always move in the intended direction due to the slippery nature of the frozen lake.

The agent takes a 1-element vector as its action.
The action space is `(dir)`, where `dir` decides the direction to move in, which can be:

- 0: LEFT
- 1: DOWN
- 2: RIGHT
- 3: UP

The observation is a value representing the agent's current position as

    current_row * ncols + current_col

**Rewards:**

Reward schedule:
- Reach goal(G): +1
- Reach hole(H): 0

### Arguments

```
gym.make('FrozenLake-v0', desc=None, map_name="4x4", is_slippery=True)
```

`desc`: Used to specify custom map for frozen lake. For example,

desc=["SFFF", "FHFH", "FFFH", "HFFG"].

`map_name`: ID to use any of the preloaded maps.

"4x4":[
"SFFF",
"FHFH",
"FFFH",
"HFFG"
]

"8x8": [
"SFFFFFFF",
"FFFFFFFF",
"FFFHFFFF",
"FFFFFHFF",
"FFFHFFFF",
"FHHFFFHF",
"FHFFHFHF",
"FFFHFFFG",
]




`is_slippery`: True/False. If True, the agent will move in the intended direction with probability 1/3; otherwise it will move in either perpendicular direction with probability 1/3 each.

For example, if the action is left and is_slippery is True, then:
- P(move left)=1/3
- P(move up)=1/3
- P(move down)=1/3
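
A minimal usage sketch with a custom map (assumes the classic `gym` API, in which `step()` returns `(observation, reward, done, info)`):

```python
import gym

custom_map = ["SFFF",
              "FHFH",
              "FFFH",
              "HFFG"]
env = gym.make('FrozenLake-v0', desc=custom_map, is_slippery=True)
obs = env.reset()  # integer index: current_row * ncols + current_col
done = False
while not done:
    action = env.action_space.sample()  # 0 = LEFT, 1 = DOWN, 2 = RIGHT, 3 = UP
    obs, reward, done, info = env.step(action)
env.close()
print(reward)  # 1 if the goal was reached, otherwise 0
```
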
### Version History

* v0: Initial version release (1.0.0)