Skip to content
This repository has been archived by the owner on May 6, 2024. It is now read-only.

Eric/dataset #318

Merged
merged 27 commits into from
May 16, 2022
Merged

Eric/dataset #318

merged 27 commits into from
May 16, 2022

Conversation

cdmatters
Copy link
Contributor

No description provided.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 26, 2022
Copy link

@drothermel drothermel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So many tests & comments, this looks great to me so far (I just left a few mainly tiny spelling suggestions).

third_party/converter/converter.h Outdated Show resolved Hide resolved
nle/dataset/dataset.py Show resolved Hide resolved
nle/dataset/db.py Outdated Show resolved Hide resolved
nle/dataset/dataset.py Outdated Show resolved Hide resolved
nle/dataset/dataset.py Outdated Show resolved Hide resolved
nle/dataset/populate_db.py Outdated Show resolved Hide resolved
nle/dataset/populate_db.py Outdated Show resolved Hide resolved
nle/dataset/populate_db.py Outdated Show resolved Hide resolved
nle/dataset/populate_db.py Show resolved Hide resolved
nle/dataset/db.py Show resolved Hide resolved
Base automatically changed from eric/add-converter to main April 28, 2022 11:03
@cdmatters cdmatters marked this pull request as ready for review May 9, 2022 11:03
@cdmatters cdmatters requested a review from tscmoo May 9, 2022 11:16
Add a new integration test testing the construction of the dataset from
the altorg and nle_data directories. These altorg games have been checked to see if they are
consecutive episodes, and the match up is correct.

In this mock directory we include extra mock ttyrecs that should not be
included in the dataset. Readmore about the mock dataset in altorg/about.txt

Also: change dataset to default shuffle games.
Remove all mention of torch, and instead just ship numpy arrays.
documentation of populate db.

We want to blacklist files that we have had trouble rendering in the
past. In experience this is only a handful of ttyrecs in millions.
@cdmatters cdmatters force-pushed the eric/dataset branch 3 times, most recently from 6be75b7 to 9bc95a4 Compare May 9, 2022 13:20
NLE does not use the standard ttyrec format, used by alt.org and others.
It stores actions, as well as states, albeit in a different channel.
This commit introduces a version to the ttyrec format - original
ttyrecs are still 'ttyrec.bz2', whereas henceforth NLE generates
'ttyrec2.bz2'.

This will be more useful down the line, as readers can easily detect
which way to read ttyrecs.
This allows the dataset to know which version of ttyrec is being read,
which will allow the downstream Dataset object to chose the right way to read
the arrays in future.

This ttyrec version is always 1 for 'altorg' or old style ttyrecs, and
currently 2 for any nle generated ttyrecs.
Now the converter objects are aware of which version of ttyrec they are
decoding, have been passed it from the the dataset.
@cdmatters cdmatters force-pushed the eric/dataset branch 2 times, most recently from f656ec5 to 7319368 Compare May 9, 2022 20:21
We will now start storing the in-game score in ttyrecs, to provide
greater information for offline learning methods. This is done by
utilising a different 'channel' for score. As it stands the channels
are:
 - 0 - terminal information
 - 1 - action input
 - 2 - in-game score (based on `botl_score()` via `blstats[NLE_BL_SCORE]`)

 Note the in-game score is recorded once, just before `nle_step` returns, at
 the same time that `obs->done` is set. This is done to avoid having to store
 the in-game score everytime the game flushes to screen, and to keep the final
 ttyrec size low.
Add support for v3 ttyrecs in the converter, and feed through to the
dataset. Also add support to the read_tty script.  Note the converter
now writes a score when the frame comes up, but only "flushes" the main
frame when the action comes in.

We test with ttyrec with an example of gaining core from picking up gold and
killing monsters. We expect the action at index t to correspond to the
state and reward just passed.
@drothermel
Copy link

Again, no major comments except that the new versioning function is very convenient. Two minor comments.

@drothermel drothermel self-requested a review May 12, 2022 13:17
Copy link

@drothermel drothermel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, repeating my previous comment now that I've re-requested review from myself: looks great!

:param load_fn: A callback that loads the next file into a converter:
sig: load_fn(converter) -> bool is_success

Note actions will only be non-null if the ttyrec read_actions flag is set.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This note is no longer relevant I believe

@@ -181,6 +181,11 @@ def main():
char, *_ = struct.unpack("<B", data)
data = chr(char).encode("ascii", "backslashreplace")
arrow = "->"
elif channel == 2:
score, *_ = struct.unpack("<i", data)
# data = chr(score).encode("ascii", "backslashreplace")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to delete?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants