Running this code

Working with the network locally

Prerequisites

We use the pipenv dependency/virtualenv framework:

$ pipenv install
$ pipenv shell
(mac-graph-sjOzWQ6Y) $

Prediction

You can watch the model predict labels for the held-back data:

$ python -m macgraph.predict --name my_dataset --model-version 0ds9f0s

predicted_label: shabby
actual_label: derilict
src: How <space> clean <space> is <space> 3 ? <unk> <eos> <eos>
-------
predicted_label: small
actual_label: medium-sized
src: How <space> big <space> is <space> 4 ? <unk> <eos> <eos>
-------
predicted_label: medium-sized
actual_label: tiny
src: How <space> big <space> is <space> 7 ? <unk> <eos> <eos>
-------
predicted_label: True
actual_label: True
src: Does <space> 1 <space> have <space> rail <space> connections ? <unk>
-------
predicted_label: True
actual_label: False
src: Does <space> 0 <space> have <space> rail <space> connections ? <unk>
-------
predicted_label: victorian
actual_label: victorian
src: What <space> architectural <space> style <space> is <space> 1 ? <unk>
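
If you want to score a run rather than read it by eye, the output above is regular enough to parse. The following is a minimal sketch, not part of macgraph: the helper names (`parse_predictions`, `detokenize`, `accuracy`) are hypothetical, and it assumes the output keeps the `key: value` layout with dashed separator lines shown above, and that `<space>`, `<unk>` and `<eos>` are the only special tokens.

```python
from typing import Dict, List

# Two records copied from the example output above.
SAMPLE = """
predicted_label: shabby
actual_label: derilict
src: How <space> clean <space> is <space> 3 ? <unk> <eos> <eos>
-------
predicted_label: True
actual_label: True
src: Does <space> 1 <space> have <space> rail <space> connections ? <unk>
"""


def parse_predictions(text: str) -> List[Dict[str, str]]:
    """Parse 'key: value' blocks separated by dashed lines into dicts."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if set(line) == {"-"}:
            # A line made only of dashes closes the current record.
            if current:
                records.append(current)
                current = {}
            continue
        key, sep, value = line.partition(": ")
        if sep:
            current[key] = value
    if current:
        records.append(current)
    return records


def detokenize(src: str) -> str:
    """Rebuild readable text from the tokenised question (assumed token meanings)."""
    tokens = [t for t in src.split() if t not in ("<unk>", "<eos>")]
    return "".join(" " if t == "<space>" else t for t in tokens)


def accuracy(records: List[Dict[str, str]]) -> float:
    """Fraction of records whose predicted label equals the actual label."""
    hits = sum(r["predicted_label"] == r["actual_label"] for r in records)
    return hits / len(records) if records else 0.0


if __name__ == "__main__":
    parsed = parse_predictions(SAMPLE)
    for record in parsed:
        print(detokenize(record["src"]), "->", record["predicted_label"])
    print("accuracy:", accuracy(parsed))
```

Labels are compared as exact strings, so a near miss such as small versus medium-sized counts as wrong.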

TODO: Get it predicting from your typed input

Building the data

To train the model, you need training data.

If you want to skip this step, you can download the pre-built data from our public dataset. This repo is a work in progress, so the data format is still in flux.

The underlying data (a Graph-Question-Answer YAML from CLEVR-graph) must be pre-processed for training and evaluation. The YAML is transformed into TensorFlow records, and split into train-evaluate-predict tranches.
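
To make the three-way split concrete, here is a minimal sketch of a shuffle-and-slice partition. It is not the code in `macgraph.input.build`: the function name `split_tranches`, the seed, and the 10% evaluate / 10% predict fractions are all assumptions chosen for illustration, and the real build step may split differently.

```python
import random
from typing import Dict, List, Sequence


def split_tranches(
    records: Sequence,
    eval_frac: float = 0.1,     # assumed fraction, not taken from macgraph
    predict_frac: float = 0.1,  # assumed fraction, not taken from macgraph
    seed: int = 0,
) -> Dict[str, List]:
    """Shuffle records reproducibly, then slice them into three tranches."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_eval = int(len(shuffled) * eval_frac)
    n_predict = int(len(shuffled) * predict_frac)
    return {
        "eval": shuffled[:n_eval],
        "predict": shuffled[n_eval:n_eval + n_predict],
        "train": shuffled[n_eval + n_predict:],
    }


if __name__ == "__main__":
    tranches = split_tranches(range(50000))
    print({name: len(items) for name, items in tranches.items()})
```

With these assumed fractions, 50,000 input records leave 40,000 in the train tranche. Each record lands in exactly one tranche, so no evaluation or prediction example is seen during training.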

First generate a gqa.yaml in the clevr-graph repo and copy it into this repo's raw input directory:

clevr-graph$ python -m gqa.generate --count 50000 --int-names
clevr-graph$ cp data/gqa-some-id.yaml ../mac-graph/input_data/raw/my_dataset.yaml

Then build (that is, pre-process into a vocab table and tfrecords) the data:

mac-graph$ python -m macgraph.input.build --name my_dataset

Arguments to build

  • --limit N reads only N records from the YAML and writes a total of N TFRecords (split across the three tranches)
  • --type-string-prefix StationProperty keeps only questions whose type string starts with "StationProperty"
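
The effect of these two arguments can be sketched as a filter followed by a cap. This is an illustration only: the function `select_records`, the flat `type_string` key, and the example type strings are hypothetical, and the sketch assumes the prefix filter runs before the limit, which the text above does not specify.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, Optional


def select_records(
    records: Iterable[Dict[str, str]],
    limit: Optional[int] = None,
    type_string_prefix: Optional[str] = None,
) -> Iterator[Dict[str, str]]:
    """Lazily keep records matching the prefix, then stop after `limit` of them."""
    if type_string_prefix is not None:
        # "type_string" is an assumed key name for this sketch.
        records = (
            r for r in records
            if r["type_string"].startswith(type_string_prefix)
        )
    # islice with stop=None passes every record through.
    return islice(records, limit)


if __name__ == "__main__":
    # Made-up type strings, for illustration only.
    questions = [
        {"type_string": "StationPropertyCleanliness"},
        {"type_string": "EdgeLine"},
        {"type_string": "StationPropertySize"},
    ]
    for q in select_records(questions, limit=1, type_string_prefix="StationProperty"):
        print(q["type_string"])
```

Because both steps are lazy, a small limit stops reading early instead of loading the whole YAML first.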

Training

Let's train a model. This requires the training data built in the previous section.

As a rule of thumb, use at least 40,000 training records (e.g. built from 50,000 GQA triples).

mac-graph$ python -m macgraph.train --name my_dataset