This repository has been archived by the owner on Dec 29, 2022. It is now read-only.

TF-Seq2Seq via Docker #258

Open
wants to merge 14 commits into
base: master
2 changes: 2 additions & 0 deletions README.md
@@ -1,3 +1,5 @@
# TF-Seq2Seq (v0.1.0)

[![CircleCI](https://circleci.com/gh/google/seq2seq.svg?style=svg)](https://circleci.com/gh/google/seq2seq)

---
86 changes: 85 additions & 1 deletion docs/getting_started.md
@@ -1,7 +1,7 @@
## Download & Setup

To use tf-seq2seq you need a working installation of TensorFlow 1.0 with
Python 2.7 or Python 3.5. Follow the [TensorFlow Getting Started](https://www.tensorflow.org/versions/r1.0/get_started/os_setup) guide for detailed setup instructions. With TensorFlow installed, you can clone this repository:
Python 2.7 or Python 3.5. Follow the [Installing TensorFlow](https://www.tensorflow.org/install/) guide for detailed setup instructions. With TensorFlow installed, you can clone this repository:

```bash
git clone https://github.com/google/seq2seq.git
@@ -11,12 +11,96 @@ cd seq2seq
pip install -e .
```

## TF-Seq2Seq via Docker

It is now possible to run [tf-seq2seq via Docker](https://github.com/google/seq2seq/seq2seq/tools/docker). We have not yet deployed an automated pipeline for building containers, so we provide Dockerfiles that you can use to build the images yourself.

### Build Image
First, build the image for running tf-seq2seq from the Dockerfile that matches your Python version (2.7 or 3.5) and your hardware (with or without a GPU):

* [Docker for Python 2.7](https://github.com/google/seq2seq/seq2seq/tools/docker/py27)
* [Docker for Python 3.5](https://github.com/google/seq2seq/seq2seq/tools/docker/py35)

Both directories contain Dockerfiles for building an image with the basic dependencies (`Dockerfile`), the development dependencies (`Dockerfile.devel`), and GPU support (`Dockerfile.gpu` and `Dockerfile.devel-gpu`). The provided files are adapted from the [official TensorFlow Docker directory](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/docker/README.md) and use the **latest** TensorFlow version/tags.

You are now ready to build the image. Run the following command from inside the `py27` or `py35` directory:

```bash
$ docker build -t image_name -f Dockerfile.suffix .
```

We give you some examples:
```bash
# In py27 dir
$ docker build -t py27 -f Dockerfile .
$ docker build -t py27-devel-gpu -f Dockerfile.devel-gpu .

# In py35 dir
$ docker build -t py35-devel -f Dockerfile.devel .
$ docker build -t py35-gpu -f Dockerfile.gpu .
```
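
The naming scheme in these examples can also be written down programmatically. The following is a minimal sketch, not part of the repository: `build_command` is a hypothetical helper that only assembles the `docker build` argument list for a Python directory and a Dockerfile variant, and the image names it produces simply mirror the example names used above.

```python
"""Sketch: assemble `docker build` commands for the Dockerfile variants."""

# Dockerfile suffixes described above; "" is the basic CPU-only Dockerfile.
VARIANTS = ("", "devel", "gpu", "devel-gpu")


def build_command(python_dir, variant=""):
  """Returns the `docker build` argument list for one Dockerfile variant.

  `python_dir` is "py27" or "py35"; the command is meant to be run from
  inside that directory. Hypothetical helper, not part of tf-seq2seq.
  """
  if python_dir not in ("py27", "py35"):
    raise ValueError("unknown Python directory: %s" % python_dir)
  if variant not in VARIANTS:
    raise ValueError("unknown Dockerfile variant: %s" % variant)
  dockerfile = "Dockerfile.%s" % variant if variant else "Dockerfile"
  image_name = "%s-%s" % (python_dir, variant) if variant else python_dir
  return ["docker", "build", "-t", image_name, "-f", dockerfile, "."]


if __name__ == "__main__":
  # Print the command instead of running it, so the sketch has no side effects.
  print(" ".join(build_command("py35", "devel")))
```

Returning a list rather than a single string means the result can be passed to `subprocess.run` without shell quoting concerns.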

### Build Custom Image
To build an image with a specific TensorFlow version, look up the available tags in the [Docker Store](https://store.docker.com/community/images/tensorflow/tensorflow/tags), change the tag on the first line (the `FROM` line) of the Dockerfile to the one you want, and then run the `docker build` command.

Some examples:
```
# In py27 Dockerfile, with the TensorFlow 1.0.1 version
FROM tensorflow/tensorflow:latest --becomes--> FROM tensorflow/tensorflow:1.0.1

# In py35 Dockerfile.devel-gpu, with the TensorFlow 1.2.0-rc2 version
FROM tensorflow/tensorflow:latest-devel-gpu-py3 --becomes--> FROM tensorflow/tensorflow:1.2.0-rc2-devel-gpu-py3
```
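
The tag change shown above is a one-line text substitution, so it can be scripted. This is a rough sketch under stated assumptions: `pin_base_tag` is a hypothetical helper, and it assumes the base image is written as `FROM image:tag` (or `FROM image`) without a registry port or a digest.

```python
"""Sketch: pin the base image tag on a Dockerfile's first FROM line."""


def pin_base_tag(dockerfile_text, tag):
  """Returns `dockerfile_text` with the tag of the first FROM line replaced.

  Hypothetical helper. Anything after the image name on the FROM line
  (for example `AS builder`) is preserved.
  """
  lines = dockerfile_text.splitlines()
  for i, line in enumerate(lines):
    parts = line.split()
    if len(parts) >= 2 and parts[0].upper() == "FROM":
      # Drop the old tag, if any, and attach the requested one.
      image = parts[1].split(":", 1)[0]
      lines[i] = " ".join(["FROM", "%s:%s" % (image, tag)] + parts[2:])
      return "\n".join(lines) + "\n"
  raise ValueError("no FROM line found")


if __name__ == "__main__":
  original = "FROM tensorflow/tensorflow:latest\nWORKDIR /\n"
  print(pin_base_tag(original, "1.0.1"))
```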

Once the image is built, the tf-seq2seq package is located at `/src/seq2seq` (the default working directory) inside the container.

### Running container

Run a non-GPU container with the following command, where `image_name` is the name you chose during the build step:

    $ docker run -it -p hostPort:containerPort image_name

Some examples
```bash
# Run a container with the Tf-Seq2Seq package in a py27 developer env
$ docker run -it py27-devel

# Run a container with the Tf-Seq2Seq package in a py35 env and look at the result with TensorBoard
$ docker run -it -p 6006:6006 py35

# Run a container in a py27 env and work on a tf-seq2seq checkout from the host (mounted at /seq2seq) inside the container
$ docker run -it -v $(pwd):/seq2seq -w /seq2seq py27
```

For GPU support, install the NVIDIA drivers (ideally the latest) and
[nvidia-docker](https://github.com/NVIDIA/nvidia-docker), then run:

    $ nvidia-docker run -it -p hostPort:containerPort image_name

The examples above work the same way for GPU images; the only difference is that the command is `nvidia-docker` instead of `docker`.
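
To make the relationship between the CPU and GPU invocations concrete, here is a small sketch. `run_command` is a hypothetical helper, not something shipped with tf-seq2seq; it only assembles the argument list and uses the `-it`, `-p`, `-v`, and `-w` options that appear in the examples above.

```python
"""Sketch: assemble `docker run` / `nvidia-docker run` commands."""


def run_command(image_name, gpu=False, ports=(), mount=None, workdir=None):
  """Returns the argument list for starting an interactive container.

  Hypothetical helper. `ports` holds (host_port, container_port) integer
  pairs, `mount` is an optional (host_path, container_path) pair for `-v`,
  and `workdir` is passed to `-w`.
  """
  # The GPU case differs only in the executable that is invoked.
  command = ["nvidia-docker" if gpu else "docker", "run", "-it"]
  for host_port, container_port in ports:
    command += ["-p", "%d:%d" % (host_port, container_port)]
  if mount is not None:
    command += ["-v", "%s:%s" % tuple(mount)]
  if workdir is not None:
    command += ["-w", workdir]
  command.append(image_name)
  return command


if __name__ == "__main__":
  # Expose TensorBoard's port for a py35 image, as in the example above.
  print(" ".join(run_command("py35", ports=[(6006, 6006)])))
```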

Note: If you have a problem running nvidia-docker, you may try the old method
we used previously, although it is not recommended. If you find a bug in nvidia-docker, please report
it there and try using nvidia-docker as described above.

```bash
$ export CUDA_SO=$(\ls /usr/lib/x86_64-linux-gnu/libcuda.* | xargs -I{} echo '-v {}:{}')
$ export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}')
$ docker run -it -p 8888:8888 $CUDA_SO $DEVICES gcr.io/tensorflow/tensorflow:latest-gpu
```

## Validate your installation

To make sure everything works as expected, you can run a simple pipeline unit test:

```bash
python -m unittest seq2seq.test.pipeline_test
```

Or run the full test suite (you need to install nose first if you have not installed tf-seq2seq via Docker):
```bash
nosetests
```

If you see an "OK" message, you are all set. Note that you may need to install pyrouge, pyyaml, and matplotlib in order for these tests to pass. If you run into other setup issues,
please [file a GitHub issue](https://github.com/google/seq2seq/issues).
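
Before running the tests, it can help to check whether the optional packages mentioned above are importable. The following is a minimal sketch and not part of tf-seq2seq: `missing_modules` is a hypothetical helper, it requires Python 3 (`importlib.util.find_spec`), and it relies on the fact that PyYAML is imported under the name `yaml`.

```python
"""Sketch: report which optional test dependencies are not importable."""

import importlib.util

# Import names of the packages mentioned above; PyYAML is imported as `yaml`.
OPTIONAL_TEST_MODULES = ("pyrouge", "yaml", "matplotlib")


def missing_modules(names=OPTIONAL_TEST_MODULES):
  """Returns the subset of top-level module `names` that cannot be found."""
  return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
  missing = missing_modules()
  print("missing: %s" % (", ".join(missing) or "none"))
```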

2 changes: 1 addition & 1 deletion docs/nmt.md
@@ -255,7 +255,7 @@ The above command also demonstrates how to pass several tasks to the inference s

### Evaluating specific checkpoint

The training script will save multiple model checkpoints throughout training. The exact checkpoint behavior can be controlled via [training script flags](training/). By default, the inference script evaluates the latest checkpoint in the model directory. To evaluate a specific checkpiint you can pass the `checkpoint_path` flag.
The training script will save multiple model checkpoints throughout training. The exact checkpoint behavior can be controlled via [training script flags](training/). By default, the inference script evaluates the latest checkpoint in the model directory. To evaluate a specific checkpoint you can pass the `checkpoint_path` flag.


## Calculating BLEU scores
2 changes: 1 addition & 1 deletion pylintrc
@@ -292,7 +292,7 @@ generated-members=set_shape,np.float32
# List of decorators that produce context managers, such as
# contextlib.contextmanager. Add to this list to register other decorators that
# produce valid context managers.
contextmanager-decorators=contextlib.contextmanager
contextmanager-decorators=contextlib.contextmanager,tensorflow.python.util.tf_contextlib.contextmanager


[VARIABLES]
9 changes: 7 additions & 2 deletions seq2seq/contrib/seq2seq/helper.py
@@ -32,8 +32,13 @@

import six

from tensorflow.contrib.distributions.python.ops import bernoulli
from tensorflow.contrib.distributions.python.ops import categorical
try:
  from tensorflow.python.ops.distributions import bernoulli
  from tensorflow.python.ops.distributions import categorical
except:
  # Backwards compatibility with TensorFlow prior to 1.2.
  from tensorflow.contrib.distributions.python.ops import bernoulli
  from tensorflow.contrib.distributions.python.ops import categorical
from tensorflow.python.framework import dtypes
from tensorflow.python.framework import ops
from tensorflow.python.layers import base as layers_base
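
The compatibility import in this hunk follows a general "try the new location, fall back to the old one" pattern. As a rough sketch of that pattern in isolation, the hypothetical helper `import_first_available` below tries module paths in order. Unlike the bare `except:` in the hunk, it catches only `ImportError`; that is a design choice of this sketch, not something the change itself does.

```python
"""Sketch: import the first available module from a list of candidates."""

import importlib


def import_first_available(candidates):
  """Returns the first module in `candidates` that imports successfully.

  Only ImportError is caught, so unrelated failures raised while a module
  is being imported still propagate.
  """
  for name in candidates:
    try:
      return importlib.import_module(name)
    except ImportError:
      continue
  raise ImportError("none of these modules could be imported: %s" %
                    ", ".join(candidates))


# With the module paths from the hunk above, usage would look like this
# (left as a comment because it needs TensorFlow to be installed):
#
#   bernoulli = import_first_available([
#       "tensorflow.python.ops.distributions.bernoulli",
#       "tensorflow.contrib.distributions.python.ops.bernoulli",
#   ])
```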
2 changes: 2 additions & 0 deletions seq2seq/data/input_pipeline.py
@@ -29,7 +29,9 @@
import six

import tensorflow as tf
# pylint: disable=no-name-in-module
from tensorflow.contrib.slim.python.slim.data import tfexample_decoder
# pylint: enable=no-name-in-module

from seq2seq.configurable import Configurable
from seq2seq.data import split_tokens_decoder, parallel_data_provider
2 changes: 2 additions & 0 deletions seq2seq/data/parallel_data_provider.py
@@ -22,8 +22,10 @@
import numpy as np

import tensorflow as tf
# pylint: disable=no-name-in-module
from tensorflow.contrib.slim.python.slim.data import data_provider
from tensorflow.contrib.slim.python.slim.data import parallel_reader
# pylint: enable=no-name-in-module

from seq2seq.data import split_tokens_decoder

2 changes: 2 additions & 0 deletions seq2seq/data/sequence_example_decoder.py
@@ -14,7 +14,9 @@
"""A decoder for tf.SequenceExample"""

import tensorflow as tf
# pylint: disable=no-name-in-module
from tensorflow.contrib.slim.python.slim.data import data_decoder
# pylint: enable=no-name-in-module


class TFSEquenceExampleDecoder(data_decoder.DataDecoder):
2 changes: 2 additions & 0 deletions seq2seq/data/split_tokens_decoder.py
@@ -21,7 +21,9 @@
from __future__ import unicode_literals

import tensorflow as tf
# pylint: disable=no-name-in-module
from tensorflow.contrib.slim.python.slim.data import data_decoder
# pylint: enable=no-name-in-module


class SplitTokensDecoder(data_decoder.DataDecoder):
2 changes: 2 additions & 0 deletions seq2seq/encoders/image_encoder.py
@@ -20,8 +20,10 @@
from __future__ import print_function

import tensorflow as tf
# pylint: disable=no-name-in-module
from tensorflow.contrib.slim.python.slim.nets.inception_v3 \
    import inception_v3_base
# pylint: enable=no-name-in-module

from seq2seq.encoders.encoder import Encoder, EncoderOutput

3 changes: 1 addition & 2 deletions seq2seq/encoders/rnn_encoder.py
@@ -21,7 +21,6 @@

import copy
import tensorflow as tf
from tensorflow.contrib.rnn.python.ops import rnn

from seq2seq.encoders.encoder import Encoder, EncoderOutput
from seq2seq.training import utils as training_utils
@@ -186,7 +185,7 @@ def encode(self, inputs, sequence_length, **kwargs):
    cells_fw = _unpack_cell(cell_fw)
    cells_bw = _unpack_cell(cell_bw)

    result = rnn.stack_bidirectional_dynamic_rnn(
    result = tf.contrib.rnn.stack_bidirectional_dynamic_rnn(
        cells_fw=cells_fw,
        cells_bw=cells_bw,
        inputs=inputs,
2 changes: 2 additions & 0 deletions seq2seq/metrics/metric_specs.py
@@ -28,7 +28,9 @@

import tensorflow as tf
from tensorflow.contrib import metrics
# pylint: disable=no-name-in-module
from tensorflow.contrib.learn import MetricSpec
# pylint: enable=no-name-in-module

from seq2seq.data import postproc
from seq2seq.configurable import Configurable
14 changes: 10 additions & 4 deletions seq2seq/test/hooks_test.py
@@ -39,16 +39,22 @@ class TestPrintModelAnalysisHook(tf.test.TestCase):
  def test_begin(self):
    model_dir = tempfile.mkdtemp()
    outfile = tempfile.NamedTemporaryFile()
    tf.get_variable("weigths", [128, 128])
    tf.get_variable("weights", [128, 128])
    hook = hooks.PrintModelAnalysisHook(
        params={}, model_dir=model_dir, run_config=tf.contrib.learn.RunConfig())
    hook.begin()

    with gfile.GFile(os.path.join(model_dir, "model_analysis.txt")) as file:
      file_contents = file.read().strip()

    self.assertEqual(file_contents.decode(), "_TFProfRoot (--/16.38k params)\n"
                     " weigths (128x128, 16.38k/16.38k params)")
    lines = tf.compat.as_text(file_contents).split("\n")
    if len(lines) == 3:
      # TensorFlow v1.2 includes an extra header line
      self.assertEqual(lines[0], "node name | # parameters")

    self.assertEqual(lines[-2], "_TFProfRoot (--/16.38k params)")
    self.assertEqual(lines[-1], " weights (128x128, 16.38k/16.38k params)")

    outfile.close()
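
The version-tolerant check in this test can be read as "separate an optional header line from the body, then assert on the body". The sketch below shows that idea on plain strings, without TensorFlow. `split_analysis` and `HEADER` are hypothetical names for illustration; the header text is taken from the assertion in the hunk above.

```python
"""Sketch: split model-analysis output into an optional header and a body."""

# Header line that, per the test above, TensorFlow v1.2 adds to the output.
HEADER = "node name | # parameters"


def split_analysis(text):
  """Splits profiler output into (header, body_lines).

  `header` is None when the output does not start with the header line.
  """
  lines = text.strip().split("\n")
  if lines and lines[0] == HEADER:
    return lines[0], lines[1:]
  return None, lines


if __name__ == "__main__":
  sample = HEADER + "\n_TFProfRoot (--/16.38k params)"
  print(split_analysis(sample))
```

Checking the header by content rather than by line count keeps the assertion valid if the body grows by a line.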


@@ -125,7 +131,7 @@ def tearDown(self):
  def test_capture(self):
    global_step = tf.contrib.framework.get_or_create_global_step()
    # Some test computation
    some_weights = tf.get_variable("weigths", [2, 128])
    some_weights = tf.get_variable("weights", [2, 128])
    computation = tf.nn.softmax(some_weights)

    hook = hooks.MetadataCaptureHook(
100 changes: 100 additions & 0 deletions seq2seq/tools/docker/README.md
@@ -0,0 +1,100 @@
# Using TF-Seq2Seq via Docker

This directory contains `Dockerfile`s to make it easy to get up and running with
TF-Seq2Seq via [Docker](https://www.docker.com/).


## Installing Docker

General installation instructions are
[on the Docker site](https://docs.docker.com/), but we give some
quick links here:

* [Docker for Mac OSX](https://www.docker.com/docker-mac)
* [Docker for Windows PC](https://www.docker.com/docker-windows)
* [Docker for Debian](https://www.docker.com/docker-debian)
* [Docker for Ubuntu](https://www.docker.com/docker-ubuntu)
* [Docker for CentOS Distribution](https://www.docker.com/docker-centos-distribution)


## Dockerfile

We have not yet deployed an automated pipeline for building containers, so we provide Dockerfiles that you can use to build the images yourself.
This directory keeps separate Dockerfiles for Python 2.7 (`py27`) and Python 3.5 (`py35`).

* `Dockerfile` - TF-Seq2Seq - CPU only!

* `Dockerfile.devel` - Developer build for TF-Seq2Seq - CPU only!

* `Dockerfile.gpu` - TF-Seq2Seq with support of NVidia CUDA

* `Dockerfile.devel-gpu` - Developer build for TF-Seq2Seq with support of NVidia CUDA

The provided files are adapted from the [official TensorFlow Docker directory](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/docker/README.md) and use the **latest** TensorFlow version/tags.


## Build image

To build the image, run the following command from the directory for your preferred Python version (`some_path/seq2seq/seq2seq/tools/docker/{python-version}/`):

    $ docker build -t image_name -f Dockerfile.suffix .


Some examples
```bash
# In py27 dir
$ docker build -t py27 -f Dockerfile .
$ docker build -t py27-devel-gpu -f Dockerfile.devel-gpu .

# In py35 dir
$ docker build -t py35-devel -f Dockerfile.devel .
$ docker build -t py35-gpu -f Dockerfile.gpu .
```

To build an image with a specific TensorFlow version, look up the available tags in the [Docker Store](https://store.docker.com/community/images/tensorflow/tensorflow/tags), then change the tag on the first line (the `FROM` line) of the Dockerfile to the one you want:

Some examples:
```
# In py27 Dockerfile, with the TensorFlow 1.0.1 version
FROM tensorflow/tensorflow:latest --becomes--> FROM tensorflow/tensorflow:1.0.1

# In py35 Dockerfile.devel-gpu, with the TensorFlow 1.2.0-rc2 version
FROM tensorflow/tensorflow:latest-devel-gpu-py3 --becomes--> FROM tensorflow/tensorflow:1.2.0-rc2-devel-gpu-py3
```

Once the image is built, the tf-seq2seq package is located at `/src/seq2seq` (the default working directory) inside the container.

## Running container

Run a non-GPU container with the following command, where `image_name` is the name you chose during the build step:

    $ docker run -it -p hostPort:containerPort image_name

Some examples
```bash
# Run a container with the Tf-Seq2Seq package in a py27 developer env
$ docker run -it py27-devel

# Run a container with the Tf-Seq2Seq package in a py35 env and look at the result with TensorBoard
$ docker run -it -p 6006:6006 py35

# Run a container in a py27 env and work on a tf-seq2seq checkout from the host (mounted at /seq2seq) inside the container
$ docker run -it -v $(pwd):/seq2seq -w /seq2seq py27
```

For GPU support, install the NVIDIA drivers (ideally the latest) and
[nvidia-docker](https://github.com/NVIDIA/nvidia-docker), then run:

    $ nvidia-docker run -it -p hostPort:containerPort image_name

The examples above work the same way for GPU images; the only difference is that the command is `nvidia-docker` instead of `docker`.

Note: If you have a problem running nvidia-docker, you may try the old method
we used previously, although it is not recommended. If you find a bug in nvidia-docker, please report
it there and try using nvidia-docker as described above.

```bash
$ export CUDA_SO=$(\ls /usr/lib/x86_64-linux-gnu/libcuda.* | xargs -I{} echo '-v {}:{}')
$ export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}')
$ docker run -it -p 8888:8888 $CUDA_SO $DEVICES gcr.io/tensorflow/tensorflow:latest-gpu
```
31 changes: 31 additions & 0 deletions seq2seq/tools/docker/py27/Dockerfile
@@ -0,0 +1,31 @@
FROM tensorflow/tensorflow:latest

MAINTAINER Alessio Gozzoli <[email protected]>

RUN apt-get update && apt-get install -y \
        python-tk \
        python3-tk \
        git \
        && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Install Seq2Seq Dependencies
WORKDIR /
RUN pip --no-cache-dir install -e git+https://github.com/google/seq2seq.git#egg=seq2seq && \
    pip --no-cache-dir install \
        nose \
        pylint \
        tox \
        yapf \
        mkdocs

# Set Matplotlib Backend
RUN mkdir -p /root/.config/matplotlib/ && \
    touch /root/.config/matplotlib/matplotlibrc && \
    echo "backend : Agg" >> /root/.config/matplotlib/matplotlibrc

# Default workdir
WORKDIR "/src/seq2seq"

CMD ["/bin/bash"]