We train a Transformer encoder-decoder model on the extracted title-comment pairs, treating the prediction of comments as a translation problem by modelling p(comment|title). This can also be interpreted as a language model conditioned on the submission title.
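To spell out the conditional-language-model view: the Transformer decoder generates the comment one token at a time, so the model factorizes the probability autoregressively as

p(comment|title) = p(w_1|title) * p(w_2|w_1, title) * ... * p(w_T|w_1, ..., w_{T-1}, title)

where w_1, ..., w_T are the comment tokens. Every factor is a next-token prediction that additionally sees the title, which is exactly a conditional language model.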
It is not obvious to me whether this should work at all, since I'd think that p(comment|title) has a significantly higher entropy than the distributions typical translation models deal with: a given title admits many wildly different plausible comments, while a source sentence usually has only a few good translations. Let's just see what'll happen...
See ../data.
Now that we're done with data preparation, let's prepare the data some more.
# join titles and comments into tab-separated pairs, then shuffle the pairs jointly so the alignment is preserved
paste ../data/data.train.bpe.{titles,comments} | shuf > data.train.bpe.shuf.titles-comments
# split the shuffled pairs back into two parallel files
cut -f1 < data.train.bpe.shuf.titles-comments > data.train.bpe.shuf.titles
cut -f2 < data.train.bpe.shuf.titles-comments > data.train.bpe.shuf.comments
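A cheap sanity check that the two files are still parallel is comparing their line counts, which should be identical:

wc -l data.train.bpe.shuf.titles data.train.bpe.shuf.comments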
Since titles are lowercased and comments are not, and comments contain additional symbols such as Markdown markup or links, we build two separate vocabularies:
onmt-build-vocab --save_vocab vocab.titles data.train.bpe.shuf.titles
onmt-build-vocab --save_vocab vocab.comments data.train.bpe.shuf.comments
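Should the comment vocabulary turn out to be unwieldy, onmt-build-vocab can also cap its size; for example (the 50000 is an illustrative value, not something used in this run):

onmt-build-vocab --size 50000 --save_vocab vocab.comments data.train.bpe.shuf.comments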
Adjust settings and paths in opennmt_config.yml if necessary. Let's hope TensorFlow is ready to go, and start training. OpenNMT-tf saves checkpoints periodically (as configured), so if something crashes or your mother rips out the power plug, training can be resumed from the latest checkpoint by re-running the same command.
onmt-main train --config opennmt_config.yml --model_type Transformer --num_gpus 1
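In case it's useful, the path-related part of such a config looks roughly like this. This is a sketch, not the actual file: it assumes a 1.x-era OpenNMT-tf, where the vocabulary keys are called source_words_vocabulary and target_words_vocabulary (2.x renames them to source_vocabulary and target_vocabulary), and run1 is an arbitrary directory name:

model_dir: run1
data:
  train_features_file: data.train.bpe.shuf.titles
  train_labels_file: data.train.bpe.shuf.comments
  source_words_vocabulary: vocab.titles
  target_words_vocabulary: vocab.comments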
I trained the model for about 40K steps with opennmt_config.yml. I noticed that the loss wasn't improving much after that, so I got worried and increased the effective batch size (larger batches are known to help when training Transformer models) by performing gradient accumulation, as in opennmt_config_larger_batch.yml.
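For reference, gradient accumulation is a pure config change in OpenNMT-tf. A minimal sketch of the relevant section, assuming a 1.x-era release where params.gradients_accum sets the number of batches to accumulate (2.x expresses the same idea through train.effective_batch_size instead; the concrete numbers below are illustrative, not the ones from opennmt_config_larger_batch.yml):

params:
  gradients_accum: 8   # accumulate gradients over 8 batches before each update
train:
  batch_size: 3072     # tokens per batch, so the effective batch is ~8x3072 tokens
  batch_type: tokens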
As can be seen in the plot below, this seems to have helped.
Unfortunately, I don't have a plot for the dev loss, since I forgot to turn on dev evaluation. What a bummer.
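If you're following along, you can watch the training loss yourself: OpenNMT-tf writes TensorBoard summaries into the model directory, so something like this works (assuming the run1 directory from the config sketch above):

tensorboard --logdir run1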
Once the model has finished training, we can export it for serving as follows:
CUDA_VISIBLE_DEVICES= onmt-main export --export export1 --config opennmt_config_larger_batch.yml sample.yml --num_gpus 0
Here, we save the exported model in the directory export1/. The empty CUDA_VISIBLE_DEVICES= hides all GPUs from TensorFlow, so together with --num_gpus 0 the export runs on the CPU.
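To check that the export went through, TensorFlow's saved_model_cli can list the exported signatures. The SavedModel ends up in a timestamped subdirectory, so the path below is a placeholder to replace with the actual one:

saved_model_cli show --dir export1/<timestamp> --all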
See ../serve.