Convert RNN generation to TF v2. #1978

mihaimaruseac · 2020-08-18T00:03:07Z

TF v2 offers ways to handle validation and TensorBoard logging through model callbacks. We can also do the batching with tf.data instead of writing the loop manually.

I'm doing it this way for now to avoid making too many changes at once, but I will probably send another PR later to convert to the recommended way.
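For reference, the tf.data batching mentioned above could look roughly like the sketch below. This is not code from this PR; `make_batches`, `seq_len`, and `batch_size` are hypothetical names, and the input is assumed to be a flat array of token ids.

```python
import numpy as np
import tensorflow as tf


def make_batches(token_ids, seq_len, batch_size):
    """Turn a flat sequence of token ids into (input, target) batches."""
    ds = tf.data.Dataset.from_tensor_slices(token_ids)
    # Chunks of seq_len + 1 so input and target can be shifted by one step.
    ds = ds.batch(seq_len + 1, drop_remainder=True)
    ds = ds.map(lambda chunk: (chunk[:-1], chunk[1:]))
    # Group the (input, target) pairs into fixed-size training batches.
    return ds.batch(batch_size, drop_remainder=True)


# Hypothetical toy corpus of 100 token ids.
batches = make_batches(np.arange(100, dtype=np.int64), seq_len=9, batch_size=4)
```

A dataset like this can be iterated directly in a custom training loop or passed to a Keras `fit` call, which is what would make callback-based validation possible later.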

TF v2 models save only two files, not three.

If no checkpoint has been saved, validation and data generation will fail, since both load saved weights into the rebatched model. A better design would use callbacks for validation, generation, and all the TensorBoard logging, but that first requires converting to a dataset representation with automatic batching.
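To illustrate the checkpoint dependency, here is a minimal sketch of a guard that looks for a complete checkpoint before validation or generation runs. It is not the helper used in this PR; `find_latest_checkpoint` is a hypothetical name, and the file layout (an `.index` file plus `.data-*` shards, with no `.meta` file) is my assumption about how the TF v2 checkpoint format names its files.

```python
import glob
import os


def find_latest_checkpoint(model_dir):
    """Return the newest complete checkpoint prefix in model_dir, or None.

    Assumes TF v2 checkpoint naming: `<prefix>.index` plus one or more
    `<prefix>.data-XXXXX-of-YYYYY` shards, and no `.meta` file.
    """
    prefixes = []
    for index_path in glob.glob(os.path.join(model_dir, '*.index')):
        prefix = index_path[:-len('.index')]
        # Only count checkpoints that also have at least one data shard.
        if glob.glob(glob.escape(prefix) + '.data-*'):
            prefixes.append(prefix)
    if not prefixes:
        return None
    # Pick the checkpoint whose index file was written most recently.
    return max(prefixes, key=lambda p: os.path.getmtime(p + '.index'))
```

A caller could skip validation and generation when this returns None instead of failing while loading weights.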

mihaimaruseac · 2020-08-18T00:03:21Z

Fixes #1540

Dor1s · 2020-08-18T00:04:24Z

/gcbrun

Dor1s

This is crazy and LGTM! Let's see what makes the CI unhappy.

Dor1s · 2020-08-18T00:28:19Z

/gcbrun

mihaimaruseac · 2020-08-18T00:51:33Z

I think the pylint issue on Travis can be fixed with the workaround in pylint-dev/pylint#1542 (comment). The TensorBoard migration guide recommends the as_default() syntax.

I cannot see the GCP build log so I don't know what failed there.

Should I instead try and convert code to use tf.data and callbacks for validation?

mihaimaruseac · 2020-08-18T01:09:17Z

I disabled it inline; hopefully this fixes the Travis build.
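Roughly, an inline disable around the as_default() syntax looks like the sketch below. This is an illustration rather than the exact code in this PR: `log_dir` is a hypothetical location, and `not-context-manager` is my assumption about which pylint message is involved.

```python
import tempfile

import tensorflow as tf

log_dir = tempfile.mkdtemp()  # hypothetical log location
writer = tf.summary.create_file_writer(log_dir)

# pylint may not infer that as_default() returns a context manager,
# so the check is silenced on this one line only.
with writer.as_default():  # pylint: disable=not-context-manager
    tf.summary.scalar('loss', 0.5, step=1)
writer.flush()
```

Keeping the disable on the `with` line leaves the check active for the rest of the file.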

mihaimaruseac · 2020-08-18T01:32:01Z

OK, Travis passes now. Should we try another /gcbrun?

inferno-chromium · 2020-08-18T01:43:54Z

/gcbrun

mihaimaruseac · 2020-08-18T02:17:20Z

Still fails :( But I cannot see the log.

inferno-chromium · 2020-08-18T02:28:24Z

Quoting mihaimaruseac: "Still fails :( But I cannot see the log"

======================================================================
FAIL: test_train_rnn (tests.core.bot.tasks.ml_train_task_test.MLRnnTrainTaskIntegrationTest)
Test train RNN model on a simple corpus.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/workspace/src/python/tests/core/bot/tasks/ml_train_task_test.py", line 193, in test_train_rnn
    self.assertTrue(ml_train_task.get_last_saved_model(self.model_directory))
AssertionError: {} is not true

----------------------------------------------------------------------
Ran 155 tests in 30.966s

FAILED (failures=1, skipped=4)

Ran 1437 tests (111 skipped, 0 errors, 1 failures).

inferno-chromium · 2020-08-18T02:29:13Z

That output is from running python butler.py py_unittest -t core -m

mihaimaruseac · 2020-08-18T15:29:40Z

Thanks. I'll try to replicate locally; it may be caused by the change in model format.

inferno-chromium · 2020-08-18T17:07:52Z

/gcbrun

inferno-chromium · 2020-08-18T17:19:39Z

/gcbrun

Dor1s

Awesome!!! Thanks @inferno-chromium for helping to debug!

Dor1s · 2020-08-18T18:01:03Z

We can merge this today, but if anyone deploys, I won't be able to monitor the errors as I'm OOO today.

mihaimaruseac · 2020-08-18T18:25:48Z

I'm ok with waiting :)

mihaimaruseac added 14 commits August 17, 2020 16:58

No longer look for model metadata.

a806aa5

TF v2 models only save 2 files, not three.

Don't display gaps in the progress bar.

ed2e422

Remove tf.contrib imports.

b97132e

Re-enable RNN tests.

f171c4d

Create a TF Keras sequential model for RNN.

15c7650

Use TF 2.0 for randomness init.

59779e3

Use the TF v2 optimizer

de3e1db

Use the TFv2 summary.

cb04a7b

Use the TF v2 RNN model

40befc0

Save/load the RNN model

ccf483a

Model validation and demo test generation

a409c9c

Typo fix

a0da8b6

Use a @tf.function for training step to optimize runtime.

3e19319

google-cla bot added the cla: yes CLA signed. label Aug 18, 2020

mihaimaruseac mentioned this pull request Aug 18, 2020

Update TensorFlow and other dependencies after Python3 migration #1540

Closed

Pylint

15b8c54

Dor1s reviewed Aug 18, 2020

Disable pylint's no-context-manager locally.

c5eeb5c

Space fix

68a8244

Update tests for the new model format (2 files instead of 3).

dbc9f88

Fix pylint

f70f2c2

Dor1s approved these changes Aug 18, 2020

Dor1s merged commit 8f1da98 into google:master Aug 19, 2020
