
Force DenseWithSparseWeights to produce dense output and use all inputs #8011

Merged: 53 commits merged into main on Apr 29, 2021

Conversation


@JEM-Mosig JEM-Mosig commented Feb 22, 2021

Proposed changes:

  • Renames `DenseWithSparseWeights` to `RandomlyConnectedDense`
  • Guarantees that even at density zero the output is dense and every input is connected to at least one output
  • The former `weight_sparsity` parameter is now roughly equivalent to `1 - connection_density`, except at very low densities (high sparsities)
  • Fixes "Force DenseWithSparseWeights to have at least one entry per row" #7999
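
The guarantee described above can be sketched as a connection mask over the layer's weights. This is an illustrative NumPy sketch, not the actual Rasa implementation; `random_connection_mask` and its signature are hypothetical:

```python
import numpy as np

def random_connection_mask(num_inputs: int, num_outputs: int,
                           density: float, seed: int = 42) -> np.ndarray:
    """Boolean mask of shape (num_inputs, num_outputs) where each entry is
    kept with probability `density`, with the extra guarantee that every
    input connects to at least one output and every output receives at
    least one input - even at density 0.0."""
    rng = np.random.default_rng(seed)
    mask = rng.random((num_inputs, num_outputs)) < density
    # Guarantee: every input (row) keeps at least one outgoing connection.
    for i in range(num_inputs):
        if not mask[i].any():
            mask[i, rng.integers(num_outputs)] = True
    # Guarantee: every output (column) receives at least one input,
    # so the layer's output is dense (no permanently-zero units).
    for j in range(num_outputs):
        if not mask[:, j].any():
            mask[rng.integers(num_inputs), j] = True
    return mask

mask = random_connection_mask(16, 8, density=0.0)
assert mask.any(axis=1).all() and mask.any(axis=0).all()
```

Masking weights this way keeps the layer's output shape dense regardless of the chosen density, which is the behavior the PR enforces.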

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

@github-actions (Contributor)

Commit: 6c7c46a. The full report is available as an artifact.

Dataset: Carbon Bot, Dataset repository branch: main

| Configuration | Time (test / train / total) | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | 1m26s / 3m52s / 5m17s | 0.8000 (0.01) | 0.7529 (0.00) | 0.5382 (-0.05) |
| BERT + DIET(seq) + ResponseSelector(t2t) | 1m42s / 4m3s / 5m45s | 0.7786 (-0.02) | 0.7892 (0.01) | 0.5430 (0.00) |
| Sparse + BERT + DIET(bow) + ResponseSelector(bow) | 1m25s / 4m17s / 5m42s | 0.7825 (-0.01) | 0.7529 (0.00) | 0.5515 (0.01) |
| Sparse + BERT + DIET(seq) + ResponseSelector(t2t) | 1m48s / 4m32s / 6m20s | 0.7903 (-0.01) | 0.7847 (-0.02) | 0.5497 (-0.01) |
| Sparse + DIET(bow) + ResponseSelector(bow) | 37s / 2m31s / 3m7s | 0.7534 (0.03) | 0.7529 (0.00) | 0.4636 (-0.08) |
| Sparse + DIET(seq) + ResponseSelector(t2t) | 57s / 3m46s / 4m43s | 0.7437 (0.01) | 0.6685 (-0.04) | 0.5449 (0.04) |

Dataset: Hermit, Dataset repository branch: main

| Configuration | Time (test / train / total) | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | 3m11s / 18m22s / 21m33s | 0.8941 (0.01) | 0.7504 (0.00) | no data |
| BERT + DIET(seq) + ResponseSelector(t2t) | 2m45s / 12m1s / 14m46s | 0.9024 (0.01) | 0.8119 (0.01) | no data |
| Sparse + BERT + DIET(bow) + ResponseSelector(bow) | 3m43s / 21m52s / 25m35s | 0.8643 (-0.01) | 0.7504 (0.00) | no data |
| Sparse + BERT + DIET(seq) + ResponseSelector(t2t) | 3m30s / 14m11s / 17m41s | 0.8755 (0.01) | 0.8224 (0.03) | no data |
| Sparse + DIET(seq) + ResponseSelector(t2t) | 1m40s / 13m3s / 14m42s | 0.8346 (-0.00) | 0.7461 (-0.00) | no data |

Dataset: Private 1, Dataset repository branch: main

| Configuration | Time (test / train / total) | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | 2m9s / 4m0s / 6m8s | 0.9064 (0.00) | 0.9612 (0.00) | no data |
| BERT + DIET(seq) + ResponseSelector(t2t) | 2m32s / 3m30s / 6m1s | 0.9168 (0.00) | 0.9718 (-0.00) | no data |
| Spacy + DIET(bow) + ResponseSelector(bow) | 1m36s / 3m47s / 5m23s | 0.7942 (-0.00) | 0.9574 (0.00) | no data |
| Spacy + DIET(seq) + ResponseSelector(t2t) | 2m0s / 4m18s / 6m17s | 0.8358 (-0.00) | 0.9260 (0.01) | no data |
| Sparse + DIET(bow) + ResponseSelector(bow) | 31s / 3m19s / 3m50s | 0.8950 (0.01) | 0.9612 (0.00) | no data |
| Sparse + DIET(seq) + ResponseSelector(t2t) | 58s / 3m18s / 4m15s | 0.9044 (-0.00) | 0.9690 (-0.00) | no data |
| Sparse + Spacy + DIET(bow) + ResponseSelector(bow) | 1m33s / 4m56s / 6m29s | 0.8940 (-0.00) | 0.9574 (0.00) | no data |
| Sparse + Spacy + DIET(seq) + ResponseSelector(t2t) | 2m8s / 4m42s / 6m49s | 0.9033 (0.00) | 0.9680 (0.00) | no data |

Dataset: Private 2, Dataset repository branch: main

| Configuration | Time (test / train / total) | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | 2m17s / 11m36s / 13m53s | 0.8798 (0.01) | no data | no data |
| Spacy + DIET(bow) + ResponseSelector(bow) | 1m47s / 6m50s / 8m37s | 0.5783 (0.00) | no data | no data |
| Spacy + DIET(seq) + ResponseSelector(t2t) | 1m57s / 6m40s / 8m37s | 0.7017 (-0.00) | no data | no data |
| Sparse + DIET(bow) + ResponseSelector(bow) | 42s / 5m3s / 5m44s | 0.8487 (-0.01) | no data | no data |
| Sparse + DIET(seq) + ResponseSelector(t2t) | 51s / 5m0s / 5m51s | 0.8358 (-0.02) | no data | no data |
| Sparse + Spacy + DIET(bow) + ResponseSelector(bow) | 1m54s / 8m34s / 10m28s | 0.8594 (0.00) | no data | no data |
| Sparse + Spacy + DIET(seq) + ResponseSelector(t2t) | 2m3s / 7m14s / 9m16s | 0.8562 (0.01) | no data | no data |

Dataset: Private 3, Dataset repository branch: main

| Configuration | Time (test / train / total) | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | 1m6s / 1m9s / 2m14s | 0.9053 (-0.01) | no data | no data |
| BERT + DIET(seq) + ResponseSelector(t2t) | 1m14s / 51s / 2m5s | 0.8601 (0.05) | no data | no data |
| Spacy + DIET(bow) + ResponseSelector(bow) | 1m37s / 1m48s / 3m25s | 0.0700 (0.00) | no data | no data |
| Spacy + DIET(seq) + ResponseSelector(t2t) | 1m46s / 1m42s / 3m28s | 0.2757 (0.01) | no data | no data |
| Sparse + DIET(bow) + ResponseSelector(bow) | 36s / 1m2s / 1m39s | 0.8436 (-0.01) | no data | no data |
| Sparse + DIET(seq) + ResponseSelector(t2t) | 45s / 46s / 1m30s | 0.8601 (0.04) | no data | no data |
| Sparse + Spacy + DIET(bow) + ResponseSelector(bow) | 1m39s / 2m8s / 3m47s | 0.8313 (-0.01) | no data | no data |
| Sparse + Spacy + DIET(seq) + ResponseSelector(t2t) | 1m47s / 1m48s / 3m34s | 0.8560 (-0.01) | no data | no data |

Dataset: Sara, Dataset repository branch: main

| Configuration | Time (test / train / total) | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | 2m35s / 4m56s / 7m31s | 0.8570 (-0.01) | 0.8683 (0.00) | 0.8870 (-0.00) |
| Sparse + DIET(bow) + ResponseSelector(bow) | 58s / 5m26s / 6m23s | 0.8355 (-0.00) | 0.8683 (0.00) | 0.8391 (-0.02) |
| Sparse + DIET(seq) + ResponseSelector(t2t) | 1m22s / 4m11s / 5m33s | 0.8580 (0.01) | 0.8565 (0.00) | 0.8630 (0.01) |

@JEM-Mosig JEM-Mosig changed the title Force DenseWithSparseWeights to produce dense output Force DenseWithSparseWeights to produce dense output and use all inputs Mar 17, 2021
@JEM-Mosig JEM-Mosig requested a review from alwx March 23, 2021 16:02
@Ghostvv (Contributor) left a comment:
looks good from my side

@JEM-Mosig (Author):

@alwx Can you review, please? Or assign someone else from Engineering?

@JEM-Mosig JEM-Mosig requested a review from joejuzl April 13, 2021 07:00
@JEM-Mosig (Author):

@joejuzl (or anybody from Enable), when do you think you could have a look at this? The latest merges with main have broken something, but I want to hold off on fixing it until just before you've got some time scheduled.

@joejuzl (Contributor) left a comment:

Can we update the docs and also add some info to the migration-guide.mdx?

On rasa/utils/train_utils.py (outdated):
```python
f"`{WEIGHT_SPARSITY}` is deprecated. "
f"Please update your configuration file to use "
f"`{CONNECTION_DENSITY}` instead.",
warn_until_version=NEXT_MAJOR_VERSION_FOR_DEPRECATIONS,
```
Contributor:
Can we point the user to any documentation explaining this change?

Author:
Good point! I'll add a section in the migration docs and link it here.
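
For illustration, the migration discussed here amounts to a config change along these lines. This is a hypothetical sketch based on the `weight_sparsity ≈ 1 - connection_density` relation from the PR description; the exact keys and values should be taken from the actual migration guide:

```yaml
pipeline:
  - name: DIETClassifier
    # Deprecated parameter:
    # weight_sparsity: 0.8
    # Roughly equivalent replacement (1 - 0.8):
    connection_density: 0.2
```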

Johannes E. M. Mosig added 5 commits April 19, 2021 13:32
We need more epochs when we don't have much data. Otherwise tests fail depending on the seed.
@JEM-Mosig JEM-Mosig requested a review from joejuzl April 21, 2021 14:40
```diff
@@ -5,10 +5,10 @@ pipeline:
 - name: "CountVectorsFeaturizer"
 - name: "DIETClassifier"
   entity_recognition: False
-  epochs: 5
+  epochs: 50
```
Contributor:
Any reason we need 50? It will make the tests a lot slower...

Author:

The less training data you have, the more epochs you have to train for. At 5 epochs you can find random seeds where the tests fail, even on main (so they may also start failing if we change the architecture slightly). We might get away with fewer epochs and a higher learning rate, but tests like tests/core/test_evaluation.py::test_retrieval_intent can fail randomly.

Contributor:

Then can we not set it to a random seed that works and keep the epochs low?

Author:

Ok, I've reduced the number of epochs and increased the learning rate for now.

@JEM-Mosig (Author), Apr 22, 2021:

> Then can we not set it to a random seed that works and keep the epochs low?

Sorry, GitHub didn't show this until I refreshed the page. We can do this, but I would call it bad practice: if you change anything in the model, that is like changing the random seed, and the test may suddenly fail for no reason.

@JEM-Mosig (Author), Apr 22, 2021:

I think with the increased learning rate we might be fine (we can increase it because we have so little training data that it is "easy to learn" for the model, and with a higher learning rate we don't need as many epochs). But success is never 100% guaranteed.

Contributor:

Yeah, the point of unit tests is to test the code around the model rather than the effectiveness or accuracy of the model, so the most important thing is stability. And we need our unit tests to be fast so they don't hinder development and the CI.
However, I do see your point: a test that breaks on any model change has to be updated often, which is not ideal. The other alternative is to mock the model, but I like that even less, as it takes us further from reality.
I would say the best (though not perfect) option is setting the random seed and keeping the epochs relatively low.

@JEM-Mosig (Author), Apr 23, 2021:

Ok, then let's keep it as it is now (5 epochs, an increased learning rate, and a fixed seed of 42). From an ML perspective it makes sense to have a higher learning rate here, and the test succeeds.
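
The tradeoff settled on here (fewer epochs compensated by a larger learning rate) can be seen on a toy problem. This is an illustrative sketch unrelated to the actual Rasa tests; `gd` is a made-up helper that minimizes a one-dimensional quadratic:

```python
def gd(lr: float, epochs: int, w0: float = 0.0) -> float:
    """Minimize (w - 3)^2 by gradient descent; the gradient is 2 * (w - 3)."""
    w = w0
    for _ in range(epochs):
        w -= lr * 2.0 * (w - 3.0)
    return w

# Many epochs with a small learning rate...
slow = gd(lr=0.1, epochs=50)
# ...or few epochs with a larger one: both land close to the optimum w = 3.
fast = gd(lr=0.4, epochs=5)
```

With a fixed seed on top (42 in this PR), a short schedule like the second one stays deterministic, which is what keeps the test both fast and stable.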

On docs/docs/migration-guide.mdx (outdated):
@JEM-Mosig JEM-Mosig requested a review from joejuzl April 22, 2021 16:37
@joejuzl (Contributor) left a comment:

Looks good!

@JEM-Mosig JEM-Mosig enabled auto-merge (squash) April 28, 2021 18:30
@JEM-Mosig JEM-Mosig disabled auto-merge April 29, 2021 12:58
@JEM-Mosig JEM-Mosig enabled auto-merge (squash) April 29, 2021 13:58
@JEM-Mosig JEM-Mosig merged commit 6a73b2e into main Apr 29, 2021
@JEM-Mosig JEM-Mosig deleted the johannes-7999 branch April 29, 2021 15:55
Successfully merging this pull request may close these issues.

Force DenseWithSparseWeights to have at least one entry per row
4 participants