
Force DenseWithSparseWeights to produce dense output and use all inputs #8011

Merged: 53 commits merged into main on Apr 29, 2021

Conversation


@JEM-Mosig JEM-Mosig commented Feb 22, 2021

Proposed changes:

  • Renames `DenseWithSparseWeights` to `RandomlyConnectedDense`
  • Guarantees that even at density zero the output is dense and every input is connected to at least one output
  • The former `weight_sparsity` parameter is now roughly equivalent to `1 - connection_density`, except at very low densities (high sparsities)
  • Fixes "Force DenseWithSparseWeights to have at least one entry per row" #7999
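
The guarantee described above can be sketched as a connection mask over the layer's weights. This is an illustrative NumPy sketch, not the actual Rasa implementation; `random_connection_mask` and its signature are hypothetical:

```python
import numpy as np

def random_connection_mask(num_inputs: int, num_outputs: int,
                           density: float, seed: int = 42) -> np.ndarray:
    """Boolean mask of shape (num_inputs, num_outputs) where each entry is
    kept with probability `density`, with the extra guarantee that every
    input connects to at least one output and every output receives at
    least one input - even at density 0.0."""
    rng = np.random.default_rng(seed)
    mask = rng.random((num_inputs, num_outputs)) < density
    # Guarantee: every input (row) keeps at least one outgoing connection.
    for i in range(num_inputs):
        if not mask[i].any():
            mask[i, rng.integers(num_outputs)] = True
    # Guarantee: every output (column) receives at least one input,
    # so the layer's output is dense (no permanently-zero units).
    for j in range(num_outputs):
        if not mask[:, j].any():
            mask[rng.integers(num_inputs), j] = True
    return mask

mask = random_connection_mask(16, 8, density=0.0)
assert mask.any(axis=1).all() and mask.any(axis=0).all()
```

Masking weights this way keeps the layer's output shape dense regardless of the chosen density, which is the behavior the PR enforces.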

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

@github-actions (Contributor)

Commit: 6c7c46a. The full report is available as an artifact.

Dataset: Carbon Bot, Dataset repository branch: main

| Configuration | Time (test / train / total) | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | 1m26s / 3m52s / 5m17s | 0.8000 (0.01) | 0.7529 (0.00) | 0.5382 (-0.05) |
| BERT + DIET(seq) + ResponseSelector(t2t) | 1m42s / 4m3s / 5m45s | 0.7786 (-0.02) | 0.7892 (0.01) | 0.5430 (0.00) |
| Sparse + BERT + DIET(bow) + ResponseSelector(bow) | 1m25s / 4m17s / 5m42s | 0.7825 (-0.01) | 0.7529 (0.00) | 0.5515 (0.01) |
| Sparse + BERT + DIET(seq) + ResponseSelector(t2t) | 1m48s / 4m32s / 6m20s | 0.7903 (-0.01) | 0.7847 (-0.02) | 0.5497 (-0.01) |
| Sparse + DIET(bow) + ResponseSelector(bow) | 37s / 2m31s / 3m7s | 0.7534 (0.03) | 0.7529 (0.00) | 0.4636 (-0.08) |
| Sparse + DIET(seq) + ResponseSelector(t2t) | 57s / 3m46s / 4m43s | 0.7437 (0.01) | 0.6685 (-0.04) | 0.5449 (0.04) |

Dataset: Hermit, Dataset repository branch: main

| Configuration | Time (test / train / total) | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | 3m11s / 18m22s / 21m33s | 0.8941 (0.01) | 0.7504 (0.00) | no data |
| BERT + DIET(seq) + ResponseSelector(t2t) | 2m45s / 12m1s / 14m46s | 0.9024 (0.01) | 0.8119 (0.01) | no data |
| Sparse + BERT + DIET(bow) + ResponseSelector(bow) | 3m43s / 21m52s / 25m35s | 0.8643 (-0.01) | 0.7504 (0.00) | no data |
| Sparse + BERT + DIET(seq) + ResponseSelector(t2t) | 3m30s / 14m11s / 17m41s | 0.8755 (0.01) | 0.8224 (0.03) | no data |
| Sparse + DIET(seq) + ResponseSelector(t2t) | 1m40s / 13m3s / 14m42s | 0.8346 (-0.00) | 0.7461 (-0.00) | no data |

Dataset: Private 1, Dataset repository branch: main

| Configuration | Time (test / train / total) | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | 2m9s / 4m0s / 6m8s | 0.9064 (0.00) | 0.9612 (0.00) | no data |
| BERT + DIET(seq) + ResponseSelector(t2t) | 2m32s / 3m30s / 6m1s | 0.9168 (0.00) | 0.9718 (-0.00) | no data |
| Spacy + DIET(bow) + ResponseSelector(bow) | 1m36s / 3m47s / 5m23s | 0.7942 (-0.00) | 0.9574 (0.00) | no data |
| Spacy + DIET(seq) + ResponseSelector(t2t) | 2m0s / 4m18s / 6m17s | 0.8358 (-0.00) | 0.9260 (0.01) | no data |
| Sparse + DIET(bow) + ResponseSelector(bow) | 31s / 3m19s / 3m50s | 0.8950 (0.01) | 0.9612 (0.00) | no data |
| Sparse + DIET(seq) + ResponseSelector(t2t) | 58s / 3m18s / 4m15s | 0.9044 (-0.00) | 0.9690 (-0.00) | no data |
| Sparse + Spacy + DIET(bow) + ResponseSelector(bow) | 1m33s / 4m56s / 6m29s | 0.8940 (-0.00) | 0.9574 (0.00) | no data |
| Sparse + Spacy + DIET(seq) + ResponseSelector(t2t) | 2m8s / 4m42s / 6m49s | 0.9033 (0.00) | 0.9680 (0.00) | no data |

Dataset: Private 2, Dataset repository branch: main

| Configuration | Time (test / train / total) | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | 2m17s / 11m36s / 13m53s | 0.8798 (0.01) | no data | no data |
| Spacy + DIET(bow) + ResponseSelector(bow) | 1m47s / 6m50s / 8m37s | 0.5783 (0.00) | no data | no data |
| Spacy + DIET(seq) + ResponseSelector(t2t) | 1m57s / 6m40s / 8m37s | 0.7017 (-0.00) | no data | no data |
| Sparse + DIET(bow) + ResponseSelector(bow) | 42s / 5m3s / 5m44s | 0.8487 (-0.01) | no data | no data |
| Sparse + DIET(seq) + ResponseSelector(t2t) | 51s / 5m0s / 5m51s | 0.8358 (-0.02) | no data | no data |
| Sparse + Spacy + DIET(bow) + ResponseSelector(bow) | 1m54s / 8m34s / 10m28s | 0.8594 (0.00) | no data | no data |
| Sparse + Spacy + DIET(seq) + ResponseSelector(t2t) | 2m3s / 7m14s / 9m16s | 0.8562 (0.01) | no data | no data |

Dataset: Private 3, Dataset repository branch: main

| Configuration | Time (test / train / total) | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | 1m6s / 1m9s / 2m14s | 0.9053 (-0.01) | no data | no data |
| BERT + DIET(seq) + ResponseSelector(t2t) | 1m14s / 51s / 2m5s | 0.8601 (0.05) | no data | no data |
| Spacy + DIET(bow) + ResponseSelector(bow) | 1m37s / 1m48s / 3m25s | 0.0700 (0.00) | no data | no data |
| Spacy + DIET(seq) + ResponseSelector(t2t) | 1m46s / 1m42s / 3m28s | 0.2757 (0.01) | no data | no data |
| Sparse + DIET(bow) + ResponseSelector(bow) | 36s / 1m2s / 1m39s | 0.8436 (-0.01) | no data | no data |
| Sparse + DIET(seq) + ResponseSelector(t2t) | 45s / 46s / 1m30s | 0.8601 (0.04) | no data | no data |
| Sparse + Spacy + DIET(bow) + ResponseSelector(bow) | 1m39s / 2m8s / 3m47s | 0.8313 (-0.01) | no data | no data |
| Sparse + Spacy + DIET(seq) + ResponseSelector(t2t) | 1m47s / 1m48s / 3m34s | 0.8560 (-0.01) | no data | no data |

Dataset: Sara, Dataset repository branch: main

| Configuration | Time (test / train / total) | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | 2m35s / 4m56s / 7m31s | 0.8570 (-0.01) | 0.8683 (0.00) | 0.8870 (-0.00) |
| Sparse + DIET(bow) + ResponseSelector(bow) | 58s / 5m26s / 6m23s | 0.8355 (-0.00) | 0.8683 (0.00) | 0.8391 (-0.02) |
| Sparse + DIET(seq) + ResponseSelector(t2t) | 1m22s / 4m11s / 5m33s | 0.8580 (0.01) | 0.8565 (0.00) | 0.8630 (0.01) |

@JEM-Mosig JEM-Mosig changed the title Force DenseWithSparseWeights to produce dense output Force DenseWithSparseWeights to produce dense output and use all inputs Mar 17, 2021
@JEM-Mosig JEM-Mosig requested a review from alwx March 23, 2021 16:02
@Ghostvv (Contributor) left a comment:
looks good from my side

@JEM-Mosig (Author):

@alwx Can you review, please? Or assign someone else from Engineering?

@JEM-Mosig JEM-Mosig requested a review from joejuzl April 13, 2021 07:00
@JEM-Mosig (Author):

@joejuzl (or anybody from Enable), when do you think you could have a look at this? The latest merges with main have broken something, but I want to hold off on fixing it until just before you've got some time scheduled.

@joejuzl (Contributor) left a comment:

Can we update the docs and also add some info to the migration-guide.mdx?

On rasa/utils/train_utils.py (outdated):
```python
f"`{WEIGHT_SPARSITY}` is deprecated. "
f"Please update your configuration file to use "
f"`{CONNECTION_DENSITY}` instead.",
warn_until_version=NEXT_MAJOR_VERSION_FOR_DEPRECATIONS,
```
Contributor:
Can we point the user to any documentation explaining this change?

Author:
Good point! I'll add a section in the migration docs and link it here.
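
For illustration, the migration discussed here amounts to a config change along these lines. This is a hypothetical sketch based on the `weight_sparsity ≈ 1 - connection_density` relation from the PR description; the exact keys and values should be taken from the actual migration guide:

```yaml
pipeline:
  - name: DIETClassifier
    # Deprecated parameter:
    # weight_sparsity: 0.8
    # Roughly equivalent replacement (1 - 0.8):
    connection_density: 0.2
```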

Johannes E. M. Mosig added 5 commits April 19, 2021 13:32
We need more epochs when we don't have much data. Otherwise tests fail depending on the seed.
@JEM-Mosig JEM-Mosig requested a review from joejuzl April 21, 2021 14:40
```diff
@@ -5,10 +5,10 @@ pipeline:
 - name: "CountVectorsFeaturizer"
 - name: "DIETClassifier"
   entity_recognition: False
-  epochs: 5
+  epochs: 50
```
Contributor:
Any reason we need 50? It will make the tests a lot slower...

Author:

The less training data you have, the more epochs you have to train for. At 5 epochs you can find random seeds where the tests fail, even on main (so they may also start failing if we change the architecture slightly). We might get away with fewer epochs and a higher learning rate, but tests like tests/core/test_evaluation.py::test_retrieval_intent can fail randomly.

Contributor:

Then can we not set it to a random seed that works and keep the epochs low?

Author:

Ok, I've reduced the number of epochs and increased the learning rate for now.

@JEM-Mosig (Author), Apr 22, 2021:

> Then can we not set it to a random seed that works and keep the epochs low?

Sorry, GitHub didn't show this until I refreshed the page. We can do this, but I would call it bad practice: if you change anything in the model, that is like changing the random seed, and the test may suddenly fail for no reason.

@JEM-Mosig (Author), Apr 22, 2021:

I think with the increased learning rate we might be fine (we can increase it because we have so little training data that it is "easy to learn" for the model, and with a higher learning rate we don't need as many epochs). But success is never 100% guaranteed.

Contributor:

Yeah, the point of unit tests is to test the code around the model rather than the effectiveness or accuracy of the model, so the most important thing is stability. And we need our unit tests to be fast so they don't hinder development and the CI.
However, I do see your point: a test that breaks on any model change has to be updated often, which is not ideal. The other alternative is to mock the model, but I like that even less, as it takes us further from reality.
I would say the best (though not perfect) option is setting the random seed and keeping the epochs relatively low.

@JEM-Mosig (Author), Apr 23, 2021:

Ok, then let's keep it as it is now (5 epochs, an increased learning rate, and a fixed seed of 42). From an ML perspective it makes sense to have a higher learning rate here, and the test succeeds.
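
The tradeoff settled on here (fewer epochs compensated by a larger learning rate) can be seen on a toy problem. This is an illustrative sketch unrelated to the actual Rasa tests; `gd` is a made-up helper that minimizes a one-dimensional quadratic:

```python
def gd(lr: float, epochs: int, w0: float = 0.0) -> float:
    """Minimize (w - 3)^2 by gradient descent; the gradient is 2 * (w - 3)."""
    w = w0
    for _ in range(epochs):
        w -= lr * 2.0 * (w - 3.0)
    return w

# Many epochs with a small learning rate...
slow = gd(lr=0.1, epochs=50)
# ...or few epochs with a larger one: both land close to the optimum w = 3.
fast = gd(lr=0.4, epochs=5)
```

With a fixed seed on top (42 in this PR), a short schedule like the second one stays deterministic, which is what keeps the test both fast and stable.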

On docs/docs/migration-guide.mdx (outdated):
@JEM-Mosig JEM-Mosig requested a review from joejuzl April 22, 2021 16:37
@joejuzl (Contributor) left a comment:

Looks good!

@JEM-Mosig JEM-Mosig enabled auto-merge (squash) April 28, 2021 18:30
@JEM-Mosig JEM-Mosig disabled auto-merge April 29, 2021 12:58
@JEM-Mosig JEM-Mosig enabled auto-merge (squash) April 29, 2021 13:58
@JEM-Mosig JEM-Mosig merged commit 6a73b2e into main Apr 29, 2021
@JEM-Mosig JEM-Mosig deleted the johannes-7999 branch April 29, 2021 15:55
Successfully merging this pull request may close these issues.

Force DenseWithSparseWeights to have at least one entry per row
4 participants