Force DenseWithSparseWeights to produce dense output and use all inputs #8011
Conversation
Commit: 6c7c46a, the full report is available as an artifact.
looks good from my side
@alwx Can you review, please? Or assign someone else from Engineering?
@joejuzl (or anybody from Enable), when do you think you could have a look at this? The latest merges with main have broken something, but I want to hold off on fixing it until just before you've got some time scheduled.
Can we update the docs and also add some info to the migration-guide.mdx?
f"`{WEIGHT_SPARSITY}` is deprecated. "
f"Please update your configuration file to use "
f"`{CONNECTION_DENSITY}` instead.",
warn_until_version=NEXT_MAJOR_VERSION_FOR_DEPRECATIONS,
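The snippet above (with trailing spaces inside the adjacent f-strings so the concatenated message reads correctly) raises a deprecation warning when the old parameter is used. A minimal self-contained sketch of the same pattern, using Python's standard `warnings` module rather than Rasa's internal helper — the constant names and the `check_deprecated_option` helper are assumptions mirroring the snippet, not the PR's actual code:

```python
import warnings

# Hypothetical constants mirroring the snippet above.
WEIGHT_SPARSITY = "weight_sparsity"
CONNECTION_DENSITY = "connection_density"


def check_deprecated_option(config: dict) -> dict:
    """Translate the deprecated option to its replacement, warning the user."""
    if WEIGHT_SPARSITY in config:
        warnings.warn(
            f"`{WEIGHT_SPARSITY}` is deprecated. "
            f"Please update your configuration file to use "
            f"`{CONNECTION_DENSITY}` instead.",
            DeprecationWarning,
        )
        # Per the PR description, weight_sparsity is roughly
        # equivalent to 1 - connection_density.
        config[CONNECTION_DENSITY] = 1.0 - config.pop(WEIGHT_SPARSITY)
    return config
```

Note the trailing space inside each f-string fragment: adjacent string literals are concatenated verbatim, so without them the message would read "deprecated.Please".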
Can we point the user to any documentation explaining this change?
Good point! I'll add a section in the migration docs and link it here.
We need more epochs when we don't have much data. Otherwise, tests fail depending on the seed.
@@ -5,10 +5,10 @@ pipeline:
 - name: "CountVectorsFeaturizer"
 - name: "DIETClassifier"
   entity_recognition: False
-  epochs: 5
+  epochs: 50
Any reason we need 50? It will make the tests a lot slower...
The less training data you have, the more epochs you have to train for. At 5 you can find random seeds where the tests fail - even on main (and so they may fail if we just change the architecture slightly). We might get away with fewer epochs and a higher learning rate, but tests like tests/core/test_evaluation.py::test_retrieval_intent can fail randomly.
Then can we not set it to a random seed that works and keep the epochs low?
Ok, I've reduced the number of epochs and increased the learning rate for now.
> Then can we not set it to a random seed that works and keep the epochs low?
Sorry, GitHub didn't show this until I refreshed the page. We can do this, but I would call it bad practice. If you change anything in the model, this would be like changing the random seed, and your test may suddenly fail for no reason.
I think with the increased learning rate we might be fine (we can increase it because we have so little training data that it's "easy to learn" for the model, and with a higher learning rate we don't need as many epochs). But success is never 100% guaranteed.
Yeah, the point of unit tests is to test the code around the model rather than the effectiveness or accuracy of the model, so the most important thing is stability. We also need our unit tests to be fast so they don't hinder development and CI.
However, I do see your point: if the test is liable to break on small model changes, we would have to update it often, which is not ideal. The other alternative is to mock the model, but I like that even less, as it takes us further from reality.
I would say the best (though not perfect) option is setting the random seed and keeping the epochs relatively low.
Ok, then let's keep it as it is now (5 epochs, increased learning rate and fixed seed at 42). From an ML perspective it makes sense to have a higher learning rate here, and the test succeeds.
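The setup agreed on above (5 epochs, higher learning rate, seed fixed at 42) might look roughly like this in the test config — a sketch only; the `learning_rate` value is illustrative, and the keys assume DIETClassifier's documented `epochs`, `learning_rate`, and `random_seed` parameters:

```yaml
pipeline:
  - name: "CountVectorsFeaturizer"
  - name: "DIETClassifier"
    entity_recognition: False
    epochs: 5
    learning_rate: 0.01   # illustrative value, not the PR's actual one
    random_seed: 42
```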
Looks good!
Proposed changes:
- Rename DenseWithSparseWeights into RandomlyConnectedDense. The weight_sparsity parameter is now roughly equivalent to 1 - connection_density, except at very low densities (high sparsities).

Status (please check what you already did):
- updated the documentation
- reformatted the code with black (please check Readme for instructions)
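The "roughly equivalent, except at very low densities" caveat can be illustrated with a small sketch. The function name and mechanism below are assumptions, not the PR's actual implementation: if every output unit must keep at least one incoming connection (so the layer's output is guaranteed dense), the realised density per output unit is floored at 1/num_inputs, which can exceed the requested connection_density when that value is very small.

```python
import numpy as np


def random_connection_mask(num_in, num_out, connection_density, rng):
    """Drop each weight with probability 1 - connection_density, but keep
    at least one incoming connection per output unit so the layer's
    output is always dense."""
    mask = rng.random((num_in, num_out)) < connection_density
    for col in range(num_out):
        if not mask[:, col].any():
            # Force one random incoming connection for this output unit.
            mask[rng.integers(num_in), col] = True
    return mask


rng = np.random.default_rng(42)
# Requested density 0.001, but with 100 inputs each output unit is floored
# at 1/100 = 0.01, so the realised density exceeds the request here:
mask = random_connection_mask(100, 20, connection_density=0.001, rng=rng)
print(mask.mean())  # realised density, >= 0.01
```

At moderate densities the floor is almost never hit, which is why weight_sparsity ≈ 1 - connection_density holds everywhere except the low-density extreme.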