
Force DenseWithSparseWeights to have at least one entry per row #7999

Closed
JEM-Mosig opened this issue Feb 19, 2021 · 3 comments · Fixed by #8011
Assignees
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR

Comments

@JEM-Mosig
Contributor

Description of Problem:

Whenever DenseWithSparseWeights ends up with rows whose weights are all zero (which can happen when sparsity is high), the layer's effective size is reduced, which defeats the purpose of specifying that size. We should prevent this from happening.

Overview of the Solution:

Examples (if relevant):
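A minimal sketch of the failure mode, using NumPy as a stand-in for the Bernoulli mask that a sparse dense layer applies to its kernel (the helper name `random_sparse_mask` is hypothetical, not Rasa's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_sparse_mask(units_in, units_out, density):
    # Hypothetical stand-in: keep each weight with probability `density`,
    # as a sparsified dense kernel would.
    return (rng.random((units_in, units_out)) < density).astype(np.float32)

mask = random_sparse_mask(64, 64, density=0.05)
zero_rows = int((mask.sum(axis=1) == 0).sum())
# Each row is all-zero with probability 0.95**64 (about 3.7%), so at high
# sparsity a few inputs are typically disconnected entirely.
print(zero_rows, "of 64 input rows have no connection at all")
```

At even higher sparsity the chance of dead rows grows quickly, silently shrinking the layer below its configured size.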

Blockers (if relevant):

Definition of Done:

@JEM-Mosig JEM-Mosig added type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR area:rasa-oss 🎡 Anything related to the open source Rasa framework labels Feb 19, 2021
@JEM-Mosig JEM-Mosig self-assigned this Feb 19, 2021
@JEM-Mosig
Contributor Author

It might be even better to use LocallyConnected1D layers. I'm treating this as part of the same issue.

@JEM-Mosig
Contributor Author

JEM-Mosig commented Mar 17, 2021

LocallyConnectedDense layers are slow and buggy in the current TensorFlow version (`implementation != 1` doesn't work), so I backtracked on those changes. We're now using our own RandomlyConnectedDense layer.

I'm also forcing every input to be connected to at least one output, because it doesn't make sense to drop inputs at random.
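One way to get that guarantee is to overlay a wrapping "diagonal" on top of the random mask, so every row and every column receives at least one connection. This is only a sketch in the spirit of the fix, not the actual RandomlyConnectedDense code:

```python
import numpy as np

rng = np.random.default_rng(42)

def connected_sparse_mask(units_in, units_out, density):
    # Hypothetical sketch: random Bernoulli mask, then a wrapping diagonal
    # that touches every row index and every column index at least once.
    mask = (rng.random((units_in, units_out)) < density).astype(np.float32)
    for i in range(max(units_in, units_out)):
        mask[i % units_in, i % units_out] = 1.0
    return mask

m = connected_sparse_mask(64, 32, density=0.05)
assert (m.sum(axis=1) > 0).all()  # no input is dropped
assert (m.sum(axis=0) > 0).all()  # no output is dead
```

Even at density 0 the diagonal leaves `max(units_in, units_out)` connections, so the layer never silently loses rows or columns.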

@JEM-Mosig
Contributor Author

Changes to the sparse layers definitely stabilize TED's performance at low densities:

[image: performance curves at different densities, modified sparse layers vs. main branch]

Solid lines show performance at fixed density (see legend) with my modified sparse layers. Dash-dotted lines show the equivalent on the main branch.

The solid green curve corresponds to fully dense layers. The orange curves correspond to 20% density (i.e. 80% sparsity), which is our default; at that density my changes make no difference.

The dash-dotted red curve extends further to the left because, on main, sparse layers are allowed to drop inputs or outputs and thus can have fewer trainable weights.
