Make first order gradient graphs more efficient #5959

Merged 1 commit into apache:master on Jun 30, 2020

Conversation

@t-vi (Contributor) commented on Jun 29, 2020

Previously, nodes were visited as often as they were used, and a derivative was
computed on each visit; the contributions were only summed up at the leaves.
This patch changes that so that the incoming gradient contributions are summed
at any node that is used several times.
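As a toy illustration of the two strategies (my own sketch in plain Python, not the TVM code), consider a value that is used twice: the old behaviour runs the producer's backward rule once per use, while the new behaviour sums the incoming gradients first and runs the backward rule once.

```python
# Toy sketch (assumption, not the actual Relay implementation) of accumulating
# adjoints at a node that is used twice versus duplicating its backward rule.

def f(x):
    y = x * x          # y is used twice below
    return y + y       # f(x) = 2 * x**2, so df/dx = 4 * x

def grad_f_accumulated(x):
    # New behaviour: sum the contributions to dL/dy (one per use of y) ...
    dy = 1.0 + 1.0
    # ... then run y's backward rule (d(x*x)/dx = 2*x) exactly once.
    return dy * 2.0 * x

def grad_f_duplicated(x):
    # Old behaviour: y's backward rule runs once per use, duplicating work.
    return 1.0 * 2.0 * x + 1.0 * 2.0 * x

assert grad_f_accumulated(3.0) == grad_f_duplicated(3.0) == 12.0
```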

@t-vi (Contributor, Author) commented on Jun 29, 2020

This pull request is only about the second commit; the first is #5946.
I noticed that my gradient had many more O(n³) operations (matmul etc.) than it should have and tracked this down to how gradients are computed when a value is used several times in the computation.
Graphs become really big and unwieldy if they are not purely sequential computations. Also, the duplication cannot be eliminated by CSE, because it is the "output part" that is duplicated rather than the input (one could, in theory, commute the add with all the gradient ops).
While it doesn't fix anything per se, it might also have a mitigating impact for people seeing other effects when working with first-order gradients (e.g. #4534).
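A hypothetical reproduction along these lines (my own sketch using TVM's Python API, not code from this PR): build a small Relay function that reuses an intermediate result and take its first-order gradient. Before this patch, the backward matmuls of `nn.dense` would appear once per use of the reused value; afterwards they appear only once.

```python
# Sketch only: names and shapes are illustrative, and the final op counting
# would rely on the count_ops helper added in this PR's test.
import tvm
from tvm import relay

x = relay.var("x", shape=(4, 4))
w = relay.var("w", shape=(4, 4))
y = relay.nn.dense(x, w)
z = y + y + y                       # y is used three times

mod = tvm.IRModule.from_expr(relay.Function([x, w], z))
mod = relay.transform.InferType()(mod)

# First-order gradient of the (type-checked) forward function.
grad = relay.transform.gradient(mod["main"], mode="first_order")
# count_ops(grad) should report dense's backward matmuls only once,
# instead of once per use of y.
```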

@@ -27,6 +29,20 @@
import tvm.relay.op as op


def count_ops(expr):
Contributor:
Can you move this to some common file? This looks like something useful to other places as well.

Contributor Author:

I'll move it to python/tvm/relay/testing/__init__.py when rebasing post #5946 .
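For reference, a rough sketch of what such an op-counting helper might look like (my own approximation based on Relay's Python visitor API, not necessarily the exact code in this diff):

```python
# Sketch of an op-counting helper built on Relay's ExprVisitor (assumption).
from collections import defaultdict

import tvm
from tvm.relay.expr_functor import ExprVisitor


def count_ops(expr):
    """Count how many times each Relay operator is called in `expr`."""

    class OpCounter(ExprVisitor):
        def __init__(self):
            super().__init__()
            self.counts = defaultdict(int)

        def visit_call(self, call):
            # Only count calls to primitive operators (e.g. nn.dense, add).
            if isinstance(call.op, tvm.ir.Op):
                self.counts[call.op.name] += 1
            super().visit_call(call)

    counter = OpCounter()
    counter.visit(expr)
    return dict(counter.counts)
```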

@MarisaKirisame (Contributor) commented:

We were thinking of using ANF and then AD, but this also works.

@MarisaKirisame (Contributor) commented:

@t-vi please rebase.

@t-vi force-pushed the grad_efficiency branch from e344cfb to 35d00aa on June 30, 2020 at 05:35
@t-vi (Contributor, Author) commented on Jun 30, 2020

@MarisaKirisame Thank you! I rebased and now the CI is all happy again.

@tqchen merged commit 7176483 into apache:master on Jun 30, 2020
@tqchen (Member) commented on Jun 30, 2020

Thanks @t-vi @MarisaKirisame

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Jun 30, 2020
zhiics pushed a commit to neo-ai/tvm that referenced this pull request Jul 2, 2020