[Relay] change device annotation from post DFS to recursive #6124

Merged
merged 2 commits into from
Aug 19, 2020

Conversation

zhanghaohit
Contributor

This is related to #5840 and split from PR #5842

Originally, the device type was propagated over the graph in post-order DFS, which may give inconsistent results if the argument order changes. It can also handle some cases incorrectly, e.g., the first residual block in Resnet50. The first few layers of Resnet50 are depicted in the following figure (top to bottom is DFS order). We want all layers to run on the FPGA device, except the first and last few. In the original device propagation algorithm, which follows post-DFS order, the conv2d layers in grey are assigned the CPU device type: copy2 is encountered first, and the three grey conv2d nodes are then marked with the source device type of copy2 (i.e., CPU), which is not correct.

[Figure: the first few layers of Resnet50]

By changing the device annotation behaviour, we can support more complex graph structures.
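The idea of recursive propagation can be illustrated with a toy model. The following is a minimal sketch and not the TVM implementation: `Node`, `propagate`, and the graph layout are hypothetical names chosen for illustration, and the graph only loosely mirrors the residual block described in this PR. Starting from the output, the current device is passed down to each node's inputs, and a device-copy node switches to its source device for everything that feeds it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

CPU, FPGA = "cpu", "fpga"


@dataclass
class Node:
    """Hypothetical graph node; src_dev/dst_dev are set only on device copies."""
    name: str
    inputs: List["Node"] = field(default_factory=list)
    src_dev: Optional[str] = None
    dst_dev: Optional[str] = None


def propagate(node: Node, dev: str, out: Dict[str, str]) -> None:
    """Recursively assign devices, walking from the output towards the inputs."""
    if node.name in out:
        return  # already annotated via another path (first visit wins in this toy)
    if node.src_dev is not None:
        # A device copy produces data on dst_dev; everything feeding it
        # runs on src_dev.
        out[node.name] = node.dst_dev
        child_dev = node.src_dev
    else:
        out[node.name] = dev
        child_dev = dev
    for inp in node.inputs:
        propagate(inp, child_dev, out)


# Toy graph: CPU stem -> copy1 -> FPGA residual block -> copy2 -> CPU head.
data = Node("data")
conv0 = Node("conv0", [data])
copy1 = Node("copy1", [conv0], src_dev=CPU, dst_dev=FPGA)
pool = Node("pool", [copy1])
conv1 = Node("conv1", [pool])
conv2 = Node("conv2", [conv1])
conv3 = Node("conv3", [conv2])
shortcut = Node("shortcut", [pool])
add = Node("add", [conv3, shortcut])
copy2 = Node("copy2", [add], src_dev=FPGA, dst_dev=CPU)
dense = Node("dense", [copy2])

devices: Dict[str, str] = {}
propagate(dense, CPU, devices)
```

In this toy graph the three convolutions between the two copies end up on the FPGA, and only the nodes before `copy1` and after `copy2` stay on the CPU.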

@zhanghaohit zhanghaohit changed the title change device annotation from post DFS to recursive [Relay] change device annotation from post DFS to recursive Jul 23, 2020
@jroesch jroesch requested a review from zhiics July 23, 2020 08:13
@jroesch
Member

jroesch commented Jul 23, 2020

cc @mbrookhart

Contributor

@mbrookhart mbrookhart left a comment

Can you please add a test showing the problem/desired change in behavior?

@zhiics zhiics added the status: need test case need test cases to cover the change label Jul 23, 2020
Contributor

@tmoreau89 tmoreau89 left a comment

Thanks @zhanghaohit for this PR, I second @mbrookhart's request to add a test case.

@zhanghaohit zhanghaohit reopened this Jul 29, 2020
@zhanghaohit
Contributor Author

> Can you please add a test showing the problem/desired change in behavior?

Thanks @mbrookhart and @tmoreau89 for the suggestion. I've added a test, which would fail in the original code.

Contributor

@mbrookhart mbrookhart left a comment

Need to fix lint. A couple of nitpicks here, but overall, I think it looks good now. Kind of wondering why the unit test doesn't fail earlier.

Comment on lines +444 to +441
int dev_type_ = -1;
int out_dev_type_ = -1;
Contributor

I don't love making this class state, precisely because you have to do crazy things with maintaining state in your recursive calls. You could do it as a set of recursive arguments, but that kind of requires re-implementing with ExprFunctor...so maybe this is the cleanest solution.
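To make the trade-off concrete, here is a small sketch contrasting the two styles. It is an illustration only: the tuple-based tree, `StatefulVisitor`, and `visit_with_arg` are hypothetical and unrelated to TVM's actual visitor classes. Each tree node is `(name, copy_src, children)`, where a non-`None` `copy_src` means the node is a device copy whose inputs run on `copy_src`.

```python
class StatefulVisitor:
    """Keeps the current device as class state (hypothetical toy visitor)."""

    def __init__(self, dev):
        self.dev = dev
        self.assigned = {}

    def visit(self, tree):
        name, copy_src, children = tree
        self.assigned[name] = self.dev
        saved = self.dev  # must be saved before descending...
        if copy_src is not None:
            self.dev = copy_src
        for child in children:
            self.visit(child)
        self.dev = saved  # ...and restored, or sibling nodes see the wrong device


def visit_with_arg(tree, dev, assigned):
    """Same traversal, but the device travels as a recursive argument."""
    name, copy_src, children = tree
    assigned[name] = dev
    child_dev = copy_src if copy_src is not None else dev
    for child in children:
        visit_with_arg(child, child_dev, assigned)
    return assigned


tree = ("dense", None, [
    ("copy", "fpga", [("conv", None, [])]),
    ("bias", None, []),
])

stateful = StatefulVisitor("cpu")
stateful.visit(tree)
by_arg = visit_with_arg(tree, "cpu", {})
```

Both versions give the same assignment here. The stateful one depends on the save and restore around the recursion: without the restore, `bias` would inherit the device set while visiting `copy`.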


annotated_expr = annotated()
expected_expr = expected()
assert tvm.ir.structural_equal(annotated_expr, expected_expr)
Contributor

I'm curious that master passes this check, but fails on line 377. Why doesn't structural equal properly resolve the error in the device copy op?

Contributor Author

I think master fails on line 379, right? The device type of log2 is not correctly marked.

Up to this line, annotated_expr and expected_expr are exactly the same. The device_copy op is inserted correctly. We haven't gone through device propagation yet.

Contributor

👍 You're right, I misread my first test.

Contributor

@mbrookhart mbrookhart left a comment

LGTM.

Flaky test?



@zhiics
Member

zhiics commented Aug 3, 2020

Please trigger the CI again.

Member

@junrushao junrushao left a comment

LGTM👍

@zhanghaohit zhanghaohit force-pushed the feature/device-annot branch from 5a47476 to 61a9bc8 Compare August 6, 2020 01:51
@zhanghaohit zhanghaohit requested a review from tmoreau89 August 12, 2020 07:32
@zhanghaohit
Contributor Author

@tmoreau89 Could you help merge this PR? Thanks.

Contributor

@tmoreau89 tmoreau89 left a comment

LGTM

@tmoreau89 tmoreau89 merged commit 8b166e6 into apache:master Aug 19, 2020
@tmoreau89
Contributor

Thank you @zhanghaohit, @mbrookhart, @junrushao1994, and @zhiics; the PR has been merged.

@zhanghaohit zhanghaohit deleted the feature/device-annot branch August 21, 2020 01:49
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Aug 26, 2020
)

* change device annotation from post DFS to recursive

* add testcast for recursive device propogation
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Sep 2, 2020
)

* change device annotation from post DFS to recursive

* add testcast for recursive device propogation
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Sep 3, 2020
)

* change device annotation from post DFS to recursive

* add testcast for recursive device propogation
Labels
status: need test case need test cases to cover the change

6 participants