Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RELAY] Fixes to MergeCompilerRegions #5195

Merged
merged 9 commits into from
Apr 1, 2020

Conversation

mbaret
Copy link
Contributor

@mbaret mbaret commented Mar 31, 2020

There were a few outstanding issues with the previous PR to implement the MergeCompilerRegions pass. This PR addresses those issues and fixes the AnnotateTarget test.

@mbaret
Copy link
Contributor Author

mbaret commented Mar 31, 2020

cc @zhiics @trevor-m @comaniac
Can you see if you still observe failures with this PR?

@trevor-m
Copy link
Contributor

trevor-m commented Mar 31, 2020

Thanks @mbaret ! This fixes the segfault issues with resnet and other networks.

However now it is impossible to execute a model that is not fully offloaded to an external codegen. SinceAnnotateRestDefault labels the ops meant to be run in TVM as "relay.ext.default" it results in a failure during codegen since there is obviously no "default" codegen: TVMError: Check failed: pf: Failed to find the codegen tool for relay.ext.default. Am I supposed to do something extra to remove the default label after partitioning?

@zhiics
Copy link
Member

zhiics commented Mar 31, 2020

For "default", I commented in the other PR: #5028

Copy link
Contributor

@comaniac comaniac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also address the problem @trevor-m encountered. The graph partitioning pass should set the attributes correctly for the regions that will be handled by TVM codegens.

@@ -52,6 +55,13 @@ class AnnotateTargetWrapper : public ExprMutator {
return fannotate[op](call->attrs, call->args);
}
}
if (expr->IsInstance<TupleGetItemNode>()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a concern about this. It seems like you have to list all possible nodes here and it's easy to miss something. For now I would suggest adding an exception at least to indicate that we need to add more here.

Copy link
Contributor

@trevor-m trevor-m Mar 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that many nodes will need to be supported here. TupleGetItemNode is needed for batchnorm. I have a PR ready to add support for TupleNode which is needed for concatenate. I haven't encountered any other nodes yet that would need to be added here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure yet what we plan on doing with Let nodes (or Ref* nodes). I think just saying they're not supported is sufficient for now though without also throwing an error.

src/relay/transforms/merge_compiler_regions.cc Outdated Show resolved Hide resolved
Copy link
Member

@zhiics zhiics left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please fix the default target issue and add unit tests that use both merge regions and partitioning graph.

src/relay/analysis/annotated_region_set.cc Outdated Show resolved Hide resolved
src/relay/transforms/merge_compiler_regions.cc Outdated Show resolved Hide resolved
src/relay/transforms/merge_compiler_regions.cc Outdated Show resolved Hide resolved
@trevor-m
Copy link
Contributor

trevor-m commented Apr 1, 2020

HI @mbaret, I prepared this small enhancement to include TupleNode in annotation when surrounded by supported nodes, in a similar way to what you have done for TupleGetItemNode.

Here is the code, you are welcome to include it in this PR if you would like.
trevor-m@2b7c3b1

mbaret added 4 commits April 1, 2020 13:48
This PR addresses a few outstanding issues with
the implementation of MergeCompilerRegions. In
particular, it now handles TupleGetItem nodes properly
and other minor bugs related to region merging have
been fixed.

Change-Id: I07783afc56183a6f798a510209f23b0a5f252255
Change-Id: I0a844ac59bda1089ae0c67cef52f0b0c7ab2cbd7
Change-Id: Ib6f2eede6f38bbb270073eb8d4c4dc19f60832c6
Change-Id: I9b7696a51c95871491cbea33c40f92ec327e417f
@mbaret mbaret force-pushed the merge-compiler-fix branch from 9ae789f to b35023b Compare April 1, 2020 12:49
Change-Id: I0098bd1bf6788dd6366810dcefa84f1ebbffaab0
@mbaret mbaret force-pushed the merge-compiler-fix branch from b35023b to f8e378e Compare April 1, 2020 13:21
mbaret added 3 commits April 1, 2020 15:47
Change-Id: I944365cd3080a97a9261f643a8f1efa5a63cf82b
Change-Id: Ie43113492bda8f1ce63eaf9615cb645bb9e2ee86
Change-Id: I46f9e349b1a813a9140f7e4f8a2241687e2df73b
@mbaret
Copy link
Contributor Author

mbaret commented Apr 1, 2020

@trevor-m @zhiics Default behaviour should now be fixed.

@zhiics
Copy link
Member

zhiics commented Apr 1, 2020

Can you add a unit test for end-to-end test? It should include anntation-mege-partitioning and we can invoke it using runtime. I just want to make sure everything is correct.

Change-Id: I309afdd1951d7e796e41d13788aa487707e0ac4c
@mbaret
Copy link
Contributor Author

mbaret commented Apr 1, 2020

Can you add a unit test for end-to-end test? It should include anntation-mege-partitioning and we can invoke it using runtime. I just want to make sure everything is correct.

Yup, I'm looking at adding this right now :) My question is what codegen can I use?

@zhiics
Copy link
Member

zhiics commented Apr 1, 2020

You can have a mock example that uses gcc. I think we have examples for it already.

@mbaret
Copy link
Contributor Author

mbaret commented Apr 1, 2020

Do you mean ccompiler? If so, I don't think that supports codegen on a region of multiple ops.

@zhiics
Copy link
Member

zhiics commented Apr 1, 2020

hmm, you are right. Let's have examples with annotation-merge-partitioning with structural equal check first.

@trevor-m
Copy link
Contributor

trevor-m commented Apr 1, 2020

Hi @mbaret, thanks again for fixing these issues!

I noticed the call to MergeCompilerRegions hangs indefinitely for Densenet121 when all nodes are annotated. Here is a script to reproduce the issue. You will also need this commit trevor-m@2b7c3b1 to make sure that Tuple + concatenate is annotated properly also. There might a some infinite loop in the pass code or the runtime is scaling very poorly with respect to number of nodes.

from tvm import relay
import mxnet as mx
from mxnet.gluon.model_zoo.vision import get_model
from tvm.relay import op as reg

def _register_external_op_helper(op_name, supported=True):
    @reg.register(op_name, "target.test")
    def _func_wrapper(attrs, args):
        return supported
    return _func_wrapper

_register_external_op_helper("nn.conv2d")
_register_external_op_helper("nn.dense")
_register_external_op_helper("nn.relu")
_register_external_op_helper("add")
_register_external_op_helper("multiply")
_register_external_op_helper("nn.bias_add")
_register_external_op_helper("nn.batch_flatten")
_register_external_op_helper("nn.max_pool2d")
_register_external_op_helper("nn.dropout")
_register_external_op_helper("nn.batch_norm")
_register_external_op_helper("nn.global_avg_pool2d")
_register_external_op_helper("clip")
_register_external_op_helper("concatenate")
_register_external_op_helper("nn.avg_pool2d")

def test_densenet():
    block = get_model('densenet121', pretrained=True)
    mod, params = relay.frontend.from_mxnet(block, shape={'data': (1, 3, 224, 224)}, dtype='float32')
    mod = relay.transform.AnnotateTarget("test")(mod)
    print('Running MergeCompilerRegions...')
    mod = relay.transform.MergeCompilerRegions()(mod)
    mod = relay.transform.PartitionGraph()(mod)
    print(mod)

test_densenet()

@mbaret
Copy link
Contributor Author

mbaret commented Apr 1, 2020

Thanks again Trevor for testing :) That test completes for me (in less than a second). If I had to take a guess, I'd say you might be hitting a recursion limit?

@mbaret
Copy link
Contributor Author

mbaret commented Apr 1, 2020

Actually, are you using your patch for Tuples? I'm testing it without that patch.

@trevor-m
Copy link
Contributor

trevor-m commented Apr 1, 2020

Actually, are you using your patch for Tuples? I'm testing it without that patch.

Yes, I am using that patch. Without the patch it works. Maybe we don't have to fix it right here if it is introduced by my patch changes.

@zhiics zhiics merged commit 0449966 into apache:master Apr 1, 2020
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Apr 16, 2020
* [RELAY] Fixed issues with MergeCompilerRegions

This PR addresses a few outstanding issues with
the implementation of MergeCompilerRegions. In
particular, it now handles TupleGetItem nodes properly
and other minor bugs related to region merging have
been fixed.

Change-Id: I07783afc56183a6f798a510209f23b0a5f252255

* Fixed issue using pre-merged regions

Change-Id: I0a844ac59bda1089ae0c67cef52f0b0c7ab2cbd7

* Removed some debugging logic

Change-Id: Ib6f2eede6f38bbb270073eb8d4c4dc19f60832c6

* Remove default annotations

Change-Id: I9b7696a51c95871491cbea33c40f92ec327e417f

* Annotate default 'if's

Change-Id: I0098bd1bf6788dd6366810dcefa84f1ebbffaab0

* Clang format

Change-Id: I944365cd3080a97a9261f643a8f1efa5a63cf82b

* Use src/dest in merge

Change-Id: Ie43113492bda8f1ce63eaf9615cb645bb9e2ee86

* Fixed partition test

Change-Id: I46f9e349b1a813a9140f7e4f8a2241687e2df73b

* Removed comments

Change-Id: I309afdd1951d7e796e41d13788aa487707e0ac4c
zhiics pushed a commit to neo-ai/tvm that referenced this pull request Apr 17, 2020
* [RELAY] Fixed issues with MergeCompilerRegions

This PR addresses a few outstanding issues with
the implementation of MergeCompilerRegions. In
particular, it now handles TupleGetItem nodes properly
and other minor bugs related to region merging have
been fixed.

Change-Id: I07783afc56183a6f798a510209f23b0a5f252255

* Fixed issue using pre-merged regions

Change-Id: I0a844ac59bda1089ae0c67cef52f0b0c7ab2cbd7

* Removed some debugging logic

Change-Id: Ib6f2eede6f38bbb270073eb8d4c4dc19f60832c6

* Remove default annotations

Change-Id: I9b7696a51c95871491cbea33c40f92ec327e417f

* Annotate default 'if's

Change-Id: I0098bd1bf6788dd6366810dcefa84f1ebbffaab0

* Clang format

Change-Id: I944365cd3080a97a9261f643a8f1efa5a63cf82b

* Use src/dest in merge

Change-Id: Ie43113492bda8f1ce63eaf9615cb645bb9e2ee86

* Fixed partition test

Change-Id: I46f9e349b1a813a9140f7e4f8a2241687e2df73b

* Removed comments

Change-Id: I309afdd1951d7e796e41d13788aa487707e0ac4c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants