[Relay][Operator Fusion] When two node love each other very much they join and get in a group but when null is not null they stop loving each other and the join is a not join #3867

Closed
MarisaKirisame opened this issue Aug 31, 2019 · 1 comment

Comments

@MarisaKirisame
Contributor

Sorry for the strange title; my sanity deteriorated during the week-long battle with this error.
Let me explain the error: basically, operator fusion outputs an ill-formed program if the program has an operator that outputs a tuple, and the tuple size is odd and larger than 1.

Yes, somehow the code does fine when the tuple has 2, 4, 6, 8... fields, but does not work if it has 3, 5... fields. See uwsampl/relay-aot#26.

Naturally, I opened fuse_ops.cc and tried to look for if (x % 2 == 0) in the code, found nothing, and didn't know what to do :)

Jokes aside, I think I found the error, but I am hesitant and, as I am not familiar with operator fusion, I want people to confirm that it is indeed the culprit before I fix it.

In the test case, multiple nodes without an LCA (least common ancestor) got joined into a single group.
So, in FuseMutator, scope is violated: the wrong parent got inferred, leaving some nodes outside the group, and so when the arguments are replaced with parameters (ginfo.arguments), some stuff goes out of scope.

They got joined in https://github.com/dmlc/tvm/blob/master/src/relay/pass/fuse_ops.cc#L459:

      for (auto link = gnode->outputs.head; link != nullptr; link= link->next) {
        size_t oindex = link->value.node->index;
        CHECK_LT(oindex, tree.nodes.size());
        Node* onode = tree.nodes[oindex];
        CHECK(onode != nullptr);
        if (parent != nullptr) {
          parent = LeastCommonAncestor(parent, onode, &pattern);
        } else {
          parent = onode;
        }
        pattern = CombinePattern(pattern, link->value.pattern);
      }

Here, LeastCommonAncestor returns nullptr when there is no common ancestor.
However, this nullptr has a different meaning than the initial nullptr: it means there should be no group, as the nodes do not have an LCA, while the initial nullptr means 'no onode explored yet'.
In the case of a 3-tuple with 3 separate projections, here's what happens:
0: parent is initialized to nullptr, and becomes onode, which is x.0
1: parent is x.0, and its LCA with onode (x.1) is nullptr
2: parent is nullptr, and becomes x.2.

It seems fixable by adding a root group, which does nothing, right?
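
The trace above can be reproduced with a toy model. This is only a sketch under stated assumptions: `Node`, `Lca`, `FoldParent` and `CheckParity` are hypothetical stand-ins written for illustration, not the types or functions in fuse_ops.cc, and the pattern argument is left out. In this model the fold ends on nullptr for an even number of unrelated outputs and on the last output for an odd number, which would be consistent with the even/odd behaviour described earlier. Giving every otherwise parentless node a shared dummy root (the idea suggested above, not a confirmed fix) means the LCA is never nullptr, so the two meanings can no longer be confused.

```cpp
#include <cassert>
#include <vector>

// Toy tree node (hypothetical; not TVM's DominatorTree::Node).
struct Node {
  const Node* parent;  // nullptr means "no ancestor"
  int depth;           // distance from the top of its tree
};

// Walk parent links until the two nodes meet.
// Returns nullptr when the nodes share no ancestor.
const Node* Lca(const Node* a, const Node* b) {
  while (a != b) {
    if (a == nullptr || b == nullptr) return nullptr;
    if (a->depth < b->depth) {
      b = b->parent;
    } else if (a->depth > b->depth) {
      a = a->parent;
    } else {
      a = a->parent;
      b = b->parent;
    }
  }
  return a;
}

// Same shape as the loop quoted from fuse_ops.cc: nullptr is used both for
// "nothing explored yet" and for "no common ancestor".
const Node* FoldParent(const std::vector<const Node*>& outputs) {
  const Node* parent = nullptr;
  for (const Node* onode : outputs) {
    if (parent != nullptr) {
      parent = Lca(parent, onode);
    } else {
      parent = onode;  // also taken right after an LCA lookup failed
    }
  }
  return parent;
}

// Exercises both setups; call it from main() to run the checks.
void CheckParity() {
  // Three unrelated projections (x.0, x.1, x.2), each its own tree.
  Node x0{nullptr, 0}, x1{nullptr, 0}, x2{nullptr, 0};
  assert(FoldParent({&x0, &x1}) == nullptr);    // even: ends on "no LCA"
  assert(FoldParent({&x0, &x1, &x2}) == &x2);   // odd: wrongly ends on x.2

  // Same projections hanging under one dummy root (assumed fix).
  Node root{nullptr, 0};
  Node y0{&root, 1}, y1{&root, 1}, y2{&root, 1};
  assert(FoldParent({&y0, &y1}) == &root);
  assert(FoldParent({&y0, &y1, &y2}) == &root);
}
```

With the dummy root, a caller would treat a result equal to the root as "do not fuse". Tracking "no output explored yet" in a separate boolean would be another way to separate the two meanings without changing the tree.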

@jroesch @tqchen @vinx13 @masahi can you guys take a quick look?

@tqchen
Member

tqchen commented Aug 31, 2019

Please open a new development discussion thread on https://discuss.tvm.ai/
