Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Heterogeneous Execution Get "target" Bug Introduced by #7518 #8536

Closed
Johnson9009 opened this issue Jul 23, 2021 · 1 comment · Fixed by #8537
Closed

[Bug] Heterogeneous Execution Get "target" Bug Introduced by #7518 #8536

Johnson9009 opened this issue Jul 23, 2021 · 1 comment · Fixed by #8537

Comments

@Johnson9009
Copy link
Contributor

I found the bug when sync the latest TVM commits into our own repo, after that lots of our test cases are failed.

The bug will cause the case that all of the operators are annotated to executed on a custom device fail, because the "target" is set to "llvm" incorrectly instead of the custom target like "xxpu".

Even through the root cause of the failure is difficult to describe and understand, but the bug is very obvious.

Before changes of #7518.

  Target GetTargetFromInteger(int64_t dev_type) {
    if (targets_.size() == 1) {
      // homogeneous execution.
      const auto& it = targets_.begin();
      return (*it).second;
    } else {
      // heterogeneous execution.
      std::string call_dev_name;
      if (dev_type == 0) {
-->     call_dev_name = "llvm";   <--- Here haven't set "dev_type", "dev_type" is still equals to 0.
      } else {
        call_dev_name = runtime::DeviceName(dev_type);
      }
      if (targets_.count(dev_type) == 0) {
        LOG(FATAL) << "No target is provided for device " << call_dev_name;
      }
 -->  return targets_[dev_type];  <--- Use 0 get the correct "target".
    }
  }
...
target = GetTargetFromInteger(call_dev_type);

After changes of #7518.

      std::string call_dev_name;
      if (call_dev_type == 0) {
        call_dev_name = "llvm";
--->    call_dev_type = kDLCPU;   <--- Here shouldn't change the value of "call_dev_type".
      } else {
        call_dev_name = ::tvm::runtime::DeviceName(call_dev_type);
      }

      if (targets_.count(call_dev_type) == 0) {
        std::stringstream msg;
        msg << "No target is specified for provided device name: `" << call_dev_name << "`\n\n";
....
        }
        LOG(FATAL) << msg.str();
      }

--->  target = targets_[call_dev_type];     <--- The 0th item of "targets_" is the correct target, but the "call_dev_type" is set to kDLCPU(i.e., 1), so the target is wrong.

I think the authors of #7518 is mislead by the code call_dev_name = "llvm";, if we set the target dictionary to {"cpu": "llvm", "xxpu": "xxpu"} when call "relay.build", then the size of targets_ at above code is 3 instead of 2, the 0th item of targets_ maybe is "xxpu", so here the call_dev_type can't be changed to kDLCPU.

@csullivan @jroesch @tqchen

@comaniac
Copy link
Contributor

@jroesch this might be the root cause of the issue I mentioned in the forum days ago.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants