Fix intel conv2d auto tune #5200

Merged: 5 commits, Apr 4, 2020

Conversation

kevinthesun
Contributor

debug_skip_region will cause execution time to be inaccurate on x86. This PR fixes x86 conv2d and depthwise conv2d.

@icemelon9 @anijain2305

@FrozenGene
Member

I think this issue exists in all AutoTVM TOPI templates.

@anijain2305
Contributor

anijain2305 commented Apr 1, 2020

@kevinthesun Do you also want to send the PR (or update this one) to change zero tensor to random tensor for AutoTVM for stable measurements?

@kevinthesun
Contributor Author

@FrozenGene If that's the case, would you mind opening an issue tracking all topi ops we might want to modify?

@kevinthesun
Contributor Author

@anijain2305 Added.

@comaniac
Contributor

comaniac commented Apr 1, 2020

I did a brief search; here is a list of TOPI files that have the same use case:

  • arm_cpu/conv2d_spatial_pack.py
  • arm_cpu/conv2d.py
  • arm_cpu/depthwise_conv2d.py
  • bifrost/conv2d.py
  • cuda/conv2d_int8.py
  • cuda/conv2d_winograd.py
  • cuda/group_conv2d_nchw.py
  • mali/conv2d.py

btw just curious, do you have an experimental result with an isolated case to illustrate the accuracy issue introduced by debug_skip_region?

@kevinthesun
Contributor Author

kevinthesun commented Apr 1, 2020

@comaniac One way to verify this is to directly build a tvm func involving debug_skip_region. I verified this on x86, and debug_skip_region did cause inaccurate measurements. However, I didn't dig into why debug_skip_region causes this. @FrozenGene noticed that this issue also exists on other platforms. We might want to verify on those platforms and fix them.

# This can avoid some memory issues that make the measurement results unreliable.
args = [nd.empty(x[0], dtype=x[1], ctx=ctx) for x in build_result.arg_info]
args = [nd.array(np.random.uniform(0.0, 255.0, size=x[0]).astype(dtype=x[1]), ctx=ctx)
Member

@merrymercy merrymercy Apr 1, 2020

This will introduce a data copy when using RPCRunner, which adds some network overhead.
One way to solve this is to implement a tvm.nd.non_empty or tvm.nd.random in the TVM runtime, so that we can do the random fill on the target device without copying over the network.

@FrozenGene has implemented a version in our internal codebase. Maybe @FrozenGene can help on this?
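
Editorial sketch of the trade-off raised in this review comment. It assumes only NumPy and mirrors the `(shape, dtype)` layout of `build_result.arg_info` from the diff above. The helper names `random_args` and `transfer_bytes` and the example shapes are hypothetical and not part of TVM. The sketch builds the random host arrays the proposed change would create and adds up how many bytes would have to be copied to a remote device, which is the overhead a device-side random fill would avoid.

```python
import numpy as np


def random_args(arg_info, low=0.0, high=255.0, seed=None):
    """Build one random host array per (shape, dtype) entry.

    Hypothetical helper: same fill as the diff (uniform in [low, high),
    then cast to the argument dtype), without wrapping in a TVM NDArray.
    """
    rng = np.random.default_rng(seed)
    return [rng.uniform(low, high, size=shape).astype(dtype)
            for shape, dtype in arg_info]


def transfer_bytes(arrays):
    """Total payload that a host-to-device copy would have to move."""
    return sum(a.nbytes for a in arrays)


# Hypothetical conv2d-like workload: data, kernel, output.
arg_info = [
    ((1, 3, 224, 224), "float32"),
    ((64, 3, 7, 7), "float32"),
    ((1, 64, 112, 112), "float32"),
]

host_args = random_args(arg_info, seed=0)
print("bytes copied per measurement:", transfer_bytes(host_args))
```

With an empty device-side allocation nothing is copied, which is why the thread settles on keeping the empty array until a runtime-side random fill is available.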

Member

@merrymercy Sure. I will port it upstream soon.

Member

I opened #5216 to track this.

@merrymercy
Member

merrymercy commented Apr 1, 2020

Good catch. In my experience, I can confirm that both tvm.nd.empty and debug_skip_region cause inaccurate measurements.

@FrozenGene
Member

Opened #5215 to track this issue.

@kevinthesun
Contributor Author

@merrymercy @FrozenGene Should we keep the empty array for now and wait for the non_empty array?

@merrymercy
Member

I am happy with keeping the empty array and merging this first.

@kevinthesun kevinthesun force-pushed the FixIntelConv2dAutoTune branch from 7f0a72a to a0e73c9 Compare April 2, 2020 21:39
@FrozenGene
Member

I am happy with keeping the empty array and merging this first.

+1

@kevinthesun
Contributor Author

Is this good to be merged?

@merrymercy merrymercy merged commit 0cfdecd into apache:master Apr 4, 2020
zhiics pushed a commit to comaniac/tvm that referenced this pull request Apr 7, 2020
* Fix x86 conv2d and depthwise conv2d auto tuning

* Fix depthwise conv2d infer layout

* Use random data instead of empty data for autotvm

* Fix pylint

* Keep empty array for now for autotvm
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Apr 16, 2020
zhiics pushed a commit to neo-ai/tvm that referenced this pull request Apr 17, 2020
dpankratz pushed a commit to dpankratz/incubator-tvm that referenced this pull request Apr 24, 2020
@kevinthesun kevinthesun deleted the FixIntelConv2dAutoTune branch May 26, 2020 17:31