-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix intel conv2d auto tune #5200
Conversation
I think this issue exist in all auto tvm topi template. |
@kevinthesun Do you also want to send the PR (or update this one) to change zero tensor to random tensor for AutoTVM for stable measurements? |
@FrozenGene If that's the case, would you mind opening an issue tracking all topi ops we might want to modify? |
@anijain2305 Added. |
Did a brief search and here is a list of TOPI files that has the same use case:
btw just curious, do you have an experimental result with an isolated case to illustrate the accuracy issue introduced by |
@comaniac One way to verify this is to directly build a tvm func involving debug_skip_region. I verified that on x86 and debug_skip_region did cause inaccurate measurement. However, I didn't dig into why debug_skip_region causes this. For other platforms, @FrozenGene notices this issue also exists. We might want to verify on other platforms and fix them. |
# This can avoid some memory issues that make the measurement results unreliable. | ||
args = [nd.empty(x[0], dtype=x[1], ctx=ctx) for x in build_result.arg_info] | ||
args = [nd.array(np.random.uniform(0.0, 255.0, size=x[0]).astype(dtype=x[1]), ctx=ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will introduce a data copy when using RPCRunner, which will bring some network overhead.
One way to solve this is by implementing a tvm.nd.non_empty
or tvm.nd.random
in the tvm runtime, then we can do the random fill on the target device without copying over the network.
@FrozenGene has implemented a version in our internal codebase. Maybe @FrozenGene can help on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@merrymercy Sure. I will port it to our upstream soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I open #5216 to track this
Good catch. I can confirm both |
Open #5215 to track this issue. |
@merrymercy @FrozenGene Do we keep empty array for now and wait for non_empty array? |
I am happy with keeping the empty array and merging this first. |
7f0a72a
to
a0e73c9
Compare
+1 |
Is this good to be merged? |
* Fix x86 conv2d and depthwise conv2d auto tuning * Fix depthwise conv2d infer layout * Use random data instead of empty data for autotvm * Fix pylint * Keep empty array for now for autotvm
* Fix x86 conv2d and depthwise conv2d auto tuning * Fix depthwise conv2d infer layout * Use random data instead of empty data for autotvm * Fix pylint * Keep empty array for now for autotvm
* Fix x86 conv2d and depthwise conv2d auto tuning * Fix depthwise conv2d infer layout * Use random data instead of empty data for autotvm * Fix pylint * Keep empty array for now for autotvm
* Fix x86 conv2d and depthwise conv2d auto tuning * Fix depthwise conv2d infer layout * Use random data instead of empty data for autotvm * Fix pylint * Keep empty array for now for autotvm
debug_skip_region will cause execution time to be inaccurate on x86. This PR fixes x86 conv2d and depthwise conv2d.
@icemelon9 @anijain2305