
[TOPI] Basic x86 schedules #775

Merged
merged 17 commits into apache:master on Jan 16, 2018

Conversation

masahi (Member) commented Jan 11, 2018

Contains the following:

  • NCHW conv schedule
  • NHWC conv compute, schedule, test case (see the compute sketch below)
  • multi threading for elemwise/injective ops, pad, pool, softmax
  • multi threaded + parallelized bias + batch norm + relu, fused into conv if possible

cc @yidawang @yzhliu
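
For reference, a minimal, illustrative sketch of what an NHWC direct-convolution compute looks like in the 2018-era TVM API. This is not the exact TOPI implementation added in this PR (which also handles stride, padding, and dtype); the kernel layout is assumed to be HWCF as mentioned in the commit log, and stride/padding are omitted for brevity.

import tvm

def conv2d_nhwc_sketch(data, kernel):
    # data: (N, H, W, CI), kernel: (KH, KW, CI, CO), i.e. the "hwcf" layout.
    # Stride 1 and no padding, purely to keep the sketch short.
    n, h, w, ci = data.shape
    kh, kw, _, co = kernel.shape
    rc = tvm.reduce_axis((0, ci), name='rc')
    ry = tvm.reduce_axis((0, kh), name='ry')
    rx = tvm.reduce_axis((0, kw), name='rx')
    return tvm.compute(
        (n, h - kh + 1, w - kw + 1, co),
        lambda nn, yy, xx, ff: tvm.sum(
            data[nn, yy + ry, xx + rx, rc] * kernel[ry, rx, rc, ff],
            axis=[ry, rx, rc]),
        name='conv2d_nhwc', tag='conv2d_nhwc')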

n, c, h, w = op.axis
fused = s[op].fuse(n, c)
s[op].parallel(fused)
s[op].vectorize(w)
A member commented

What's this for? Shouldn't it be under if 'conv2d_nchw' in op.tag:?

masahi (Member, Author) commented Jan 16, 2018

This is for parallelizing and vectorizing the bias + batchnorm + relu ops, which come right after conv2d.
To understand this, compile your network (resnet18, say) with NNVM and dump the lowered IR after operator fusion. See dmlc/nnvm#292.

Also see dmlc/nnvm#275 (https://github.com/dmlc/nnvm/issues/275).
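
A rough sketch of the traversal this snippet lives in (names and structure are illustrative, 2018-era TVM API; the real TOPI x86 code differs in details). Elementwise/broadcast ops following the conv, i.e. the fused bias + batchnorm + relu, are either inlined or, when they form the output stage, fused over (n, c), parallelized, and vectorized, while the conv2d stage is scheduled separately under its tag:

import tvm

def schedule_conv2d_nchw_sketch(outs):
    s = tvm.create_schedule([t.op for t in outs])

    def traverse(op):
        if 'elemwise' in op.tag or 'broadcast' in op.tag:
            if op not in s.outputs:
                s[op].compute_inline()      # fold into the consumer
            else:
                n, c, h, w = op.axis        # output stage: bias + bn + relu
                fused = s[op].fuse(n, c)
                s[op].parallel(fused)       # multi-thread over fused (n, c)
                s[op].vectorize(w)          # SIMD over the innermost width
            for t in op.input_tensors:
                if isinstance(t.op, tvm.tensor.ComputeOp):
                    traverse(t.op)
        elif 'conv2d_nchw' in op.tag:
            # the conv stage gets its own tiling / reorder / unroll / vectorize
            pass

    traverse(outs[0].op)
    return s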

s[C].reorder(fused, rc, h, wo, ry, rx, wi) # move rc to outer loop
s[C].unroll(rx)
s[C].unroll(ry)
s[C].vectorize(wi)
A member commented

I tested this schedule on an AWS C5 instance (skylake-avx512, 4 CPUs) and it does not perform well: about 13 ms for the conv alone.
My input is (1, 64, 56, 56) and the kernel size is (64, 64, 3, 3).

masahi (Member, Author) commented Jan 16, 2018

It looks like the final s[C].vectorize(wi) is not working for this input size (56), meaning no vector instructions are generated.

For my use cases, the input width and height are always powers of two; I wrote my schedules assuming such inputs.
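
One way to confirm whether vectorize() took effect is to dump the lowered IR and the generated assembly (a sketch; s, data, kernel, and conv stand for whatever schedule and tensors are being inspected, using the 2018-era API):

import tvm

# A vectorized inner loop shows up as a ramp/vectorized loop in the lowered IR
# instead of a plain serial loop over the inner axis.
print(tvm.lower(s, [data, kernel, conv], simple_mode=True))

# For the machine code itself, look for SIMD instructions (e.g. vfmadd/vmulps on
# AVX targets) in the generated assembly.
func = tvm.build(s, [data, kernel, conv], target='llvm -mcpu=skylake-avx512')
print(func.get_source('asm'))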



@generic.schedule_conv2d_nhwc.register(["cpu"])
def schedule_conv2d_nhwc(outs):
A member commented

On the same C5 instance, this one works pretty well: 1.8 ms for a (1, 56, 56, 64) input and a (3, 3, 64, 64) kernel.
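
For context, measurements like this are typically taken with TVM's time_evaluator. A sketch of such a benchmark for the workload above (2018-era API; the exact conv2d_nhwc signature and the way the NHWC schedule is looked up are assumptions, not necessarily what this PR registers):

import numpy as np
import tvm
import topi

data = tvm.placeholder((1, 56, 56, 64), name='data')
kernel = tvm.placeholder((3, 3, 64, 64), name='kernel')
target = 'llvm -mcpu=skylake-avx512'

with tvm.target.create(target):
    conv = topi.nn.conv2d_nhwc(data, kernel, 1, 1)   # stride 1, padding 1
    s = topi.generic.schedule_conv2d_nhwc([conv])

ctx = tvm.cpu(0)
func = tvm.build(s, [data, kernel, conv], target=target)
a = tvm.nd.array(np.random.uniform(size=(1, 56, 56, 64)).astype('float32'), ctx)
w = tvm.nd.array(np.random.uniform(size=(3, 3, 64, 64)).astype('float32'), ctx)
c = tvm.nd.array(np.zeros(topi.util.get_const_tuple(conv.shape), dtype='float32'), ctx)

timer = func.time_evaluator(func.entry_name, ctx, number=100)
print('conv2d_nhwc: %.3f ms' % (timer(a, w, c).mean * 1e3))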

tvm.schedule.AutoInlineInjective(s)
if len(s[x].op.axis) == 4:
    n, c, _, _ = s[x].op.axis
    fused = s[x].fuse(n, c) # for nhwc layout, fuse n and h
A member commented

I'm still confused by this. Why not put it in schedule_conv2d?
Also, the comment does not match.

masahi (Member, Author) commented Jan 16, 2018

This schedule is not about conv2d. It is used for multi-threading injective ops such as pooling, upsampling, and softmax. All elemwise/broadcast ops are also multithreaded with this schedule.

The comment is actually correct: what I mean is that s[x].op.axis depends on the input layout. With NCHW layout it will be (n, c, h, w), while with NHWC it will be (n, h, w, c). The comment says that no matter the layout, the schedule fuses the first two axes.
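
A sketch of the injective/elementwise schedule being described (illustrative, 2018-era API): intermediate injective stages are inlined, and the first two axes of the output are fused and parallelized regardless of layout, so NCHW fuses (n, c) and NHWC fuses (n, h):

import tvm

def schedule_injective_sketch(outs):
    s = tvm.create_schedule([t.op for t in outs])
    x = outs[0]
    tvm.schedule.AutoInlineInjective(s)   # inline everything except the output
    if len(s[x].op.axis) == 4:
        # fuse the two outermost axes, whatever the layout, and run them
        # across threads
        a0, a1, _, _ = s[x].op.axis
        fused = s[x].fuse(a0, a1)
        s[x].parallel(fused)
    return s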

masahi (Member, Author) commented Jan 16, 2018

@yzhliu this PR is not just about x86 conv schedules; it also contains some changes I made to make a whole network run faster in conjunction with NNVM.

I should also mention that my schedules are optimized for my use case (input sizes that are powers of two, all convs 3x3, etc.). In particular, I didn't optimize them for the ImageNet workload at all. Nonetheless, they should be better than the current x86 schedules (which are mostly empty).

The idea is that @yzhliu or others from the community will improve on them to make them faster on a wider range of workloads, including ImageNet.

tqchen (Member) commented Jan 16, 2018

I am going to merge this, given that the solution is better than the current x86 ones. We can follow up with updates to improve it.

tqchen merged commit 519ea5a into apache:master on Jan 16, 2018
tqchen pushed a commit to tqchen/tvm that referenced this pull request Jul 6, 2018
* add basic x86 schedules

* parallelize & vectorize batchnorm + relu

* fuse conv into bn + relu

* move rc loop to outer

* add nhwc conv

* change weight layout to hwcf

* conv + bn + relu fusion for nhwc conv

* fix conv_nhwc schedule when no fusion

* clean up default parallel schedules

* simplify elemwise parallel

* fix elemwise parallel for batch == 1

* update nhwc conv test

* fix and add comment

* fix lint

* remove redundant import

* remove default multithreading for some ops

* remove default multithreading for global pool
sergei-mironov pushed a commit to sergei-mironov/tvm that referenced this pull request Aug 8, 2018