
[BYOC][DNNL] Enable layer normalization in DNNL byoc. #11508

Merged 4 commits into apache:main on Jun 8, 2022

Conversation

@billishyahao (Contributor) commented on May 30, 2022

This patch enables layer normalization in DNNL BYOC by providing an out-of-the-box rewrite pattern that combines the constituent operators into a single Relay layer-normalization operator, together with its implementation in the DNNL JSON codegen.
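For readers unfamiliar with the mechanism, here is a simplified, hypothetical sketch of how such a fusion can be expressed with TVM's Relay dataflow pattern API. The class and function names are illustrative; the actual pattern in this patch may differ (for example, it also has to handle the epsilon constant and other decompositions of the variance).

```python
# A simplified sketch (not the exact pattern in this patch) of fusing the
# decomposed mean/subtract/variance/normalize chain back into nn.layer_norm.
from tvm import relay
from tvm.relay.dataflow_pattern import (
    DFPatternCallback, is_constant, is_op, rewrite, wildcard,
)


class LayerNormRewrite(DFPatternCallback):
    def __init__(self):
        super().__init__()
        self.data = wildcard()
        self.gamma = is_constant()
        self.beta = is_constant()
        mu = is_op("mean")(self.data)
        diff = is_op("subtract")(self.data, mu)
        var = is_op("mean")(is_op("multiply")(diff, diff))
        denom = is_op("sqrt")(is_op("add")(var, is_constant()))
        norm = is_op("divide")(diff, denom)
        self.pattern = is_op("add")(is_op("multiply")(norm, self.gamma), self.beta)

    def callback(self, pre, post, node_map):
        data = node_map[self.data][0]
        gamma = node_map[self.gamma][0]
        beta = node_map[self.beta][0]
        # Epsilon handling is omitted for brevity; axis=-1 matches the
        # transformer-style layout seen in the example below.
        return relay.nn.layer_norm(data, gamma, beta, axis=-1)


def fuse_layer_norm(mod):
    mod["main"] = rewrite(LayerNormRewrite(), mod["main"])
    return mod
```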

After applying the rewrite pattern, the following DNNL function appears in the partitioned module:

def @tvmgen_default_dnnl_main_108(%dnnl_108_i0: Tensor[(1, 784, 128), float32], Inline=1, Compiler="dnnl", global_symbol="tvmgen_default_dnnl_main_108", Primitive=1) -> Tensor[(1, 784, 128), float32] {
  nn.layer_norm(%dnnl_108_i0, meta[relay.Constant][56] /* ty=Tensor[(128), float32] */, meta[relay.Constant][57] /* ty=Tensor[(128), float32] */) /* ty=Tensor[(1, 784, 128), float32] */
}

Once the DNNL_VERBOSE flag is enabled (e.g. by setting the DNNL_VERBOSE=1 environment variable), more information is shown in the log, as below:

onednn_verbose,exec,cpu,layer_normalization,simple_layer_normalization:any,forward_inference,data_f32::blocked:abc:f0 stats_undef::undef::f0 diff_undef::undef::f0,,flags:CH,1x784x128,0.0551758

With this patch, I benchmarked the inference performance of a vision transformer called PCPVT (https://arxiv.org/abs/2104.13840) on an ICX-8352Y. It gains up to a 1.18x speedup. Here are the numbers:

| Configuration (32 cores) | Latency  |
| ------------------------ | -------- |
| baseline BYOC            | 11.45 ms |
| BYOC w/ patch            | 9.68 ms  |


@crazydemo (Contributor) commented:

Thanks for your contribution to BYOC-DNNL. My suggestions are listed below:

  1. I wonder whether running layernorm through the DNNL codegen is actually faster than running the consecutive ops through the native codegen. Could you please provide some performance numbers?
  2. Lint has failed. Please run task_lint.sh to check the code style.
  3. A unit test is required. You can add your test cases in tests/python/contrib/test_dnnl.py to ensure the functionality of the enabled ops (a sketch is given after this list).
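To illustrate point 3, here is a hypothetical, self-contained sketch of such a test. partition_for_dnnl is the DNNL BYOC partitioning helper; the shapes and tolerances are illustrative rather than taken from test_dnnl.py, and running the DNNL path requires a TVM build with the DNNL codegen enabled.

```python
# A hypothetical sketch of a layer_norm test in the spirit of
# tests/python/contrib/test_dnnl.py (shapes and tolerances are illustrative).
import numpy as np
import tvm
import tvm.testing
from tvm import relay
from tvm.contrib import graph_executor
from tvm.relay.op.contrib.dnnl import partition_for_dnnl


def test_dnnl_layer_norm(shape=(1, 784, 128), dtype="float32"):
    data = relay.var("data", shape=shape, dtype=dtype)
    gamma = relay.const(np.random.uniform(size=shape[-1]).astype(dtype))
    beta = relay.const(np.random.uniform(size=shape[-1]).astype(dtype))
    out = relay.nn.layer_norm(data, gamma, beta, axis=-1)
    mod = tvm.IRModule.from_expr(relay.Function([data], out))
    inp = np.random.uniform(size=shape).astype(dtype)

    def build_and_run(module):
        with tvm.transform.PassContext(opt_level=3):
            lib = relay.build(module, target="llvm")
        rt = graph_executor.GraphModule(lib["default"](tvm.cpu()))
        rt.set_input("data", inp)
        rt.run()
        return rt.get_output(0).numpy()

    ref = build_and_run(mod)                      # native TVM codegen
    got = build_and_run(partition_for_dnnl(mod))  # DNNL BYOC path
    tvm.testing.assert_allclose(got, ref, rtol=1e-5, atol=1e-5)
```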


dnnl::memory::dims data_shape = nodes_[data_entry.id_].GetOpShape()[data_entry.index_];

float epsilon = std::stof(node.GetAttr<std::vector<std::string>>("epsilon")[0]);
Contributor (inline review comment):
original "nn.layer_norm" has not only epsilon argument. At least axis, center and scale. By this code you assume that they always equal axis = -1, center=true and scale=true.

Could you please add support of all attributes or verify their values on codegen stage.

Contributor Author (reply):

I have added an ICHECK for this case, and will update later to support all the other attributes.
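For reference, a hypothetical Relay-level version of such a check could look like the sketch below; the actual patch performs this validation with an ICHECK in the C++ codegen, and the function name here is illustrative.

```python
# Hypothetical Relay-level guard: only offload nn.layer_norm calls whose
# attributes match what the DNNL codegen currently supports.
def supported_by_dnnl_layer_norm(attrs, args):
    ndim = len(args[0].checked_type.shape)
    normalizes_last_axis = int(attrs.axis) in (-1, ndim - 1)
    return normalizes_last_axis and bool(attrs.center) and bool(attrs.scale)
```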

@apeskov (Contributor) commented on Jun 1, 2022

@crazydemo Answering your question about performance.

> I wonder if we can get better performance via running layernorm on dnnl codegen than running consecutive ops on native codegen. Could you please provide some performance numbers?

Yes, there is a performance benefit. At the very least, they use different memory-access approaches. Consecutive ops with the LLVM codegen produce a sequence of fused kernels like the following:

  • mean: one pass through the src tensor
  • sub: one pass through the src and dst tensors
  • power + mean: one pass through src
  • add + sqrt + div + mul + add: one pass through src and dst

In total, that is 6 traversals of the data tensor for the TVM codegen. DNNL implements it as a single kernel and does only 4 passes through the memory buffers (or 3 in the case of in-place memory).
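As an illustration only (not actual TVM codegen output), the four fused kernels above can be sketched in numpy, with the memory traffic each one implies noted in the comments:

```python
# Illustrative numpy sketch of the four fused kernels listed above; the
# comments count tensor-sized traversals, giving 6 passes in total.
import numpy as np


def decomposed_layer_norm(x, gamma, beta, eps=1e-5, axis=-1):
    mu = x.mean(axis=axis, keepdims=True)                  # kernel 1: read x                 (1 pass)
    centered = x - mu                                      # kernel 2: read x, write centered (2 passes)
    var = (centered ** 2).mean(axis=axis, keepdims=True)   # kernel 3: read centered          (1 pass)
    return gamma * (centered / np.sqrt(var + eps)) + beta  # kernel 4: read centered, write   (2 passes)
```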

On multi-core systems (Xeon servers and others), the normalization op is memory-bound, so reducing memory accesses becomes even more important.

@billishyahao (Contributor Author) commented:

Hi @apeskov, please take a look at the latest version here. Feel free to comment further.

@billishyahao force-pushed the enable_dnnl_ln branch 2 times, most recently from 355766e to 4c34e00, on June 5, 2022 at 08:00.

@billishyahao (Contributor Author) commented:

Hi @masahi, please take a look.

@billishyahao (Contributor Author) commented:

Hi @comaniac, @trevor-m, @mbaret, could you take a look at this PR? Thanks!

@masahi (Member) commented on Jun 8, 2022

You need to resolve the conflict in test_dnnl.py, but since it will be modified in #11513, let's merge #11513 first.

@billishyahao (Contributor Author) commented:

> You need to resolve the conflict in test_dnnl.py, but since it will be modified in #11513, let's merge #11513 first.

Hi @masahi, thanks for pointing this out. I have resolved the conflict in test_dnnl.py.

@masahi merged commit 9817338 into apache:main on Jun 8, 2022.
Kathryn-cat pushed a commit to Kathryn-cat/tvm that referenced this pull request on Jun 10, 2022:
* Enable layer normalization in DNNL BYOC.

* Added a unit test for layer norm and made the code compatible after introducing TensorRequisite (PR-11345).

* Fix lint issue.

* Fix clang-format issue.