[BYOC][DNNL] Enable layer normalization in DNNL byoc. #11508
Conversation
Thanks for your contribution to BYOC-DNNL. My suggestions are listed below:
```cpp
dnnl::memory::dims data_shape = nodes_[data_entry.id_].GetOpShape()[data_entry.index_];
float epsilon = std::stof(node.GetAttr<std::vector<std::string>>("epsilon")[0]);
```
original "nn.layer_norm" has not only epsilon argument. At least axis
, center
and scale
. By this code you assume that they always equal axis = -1
, center=true
and scale=true
.
Could you please add support of all attributes or verify their values on codegen stage.
I added an ICHECK for this case, and will update later to support all other attributes.
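For illustration, a minimal Python-side sketch of such a guard, assuming the standard nn.layer_norm attributes (axis, center, scale); the function name and exact checks are hypothetical and not the code in this PR:

```python
# Hypothetical predicate sketch: only offload nn.layer_norm calls whose
# attributes match what the DNNL lowering currently assumes
# (normalization over the last axis, with both gamma and beta applied).
def layer_norm_supported(call):
    attrs = call.attrs
    ndim = len(call.args[0].checked_type.shape)  # requires type inference to have run
    axis = int(attrs.axis)
    # DNNL layer normalization here only covers the last dimension.
    if axis not in (-1, ndim - 1):
        return False
    # The codegen reads gamma/beta unconditionally, so center and scale must be set.
    return bool(attrs.center) and bool(attrs.scale)
```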
@crazydemo Answering your question about performance.
Yes, there is a performance benefit. At the very least, the two paths use different memory-access patterns. Consecutive ops with the llvm codegen produce a sequence of fused kernels, so in total the data tensor is traversed 6 times with TVM codegen. DNNL implements it as a single kernel and makes only 4 passes through the memory buffers (or 3 with in-place memory). On multi-core systems (Xeon servers and others) the normalization op is memory bound, so reducing memory accesses becomes more important.
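To make the pass counting concrete, here is a rough NumPy sketch of the decomposed computation; the helper name and exact step count are illustrative, and the real Relay decomposition differs in detail:

```python
import numpy as np

# Decomposed layer normalization, roughly as it looks before any rewrite.
# Each labelled step is on the order of one traversal over the data tensor,
# which is why the decomposed form touches memory more often than a single
# DNNL layer-normalization kernel does.
def layer_norm_decomposed(x, gamma, beta, eps=1e-5):
    mean = x.mean(axis=-1, keepdims=True)                # traverse x
    diff = x - mean                                      # traverse x, write diff
    var = np.mean(diff * diff, axis=-1, keepdims=True)   # traverse diff
    norm = diff / np.sqrt(var + eps)                     # traverse diff, write norm
    return norm * gamma + beta                           # traverse norm, write output

out = layer_norm_decomposed(
    np.random.rand(8, 197, 384).astype("float32"),
    np.ones(384, dtype="float32"),
    np.zeros(384, dtype="float32"),
)
```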
Hi @apeskov, please take a look at the latest version here. Feel free to comment more.
Hi @masahi, please take a look.
* Enable layer normalization in DNNL byoc.
* Added unittest for layer norm and make code compatible after introducing TensorRequisite (PR-11345)
* Fix lint issue
* Fix clang format issue
This patch enables layer normalization in DNNL BYOC by providing an out-of-the-box rewrite pattern that combines the constituent operators into a single relay layer normalization operator, together with its implementation in the DNNL JSON codegen.
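For reference, a simplified sketch of what such a rewrite can look like with the Relay dataflow-pattern machinery; this is a stand-in written from the nn.layer_norm definition, not the exact pattern added by this patch, and the matched epsilon constant is dropped for brevity:

```python
from tvm import relay
from tvm.relay.dataflow_pattern import DFPatternCallback, is_op, rewrite, wildcard

class SimpleLayerNormRewrite(DFPatternCallback):
    """Fold a decomposed mean/variance/scale/shift subgraph into nn.layer_norm."""

    def __init__(self):
        super().__init__()
        self.data = wildcard()
        self.gamma = wildcard()
        self.beta = wildcard()
        mu = is_op("mean")(self.data)
        diff = is_op("subtract")(self.data, mu)
        var = is_op("variance")(self.data, mu)
        eps_added = is_op("add")(var, wildcard())       # epsilon constant, ignored below
        rsig = is_op("sqrt")(eps_added)
        norm = is_op("divide")(diff, rsig)
        scaled = is_op("multiply")(norm, self.gamma)
        self.pattern = is_op("add")(scaled, self.beta)

    def callback(self, pre, post, node_map):
        # Emit a single fused layer-norm call; default axis/epsilon are used here.
        return relay.nn.layer_norm(
            node_map[self.data][0], node_map[self.gamma][0], node_map[self.beta][0]
        )

# Usage sketch: mod["main"] = rewrite(SimpleLayerNormRewrite(), mod["main"])
```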
After applying the rewrite pattern, we will observe the following dnnl function:
Once the DNNL_VERBOSE flag is enabled, more information is shown in the log file, as below:
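For reference, oneDNN's verbose mode is driven by an environment variable that has to be in place before the first primitive executes, for example:

```python
import os

# Ask oneDNN (DNNL) to print one log line per executed primitive.
# Set this before the library runs its first primitive; newer oneDNN
# releases also accept ONEDNN_VERBOSE.
os.environ["DNNL_VERBOSE"] = "1"
```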
With this patch, I benchmarked the inference performance of a vision transformer called PCPVT (https://arxiv.org/abs/2104.13840) on ICX-8352Y. It gains up to a 1.18x speedup. Here is some of the speedup data:
Thanks for contributing to TVM! Please refer to the guidelines at https://tvm.apache.org/docs/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from Reviewers by @-ing them in the pull request thread.