Object reuse for Convolution #1113
Comments
Hi @cfRod,

void app::execute_2_convs() {
    ...
    auto conv_prim = convolution_forward(conv_pd);
    ...
    conv_prim.execute(engine_stream, conv_args1);
    conv_prim.execute(engine_stream, conv_args2);
}

Such a case is not relevant to the primitive cache, because the same convolution primitive object is reused and there is no need for a cache. The cache is needed when the application/framework can't keep oneDNN primitive objects alive between executions and recreates them before each execution:

void app::execute_conv() { // conv will be destroyed once execute_conv is finished
    ...
    auto conv_prim = convolution_forward(conv_pd);
    ...
    conv_prim.execute(engine_stream, conv_args);
}

void app::execute_2_convs() {
    execute_conv();
    execute_conv(); // if the cache is enabled, the conv object will be recreated much faster thanks to the cached information
}
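For reference, here is a minimal self-contained sketch of what the elided setup in the snippets above might look like with the oneDNN C++ API (the shapes, memory formats, and data types are illustrative assumptions, and it follows the v2.x-era API where `convolution_forward::desc` still exists):

```cpp
#include <unordered_map>
#include "dnnl.hpp"
using namespace dnnl;

void run_conv_once() {
    engine eng(engine::kind::cpu, 0);
    stream engine_stream(eng);

    // Hypothetical problem sizes: N=1, IC=16, OC=32, 3x3 filter, no padding.
    memory::dims src_dims = {1, 16, 28, 28};
    memory::dims wei_dims = {32, 16, 3, 3};
    memory::dims dst_dims = {1, 32, 26, 26};
    memory::dims strides = {1, 1}, padding = {0, 0};

    auto src_md = memory::desc(src_dims, memory::data_type::f32, memory::format_tag::nchw);
    auto wei_md = memory::desc(wei_dims, memory::data_type::f32, memory::format_tag::oihw);
    auto dst_md = memory::desc(dst_dims, memory::data_type::f32, memory::format_tag::nchw);

    auto conv_d = convolution_forward::desc(prop_kind::forward_inference,
            algorithm::convolution_direct, src_md, wei_md, dst_md,
            strides, padding, padding);
    auto conv_pd = convolution_forward::primitive_desc(conv_d, eng);

    // Library-allocated buffers; a real application would bind user data here.
    memory src_mem(conv_pd.src_desc(), eng);
    memory wei_mem(conv_pd.weights_desc(), eng);
    memory dst_mem(conv_pd.dst_desc(), eng);

    auto conv_prim = convolution_forward(conv_pd);
    std::unordered_map<int, memory> conv_args = {{DNNL_ARG_SRC, src_mem},
            {DNNL_ARG_WEIGHTS, wei_mem}, {DNNL_ARG_DST, dst_mem}};

    conv_prim.execute(engine_stream, conv_args);
    engine_stream.wait();
}
```

Calling such a function twice is exactly the "recreate before each execution" pattern the primitive cache is meant to speed up.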
Regards,
Hi @igorsafo, the specific test case is https://github.com/apache/incubator-mxnet/blob/4e3a4c903f8befd81a0e90c29484c17ced35c519/tests/python/unittest/test_operator.py#L1707
Here, the first convolution is done by the gemm reference implementation, as it is not supported by ACL, and then the two subsequent convolutions with different input and weight slices are done by ACL. The results from ACL are incorrect for the second slice, and this comes down to a limitation in ACL, as explained in the issue. Is it possible to prevent object reuse in oneDNN (one create followed by two executes) and create a new primitive every time, or, as your comment suggests, is this controlled from the framework/application?
I remember there is a primitive object cache on the MxNet side which was implemented before the feature was officially supported in the library.
Interesting, thanks for the details. According to your code snippet, the convolution should be recreated at the second iteration. Another strange behavior you described in the mxnet issue is that the weights are immutable and don't change between execution calls.
In the source code I see that the gemm-based implementation sets the appropriate weights memory buffer on each execution call:
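Roughly, the pattern being described looks like the following. This is an illustrative sketch, not the actual gemm source; only the `CTX_IN_MEM`/`DNNL_ARG_*` idiom is taken from the oneDNN code base, and the class and type names are placeholders:

```cpp
// Illustrative sketch: the reference/gemm-based implementation reads the
// weights pointer from the execution context on every call, so a different
// weights buffer passed at execution time is picked up correctly.
status_t sketch_gemm_convolution_fwd_t::execute_forward(const exec_ctx_t &ctx) const {
    auto src = CTX_IN_MEM(const data_t *, DNNL_ARG_SRC);
    auto weights = CTX_IN_MEM(const data_t *, DNNL_ARG_WEIGHTS); // re-read each execution
    auto dst = CTX_OUT_MEM(data_t *, DNNL_ARG_DST);
    // ... gemm-based convolution computed with the buffers above ...
    return status::success;
}
```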
Anyway, regardless of the workaround, it seems like we need to find the issue in the ACL integration into oneDNN and fix it, to avoid workarounds in every framework. Tagging @nSircombe and @alenik01 as the developers of the ACL integration.
@cfRod @TaoLv on second thought, does it mean that MxNet has a bug in the hashing function of its cache? How is it possible that an object for a convolution with the number of groups equal to 1 is reused for a convolution with the number of groups equal to 2? These convolutions should have different memory descriptors for weights and different implementations.
@igorsafo, I don't think MxNet has a bug in its hashing function, as you can see from my logs in the mxnet issue (putting them here again for reference). I should have amended the code to show that I had fixed the number of groups to 2. Adding it here:
It creates two objects: one for when the number of groups = 2, where the gemm reference is called because ACL does not support it, and a second one for the number of groups = 1, where a convolution is created and passed two different slices of inputs and weights. @nSircombe and @alenik01 are on the same team as me :D
Yes, I think this is the case.
My bad, got it :)
Thanks, this helps. Is slicing an in-place operation in this case? Does the convolution work with dense slices, or does it work with strided memory which points to the original buffer of a variable (x, w, b)? In the first case the same convolution can be reused; in the second case the convolutions should be different. Regards,
From what I understand, ACL internally calls a reshape function (https://github.com/ARM-software/ComputeLibrary/blob/f7399fd616cc8ccedfae668da3023e589a7ef558/src/runtime/NEON/functions/NEGEMMConvolutionLayer.cpp#L618) and it does this step only the first time around. On the second call to the same convolution it does not, and this causes the issue. We had a similar problem with TensorFlow caching (tensorflow/tensorflow#47415).
Agreed. I initially thought it was on the oneDNN side, but thanks for your explanation, it made it clearer.
I see, so the convolution object is tied to a particular weights memory buffer. I don't think we can control it using the oneDNN API: in oneDNN a primitive is immutable during execution and all buffers are passed to it as arguments, but it seems like ACL keeps some context.
// pseudo code of src/cpu/aarch64/acl_gemm_convolution.cpp
status_t acl_gemm_convolution_fwd_t<src_type, wei_type, dst_type,
        bia_type>::execute_forward(const exec_ctx_t &ctx) const {
    auto new_weights = CTX_IN_MEM(const wei_data_t *, DNNL_ARG_WEIGHTS);
    auto prev_weights_storage
            = ctx.get_resource_mapper()->get<acl_resource_t>(this)->prev_weights;
    auto prev_weights = *prev_weights_storage;
    if (new_weights != prev_weights) {
        ACL->prepare_weights(new_weights);
        *prev_weights_storage = new_weights;
    }
    ACL->convolution();
}
Actually, it can be fixed on the ACL side as well, by checking the incoming weights pointer and doing the reordering inside ACL in case it is different. This would be easier, since there would be no need for a new ACL API or for changes on the oneDNN side.
@igorsafo, thanks for the information. We do have a plan to fix this in ACL, but until then we will work on a workaround as a temporary solution. I think fixing the mxnet hashing to avoid reuse for Arm will have to do in this case. I think this issue can be closed now.
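For illustration only, a hypothetical sketch of what such a framework-side hashing workaround could look like (this is not MxNet's actual code): the cache key additionally incorporates the weights buffer address, so a primitive is never reused when a different weights buffer is passed, which sidesteps the ACL prepare-once behavior described above.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical framework-side cache key for convolution primitives.
// Including the weights buffer address forces a cache miss (and thus a
// freshly created primitive) whenever the weights buffer changes.
struct ConvCacheKey {
    int num_groups;
    const void *weights_ptr; // only needed on Arm/ACL builds as a workaround

    bool operator==(const ConvCacheKey &other) const {
        return num_groups == other.num_groups && weights_ptr == other.weights_ptr;
    }
};

struct ConvCacheKeyHash {
    std::size_t operator()(const ConvCacheKey &key) const {
        return std::hash<int>()(key.num_groups)
                ^ (std::hash<const void *>()(key.weights_ptr) << 1);
    }
};
```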
Hi,
I wanted to know if it is possible to disable object reuse for the convolution operator, as per the primitive cache documentation (https://docs.oneapi.com/versions/latest/onednn/dev_guide_primitive_cache.html).
I built the oneDNN reference implementation on AArch64 to demonstrate this. Here, a convolution with two different input slices but the same configuration (filter sizes, input sizes, etc.) will reuse the same primitive regardless of the user setting DNNL_ENABLE_PRIMITIVE_CACHE to OFF at build time or setting the runtime control DNNL_PRIMITIVE_CACHE_CAPACITY to 0.
I set
This should set the DNNL_PRIMITIVE_CACHE_CAPACITY internally to 0. However, why is the code path that checks for cache hits and misses never called again for the convolution operator?
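As a side note, if the build provides it, the run-time control mentioned above can also be exercised programmatically; a minimal sketch is below (as discussed in the comments above, this only affects the library-side cache and does not prevent an application from reusing a primitive object it already holds):

```cpp
#include "dnnl.hpp"

int main() {
    // Disable the oneDNN primitive cache at run time; this is the programmatic
    // counterpart of setting the DNNL_PRIMITIVE_CACHE_CAPACITY environment
    // variable to 0 before the first primitive is created.
    dnnl::set_primitive_cache_capacity(0);

    // Primitives created from here on are built from scratch instead of being
    // fetched from the cache. A primitive object that the application keeps
    // and re-executes is unaffected by this setting.
    return 0;
}
```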