[GPU/OpenCL] Initial version of Transpose (all axes) with OpenCL ops. #2750
Conversation
📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2750. Please follow the 1 commit/1 PR (one commit per PR) policy to get comments quickly from reviewers. Your PR must pass all verification processes of cibot before the review by reviewers starts. If you are a new member joining this project, please read the manuals in the documentation folder and wiki page. To monitor the progress of your PR in more detail, visit http://ci.nnstreamer.ai/.
cibot: @niket-agarwal, The last line of a text file must have a newline character. Please append a newline at the end of test/unittest/layers/unittest_layers_transpose_cl.cpp.
cibot: @niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.
 * @param[in] input Tensor
 * @param[in] result Tensor
 */
void Transpose_i_cl(const std::string &direction, Tensor const &in,
I think it might be better to follow the naming convention. What about changing the first letter to lowercase? In addition, why do you name this with _i? It does not seem to be an in-place operation.
Suggested change:
- void Transpose_i_cl(const std::string &direction, Tensor const &in,
+ void transposeCl(const std::string &direction, Tensor const &in,
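For context on the _i question above: elsewhere in nntrainer that suffix marks in-place tensor operations, which is why the original name is misleading. A minimal illustration, assuming the public Tensor::add_i / Tensor::add pair (a sketch, not code from this PR):

#include <tensor.h>

nntrainer::Tensor t(1, 1, 2, 2);   // (batch, channel, height, width)
t.setValue(1.0f);
t.add_i(1.0f);                     // in-place: t itself becomes all 2.0f
nntrainer::Tensor u = t.add(1.0f); // out-of-place: returns a new tensor
// Transpose_i_cl writes into a separate result tensor, so the _i suffix
// does not match this convention; transposeCl avoids the ambiguity.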
Updated it, thanks! Please review.
    LayerGoldenTestParamOptions::USE_INC_FORWARD,
    "nchw", "fp16", "fp16");

// auto transpose_basic_plain_w16a16_axis1 =
Why did you leave these commented out? Is there anything more to do to enable these unit tests?
Only a specific transpose direction (1:0:2) is currently invoked, in reference to the CPU implementation. I have written a more generalized version, but the other axes aren't needed in the current flow, hence the commented-out test cases. They might be helpful later.
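For readers, a sketch of what the 1:0:2 direction means, assuming it permutes the (channel, height, width) axes with batch fixed, as in the CPU Tensor::transpose; the helper below is illustrative, not from the PR:

#include <array>
#include <cstdio>

// Hypothetical helper: index i of "1:0:2" names which source axis lands at
// position i of (channel, height, width); batch is untouched.
std::array<unsigned, 3> permute_1_0_2(const std::array<unsigned, 3> &chw) {
  return {chw[1], chw[0], chw[2]}; // swap channel and height
}

int main() {
  std::array<unsigned, 3> chw = {4, 8, 16};          // input  (C, H, W)
  auto out = permute_1_0_2(chw);                     // output (8, 4, 16)
  std::printf("%u:%u:%u\n", out[0], out[1], out[2]); // prints 8:4:16
  return 0;
}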
void TransposeLayerCl::forwarding(RunLayerContext &context, bool training) {
  Tensor &in = context.getInput(SINGLE_INOUT_IDX);
  Tensor &out = context.getOutput(SINGLE_INOUT_IDX);
  Transpose_i_cl("1:0:2", in, out);
The CPU reference for the transpose layer implementation should be "nntrainer/layers/permute_layer.cpp", not "Applications/LLaMA/jni/transpose_layer.cpp" (which is just an example implementation limited to the LLaMA application). I already left a comment about this when your PR #2690 was uploaded. Why does this implementation fix the axis to "1:0:2"?
Hi, this was discussed with Mr. Moon earlier, and it was decided that it is fine to merge for now, since our current application is MHA and this works fine for it.
Thanks for checking
} else {
  dim[i].channel(dim[i].channel());
  dim[i].height(dim[i].height());
  dim[i].width(dim[i].width());
}
This code has no effect; let's remove it!
void TransposeLayerCl::forwarding(RunLayerContext &context, bool training) {
  Tensor &in = context.getInput(SINGLE_INOUT_IDX);
  Tensor &out = context.getOutput(SINGLE_INOUT_IDX);
  transposeCl("1:0:2", in, out);
Let's leave a comment noting that "1:0:2" is arbitrary.
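One way the requested comment could read (the wording below is a sketch, not taken from the PR):

void TransposeLayerCl::forwarding(RunLayerContext &context, bool training) {
  Tensor &in = context.getInput(SINGLE_INOUT_IDX);
  Tensor &out = context.getOutput(SINGLE_INOUT_IDX);
  // NOTE: "1:0:2" (swap channel and height) is an arbitrary fixed choice; it
  // is the only direction the current MHA flow needs. Generalize the axis
  // selection when other directions are required.
  transposeCl("1:0:2", in, out);
}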
#define CREATE_IF_EMPTY_DIMS(tensor, ...) \
  do {                                    \
    if (tensor.empty())                   \
      tensor = Tensor(__VA_ARGS__);       \
  } while (0);
Do not redefine the same macro; please use the one from tensor.h.
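A sketch of the requested fix, assuming CREATE_IF_EMPTY_DIMS is visible through tensor.h as the reviewer states; the output-shape computation is illustrative and shown only for the 1:0:2 case:

#include <tensor.h> // tensor.h already provides CREATE_IF_EMPTY_DIMS

void transposeCl(const std::string &direction, Tensor const &in,
                 Tensor &result) {
  TensorDim in_dim = in.getDim();
  // For "1:0:2", channel and height trade places in the output shape.
  TensorDim out_dim(in_dim.batch(), in_dim.height(), in_dim.channel(),
                    in_dim.width());
  CREATE_IF_EMPTY_DIMS(result, out_dim); // reuse the shared macro
  // ... register and launch the transpose kernel as in the rest of this PR ...
}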
/**
 * @copydoc bool supportBackwarding() const
 */
bool supportBackwarding() const override { return true; };
This should return false.
Suggested change:
- bool supportBackwarding() const override { return true; };
+ bool supportBackwarding() const override { return false; };
void transpose_cl_axis0(const float *in, float *res,
                        unsigned int input_batch_size,
                        unsigned int input_channels, unsigned int input_height,
                        unsigned int input_width) {

  bool result = false;

  do {
    ClContext::SharedPtrClKernel kernel_transpose_ptr =
      cl_context_ref.registerClKernel(transpose_cl_kernel_axis0,
                                      "transpose_cl_axis0");
There's no need to have three different functions by axis; they can be combined into one.
Suggested change:
- void transpose_cl_axis0(const float *in, float *res,
-                         unsigned int input_batch_size,
-                         unsigned int input_channels, unsigned int input_height,
-                         unsigned int input_width) {
-   bool result = false;
-   do {
-     ClContext::SharedPtrClKernel kernel_transpose_ptr =
-       cl_context_ref.registerClKernel(transpose_cl_kernel_axis0,
-                                       "transpose_cl_axis0");
+ void transpose_cl_axis(const float *in, float *res,
+                        unsigned int input_batch_size,
+                        unsigned int input_channels, unsigned int input_height,
+                        unsigned int input_width, unsigned int axis) {
+   bool result = false;
+   do {
+     ClContext::SharedPtrClKernel kernel_transpose_ptr;
+     switch (axis) {
+     case 0:
+       kernel_transpose_ptr = cl_context_ref.registerClKernel(
+         transpose_cl_kernel_axis0, "transpose_cl_axis0");
+       break;
+     case 1:
+       kernel_transpose_ptr = cl_context_ref.registerClKernel(
+         transpose_cl_kernel_axis1, "transpose_cl_axis1");
+       break;
+     case 2:
+       kernel_transpose_ptr = cl_context_ref.registerClKernel(
+         transpose_cl_kernel_axis2, "transpose_cl_axis2");
+       break;
+     default:
+       throw std::invalid_argument("failed to register CL kernel");
+     }
There would be a difference in the work_group_count that we pass for the three axes; hence I didn't merge them into one.
Would you suggest adding another switch condition for them as well, or should we keep them separate?
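If they were merged, the work-group geometry could live in the same switch; a sketch building on the suggestion above (the per-axis sizes are assumptions, not taken from the PR):

// Inside transpose_cl_axis(), after the kernel is registered:
int work_groups_count[3] = {1, 1, 1};
switch (axis) {
case 0: // swap channel and height
  work_groups_count[0] = (int)input_height;
  work_groups_count[1] = (int)input_channels;
  work_groups_count[2] = (int)input_width;
  break;
case 1: // hypothetical geometry for the second direction
  work_groups_count[0] = (int)input_channels;
  work_groups_count[1] = (int)input_width;
  work_groups_count[2] = (int)input_height;
  break;
case 2: // hypothetical geometry for the third direction
  work_groups_count[0] = (int)input_width;
  work_groups_count[1] = (int)input_height;
  work_groups_count[2] = (int)input_channels;
  break;
default:
  throw std::invalid_argument("unsupported transpose axis");
}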
Added naive version of OpenCL implementation for Transpose. Incorporated kernel for ops using blas_kernels. Added unit test for Transpose_cl. Signed-off-by: Niket Agarwal <[email protected]>
cibot: @niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.
Great job! I think it is good to go 👍
(New PR. Closed the last one due to merge conflicts.)
Added the initial version of the Transpose function for GPU. This is a basic implementation using a naive kernel.
Changes added with this PR:
- Incorporated kernels for GPU to transpose about different axes.
- Added unittest_layers_transpose_cl.cpp to test the Transpose function on GPU.
- Updated with the recent GPU pipeline changes.
Signed-off-by: Niket Agarwal [email protected]
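A minimal call-sequence sketch of the new GPU path, using the signatures quoted in this review (shapes and values are illustrative):

#include <tensor.h>

nntrainer::Tensor in(1, 4, 8, 16); // (batch, channel, height, width)
in.setValue(1.0f);
nntrainer::Tensor out;         // allocated inside transposeCl if empty
transposeCl("1:0:2", in, out); // GPU transpose; out is (1, 8, 4, 16)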