
[GPU/OpenCL] Initial version of Transpose (all axes) with OpenCL ops. #2750

Merged · 1 commit merged on Nov 14, 2024

Conversation

niket-agarwal (Contributor)

(New PR. The previous one was closed due to merge conflicts.)

Added an initial version of the Transpose function for GPU. This is a basic implementation using a naive kernel.

Changes added with this PR:

Incorporated kernels for GPU to transpose about different axes.
Added unittest_layers_transpose_cl.cpp to test Transpose function on GPU.
Updated with the recent GPU pipeline changes.

Signed-off-by: Niket Agarwal [email protected]
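For context, a naive transpose is essentially an index remapping with one work item per output element. As a rough illustration only (hypothetical names, not the PR's actual code), a host-side C++ reference for the "1:0:2" direction used later in this PR (swapping the channel and height axes of a BCHW tensor) could look like:

```cpp
#include <cassert>
#include <vector>

// Hypothetical CPU reference for a naive "1:0:2" transpose: swap the
// channel and height axes of a tensor stored in BCHW order, so the
// output shape is (batch, height, channel, width). A naive OpenCL
// kernel would compute the same mapping, one work item per element.
std::vector<float> transpose_102(const std::vector<float> &in, unsigned b,
                                 unsigned c, unsigned h, unsigned w) {
  std::vector<float> out(in.size());
  for (unsigned bi = 0; bi < b; ++bi)
    for (unsigned ci = 0; ci < c; ++ci)
      for (unsigned hi = 0; hi < h; ++hi)
        for (unsigned wi = 0; wi < w; ++wi)
          // out is laid out as (b, h, c, w); in is laid out as (b, c, h, w).
          out[((bi * h + hi) * c + ci) * w + wi] =
            in[((bi * c + ci) * h + hi) * w + wi];
  return out;
}
```

A unit test like the one added in unittest_layers_transpose_cl.cpp can compare the GPU result against such a reference.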

@taos-ci (Collaborator) commented Oct 9, 2024

📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2750. Please follow the 1 commit/1 PR (one commit per PR) policy to get comments quickly from reviewers. Your PR must pass all verification processes of cibot before reviewers start the review process. If you are a new member joining this project, please read the manuals in the documentation folder and the wiki page. To monitor the progress of your PR in more detail, visit http://ci.nnstreamer.ai/.

@taos-ci (Collaborator) commented Oct 9, 2024

:octocat: cibot: @niket-agarwal, the last line of a text file must end with a newline character. Please append a newline at the end of test/unittest/layers/unittest_layers_transpose_cl.cpp.


@niket-agarwal niket-agarwal force-pushed the transpose_new branch 2 times, most recently from d5932ce to 4df7b28 Compare October 9, 2024 07:03
@taos-ci (Collaborator) left a comment

@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.

```cpp
 * @param[in] input Tensor
 * @param[in] result Tensor
 */
void Transpose_i_cl(const std::string &direction, Tensor const &in,
```

@EunjuYang (Contributor) commented Oct 10, 2024

I think it might be better to follow the naming convention. What about changing the first letter to lowercase? In addition, why do you name this with `_i`? It does not seem to be an in-place operation.

Suggested change:

```diff
-void Transpose_i_cl(const std::string &direction, Tensor const &in,
+void transposeCl(const std::string &direction, Tensor const &in,
```

niket-agarwal (Contributor, Author)

Updated it, thanks! Please review.

```cpp
                          LayerGoldenTestParamOptions::USE_INC_FORWARD,
                          "nchw", "fp16", "fp16");

// auto transpose_basic_plain_w16a16_axis1 =
```

Contributor

Why did you leave these commented out? Is there anything more to do to enable these unittests?

niket-agarwal (Contributor, Author)

Only a specific transpose direction (1:0:2) is currently called, in reference to the CPU implementation. I have written a more generalized version, but the other axes aren't needed in the current flow, so I commented out the other test cases. They might be helpful later.
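For readers unfamiliar with the direction strings: "1:0:2" names the target order of the (channel, height, width) axes, i.e. channel and height are swapped. A hypothetical parsing helper (illustrative only, not the PR's code) would look like:

```cpp
#include <array>
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: parse a direction string such as "1:0:2" into
// the three axis indices describing the target (channel, height, width)
// order. "1:0:2" means the output's first axis is the input's axis 1,
// the second is axis 0, and the third is axis 2.
std::array<int, 3> parseDirection(const std::string &direction) {
  std::array<int, 3> axes{};
  std::stringstream ss(direction);
  std::string tok;
  for (int i = 0; i < 3 && std::getline(ss, tok, ':'); ++i)
    axes[i] = std::stoi(tok);
  return axes;
}
```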

```cpp
void TransposeLayerCl::forwarding(RunLayerContext &context, bool training) {
  Tensor &in = context.getInput(SINGLE_INOUT_IDX);
  Tensor &out = context.getOutput(SINGLE_INOUT_IDX);
  Transpose_i_cl("1:0:2", in, out);
```

@baek2sm (Contributor) commented Oct 14, 2024

The CPU version of the transpose layer implementation should refer to nntrainer/layers/permute_layer.cpp, not Applications/LLaMA/jni/transpose_layer.cpp (which is just an example implementation limited to the LLaMA application). I already left a comment about this when your PR #2690 was previously uploaded. Why does this implementation fix the axis as "1:0:2"?

@niket-agarwal (Contributor, Author) commented Oct 14, 2024

Hi, this was discussed with Mr. Moon earlier, and it was decided that it's fine to merge for now, since our current application is MHA and this works fine with it.

Contributor

Thanks for checking.

@taos-ci (Collaborator) left a comment

@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.

@niket-agarwal niket-agarwal force-pushed the transpose_new branch 2 times, most recently from 5d2614e to 56be618 Compare October 22, 2024 10:48
@taos-ci (Collaborator) left a comment

@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.

```cpp
  } else {
    dim[i].channel(dim[i].channel());
    dim[i].height(dim[i].height());
    dim[i].width(dim[i].width());
  }
```

Contributor

This code has no effect; let's remove these lines!

Suggested change:

```diff
-  } else {
-    dim[i].channel(dim[i].channel());
-    dim[i].height(dim[i].height());
-    dim[i].width(dim[i].width());
-  }
```

```cpp
void TransposeLayerCl::forwarding(RunLayerContext &context, bool training) {
  Tensor &in = context.getInput(SINGLE_INOUT_IDX);
  Tensor &out = context.getOutput(SINGLE_INOUT_IDX);
  transposeCl("1:0:2", in, out);
```

Contributor

Let's leave a comment noting that "1:0:2" is arbitrary.

Comment on lines 22 to 26

```cpp
#define CREATE_IF_EMPTY_DIMS(tensor, ...) \
  do {                                    \
    if (tensor.empty())                   \
      tensor = Tensor(__VA_ARGS__);       \
  } while (0);
```

Contributor

Do not redefine the same macro; please use the one from tensor.h.

Suggested change:

```diff
-#define CREATE_IF_EMPTY_DIMS(tensor, ...) \
-  do {                                    \
-    if (tensor.empty())                   \
-      tensor = Tensor(__VA_ARGS__);       \
-  } while (0);
```

```cpp
/**
 * @copydoc bool supportBackwarding() const
 */
bool supportBackwarding() const override { return true; };
```

Contributor

This should return false.

Suggested change:

```diff
-bool supportBackwarding() const override { return true; };
+bool supportBackwarding() const override { return false; };
```

Comment on lines 391 to 401

```cpp
void transpose_cl_axis0(const float *in, float *res,
                        unsigned int input_batch_size,
                        unsigned int input_channels, unsigned int input_height,
                        unsigned int input_width) {

  bool result = false;

  do {
    ClContext::SharedPtrClKernel kernel_transpose_ptr =
      cl_context_ref.registerClKernel(transpose_cl_kernel_axis0,
                                      "transpose_cl_axis0");
```

Contributor

There's no need to have three different functions by axis; they can be combined into one.

Suggested change:

```diff
-void transpose_cl_axis0(const float *in, float *res,
-                        unsigned int input_batch_size,
-                        unsigned int input_channels, unsigned int input_height,
-                        unsigned int input_width) {
-  bool result = false;
-  do {
-    ClContext::SharedPtrClKernel kernel_transpose_ptr =
-      cl_context_ref.registerClKernel(transpose_cl_kernel_axis0,
-                                      "transpose_cl_axis0");
+void transpose_cl_axis(const float *in, float *res,
+                       unsigned int input_batch_size,
+                       unsigned int input_channels, unsigned int input_height,
+                       unsigned int input_width, unsigned int axis) {
+  bool result = false;
+  do {
+    ClContext::SharedPtrClKernel kernel_transpose_ptr;
+    switch (axis) {
+    case 0:
+      kernel_transpose_ptr = cl_context_ref.registerClKernel(
+        transpose_cl_kernel_axis0, "transpose_cl_axis0");
+      break;
+    case 1:
+      kernel_transpose_ptr = cl_context_ref.registerClKernel(
+        transpose_cl_kernel_axis1, "transpose_cl_axis1");
+      break;
+    case 2:
+      kernel_transpose_ptr = cl_context_ref.registerClKernel(
+        transpose_cl_kernel_axis2, "transpose_cl_axis2");
+      break;
+    default:
+      throw std::invalid_argument("failed to register CL kernel");
+    }
```

niket-agarwal (Contributor, Author)

There would be a difference in the work_group_count that we pass for the three axes, hence I didn't merge them into one.

niket-agarwal (Contributor, Author)

Would you suggest adding another switch condition for them as well, or should we keep them separate?
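The per-axis work sizes could indeed be folded into the same switch. The sketch below is illustrative only: the struct, function name, and the particular work-group counts are assumptions, not the PR's actual dispatch code.

```cpp
#include <array>
#include <cassert>
#include <stdexcept>
#include <string>

// Illustrative dispatch-selection sketch (hypothetical, not the PR's code):
// one switch picks both the kernel name and a per-axis global work size,
// addressing the concern that work_group_count differs between the axes.
struct DispatchInfo {
  std::string kernel_name;
  std::array<int, 3> work_group_count; // assumed global work size layout
};

DispatchInfo selectTransposeDispatch(unsigned axis, int batch, int channels,
                                     int height, int width) {
  switch (axis) {
  case 0: // e.g. channel/height swap
    return {"transpose_cl_axis0", {height, channels, batch * width}};
  case 1: // e.g. height/width swap
    return {"transpose_cl_axis1", {width, height, batch * channels}};
  case 2: // e.g. channel/width swap
    return {"transpose_cl_axis2", {width, channels, batch * height}};
  default:
    throw std::invalid_argument("unsupported transpose axis");
  }
}
```

The actual work sizes would have to match whatever each kernel's index arithmetic expects; the point is only that a second switch keeps the merge to a single function.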

@niket-agarwal niket-agarwal force-pushed the transpose_new branch 2 times, most recently from cb4485d to 009c3cc Compare November 13, 2024 09:54
Added naive version of OpenCL implementation for Transpose.
Incorporated kernel for ops using blas_kernels.
Added unit test for Transpose_cl.

Signed-off-by: Niket Agarwal <[email protected]>
@taos-ci (Collaborator) left a comment

@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.

@djeong20 (Contributor) left a comment

Great job! I think it is good to go 👍

@jijoongmoon jijoongmoon merged commit 07250f1 into nnstreamer:main Nov 14, 2024
38 checks passed