
[GPU/OpenCL] Initial version of Transpose (all axes) with OpenCL ops. #2750

Merged · 1 commit merged on Nov 14, 2024

Conversation

niket-agarwal (Contributor)

(New PR. The previous one was closed due to merge conflicts.)

Added an initial version of the Transpose function for GPU. This is a basic implementation using a naive kernel.

Changes added with this PR:

Incorporated kernels for GPU to transpose about different axes.
Added unittest_layers_transpose_cl.cpp to test Transpose function on GPU.
Updated with the recent GPU pipeline changes.

Signed-off-by: Niket Agarwal [email protected]
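For context, a naive transpose is essentially an index remapping with one work item per output element. As a rough illustration only (hypothetical names, not the PR's actual code), a host-side C++ reference for the "1:0:2" direction used later in this PR (swapping the channel and height axes of a BCHW tensor) could look like:

```cpp
#include <cassert>
#include <vector>

// Hypothetical CPU reference for a naive "1:0:2" transpose: swap the
// channel and height axes of a tensor stored in BCHW order, so the
// output shape is (batch, height, channel, width). A naive OpenCL
// kernel would compute the same mapping, one work item per element.
std::vector<float> transpose_102(const std::vector<float> &in, unsigned b,
                                 unsigned c, unsigned h, unsigned w) {
  std::vector<float> out(in.size());
  for (unsigned bi = 0; bi < b; ++bi)
    for (unsigned ci = 0; ci < c; ++ci)
      for (unsigned hi = 0; hi < h; ++hi)
        for (unsigned wi = 0; wi < w; ++wi)
          // out is laid out as (b, h, c, w); in is laid out as (b, c, h, w).
          out[((bi * h + hi) * c + ci) * w + wi] =
            in[((bi * c + ci) * h + hi) * w + wi];
  return out;
}
```

A unit test like the one added in unittest_layers_transpose_cl.cpp can compare the GPU result against such a reference.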

@taos-ci (Collaborator) commented Oct 9, 2024

📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2750. Please follow the 1 commit/1 PR (one commit per PR) policy to get comments quickly from reviewers. Your PR must pass all verification processes of cibot before reviewers start the review process. If you are a new member joining this project, please read the manuals in the documentation folder and the wiki page. To monitor the progress of your PR in more detail, visit http://ci.nnstreamer.ai/.

@taos-ci (Collaborator) commented Oct 9, 2024

:octocat: cibot: @niket-agarwal, the last line of a text file must end with a newline character. Please append a newline at the end of test/unittest/layers/unittest_layers_transpose_cl.cpp.


@niket-agarwal niket-agarwal force-pushed the transpose_new branch 2 times, most recently from d5932ce to 4df7b28 Compare October 9, 2024 07:03
@taos-ci (Collaborator) left a comment

@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.

```cpp
 * @param[in] input Tensor
 * @param[in] result Tensor
 */
void Transpose_i_cl(const std::string &direction, Tensor const &in,
```

@EunjuYang (Contributor) commented Oct 10, 2024

I think it might be better to follow the naming convention. What about changing the first letter to lowercase? In addition, why do you name this with `_i`? It does not seem to be an in-place operation.

Suggested change:

```diff
-void Transpose_i_cl(const std::string &direction, Tensor const &in,
+void transposeCl(const std::string &direction, Tensor const &in,
```

niket-agarwal (Contributor, Author)

Updated it, thanks! Please review.

```cpp
                          LayerGoldenTestParamOptions::USE_INC_FORWARD,
                          "nchw", "fp16", "fp16");

// auto transpose_basic_plain_w16a16_axis1 =
```

Contributor

Why did you leave these commented out? Is there anything more to do to enable these unittests?

niket-agarwal (Contributor, Author)

Only a specific transpose direction (1:0:2) is currently called, in reference to the CPU implementation. I have written a more generalized version, but the other axes aren't needed in the current flow, so I commented out the other test cases. They might be helpful later.
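For readers unfamiliar with the direction strings: "1:0:2" names the target order of the (channel, height, width) axes, i.e. channel and height are swapped. A hypothetical parsing helper (illustrative only, not the PR's code) would look like:

```cpp
#include <array>
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: parse a direction string such as "1:0:2" into
// the three axis indices describing the target (channel, height, width)
// order. "1:0:2" means the output's first axis is the input's axis 1,
// the second is axis 0, and the third is axis 2.
std::array<int, 3> parseDirection(const std::string &direction) {
  std::array<int, 3> axes{};
  std::stringstream ss(direction);
  std::string tok;
  for (int i = 0; i < 3 && std::getline(ss, tok, ':'); ++i)
    axes[i] = std::stoi(tok);
  return axes;
}
```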

```cpp
void TransposeLayerCl::forwarding(RunLayerContext &context, bool training) {
  Tensor &in = context.getInput(SINGLE_INOUT_IDX);
  Tensor &out = context.getOutput(SINGLE_INOUT_IDX);
  Transpose_i_cl("1:0:2", in, out);
```

@baek2sm (Contributor) commented Oct 14, 2024

The CPU version of the transpose layer implementation should refer to nntrainer/layers/permute_layer.cpp, not Applications/LLaMA/jni/transpose_layer.cpp (which is just an example implementation limited to the LLaMA application). I already left a comment about this when your PR #2690 was previously uploaded. Why does this implementation fix the axis as "1:0:2"?

@niket-agarwal (Contributor, Author) commented Oct 14, 2024

Hi, this was discussed with Mr. Moon earlier, and it was decided that it's fine to merge for now, since our current application is MHA and this works fine with it.

Contributor

Thanks for checking.

@taos-ci (Collaborator) left a comment

@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.

@niket-agarwal niket-agarwal force-pushed the transpose_new branch 2 times, most recently from 5d2614e to 56be618 Compare October 22, 2024 10:48
@taos-ci (Collaborator) left a comment

@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.

```cpp
  } else {
    dim[i].channel(dim[i].channel());
    dim[i].height(dim[i].height());
    dim[i].width(dim[i].width());
  }
```

Contributor

This code has no effect; let's remove these lines!

Suggested change:

```diff
-  } else {
-    dim[i].channel(dim[i].channel());
-    dim[i].height(dim[i].height());
-    dim[i].width(dim[i].width());
-  }
```

```cpp
void TransposeLayerCl::forwarding(RunLayerContext &context, bool training) {
  Tensor &in = context.getInput(SINGLE_INOUT_IDX);
  Tensor &out = context.getOutput(SINGLE_INOUT_IDX);
  transposeCl("1:0:2", in, out);
```

Contributor

Let's leave a comment noting that "1:0:2" is arbitrary.

Comment on lines 22 to 26

```cpp
#define CREATE_IF_EMPTY_DIMS(tensor, ...) \
  do {                                    \
    if (tensor.empty())                   \
      tensor = Tensor(__VA_ARGS__);       \
  } while (0);
```

Contributor

Do not redefine the same macro; please use the one from tensor.h.

Suggested change:

```diff
-#define CREATE_IF_EMPTY_DIMS(tensor, ...) \
-  do {                                    \
-    if (tensor.empty())                   \
-      tensor = Tensor(__VA_ARGS__);       \
-  } while (0);
```

```cpp
/**
 * @copydoc bool supportBackwarding() const
 */
bool supportBackwarding() const override { return true; };
```

Contributor

This should return false.

Suggested change:

```diff
-bool supportBackwarding() const override { return true; };
+bool supportBackwarding() const override { return false; };
```

Comment on lines 391 to 401

```cpp
void transpose_cl_axis0(const float *in, float *res,
                        unsigned int input_batch_size,
                        unsigned int input_channels, unsigned int input_height,
                        unsigned int input_width) {

  bool result = false;

  do {
    ClContext::SharedPtrClKernel kernel_transpose_ptr =
      cl_context_ref.registerClKernel(transpose_cl_kernel_axis0,
                                      "transpose_cl_axis0");
```

Contributor

There's no need to have three different functions by axis; they can be combined into one.

Suggested change:

```diff
-void transpose_cl_axis0(const float *in, float *res,
-                        unsigned int input_batch_size,
-                        unsigned int input_channels, unsigned int input_height,
-                        unsigned int input_width) {
-  bool result = false;
-  do {
-    ClContext::SharedPtrClKernel kernel_transpose_ptr =
-      cl_context_ref.registerClKernel(transpose_cl_kernel_axis0,
-                                      "transpose_cl_axis0");
+void transpose_cl_axis(const float *in, float *res,
+                       unsigned int input_batch_size,
+                       unsigned int input_channels, unsigned int input_height,
+                       unsigned int input_width, unsigned int axis) {
+  bool result = false;
+  do {
+    ClContext::SharedPtrClKernel kernel_transpose_ptr;
+    switch (axis) {
+    case 0:
+      kernel_transpose_ptr = cl_context_ref.registerClKernel(
+        transpose_cl_kernel_axis0, "transpose_cl_axis0");
+      break;
+    case 1:
+      kernel_transpose_ptr = cl_context_ref.registerClKernel(
+        transpose_cl_kernel_axis1, "transpose_cl_axis1");
+      break;
+    case 2:
+      kernel_transpose_ptr = cl_context_ref.registerClKernel(
+        transpose_cl_kernel_axis2, "transpose_cl_axis2");
+      break;
+    default:
+      throw std::invalid_argument("failed to register CL kernel");
+    }
```

niket-agarwal (Contributor, Author)

There would be a difference in the work_group_count that we pass for the three axes, hence I didn't merge them into one.

niket-agarwal (Contributor, Author)

Would you suggest adding another switch condition for them as well, or should we keep them separate?
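The per-axis work sizes could indeed be folded into the same switch. The sketch below is illustrative only: the struct, function name, and the particular work-group counts are assumptions, not the PR's actual dispatch code.

```cpp
#include <array>
#include <cassert>
#include <stdexcept>
#include <string>

// Illustrative dispatch-selection sketch (hypothetical, not the PR's code):
// one switch picks both the kernel name and a per-axis global work size,
// addressing the concern that work_group_count differs between the axes.
struct DispatchInfo {
  std::string kernel_name;
  std::array<int, 3> work_group_count; // assumed global work size layout
};

DispatchInfo selectTransposeDispatch(unsigned axis, int batch, int channels,
                                     int height, int width) {
  switch (axis) {
  case 0: // e.g. channel/height swap
    return {"transpose_cl_axis0", {height, channels, batch * width}};
  case 1: // e.g. height/width swap
    return {"transpose_cl_axis1", {width, height, batch * channels}};
  case 2: // e.g. channel/width swap
    return {"transpose_cl_axis2", {width, channels, batch * height}};
  default:
    throw std::invalid_argument("unsupported transpose axis");
  }
}
```

The actual work sizes would have to match whatever each kernel's index arithmetic expects; the point is only that a second switch keeps the merge to a single function.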

@niket-agarwal niket-agarwal force-pushed the transpose_new branch 2 times, most recently from cb4485d to 009c3cc Compare November 13, 2024 09:54
Added naive version of OpenCL implementation for Transpose.
Incorporated kernel for ops using blas_kernels.
Added unit test for Transpose_cl.

Signed-off-by: Niket Agarwal <[email protected]>
@taos-ci (Collaborator) left a comment

@niket-agarwal, 💯 All CI checkers are successfully verified. Thanks.

@djeong20 (Contributor) left a comment

Great job! I think it is good to go 👍

@jijoongmoon jijoongmoon merged commit 07250f1 into nnstreamer:main Nov 14, 2024
38 checks passed