Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rgbd: get rid of kernel build failure #2831

Merged
merged 1 commit into from
Feb 4, 2021

Conversation

tomoaki0705
Copy link
Contributor

@tomoaki0705 tomoaki0705 commented Jan 22, 2021

Closes #2830

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
  • The PR is proposed to proper branch
  • There is reference to original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake
force_builders=Linux OpenCL

@tomoaki0705
Copy link
Contributor Author

Please hold to merge this PR.
I missed that even with this PR, kernel build still fails with following message

OpenCL program build log: rgbd/tsdf
Status -11: CL_BUILD_PROGRAM_FAILURE
-cl-mad-enable
<source>:1:9: error: unknown type name 'int8_t'; did you mean 'int8'?
typedef int8_t TsdfType;
        ^

<source>:10:1: error: unknown type name 'int8_t'; did you mean 'int8'?
int8_t res = (int8_t) ( (num * (-128)) );
^

<source>:10:23: error: called object type '<dependent type>' is not a function or function pointer
int8_t res = (int8_t) ( (num * (-128)) );
             ~~~~~~~~ ^

<source>:16:10: error: invalid conversion between vector type 'TsdfType' (aka 'int8') and scalar type 'float'
return ( (float) num ) / (-128);
         ^~~~~~~~~~~

error: Compiler frontend failed (error code 59)

unknown file: Failure
C++ exception with description "Failed to create kernel: <source>:1:9: error: unknown type name 'int8_t'; did you mean 'int8'?
typedef int8_t TsdfType;
        ^

<source>:10:1: error: unknown type name 'int8_t'; did you mean 'int8'?
int8_t res = (int8_t) ( (num * (-128)) );
^

<source>:10:23: error: called object type '<dependent type>' is not a function or function pointer
int8_t res = (int8_t) ( (num * (-128)) );
             ~~~~~~~~ ^

<source>:16:10: error: invalid conversion between vector type 'TsdfType' (aka 'int8') and scalar type 'float'
return ( (float) num ) / (-128);
         ^~~~~~~~~~~

error: Compiler frontend failed (error code 59)
" thrown in the test body.
[  FAILED  ] TSDF.valid_points (308 ms)

Even adding typedef char int8_t, then the OpenCL kernel call failed with mapping, so there's something heavy issue lies beneath this test failure.
Let me follow up.

@tomoaki0705 tomoaki0705 force-pushed the fixRgbdOpenCL branch 2 times, most recently from 7fda4cb to d76d8d6 Compare January 25, 2021 13:48
@tomoaki0705
Copy link
Contributor Author

Sorry for late update.
Now this PR is ready for merge
Before on ODROID-N2


[----------] 4 tests from TSDF
[ RUN      ] TSDF.raycast_normals
unknown file: Failure
C++ exception with description "Failed to create kernel: " thrown in the test body.
[  FAILED  ] TSDF.raycast_normals (159 ms)
[ RUN      ] TSDF.fetch_points_normals
unknown file: Failure
C++ exception with description "Failed to create kernel: " thrown in the test body.
[  FAILED  ] TSDF.fetch_points_normals (158 ms)
[ RUN      ] TSDF.fetch_normals
unknown file: Failure
C++ exception with description "Failed to create kernel: " thrown in the test body.
[  FAILED  ] TSDF.fetch_normals (157 ms)
[ RUN      ] TSDF.valid_points
unknown file: Failure
C++ exception with description "Failed to create kernel: " thrown in the test body.
[  FAILED  ] TSDF.valid_points (158 ms)
[----------] 4 tests from TSDF (632 ms total)

After

[----------] 4 tests from TSDF
[ RUN      ] TSDF.raycast_normals
[       OK ] TSDF.raycast_normals (247 ms)
[ RUN      ] TSDF.fetch_points_normals
[       OK ] TSDF.fetch_points_normals (1380 ms)
[ RUN      ] TSDF.fetch_normals
[       OK ] TSDF.fetch_normals (237 ms)
[ RUN      ] TSDF.valid_points
[       OK ] TSDF.valid_points (202 ms)
[----------] 4 tests from TSDF (2066 ms total)

Now, for fillPtsNrm, this kernel doesn't have a fallback path to CPU (It directly raises an exception).
So on some devices, the test fails but it's no longer a kernel build failure.

ex.) on Firefly-RK3399

[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from TSDF
[ RUN      ] TSDF.raycast_normals
[       OK ] TSDF.raycast_normals (937 ms)
[ RUN      ] TSDF.fetch_points_normals
OpenCL error CL_OUT_OF_RESOURCES (-5) during call: clEnqueueNDRangeKernel('fillPtsNrm', dims=3, globalsize=128x128x128, localsize=8x8x4) sync=true
unknown file: Failure
C++ exception with description "Failed to run kernel" thrown in the test body.
[  FAILED  ] TSDF.fetch_points_normals (1153 ms)
[ RUN      ] TSDF.fetch_normals
OpenCL error CL_OUT_OF_RESOURCES (-5) during call: clEnqueueNDRangeKernel('fillPtsNrm', dims=3, globalsize=128x128x128, localsize=8x8x4) sync=true
unknown file: Failure
C++ exception with description "Failed to run kernel" thrown in the test body.
[  FAILED  ] TSDF.fetch_normals (380 ms)
[ RUN      ] TSDF.valid_points
[       OK ] TSDF.valid_points (538 ms)
[----------] 4 tests from TSDF (3008 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test case ran. (3012 ms total)
[  PASSED  ] 2 tests.
[  FAILED  ] 2 tests, listed below:
[  FAILED  ] TSDF.fetch_points_normals
[  FAILED  ] TSDF.fetch_normals

@tomoaki0705
Copy link
Contributor Author

I want to emphasize that test of rgbd module was failing before this PR.
4 tests are failing on Linux OpenCL builder

[==========] 22 tests from 11 test cases ran. (63875 ms total)
[  PASSED  ] 18 tests.
[  FAILED  ] 4 tests, listed below:
[  FAILED  ] KinectFusion.lowDense
[  FAILED  ] KinectFusion.highDense
[  FAILED  ] KinectFusion.inequal
[  FAILED  ] KinectFusion.OCL

They are from KinectFusion test, which is irrelevant from this PR.
The tests I'm focusing is passing

[----------] 4 tests from TSDF
[ RUN      ] TSDF.raycast_normals
[       OK ] TSDF.raycast_normals (1239 ms)
[ RUN      ] TSDF.fetch_points_normals
[       OK ] TSDF.fetch_points_normals (205 ms)
[ RUN      ] TSDF.fetch_normals
[       OK ] TSDF.fetch_normals (305 ms)
[ RUN      ] TSDF.valid_points
[       OK ] TSDF.valid_points (130 ms)
[----------] 4 tests from TSDF (1879 ms total)

So, the failure is not caused by this PR

@alalek
Copy link
Member

alalek commented Jan 28, 2021

@tomoaki0705
Copy link
Contributor Author

I confirmed.
My modification does causes a test failure.
The failed test goes through the part that I modified.

Forgive me, this KinectFusion tests never passed on my local build farm.
It failed on every single Arm boards, so I believed that this was same on windows, too.
I tried on my old laptop and confirmed that before my modification, these tests pass but doesn't after my PR.

I don't want to fix entire problem (it's triggering another test one after another), so I''ll look for some meaningful portion of fix in a single PR.

Please hold this PR until I dig out the exact cause of this test failure.

  * prevent out-of-bounds access
  * prevent deallocate before return
  * fix build warning
@@ -859,14 +859,13 @@ static cv::UMat preCalculationPixNormGPU(int depth_rows, int depth_cols, Vec2f f
throw std::runtime_error("Failed to create kernel: " + errorStr);

AccessFlag af = ACCESS_READ;
UMat pixNorm = _pixNorm.getUMat(af);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, getUMat() is local only wrapper - it should not be "returned" or stored somewhere.

@tomoaki0705
Copy link
Contributor Author

Appreciate.
It took long, but I dig to the cause, at last.
I see no test failure on my Windows, Arm boards and on the official build farm.

Mainly 3 points to fix.

  1. Kernel build failure: On Arm boards, __INT8_TYPE__ was not predefined
  2. Out of bounds access: Was happening on some arrays in the kernel
  3. De-allocate before return: removed getUMat()

@tomoaki0705
Copy link
Contributor Author

Anything else to be done before this PR gets merged?

Copy link
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@opencv-pushbot opencv-pushbot merged commit 0def473 into opencv:master Feb 4, 2021
@tomoaki0705 tomoaki0705 deleted the fixRgbdOpenCL branch February 4, 2021 10:45
@savuor savuor mentioned this pull request Feb 12, 2021
6 tasks
@alalek alalek mentioned this pull request Apr 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rgbd: some kernels fail build
4 participants