RGBD_Odometry_FastICP.algorithmic fails on ARM64 #2817

asmorkalov · 2021-01-14T13:38:27Z

System information (version)

OpenCV => 4.5.1-pre ( contrib master: 768ad92 )
Operating System / Platform => Ubuntu 20.04, arm64-v8a
Compiler => GCC 9.3

Hardware:

--   CPU/HW features:
--     Baseline:                    NEON FP16

Detailed description

[ RUN      ] RGBD_Odometry_FastICP.algorithmic
/home/opencv-cn/slave/workspace/experimental/contrib-ubuntu-20.04-arm64@2/opencv/modules/ts/src/ts.cpp:616: Failure
Failed

	failure reason: Invalid function output
	test case #-1
	seed: ffffffffffffffff
-----------------------------------
	LOG:
Can not find Rt between the same frame
Incorrect count of accurate poses [1st case]: 0.000000 / 99.000000
Incorrect count of accurate poses [2nd case]: 0.000000 / 99.000000
-----------------------------------

[  FAILED  ] RGBD_Odometry_FastICP.algorithmic (8427 ms)

Steps to reproduce

Issue submission checklist

I report the issue, it's not a question
I checked the problem with documentation, FAQ, open issues,
answers.opencv.org, Stack Overflow, etc and have not found solution
I updated to latest OpenCV version and the issue is still there
There is reproducer code and related data files: videos, images, onnx, etc

The text was updated successfully, but these errors were encountered:

tomoaki0705 · 2021-02-01T16:10:39Z

Hi @savuor , @asmorkalov
Hi think I root caused this failure.
This FastICPOdometry is calling bilateralFilter

opencv_contrib/modules/rgbd/src/kinfu_frame.cpp

Line 627 in c10b07d

    
           bilateralFilter(depth, smooth, kernelSize, sigmaDepth*depthFactor, sigmaSpatial);

On x86_64 platforms, this arrives to either OpenCL version or IPP version function call

    CV_OCL_RUN(_src.dims() <= 2 && _dst.isUMat(),
               ocl_bilateralFilter_8u(_src, _dst, d, sigmaColor, sigmaSpace, borderType))

    Mat src = _src.getMat(), dst = _dst.getMat();

    CV_IPP_RUN_FAST(ipp_bilateralFilter(src, dst, d, sigmaColor, sigmaSpace, borderType));

but on Arm64, it triggers neither of them, and arrives to CPU version.
Inside this CPU version, it's calling minMaxLoc and inside there, it's doing comparison of floats

bilateralFilter_32f( const Mat& src, Mat& dst, int d,
 :
    minMaxLoc( src.reshape(1), &minValSrc, &maxValSrc );

What hurts the process is that depth image is full of NaN and comparing with NaN needs to be treated carefully.
Current implementation returns NaN in the minimum/maximum value, and that's spreads all over the points and normal array.

The normal implementation part is OK

        for( int i = 0; i < len; i++ )
        {
            T val = src[i];
            if( val < minVal )
            {
                minVal = val;
                minIdx = startIdx + i;
            }
            if( val > maxVal )
            {
                maxVal = val;
                maxIdx = startIdx + i;
            }
        }

but the intrinsic optimized part is mixing the NaN directly.

                        v_float32x4 data = v_load(src + k);
                        v_uint32x4 cmpMin = v_reinterpret_as_u32(data < valMin);
                        v_uint32x4 cmpMax = v_reinterpret_as_u32(data > valMax);
                        idxMin = v_select(cmpMin, idx, idxMin);
                        idxMax = v_select(cmpMax, idx, idxMax);
                        valMin = v_min(data, valMin);
                        valMax = v_max(data, valMax);

I confirmed that the test passes when I disable the intrinsic part and use only the general version.

  CPU/HW features:
    Baseline:                    NEON FP16

OpenCV version: 4.5.1-dev
OpenCV VCS version: 4.5.1-156-g2b787eb
 :
[       OK ] RGBD_Odometry_FastICP.algorithmic (339525 ms)
[----------] 1 test from RGBD_Odometry_FastICP (339525 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (339525 ms total)
[  PASSED  ] 1 test.

So we should just treat the NaN correctly before passing the data to bilateral filter.
Unfortunately, bilateral filter doesn't seem to take mask as an input

CV_EXPORTS_W void bilateralFilter( InputArray src, OutputArray dst, int d,
                                   double sigmaColor, double sigmaSpace,
                                   int borderType = BORDER_DEFAULT );

which means, I need to fix from the core module.

Please let me know if there is any alternative way to fix this test failure.
Best

savuor · 2021-02-02T13:05:03Z

@tomoaki0705 You're right, looks like ARM64 implements min and max with NaNs so that the result is also NaN. This is the reason why #2818 happened.
The minMaxLoc should be fixed, of course, but we'll also prepare a workaround for bilateral filter call, it will be in #2853.

tomoaki0705 · 2021-02-02T13:28:21Z

Perhaps, we could work in parallel ?
I'll focus on making minMaxLoc behavior of intrinsic version, same as original implementation.

savuor · 2021-02-02T14:21:22Z

@tomoaki0705 Yes, sounds feasible

alalek · 2021-02-02T16:38:44Z

ARM64 implements min and max with NaNs so that the result is also NaN

It is useful to reinterpret in mind NaN as "any number, but undefined - without exact value". After that all these rules around NaN become more clear.

ARM64

For x86-specific just swap arguments to get "broken" result while handling NaNs:

-                        valMin = v_min(data, valMin);
+                        valMin = v_min(valMin, data);

details: https://stackoverflow.com/a/40199125

P.S. Passing NaN values to OpenCV functions is "undefined behavior" until it is clearly stated in the documentation.

tomoaki0705 · 2021-02-03T23:51:39Z

@alalek Thank you for the explanation.
Right. Now I understand that modifying the behavior of minMaxLoc was too much.
What I'd like to ask is from where to fix this.
In short, should it happen in makeFrameFromDepth, bilateralFilter or other.

The function call are cascaded,

makeFrameFromDepth
bilateralFilter from 1
minMaxLoc from 2

Fixing from 3 was my PR, but that was too much. It's out of choice.
Fixing in 2. could be fairly small. We just need to create a mask for NaN, and pass it to minMaxLoc.
For me, it sounds reasonable to mask out NaN value inside bilateralFilter function.
Again, this is not stated, so I'd like to hear your opinion.

Fixing from makeFrameFromDepth has huge amount of work to be done, since bilateralFilter doesn't take a mask, so either A. modify bilateralFilter API to take a mask as a parameter or B. rewrite the bilateral filter as noted by @savuor

The alternative choice I can imagine is to restrict NaN to be passed to ICP odometry, but I feel this is out of choice, too.

Where should this fix happen ?

asmorkalov added the category: rgbd label Jan 14, 2021

savuor mentioned this issue Jan 19, 2021

WIP: KinFu fix on ARM64 #2826

Closed

8 tasks

DumDereDum mentioned this issue Feb 2, 2021

Fix issue 2817 #2853

Merged

6 tasks

tomoaki0705 mentioned this issue Feb 2, 2021

core: treat NaN correctly in minMaxLoc opencv/opencv#19446

Closed

6 tasks

opencv-pushbot closed this as completed in #2853 Feb 8, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RGBD_Odometry_FastICP.algorithmic fails on ARM64 #2817

RGBD_Odometry_FastICP.algorithmic fails on ARM64 #2817

asmorkalov commented Jan 14, 2021

tomoaki0705 commented Feb 1, 2021

savuor commented Feb 2, 2021

tomoaki0705 commented Feb 2, 2021

savuor commented Feb 2, 2021

alalek commented Feb 2, 2021

tomoaki0705 commented Feb 3, 2021

RGBD_Odometry_FastICP.algorithmic fails on ARM64 #2817

RGBD_Odometry_FastICP.algorithmic fails on ARM64 #2817

Comments

asmorkalov commented Jan 14, 2021

System information (version)

Detailed description

Steps to reproduce

Issue submission checklist

tomoaki0705 commented Feb 1, 2021

savuor commented Feb 2, 2021

tomoaki0705 commented Feb 2, 2021

savuor commented Feb 2, 2021

alalek commented Feb 2, 2021

tomoaki0705 commented Feb 3, 2021