Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RGBD_Odometry_FastICP.algorithmic fails on ARM64 #2817

Closed
4 tasks
asmorkalov opened this issue Jan 14, 2021 · 6 comments · Fixed by #2853
Closed
4 tasks

RGBD_Odometry_FastICP.algorithmic fails on ARM64 #2817

asmorkalov opened this issue Jan 14, 2021 · 6 comments · Fixed by #2853

Comments

@asmorkalov
Copy link
Contributor

System information (version)
  • OpenCV => 4.5.1-pre ( contrib master: 768ad92 )
  • Operating System / Platform => Ubuntu 20.04, arm64-v8a
  • Compiler => GCC 9.3

Hardware:

--   CPU/HW features:
--     Baseline:                    NEON FP16
Detailed description
[ RUN      ] RGBD_Odometry_FastICP.algorithmic
/home/opencv-cn/slave/workspace/experimental/contrib-ubuntu-20.04-arm64@2/opencv/modules/ts/src/ts.cpp:616: Failure
Failed

	failure reason: Invalid function output
	test case #-1
	seed: ffffffffffffffff
-----------------------------------
	LOG:
Can not find Rt between the same frame
Incorrect count of accurate poses [1st case]: 0.000000 / 99.000000
Incorrect count of accurate poses [2nd case]: 0.000000 / 99.000000
-----------------------------------

[  FAILED  ] RGBD_Odometry_FastICP.algorithmic (8427 ms)
Steps to reproduce
Issue submission checklist
  • I report the issue, it's not a question
  • I checked the problem with documentation, FAQ, open issues,
    answers.opencv.org, Stack Overflow, etc and have not found solution
  • I updated to latest OpenCV version and the issue is still there
  • There is reproducer code and related data files: videos, images, onnx, etc
@tomoaki0705
Copy link
Contributor

Hi @savuor , @asmorkalov
Hi think I root caused this failure.
This FastICPOdometry is calling bilateralFilter

bilateralFilter(depth, smooth, kernelSize, sigmaDepth*depthFactor, sigmaSpatial);

On x86_64 platforms, this arrives to either OpenCL version or IPP version function call

    CV_OCL_RUN(_src.dims() <= 2 && _dst.isUMat(),
               ocl_bilateralFilter_8u(_src, _dst, d, sigmaColor, sigmaSpace, borderType))

    Mat src = _src.getMat(), dst = _dst.getMat();

    CV_IPP_RUN_FAST(ipp_bilateralFilter(src, dst, d, sigmaColor, sigmaSpace, borderType));

but on Arm64, it triggers neither of them, and arrives to CPU version.
Inside this CPU version, it's calling minMaxLoc and inside there, it's doing comparison of floats

bilateralFilter_32f( const Mat& src, Mat& dst, int d,
 :
    minMaxLoc( src.reshape(1), &minValSrc, &maxValSrc );

What hurts the process is that depth image is full of NaN and comparing with NaN needs to be treated carefully.
Current implementation returns NaN in the minimum/maximum value, and that's spreads all over the points and normal array.

The normal implementation part is OK

        for( int i = 0; i < len; i++ )
        {
            T val = src[i];
            if( val < minVal )
            {
                minVal = val;
                minIdx = startIdx + i;
            }
            if( val > maxVal )
            {
                maxVal = val;
                maxIdx = startIdx + i;
            }
        }

but the intrinsic optimized part is mixing the NaN directly.

                        v_float32x4 data = v_load(src + k);
                        v_uint32x4 cmpMin = v_reinterpret_as_u32(data < valMin);
                        v_uint32x4 cmpMax = v_reinterpret_as_u32(data > valMax);
                        idxMin = v_select(cmpMin, idx, idxMin);
                        idxMax = v_select(cmpMax, idx, idxMax);
                        valMin = v_min(data, valMin);
                        valMax = v_max(data, valMax);

I confirmed that the test passes when I disable the intrinsic part and use only the general version.

  CPU/HW features:
    Baseline:                    NEON FP16
OpenCV version: 4.5.1-dev
OpenCV VCS version: 4.5.1-156-g2b787eb
 :
[       OK ] RGBD_Odometry_FastICP.algorithmic (339525 ms)
[----------] 1 test from RGBD_Odometry_FastICP (339525 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (339525 ms total)
[  PASSED  ] 1 test.

So we should just treat the NaN correctly before passing the data to bilateral filter.
Unfortunately, bilateral filter doesn't seem to take mask as an input

CV_EXPORTS_W void bilateralFilter( InputArray src, OutputArray dst, int d,
                                   double sigmaColor, double sigmaSpace,
                                   int borderType = BORDER_DEFAULT );

which means, I need to fix from the core module.

Please let me know if there is any alternative way to fix this test failure.
Best

@DumDereDum DumDereDum mentioned this issue Feb 2, 2021
6 tasks
@savuor
Copy link
Contributor

savuor commented Feb 2, 2021

@tomoaki0705 You're right, looks like ARM64 implements min and max with NaNs so that the result is also NaN. This is the reason why #2818 happened.
The minMaxLoc should be fixed, of course, but we'll also prepare a workaround for bilateral filter call, it will be in #2853.

@tomoaki0705
Copy link
Contributor

Perhaps, we could work in parallel ?
I'll focus on making minMaxLoc behavior of intrinsic version, same as original implementation.

@savuor
Copy link
Contributor

savuor commented Feb 2, 2021

@tomoaki0705 Yes, sounds feasible

@alalek
Copy link
Member

alalek commented Feb 2, 2021

ARM64 implements min and max with NaNs so that the result is also NaN

It is useful to reinterpret in mind NaN as "any number, but undefined - without exact value". After that all these rules around NaN become more clear.

ARM64

For x86-specific just swap arguments to get "broken" result while handling NaNs:

-                        valMin = v_min(data, valMin);
+                        valMin = v_min(valMin, data);

details: https://stackoverflow.com/a/40199125


P.S. Passing NaN values to OpenCV functions is "undefined behavior" until it is clearly stated in the documentation.

@tomoaki0705
Copy link
Contributor

@alalek Thank you for the explanation.
Right. Now I understand that modifying the behavior of minMaxLoc was too much.
What I'd like to ask is from where to fix this.
In short, should it happen in makeFrameFromDepth, bilateralFilter or other.

The function call are cascaded,

  1. makeFrameFromDepth
  2. bilateralFilter from 1
  3. minMaxLoc from 2

Fixing from 3 was my PR, but that was too much. It's out of choice.
Fixing in 2. could be fairly small. We just need to create a mask for NaN, and pass it to minMaxLoc.
For me, it sounds reasonable to mask out NaN value inside bilateralFilter function.
Again, this is not stated, so I'd like to hear your opinion.

Fixing from makeFrameFromDepth has huge amount of work to be done, since bilateralFilter doesn't take a mask, so either A. modify bilateralFilter API to take a mask as a parameter or B. rewrite the bilateral filter as noted by @savuor

The alternative choice I can imagine is to restrict NaN to be passed to ICP odometry, but I feel this is out of choice, too.

Where should this fix happen ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants