Loop detection freezing with SuperGlue, please fix it. Test on 2022-IlluminationInvariant. #896

cdb0y511 · 2022-09-09T09:25:34Z

Hi, @matlabbe.
This happens with the latest git.
How to reproduce:
With the latest git, I tested with the first DB (loc_190321-165128.db) from
https://github.com/introlab/rtabmap/blob/71a28bb570e26f6bbcb78bd7a95ea75c24a4d4d8/archive/2022-IlluminationInvariant/README.md
Load the source from the DB, use the odometry stored in the DB, set Kp/DetectorStrategy and Vis/FeatureType to 11 (SuperPoint), and set Vis/CorNNType to 6 (SuperGlue). It freezes after a few detections.

The odometry seems to continue, but loop detection stops. I can press stop, but I cannot close the DB and have to force-kill the application.
I went back to the latest release (https://github.com/introlab/rtabmap/releases/tag/0.20.16).

Everything is OK there, SuperGlue works.
Btw, SuperGlue works for odometry, but freezes with loop detection.
Ubuntu 20.04, tested with both CUDA 11.6 and 11.7, libtorch 1.8.2.
I think a recent commit introduced an issue in loop detection.
I hope you can fix it soon.
Thanks, @matlabbe

cdb0y511 · 2022-09-09T09:33:34Z

Btw, SuperPoint with kd-tree matching works with loop detection, but it freezes with SuperGlue.
Could you check the recent commits related to loop detection? Otherwise, the results of the paper https://doi.org/10.3389/frobt.2022.801886 cannot be reproduced with the latest git.

matlabbe · 2022-09-12T18:45:25Z

Currently working on a docker image to reproduce the results (working so far, but with minor issues to fix this week, maybe today if I have time). There is however a known issue with loading Python scripts in rtabmap when more than one thread uses Python at the same time. The results presented in the paper were generated using the rtabmap-reprocess tool, which works with a single thread, so there is no Python multi-threading issue like with the standalone UI app.

Related to introlab/rtabmap_ros#534

matlabbe · 2022-09-13T07:40:54Z

Updated README with a docker example: https://github.com/introlab/rtabmap/tree/master/archive/2022-IlluminationInvariant#docker

cdb0y511 · 2022-09-14T02:59:13Z

@matlabbe, thanks
I will look into it.
But I hope this will be fixed for standalone soon.
I wonder if it is related to the python version (currently python 3.8, can 3.9 avoid this issue?).

matlabbe · 2022-09-14T16:14:22Z

Reproduced the problem on standalone, stuck on:

SuperGlue python init()

It seems to freeze when initializing SuperGlue:

rtabmap/corelib/src/python/rtabmap_superglue.py

Line 25 in adfb250

print("SuperGlue python init()")

To reproduce:

export XAUTH=/tmp/.docker.xauth
touch $XAUTH
xauth nlist $DISPLAY | sed -e 's/^..../ffff/' | xauth -f $XAUTH nmerge -

docker run --gpus all -it --rm --ipc=host --runtime=nvidia \
    --env="DISPLAY=$DISPLAY" \
    --env="QT_X11_NO_MITSHM=1" \
    --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \
    --env="XAUTHORITY=$XAUTH" \
    --volume="$XAUTH:$XAUTH" \
    -v ~/Downloads/Illumination_invariant_databases:/workspace/databases \
    rtabmap_frontiers \
        rtabmap --SuperPoint/ModelPath /workspace/scripts/superpoint_v1.pt \
        --SuperGlue/Path /workspace/scripts/SuperGluePretrainedNetwork/rtabmap_superglue.py \
        --Kp/DetectorStrategy 11 \
        --Mem/UseOdomFeatures false \
        --Vis/CorNNType 6

Say "Yes" to all startup dialogs, then Open Preferences->Source, set Source->Database, scroll down and set database path to any databases in "/workspace/databases". Say "Yes" to use odometry data and "Yes" to process all data. Click ok, new database, then start.

The difference between rtabmap-reprocess and rtabmap is that in the standalone app the map update thread is not running on the main thread. This may cause issues when the Python interpreter is not initialized in the same thread context. For reference, these are the two Python classes involved:
https://github.com/introlab/rtabmap/blob/master/corelib/src/python/PythonInterface.cpp
https://github.com/introlab/rtabmap/blob/master/corelib/src/python/PyMatcher.cpp

PythonInterface is initialized on the main thread (constructor here, created here), while the rtabmap class runs on a second thread called RtabmapThread. Normally PythonInterface should switch the Python context between threads, so the problem must occur when switching contexts.

I see two solutions:

Fix the Python thread context switching (preferred, as it would also cover any future Python-based approaches), or
Implement SuperGlue in C++ (similarly to SuperPoint, to avoid calling Python from C++)

cdb0y511 · 2022-09-17T02:39:22Z

I wonder why the latest release (https://github.com/introlab/rtabmap/releases/tag/0.20.16) can not reproduce this issue.

matlabbe · 2022-09-24T22:58:09Z

Tested with 0.20.16 using the docker image (checking out 0.20.16 inside and rebuilding it), and the same problem happens. Digging more into the issue, I tried to write a minimal example replicating how Python is used inside rtabmap across threads, based on this example:

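#include <Python.h>

#include <string>
#include <thread>

// Assumed helper (not shown in the original snippet): RAII wrapper that
// starts and finalizes the Python interpreter on the calling thread.
struct initialize
{
    initialize() { Py_Initialize(); }
    ~initialize() { Py_Finalize(); }
};

// runs in a new thread
void f(PyInterpreterState* interp, const char* tname)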
{
    std::string code = R"PY(

from __future__ import print_function
import sys

print("TNAME: sys.xxx={}".format(getattr(sys, 'xxx', 'attribute not set')))

    )PY";

    code.replace(code.find("TNAME"), 5, tname);
    
    
    PyThreadState* threadState = PyThreadState_New(interp);
    PyEval_RestoreThread(threadState);
    

    //sub_interpreter::thread_scope scope(interp);
    PyRun_SimpleString(code.c_str());
    
    PyThreadState_Clear(threadState);
    PyThreadState_DeleteCurrent();
}

int main()
{
    initialize init;
    
    PyThreadState* mainState;
    mainState = PyEval_SaveThread();

    PyEval_RestoreThread(mainState);

    PyRun_SimpleString(R"PY(

# set sys.xxx, it will only be reflected in t4, which runs in the context of the main interpreter

from __future__ import print_function
import sys

sys.xxx = ['abc']
print('main: setting sys.xxx={}'.format(sys.xxx))

    )PY");
    
    mainState = PyEval_SaveThread();

    // Simulating here a thread using the main python interpreter
    std::thread t4{f, mainState->interp, "t4(main)"};
    t4.join();
    
    PyEval_RestoreThread(mainState);

    return 0;
}

This works as expected. I then checked where exactly the code is freezing on superglue side, and it seems it happens when it calls load_state_dict here:

self.load_state_dict(torch.load(str(path)))

Maybe related issue: huggingface/transformers#8649
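
To narrow down where a stall like this happens on the Python side, one option is the standard library's faulthandler module, which can dump the traceback of every thread if a call has not returned after a timeout. The following is only a sketch and is not something rtabmap does: `run_with_watchdog`, `fake_model_init` and the timeout values are hypothetical names chosen for illustration, and `time.sleep` stands in for the real model loading.

```python
import faulthandler
import sys
import time


def run_with_watchdog(func, *args, timeout_s=30.0, **kwargs):
    """Call func; if it has not returned after timeout_s seconds,
    dump the Python traceback of every thread (without exiting)."""
    # sys.__stderr__ is used because faulthandler needs a real file descriptor.
    faulthandler.dump_traceback_later(
        timeout_s, repeat=False, file=sys.__stderr__, exit=False
    )
    try:
        return func(*args, **kwargs)
    finally:
        # Disarm the watchdog once the call returns or raises.
        faulthandler.cancel_dump_traceback_later()


def fake_model_init(delay_s):
    # Placeholder for something like self.load_state_dict(torch.load(path)).
    time.sleep(delay_s)
    return "initialized"


if __name__ == "__main__":
    print(run_with_watchdog(fake_model_init, 0.1, timeout_s=5.0))
```

In a script such as rtabmap_superglue.py, the wrapped call would be the model construction. If the call hangs, the dump should show which Python line each thread is blocked on, which complements a native gdb backtrace.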

~~Note that when you say:~~

~~I wonder why the latest release (https://github.com/introlab/rtabmap/releases/tag/0.20.16) can not reproduce this issue.~~

~~do you mean the windows cuda binaries? If so, there could be an issue with the pytorch version used.~~ EDIT: The windows binaries don't have python support.

matlabbe · 2022-09-24T23:53:54Z

At least on ROS it works. I tested by adding ros noetic in the rtabmap_frontiers docker image.

Launch the docker image:

export XAUTH=/tmp/.docker.xauth
touch $XAUTH
xauth nlist $DISPLAY | sed -e 's/^..../ffff/' | xauth -f $XAUTH nmerge -

docker run --gpus all -it --rm --ipc=host --runtime=nvidia     \
    --env="DISPLAY=$DISPLAY"     \
    --env="QT_X11_NO_MITSHM=1"     \
    --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw"     \
    --env="XAUTHORITY=$XAUTH"     \
    --volume="$XAUTH:$XAUTH"     \
    --network host \
    --privileged  \
    rtabmap_frontiers

Install ros noetic and build rtabmap_ros in the container, then after launching realsense D435i like in this tutorial, from inside the container:

roslaunch rtabmap_ros rtabmap.launch args:="-d  \
        --SuperPoint/ModelPath /workspace/scripts/superpoint_v1.pt \
        --SuperGlue/Path /workspace/scripts/SuperGluePretrainedNetwork/rtabmap_superglue.py \
        --Reg/RepeatOnce false \
        --Vis/CorGuessWinSize 0 \
        --Kp/DetectorStrategy 11 \
        --Vis/FeatureType 11 \
        --Mem/UseOdomFeatures false \
        --Vis/CorNNType 6" \
      depth_topic:=/camera/aligned_depth_to_color/image_raw \
      rgb_topic:=/camera/color/image_raw \
      camera_info_topic:=/camera/color/camera_info \
      approx_sync:=false \
      wait_imu_to_init:=true \
      imu_topic:=/rtabmap/imu

To make sure no second matching is done after SuperGlue, set --Reg/RepeatOnce false --Vis/CorGuessWinSize 0. In this example, both the rtabmap and rgbd_odometry nodes are using SuperPoint/SuperGlue. To use SuperPoint/SuperGlue only on the rtabmap node, replace args with rtabmap_args.
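
Since the same parameter set is repeated across several commands in this thread, it can help to keep it in one place and generate the argument string from it. A minimal sketch, assuming the parameter names and values from the command above; `build_rtabmap_args` and `SUPERGLUE_PARAMS` are hypothetical helpers, not part of rtabmap or rtabmap_ros:

```python
def build_rtabmap_args(params, delete_db=True):
    """Format a {name: value} mapping as '--Name value' pairs for rtabmap."""
    parts = ["-d"] if delete_db else []
    for name, value in params.items():
        # Booleans are written lowercase, as in the command above.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        parts.append(f"--{name} {text}")
    return " ".join(parts)


# Values copied from the roslaunch example above.
SUPERGLUE_PARAMS = {
    "SuperPoint/ModelPath": "/workspace/scripts/superpoint_v1.pt",
    "SuperGlue/Path": "/workspace/scripts/SuperGluePretrainedNetwork/rtabmap_superglue.py",
    "Reg/RepeatOnce": False,
    "Vis/CorGuessWinSize": 0,
    "Kp/DetectorStrategy": 11,
    "Vis/FeatureType": 11,
    "Mem/UseOdomFeatures": False,
    "Vis/CorNNType": 6,
}

if __name__ == "__main__":
    args = build_rtabmap_args(SUPERGLUE_PARAMS)
    print(f'roslaunch rtabmap_ros rtabmap.launch args:="{args}"')
```

Dropping "Vis/CorNNType" from the mapping gives the SuperPoint-only variant, which makes it easy to compare the two configurations.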

mattiasmar · 2023-05-31T14:13:47Z

Hello,
What's the status of this issue?

matlabbe · 2023-06-04T21:34:39Z

On ROS (rtabmap and odometry nodes): working
Reprocess tool: working
Matching tool: working
standalone: freezing

GVMCOTESA · 2023-06-19T12:56:39Z

I am trying to use Superglue via that dockerfile. I installed ROS Noetic using the standard method and everything works okay until I use catkin_make, which brings an error where it cannot find empy. I have found there is an issue when multiple interpreters are installed, but everything seems to be pointing to the missing dependency. The problem is that the environment uses conda, and it cannot install empy due to a conflict with other packages installed in the image. How did you manage to install ROS and rtabmap_ros?

matlabbe · 2023-06-20T00:18:23Z

Step 3 from https://github.com/introlab/rtabmap_ros#build-from-source

GVMCOTESA · 2023-06-22T10:02:22Z

I am doing that inside docker, but there are two versions of python now. In the last-mentioned issue, you do not seem to have a problem with that. Are you using catkin_make or catkin build to build in noetic? If I try to use pip instead of conda or the pytorch image, the problem is that pytorch installed with pip does not use the C++11 ABI, so rtabmap fails to build with "undefined reference to" errors. If you install libtorch with the C++11 ABI, then pytorch is not installed, and if you install both, they clash and python segfaults. Building using conda results in the error I mentioned in my previous comment.

To clarify, I am using the commands in Step 1 to 3 inside docker.

mattiasmar · 2023-06-23T07:05:30Z

I'm testing SuperPoint/SuperGlue on freiburg2_pioneer_slam3 dataset. I can see detections, but no matches.
@cdb0y511 @matlabbe can you confirm that superglue is expected to work on this dataset?

This is how I test it:
./install/rtabmap/bin/rtabmap-rgbd_dataset --cameras 1 --Rtabmap/PublishRAMUsage true --Rtabmap/DetectionRate 2 --RGBD/LinearUpdate 0 --Mem/STMSize 30 --Mem/UseOdomFeatures false --Vis/CorNNType 6 --Kp/DetectorStrategy 11 --Vis/FeatureType 11 --Reg/RepeatOnce false --SuperGlue/Path ~/ws/src/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/rtabmap_superglue.py --SuperPoint/ModelPath ~/ws/src/rtabmap/archive/2022-IlluminationInvariant/scripts/superpoint_v1.pt --PyMatcher/Cuda false --SuperPoint/Cuda false /data/TUM/rgbd_dataset_freiburg2_pioneer_slam3
Thanks!

matlabbe · 2023-06-26T01:14:02Z

SuperGlue should work for loop closure detection, not odometry (unless you choose F2F odometry). Here is a small comparison between different approaches.

Default parameters (GFTT features for odom and loop closure, with standard nearest neighbor):
rtabmap-rgbd_dataset --cameras 1 --Rtabmap/PublishRAMUsage true --Rtabmap/DetectionRate 2 --RGBD/LinearUpdate 0 --Mem/STMSize 30 --Mem/UseOdomFeatures true --Vis/CorNNType 1 --Kp/DetectorStrategy 8 --Vis/FeatureType 8 --Reg/RepeatOnce true --Odom/ResetCountdown 10 --Vis/CorNNDR 0.8 rgbd_dataset_freiburg2_pioneer_slam3
Default parameters for odom (GFTT), but using SuperPoint + default NN matching for loop closure:
rtabmap-rgbd_dataset --cameras 1 --Rtabmap/PublishRAMUsage true --Rtabmap/DetectionRate 2 --RGBD/LinearUpdate 0 --Mem/STMSize 30 --Mem/UseOdomFeatures true --Vis/CorNNType 6 --Kp/DetectorStrategy 11 --Vis/FeatureType 11 --Reg/RepeatOnce false --SuperGlue/Path ~/workspace/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/rtabmap_superglue.py --SuperPoint/ModelPath ~/superpoint_v1.pt --PyMatcher/Cuda true --SuperPoint/Cuda true --Odom/ResetCountdown 10 --Vis/CorNNDR 0.6 rgbd_dataset_freiburg2_pioneer_slam3
Default parameters for odom (GFTT), but using SuperPoint + SuperGlue for loop closure:
rtabmap-rgbd_dataset --cameras 1 --Rtabmap/PublishRAMUsage true --Rtabmap/DetectionRate 2 --RGBD/LinearUpdate 0 --Mem/STMSize 30 --Mem/UseOdomFeatures false --Vis/CorNNType 6 --Kp/DetectorStrategy 11 --Vis/FeatureType 8 --Reg/RepeatOnce false --SuperGlue/Path ~/workspace/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/rtabmap_superglue.py --SuperPoint/ModelPath ~/superpoint_v1.pt --PyMatcher/Cuda true --SuperPoint/Cuda true --Odom/ResetCountdown 10 --Vis/CorNNDR 0.8 rgbd_dataset_freiburg2_pioneer_slam3
SuperPoint for odom, Superpoint+Superglue for loop closure detection:
rtabmap-rgbd_dataset --cameras 1 --Rtabmap/PublishRAMUsage true --Rtabmap/DetectionRate 2 --RGBD/LinearUpdate 0 --Mem/STMSize 30 --Mem/UseOdomFeatures true --Vis/CorNNType 6 --Kp/DetectorStrategy 11 --Vis/FeatureType 11 --Reg/RepeatOnce false --SuperGlue/Path ~/workspace/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/rtabmap_superglue.py --SuperPoint/ModelPath ~/superpoint_v1.pt --PyMatcher/Cuda true --SuperPoint/Cuda true --Odom/ResetCountdown 10 --Vis/CorNNDR 0.6 rgbd_dataset_freiburg2_pioneer_slam3

For that dataset, it seems about 3 sec of data is missing while the robot was rotating, around the 1671st frame. I fixed the code to make Odom/ResetCountdown work with that tool.

Looking at the results, the lack of loop closures for GFTT is more related to its binary descriptors than to the absence of SuperGlue. Here is a comparison of matching SuperPoint features with and without SuperGlue, respectively:

matlabbe · 2023-06-26T01:17:15Z

@GVMCOTESA what is your base image? Is it the one from nvidia like in the frontiers dockerfile?

matlabbe · 2023-06-29T06:31:35Z

I did it with native installed libraries. For docker, you may use frontiers dockerfile. If you want to go ROS, I also recently created an image for rtabmap_ros.

GVMCOTESA · 2023-06-29T06:55:07Z

Yes, it is the frontiers one. I will try the new image, thank you.

mattiasmar · 2023-07-06T21:05:29Z

> On ROS (rtabmap and odometry nodes): working
> Reprocess tool: working
> Matching tool: working
> standalone: freezing

By "matching tool", do you mean rtabmap-databaseViewer?
When I try to trigger a loop closure in the DB viewer, I get this error whenever SuperPoint/SuperGlue is called more than once:

Superglue execution times:  0.8277442455291748 [-0.827739953994751]
[ INFO] (2023-07-06 21:01:10.415) PythonInterface.cpp:48::~PythonInterface() Py_Finalize() with thread = 673533952
[ INFO] (2023-07-06 21:01:10.654) DatabaseViewer.cpp:8262::refineConstraint() (1 ->2) Registration time: 1.713461 s
[ INFO] (2023-07-06 21:01:10.680) PythonInterface.cpp:25::PythonInterface() Py_Initialize() with thread = 673533952
[ INFO] (2023-07-06 21:01:10.706) PyMatcher.cpp:33::PyMatcher() path = /root/ws/src/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/rtabmap_superglue.py
[ INFO] (2023-07-06 21:01:10.706) PyMatcher.cpp:34::PyMatcher() model = indoor
Segmentation fault (core dumped)

@matlabbe Are you seeing this too?

matlabbe · 2023-07-10T21:16:22Z

The matching tool is not rtabmap-databaseViewer. The database viewer may work just once, then segfault the second time (when trying to re-initialize the Python classes).

GVMCOTESA · 2023-07-14T10:36:39Z

It worked. I was not aware that rtabmap_ros must be in its own container, separate from the rest; I now have a separate container for simulating the robot. There must be something misconfigured on my side, though: the map rotates wildly each time a loop closure is detected, and it does not seem to converge. Is there a set of calibration parameters that can help reduce this?

matlabbe · 2023-07-14T16:44:04Z

Can you share the database?

matlabbe · 2023-09-10T21:37:28Z

Regarding the app freezing on SuperGlue initialization (#896 (comment)), here is a gdb backtrace taken when it happens:

#0  0x00007ffff078f1f1 in PyThreadState_Clear (tstate=0x7fff0c6dcb00) at ../Python/pystate.c:764
#1  0x00007ffefd33794d in pybind11::gil_scoped_acquire::dec_ref() () at /home/mathieu/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so
#2  0x00007ffefd33798d in pybind11::gil_scoped_acquire::~gil_scoped_acquire() () at /home/mathieu/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so
#3  0x00007ffefd6efbcd in torch::autograd::PyFunctionTensorPreHook::~PyFunctionTensorPreHook() () at /home/mathieu/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so
#4  0x00007ffefd6efbed in torch::autograd::PyFunctionTensorPreHook::~PyFunctionTensorPreHook() () at /home/mathieu/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so
#5  0x00007fffd80125cf in torch::autograd::AutogradMeta::~AutogradMeta() () at /home/mathieu/.local/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so
#6  0x00007fffeea9da42 in c10::TensorImpl::~TensorImpl() () at /home/mathieu/.local/lib/python3.8/site-packages/torch/lib/libc10.so
#7  0x00007fffeea9dbed in c10::TensorImpl::~TensorImpl() () at /home/mathieu/.local/lib/python3.8/site-packages/torch/lib/libc10.so
#8  0x00007ffefd704d78 in THPVariable_clear(THPVariable*) () at /home/mathieu/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so
#9  0x00007ffefd705125 in THPVariable_subclass_dealloc(_object*) () at /home/mathieu/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so
#10 0x00007ffff0878165 in _Py_DECREF (filename=<synthetic pointer>, lineno=541, op=<optimized out>) at ../Include/object.h:478
#11 _Py_XDECREF (op=<optimized out>) at ../Include/object.h:541
#12 free_keys_object (keys=0x7ffd3f192020) at ../Objects/dictobject.c:584
#13 0x00007ffff0878818 in dictkeys_decref (dk=0x7ffd3f192020) at ../Objects/dictobject.c:324
#14 dict_dealloc (mp=0x7fff459e0340) at ../Objects/dictobject.c:1998
#15 0x00007ffff08743a6 in odict_dealloc (self=0x7fff459e0340) at ../Objects/odictobject.c:1367
#16 0x00007ffff067cd9e in _Py_DECREF (filename=<synthetic pointer>, lineno=4971, op=<optimized out>) at ../Include/object.h:478
#17 call_function (tstate=0x7ffda6635b30, pp_stack=0x7fff14ae3930, oparg=<optimized out>, kwnames=0x0) at ../Python/ceval.c:4971
#18 0x00007ffff0684ef6 in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3469
#19 0x00007ffff07d2e4b in _PyEval_EvalCodeWithName
    (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=2, kwnames=0x0, kwargs=0x7fff14ae3b60, kwcount=0, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x7fff4d8b3730, name=0x7fff14d43530, qualname=0x7fff4d8b2a30) at ../Python/ceval.c:4298
#20 0x00007ffff08b0124 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:436
#21 0x00007ffff08b2417 in _PyObject_FastCallDict (callable=callable@entry=0x7fff4d8b78b0, args=args@entry=0x7fff14ae3b50, nargsf=nargsf@entry=2, kwargs=kwargs@entry=0x0)
    at ../Objects/call.c:96
#22 0x00007ffff08b252d in _PyObject_Call_Prepend (callable=0x7fff4d8b78b0, obj=<optimized out>, args=0x7fff14d164c0, kwargs=0x0) at ../Objects/call.c:888
#23 0x00007ffff084bd47 in slot_tp_init (self=0x7fff14d384f0, args=0x7fff14d164c0, kwds=0x0) at ../Objects/typeobject.c:6790
#24 0x00007ffff08511b9 in type_call (type=<optimized out>, args=0x7fff14d164c0, kwds=0x0) at ../Objects/typeobject.c:994
#25 0x00007ffff08b0b2b in _PyObject_MakeTpCall (callable=0x7ffe5da1e7b0, args=<optimized out>, nargs=<optimized out>, keywords=0x0) at ../Objects/call.c:159
#26 0x00007ffff067cdf3 in _PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=<optimized out>, callable=0x7ffe5da1e7b0) at ../Include/cpython/abstract.h:125
#27 _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>) at ../Include/cpython/abstract.h:115
#28 call_function (tstate=0x7ffda6635b30, pp_stack=0x7fff14ae3d58, oparg=<optimized out>, kwnames=0x0) at ../Python/ceval.c:4963
#29 0x00007ffff067e46d in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3500
#30 0x00007ffff068806b in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=5, globals=<optimized out>) at ../Objects/call.c:284
#31 0x00007ffff08b0f23 in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>) at ../Include/cpython/abstract.h:147
#32 _PyObject_FastCall (nargs=<optimized out>, args=<optimized out>, func=<optimized out>) at ../Include/cpython/abstract.h:147
#33 _PyObject_CallFunctionVa (callable=0x7fff14b4e8b0, format=<optimized out>, va=va@entry=0x7fff14ae3ec0, is_size_t=is_size_t@entry=0) at ../Objects/call.c:941
#34 0x00007ffff08b218f in _PyObject_CallFunctionVa (is_size_t=0, va=0x7fff14ae3ec0, format=<optimized out>, callable=<optimized out>) at ../Objects/call.c:914
#35 PyObject_CallFunction (callable=<optimized out>, format=<optimized out>) at ../Objects/call.c:961
#36 0x00007ffff73eaf33 in rtabmap::PyMatcher::match(cv::Mat const&, cv::Mat const&, std::vector<cv::KeyPoint, std::allocator<cv::KeyPoint> > const&, std::vector<cv::KeyPoint, std::allocator<cv::KeyPoint> > const&, cv::Size_<int> const&) () at /home/mathieu/workspace/rtabmap/build/bin/librtabmap_core.so.0.21
#37 0x00007ffff724cc2c in rtabmap::RegistrationVis::computeTransformationImpl(rtabmap::Signature&, rtabmap::Signature&, rtabmap::Transform, rtabmap::RegistrationInfo&) const ()
    at /home/mathieu/workspace/rtabmap/build/bin/librtabmap_core.so.0.21

I think the problem is that:

we create the Python interpreter in the main thread of rtabmap,
the matching call is done inside a sub-thread, calling Python from C++, so the GIL has to be acquired,
then in the SuperGlue Python code, PyTorch's C functions are called; pybind11 releases the GIL for the C code and re-acquires it afterwards, triggering some memory clearing that makes the app freeze.

The difference between standalone and ROS is that for the latter the Python interpreter runs in the same thread as the one PyTorch is running on. Questions:

Should we create one Python interpreter per thread to avoid this situation? It looks like overkill if rtabmap and odometry both load the same Python modules.
Can the GIL be acquired/released from a thread other than the one that created the Python interpreter? Based on the example above, it seems so, but we could extend that example with Python code calling another C function that releases/re-acquires the GIL.


matlabbe · 2023-09-17T08:20:50Z

Fixed in f1cd819

cdb0y511 · 2023-09-22T06:52:26Z

> I am doing that inside docker, but there are two versions of python now. In the last-mentioned issue, you do not seem to have a problem with that. Are you using catkin_make or catkin build to build in noetic? If I try to use pip instead of conda or the pytorch image, the problem is that pytorch installed with pip does not use the C++11 ABI, so rtabmap fails to build with "undefined reference to" errors. If you install libtorch with the C++11 ABI, then pytorch is not installed, and if you install both, they clash and python segfaults. Building using conda results in the error I mentioned in my previous comment.
>
> To clarify, I am using the commands in Step 1 to 3 inside docker.

I am glad we can skip docker for now, because I found it has some performance issues of its own.
Unfortunately, you have to build libtorch from source to avoid the undefined reference errors related to the C++11 ABI.
Related to #1063

mattiasmar · 2023-12-17T18:35:31Z

@cdb0y511 I too note that building pytorch from source avoids the "undefined reference" errors. However, I also note a severe (>>10x) performance loss with this pytorch compiled from source.
I'm testing on CPU only and I compile with the flag ENV USE_MKLDNN=1. Prior to that I install Intel's oneDNN like this:

git clone --branch v3.4-pc --recursive https://github.com/oneapi-src/oneDNN.git /one-dnn
mkdir -p build && cd build && cmake .. && make -j  && make install

Question: Did you also observe a loss in inference speed when building pytorch from source? Did you overcome it in some way?


qetuo105487900 · 2024-02-19T09:02:31Z

Sorry to bother everyone. Can SuperGlue run with rtabmap without docker? Are the following steps correct?

roslaunch realsense2_camera rs_camera.launch align_depth:=true
roslaunch rtabmap_launch rtabmap.launch args:="-d
--SuperPoint/ModelPath /home/lun/rtabmap/archive/2022-IlluminationInvariant/scripts/superpoint_v1.pt
--SuperGlue/Path /home/lun/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/demo_superglue.py
--Reg/RepeatOnce false
--Vis/CorGuessWinSize 0
--Kp/DetectorStrategy 11
--Vis/FeatureType 11
--Mem/UseOdomFeatures false
--Vis/CorNNType 6"
rtabmap_args:="--delete_db_on_start"
depth_topic:=/camera/aligned_depth_to_color/image_raw
rgb_topic:=/camera/color/image_raw
camera_info_topic:=/camera/color/camera_info
approx_sync:=false

i get this : QAQ
Features2d.cpp:594::create() SupertPoint Torch feature cannot be used as RTAB-Map is not built with the option enabled. GFTT/ORB is used instead.

matlabbe · 2024-02-19T21:45:27Z

See #1221 (comment)

qetuo105487900 · 2024-02-23T09:47:37Z

I ran the following two commands; the only difference between them is whether --Vis/CorNNType 6 is added:

roslaunch rtabmap_launch rtabmap.launch args:="-d
--delete_db_on_start
--SuperPoint/ModelPath /home/lun/rtabmap/archive/2022-IlluminationInvariant/scripts/superpoint_v1.pt
--SuperGlue/Path /home/lun/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/demo_superglue.py
--Reg/RepeatOnce false
--Vis/CorGuessWinSize 0
--Kp/DetectorStrategy 11
--Vis/FeatureType 11
--Mem/UseOdomFeatures false
--Vis/CorNNType 6"
depth_topic:=/rs_d435i/aligned_depth_to_color/image_raw
rgb_topic:=/rs_d435i/color/image_raw
camera_info_topic:=/rs_d435i/color/camera_info
approx_sync:=false

and i get

[ERROR] (2024-02-06 00:05:32.077) PyMatcher.cpp:63::PyMatcher() Module "demo_superglue" could not be imported! (File="/home/lun/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/demo_superglue.py")
[ERROR] (2024-02-06 00:05:32.077) PyMatcher.cpp:64::PyMatcher() Traceback (most recent call last):

  File "/home/lun/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/demo_superglue.py", line 51, in <module>
    import torch
  File "/home/lun/.local/lib/python3.8/site-packages/torch/__init__.py", line 237, in <module>
    from torch._C import * # noqa: F403

ImportError: /home/lun/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so: undefined symbol: _ZNK5torch3jit5Graph8toStringEb

But when I ran the same command without --Vis/CorNNType 6:

roslaunch rtabmap_launch rtabmap.launch args:="-d
--delete_db_on_start
--SuperPoint/ModelPath /home/lun/rtabmap/archive/2022-IlluminationInvariant/scripts/superpoint_v1.pt
--SuperGlue/Path /home/lun/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/demo_superglue.py
--Reg/RepeatOnce false
--Vis/CorGuessWinSize 0
--Kp/DetectorStrategy 11
--Vis/FeatureType 11
--Mem/UseOdomFeatures false"
depth_topic:=/rs_d435i/aligned_depth_to_color/image_raw
rgb_topic:=/rs_d435i/color/image_raw
camera_info_topic:=/rs_d435i/color/camera_info
approx_sync:=false

i get

Parameters.cpp:1149::parseArguments() Parameter migration from "SuperGlue/Path" to "PyMatcher/Path" (value=/home/lun/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/demo_superglue.py).

Is that correct?

matlabbe · 2024-02-24T22:39:27Z

If you don't use --Vis/CorNNType 6, you are not using superglue, but you are still using superpoint with standard KNN matching approach.

So you get this error when using superglue:

ImportError: /home/lun/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so: undefined symbol: _ZNK5torch3jit5Graph8toStringEb

Has pytorch been built from source? Uninstall the one installed with pip if you rebuilt pytorch from source.
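
A quick way to check for leftover installs is to list every location on the module search path that provides a torch package. This is a generic sketch using only the standard library (it does not import torch, so it also works when the import itself fails); `find_torch_installs` is a hypothetical helper name:

```python
import os
import sys


def find_torch_installs(search_paths=None):
    """Return every directory on the search path that contains torch/__init__.py."""
    paths = sys.path if search_paths is None else search_paths
    found = []
    for entry in paths:
        # An empty entry in sys.path means the current working directory.
        base = os.path.abspath(entry or os.getcwd())
        candidate = os.path.join(base, "torch", "__init__.py")
        if os.path.isfile(candidate) and base not in found:
            found.append(base)
    return found


if __name__ == "__main__":
    installs = find_torch_installs()
    for location in installs:
        print("torch found in:", location)
    if len(installs) > 1:
        print("More than one torch install is visible; the first one listed is imported.")
```

It should be run with the same Python interpreter and environment that rtabmap uses. If both a pip install under ~/.local and a source build show up, uninstalling the pip one as suggested above removes the ambiguity.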

qetuo105487900 · 2024-02-25T14:27:27Z

@matlabbe

I followed this website:
https://zhuanlan.zhihu.com/p/363611229

This does not build pytorch, right?

HelmutE89 · 2024-02-27T12:38:54Z

@matlabbe

> I followed this website: https://zhuanlan.zhihu.com/p/363611229
>
> This does not build pytorch, right?

These are the precompiled pytorch libraries. The "Download here (cxx11 ABI)" package worked for me without having to compile pytorch from source. I unpacked it into my home directory "/home/he/projects/libtorch".

I added the library path by adding this line to the end of my ~/.bashrc:
export LD_LIBRARY_PATH=/home/he/projects/libtorch/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Then I built my ROS 2 Humble workspace, specifying the cmake directory of the library:
colcon build --symlink-install --cmake-args -DCMAKE_BUILD_TYPE=Release -DWITH_TORCH=ON -DWITH_PYTHON=ON -DTorch_DIR=/home/he/projects/libtorch/share/cmake/Torch --packages-up-to rtabmap_ros

qetuo105487900 · 2024-02-27T14:57:30Z

@HelmutE89 thanks, i still have same problem QQ
so, i can think superpoint + standard KNN is better than superpoint ? but maybe worse than superglue ?

matlabbe · 2024-03-02T21:13:43Z

In practice, you can already get great performance with SuperPoint and standard KNN. However, SuperGlue generally gives more matches than KNN and can resolve very large viewpoint differences (great for loop closure detection).
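
For readers unfamiliar with the "standard KNN" approach, the idea can be sketched as a nearest-neighbor search followed by a distance ratio test. This is a generic illustration of the technique in pure Python, not rtabmap's implementation; `knn_ratio_match` is a hypothetical name, and its `ratio` argument is assumed to play the same role as the Vis/CorNNDR values (0.6, 0.8) used in the commands earlier in this thread:

```python
import math


def knn_ratio_match(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a to its nearest neighbor in desc_b,
    keeping the match only if nearest < ratio * second nearest."""
    matches = []
    for i, da in enumerate(desc_a):
        # Distance to every candidate, sorted ascending.
        ranked = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(ranked) < 2:
            continue  # the ratio test needs two neighbors
        (best, j), (second, _) = ranked[0], ranked[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches


if __name__ == "__main__":
    a = [(0.0, 0.0), (5.0, 5.0)]
    b = [(0.1, 0.0), (4.0, 4.0), (6.0, 6.0)]
    # The first descriptor has one clear neighbor; the second is equally
    # close to two candidates, so the ratio test rejects it.
    print(knn_ratio_match(a, b))
```

The ratio test discards ambiguous descriptors one at a time, so it tends to drop many candidates when the viewpoint changes a lot. A learned matcher such as SuperGlue instead considers all keypoints of both images jointly, which is consistent with it returning more matches in those cases.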

qetuo105487900 · 2024-03-03T09:59:22Z

> In practice, you can get already great performance with SuperPoint and standard KNN. However, SuperGlue would give more matches in general than KNN and could resolve very large point of view differences (great for loop closure detection).

@matlabbe Thank you for your response.


matlabbe mentioned this issue Nov 19, 2022

Superglue ROS introlab/rtabmap_ros#534

Closed

This was referenced Jun 4, 2023

rtabmap_slam crashes when using Superglue and Superpoint introlab/rtabmap_ros#977

Closed

unable to build rtabmap with python3 support #1043

Closed

matlabbe mentioned this issue Jun 20, 2023

RTabMap + Humble + Torch= True? #1063

Closed

mattiasmar mentioned this issue Jul 9, 2023

DB Viewer crashes with SuperPoint #1090

Closed

matlabbe mentioned this issue Sep 4, 2023

Need some instruction for python api development (PyDetector and PyMatcher) #1123

Open

cdb0y511 mentioned this issue Sep 5, 2023

[Feature-Request] LightGlue #1129

Open

matlabbe added a commit that referenced this issue Sep 17, 2023

Fixed superglue deadlock on standalone (#896). Fixed some elemSize opencv asserts in debug build (876)

f1cd819

matlabbe closed this as completed Sep 17, 2023

