
Loop detection freezing with SuperGlue, please fix it. Test on 2022-IlluminationInvariant. #896

Closed
cdb0y511 opened this issue Sep 9, 2022 · 35 comments


@cdb0y511
Contributor

cdb0y511 commented Sep 9, 2022

Hi @matlabbe,
Loop detection freezes with the latest git.
How to reproduce: build the latest git and test with
https://github.com/introlab/rtabmap/blob/71a28bb570e26f6bbcb78bd7a95ea75c24a4d4d8/archive/2022-IlluminationInvariant/README.md
Take the first DB, loc_190321-165128.db: load the source from the DB, use the odometry stored in the DB, set Kp/DetectorStrategy and Vis/FeatureType to 11 (SuperPoint), and set Vis/CorNNType to 6 (SuperGlue). It freezes after a few detections, as shown below:
Screenshot from 2022-09-09 17-11-56
Odometry seems to continue, but loop detection stops. I can press stop, but I cannot close the DB and have to force-kill the application.
I then went back to the latest release (https://github.com/introlab/rtabmap/releases/tag/0.20.16):
Screenshot from 2022-09-09 17-12-02
Everything is OK there and SuperGlue works.
Btw, SuperGlue works for odometry, but freezes with loop detection.
Ubuntu 20.04, tested with both CUDA 11.6 and 11.7, libtorch 1.8.2.
I think a recent commit introduced an issue with loop detection.
I hope you can fix it soon.
Thanks, @matlabbe

@cdb0y511
Contributor Author

cdb0y511 commented Sep 9, 2022

Btw, SuperPoint with kd-tree matching works with loop detection, but it freezes with SuperGlue.
You may want to check the recent commits related to loop detection; otherwise the results of the paper https://doi.org/10.3389/frobt.2022.801886 cannot be reproduced with the latest git.

@matlabbe
Member

matlabbe commented Sep 12, 2022

I am currently working on a docker image to reproduce the results (working so far, with minor issues to fix this week, maybe today if I have time). There is, however, a known issue with loading Python scripts in rtabmap when more than one thread uses Python at the same time. The results presented in the paper were generated with the rtabmap-reprocess tool, which works with a single thread, so it does not hit the Python multi-threading issue seen with the standalone UI app.

Related to introlab/rtabmap_ros#534

@matlabbe
Member

@cdb0y511
Contributor Author

@matlabbe, thanks
I will look into it.
But I hope this will be fixed for the standalone app soon.
I wonder if it is related to the Python version (currently Python 3.8; would 3.9 avoid this issue?).

@matlabbe
Member

matlabbe commented Sep 14, 2022

Reproduced the problem on standalone, stuck on:

SuperGlue python init()

It seems to freeze when initializing SuperGlue:

print("SuperGlue python init()")

To reproduce:

export XAUTH=/tmp/.docker.xauth
touch $XAUTH
xauth nlist $DISPLAY | sed -e 's/^..../ffff/' | xauth -f $XAUTH nmerge -

docker run --gpus all -it --rm --ipc=host --runtime=nvidia \
    --env="DISPLAY=$DISPLAY" \
    --env="QT_X11_NO_MITSHM=1" \
    --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \
    --env="XAUTHORITY=$XAUTH" \
    --volume="$XAUTH:$XAUTH" \
    -v ~/Downloads/Illumination_invariant_databases:/workspace/databases \
    rtabmap_frontiers \
        rtabmap --SuperPoint/ModelPath /workspace/scripts/superpoint_v1.pt \
        --SuperGlue/Path /workspace/scripts/SuperGluePretrainedNetwork/rtabmap_superglue.py \
        --Kp/DetectorStrategy 11 \
        --Mem/UseOdomFeatures false \
        --Vis/CorNNType 6 

Say "Yes" to all startup dialogs, then open Preferences->Source, set Source->Database, scroll down and set the database path to any of the databases in "/workspace/databases". Say "Yes" to use odometry data and "Yes" to process all data. Click OK, create a new database, then start.

The difference between rtabmap-reprocess and rtabmap is that, in the standalone app, the map update thread does not run on the main thread. This may cause issues when the Python interpreter is not initialized in the same thread context. For reference, these are the two Python classes involved:
https://github.com/introlab/rtabmap/blob/master/corelib/src/python/PythonInterface.cpp
https://github.com/introlab/rtabmap/blob/master/corelib/src/python/PyMatcher.cpp

PythonInterface is initialized on the main thread (constructor here, created here), while the rtabmap class runs on a second thread called RtabmapThread. Normally PythonInterface should switch the Python context between threads, so the problem must be in the context switching.

I see two solutions:

  • Fix the Python thread context switching (preferred, as it would handle all upcoming Python-based approaches), or
  • Implement SuperGlue in C++ (similarly to SuperPoint, to avoid calling Python from C++)

@cdb0y511
Contributor Author

I wonder why the issue cannot be reproduced with the latest release (https://github.com/introlab/rtabmap/releases/tag/0.20.16).

@matlabbe
Member

matlabbe commented Sep 24, 2022

I tested 0.20.16 using the docker image (checking out 0.20.16 inside and rebuilding it) and the same problem happens. Digging more into the issue, I tried to replicate, in a minimal example, how Python is used inside rtabmap across threads, based on this example:

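#include <Python.h>
#include <string>
#include <thread>

// Assumption: minimal stand-in for the `initialize` helper of the referenced
// example; it starts the interpreter and finalizes it on scope exit.
struct initialize
{
    initialize() { Py_Initialize(); }
    ~initialize() { Py_Finalize(); }
};

// runs in a new thread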
void f(PyInterpreterState* interp, const char* tname)
{
    std::string code = R"PY(

from __future__ import print_function
import sys

print("TNAME: sys.xxx={}".format(getattr(sys, 'xxx', 'attribute not set')))

    )PY";

    code.replace(code.find("TNAME"), 5, tname);
    
    
    PyThreadState* threadState = PyThreadState_New(interp);
    PyEval_RestoreThread(threadState);
    

    //sub_interpreter::thread_scope scope(interp);
    PyRun_SimpleString(code.c_str());
    
    PyThreadState_Clear(threadState);
    PyThreadState_DeleteCurrent();
}

int main()
{
    initialize init;
    
    PyThreadState* mainState;
    mainState = PyEval_SaveThread();

    PyEval_RestoreThread(mainState);

    PyRun_SimpleString(R"PY(

# set sys.xxx, it will only be reflected in t4, which runs in the context of the main interpreter

from __future__ import print_function
import sys

sys.xxx = ['abc']
print('main: setting sys.xxx={}'.format(sys.xxx))

    )PY");
    
    mainState = PyEval_SaveThread();

    // Simulating here a thread using the main python interpreter
    std::thread t4{f, mainState->interp, "t4(main)"};
    t4.join();
    
    PyEval_RestoreThread(mainState);

    return 0;
}

This works as expected. I then checked where exactly the code freezes on the SuperGlue side, and it seems to happen when it calls load_state_dict here:

self.load_state_dict(torch.load(str(path)))

Maybe related issue: huggingface/transformers#8649

Note that when you say:

I wonder why the issue cannot be reproduced with the latest release (https://github.com/introlab/rtabmap/releases/tag/0.20.16).

do you mean the Windows CUDA binaries? If so, there could be an issue with the PyTorch version used. EDIT: The Windows binaries don't have Python support.
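One way to pinpoint a hang like the load_state_dict one above is to arm a watchdog before the suspect call, so that the Python stack of every thread gets dumped if the call does not return in time. Below is a minimal sketch using only the standard faulthandler module. `run_with_watchdog`, its `timeout_s` and `file` parameters, and the 30 s default are hypothetical names and values chosen for illustration; they are not part of rtabmap_superglue.py.

```python
import faulthandler
import sys


def run_with_watchdog(func, *args, timeout_s=30.0, file=None, **kwargs):
    """Call func(*args, **kwargs) with a stack-dump watchdog armed.

    Hypothetical helper, for illustration only. If the call has not returned
    after timeout_s seconds, the tracebacks of all Python threads are written
    to `file` (stderr by default). The dump does not interrupt the call; it
    only shows where each thread is stuck.
    """
    target = sys.stderr if file is None else file
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=target)
    try:
        return func(*args, **kwargs)
    finally:
        # Disarm the watchdog once the call has returned or raised.
        faulthandler.cancel_dump_traceback_later()
```

In the SuperGlue script, the wrapped call would be something like `run_with_watchdog(torch.load, str(path))`, assuming torch is importable in that environment.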

@matlabbe
Member

matlabbe commented Sep 24, 2022

At least on ROS it works. I tested by adding ros noetic in the rtabmap_frontiers docker image.

Launch the docker image:

export XAUTH=/tmp/.docker.xauth
touch $XAUTH
xauth nlist $DISPLAY | sed -e 's/^..../ffff/' | xauth -f $XAUTH nmerge -

docker run --gpus all -it --rm --ipc=host --runtime=nvidia     \
    --env="DISPLAY=$DISPLAY"     \
    --env="QT_X11_NO_MITSHM=1"     \
    --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw"     \
    --env="XAUTHORITY=$XAUTH"     \
    --volume="$XAUTH:$XAUTH"     \
    --network host \
    --privileged  \
    rtabmap_frontiers 

Install ROS Noetic and build rtabmap_ros in the container. Then, after launching a RealSense D435i like in this tutorial, run from inside the container:

roslaunch rtabmap_ros rtabmap.launch args:="-d  \
        --SuperPoint/ModelPath /workspace/scripts/superpoint_v1.pt \
        --SuperGlue/Path /workspace/scripts/SuperGluePretrainedNetwork/rtabmap_superglue.py \
        --Reg/RepeatOnce false \
        --Vis/CorGuessWinSize 0 \
        --Kp/DetectorStrategy 11 \
        --Vis/FeatureType 11 \
        --Mem/UseOdomFeatures false \
        --Vis/CorNNType 6" \
      depth_topic:=/camera/aligned_depth_to_color/image_raw \
      rgb_topic:=/camera/color/image_raw \
      camera_info_topic:=/camera/color/camera_info \
      approx_sync:=false \
      wait_imu_to_init:=true \
      imu_topic:=/rtabmap/imu

To make sure no second matching is done after SuperGlue, set --Reg/RepeatOnce false --Vis/CorGuessWinSize 0. In this example, both the rtabmap and rgbd_odometry nodes use SuperPoint/SuperGlue. To use SuperPoint/SuperGlue only on the rtabmap node, replace args with rtabmap_args.

@mattiasmar

Hello,
What's the status of this issue?

@matlabbe
Member

matlabbe commented Jun 4, 2023

  • On ROS (rtabmap and odometry nodes): working
  • Reprocess tool: working
  • Matching tool: working
  • standalone: freezing

@GVMCOTESA

I am trying to use Superglue via that dockerfile. I installed ROS Noetic using the standard method and everything works okay until I use catkin_make, which brings an error where it cannot find empy. I have found there is an issue when multiple interpreters are installed, but everything seems to be pointing to the missing dependency. The problem is that the environment uses conda, and it cannot install empy due to a conflict with other packages installed in the image. How did you manage to install ROS and rtabmap_ros?

@matlabbe
Member

@GVMCOTESA

I am doing that inside docker, but there are now two versions of Python. In the last-mentioned issue, you do not seem to have a problem with that. Are you using catkin_make or catkin build to build in Noetic? If I try to use pip instead of conda or the pytorch image, the problem is that pytorch installed with pip does not use the C++11 ABI, so rtabmap fails to build with "undefined reference to" errors. If you install libtorch with the C++11 ABI, then pytorch is not installed, and if you install both, they clash and Python segfaults. Building with conda results in the error I mentioned in my previous comment.

To clarify, I am using the commands in Step 1 to 3 inside docker.

@mattiasmar

mattiasmar commented Jun 23, 2023

I'm testing SuperPoint/SuperGlue on freiburg2_pioneer_slam3 dataset. I can see detections, but no matches.
@cdb0y511 @matlabbe can you confirm that superglue is expected to work on this dataset?

This is how I test it:
./install/rtabmap/bin/rtabmap-rgbd_dataset --cameras 1 --Rtabmap/PublishRAMUsage true --Rtabmap/DetectionRate 2 --RGBD/LinearUpdate 0 --Mem/STMSize 30 --Mem/UseOdomFeatures false --Vis/CorNNType 6 --Kp/DetectorStrategy 11 --Vis/FeatureType 11 --Reg/RepeatOnce false --SuperGlue/Path ~/ws/src/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/rtabmap_superglue.py --SuperPoint/ModelPath ~/ws/src/rtabmap/archive/2022-IlluminationInvariant/scripts/superpoint_v1.pt --PyMatcher/Cuda false --SuperPoint/Cuda false /data/TUM/rgbd_dataset_freiburg2_pioneer_slam3
Thanks!

@matlabbe
Member

matlabbe commented Jun 26, 2023

SuperGlue should work for loop closure detection, not odometry (unless you choose F2F odometry). Here is a small comparison between different approaches.

  • Default parameters (GFTT features for odom and loop closure, with standard nearest neighbor):
    rtabmap-rgbd_dataset --cameras 1 --Rtabmap/PublishRAMUsage true --Rtabmap/DetectionRate 2 --RGBD/LinearUpdate 0 --Mem/STMSize 30 --Mem/UseOdomFeatures true --Vis/CorNNType 1 --Kp/DetectorStrategy 8 --Vis/FeatureType 8 --Reg/RepeatOnce true --Odom/ResetCountdown 10 --Vis/CorNNDR 0.8 rgbd_dataset_freiburg2_pioneer_slam3
    Screenshot from 2023-06-25 17-54-43

  • Default parameters for odom (GFTT), but using SuperPoint + default NN matching for loop closure:
    rtabmap-rgbd_dataset --cameras 1 --Rtabmap/PublishRAMUsage true --Rtabmap/DetectionRate 2 --RGBD/LinearUpdate 0 --Mem/STMSize 30 --Mem/UseOdomFeatures true --Vis/CorNNType 6 --Kp/DetectorStrategy 11 --Vis/FeatureType 11 --Reg/RepeatOnce false --SuperGlue/Path ~/workspace/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/rtabmap_superglue.py --SuperPoint/ModelPath ~/superpoint_v1.pt --PyMatcher/Cuda true --SuperPoint/Cuda true --Odom/ResetCountdown 10 --Vis/CorNNDR 0.6 rgbd_dataset_freiburg2_pioneer_slam3
    Screenshot from 2023-06-25 18-12-45

  • Default parameters for odom (GFTT), but using SuperPoint + SuperGlue for loop closure:
    rtabmap-rgbd_dataset --cameras 1 --Rtabmap/PublishRAMUsage true --Rtabmap/DetectionRate 2 --RGBD/LinearUpdate 0 --Mem/STMSize 30 --Mem/UseOdomFeatures false --Vis/CorNNType 6 --Kp/DetectorStrategy 11 --Vis/FeatureType 8 --Reg/RepeatOnce false --SuperGlue/Path ~/workspace/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/rtabmap_superglue.py --SuperPoint/ModelPath ~/superpoint_v1.pt --PyMatcher/Cuda true --SuperPoint/Cuda true --Odom/ResetCountdown 10 --Vis/CorNNDR 0.8 rgbd_dataset_freiburg2_pioneer_slam3
    Screenshot from 2023-06-25 17-43-55_odomgftt_loopsuperpoint

  • SuperPoint for odom, Superpoint+Superglue for loop closure detection:
    rtabmap-rgbd_dataset --cameras 1 --Rtabmap/PublishRAMUsage true --Rtabmap/DetectionRate 2 --RGBD/LinearUpdate 0 --Mem/STMSize 30 --Mem/UseOdomFeatures true --Vis/CorNNType 6 --Kp/DetectorStrategy 11 --Vis/FeatureType 11 --Reg/RepeatOnce false --SuperGlue/Path ~/workspace/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/rtabmap_superglue.py --SuperPoint/ModelPath ~/superpoint_v1.pt --PyMatcher/Cuda true --SuperPoint/Cuda true --Odom/ResetCountdown 10 --Vis/CorNNDR 0.6 rgbd_dataset_freiburg2_pioneer_slam3
    Screenshot from 2023-06-25 17-40-04

For that dataset, it seems about 3 s of data are missing while the robot was rotating, around the 1671st frame. I fixed the code to make Odom/ResetCountdown work with that tool.

Looking at the results, the lack of loop closures with GFTT is related more to the binary descriptors than to the absence of SuperGlue. Here is a comparison of matching SuperPoint features with and without SuperGlue, respectively:
Screenshot from 2023-06-25 18-06-10
Screenshot from 2023-06-25 18-06-45
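For context, the "standard" nearest-neighbor matching used as the baseline above keeps a match only if the best candidate is clearly closer than the second best; Vis/CorNNDR is that distance ratio (0.8 or 0.6 in the commands above). Below is a rough pure-Python sketch of the idea. It is not RTAB-Map's implementation; `nndr_matches` and the brute-force search are hypothetical and only meant to illustrate the ratio test.

```python
import math


def nndr_matches(query, train, ratio=0.8):
    """Return (query_idx, train_idx) pairs that pass the nearest-neighbor distance ratio test.

    Hypothetical illustration: descriptors are plain sequences of floats and
    neighbors are found by brute force with the Euclidean distance.
    """
    matches = []
    for qi, q in enumerate(query):
        # Distances from this query descriptor to every train descriptor, ascending.
        dists = sorted((math.dist(q, t), ti) for ti, t in enumerate(train))
        if len(dists) < 2:
            continue  # the ratio test needs two neighbors
        (d1, best), (d2, _) = dists[0], dists[1]
        # Keep the match only if the best neighbor is clearly closer than the second best.
        if d1 <= ratio * d2:
            matches.append((qi, best))
    return matches
```

A lower ratio rejects more ambiguous matches. SuperGlue replaces this per-feature test with a learned matcher that looks at both keypoint sets jointly.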

@matlabbe
Member

@GVMCOTESA what is your base image? Is it the one from nvidia like in the frontiers dockerfile?

@matlabbe
Member

I did it with native installed libraries. For docker, you may use frontiers dockerfile. If you want to go ROS, I also recently created an image for rtabmap_ros.

@GVMCOTESA

Yes, It is the frontiers one. I will try the new image, thank you.

@mattiasmar

  • On ROS (rtabmap and odometry nodes): working
  • Reprocess tool: working
  • Matching tool: working
  • standalone: freezing

By "matching tool", do you mean rtabmap-databaseViewer?
When I try to induce a loop closure in the DB viewer I get this error whenever the SP/SG is called more than once:

Superglue execution times:  0.8277442455291748 [-0.827739953994751]
[ INFO] (2023-07-06 21:01:10.415) PythonInterface.cpp:48::~PythonInterface() Py_Finalize() with thread = 673533952
[ INFO] (2023-07-06 21:01:10.654) DatabaseViewer.cpp:8262::refineConstraint() (1 ->2) Registration time: 1.713461 s
[ INFO] (2023-07-06 21:01:10.680) PythonInterface.cpp:25::PythonInterface() Py_Initialize() with thread = 673533952
[ INFO] (2023-07-06 21:01:10.706) PyMatcher.cpp:33::PyMatcher() path = /root/ws/src/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/rtabmap_superglue.py
[ INFO] (2023-07-06 21:01:10.706) PyMatcher.cpp:34::PyMatcher() model = indoor
Segmentation fault (core dumped)

@matlabbe Are you seeing this too?

@matlabbe
Member

The matching tool is not rtabmap-databaseViewer. In the database viewer, it may work once, then segfault the second time (when trying to re-initialize the Python classes).

@GVMCOTESA

It worked. I was not aware that rtabmap_ros must be in its own container, separated from the rest; I now have a separate container for simulating the robot. That said, something must be misconfigured on my side: the map rotates wildly each time a loop closure is detected, and it doesn't seem to converge. Is there a set of calibration parameters that could help reduce this?
image

@matlabbe
Member

Can you share the database?

@matlabbe
Member

Regarding the app freezing on SuperGlue initialization (#896 (comment)), here is a gdb log taken when it happens:

#0  0x00007ffff078f1f1 in PyThreadState_Clear (tstate=0x7fff0c6dcb00) at ../Python/pystate.c:764
#1  0x00007ffefd33794d in pybind11::gil_scoped_acquire::dec_ref() () at /home/mathieu/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so
#2  0x00007ffefd33798d in pybind11::gil_scoped_acquire::~gil_scoped_acquire() () at /home/mathieu/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so
#3  0x00007ffefd6efbcd in torch::autograd::PyFunctionTensorPreHook::~PyFunctionTensorPreHook() () at /home/mathieu/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so
#4  0x00007ffefd6efbed in torch::autograd::PyFunctionTensorPreHook::~PyFunctionTensorPreHook() () at /home/mathieu/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so
#5  0x00007fffd80125cf in torch::autograd::AutogradMeta::~AutogradMeta() () at /home/mathieu/.local/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so
#6  0x00007fffeea9da42 in c10::TensorImpl::~TensorImpl() () at /home/mathieu/.local/lib/python3.8/site-packages/torch/lib/libc10.so
#7  0x00007fffeea9dbed in c10::TensorImpl::~TensorImpl() () at /home/mathieu/.local/lib/python3.8/site-packages/torch/lib/libc10.so
#8  0x00007ffefd704d78 in THPVariable_clear(THPVariable*) () at /home/mathieu/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so
#9  0x00007ffefd705125 in THPVariable_subclass_dealloc(_object*) () at /home/mathieu/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so
#10 0x00007ffff0878165 in _Py_DECREF (filename=<synthetic pointer>, lineno=541, op=<optimized out>) at ../Include/object.h:478
#11 _Py_XDECREF (op=<optimized out>) at ../Include/object.h:541
#12 free_keys_object (keys=0x7ffd3f192020) at ../Objects/dictobject.c:584
#13 0x00007ffff0878818 in dictkeys_decref (dk=0x7ffd3f192020) at ../Objects/dictobject.c:324
#14 dict_dealloc (mp=0x7fff459e0340) at ../Objects/dictobject.c:1998
#15 0x00007ffff08743a6 in odict_dealloc (self=0x7fff459e0340) at ../Objects/odictobject.c:1367
#16 0x00007ffff067cd9e in _Py_DECREF (filename=<synthetic pointer>, lineno=4971, op=<optimized out>) at ../Include/object.h:478
#17 call_function (tstate=0x7ffda6635b30, pp_stack=0x7fff14ae3930, oparg=<optimized out>, kwnames=0x0) at ../Python/ceval.c:4971
#18 0x00007ffff0684ef6 in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3469
#19 0x00007ffff07d2e4b in _PyEval_EvalCodeWithName
    (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=2, kwnames=0x0, kwargs=0x7fff14ae3b60, kwcount=0, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x7fff4d8b3730, name=0x7fff14d43530, qualname=0x7fff4d8b2a30) at ../Python/ceval.c:4298
#20 0x00007ffff08b0124 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:436
#21 0x00007ffff08b2417 in _PyObject_FastCallDict (callable=callable@entry=0x7fff4d8b78b0, args=args@entry=0x7fff14ae3b50, nargsf=nargsf@entry=2, kwargs=kwargs@entry=0x0)
    at ../Objects/call.c:96
#22 0x00007ffff08b252d in _PyObject_Call_Prepend (callable=0x7fff4d8b78b0, obj=<optimized out>, args=0x7fff14d164c0, kwargs=0x0) at ../Objects/call.c:888
#23 0x00007ffff084bd47 in slot_tp_init (self=0x7fff14d384f0, args=0x7fff14d164c0, kwds=0x0) at ../Objects/typeobject.c:6790
#24 0x00007ffff08511b9 in type_call (type=<optimized out>, args=0x7fff14d164c0, kwds=0x0) at ../Objects/typeobject.c:994
#25 0x00007ffff08b0b2b in _PyObject_MakeTpCall (callable=0x7ffe5da1e7b0, args=<optimized out>, nargs=<optimized out>, keywords=0x0) at ../Objects/call.c:159
#26 0x00007ffff067cdf3 in _PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=<optimized out>, callable=0x7ffe5da1e7b0) at ../Include/cpython/abstract.h:125
#27 _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>) at ../Include/cpython/abstract.h:115
#28 call_function (tstate=0x7ffda6635b30, pp_stack=0x7fff14ae3d58, oparg=<optimized out>, kwnames=0x0) at ../Python/ceval.c:4963
#29 0x00007ffff067e46d in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3500
#30 0x00007ffff068806b in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=5, globals=<optimized out>) at ../Objects/call.c:284
#31 0x00007ffff08b0f23 in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>) at ../Include/cpython/abstract.h:147
#32 _PyObject_FastCall (nargs=<optimized out>, args=<optimized out>, func=<optimized out>) at ../Include/cpython/abstract.h:147
#33 _PyObject_CallFunctionVa (callable=0x7fff14b4e8b0, format=<optimized out>, va=va@entry=0x7fff14ae3ec0, is_size_t=is_size_t@entry=0) at ../Objects/call.c:941
#34 0x00007ffff08b218f in _PyObject_CallFunctionVa (is_size_t=0, va=0x7fff14ae3ec0, format=<optimized out>, callable=<optimized out>) at ../Objects/call.c:914
#35 PyObject_CallFunction (callable=<optimized out>, format=<optimized out>) at ../Objects/call.c:961
#36 0x00007ffff73eaf33 in rtabmap::PyMatcher::match(cv::Mat const&, cv::Mat const&, std::vector<cv::KeyPoint, std::allocator<cv::KeyPoint> > const&, std::vector<cv::KeyPoint, std::allocator<cv::KeyPoint> > const&, cv::Size_<int> const&) () at /home/mathieu/workspace/rtabmap/build/bin/librtabmap_core.so.0.21
#37 0x00007ffff724cc2c in rtabmap::RegistrationVis::computeTransformationImpl(rtabmap::Signature&, rtabmap::Signature&, rtabmap::Transform, rtabmap::RegistrationInfo&) const ()
    at /home/mathieu/workspace/rtabmap/build/bin/librtabmap_core.so.0.21

I think the problem is that:

  1. we create the Python interpreter in the main thread of rtabmap,
  2. the matching call is done inside a sub-thread, calling Python from C++, so the GIL has to be acquired,
  3. then, in the SuperGlue Python code, PyTorch's C functions are called; pybind11 releases the GIL for the C code and re-acquires it afterwards, triggering some memory clearing that makes the app freeze.

The difference between standalone and ROS is that, for the latter, the Python interpreter runs in the same thread as the one PyTorch runs on. Questions:

  • Should we create one Python interpreter per thread to avoid this situation? It looks like overkill if rtabmap and odometry both load the same Python modules.
  • Is there a way for the GIL to be acquired/released from a thread other than the one that created the Python interpreter? Based on the example above, it seems so, but we could improve that example with Python code calling another C function that would release/acquire the GIL.
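A third option, short of one interpreter per thread, would be to make sure every Python/PyTorch call runs on the same dedicated thread, whichever thread requests it. The idea is sketched below in pure Python for brevity (in rtabmap it would have to live on the C++ side). `SingleThreadMatcher` is a hypothetical name, and this is not necessarily what the eventual fix does.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SingleThreadMatcher:
    """Funnel all calls to a matcher function through one dedicated worker thread.

    Hypothetical illustration: initialization and every later call happen on
    the same thread, so no thread-state switching is needed between calls.
    """

    def __init__(self, match_fn):
        self._match_fn = match_fn
        # max_workers=1: every submitted call runs on the same worker thread.
        self._executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="py-matcher")
        # Warm up the worker so it exists before the first real call, and record its name.
        self.worker_name = self._executor.submit(
            lambda: threading.current_thread().name
        ).result()

    def match(self, *args, **kwargs):
        # Blocks the calling thread until the worker has finished the call.
        return self._executor.submit(self._match_fn, *args, **kwargs).result()

    def close(self):
        self._executor.shutdown(wait=True)
```

The cost is that calls from different threads are serialized, which may be acceptable here since rtabmap and odometry would otherwise contend for the GIL anyway.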

matlabbe added a commit that referenced this issue Sep 17, 2023
@matlabbe
Member

matlabbe commented Sep 17, 2023

Fixed in f1cd819

@cdb0y511
Contributor Author

I am doing that inside docker, but there are now two versions of Python. In the last-mentioned issue, you do not seem to have a problem with that. Are you using catkin_make or catkin build to build in Noetic? If I try to use pip instead of conda or the pytorch image, the problem is that pytorch installed with pip does not use the C++11 ABI, so rtabmap fails to build with "undefined reference to" errors. If you install libtorch with the C++11 ABI, then pytorch is not installed, and if you install both, they clash and Python segfaults. Building with conda results in the error I mentioned in my previous comment.

To clarify, I am using the commands in Step 1 to 3 inside docker.

I am glad we can skip Docker for now, because I found it has some performance issues of its own.
Unfortunately, you have to build libtorch from source to avoid the undefined reference errors related to the C++11 ABI.
Related to #1063

@mattiasmar

@cdb0y511 I also note that building pytorch from source avoids the "undefined reference" errors. However, I see a severe (>>10x) performance loss with this pytorch compiled from source.
I'm testing on CPU only and I compile with the flag ENV USE_MKLDNN=1. Prior to that I install Intel's oneDNN like this:

git clone --branch v3.4-pc --recursive https://github.com/oneapi-src/oneDNN.git /one-dnn
mkdir -p build && cd build && cmake .. && make -j  && make install

Question: Did you also observe a loss in inference speed when building pytorch from source? Did you overcome it in some way?
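To compare builds on equal terms, timing the same call under each install with a few warm-up iterations helps rule out one-time initialization costs. Below is a small sketch using only the standard library. `median_runtime` is a hypothetical helper; the function passed to it would be whatever inference call is being compared.

```python
import statistics
import time


def median_runtime(fn, repeats=10, warmup=2):
    """Median wall-clock time of fn() in seconds, after a few untimed warm-up calls.

    Hypothetical helper for comparing the same workload across two builds.
    """
    for _ in range(warmup):
        fn()  # not timed: absorbs lazy initialization and cache effects
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

Running it once with the pip wheel and once with the source build, on the same inputs and thread settings, would give a like-for-like number to report.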

hellovuong pushed a commit to hellovuong/rtabmap that referenced this issue Jan 27, 2024
@qetuo105487900

qetuo105487900 commented Feb 19, 2024

Sorry to bother everyone. Can SuperGlue run with RTAB-Map without Docker? Are the following steps correct?

  1. roslaunch realsense2_camera rs_camera.launch align_depth:=true
  2. roslaunch rtabmap_launch rtabmap.launch args:="-d
    --SuperPoint/ModelPath /home/lun/rtabmap/archive/2022-IlluminationInvariant/scripts/superpoint_v1.pt
    --SuperGlue/Path /home/lun/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/demo_superglue.py
    --Reg/RepeatOnce false
    --Vis/CorGuessWinSize 0
    --Kp/DetectorStrategy 11
    --Vis/FeatureType 11
    --Mem/UseOdomFeatures false
    --Vis/CorNNType 6"
    rtabmap_args:="--delete_db_on_start"
    depth_topic:=/camera/aligned_depth_to_color/image_raw
    rgb_topic:=/camera/color/image_raw
    camera_info_topic:=/camera/color/camera_info
    approx_sync:=false

I get this:
Features2d.cpp:594::create() SupertPoint Torch feature cannot be used as RTAB-Map is not built with the option enabled. GFTT/ORB is used instead.

@matlabbe
Member

See #1221 (comment)

@qetuo105487900

qetuo105487900 commented Feb 23, 2024

I ran the two commands below; the only difference between them is whether --Vis/CorNNType 6 is added.

roslaunch rtabmap_launch rtabmap.launch args:="-d
--delete_db_on_start
--SuperPoint/ModelPath /home/lun/rtabmap/archive/2022-IlluminationInvariant/scripts/superpoint_v1.pt
--SuperGlue/Path /home/lun/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/demo_superglue.py
--Reg/RepeatOnce false
--Vis/CorGuessWinSize 0
--Kp/DetectorStrategy 11
--Vis/FeatureType 11
--Mem/UseOdomFeatures false
--Vis/CorNNType 6"
depth_topic:=/rs_d435i/aligned_depth_to_color/image_raw
rgb_topic:=/rs_d435i/color/image_raw
camera_info_topic:=/rs_d435i/color/camera_info
approx_sync:=false

and I get

[ERROR] (2024-02-06 00:05:32.077) PyMatcher.cpp:63::PyMatcher() Module "demo_superglue" could not be imported! (File="/home/lun/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/demo_superglue.py")
[ERROR] (2024-02-06 00:05:32.077) PyMatcher.cpp:64::PyMatcher() Traceback (most recent call last):

File "/home/lun/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/demo_superglue.py", line 51, in <module>
import torch

File "/home/lun/.local/lib/python3.8/site-packages/torch/__init__.py", line 237, in <module>
from torch._C import * # noqa: F403

ImportError: /home/lun/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so: undefined symbol: _ZNK5torch3jit5Graph8toStringEb

But when I run

roslaunch rtabmap_launch rtabmap.launch args:="-d
--delete_db_on_start
--SuperPoint/ModelPath /home/lun/rtabmap/archive/2022-IlluminationInvariant/scripts/superpoint_v1.pt
--SuperGlue/Path /home/lun/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/demo_superglue.py
--Reg/RepeatOnce false
--Vis/CorGuessWinSize 0
--Kp/DetectorStrategy 11
--Vis/FeatureType 11
--Mem/UseOdomFeatures false"
depth_topic:=/rs_d435i/aligned_depth_to_color/image_raw
rgb_topic:=/rs_d435i/color/image_raw
camera_info_topic:=/rs_d435i/color/camera_info
approx_sync:=false

I get

Parameters.cpp:1149::parseArguments() Parameter migration from "SuperGlue/Path" to "PyMatcher/Path" (value=/home/lun/rtabmap/archive/2022-IlluminationInvariant/scripts/SuperGluePretrainedNetwork/demo_superglue.py).

Is this correct?

@matlabbe
Member

If you don't use --Vis/CorNNType 6, you are not using SuperGlue; you are still using SuperPoint, but with the standard KNN matching approach.
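The combinations discussed in this thread can be summarized in a small helper. The values come from the commands above (11 selects SuperPoint for Kp/DetectorStrategy, 6 selects SuperGlue for Vis/CorNNType). The helper names `parse_params` and `describe_pipeline` are hypothetical; the code only parses an argument string and is not part of RTAB-Map.

```python
import shlex

SUPERPOINT = "11"  # Kp/DetectorStrategy value used for SuperPoint in this thread
SUPERGLUE = "6"    # Vis/CorNNType value used for SuperGlue in this thread


def parse_params(arg_string):
    """Turn '--Group/Name value ...' into a {name: value} dict.

    Options without a 'Group/Name' form (e.g. -d, --delete_db_on_start) are skipped.
    """
    tokens = shlex.split(arg_string)
    params = {}
    for i, tok in enumerate(tokens[:-1]):
        nxt = tokens[i + 1]
        if tok.startswith("--") and "/" in tok and not nxt.startswith("--"):
            params[tok[2:]] = nxt
    return params


def describe_pipeline(arg_string):
    """Say which feature/matcher combination an argument string selects."""
    p = parse_params(arg_string)
    features = "SuperPoint" if p.get("Kp/DetectorStrategy") == SUPERPOINT else "other features"
    matcher = "SuperGlue" if p.get("Vis/CorNNType") == SUPERGLUE else "non-SuperGlue matching"
    return f"{features} + {matcher}"
```

Applied to the two commands above, the first would report "SuperPoint + SuperGlue" and the second "SuperPoint + non-SuperGlue matching".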

So you get this error when using superglue:

ImportError: /home/lun/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so: undefined symbol: _ZNK5torch3jit5Graph8toStringEb

Has PyTorch been built from source? If you rebuilt PyTorch from source, uninstall the one installed with pip.
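A quick way to check for this situation (a pip-installed torch coexisting with another build) is to list every torch package directory visible on the module search path, without importing torch. Below is a sketch; `find_package_dirs` is a hypothetical helper, and more than one hit is only a hint of a possible clash, not proof of one.

```python
import os
import sys


def find_package_dirs(name, search_path=None):
    """Return every directory on the search path that contains a package called `name`.

    Hypothetical helper: it only looks for <entry>/<name>/__init__.py and
    never imports the package, so it also works when the import itself fails.
    """
    hits = []
    for entry in (sys.path if search_path is None else search_path):
        # An empty entry in sys.path stands for the current directory.
        candidate = os.path.join(entry or os.getcwd(), name)
        if os.path.isfile(os.path.join(candidate, "__init__.py")):
            real = os.path.realpath(candidate)
            if real not in hits:
                hits.append(real)
    return hits
```

Calling `find_package_dirs("torch")` from the same Python that rtabmap embeds should ideally return a single directory.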

@qetuo105487900

qetuo105487900 commented Feb 25, 2024

@matlabbe
Screenshot from 2024-02-25 22-43-10

I followed this website:
https://zhuanlan.zhihu.com/p/363611229

This does not build PyTorch, right?

@HelmutE89

@matlabbe Screenshot from 2024-02-25 22-43-10

I followed this website: https://zhuanlan.zhihu.com/p/363611229

This does not build PyTorch, right?

These are the already compiled pytorch libraries. But the "Download here (cxx11 ABI)" worked for me without having to compile pytorch from the sources. I unpacked them into my home directory "/home/he/projects/libtorch".

I added the library path by adding this line to the end of my ~/.bashrc:
export LD_LIBRARY_PATH=/home/he/projects/libtorch/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Then I built my ROS 2 Humble workspace, specifying the CMake directory of the library:
colcon build --symlink-install --cmake-args -DCMAKE_BUILD_TYPE=Release -DWITH_TORCH=ON -DWITH_PYTHON=ON -DTorch_DIR=/home/he/projects/libtorch/share/cmake/Torch --packages-up-to rtabmap_ros

@qetuo105487900
Copy link

@HelmutE89 Thanks, but I still have the same problem.
So, can I assume that SuperPoint + standard KNN is better than SuperPoint, but maybe worse than SuperGlue?

@matlabbe
Member

matlabbe commented Mar 2, 2024

In practice, you can already get great performance with SuperPoint and standard KNN. However, SuperGlue generally gives more matches than KNN and can resolve very large viewpoint differences (great for loop closure detection).

@qetuo105487900

In practice, you can already get great performance with SuperPoint and standard KNN. However, SuperGlue generally gives more matches than KNN and can resolve very large viewpoint differences (great for loop closure detection).

@matlabbe Thank you for your response.

hellovuong pushed a commit to hellovuong/rtabmap that referenced this issue Apr 23, 2024
…some elemSize opencv asserts in debug build (876)"

This reverts commit f7e4b38.