Backup Discovery Server Seg Faults on Repeated Use #784

Open
a-krawciw opened this issue Oct 5, 2024 · 8 comments
Labels
more-information-needed Further information is required

@a-krawciw

a-krawciw commented Oct 5, 2024

Bug report

Required Info:

  • Operating System:
    Ubuntu 22.04 ROS 2 Humble
  • Installation type:
    Binaries
  • Version or commit hash:
    6.2.6-1jammy.20240517.161150
  • DDS implementation:
    rmw: rmw_fastrtps_cpp
  • Client library (if applicable):
    N/A

Steps to reproduce issue

The primary server is running on the robot over the network with id 0.

The first time I ran

$ source /opt/ros/humble/setup.bash && fast-discovery-server -i 1 -p 11811 -b
### Server is running ###
  Participant Type:   BACKUP
  Security:           NO
  Server ID:          1
  Server GUID prefix: 44.53.03.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###

everything worked as expected.

The second time I had to use ID 2:

$ source /opt/ros/humble/setup.bash && fast-discovery-server -i 1 -p 11811 -b
Segmentation fault (core dumped)
$ source /opt/ros/humble/setup.bash && fast-discovery-server -i 2 -p 11811 -b
### Server is running ###
  Participant Type:   BACKUP
  Security:           NO
  Server ID:          2
  Server GUID prefix: 44.53.03.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###

The next time I ran the same commands, IDs 1 and 2 both failed and only ID 3 started:

$ source /opt/ros/humble/setup.bash && fast-discovery-server -i 1 -p 11811 -b
Segmentation fault (core dumped)
$ source /opt/ros/humble/setup.bash && fast-discovery-server -i 2 -p 11811 -b
fast-discovery-server: malloc.c:2617: sysmalloc: Assertion '(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.
Aborted (core dumped)
$ source /opt/ros/humble/setup.bash && fast-discovery-server -i 3 -p 11811 -b
### Server is running ###
  Participant Type:   BACKUP
  Security:           NO
  Server ID:          3
  Server GUID prefix: 44.53.03.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###

Expected behavior

It should be possible to run, stop, and re-run the discovery server with the same ID as long as the original server has shut down.
Instead, each time I run the server I have to keep increasing the ID before it will start.
This persists through a reboot. Is there some cache file that is not being deleted properly?
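
The two failures in the transcript above end in different signals: `Segmentation fault (core dumped)` is SIGSEGV, while the `sysmalloc` assertion ends in `Aborted`, which is SIGABRT. When repeated launches are scripted, the shell reports a signal death as exit status 128 plus the signal number. The following is a minimal sketch of a helper that labels that status; `describe_status` is a hypothetical name introduced here, and the signal numbers are the Linux values.

```shell
# Hypothetical helper: translate a shell exit status into a short label.
# Signal numbers are the Linux values (SIGINT=2, SIGABRT=6, SIGSEGV=11),
# so a process killed by a signal shows up as 128 + that number.
describe_status() {
    case "$1" in
        0)   echo "clean exit" ;;
        130) echo "SIGINT (Ctrl+C)" ;;
        134) echo "SIGABRT (abort, e.g. failed assertion)" ;;
        139) echo "SIGSEGV (segmentation fault)" ;;
        *)   echo "exit status $1" ;;
    esac
}

# Usage (not run here): launch the server, then label how it ended.
#   fast-discovery-server -i 1 -p 11811 -b; describe_status $?
```

A server that handles Ctrl+C itself and shuts down normally would be expected to report 0 rather than 130, so a 134 or 139 in a log is what separates the crash from an ordinary stop.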

@a-krawciw
Author

Update: if I delete the .json and .db3 files that the backup server creates in my home directory before running it again, it works properly.

So the bug seems to be in re-loading that saved state when the backup server is restarted at a later time.
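
As a stopgap, the workaround described here can be scripted so that the stale files are moved aside rather than deleted, which keeps them available for debugging. This is only a sketch: `stash_backup_state` is a hypothetical helper, and the glob patterns in the usage comment are assumptions based on the file names mentioned in this thread, so check what is actually in the directory first.

```shell
# Hypothetical helper: move the given files into a stash directory and
# print how many were moved. Files are moved, not deleted, so the state
# that triggers the crash can still be inspected or attached to the issue.
stash_backup_state() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    moved=0
    for f in "$@"; do
        # An unmatched glob is passed through literally; skip non-files.
        [ -f "$f" ] || continue
        mv -- "$f" "$dest"/ && moved=$((moved + 1))
    done
    echo "$moved"
}

# Usage (not run here; the patterns are assumptions, list the files first):
#   ls "$HOME"/server-*.json "$HOME"/*.db3
#   stash_backup_state "$HOME/discovery-backup-stash" "$HOME"/server-*.json "$HOME"/*.db3
#   fast-discovery-server -i 1 -p 11811 -b
```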

@fujitatomoya
Collaborator

The primary server is running on the robot over the network with id 0.

root@tomoyafujita:~/ros2_ws/colcon_ws# fastdds discovery --server-id 0
### Server is running ###
  Participant Type:   SERVER
  Security:           NO
  Server ID:          0
  Server GUID prefix: 44.53.00.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811

This already binds port 11811.

fast-discovery-server -i 1 -p 11811 -b

This should fail with the following error, since the port cannot be allocated.

root@tomoyafujita:~/ros2_ws/colcon_ws# fast-discovery-server -i 1 -p 11811 -b
2024-10-07 15:37:53.593 [RTPS_PARTICIPANT Error] Discovery Server wasn't able to allocate the specified listening port. -> Function createParticipant
Server creation failed with the given settings. Please review locators setup.
2024-10-07 15:37:53.596 [DOMAIN_PARTICIPANT Error] Problem creating RTPSParticipant -> Function enable

Can you provide the complete step-by-step procedure, including the primary server setup?
I tried to reproduce the issue, but it does not happen with a Rolling source build.

@fujitatomoya
Collaborator

With Humble, I see the same expected error as with Rolling, shown below. I would like to see the complete procedure that makes this happen.

root@tomoyafujita:~/ros2_ws/humble_ws# fastdds discovery --server-id 0
### Server is running ###
  Participant Type:   SERVER
  Security:           NO
  Server ID:          0
  Server GUID prefix: 44.53.00.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811

and then,

root@tomoyafujita:~/ros2_ws/humble_ws# fast-discovery-server -i 1 -p 11811 -b
2024-10-07 15:46:35.175 [RTPS_PARTICIPANT Error] Discovery Server wasn't able to allocate the specified listening port. -> Function createParticipant
Server creation failed with the given settings. Please review locators setup.
2024-10-07 15:46:35.178 [DOMAIN_PARTICIPANT Error] Problem creating RTPSParticipant -> Function enable

I also tried using other ports and stopping and restarting the discovery server with the backup option, but so far I cannot reproduce the issue in my local environment.

fujitatomoya added the more-information-needed label (Further information is required) on Oct 7, 2024
fujitatomoya self-assigned this on Oct 7, 2024
@a-krawciw
Author

Hi @fujitatomoya thanks for looking into this.

Sorry if the setup wasn't clear. I have two physical machines connected by Ethernet. That's why port 11811 is available to both discovery servers.

My setup
Robot IP: 192.168.131.1
fast-discovery-server -i 0 -p 11811

Laptop IP: 192.168.131.10
fast-discovery-server -i 1 -p 11811 -b

The ROS_DISCOVERY_SERVER variable is set to ROS_DISCOVERY_SERVER="192.168.131.1:11811;192.168.131.10:11811" on both machines.

If there are no files related to the backup server on my laptop, the setup works fine. If the server-*.json and .db files are present, I get the seg fault.
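
For reference, the `ROS_DISCOVERY_SERVER` value in this setup is a semicolon-separated list, and in the Fast DDS version shipped with Humble the position of an entry in that list is taken as the server ID (first entry is ID 0, second is ID 1). A minimal sketch of building the value from an ordered list of addresses; `discovery_server_list` is a hypothetical helper name, and the addresses are the ones given above.

```shell
# Hypothetical helper: join "host:port" arguments with ';' in order.
# The position of each argument becomes its position in the list, so an
# empty argument is kept as an empty slot instead of being dropped.
discovery_server_list() {
    list=
    first=1
    for addr in "$@"; do
        if [ "$first" -eq 1 ]; then
            list=$addr
            first=0
        else
            list="$list;$addr"
        fi
    done
    printf '%s\n' "$list"
}

# Usage (not run here): robot first (ID 0), laptop second (ID 1).
#   export ROS_DISCOVERY_SERVER="$(discovery_server_list 192.168.131.1:11811 192.168.131.10:11811)"
```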

@fujitatomoya
Collaborator

Using 2 machines on the same network:

  • Rolling with ros2/ros2@e1dbaf8: I cannot reproduce the segfault with the discovery server.
root@edgemaster:~/docker_ws/colcon_ws# fast-discovery-server -i 0 -p 11811
### Server is running ###
  Participant Type:   SERVER
  Security:           NO
  Server ID:          0
  Server GUID prefix: 44.53.00.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
root@edgemaster:~/docker_ws/colcon_ws# fast-discovery-server -i 0 -p 11811
### Server is running ###
  Participant Type:   SERVER
  Security:           NO
  Server ID:          0
  Server GUID prefix: 44.53.00.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###

root@tomoyafujita:~/ros2_ws/colcon_ws# fast-discovery-server -i 1 -p 11811 -b
### Server is running ###
  Participant Type:   BACKUP
  Security:           NO
  Server ID:          1
  Server GUID prefix: 44.53.01.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
root@tomoyafujita:~/ros2_ws/colcon_ws# fast-discovery-server -i 1 -p 11811 -b
### Server is running ###
  Participant Type:   BACKUP
  Security:           NO
  Server ID:          1
  Server GUID prefix: 44.53.01.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
root@tomoyafujita:~/ros2_ws/colcon_ws# fast-discovery-server -i 2 -p 11811 -b
### Server is running ###
  Participant Type:   BACKUP
  Security:           NO
  Server ID:          2
  Server GUID prefix: 44.53.02.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
root@tomoyafujita:~/ros2_ws/colcon_ws# fast-discovery-server -i 2 -p 11811 -b
### Server is running ###
  Participant Type:   BACKUP
  Security:           NO
  Server ID:          2
  Server GUID prefix: 44.53.02.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###

@fujitatomoya
Collaborator

Even with the released Humble environment, I cannot reproduce the issue.

@a-krawciw I need more information to reproduce this issue. Can you tell me how to make this happen, step by step and command by command?

root@edgemaster:~# fast-discovery-server -i 0 -p 11811
### Server is running ###
  Participant Type:   SERVER
  Security:           NO
  Server ID:          0
  Server GUID prefix: 44.53.00.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
root@edgemaster:~# fast-discovery-server -i 0 -p 11811
### Server is running ###
  Participant Type:   SERVER
  Security:           NO
  Server ID:          0
  Server GUID prefix: 44.53.00.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###

root@tomoyafujita:~# fast-discovery-server -i 1 -p 11811 -b
### Server is running ###
  Participant Type:   BACKUP
  Security:           NO
  Server ID:          1
  Server GUID prefix: 44.53.01.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
root@tomoyafujita:~# fast-discovery-server -i 1 -p 11811 -b
### Server is running ###
  Participant Type:   BACKUP
  Security:           NO
  Server ID:          1
  Server GUID prefix: 44.53.01.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
root@tomoyafujita:~# fast-discovery-server -i 2 -p 11811 -b
### Server is running ###
  Participant Type:   BACKUP
  Security:           NO
  Server ID:          2
  Server GUID prefix: 44.53.02.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
root@tomoyafujita:~# fast-discovery-server -i 2 -p 11811 -b
### Server is running ###
  Participant Type:   BACKUP
  Security:           NO
  Server ID:          2
  Server GUID prefix: 44.53.02.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###

@a-krawciw
Author

My setup is as follows.

Robot Computer 1:
fast-discovery-server -i 0 -p 11811

Laptop Computer 2:
fast-discovery-server -i 1 -p 11811 -b
Ctrl+C

Robot Computer 1:
Ctrl+C
fast-discovery-server -i 0 -p 11811

Laptop Computer 2:
fast-discovery-server -i 1 -p 11811 -b

This results in the seg fault for me.

@fujitatomoya
Collaborator

I cannot observe this problem with the latest Humble release.

  • machine-A
root@tomoyafujita:~# fast-discovery-server -i 0 -p 11811
### Server is running ###
  Participant Type:   SERVER
  Security:           NO
  Server ID:          0
  Server GUID prefix: 44.53.00.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
root@tomoyafujita:~# fast-discovery-server -i 0 -p 11811
### Server is running ###
  Participant Type:   SERVER
  Security:           NO
  Server ID:          0
  Server GUID prefix: 44.53.00.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###
  • machine-B
root@edgemaster:~# fast-discovery-server -i 1 -p 11811 -b
### Server is running ###
  Participant Type:   BACKUP
  Security:           NO
  Server ID:          1
  Server GUID prefix: 44.53.01.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C^C
### Server shut down ###
root@edgemaster:~# fast-discovery-server -i 1 -p 11811 -b
### Server is running ###
  Participant Type:   BACKUP
  Security:           NO
  Server ID:          1
  Server GUID prefix: 44.53.01.5f.45.50.52.4f.53.49.4d.41
  Server Addresses:   UDPv4:[0.0.0.0]:11811
^C
### Server shut down ###

Can you check whether the problem still happens after apt upgrade? The only difference I can see is that my ros-humble-rmw-fastrtps-cpp version is up to date.

root@tomoyafujita:~# dpkg -s ros-humble-rmw-fastrtps-cpp
Package: ros-humble-rmw-fastrtps-cpp
Status: install ok installed
Priority: optional
Section: misc
Installed-Size: 352
Maintainer: Michel Hidalgo <[email protected]>
Architecture: amd64
Version: 6.2.7-1jammy.20240728.212513
Depends: libc6 (>= 2.32), libgcc-s1 (>= 3.3.1), libstdc++6 (>= 11), ros-humble-fastcdr, ros-humble-fastrtps, ros-humble-ament-cmake, ros-humble-fastrtps-cmake-module, ros-humble-rcpputils, ros-humble-rcutils, ros-humble-rmw, ros-humble-rmw-dds-common, ros-humble-rmw-fastrtps-shared-cpp, ros-humble-rosidl-cmake, ros-humble-rosidl-runtime-c, ros-humble-rosidl-runtime-cpp, ros-humble-rosidl-typesupport-fastrtps-c, ros-humble-rosidl-typesupport-fastrtps-cpp, ros-humble-tracetools, ros-humble-ros-workspace
Description: Implement the ROS middleware interface using eProsima FastRTPS static code generation in C++.
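
To check whether a local install is behind the version shown above, the installed version can be compared with `dpkg --compare-versions`, which applies Debian version ordering. A minimal sketch, assuming the version in the original report (6.2.6-1jammy.20240517.161150) refers to this same package; `version_is_older` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds (exit 0) when version $1 sorts before
# version $2 under Debian version-ordering rules.
version_is_older() {
    dpkg --compare-versions "$1" lt "$2"
}

# Usage (not run here): compare the installed package with the version above.
#   installed=$(dpkg-query -W -f='${Version}' ros-humble-rmw-fastrtps-cpp)
#   if version_is_older "$installed" 6.2.7-1jammy.20240728.212513; then
#       echo "installed package is older; try: sudo apt update && sudo apt upgrade"
#   fi
```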
