Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nav2_map_server should not catch SIGSEGV, nor report it as "Magick: ..." #4703

Open
mferenduros-ocado opened this issue Oct 3, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@mferenduros-ocado
Copy link

mferenduros-ocado commented Oct 3, 2024

Bug report

Required Info:

  • Operating System:
    • Ubuntu 22.04.5 LTS
  • ROS2 Version:
    • Iron binaries
  • Version or commit hash:
    • ros-iron-navigation2 1.2.9-1jammy.20240815.145756
  • DDS implementation:
    • Cyclone

Steps to reproduce issue

Run nav2 with composition enabled
Trigger a segfault in any nav2 component (eg. write though a null pointer in a planner or an MPPI critic)

Expected behavior

The component container process segfaults

Actual behavior

An error message that suggests the problem relates to GraphicsMagick, eg

[component_container_isolated-26] Magick: abort due to signal 11 (SIGSEGV) "Segmentation Fault"...

This message has appears to have led several people (including myself) to bad assumptions about what's caused the segfault.

IMO the preferred behaviour would be to leave the signal uncaught so we can use (eg) core-dumps to diagnose crashes edit: Magick calls abort() so we do get core-dumps.

Additional information

Magick::InitializeMagick registers process-wide handlers for a bunch of signals, including SIGSEGV, only skipping those signals that already have handlers registered. A possible (albeit racy) fix might be something like

auto prev_segv_handler = signal(SIGSEGV, SIG_DFL);
InitializeMagick();
signal(SIGSEGV, prev_segv_handler);
@SteveMacenski
Copy link
Member

SteveMacenski commented Oct 3, 2024

I wholeheartedly agree that it is annoying when Magick logs crashes when other things crash. Its a known quirk that developers have learned to ignore and look elsewhere for the actual cause of the crash. However if there's a way to prevent that from happening without making the situation less clear to developers, that would be lovely.

Do you see any clear issues with your suggestion, such as when a signal from ROS launch is being communicated to the map server or seg faults in that process not being reported appropriately (should be SIGINT/SIGTERM so that should be fine)? Have you tested that change to resolve the issue? Any other signals we should redirect (perhaps SIGILL)?

I see 2 places in mapIO that we set the Magick initialization and they can both be called repeatedly, so I think we're safe from that perspective.

@SteveMacenski SteveMacenski added the enhancement New feature or request label Oct 3, 2024
@mferenduros-ocado
Copy link
Author

👍

I haven't tested a fix, but would be happy to test and submit a PR if you're ok with the general approach. It is theoretically racy - if another thread was getting and setting signal handlers at exactly the wrong moment we'd risk overwriting their handlers, but that applies just as much to what magick is currently doing anyway. And in practice no-one else is calling signal() that I can see.

Conceptually I think it's questionable for a component to set any signal handlers, but SIGSEGV is the only one causing genuine annoyance so maybe let's limit it to that one to begin with?

@SteveMacenski
Copy link
Member

SteveMacenski commented Oct 7, 2024

agreed.

auto prev_segv_handler = signal(SIGSEGV, SIG_DFL);

Also with components, would this impact the other components in the same process? My understanding is that signal is global to a process, not a particular thread.

This leads me to believe there really isn't a solution - other than upfront documentation - to getting the magick message on composed process segfaults. I think that's just a quirk we have to live with perhaps

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants