Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failures in lifecycle transitions are not correctly handled #1880

Open
alsora opened this issue Jan 31, 2022 · 0 comments
Open

Failures in lifecycle transitions are not correctly handled #1880

alsora opened this issue Jan 31, 2022 · 0 comments
Assignees

Comments

@alsora
Copy link
Collaborator

alsora commented Jan 31, 2022

I recently realized that there is a bug in the current implementation of lifecycle transitions.
However, it's unclear to me what should be the expected behavior, so opening a ticket to discuss.

The problem is the following.
Let's assume that the node is currently in the active state and there is a "deactivate" transition request.

A lifecycle transition is requested, thus calling change_state function https://github.com/ros2/rclcpp/blob/master/rclcpp_lifecycle/src/lifecycle_node_interface_impl.hpp#L379-L448

The code will then execute rcl_lifecycle_trigger_transition_by_id, trying to transition to the deactivating state.

In the RCL function _trigger_transition https://github.com/ros2/rcl/blob/master/rcl_lifecycle/src/rcl_lifecycle.c#L349-L371 we update the current state state_machine->current_state = transition->goal; before publishing notifications.
If the call to rcl_publish fails (and it can fail) the function will immediately return with an error code and this will result in immediately aborting the change_state function call https://github.com/ros2/rclcpp/blob/master/rclcpp_lifecycle/src/lifecycle_node_interface_impl.hpp#L396-L401 with an error along the lines of "Unable to start transition %u from current state %s: Failed to publish.

The problem is that, although the transition failed to start, we already updated the state machine so the node is now in the deactivatingstate.
The on_deactivate user callback has not been invoked.
Future requests either to activate or to deactivate will be rejected because they are not valid transitions from this state.

The first easy solution I thought of was to just move the state_machine->current_state = transition->goal; line after publishing the notification.
In this way, if the transition really failed to start, the node would remain in the original state.

The same situation can also happen while finalizing the transition.
However, here we can't stay in the current state (which would be "deactivating"), we can't even go back to the initial one, as we already invoked the user callback.

What's your thought on this problem?
Given the fact that transitions are made of multiple stages and that they all can fail, how should a user deal with this?
Should we always bring the node to the error state?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants