Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamically changing parameters in controller_server while computeControl is running causes deadlock #2833

Closed
dkuenster opened this issue Feb 25, 2022 · 9 comments

Comments

@dkuenster
Copy link
Contributor

Bug report

Required Info:

  • Operating System:
    • Debian 10
  • ROS2 Version:
    • Rolling Source
  • Version or commit hash:
  • DDS implementation:
    • Cyclone-DDS

Steps to reproduce issue

Use ParameterClient for controller_server from another node to dynamically change any parameters of controller_server while computeControl https://github.com/ros-planning/navigation2/blob/0a961df9cbe50356484877681abfdbeae6ba7b33/nav2_controller/src/controller_server.cpp#L344 is currently running.

Expected behavior

dynamicParametersCallback in controller_server.cpp https://github.com/ros-planning/navigation2/blob/0a961df9cbe50356484877681abfdbeae6ba7b33/nav2_controller/src/controller_server.cpp#L597 is blocked until we reach the current goal.

Actual behavior

Screenshot from 2022-02-25 13-53-19
I added an output right at the start of https://github.com/ros-planning/navigation2/blob/0a961df9cbe50356484877681abfdbeae6ba7b33/nav2_controller/src/controller_server.cpp#L344

Entering the the dynamicParametersCallback while computeControl is running causes a deadlock, controller_server stops sending a heartbeat and is shutdown.
Despite what was said in #2704 they seem to be running in the same thread, which causes the deadlock, as computeControl holds the mutex until it has reached the goal or it gets aborted.
If you remove the mutex from dynamicParametersCallback changing the parameters while computeControl is running works fine.
I'm however not sure why this causes follow_path to abort the current goal handle.

@SteveMacenski
Copy link
Member

Use ParameterClient for controller_server from another node to dynamically change any parameters of controller_server while computeControl

That is explicitly unsupported. It is dangerous to change controller parameters while an existing plugin is running. Changing of kinematic values or other core-behavioral parameters during execution could cause unstable behavior. This is a unique attribute of the controller server since this server deals with-- well.. the control 😉 .

However a dead lock is a pretty crappy way of handling, I agree. I'd support changing the dynamicParametersCallback() to attempt to lock the mutex with try_lock and return with a failure message / return result if the lock is already locked due to currently being under control. That way you don't get a deadlock and you get a meaningful result back if dynamic parameters are not current possible due to on-going execution.

Does that work for you?

@dkuenster
Copy link
Contributor Author

Got a version running with try_lock that rejects parameter changes while running, prints a failure message and returns a corresponding failure result. Could open a PR for that.
What about the dynamicParameterCallbacks in the goal and progress checker plugins though? They are currently not protected by a mutex. Is that intentional?

@SteveMacenski
Copy link
Member

Could open a PR for that.

Please do!

What about the goal and progress checker plugins

Good question. If memory serves @doisyg wanted the ability to change parameters in those systems on the fly. I would generally agree that changing these mid-request is not good practice, but I don't think it has the same instability potential unless being changed multiple times in a small window. Let us see what he says

@doisyg
Copy link
Contributor

doisyg commented Mar 1, 2022

Hi!
My use case requires setting the parameters of the goal and progress checkers before each FollowPath request, so outside computeControl. But I can totally think of use cases when these would need to be changed during the execution of a FollowPath action. For instance you may want to be less strict on you progress failure conditions when crossing specific areas (say crowded). I believe it is already working well like this.

However, I definitely need to be able to change some parameters of the controller while computeControl is executed. For instance the parameters controlling the distance to obstacles (like again increasing or decreasing them when crossing dangerous or safe areas). And it is working well for my controller (a forked TEB that I will merge when I ll have a minute).

As for the parameters of the controller_server, I am on galactic so they are not dynamic for me (and I don't have an use case for them yet)

@SteveMacenski
Copy link
Member

SteveMacenski commented Mar 1, 2022

So I think for now, we leave the plugins as-is but reject dynamic updates for the controller server itself.

The alternative is to fix the deadlock issue, which would be fine as well if someone preferred still being able to change the parameters at runtime and had a good reason for doing so. We do still need a mutex though to make sure that we don't have read/writes at the same time, but we can redesign computeControl() so that it periodically gives up mutex control each iteration so that this can execute.

@dkuenster
Copy link
Contributor Author

dkuenster commented Mar 2, 2022

Actually fixing the deadlock would be preferable probably.
The awkward thing right now is that all those callbacks are registered to the same node. So they all get called when you are changing parameters for the controller_server. So while you would be able to change parameters for the plugins dynamically you would still get the error message from the main controller_server callback, unless you checked if all parameters start with a plugin name before trying to acquire the mutex.

@SteveMacenski
Copy link
Member

SteveMacenski commented Mar 8, 2022

Actually fixing the deadlock would be preferable probably.

I'm not sure there's a good safe way to do that though other than atomic variables -- which we could employ here and remove the mutex. There's no classes being updated from parameter updates, only variables, so if we use atomic variables, then there's no reason they couldn't be left open without a mutex to wrap it. That seems like the move here.

I was thinking about adding the lock in the while loop and then having it unlock briefly at the end https://github.com/ros-planning/navigation2/blob/main/nav2_controller/src/controller_server.cpp#L405 so that it can process parameter callbacks then, then go to the next loop. The issue is that if the parameter update takes a "long" time relative to the 10,20,50hz replanning time, then you could miss the next update and cause a brief instability event. Depending on how the safety timeouts on the robot base works, it could be negligible or significant.

unless you checked if all parameters start with a plugin name before trying to acquire the mutex.

That seems entirely reasonable.

@SteveMacenski
Copy link
Member

Any update here?

@dkuenster
Copy link
Contributor Author

Sry I was on a business trip, created a pull request now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants