Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lifecyle node and C state_machine #265

Merged
merged 63 commits into from
Dec 14, 2016
Merged

Lifecyle node and C state_machine #265

merged 63 commits into from
Dec 14, 2016

Conversation

Karsten1987
Copy link
Contributor

up for design review!
Not crustified yet. Not tested yet (except the demo). First prototype of a lifecycle node demo.

  • rcl_lifecycle contains a C based state machine with primary states and transitions as depicted in design.ros2.org.

  • rclcpp_lifecycle contains the higher level c++ implementation of a managed node. A managed node has a rcl_state_machine_t object as member for keeping track of the state and possible transitions. Further, it has the same API as a rclcpp::Node for creating publishers etc. However, each managed node has a handle to all pub/sub/srv/clients in order to activate/deactivate them.

  • lifecycle_talker.cpp is a first demo application. Missing here is any kind of container which opens services for calling the transition functions such as on_configure() or on_activate(), etc.

First commits are equal to #258

@wjwwood wjwwood added the in progress Actively being worked on (Kanban column) label Nov 1, 2016
@Karsten1987 Karsten1987 self-assigned this Nov 1, 2016
Copy link
Member

@gbiggs gbiggs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be architecturally nicer if the state machine itself could be used to trigger the appropriate callbacks in the node. Then the external API would just have to trigger a transition from the current state, and leave the rest up to the machinery of the (generic) state machine implementation. This would simplify the implementation of the managed lifecycle node class.

* Pure virtual functions as defined in
* http://design.ros2.org/articles/node_lifecycle.html
*/
class NodeInterface
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not give the interface a decent name rather than putting it in a namespace? How about something like "ManagedLifecycleNodeInterface"?

/*static*/ const rcl_state_t LIFECYCLE_EXPORT rcl_state_inactive = {/*.state = */1, /*.label = */"inactive"};
/*static*/ const rcl_state_t LIFECYCLE_EXPORT rcl_state_active = {/*.state = */2, /*.label = */"active"};
/*static*/ const rcl_state_t LIFECYCLE_EXPORT rcl_state_finalized = {/*.state = */3, /*.label = */"finalized"};
/*static*/ const rcl_state_t LIFECYCLE_EXPORT rcl_state_error = {/*.state = */4, /*.label = */"error"};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are the following states:
CleaningUp
Configuring
Activating
Deactivating
ShuttingDown

ErrorProcessing appears to have been renamed to error. Not a problem, but we should update the design document to match.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only capture the 4 primary states. But yes, you're right. The remaining transition states will be added.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could argue that the transition states don't need to be actual states. The implementation of the state machine just needs to make sure the correct functions are called as it transitions between the "main" states.

However, having them as actual states would be useful for debugging, and it might make it easier to understand the execution time usage of a managed node that has real-time characteristics.

typedef struct LIFECYCLE_EXPORT _index
{
unsigned int index;
const char* label;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Label of what? The state? If we have the state index why do we need the label as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

label here was meant to be a human readable string such as "active" or "inactive".

* @brief LifecycleNode as child class of rclcpp Node
* has lifecycle nodeinterface for configuring this node.
*/
class LifecycleNode : public rclcpp::node::Node, public lifecycle_interface::NodeInterface
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is starting to show the problem of not having a Node interface. "Favor 'object composition' over 'class inheritance'" is often thrown around and abused, but C++ best practice generally says to not inherit implementations except for minor tweaks to behaviour. It may work as is for this case, but for a major entity like Node, having an abstract interface would be more flexible, and would make things more testable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. The node implementation needs a refreshing. @wjwwood is currently working on a refactoring which addresses the things you mentioned such as composition vs. inheritance.

}

LIFECYCLE_EXPORT
~LifecycleNode(){}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

= default?

{};

template<class T>
struct is_manageable_node<T, typename std::enable_if< has_on_activate<T>::value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big yes to doing things this way!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That will work on compile time. However, a base class is needed at some point for dynamic runtime loading of nodes. See ros2/demos#84 as an example.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two ideas are not incompatible.


LIFECYCLE_EXPORT
virtual bool
on_configure()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the right name for the external API for controlling the state machine.

This implementation mixes life cycle control (when to configure) and node-specific life cycle functionality (such as what to do in the Configuring transition state/onConfigure() callback) into a single function.

This function should be called configure and there should be a separate method, implemented by users who inherit from this class to define their own managed nodes, called onConfigure. In that function will go all their configuring stuff. In this function will go the life cycle management stuff (which is here already) and any configuration that should be done for every node.

This comment applies to the other transition functions as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I had in mind here is that these on_* functions can be overridden by users in order to take in place their own implementation. What's generally missing here is an external entry point (such as a LifecycleManager or similar) which handles commands such as configure and then calls on_configure on the respective lifecycle node.

Copy link
Member

@gbiggs gbiggs Nov 4, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the external API is what is missing. Your prototype's example gives the impression that you are treating the external and internal APIs the same.

I think that there should be a node interface that defines the external API. When the node is registered with a LIfecycleManager or something like that, then it would work with that API. A user could also manage their node's lifecycle directly from their main function, if they wish. Simultaneously, there needs to be a base class with virtual void methods (on_activate etc) for the implementer of a specific node to override to provide their own functionality.

If you like I can write up a short example, but I think you probably know what I mean and are already working in that direction.


lc_node->print_state_machine();

if (!lc_node->on_configure())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The external API of a managed lifecycle node for controlling transitions between these states is not the callbacks. There should be an API containing the 7 transitions mentioned in the design document (create, configure, etc.) When, for example, configure() is called on the node, it should execute the on_configure() callback and shift its state to the appropriate successor state based on the result.

See also my comment in the managed lifecycle node implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no external API at this point. I put the callback calls in the talker example for demonstration. I see the talker code here as a lifecycle manager or similar which itself then has a public external API for triggering configure, activate, etc. (also via service calls)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, perhaps "talker" is not a good name for the demonstratation class.


lc_node->print_state_machine();

if (!lc_node->on_configure())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good if we can minimise how much the user has to manually control the node's life cycle. Possibly that will come through another API layered on top, which this talker example should be modified to use when it's done.

* possible transitions registered with this
* state machine.
*/
typedef struct LIFECYCLE_EXPORT _rcl_state_machine_t
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you intend to implement managed lifecycle nodes in rcl as well?

If not, what is the purpose of writing the state machine in C rather than C++, or rather than using something like Boost's state machine library?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I indeed wrote this in C for multilanguage purposes such that every rcl* has the same set of states and valid transitions.
I believe ROS2 generally tries not to rely on boost.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no opinion on using or avoiding Boost, and I haven't actually looked at the Boost state machine library in quite a while. I just thought I would mention it as an option.

Anyway, I agree with having it in C so other languages can re-use the managed lifecycle as-is.

@Karsten1987 Karsten1987 force-pushed the lifecycle_impl branch 2 times, most recently from 276a1c6 to d092184 Compare November 4, 2016 23:32
Copy link
Member

@gbiggs gbiggs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the way this is going!

Copy link
Member

@gbiggs gbiggs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the value in the state machine implementation differentiating between states with on-going actions and those once-only states that occur between the on-going states? From a FSM point of view, there is no difference (one requires a trigger to leave, the other leaves as soon as it has done some particular processing). I think it would simplify the implementation if we did not differentiate between the two, except where we define the triggers that allow the machinery to move to the next state.

So "active" would wait for the triggers "error" or "deactivate" before leaving, but "deactivating" would run the on_deactivate function and then immediately transition to "error" or "inactive" based on the result.

)

macro(targets)
if(NOT target_suffix STREQUAL "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition is not necessary and doesn't do what you would expect. The condition is basically always true since target_suffix is never an empty string but undefined in that case. Just always calling get_rclcpp_information is fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

direct copy from https://github.com/ros2/examples/blob/master/rclcpp_examples/CMakeLists.txt#L30
we should generally define a CMakeLists.txt as an example.

install(TARGETS rcl_lifecycle
ARCHIVE DESTINATION lib
LIBRARY DESTINATION lib
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to install to bin on Windows / for RUNTIME DESTINATION.

@dirk-thomas
Copy link
Member

Maybe we should define the ROS-level interface of the managed node / the lifecycle manager. Similar to what is defined in rcl_interfaces for the parameters.

@Karsten1987
Copy link
Contributor Author

@gbiggs I try to replicate basically what's depicted here: http://design.ros2.org/articles/node_lifecycle.html
I agree that these transition states are not completely defined. I just want to keep the vocabulary as is. But this may be open for change.

@dirk-thomas Do you propose to generate the c-style structs via message generation?

@dirk-thomas
Copy link
Member

I was thinking about the services to control the state machine as well as the events / messages when the state changes. Kind of the external API used for introspection / orchestration. I am not sure if it makes sense for the structs.

//#define bool int;
// #ifdef __cplusplus
// #error WRONG COMPILER
// #endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about these comments?

//loop_rate.sleep();
++i;
}
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about these comments?

@gbiggs
Copy link
Member

gbiggs commented Nov 9, 2016

The managed nodes design document only has an ambiguous description of the management interface. I had done some work on a more detailed description but it got put on hold when Tully said he was going to do the node lifecycle implementation soon (because I wanted to see if anything changed). So I dug that work up and finished it off: ros2/design#99

@Karsten1987
Copy link
Contributor Author

@wjwwood I'd like to bring your attention to the last commit 58ae127. Does this go along with what we discussed? Therefore, I'd separate the node into a base_interface and a communication_interface.

Then please note that in the current talker.cpp if have preprocessor #if/#else derivatives which indicate a possible user api - namely the decision on how strictly we take the DRY principle.

@wjwwood
Copy link
Member

wjwwood commented Nov 10, 2016

e86473e looks like the right direction to me.

@Karsten1987
Copy link
Contributor Author

See lifecycle_talker.cpp as a demo on how a lifecycle orchestration would look like.

Status so far:

  • StateMachine in C
  • basic support for Publisher/Server in RCL data structures
  • LifecycleManager in CPP with two services get_state and change_state

TODO:

  • refactor Pub/Sub/Srv/Clients for extending their constructors in taking rcl_data structures directly.
  • write extensive tests
  • clarify on whether to integrate this PR into rcl/rclpp or in a self containing package called lifecycle.

Any thoughts on the latter?

@Karsten1987 Karsten1987 force-pushed the lifecycle_impl branch 3 times, most recently from 23783e4 to 4f07abc Compare November 24, 2016 00:59
@Karsten1987
Copy link
Contributor Author

This PR requires #279 to be merged.
And also a fix for ROSIDL message generation in C. @mikaelarguedas can you have a look at it?

diff --git a/rmw_fastrtps_cpp/include/rmw_fastrtps_cpp/TypeSupport.h b/rmw_fastrtps_cpp/include/rmw_fastrtps_cpp/TypeSupport.h
index 5fac344..f8ac1d6 100644
--- a/rmw_fastrtps_cpp/include/rmw_fastrtps_cpp/TypeSupport.h
+++ b/rmw_fastrtps_cpp/include/rmw_fastrtps_cpp/TypeSupport.h
@@ -76,6 +76,7 @@ struct StringHelper<rosidl_typesupport_introspection_c__MessageMembers>
     std::string str;
     deser >> str;
     rosidl_generator_c__String * c_str = static_cast<rosidl_generator_c__String *>(field);
+    rosidl_generator_c__String__init(c_str);
     rosidl_generator_c__String__assign(c_str, str.c_str());
   }
 };

@mikaelarguedas
Copy link
Member

as discussed offline, we need to figure out why the rosidl_generator_c__String was not allocated properly before being passed to the fastrtps rmw_implementation. I proposed this quick fix just to get the thing unstuck but I think that reallocating memory in the vendor specific layers "in case it's not properly initialized" is not the proper fix. We should ensure at the rcl/rclcpp level that we are providing properly initialized data structures to all rmw_implementations and not patch this in the vendor specific code if not necessary.

Could you run CI on this to see if the same behavior shows up on Connext?

@Karsten1987 Karsten1987 force-pushed the lifecycle_impl branch 2 times, most recently from 2b42d73 to b371044 Compare November 29, 2016 01:49
@Karsten1987
Copy link
Contributor Author

I refactored this initial PR and extract 3 more dependent PRs on it.

lifecycle_msgs: ros2/common_interfaces#24
rcl_lifecycle: ros2/rcl#91

further depends on two fixes:
rmw-fastrtps: ros2/rmw_fastrtps#69
service in rclcpp: #279

@Karsten1987 Karsten1987 force-pushed the lifecycle_impl branch 3 times, most recently from 9f4cb84 to 456baa0 Compare December 9, 2016 01:43
@Karsten1987 Karsten1987 merged commit 2c6d959 into master Dec 14, 2016
@Karsten1987 Karsten1987 deleted the lifecycle_impl branch December 14, 2016 17:29
@Karsten1987 Karsten1987 removed the in review Waiting for review (Kanban column) label Dec 14, 2016
nnmm pushed a commit to ApexAI/rclcpp that referenced this pull request Jul 9, 2022
* [rcl_lifecycle] remove rosidl deps as this package doesnt generate any messages

* depend on rosidl_generator_c
DensoADAS pushed a commit to DensoADAS/rclcpp that referenced this pull request Aug 5, 2022
* Update build and test workflow
* Update `setup-ros` to 0.0.13
* Update `action-ros-ci` to 0.0.13

Signed-off-by: Zachary Michaels <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants