LFtb - A testbench framework for LF programs #1205

erlingrj · 2022-05-30T17:07:25Z

Maintainer

LFtb - A testbench framework for Lingua Franca programs

I have been thinking a bit about how a Lingua Franca program can be tested and verified, and the result is this idea for a verification framework.
I believe that LF can actually simplify the verification of real-time systems, because it makes it very easy to create temporal fault scenarios.
I mentioned this to Marten, who told me that there had already been some discussion on this topic. I would like to reboot that discussion.

Summary

Testing and verification are crucial when building safety-critical and time-sensitive systems.
I believe that LF enables a level of system verification that could be very hard to achieve using an ad-hoc framework.
The core feature of LF is its two timelines, logical and physical, and their interaction. By controlling physical time, we can easily test how the system responds when logical time lags behind physical time, and we can trigger the deadline handlers.
We should also support applying test signals to the input ports and inspecting/asserting signals on the output ports.
Ideally, we should also be able to inspect and assert on the internal state of the Reactor and on which mode it is in.
Some of these are easy to implement as an LF program with a TestBench Reactor that instantiates the Reactor we wish to test (often called the DUT, device under test).
However, some require access to the runtime.
These abilities could be gathered into a verification framework that lets the user efficiently build a testbench and verify their LF program.

Verification

Verification can be split into two parts.

Formal verification

Formal verification deals with building mathematical proofs about the system (or rather, about a model of the system).
I believe using temporal logic and model checking to prove safety and liveness properties of the Reactor graph has already been discussed.
Formal verification is useful and important, but AFAIK it is not that widespread, because programmers normally don't know the required math.

Functional Verification

This is also referred to as testing, unit testing, or just verification.
In functional verification, we execute or simulate the system for various scenarios to build confidence in it.
Functional verification is probably used in all real-time software.
To functionally verify an LF program, you could:

Do unit testing on the reaction code in a test framework in the target language. This would take place "outside" the LF world. For C/C++ you could, e.g., use Catch2.
Create an LF test program that wraps the Reactor you wish to test and uses a Timer to generate input events and make assertions on the output events (see the following figures).
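The second approach can be mimicked outside LF to show its shape. This is a minimal Python sketch, not LF syntax: `Counter` is a hypothetical DUT, the stimuli are (logical time, value) pairs standing in for Timer-triggered reactions, and the expected list plays the role of the reaction that asserts on the DUT output.

```python
from dataclasses import dataclass, field


@dataclass
class Counter:
    """Hypothetical DUT: outputs the running sum of its inputs."""
    total: int = 0

    def react(self, value: int) -> int:
        self.total += value
        return self.total


@dataclass
class TimerTestbench:
    """Hypothetical testbench: stimuli stand in for Timer-triggered reactions,
    `expected` for the reaction that asserts on the DUT output."""
    dut: Counter
    stimuli: list[tuple[int, int]]   # (logical time in ms, input value)
    expected: list[tuple[int, int]]  # (logical time in ms, expected output)
    errors: list[str] = field(default_factory=list)

    def run(self) -> bool:
        # Feed stimuli in logical-time order, as the timers would fire.
        observed = [(t, self.dut.react(v)) for t, v in sorted(self.stimuli)]
        if len(observed) != len(self.expected):
            self.errors.append(f"expected {len(self.expected)} outputs, got {len(observed)}")
        for (t, out), (t_exp, want) in zip(observed, self.expected):
            if (t, out) != (t_exp, want):
                self.errors.append(f"at {t} ms: got {out}, expected {want} at {t_exp} ms")
        return not self.errors


tb = TimerTestbench(Counter(), stimuli=[(0, 1), (100, 2), (200, 3)],
                    expected=[(0, 1), (100, 3), (200, 6)])
print(tb.run())  # True
```

In a real LF testbench the timers and the asserting reaction would be LF constructs. The Python version only shows the division of labor between stimulus and check.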

Verification of temporal behavior

My experience is that it is hard to properly test a system's reactions to temporal faults when using ad-hoc time management. Since you have built your own time management (with timestamping etc.), you probably have to build your own testing infrastructure to create different fault scenarios.
Consider the following Reactor graph, which represents a quite typical sensor fusion application: a Kalman filter combining IMU and GNSS measurements.

Testing that each individual reaction does what you expect it to do is easy.
Applying a dataset of IMU and GNSS messages to verify that the Kalman filter converges to some known ground truth is also quite easy.
This is also easy to test with other middlewares, because it is essentially just unit testing.
What is harder is verifying all the different fault handlers. What if there is no GNSS message for 5 sec? What if an IMU message is dropped?
These are hard to verify because they are temporal faults, and there is no portable way of specifying such scenarios when you have implemented your own ad-hoc time management.
LF can really simplify this because of its explicit interaction between logical and physical time.
By specifying both the logical and the physical timestamp of an event, you can generate any temporal fault scenario.
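To make this concrete, here is a minimal Python sketch (not an LF API) of a fault scenario in which every event carries both a logical and a physical timestamp. The helper `check_scenario`, the 10 ms deadline, and the 5 s GNSS timeout are illustrative assumptions. The helper reports a deadline violation when physical time lags logical time by more than the deadline. It reports a GNSS timeout when a GNSS message arrives more than 5 s (logical time) after the previous one, or after startup.

```python
from typing import NamedTuple

MS = 1_000_000               # nanoseconds per millisecond (LF uses a ns time base)
DEADLINE = 10 * MS           # illustrative deadline on the fusion reactions
GNSS_TIMEOUT = 5_000 * MS    # "no GNSS message in 5 sec"


class Event(NamedTuple):
    source: str    # "imu" or "gnss"
    logical: int   # logical timestamp (ns)
    physical: int  # physical time at which the event is handled (ns)


def check_scenario(events: list[Event]) -> list[str]:
    """Hypothetical helper: list the fault handlers this scenario should trigger."""
    faults = []
    last_gnss = 0  # startup counts as the last GNSS "deadline reset"
    for ev in sorted(events, key=lambda e: e.logical):
        if ev.physical - ev.logical > DEADLINE:
            faults.append(f"deadline_violation@{ev.logical // MS}ms")
        if ev.source == "gnss":
            if ev.logical - last_gnss > GNSS_TIMEOUT:
                faults.append(f"gnss_timeout@{ev.logical // MS}ms")
            last_gnss = ev.logical
    return faults


scenario = [
    Event("imu", 0, 0),
    Event("gnss", 0, 1 * MS),
    Event("imu", 100 * MS, 125 * MS),       # physical time lags by 25 ms
    Event("gnss", 7_000 * MS, 7_000 * MS),  # 7 s without GNSS
]
print(check_scenario(scenario))  # ['deadline_violation@100ms', 'gnss_timeout@7000ms']
```

A real watchdog would fire at 5 s rather than when the next message arrives. The point is only that a single list of (logical, physical) pairs fully describes the fault scenario.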

Currently, I believe there is no clean and simple way to reliably generate different temporal fault scenarios in LF.
You could create an LF program that instantiates the DUT and sleeps, blocking the program for some time to trigger deadline violations, but we can do better.

It would be great to know how temporal fault scenarios are tested in industry. If anyone has references, please shout out.

Coverage

Coverage is a measure of how much of the system has been tested. It is typically expressed as a percentage. Types of coverage include:

Function coverage
Statement coverage
Branch coverage

In LF-tb we could use another measure, "reaction coverage", which measures the percentage of reactions that have been triggered during testing.
This would also include deadline handlers. It would give the user useful feedback about where the holes in their tests are. We also want the "normal" coverage measures, but those can be obtained by bundling tools like gcov with LF-tb.
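Reaction coverage as proposed here is the fraction of reactions, deadline handlers included, that fired at least once. The following is a minimal Python sketch. It assumes the testbench can obtain the set of all reaction names and a log of triggered ones (perhaps from the tracing mechanism). The reaction names are hypothetical.

```python
def reaction_coverage(all_reactions: set[str], triggered: list[str]) -> tuple[float, set[str]]:
    """Return (coverage in percent, reactions that never fired)."""
    if not all_reactions:
        return 100.0, set()
    hit = all_reactions & set(triggered)
    missed = all_reactions - hit
    return 100.0 * len(hit) / len(all_reactions), missed


# Hypothetical reaction names for the IMU/GNSS fusion example.
reactions = {"Fusion.on_imu", "Fusion.on_gnss", "Fusion.on_imu.deadline", "Fusion.gnss_timeout"}
log = ["Fusion.on_imu", "Fusion.on_gnss", "Fusion.on_imu"]
pct, missed = reaction_coverage(reactions, log)
print(f"{pct:.0f}% reaction coverage; never triggered: {sorted(missed)}")
```

Listing the missed reactions by name is what turns the percentage into actionable feedback. Here it would point straight at the untested deadline handler and timeout.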

Requirements

LF-tb must be able to control the physical time that the DUT sees
LF-tb must be able to efficiently specify an input signal (with user-specified logical timestamps)
LF-tb should be able to inspect Reactor state and Reactor mode
LF-tb should be able to inspect output signals from the DUT and make assertions on them.
LF-tb could support constrained random verification, which is state-of-the-art in HW verification.
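Constrained random verification draws random stimuli that still satisfy user-stated constraints. It uses a seed so that a failing run can be replayed exactly. The following is a minimal Python sketch. The constraint is purely illustrative: IMU periods jitter between 8 and 12 ms, and each message is dropped with some probability. `random_imu_stimulus` is a hypothetical helper.

```python
import random


def random_imu_stimulus(n: int, seed: int, period_ms: tuple[int, int] = (8, 12),
                        drop_prob: float = 0.05) -> list[int]:
    """Hypothetical generator of IMU logical timestamps (ms) under the constraints:
    consecutive sends are period_ms[0]..period_ms[1] ms apart, and each message
    is independently dropped with probability drop_prob."""
    rng = random.Random(seed)  # seeded so a failing run can be replayed
    t, stamps = 0, []
    for _ in range(n):
        t += rng.randint(*period_ms)  # inclusive on both ends
        if rng.random() >= drop_prob:
            stamps.append(t)
    return stamps


stim = random_imu_stimulus(100, seed=42)
print(len(stim), stim[:5])
```

Because drops are part of the constraint model, the same generator exercises both the nominal path and the dropped-IMU fault handler.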

What exactly does LF-tb look like?

This is really the big question, and so far an open one that I would love some feedback on. We could either:

Make LF-tb part of the core language, or
Make LF-tb a "third-party" library, e.g., written in Python (with heterogeneous Reactors, a Python version alone might suffice).

My gut feeling tells me that the latter might be the better approach. This Python library would auto-generate the LF program that wraps the DUT. It would translate input signals into Timers, Reactions, and State, and translate output signal assertions into Reactions that assert the value and timestamp of their trigger. I think cocotb could be an inspiration for such a project. The tracing mechanism could potentially also be used.
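Below is a very rough sketch of that code-generation step. `generate_testbench` is hypothetical, and the DUT is assumed to be a reactor `Dut` in `Dut.lf` with an `int` input `x` and an `int` output `y`. The emitted text assumes the C target and recent LF syntax (`state count: int = 0`, `lf_set`, `lf_time_logical_elapsed`, `lf_print_error_and_exit`), so it may need adjusting for a given LF version. Each stimulus becomes a one-shot timer plus a reaction. The expected outputs become a value-and-time check in a reaction triggered by the DUT output, plus a count check at shutdown. The expected list is assumed non-empty.

```python
def generate_testbench(dut_file: str, stimuli: list[tuple[int, int]],
                       expected: list[tuple[int, int]]) -> str:
    """Hypothetical generator: emit an LF (C target) testbench.
    stimuli/expected are (time_ms, value) pairs; expected must be non-empty."""
    lines = ["target C;", "", f'import Dut from "{dut_file}";', "",
             "main reactor {", "  dut = new Dut();", "  state count: int = 0;"]
    for i, (t_ms, value) in enumerate(stimuli):
        # One-shot timer (offset only, no period) driving the DUT input at t_ms.
        lines.append(f"  timer t{i}({t_ms} msec);")
        lines.append(f"  reaction(t{i}) -> dut.x {{= lf_set(dut.x, {value}); =}}")
    n = len(expected)
    values = ", ".join(str(v) for _, v in expected)
    times = ", ".join(f"MSEC({t})" for t, _ in expected)
    lines += [
        "  reaction(dut.y) {=",
        f"    const int expected_value[] = {{{values}}};",
        f"    const interval_t expected_time[] = {{{times}}};",
        f"    if (self->count >= {n} || dut.y->value != expected_value[self->count]",
        "        || lf_time_logical_elapsed() != expected_time[self->count]) {",
        '      lf_print_error_and_exit("Unexpected output %d", dut.y->value);',
        "    }",
        "    self->count++;",
        "  =}",
        "  reaction(shutdown) {=",
        f'    if (self->count != {n}) lf_print_error_and_exit("Expected {n} outputs, got %d", self->count);',
        "  =}",
        "}",
    ]
    return "\n".join(lines)


print(generate_testbench("Dut.lf", stimuli=[(0, 1), (100, 2)], expected=[(0, 1), (100, 3)]))
```

Checking `lf_time_logical_elapsed()` against the expected time is what makes the assertion temporal rather than purely functional.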

The big unknowns, which I think require modifications to the runtimes, are:

How do we control physical time?
How do we get state/mode inspection?
...and, of course, all the unknown unknowns.

I would love some feedback on this topic.

lsk567 · 2022-05-30T18:33:34Z

Maintainer

Thanks, Erling, for getting the discussion started. I've thought quite a bit about formally verifying LF, and here is an ongoing attempt using SMT solvers. The progress is saved on the smt-gen branch (PR #794). In this approach, the LF program is augmented with logical specifications, and the lfc compiler performs verification as part of the compilation flow. This seems closer to the "LF-verify as a part of the core language" approach, as you have classified it above.

An SMT-based verifier for temporal safety properties

Summary: The idea is that we want to exhaustively check whether an LF program can produce traces (with each element being a reaction invocation) that violate certain linear temporal properties. To do this, we formulate an SMT problem with an uninitialized bounded trace and rules for how reactions are triggered. When we give the SMT problem to the SMT solver (Z3, in this case), it will try to find an instantiation of the trace that satisfies the rules but violates the safety property.

Properties of interest: safety properties (counterexamples have finite length).

Modeling the execution of an LF program: a transition system with states being the LF variables (including state variables, values assigned to input/output ports, values assigned to actions), and transitions being the invocation of a reaction.
[Note: this modeling choice already bakes in a (potentially strong) assumption, namely that the LF variables are only updated after the reaction invocation has concluded.]

Specification: linear temporal logic

Verification tactics: bounded model checking (BMC) & induction
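The bounded-model-checking idea can be illustrated without an SMT solver. The Python sketch below swaps Z3 for plain explicit-state enumeration, which is only feasible for tiny models. The state is the LF variables, each transition is one reaction invocation, and a breadth-first search over all traces up to a bound k looks for one that violates a safety property. The model is hypothetical: a state variable `count`, a timer-triggered `increment`, and a `reset` that is only enabled once `count >= 3`. The property is `count <= 3`.

```python
from collections import deque
from typing import Callable, Optional

State = int  # value of a hypothetical state variable `count`
Reaction = Callable[[State], Optional[State]]  # None means "not enabled"

REACTIONS: dict[str, Reaction] = {
    "increment": lambda c: c + 1,              # triggered by a timer, always enabled
    "reset": lambda c: 0 if c >= 3 else None,  # guarded reset
}


def safe(c: State) -> bool:
    return c <= 3  # safety property: G(count <= 3)


def bmc(init: State, k: int) -> Optional[list[str]]:
    """Search all traces of length <= k breadth-first; return the shortest
    reaction sequence that violates `safe`, or None if none exists."""
    queue = deque([(init, [])])
    while queue:
        state, trace = queue.popleft()
        if not safe(state):
            return trace
        if len(trace) == k:
            continue
        for name, react in REACTIONS.items():
            nxt = react(state)
            if nxt is not None:
                queue.append((nxt, trace + [name]))
    return None


print(bmc(0, k=3))  # None: no violation within 3 steps
print(bmc(0, k=5))  # ['increment', 'increment', 'increment', 'increment']
```

The counterexample shows `increment` firing again before `reset`, which is the kind of interleaving an SMT-based BMC would report symbolically. Exhaustive enumeration like this is exactly what suffers from the state explosion mentioned below.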

Open questions:

The current approach treats reaction invocations as state transitions. This abstraction is sound only when the runtime system guarantees that the reaction does not produce any side effects during the reaction invocation. Our runtime system currently does not make this guarantee. In addition, for PLC-based systems, it is possible to build a PLC-based LF runtime that produces side effects only at the end of each logical time instant. On the verification side, this means that we should treat a logical time advancement, instead of a reaction invocation, as a state transition. There is a rich body of work regarding the relationship between formal modeling and implementation to be explored here.
How do we verify physical-time properties? In the approach above, there is no notion of physical time, only logical time. We can build sound formal models for physical time more easily if the underlying execution platform is time-aware, such as RTOSes or PRET machines.
How does a user provide logical specifications? In the current experiment, a user does so using annotations beginning with an @ sign. But this might not generalize well when annotations become structured (such as in the form of a dictionary).

Weaknesses:

The state space of the trace grows exponentially with the length of the trace and the system size. This is the well-known state explosion problem. We need a way to tame the complexity growth.

2 replies

edwardalee May 31, 2022
Maintainer

The regression tests that get run on every push to master are, in fact, mostly of the form of what you are calling "functional verification." They include timing tests. There are two key parts to the infrastructure: first, there are the scripts that run the tests and collect the results. Second, there are LF reactors that check their inputs against "known good" values and tags and report errors when they don't match.
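The checking half of that infrastructure can be mimicked in a few lines. This is a Python sketch, not the actual LF test code. A hypothetical `GoldenChecker`, like the checking reactors described above, compares each received value and tag against known-good ones and collects errors. Tags are modeled here as (elapsed logical time in ns, microstep) pairs, so a value arriving one microstep late is also caught.

```python
from dataclasses import dataclass, field

Tag = tuple[int, int]  # (elapsed logical time in ns, microstep)


@dataclass
class GoldenChecker:
    """Hypothetical checker comparing inputs against a known-good trace."""
    golden: list[tuple[Tag, int]]  # known-good (tag, value) sequence
    index: int = 0
    errors: list[str] = field(default_factory=list)

    def on_input(self, tag: Tag, value: int) -> None:
        """Called for every input event, in tag order."""
        if self.index >= len(self.golden):
            self.errors.append(f"unexpected extra input {value} at {tag}")
        else:
            want_tag, want_value = self.golden[self.index]
            if (tag, value) != (want_tag, want_value):
                self.errors.append(f"got {value} at {tag}, expected {want_value} at {want_tag}")
        self.index += 1

    def on_shutdown(self) -> None:
        # Missing outputs are only detectable once execution ends.
        if self.index < len(self.golden):
            self.errors.append(f"missing {len(self.golden) - self.index} expected input(s)")


checker = GoldenChecker(golden=[((0, 0), 1), ((0, 1), 2), ((1_000_000, 0), 3)])
for tag, value in [((0, 0), 1), ((0, 1), 2), ((1_000_000, 0), 3)]:
    checker.on_input(tag, value)
checker.on_shutdown()
print(checker.errors)  # []
```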

erlingrj May 31, 2022
Maintainer Author

@lsk567 That is super cool. I saw some of your notes on this in the wiki (I think), but I cannot find them anymore. This is what I refer to as Formal Verification above. I think you are headed down a very interesting path; research-wise, this is much more interesting than what I am talking about.

I think what I discuss is complementary to your work on Formal Verification. The way I see it there are multiple ways in which you might want to verify an LF program.

1. Formal verification. This is what you are addressing. In a sense, you get 100% coverage when you formally verify something.
2. Functional verification of the runtime. This is what the regression tests are doing: experimentally verifying that the runtime in fact implements the semantics of LF. (@edwardalee, would you agree with that?)
3. Functional verification of reactions and deadline handlers. This is experimentally verifying that the bodies of the reactions and deadline handlers do what you think they do.
4. Functional verification of the LF program. This is about experimentally verifying that the LF program you built has both the logical and temporal behavior that you think it does.

I guess (3) is really a subset of (4), but I separate them to illustrate what a normal third-party unit testing framework can and cannot achieve. LF users are going to assume that (2) is already properly done, and that if they register a deadline of 10 msec, the deadline handler will in fact be executed if physical time lags by more than 10 msec. What LF users are interested in is doing (3) and (4). Even if you know that the reactions and deadline handlers will be executed, you don't want to put anything into production that has not been tested. (3) could be achieved by wrapping every reaction body in a function and unit testing that function in a third-party framework. But (4) is not that easy to achieve: even if you have unit-tested the individual deadline handler, you probably want to enforce a deadline scenario on the complete system (runtime + user code) and verify that everything works as you expect. This is why I propose making a framework/library for doing both (3) and (4). I think such a framework/library could also be used for (2).
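Option (3) could look like the following, assuming the reaction body has been factored into a plain function of (state, inputs) that returns the new state. This Python sketch uses the standard `unittest` module. `on_gnss` is a hypothetical, heavily simplified stand-in for a reaction body: a scalar Kalman measurement update, not code from any LF program.

```python
import unittest


def on_gnss(x: float, p: float, z: float, r: float) -> tuple[float, float]:
    """Hypothetical reaction body: scalar Kalman measurement update.
    (x, p) is the estimate and its variance, z the GNSS measurement,
    r the measurement variance. Returns the updated (x, p)."""
    k = p / (p + r)  # Kalman gain
    return x + k * (z - x), (1.0 - k) * p


class OnGnssTest(unittest.TestCase):
    def test_equal_variances_average(self):
        x, p = on_gnss(x=0.0, p=1.0, z=2.0, r=1.0)
        self.assertAlmostEqual(x, 1.0)  # gain 0.5 -> midpoint
        self.assertAlmostEqual(p, 0.5)

    def test_variance_shrinks(self):
        _, p = on_gnss(x=0.0, p=4.0, z=10.0, r=0.1)
        self.assertLess(p, 4.0)
```

Run it with `python -m unittest <file>`. What this cannot exercise is when the runtime invokes `on_gnss` relative to physical time, which is exactly the gap that (4) is about.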

Does that make sense? Maybe I could make a tiny example of what such a Python library could look like.

erlingrj · 2022-06-01T16:56:00Z

Maintainer Author

Update from meeting June 1:

Check out the regression test suite: physical time can be sort of controlled by sleeping in the testbench reactor. Alex has built a "training mechanism" for Reactors which could be useful here. It might be sufficient to build the testbench in vanilla Lingua Franca using Timers.
Work has started on heterogeneous LF programs. A C++ LF program can import a Reactor written in Python and instantiate it. The interface between the Reactors might use a serialization protocol like Protobuf.
The top-level Reactor API (which could be Protobuf) might be used by a testbench framework to apply arbitrary signals to the input ports and make assertions on the signals on the output ports. Physical time can only be controlled properly by actually having a callback executed in place of lf_get_physical_time. A sleep gives you only coarse-grained control: it guarantees that physical time has passed a certain value, nothing more. But that might be fine.
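The difference between sleeping and replacing the physical clock can be sketched with dependency injection. This is a Python illustration, not the reactor-c API. A hypothetical `Scheduler` reads physical time through a callable it is given, standing in for a replaceable `lf_get_physical_time`. A test can therefore pass a `FakeClock` and make physical time jump by exactly the amount needed to violate a deadline, with no sleeping and no timing jitter.

```python
import time
from typing import Callable


class FakeClock:
    """Test clock: physical time only moves when the test says so."""
    def __init__(self, start_ns: int = 0) -> None:
        self.now_ns = start_ns

    def __call__(self) -> int:
        return self.now_ns

    def advance(self, ns: int) -> None:
        self.now_ns += ns


class Scheduler:
    """Hypothetical runtime fragment that checks a deadline before a reaction."""
    def __init__(self, get_physical_time: Callable[[], int] = time.monotonic_ns) -> None:
        self.get_physical_time = get_physical_time
        self.start_ns = get_physical_time()

    def invoke(self, logical_elapsed_ns: int, deadline_ns: int,
               body: Callable[[], str], handler: Callable[[], str]) -> str:
        # Lag = elapsed physical time minus elapsed logical time.
        lag = (self.get_physical_time() - self.start_ns) - logical_elapsed_ns
        return handler() if lag > deadline_ns else body()


clock = FakeClock()
sched = Scheduler(clock)
print(sched.invoke(0, 10_000_000, lambda: "body", lambda: "deadline handler"))  # body
clock.advance(15_000_000)  # physical time jumps 15 ms ahead of logical time
print(sched.invoke(0, 10_000_000, lambda: "body", lambda: "deadline handler"))  # deadline handler
```

In production the default `time.monotonic_ns` is used unchanged. Only the testbench swaps in the fake clock.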

0 replies

cmnrd · 2022-06-20T15:35:35Z

Maintainer

I absolutely agree that we will need a standard way of testing LF applications. For usability, it would be great if this were somehow integrated with the LF language and tooling (for instance, like you can run Rust tests with cargo). I think this also intersects with another old discussion: packages. @revol-xut is currently working on a prototype for a simple package manager for LF, and we should probably consider testing in its design.

I would also like to mention that @jhaye is currently looking into some formal tools for Rust. I think that these could help in both formal verification of the Rust runtime and formal verification of reaction bodies or even entire LF programs. We will discuss this in the semantics meeting once @jhaye has collected a bit more knowledge about the tools.

0 replies

erlingrj · 2022-09-20T23:46:33Z

Maintainer Author

@cmnrd @lhstrh
My understanding of the problem has progressed since writing the first post. To recap our coffee discussion:

We can be inspired by how testing is performed on HW designs. It is often very extensive, and I believe formal methods are much more common there than in SW. For safety-critical systems this would also be useful.
We either want to build a testbench framework, or at least have a guide on how to use existing frameworks and patterns to write tests for LF programs.
A normal LF program can be used to test a Reactor; Alexander has demonstrated this with his training mode. But I am not completely satisfied with this. Ideally, there should be a way to get introspection into the state variables, mode, and event queue of the Reactor Under Test (RUT).
We also don't want to recompile the RUT for each testbench. Using federated execution might be a good idea since the RUT will then be a standalone program which can be executed with different testbenches. But how do we get the introspection?
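One conceivable shape for that introspection is sketched below, purely as an assumption. The standalone RUT would publish a serialized snapshot of its state variables, current mode, and pending events at each tag, and the testbench would deserialize and assert on it. The Python sketch uses JSON for simplicity (Protobuf was mentioned earlier as an option). The `Snapshot` record, its fields, and the mode name `Degraded` are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Snapshot:
    """Hypothetical introspection record a RUT could publish at each tag."""
    tag: tuple[int, int]    # (elapsed logical time ns, microstep)
    mode: str               # currently active mode
    state: dict[str, float] # state variable name -> value
    event_queue: list[tuple[int, str]] = field(default_factory=list)  # (time ns, trigger)

    def to_json(self) -> str:
        return json.dumps(asdict(self))  # tuples become JSON arrays

    @staticmethod
    def from_json(text: str) -> "Snapshot":
        d = json.loads(text)
        return Snapshot(tag=tuple(d["tag"]), mode=d["mode"], state=d["state"],
                        event_queue=[tuple(e) for e in d["event_queue"]])


# In practice the JSON would arrive from the RUT process, e.g. over a socket.
wire = Snapshot((5_000_000_000, 0), "Degraded", {"p": 12.5}, [(5_100_000_000, "imu")]).to_json()
snap = Snapshot.from_json(wire)
assert snap.mode == "Degraded"  # e.g. a GNSS timeout should switch the filter mode
print(snap)
```

Keeping the snapshot a plain serializable record means the same RUT binary can serve any testbench, which fits the goal of not recompiling the RUT per testbench.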

I hope I can explore this project further during my stay here.

0 replies

lhstrh · 2022-09-21T04:49:29Z

Maintainer

Thanks for adding these notes, @erlingrj, and for reviving this discussion. As @cmnrd mentioned in this thread, it might make sense to discuss this when we sit down to talk about packaging. This has been high on @revol-xut's list, and I think it would make sense to carve out some meeting time for this while @cmnrd is visiting so we can construct a roadmap.

0 replies
