Quantify and expand coverage of TLC simulation mode #6537

lemmy · 2024-10-07T22:53:34Z

TLC's simulation mode is generally effective at detecting regressions while being less resource-intensive than full model checking, which helps reduce the strain on our CI resources. However, the actual coverage of simulation mode is unknown. Additionally, increasing coverage is highly desirable.

Quantify current coverage
Check liveness properties during simulation
- Check high-level liveness properties in simulation mode #6545
Expand coverage by starting simulation from a larger set of initial states
- Manually defined set of initial states
- Adjust probabilities of failure actions based on length of trace, ...
- RandomSetOfSubsets inductive invariant candidate

lemmy · 2024-10-08T00:33:22Z

Quantify current coverage with five nodes

Main observation: The RcvProposeVoteRequest action rarely, if ever, occurs, likely because the upper limit set for the length of the generated traces (-depth) is too restrictive. Note that exhaustive model checking shows that it takes 28/47 states with 2/3 nodes and 2 configuration for one of th nodes to be Retired Committed (client requests disabled).

The following plots display statistics gathered from a long-running simulation (note that the overall overhead associated with extended statistics (-Dtlc2.tool.Simulator.extendedStatistics=true) and the accuracy of the probabilistic data structure have been analyzed independently).

Python code used to generate the plot

```python import matplotlib.pyplot as plt import pandas as pd data = pd.read_json("SIMccfraft.ndjson", lines=True) json_data = data.iloc[-1].to_dict() actions_df = pd.DataFrame(json_data["actions"].items(), columns=["Action", "Frequency"]) actions_df_sorted = actions_df.sort_values(by="Frequency", ascending=False) plt.figure(figsize=(10, 6)) plt.bar(actions_df_sorted["Action"], actions_df_sorted["Frequency"]) plt.yscale('log') plt.xticks(rotation=90) plt.xlabel('Actions') plt.ylabel('Frequency (Log scale)') plt.title('Action Frequencies with Logarithmic Y-Axis') plt.tight_layout() plt.show() ```

Python code used to generate the plot

```python import matplotlib.pyplot as plt import pandas as pd data = pd.read_json("SIMccfraft.ndjson", lines=True) plt.plot(data['duration'], data['distinct'], marker='x', linestyle='--', color='black', label='distinct') plt.plot(data['duration'], data['generated'], marker='o', linestyle='-', color='blue', label='generated') plt.title('Distinct states over Duration') plt.xlabel('Duration') plt.ylabel('Count') plt.grid(True) plt.legend() plt.show() ```

Variables with finitely many distinct values:

membershipState Cardinality([Servers -> MembershipState])
leadershipState Cardinality([Servers -> LeeadershipState])
retirementCompleted Cardinality([Servers -> SUBSET Servers])
hasJoined Cardinality([Servers -> BOOLEAN])
isNewFollower Cardinality([Servers -> BOOLEAN])

Python code used to generate the plot

```python import matplotlib.pyplot as plt import pandas as pd import matplotlib.cm as cm import numpy as np data = pd.read_json("SIMccfraft.ndjson", lines=True) plt.figure(figsize=(10, 6)) i = 1 colors = cm.rainbow(np.linspace(0, 1, len(data['distinctvalues'].iloc[-1].keys()))) for value, color in zip(data['distinctvalues'].iloc[-1].keys(),colors): plt.subplot(4, 4, i) plt.plot(data['duration'], data['distinctvalues'].apply(lambda x: x.get(value, None)), marker='o', linestyle='-', label=value, color=color) plt.xlabel('Duration') plt.ylabel('Count') plt.grid(True) plt.legend() i += 1 plt.show() ```

@lemmy ➜ /workspaces/CCF (main) $ git diff
diff --git a/tla/consensus/SIMccfraft.cfg b/tla/consensus/SIMccfraft.cfg
index 675b9e322..de6948a74 100644
--- a/tla/consensus/SIMccfraft.cfg
+++ b/tla/consensus/SIMccfraft.cfg
@@ -49,12 +49,15 @@ CONSTANTS
     Extend <- [abs]ABSExtend
     CopyMaxAndExtend <- [abs]ABSCopyMaxAndExtend
 
-CONSTRAINT
-    StopAfter
+\* CONSTRAINT
+\*     StopAfter
 
 CHECK_DEADLOCK
     FALSE
 
+_PERIODIC
+    PrintStats
+
 PROPERTIES 
     CommittedLogAppendOnlyProp
     MonotonicTermProp
diff --git a/tla/consensus/SIMccfraft.tla b/tla/consensus/SIMccfraft.tla
index 3d86eb251..5663f08d2 100644
--- a/tla/consensus/SIMccfraft.tla
+++ b/tla/consensus/SIMccfraft.tla
@@ -45,6 +45,9 @@ DebugInvUpToDepth ==
     \* -depth after generating the first trace.
     TLCGet("level") < TLCGet("config").depth
 
+PrintStats ==
+    Serialize(<<TLCGet("stats")>>, "SIMccfraft.ndjson", [format |-> "NDJSON", charset |-> "UTF-8", openOptions |-> <<"WRITE", "CREATE", "APPEND">>])
+
 ----
 \* Refinement

Slightly better coverage with -depth 1000:

achamayou · 2024-10-08T08:14:19Z

@lemmy increasing the depth does sound like a good step to improve coverage.

RequestVote seems overly represented, even if we think that we want to focus on fault-heavy scenarios. Similarly, it looks like (and this is very rough from eyeballing the graph) that request votes is about two orders of magnitude more frequent than BecomeLeader. That's a lot of failed elections!

My sense is that we should aim to diminish timeout frequency, by maybe an order of magnitude or so, to get to stage where we are generally progressing further. It's still be quite a lot more faults than a typical system.

achamayou · 2024-10-09T12:28:02Z

@lemmy perhaps I missing something here, but this does not seem like an improvement? It looks like there are fewer distinct states in proportion than before, the frequency of RequestVote has increased even more, and it looks like no successful retirement happened (I notice the run is shorter, but I am not sure that explains it).

lemmy · 2024-10-09T15:08:24Z

@lemmy perhaps I missing something here, but this does not seem like an improvement? It looks like there are fewer distinct states in proportion than before, the frequency of RequestVote has increased even more, and it looks like no successful retirement happened (I notice the run is shorter, but I am not sure that explains it).

The plots above display the state-space coverage without a de-prioritization of the actions Timeout, ChangeConfiguration, and CheckQuorum (see diff). Meanwhile, the coverage of the property space has improved, as it included the two liveness properties.

With the changes introduced in 47eb0d9, state-space coverage returned to its previous level (see below), while also checking the two liveness properties:

The plots appear more irregular because the data structure used to store the statistics, in this case HyperLogLog, is probabilistic and the duration is less.

achamayou · 2024-10-09T15:19:08Z

Ah, got it, that's excellent!

lemmy added consensus tla TLA+ specifications labels Oct 7, 2024

This comment was marked as resolved.

Sign in to view

This comment has been minimized.

Sign in to view

lemmy mentioned this issue Oct 9, 2024

Check high-level liveness properties in simulation mode #6545

Merged

lemmy mentioned this issue Oct 9, 2024

Randomization approach to increase state-space coverage in simulation mode interferes with liveness checking, leading to spurious counterexamples tlaplus/tlaplus#1039

Open

lemmy added the liveness label Oct 9, 2024

lemmy mentioned this issue Oct 10, 2024

Adjust the probabilities based on empirical data to achieve less uneven coverage of the action space during simulation #6562

Merged

achamayou linked a pull request Oct 16, 2024 that will close this issue

Remove deprecated SSS dependency #6574

Open

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Quantify and expand coverage of TLC simulation mode #6537

Quantify and expand coverage of TLC simulation mode #6537

lemmy commented Oct 7, 2024 •

edited

Loading

lemmy commented Oct 8, 2024 •

edited

Loading

This comment was marked as resolved.

achamayou commented Oct 8, 2024

This comment has been minimized.

achamayou commented Oct 9, 2024

lemmy commented Oct 9, 2024 •

edited

Loading

achamayou commented Oct 9, 2024

Quantify and expand coverage of TLC simulation mode #6537

Quantify and expand coverage of TLC simulation mode #6537

Comments

lemmy commented Oct 7, 2024 • edited Loading

lemmy commented Oct 8, 2024 • edited Loading

Quantify current coverage with five nodes

This comment was marked as resolved.

achamayou commented Oct 8, 2024

This comment has been minimized.

achamayou commented Oct 9, 2024

lemmy commented Oct 9, 2024 • edited Loading

achamayou commented Oct 9, 2024

lemmy commented Oct 7, 2024 •

edited

Loading

lemmy commented Oct 8, 2024 •

edited

Loading

lemmy commented Oct 9, 2024 •

edited

Loading