Add linearizability checker for coordination layer #36943
Conversation
Pinging @elastic/es-distributed
LGTM.
I added a few comments which are primarily discussion subjects.
server/src/test/java/org/elasticsearch/cluster/coordination/LinearizabilityCheckerTests.java
@Override
public void onFailure(String source, Exception e) {
    // do not remove event from history, the write might still take place
    // instead, complete history when checking for linearizability
There is something concerning about this. As far as the linearizability checker is concerned, we are really saying that the write that came in may complete at any time between the call and the next complete cluster shutdown.

Looking at how `onFailure` is handled, the reaction varies from ignoring the failure, to logging an error, to propagating the error out. This might be OK, if carefully considered in each case and with the understanding that the action taken may still complete "later".

In reality, the system probably relies on a failure meaning that the action may have been written at the time of failure, but will not be written much later than that. A user may check whether the write was done and repeat it if not. This could lead to doing things twice, depending on key generation. But since the late write will likely not succeed (a user/operator interaction takes seconds), this is unlikely to happen.

I think the main actionable item I have is to better document what `onFailure` means on `ClusterStateTaskListener`; otherwise this is mainly a discussion topic.
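To make the comment in the diff above concrete, here is a minimal sketch of what "complete history when checking for linearizability" could look like; the `Event` model and all helper names are hypothetical stand-ins, not the actual classes in this PR:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical event model: each operation logs an INVOKE; successful
// operations also log a RESPONSE. A write whose listener saw onFailure
// stays open, because the write may still be applied later.
class HistoryCompletion {

    enum Type { INVOKE, RESPONSE }

    record Event(Type type, int id, Object value) {}

    /**
     * Before running the linearizability check, close every still-open
     * invocation with a synthetic response appended at the very end.
     * This models "the write may take effect at any time between its
     * invocation and the end of the history".
     */
    static List<Event> complete(List<Event> history, Object syntheticResponse) {
        Set<Integer> responded = new HashSet<>();
        for (Event e : history) {
            if (e.type() == Type.RESPONSE) {
                responded.add(e.id());
            }
        }
        List<Event> completed = new ArrayList<>(history);
        for (Event e : history) {
            if (e.type() == Type.INVOKE && responded.contains(e.id()) == false) {
                completed.add(new Event(Type.RESPONSE, e.id(), syntheticResponse));
            }
        }
        return completed;
    }
}
```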
The failures that can occur might not only come from the system (e.g. the publication layer), but also from the user-defined cluster state update function. Task batching can make this even more complicated. Also worth pointing out is that many master-level actions (i.e. subclasses of `TransportMasterNodeAction`) already have a built-in retry mechanism that reacts to publication-level failures.
@@ -1093,13 +1098,22 @@ void runRandomly() {
     }

     try {
-        if (rarely()) {
+        if (randomBoolean() && randomBoolean() && randomBoolean()) {
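For context on this change: `rarely()` in the randomized testing framework fires only a small fraction of the time (commonly around 10%, though that is an assumption about its default), while `randomBoolean() && randomBoolean() && randomBoolean()` fires with probability 0.5³ = 12.5%, so the new condition enqueues tasks slightly more often. A quick standalone sanity check of the two rates:

```java
import java.util.Random;

// Empirically compare the firing rates of the old and new conditions.
// Plain java.util.Random stands in for the test framework's
// rarely()/randomBoolean(); the ~10% rate for rarely() is assumed.
public class FiringRate {
    public static void main(String[] args) {
        Random random = new Random(42);
        int trials = 1_000_000, oldHits = 0, newHits = 0;
        for (int i = 0; i < trials; i++) {
            if (random.nextInt(100) < 10) {    // rarely(): assumed ~10%
                oldHits++;
            }
            if (random.nextBoolean() && random.nextBoolean() && random.nextBoolean()) {
                newHits++;                     // 0.5^3 = 12.5%
            }
        }
        System.out.printf("old: %.3f, new: %.3f%n",
                (double) oldHits / trials, (double) newHits / trials);
    }
}
```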
Running all tests in the class, 75-80% of the linearizability checks are against histories where all invocations come before all responses (form 1). Around half of the remaining checks are against histories of the form (invocation*, response*, invocation*, response*), i.e. two rounds of invocations/responses (form 2). Looking only at histories larger than size 100, the numbers are 63-65% of form 1 and 77-80% of form 2. Larger than size 1000, about 1 in 2 histories was of form 1.

I believe form 1 will only fail if the responses contain values that are not part of one of the invocations; otherwise the linearizability checks will not detect any problems.

Perhaps we need to tweak the randomness to better provoke situations that could reveal linearizability problems?
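To illustrate why form 1 is a weak test case, here is a hypothetical register history in which all invocations precede all responses; every operation overlaps every other one, so the checker may linearize them in any order:

```java
import java.util.List;

// A form-1 history over a register, written as (kind, process, operation):
// all invocations come before all responses.
public class FormOneHistory {
    record Event(String kind, int process, String op) {}

    public static void main(String[] args) {
        List<Event> history = List.of(
                new Event("invoke", 0, "write(1)"),
                new Event("invoke", 1, "write(2)"),
                new Event("invoke", 2, "read()"),
                new Event("respond", 0, "ok"),
                new Event("respond", 1, "ok"),
                new Event("respond", 2, "-> 2"));
        // All three operations are pairwise concurrent, so a checker may
        // linearize them in any of the 3! orders; the read may legally
        // return the initial value, 1, or 2. The check can only fail if a
        // response reports a value no sequential ordering could produce.
        history.forEach(System.out::println);
    }
}
```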
Thank you for the investigation. Tweaking the randomness here is not too straightforward, and has a more general impact on how these Coordinator tests run. I will leave this to a follow-up PR, as I don't want to delay merging the infrastructure in this PR any longer, so that it is available to other follow-up work.
Checks that the core coordination algorithm implemented as part of Zen2 (#32006) supports linearizable semantics. This commit adds a linearizability checker based on the Wing and Gong graph search algorithm with support for compositional checking and activates these checks for all CoordinatorTests.
Today the default stabilisation time is calculated on the assumption that the elected master has no pending tasks to process when it is elected, but this is not a safe assumption to make. This can result in a cluster reaching the end of its stabilisation time without having stabilised. Furthermore in #36943 we increased the probability that each step in `runRandomly()` enqueues another task, vastly increasing the chance that we hit such a situation. This change extends the stabilisation process to allow time for all pending tasks, plus a task that might currently be in flight. Fixes #41967, in which the master entered the stabilisation phase with over 800 tasks to process.
Checks that the core coordination algorithm implemented as part of Zen2 (#32006) supports linearizable semantics. This PR adds a linearizability checker based on the Wing and Gong graph search algorithm with support for compositional checking and activates these checks for all `CoordinatorTests`.
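For background, the Wing and Gong algorithm searches for a linearization by repeatedly choosing a "minimal" pending operation (one whose invocation is not preceded by any other pending operation's response), applying it to a sequential specification of the data type, and backtracking when the observed response disagrees. Below is a minimal sketch for a single register; all names are hypothetical, and the memoization of visited configurations that makes this a graph search is omitted:

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of the Wing & Gong search for a single int register.
// Each operation carries an invocation index and a response index into a
// global sequence of time points; op B follows op A iff A.res < B.inv.
public class WingGongSketch {

    record Op(String kind, int arg, int result, int inv, int res) {}

    /** True if some linearization of ops is consistent with a register. */
    static boolean isLinearizable(List<Op> ops, int initialValue) {
        return search(ops, initialValue);
    }

    private static boolean search(List<Op> pending, int state) {
        if (pending.isEmpty()) {
            return true; // linearized every operation
        }
        for (Op candidate : pending) {
            if (isMinimal(candidate, pending) == false) {
                continue; // some other op finished before this one began
            }
            // Apply the candidate to the sequential spec of a register.
            int nextState;
            if (candidate.kind().equals("write")) {
                nextState = candidate.arg();
            } else { // read
                if (candidate.result() != state) {
                    continue; // read saw a value the spec can't produce here
                }
                nextState = state;
            }
            List<Op> rest = new ArrayList<>(pending);
            rest.remove(candidate);
            if (search(rest, nextState)) {
                return true;
            }
            // otherwise backtrack and try the next minimal op
        }
        return false;
    }

    /** Minimal: no other pending op responded before this op was invoked. */
    private static boolean isMinimal(Op op, List<Op> pending) {
        for (Op other : pending) {
            if (other != op && other.res() < op.inv()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // write(7) completes before the read begins; the read must see 7.
        List<Op> ops = List.of(
                new Op("write", 7, 0, 0, 1),
                new Op("read", 0, 7, 2, 3));
        System.out.println(isLinearizable(new ArrayList<>(ops), 0)); // true
    }
}
```

Compositional checking builds on the locality property of linearizability: a history is linearizable iff the sub-history of each key is linearizable, so the search can run independently per register rather than over the whole history at once.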