Add linearizability checker for coordination layer #36943

Merged 11 commits into elastic:master on Feb 26, 2019

Conversation

ywelsch (Contributor) commented on Dec 21, 2018:

Checks that the core coordination algorithm implemented as part of Zen2 (#32006) supports linearizable semantics. This PR adds a linearizability checker based on the Wing and Gong graph search algorithm with support for compositional checking and activates these checks for all CoordinatorTests.
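
For context, the Wing and Gong algorithm searches for a valid linearization by repeatedly picking a "minimal" operation (one invoked before the first remaining response), applying it to a sequential specification, and backtracking when the specification rejects it. Below is a minimal, hedged sketch of that search in Java; `Event`, `Op`, and the `spec` function shape are illustrative assumptions, not the actual `LinearizabilityChecker` API added by this PR.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.BiFunction;

// Minimal sketch of the Wing & Gong search, assuming a complete history in
// which every invocation has a matching response. Illustrative names only.
final class WingGongSketch {

    enum Kind { INVOKE, RESPOND }

    record Event(Kind kind, int opId) {}

    record Op(Object input, Object output) {}

    /** spec: (state, op) -> next state if op is legal from state, else empty. */
    static boolean isLinearizable(BiFunction<Object, Op, Optional<Object>> spec,
                                  Object initialState,
                                  List<Event> history,
                                  Map<Integer, Op> ops) {
        return search(spec, initialState, new LinkedList<>(history), ops);
    }

    private static boolean search(BiFunction<Object, Op, Optional<Object>> spec,
                                  Object state, LinkedList<Event> history,
                                  Map<Integer, Op> ops) {
        if (history.isEmpty()) {
            return true; // all events consumed: a valid linearization was found
        }
        // Candidates are the "minimal" operations: those whose invocation
        // precedes the first response remaining in the history.
        for (Event event : history) {
            if (event.kind() == Kind.RESPOND) {
                break;
            }
            Optional<Object> next = spec.apply(state, ops.get(event.opId()));
            if (next.isPresent()) {
                // Tentatively linearize this op: remove both of its events
                // from the history and recurse; backtrack on failure.
                LinkedList<Event> rest = new LinkedList<>(history);
                rest.removeIf(e -> e.opId() == event.opId());
                if (search(spec, next.get(), rest, ops)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

A production checker typically also prunes this search, e.g. by caching visited (state, linearized-set) configurations, and compositional checking exploits the locality property that a history is linearizable iff each per-object subhistory is, so histories can be checked key by key.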

ywelsch added the >test, v7.0.0, and :Distributed Coordination/Cluster Coordination labels on Dec 21, 2018
elasticmachine (Collaborator) commented:

Pinging @elastic/es-distributed

henningandersen (Contributor) left a comment:

LGTM.

I added a few comments which are primarily discussion subjects.

```java
@Override
public void onFailure(String source, Exception e) {
    // do not remove event from history, the write might still take place
    // instead, complete history when checking for linearizability
```
henningandersen (Contributor) commented on this snippet:

There is something concerning about this. From the linearizability checker's point of view, we are really saying that the write that came in may complete at any time between the invocation and the next complete cluster shutdown.

Looking at how onFailure is handled, the reaction varies from ignoring the failure, to logging an error, to propagating the error out. This might be OK, if each case is carefully considered with the understanding that the action may still complete "later".

In reality, the system probably relies on a failure meaning that the action may have been written at the time of the failure, but will not be written much later than that. A user may check whether the write was done and repeat it if not. This could lead to doing things twice, depending on key generation; but since a user/operator interaction takes seconds, the delayed write will likely no longer succeed by then, so this is unlikely to happen.

I think the main actionable item I have is to better document what onFailure means on ClusterStateTaskListener; otherwise this is mainly a discussion topic.
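
To make the convention in the snippet concrete, here is a hedged sketch of how a checker can "complete" such a history: each pending (failed) write is either dropped, meaning it never took effect, or closed with a response at the end of the history, meaning it may linearize at any point after its invocation. The history is linearizable iff at least one completion is. Types reuse the illustrative `WingGongSketch` from above; this is not the PR's actual code.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative sketch: enumerate the completions of a history that still
// contains pending invocations. Exponential in the number of pending ops;
// a real checker would fold these choices into the search itself.
final class HistoryCompletion {

    static List<List<WingGongSketch.Event>> complete(List<WingGongSketch.Event> history,
                                                     Deque<Integer> pendingOpIds) {
        if (pendingOpIds.isEmpty()) {
            return List.of(List.copyOf(history));
        }
        int opId = pendingOpIds.pop();
        List<List<WingGongSketch.Event>> completions = new ArrayList<>();

        // Option 1: the failed write never took effect; erase its invocation.
        List<WingGongSketch.Event> dropped = new ArrayList<>(history);
        dropped.removeIf(e -> e.opId() == opId);
        completions.addAll(complete(dropped, pendingOpIds));

        // Option 2: the write did (or will) take effect; close it with a
        // response at the end, so the checker may linearize it anywhere
        // after its invocation. (For a write, the response value is trivial.)
        List<WingGongSketch.Event> closed = new ArrayList<>(history);
        closed.add(new WingGongSketch.Event(WingGongSketch.Kind.RESPOND, opId));
        completions.addAll(complete(closed, pendingOpIds));

        pendingOpIds.push(opId); // restore for the caller (backtracking)
        return completions;
    }
}
```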

ywelsch (Contributor, Author) replied:

The failures here can come not only from the system (e.g. the publication layer) but also from the user-defined cluster state update function. Task batching can make this even more complicated. Also worth pointing out: many master-level actions (i.e. subclasses of TransportMasterNodeAction) already have a built-in retry mechanism that reacts to publication-level failures.

```diff
@@ -1093,13 +1098,22 @@ void runRandomly() {
             }

             try {
-                if (rarely()) {
+                if (randomBoolean() && randomBoolean() && randomBoolean()) {
```
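
(Three independent `randomBoolean()` calls all pass with probability (1/2)^3 = 12.5%; as the commit messages referenced at the end of this thread note, this change increased the probability that each step in `runRandomly()` enqueues another task.)
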
henningandersen (Contributor) commented on this change:

Running all tests in the class, 75-80% of the linearizability checks are against histories where all invocations come before all responses (form 1). Around half of the remaining checks are against histories of the form (invocation*, response*, invocation*, response*), i.e. two rounds of invocations/responses (form 2). Considering only histories larger than size 100, the numbers are 63-65% for form 1 and 77-80% for form 2. For histories larger than size 1000, roughly 1 out of 2 was of form 1.

I believe a form-1 history will only fail the check if the responses contain values that are not part of one of the invocations; otherwise the linearizability check will not detect any problems.

Perhaps we need to tweak the randomness to better provoke situations that could reveal linearizability problems?
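
As an illustration of how such histories can be classified (my sketch, not code from the PR): counting the (invocation*, response*) rounds distinguishes the forms, with a count of 1 meaning form 1 and a count of 2 meaning form 2.

```java
import java.util.List;

// Illustrative: count the (invocation*, response*) rounds of a history,
// reusing the toy Event/Kind types from the earlier sketch.
final class HistoryShape {

    static int rounds(List<WingGongSketch.Event> history) {
        int rounds = history.isEmpty() ? 0 : 1;
        boolean seenResponse = false;
        for (WingGongSketch.Event e : history) {
            if (e.kind() == WingGongSketch.Kind.RESPOND) {
                seenResponse = true;
            } else if (seenResponse) {
                rounds++;            // an invocation after responses opens a new round
                seenResponse = false;
            }
        }
        return rounds;
    }
}
```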

ywelsch (Contributor, Author) replied:

Thank you for the investigation. Tweaking the randomness here is not straightforward, and it has a more general impact on how these Coordinator tests run. I will leave this to a follow-up PR, as I don't want to delay merging the infrastructure in this PR any longer, so that it is available to other follow-up work.

ywelsch merged commit 8581983 into elastic:master on Feb 26, 2019
ywelsch added a commit that referenced this pull request Feb 26, 2019
Checks that the core coordination algorithm implemented as part of Zen2 (#32006) supports
linearizable semantics. This commit adds a linearizability checker based on the Wing and Gong
graph search algorithm with support for compositional checking and activates these checks for all
CoordinatorTests.
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request May 24, 2019
Today the default stabilisation time is calculated on the assumption that the
elected master has no pending tasks to process when it is elected, but this is
not a safe assumption to make. This can result in a cluster reaching the end of
its stabilisation time without having stabilised. Furthermore in elastic#36943 we
increased the probability that each step in `runRandomly()` enqueues another
task, vastly increasing the chance that we hit such a situation.

This change extends the stabilisation process to allow time for all pending
tasks, plus a task that might currently be in flight.

Fixes elastic#41967, in which the master entered the stabilisation phase with over 800
tasks to process.
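
A minimal sketch of the calculation this commit describes, with hypothetical names and parameters (the actual constants and method live in the Elasticsearch test framework):

```java
// Hypothetical sketch of the extended stabilisation-time calculation
// described in the commit message above; names are illustrative.
final class StabilisationTime {

    static long stabilisationMillis(long defaultStabilisationMillis,
                                    int pendingTaskCount,
                                    long perTaskMillis) {
        // default time, plus time for each already-enqueued master task,
        // plus one task that might currently be in flight
        return defaultStabilisationMillis + (pendingTaskCount + 1L) * perTaskMillis;
    }
}
```
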
DaveCTurner added three further commits referencing this pull request on May 24, 2019, gurkankaymak added one on May 27, 2019, and henningandersen added one on Jun 10, 2019, all with the same message as above.
Labels
:Distributed Coordination/Cluster Coordination, >test, v7.0.0-rc1, v7.2.0, v8.0.0-alpha1
6 participants