Only start re-assigning persistent tasks if they are not already being reassigned #76258
Conversation
Pinging @elastic/es-distributed (Team:Distributed)
@elasticmachine retest this please
The changes so far are fine, but I think there's an even more important change that needs to be made as well. The seemingly innocuous change of https://github.com/elastic/elasticsearch/pull/72260/files#diff-73e70d1002fe8bcafa19e34b892feb628153094db2ce30d77ca2f56ae5752523R316-R317 which went into 7.14 causes a major problem for any type of persistent task where the reason for failure to assign includes detailed per-node information. It means that if a particular type of persistent task cannot be assigned then it will lead to a vicious circle of: fail to assign -> set assignment failure reason -> update cluster state -> trigger cluster state listener -> try to reassign unassigned persistent tasks -> fail to assign -> set assignment failure reason to something different to what it was before -> update cluster state -> trigger cluster state listener -> etc. For example, the failure reasons might go:
So even though the reasons are effectively the same, they're different strings. The "set assignment failure reason to something different to what it was before" step is what's different in 7.14. In 7.13 and earlier the second cycle would choose the same assignment failure reason as the first, because the nodes would be checked in the same order; hence the second assignment attempt wouldn't cause a second cluster state update, because the cluster state would be identical. I think this is a serious bug that will affect other users if it's not fixed soon, so it needs to be fixed for 7.14.1. Two possible fixes I can see are:
It would also be good to add a test that two consecutive assignment failures with the same cluster state generate the same failure reason.
I had a look and transforms already does this - it puts the per-node detailed reasons in a tree map keyed on node ID. The other types of persistent tasks just use very simple high level reasons, so won't be affected. I think it's just ML that will be affected, although all types of ML persistent tasks.
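To illustrate the approach described above, here is a minimal, standalone sketch (not the actual transform or ML code; the class name, node IDs, and reasons are made up) of building a per-node explanation from a `TreeMap` keyed on node ID, so that two consecutive assignment attempts over the same cluster state produce an identical reason string and therefore no spurious cluster state update:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class DeterministicAssignmentExplanation {

    // Hypothetical helper: concatenate per-node failure reasons in node-ID order.
    static String buildExplanation(Map<String, String> perNodeReasons) {
        // TreeMap iterates in natural key order, so the resulting explanation is
        // the same regardless of the order in which the reasons were collected.
        return new TreeMap<>(perNodeReasons).entrySet().stream()
            .map(e -> "[" + e.getKey() + "]: " + e.getValue())
            .collect(Collectors.joining("|"));
    }

    public static void main(String[] args) {
        Map<String, String> firstAttempt = Map.of(
            "node-2", "node is shutting down",
            "node-1", "insufficient memory");
        Map<String, String> secondAttempt = Map.of(
            "node-1", "insufficient memory",
            "node-2", "node is shutting down");

        String first = buildExplanation(firstAttempt);
        String second = buildExplanation(secondAttempt);

        // Identical explanations mean the second attempt would not trigger
        // another cluster state update.
        assert first.equals(second);
        System.out.println(first);
    }
}
```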
Thanks Ben, this looks good. Can we also add a test to verify the behavior, so that we do not inadvertently disable this somehow in the future?
Test looks good, one minor comment on it. I did not review the other added part yet.
LGTM if you could just make one more tweak
LGTM.
assertThat(
    result.getExplanation(),
    equalTo(
        "Not opening job [incompatible_type_job] on node [{_node_name1}{version=8.0.0}], "
I think we should substitute `Version.CURRENT.toString` instead of `8.0.0`, otherwise this test will break every time we release? Same issue two lines down.
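For concreteness, the suggested tweak would look roughly like the fragment below (a sketch only, not the exact code merged in the PR; the surrounding test and the remainder of the expected message are omitted, and `Version.CURRENT` refers to `org.elasticsearch.Version.CURRENT`):

```java
assertThat(
    result.getExplanation(),
    equalTo(
        // build the expected text from the running version instead of a literal "8.0.0"
        "Not opening job [incompatible_type_job] on node [{_node_name1}{version="
            + Version.CURRENT
            + "}], " /* ... remainder of the expected explanation ... */
    )
);
```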
I can do that
💔 Backport failed
To backport manually run:
…g reassigned (elastic#76258) * Only start re-assigning persistent tasks if they are not already being reassigned * adding tests addressing PR comments * addressing Pr COmments * addressing PR comments + style" * improving test rigor
…y being reassigned (#76258) (#76314) * Only start re-assigning persistent tasks if they are not already being reassigned (#76258) * Only start re-assigning persistent tasks if they are not already being reassigned * adding tests addressing PR comments * addressing Pr COmments * addressing PR comments + style" * improving test rigor * test improvement
…dy being reassigned (#76258) (#76315) * Only start re-assigning persistent tasks if they are not already being reassigned (#76258) * Only start re-assigning persistent tasks if they are not already being reassigned * adding tests addressing PR comments * addressing Pr COmments * addressing PR comments + style" * improving test rigor * test improvement
We were on 7.14.0 and experienced the issue described in #76258 (comment) - upgrading to 7.14.1 (which includes the fix in this PR) fixed the problem. Posting some symptoms here in case it helps anyone searching for solutions to their problems:
Upgrading to 7.14.1 and rolling all our pods (we run ECK) resolved the issue entirely.
In cluster recovery scenarios, it is possible that there has been a flurry of cluster state updates, for example routing updates made in an attempt to get indices searchable again on nodes.
Each of these updates may in turn trigger a persistent task re-assignment, queuing yet more cluster state update requests.
This can cause unnecessary work and consequently slow down the cluster's recovery.
This commit guards the cluster state update action for persistent task re-assignment so that only one such update is queued at a time.
This MAY cause certain persistent tasks to be re-assigned more slowly, but since we periodically recheck for re-assignment, this is acceptable.
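As a rough illustration of the guard described here (a standalone sketch, not the actual PersistentTasksClusterService implementation; the class and method names are invented), the idea is an in-flight flag that is checked before submitting the re-assignment cluster state update and cleared once that update has completed:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ReassignmentGuard {

    // True while a re-assignment cluster state update is queued or executing.
    private final AtomicBoolean reassigning = new AtomicBoolean(false);

    // Called from the cluster state listener; the Runnable stands in for
    // submitting the re-assignment cluster state update task.
    public void maybeTriggerReassignment(Runnable submitClusterStateUpdate) {
        // Only one re-assignment may be queued at a time; later cluster state
        // changes are picked up by the periodic re-check instead.
        if (reassigning.compareAndSet(false, true)) {
            try {
                submitClusterStateUpdate.run();
            } catch (RuntimeException e) {
                reassigning.set(false); // release the guard if submission fails
                throw e;
            }
        }
    }

    // Called when the queued update has been applied (or has failed).
    public void onReassignmentFinished() {
        reassigning.set(false);
    }

    public static void main(String[] args) {
        ReassignmentGuard guard = new ReassignmentGuard();
        guard.maybeTriggerReassignment(() -> System.out.println("update queued"));
        guard.maybeTriggerReassignment(() -> System.out.println("skipped: one already queued"));
        guard.onReassignmentFinished();
        guard.maybeTriggerReassignment(() -> System.out.println("queued again"));
    }
}
```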