Properly support task batches in ReservedStateUpdateTaskExecutor #116353
Changes from 3 commits
@@ -14,15 +14,14 @@
 import org.elasticsearch.action.ActionListener;
 import org.elasticsearch.action.ActionResponse;
 import org.elasticsearch.cluster.ClusterState;
-import org.elasticsearch.cluster.SimpleBatchedExecutor;
+import org.elasticsearch.cluster.ClusterStateTaskExecutor;
 import org.elasticsearch.cluster.routing.RerouteService;
 import org.elasticsearch.common.Priority;
 import org.elasticsearch.core.Tuple;

 /**
  * Reserved cluster state update task executor
  */
-public class ReservedStateUpdateTaskExecutor extends SimpleBatchedExecutor<ReservedStateUpdateTask, Void> {
+public class ReservedStateUpdateTaskExecutor implements ClusterStateTaskExecutor<ReservedStateUpdateTask> {

     private static final Logger logger = LogManager.getLogger(ReservedStateUpdateTaskExecutor.class);
@@ -34,17 +33,27 @@ public ReservedStateUpdateTaskExecutor(RerouteService rerouteService) {
     }

     @Override
-    public Tuple<ClusterState, Void> executeTask(ReservedStateUpdateTask task, ClusterState clusterState) {
-        return Tuple.tuple(task.execute(clusterState), null);
+    public final ClusterState execute(BatchExecutionContext<ReservedStateUpdateTask> batchExecutionContext) throws Exception {
+        var initState = batchExecutionContext.initialState();
+        var taskContexts = batchExecutionContext.taskContexts();
+        if (taskContexts.isEmpty()) {
+            return initState;
+        }
+        // Only the last update is relevant; the others can be skipped
+        var taskContext = taskContexts.get(taskContexts.size() - 1);
+        ClusterState clusterState = initState;
+        try (var ignored = taskContext.captureResponseHeaders()) {
+            var task = taskContext.getTask();
+            clusterState = task.execute(clusterState);
+            taskContext.success(() -> task.listener().onResponse(ActionResponse.Empty.INSTANCE));
Review thread on this line:

- You have to mark all the other tasks as having succeeded too. There's an assertion to check this in the […]
- (Indeed. I'm not a huge fan of mocks TBH because it's so hard to tell what you're actually testing. </rant>)
- Yes, I think you could perhaps test this with an […]
- Also it'd be good to have a test that tries to create the same repository twice in a batch in order to demonstrate the NPE is fixed.
- Possibly […]
- Just thinking out loud on how this might look in an IT: In […] Based on the […]
- In an […]:

      masterNodeClusterService.createTaskQueue("block", Priority.NORMAL, batchExecutionContext -> {
          safeAwait(barrier);
          safeAwait(barrier);
          batchExecutionContext.taskContexts().forEach(c -> c.success(() -> {}));
          return batchExecutionContext.initialState();
      }).submitTask("block", ESTestCase::fail, null);
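The barrier trick in that snippet can be modelled with plain java.util.concurrent primitives. This is a simplified sketch, not the real MasterService/ClusterService API: `BatchingDemo`, `runBlockedBatch`, and the string queue are all illustrative names. The idea is to block the single consumer thread so that tasks submitted in the meantime pile up and are then drained as one batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// Simplified model of the barrier trick above: block the single consumer
// thread so that tasks submitted while it is blocked form one batch.
class BatchingDemo {
    static List<String> runBlockedBatch() throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        CyclicBarrier barrier = new CyclicBarrier(2);
        ExecutorService consumer = Executors.newSingleThreadExecutor();
        try {
            Future<List<String>> batch = consumer.submit(() -> {
                barrier.await(); // first await: consumer is now "blocked"
                barrier.await(); // second await: wait for the go-ahead
                List<String> drained = new ArrayList<>();
                queue.drainTo(drained); // everything submitted meanwhile is one batch
                return drained;
            });
            barrier.await();     // consumer has started and is blocked
            queue.add("task-1"); // these accumulate while the consumer is blocked
            queue.add("task-2");
            queue.add("task-3");
            barrier.await();     // release the consumer
            return batch.get(5, TimeUnit.SECONDS);
        } finally {
            consumer.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runBlockedBatch()); // prints [task-1, task-2, task-3]
    }
}
```

The double `await` mirrors the double `safeAwait(barrier)` in the review snippet: the first one signals that the queue is blocked, the second one waits for the test to release it.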
- Update: I think I have confirmed everything below now. No need to read it in detail.
- Ok, new question... I'm not sure how to trigger this condition in the integration test. I tried bad JSON, but that fails in a different way: it never seems to make it to […] I wonder how I can feed this valid JSON that parses, but then somehow the […]
- Welp, I guess I can at least test the happy path: submit N changes and verify that the last one takes effect.
+        } catch (Exception e) {
+            taskContext.onFailure(e);
Review thread on this line:

- If the last task fails, and there are earlier tasks in the list, should we try them instead of completely giving up?
- Is it safe to try multiple tasks? I thought the motivation for this PR was to avoid doing so.
- If the task failed then the cluster state remains unchanged, so yes, in that case it's safe to try a different one.
- So we're effectively trying them in reverse until one succeeds. Makes sense, I guess, though it seems a little odd. I wonder if fixing it so they can apply in order would be more going "with the grain".
- That's also a possibility, but to achieve that you'd have to go through all the […]
- Every failed task will also kick off a separate task to update the error state in reserved cluster state metadata, and these error state updates are also subject to the same version checks. So if we have three tasks with versions ([…]) we'd apply […] The code for error handling is in several places, but see here for example […] I wonder if we should simply try to apply the highest-version task for now, then follow up with potential improvements (like trying older tasks, or instead trying the queued tasks in reverse order).
- That's not my understanding. If we fail to update state […]
- You are very likely right, but let me just write out what the flow looks like: […]
- Given that error update tasks go to a separate executor and task queue, is it still true that they're guaranteed to run after we've processed tasks in the non-error state update queue? That is, can't it happen that we finish task […]
- @DaveCTurner thanks for clarifying! The piece I was missing conceptually is that the execution of batches is also ordered, so while we're processing one batch, we won't start processing another batch in parallel. @prdoyle all good to ignore my comments 👍
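For illustration, the "try them in reverse until one succeeds" idea discussed here can be sketched with stand-in types. `NewestFirstFallback` and `Update` are hypothetical names, not Elasticsearch classes; the sketch rests on the assumption stated in the thread, namely that a failed task leaves the cluster state untouched, so falling back to an older task is safe.

```java
import java.util.List;

// Sketch of the fallback strategy discussed in the review: walk the batch
// from newest to oldest and apply the first task that succeeds. Safe only
// because a failed task does not modify the state.
class NewestFirstFallback {
    interface Update {
        String apply(String state) throws Exception;
    }

    static String applyNewestThatSucceeds(String initialState, List<Update> batch) {
        for (int i = batch.size() - 1; i >= 0; i--) {
            try {
                return batch.get(i).apply(initialState); // first success wins
            } catch (Exception e) {
                // failed task left the state unchanged; fall back to an older task
            }
        }
        return initialState; // every task failed: keep the original state
    }

    public static void main(String[] args) {
        List<Update> batch = List.<Update>of(
            s -> s + "+old",
            s -> { throw new Exception("newest update is broken"); }
        );
        System.out.println(applyNewestThatSucceeds("base", batch)); // prints base+old
    }
}
```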
+        }
+        return clusterState;
+    }
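The control flow of the new `execute` method, combined with the review point that every task in the batch must still be completed (not just the one that actually ran), can be sketched outside Elasticsearch with minimal stand-in types. All names below are illustrative, not the real `ClusterStateTaskExecutor` API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Minimal stand-in for a batched executor that applies only the newest task
// but completes every task in the batch, as the reviewers require.
class LastWriterWinsExecutor {
    record Task(String name, UnaryOperator<String> update) {}

    static String execute(String initialState, List<Task> batch, List<String> completed) {
        if (batch.isEmpty()) {
            return initialState;
        }
        Task last = batch.get(batch.size() - 1);
        String newState = last.update().apply(initialState); // only the last update runs
        for (Task t : batch) {
            completed.add(t.name()); // but every task's listener must be notified
        }
        return newState;
    }

    public static void main(String[] args) {
        List<String> completed = new ArrayList<>();
        List<Task> batch = List.of(
            new Task("v1", s -> s + "+v1"),
            new Task("v2", s -> s + "+v2")
        );
        System.out.println(execute("base", batch, completed)); // prints base+v2
        System.out.println(completed);                         // prints [v1, v2]
    }
}
```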
     @Override
-    public void taskSucceeded(ReservedStateUpdateTask task, Void unused) {
-        task.listener().onResponse(ActionResponse.Empty.INSTANCE);
-    }
-
     @Override
-    public void clusterStatePublished() {
+    public final void clusterStatePublished(ClusterState newClusterState) {
         rerouteService.reroute(
             "reroute after saving and reserving part of the cluster state",
             Priority.NORMAL,

@@ -54,4 +63,5 @@ public void clusterStatePublished() {
             )
         );
     }
 }
@@ -279,6 +279,39 @@ public void success(Runnable onPublicationSuccess) {
         verify(rerouteService, times(1)).reroute(anyString(), any(), any());
     }

+    public void testLastUpdateIsApplied() throws Exception {
+        ClusterState state0 = ClusterState.builder(new ClusterName("test")).version(1000).build();
+        ClusterState state1 = ClusterState.builder(new ClusterName("test")).version(1001).build();
+        ClusterState state2 = ClusterState.builder(new ClusterName("test")).version(1002).build();
+        ReservedStateUpdateTask realTask = new ReservedStateUpdateTask(
+            "test",
+            null,
+            ReservedStateVersionCheck.HIGHER_VERSION_ONLY,
+            Map.of(),
+            Set.of(),
+            errorState -> fail("Unexpected error"),
+            ActionListener.noop()
+        );
+        ReservedStateUpdateTask task1 = spy(realTask);
+        doReturn(state1).when(task1).execute(any());
+        ReservedStateUpdateTask task2 = spy(realTask);
+        doReturn(state2).when(task2).execute(any());
+        RerouteService rerouteService = mock(RerouteService.class);
+        ReservedStateUpdateTaskExecutor taskExecutor = new ReservedStateUpdateTaskExecutor(rerouteService);
+        ClusterState newState = taskExecutor.execute(
+            new ClusterStateTaskExecutor.BatchExecutionContext<>(
+                state0,
+                List.of(new TestTaskContext<>(task1), new TestTaskContext<>(task2)),
+                () -> null
+            )
+        );
+
+        assertThat("State should be the final state", newState, sameInstance(state2));
+        // Only process the final task; the intermediate ones can be skipped
+        verify(task1, times(0)).execute(any());
+        verify(task2, times(1)).execute(any());
+    }

Review comment on the verify lines: FWIW I verified that this fails without the fix.

     public void testUpdateErrorState() {
         ClusterService clusterService = mock(ClusterService.class);
         ClusterState state = ClusterState.builder(new ClusterName("test")).build();
@@ -400,7 +433,7 @@ public TransformState transform(Object source, TransformState prevState) throws
         var chunk = new ReservedStateChunk(Map.of("one", "two", "maker", "three"), new ReservedStateVersion(2L, BuildVersion.current()));
         var orderedHandlers = List.of(exceptionThrower.name(), newStateMaker.name());

-        // We submit a task with two handler, one will cause an exception, the other will create a new state.
+        // We submit a task with two handlers, one will cause an exception, the other will create a new state.
         // When we fail to update the metadata because of version, we ensure that the returned state is equal to the
         // original state by pointer reference to avoid cluster state update task to run.
         ReservedStateUpdateTask task = new ReservedStateUpdateTask(
Review thread:

- Are you sure the tasks are submitted in order, such that the last one in the list is always the best one to apply? Deserves a comment and an assertion, I think.
- Oh... is there some other way to define the "order" of tasks?
- I'm not sure, but I understood that file-based settings have some notion of version (carried in the file) to ensure we're not overriding fresh settings with stale ones. I believe it relates to task.stateChunk.metadata().version(), although I haven't tracked down the details.
- Correct, task.stateChunk.metadata().version() will give you the version of the file-settings file this update task was created for. This is then compared against the last-processed version stored in metadata. We should pick the update task with the highest version as @DaveCTurner suggests, unless someone from core infra confirms that the tasks are already guaranteed to be "well-ordered".
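The version-based selection the reviewers converge on, picking the task with the highest task.stateChunk.metadata().version() rather than trusting list order, could look roughly like this. `HighestVersionWins`, `Versioned`, and `pick` are hypothetical stand-ins, not Elasticsearch code:

```java
import java.util.Comparator;
import java.util.List;

// Sketch: pick the batch entry with the highest version instead of relying
// on submission order. Versioned stands in for a task carrying its
// file-settings version (the real task's stateChunk.metadata().version()).
class HighestVersionWins {
    record Versioned(long version, String payload) {}

    static Versioned pick(List<Versioned> batch) {
        return batch.stream()
            .max(Comparator.comparingLong(Versioned::version))
            .orElseThrow(() -> new IllegalArgumentException("empty batch"));
    }

    public static void main(String[] args) {
        List<Versioned> batch = List.of(
            new Versioned(3, "v3"),
            new Versioned(5, "v5"),
            new Versioned(4, "v4")
        );
        System.out.println(pick(batch).payload()); // prints v5, regardless of order
    }
}
```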