Ensure no Watches are running after Watcher is stopped.
Watcher keeps track of which watches are currently running, keyed by watch name/id.
If a watch is currently running, Watcher will not run the same watch again; the duplicate
attempt results in the message "Watch is already queued in thread pool" and the state
"not_executed_already_queued".

When Watcher is stopped, it rejects any new watches but allows the currently running
watches to run to completion. Waiting for the currently running watches to complete
happens asynchronously from the stopping of Watcher. This means Watcher will report as
fully stopped while a background thread is still waiting for all of the watches to
finish before removing them from its list of currently running watches.
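As a minimal, self-contained illustration of that async shape (hypothetical names, not the actual Watcher code), modeling the in-flight watches as a CountDownLatch:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class AsyncStopSketch {
    private final ExecutorService genericExecutor = Executors.newSingleThreadExecutor();

    // Pre-change shape: return immediately and drain in the background, so a
    // caller can observe "stopped" while watches are still counting down.
    void stop(CountDownLatch runningWatches) {
        genericExecutor.execute(() -> {
            try {
                runningWatches.await(30, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // stop() returns here, before runningWatches has necessarily drained.
    }
}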

The integration tests start and stop Watcher between each test, with the goal of
ensuring a clean state between tests. However, since Watcher can report "yes - I am
stopped" while there are still running watches, the tests may bleed over into each
other, especially on slow machines. This can result in errors with the message "Watch
is already queued in thread pool" and the state "not_executed_already_queued", and is
VERY difficult to reproduce.

This commit changes the waiting for watches on stop/pause from an async wait back to a
sync wait, as it worked prior to elastic#30118. This helps make the stop much more
predictable for testing scenarios, so that after Watcher is fully stopped, no watches
are running. It should have little if any impact on production code, since Watcher
isn't stopped/paused often, and when it is, the behavior is the same; the wait simply
runs on the calling thread rather than a generic thread.
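The synchronous shape this commit restores, in the same hedged sketch form (again hypothetical names): the wait happens on the calling thread, so by the time stop() returns, no watches remain.

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class SyncStopSketch {
    // Post-change shape: block the caller until in-flight watches finish, so
    // "fully stopped" implies "no watches running".
    synchronized void stop(CountDownLatch runningWatches) {
        try {
            runningWatches.await(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}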
jakelandis committed Jul 2, 2019
1 parent 217b875 commit 926a671
Showing 1 changed file with 9 additions and 10 deletions.
@@ -106,7 +106,7 @@ public class ExecutionService {
     private final WatchExecutor executor;
     private final ExecutorService genericExecutor;
 
-    private AtomicReference<CurrentExecutions> currentExecutions = new AtomicReference<>();
+    private CurrentExecutions currentExecutions;
     private final AtomicBoolean paused = new AtomicBoolean(false);
 
     public ExecutionService(Settings settings, HistoryStore historyStore, TriggeredWatchStore triggeredWatchStore, WatchExecutor executor,
@@ -123,7 +123,7 @@ public ExecutionService(Settings settings, HistoryStore historyStore, TriggeredW
         this.client = client;
         this.genericExecutor = genericExecutor;
         this.indexDefaultTimeout = settings.getAsTime("xpack.watcher.internal.ops.index.default_timeout", TimeValue.timeValueSeconds(30));
-        this.currentExecutions.set(new CurrentExecutions());
+        this.currentExecutions = new CurrentExecutions();
     }
 
     public void unPause() {
@@ -169,12 +169,12 @@ public long executionThreadPoolMaxSize() {
 
     // for testing only
     CurrentExecutions getCurrentExecutions() {
-        return currentExecutions.get();
+        return currentExecutions;
     }
 
     public List<WatchExecutionSnapshot> currentExecutions() {
         List<WatchExecutionSnapshot> currentExecutions = new ArrayList<>();
-        for (WatchExecution watchExecution : this.currentExecutions.get()) {
+        for (WatchExecution watchExecution : this.currentExecutions) {
             currentExecutions.add(watchExecution.createSnapshot());
         }
         // Lets show the longest running watch first:
@@ -279,7 +279,7 @@ public WatchRecord execute(WatchExecutionContext ctx) {
         WatchRecord record = null;
         final String watchId = ctx.id().watchId();
         try {
-            boolean executionAlreadyExists = currentExecutions.get().put(watchId, new WatchExecution(ctx, Thread.currentThread()));
+            boolean executionAlreadyExists = currentExecutions.put(watchId, new WatchExecution(ctx, Thread.currentThread()));
             if (executionAlreadyExists) {
                 logger.trace("not executing watch [{}] because it is already queued", watchId);
                 record = ctx.abortBeforeExecution(ExecutionState.NOT_EXECUTED_ALREADY_QUEUED, "Watch is already queued in thread pool");
@@ -334,7 +334,7 @@ record = createWatchRecord(record, ctx, e);
 
                 triggeredWatchStore.delete(ctx.id());
             }
-            currentExecutions.get().remove(watchId);
+            currentExecutions.remove(watchId);
             logger.debug("finished [{}]/[{}]", watchId, ctx.id());
         }
         return record;
@@ -578,10 +578,9 @@ public Counters executionTimes() {
      * This clears out the current executions and sets new empty current executions
      * This is needed, because when this method is called, watcher keeps running, so sealing executions would be a bad idea
      */
-    private void clearExecutions() {
-        final CurrentExecutions currentExecutionsBeforeSetting = currentExecutions.getAndSet(new CurrentExecutions());
-        // clear old executions in background, no need to wait
-        genericExecutor.execute(() -> currentExecutionsBeforeSetting.sealAndAwaitEmpty(maxStopTimeout));
+    private synchronized void clearExecutions() {
+        currentExecutions.sealAndAwaitEmpty(maxStopTimeout);
+        currentExecutions = new CurrentExecutions();
     }
 
     // the watch execution task takes another runnable as parameter
