Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Die with dignity while merging #27265

Merged
merged 16 commits into from
Nov 6, 2017
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,6 @@

import org.apache.logging.log4j.Logger;
import org.apache.logging.log4j.message.ParameterizedMessage;
import org.apache.lucene.index.MergePolicy;
import org.elasticsearch.common.SuppressForbidden;
import org.elasticsearch.common.logging.Loggers;

Expand Down Expand Up @@ -68,11 +67,7 @@ public void uncaughtException(Thread t, Throwable e) {

// visible for testing
static boolean isFatalUncaught(Throwable e) {
return isFatalCause(e) || (e instanceof MergePolicy.MergeException && isFatalCause(e.getCause()));
}

private static boolean isFatalCause(Throwable cause) {
return cause instanceof Error;
return e instanceof Error;
}

// visible for testing
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1579,23 +1579,15 @@ public IndexCommitRef acquireIndexCommit(final boolean flushFirst) throws Engine
}
}

@SuppressWarnings("finally")
private boolean failOnTragicEvent(AlreadyClosedException ex) {
final boolean engineFailed;
// if we are already closed due to some tragic exception
// we need to fail the engine. it might have already been failed before
// but we are double-checking it's failed and closed
if (indexWriter.isOpen() == false && indexWriter.getTragicException() != null) {
if (indexWriter.getTragicException() instanceof Error) {
try {
logger.error("tragic event in index writer", ex);
} finally {
throw (Error) indexWriter.getTragicException();
}
} else {
failEngine("already closed by tragic event on the index writer", (Exception) indexWriter.getTragicException());
engineFailed = true;
}
maybeDie("tragic event in index writer", indexWriter.getTragicException());
failEngine("already closed by tragic event on the index writer", (Exception) indexWriter.getTragicException());
engineFailed = true;
} else if (translog.isOpen() == false && translog.getTragicException() != null) {
failEngine("already closed by tragic event on the translog", translog.getTragicException());
engineFailed = true;
Expand Down Expand Up @@ -1916,7 +1908,6 @@ protected void doRun() throws Exception {

@Override
protected void handleMergeException(final Directory dir, final Throwable exc) {
logger.error("failed to merge", exc);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: shall we keep this here, for fast visibility and also, just in case it happens during node shutdown..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I glanecd at the lucene code leading here and it seems all good. I do wonder if it's good to assert we can't find a Error in the cause chain of exc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the Lucene code as written right now, that's not possible. I'm really not sure if I want to add this assertion right now (look at how complicated it is in Netty4Utils#maybeError where Netty can hide things we do not want hidden from us). I understand the point of the assertion is to ensure that what we think we see in the Lucene code always holds, but then it's not clear to me if we'd ever encounter a situation where an Error occurs in testing that would trip one of these assertions anyway. So I see a cost to adding the assertion, and I'm not sure if I see benefit more than that cost.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point about tests maybe not being enough. I was thinking of something simple like assert ExceptionsHelper.unwrap(exc, Error.class) == null : exc . the maybeError also looks for suppressions which I'm fine with not doing. Alternatively we can convert maybError to a top level utility.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will consider this in a follow-up.

engineConfig.getThreadPool().generic().execute(new AbstractRunnable() {
@Override
public void onFailure(Exception e) {
Expand All @@ -1925,13 +1916,39 @@ public void onFailure(Exception e) {

@Override
protected void doRun() throws Exception {
MergePolicy.MergeException e = new MergePolicy.MergeException(exc, dir);
failEngine("merge failed", e);
/*
* We do this on another thread rather than the merge thread that we are initially called on so that we have complete
* confidence that the call stack does not contain catch statements that would cause the error that might be thrown
* here from being caught and never reaching the uncaught exception handler.
*/
maybeDie("fatal error while merging", exc);
logger.error("failed to merge", exc);
failEngine("merge failed", new MergePolicy.MergeException(exc, dir));
}
});
}
}

/**
* If the specified throwable is a fatal error, this throwable will be thrown. Callers should ensure that there are no catch statements
* that would catch an error in the stack as the fatal error here should go uncaught and be handled by the uncaught exception handler
* that we install during bootstrap. If the specified throwable is indeed a fatal error, the specified message will attempt to be logged
* before throwing the fatal error. If the specified throwable is not a fatal error, this method is a no-op.
*
* @param maybeMessage the message to maybe log
* @param maybeFatal the throwable that is maybe fatal
*/
@SuppressWarnings("finally")
private void maybeDie(final String maybeMessage, final Throwable maybeFatal) {
if (maybeFatal instanceof Error) {
try {
logger.error(maybeMessage, maybeFatal);
} finally {
throw (Error) maybeFatal;
}
}
}

/**
* Commits the specified index writer.
*
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,6 @@

package org.elasticsearch.bootstrap;

import org.apache.lucene.index.MergePolicy;
import org.elasticsearch.test.ESTestCase;
import org.junit.Before;

Expand Down Expand Up @@ -131,7 +130,6 @@ void onNonFatalUncaught(String threadName, Throwable t) {
}

public void testIsFatalCause() {
assertFatal(new MergePolicy.MergeException(new OutOfMemoryError(), null));
assertFatal(new OutOfMemoryError());
assertFatal(new StackOverflowError());
assertFatal(new InternalError());
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@
import java.util.Collections;
import java.util.List;

import static org.elasticsearch.index.translog.TranslogDeletionPolicyTests.createTranslogDeletionPolicy;
import static org.elasticsearch.index.translog.TranslogDeletionPolicies.createTranslogDeletionPolicy;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.times;
import static org.mockito.Mockito.verify;
Expand Down
Loading