Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add KsqlUncaughtExceptionHandler and new KsqlRestConfig for enabling it #3425

Conversation

stevenpyzhang
Copy link
Member

@stevenpyzhang stevenpyzhang commented Sep 26, 2019

Description

Currently, there's no way to bring back a thread that dies due to exception. This PR adds a UncaughtExceptionHandler to the CommandRunner thread which will take down the server when the thread dies due to an unknown exception. the executorService was also changed to execute() because submit() doesn't call UncaughtExceptionHandler.

https://stackoverflow.com/questions/1838923/why-is-uncaughtexceptionhandler-not-called-by-executorservice

Example: if the CommandRunner thread encounters problems when polling the commandTopic, the exception returned isn't being handled currently and the thread will die. The server is then left in a state where it can't process any more commands now that the thread is dead. The only way to restore functionality is to just restart the entire server.

This PR also adds a new KafkaStreamsUncaughtExceptionHandler() for persistent queries since if there's an exception in them, the server can still function normally.

Testing done

Set the new config and threw a RuntimeException from CommandRunner thread and it closed the server after it was thrown.

When the config wasn't set and the exception thrown from CommandRunner, the thread died and the server continued running.

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@stevenpyzhang stevenpyzhang requested a review from a team as a code owner September 26, 2019 18:01
@stevenpyzhang stevenpyzhang force-pushed the add-unhandled-exception-handler branch from 4a9f746 to d8c6589 Compare September 27, 2019 00:01
@stevenpyzhang stevenpyzhang changed the title feat: add global UncaughtExceptionHandler to all threads (can be overridden) feat: add UncaughtExceptionHandler to CommandRunner thread Sep 27, 2019
Copy link
Contributor

@rodesai rodesai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SuppressFBWarnings
public void uncaughtException(final Thread t, final Throwable e) {
log.error("Unhandled exception caught in thread {}, error: {}", t.getName(), e.getStackTrace());
System.exit(-1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple things:

Can we also write the exception to stderr here?

We should call org.apache.log4j.LogManager.shutdown() to ensure that all the buffered logs (including the one above) get flushed. Ideally this would get passed in the constructor as a Runnable:

class KsqlUncaughtExceptionHandler implements UncaughtExceptionHandler {
    private final Runnable flusher;

    public void KsqlUncaughtExceptionHandler(final Runnable flusher) {
        this.flusher = flusher;
    }
}
...
new KsqlUncaughtExceptionHandler(LogManager::flush);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add it, but is there a reason why we don't call LogManager.shutdown() currently when the server exits? I don't see any other uses of it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should definitely call it when the server exits cleanly as well. It's more critical here because we might lose the exact information we need to help debug the issue that made the thread crash.

@stevenpyzhang
Copy link
Member Author

stevenpyzhang commented Sep 27, 2019

I thought the plan was to use https://docs.oracle.com/javase/7/docs/api/java/lang/Thread.html#setDefaultUncaughtExceptionHandler(java.lang.Thread.UncaughtExceptionHandler) to set this globally. Any reason we're not doing that?

@rodesai I had that originally. Could there be other threads that can safely die other than the Stream threads? It seems a bit much to enable this globally by default. Maybe we should add a new config that allows us to turn it on or off (Default is that it's off)?

@stevenpyzhang stevenpyzhang force-pushed the add-unhandled-exception-handler branch 3 times, most recently from 04b6d68 to 71e4a60 Compare September 27, 2019 18:24
@stevenpyzhang stevenpyzhang changed the title feat: add UncaughtExceptionHandler to CommandRunner thread feat: add KsqlUncaughtExceptionHandler and new KsqlRestConfig for enabling it Sep 27, 2019
@rodesai
Copy link
Contributor

rodesai commented Sep 27, 2019

Could there be other threads that can safely die other than the Stream threads? It seems a bit much to enable this globally by default. Maybe we should add a new config that allows us to turn it on or off (Default is that it's off)?

I wouldn't ever expect a thread to exit this way. If it does then we're probably in some degraded state. There is some risk of our dependencies having bugs where their threads die, but it would be reasonable to keep running. Let's see how it goes and we can add more targeted handling if we need to.

Copy link
Contributor

@rodesai rodesai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, modulo the comment about config names

@stevenpyzhang stevenpyzhang force-pushed the add-unhandled-exception-handler branch 3 times, most recently from a7e2d34 to bc89f8f Compare September 28, 2019 02:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants