-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add KsqlUncaughtExceptionHandler and new KsqlRestConfig for enabling it #3425
feat: add KsqlUncaughtExceptionHandler and new KsqlRestConfig for enabling it #3425
Conversation
4a9f746
to
d8c6589
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the plan was to use https://docs.oracle.com/javase/7/docs/api/java/lang/Thread.html#setDefaultUncaughtExceptionHandler(java.lang.Thread.UncaughtExceptionHandler) to set this globally. Any reason we're not doing that?
@SuppressFBWarnings | ||
public void uncaughtException(final Thread t, final Throwable e) { | ||
log.error("Unhandled exception caught in thread {}, error: {}", t.getName(), e.getStackTrace()); | ||
System.exit(-1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple things:
Can we also write the exception to stderr here?
We should call org.apache.log4j.LogManager.shutdown()
to ensure that all the buffered logs (including the one above) get flushed. Ideally this would get passed in the constructor as a Runnable
:
class KsqlUncaughtExceptionHandler implements UncaughtExceptionHandler {
private final Runnable flusher;
public void KsqlUncaughtExceptionHandler(final Runnable flusher) {
this.flusher = flusher;
}
}
...
new KsqlUncaughtExceptionHandler(LogManager::flush);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add it, but is there a reason why we don't call LogManager.shutdown() currently when the server exits? I don't see any other uses of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should definitely call it when the server exits cleanly as well. It's more critical here because we might lose the exact information we need to help debug the issue that made the thread crash.
@rodesai I had that originally. Could there be other threads that can safely die other than the Stream threads? It seems a bit much to enable this globally by default. Maybe we should add a new config that allows us to turn it on or off (Default is that it's off)? |
04b6d68
to
71e4a60
Compare
I wouldn't ever expect a thread to exit this way. If it does then we're probably in some degraded state. There is some risk of our dependencies having bugs where their threads die, but it would be reasonable to keep running. Let's see how it goes and we can add more targeted handling if we need to. |
ksql-rest-app/src/main/java/io/confluent/ksql/rest/server/KsqlRestConfig.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, modulo the comment about config names
a7e2d34
to
bc89f8f
Compare
bc89f8f
to
3996b20
Compare
Description
Currently, there's no way to bring back a thread that dies due to exception. This PR adds a UncaughtExceptionHandler to the CommandRunner thread which will take down the server when the thread dies due to an unknown exception. the executorService was also changed to execute() because submit() doesn't call UncaughtExceptionHandler.
https://stackoverflow.com/questions/1838923/why-is-uncaughtexceptionhandler-not-called-by-executorservice
Example: if the CommandRunner thread encounters problems when polling the commandTopic, the exception returned isn't being handled currently and the thread will die. The server is then left in a state where it can't process any more commands now that the thread is dead. The only way to restore functionality is to just restart the entire server.
This PR also adds a new KafkaStreamsUncaughtExceptionHandler() for persistent queries since if there's an exception in them, the server can still function normally.
Testing done
Set the new config and threw a RuntimeException from CommandRunner thread and it closed the server after it was thrown.
When the config wasn't set and the exception thrown from CommandRunner, the thread died and the server continued running.
Reviewer checklist