-
Notifications
You must be signed in to change notification settings - Fork 56
Automatically reconnect to the stream if a heartbeat is not received in time #77
Conversation
@@ -34,6 +42,8 @@ | |||
this.config = config; | |||
this.sdkKey = sdkKey; | |||
this.requestor = requestor; | |||
this.heartbeatDetectorService = Executors.newScheduledThreadPool(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Elsewhere we used named threadpools like this:
ThreadFactory threadFactory = new ThreadFactoryBuilder()
.setNameFormat("LaunchDarkly-HeartbeatDetector-%d")
.build();
this. heartbeatDetectorService = Executors.newSingleThreadScheduledExecutor(threadFactory);
This makes for easier debugging and reading of log statements.
Also, I think we want this to be a single threaded executor, even though practically it won't matter much.
|
||
private final FeatureStore store; | ||
private final LDConfig config; | ||
private final String sdkKey; | ||
private final FeatureRequestor requestor; | ||
private EventSource es; | ||
private final ScheduledExecutorService heartbeatDetectorService; | ||
private DateTime lastHeartbeat; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we may want lastHeartbeat to be volatile, or an atomic reference. thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good idea-- volatile
is sufficient.
|
||
@Override | ||
public void run() { | ||
DateTime fiveMinutesAgo = DateTime.now().minusSeconds(DEAD_CONNECTION_INTERVAL_SECONDS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's rename fiveMinutesAgo
to reconnectThresholdTime
or something else not tied to the value of the constant?
try { | ||
logger.info("Stream stopped receiving heartbeats- reconnecting."); | ||
es.close(); | ||
start(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if es.close() throws an exception? I don't think we'll restart the stream. I think putting start() in a finally block will help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we're actually failing to close the previous EventSource? If that's happening every time (for some unknown reason) then we'll be creating a new ES instance every few minutes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true- what about checking the state of es in the finally block?
if (es.getState() == ReadyState.CLOSED) { //or whatever state we're in after closing.
es.start();
} else {
// this is bad.
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the code, it's not really possible. Nothing in es.close()
throws an exception-- it's simply there to implement the Closeable
interface. Nevertheless, calling start
in finally
makes sense in case the exception semantics in close
change.
if (es.getState() == ReadyState.SHUTDOWN) { | ||
start(); | ||
} else { | ||
logger.warn("Expected ES to be in state SHUTDOWN, but it's currently in state " + es.getState().toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very unlikely scenario, but we should switch this to an error level statement.
I tested this locally by setting the dead connection interval to 15 seconds-- much faster than the stream API sends heartbeats-- and ensuring that everything reconnected. I'm going to publish a snapshot of this and run it through the full integration suite next. |
@jkodumal what do you recommend setting 'heartbeatIntervalSecs' in the ld-relay config? At the moment we have it set to 15 seconds. Could we get better performance from the relay by reducing this frequency? |
@samhaldane we haven't measured the performance impact of more frequent heartbeats, but our stream API sends heartbeats at 180 seconds. Anything under 5 minutes is reasonable. |
Integration suite passes and testing with aggressive reconnects in visualvm confirms that there are no resource leaks. |
don't give up permanently after a 400 error
No description provided.