This repository has been archived by the owner on May 30, 2024. It is now read-only.

Automatically reconnect to the stream if a heartbeat is not received in time #77

Merged
merged 5 commits into from
Nov 10, 2016

Conversation

jkodumal
Contributor

@jkodumal jkodumal commented Nov 9, 2016

No description provided.

@@ -34,6 +42,8 @@
this.config = config;
this.sdkKey = sdkKey;
this.requestor = requestor;
this.heartbeatDetectorService = Executors.newScheduledThreadPool(1);
Contributor

Elsewhere we use named thread pools, like this:

ThreadFactory threadFactory = new ThreadFactoryBuilder()
        .setNameFormat("LaunchDarkly-HeartbeatDetector-%d")
        .build();
this.heartbeatDetectorService = Executors.newSingleThreadScheduledExecutor(threadFactory);

This makes debugging and reading log statements easier.
Also, I think we want this to be a single-threaded executor, even though in practice it won't matter much.
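For readers without Guava on the classpath, the same idea can be sketched with the JDK alone. This is only an illustration of the suggestion above, not the code that was merged: the class `NamedExecutorSketch`, the helper `namedDaemonFactory`, and the choice of daemon threads are assumptions made for this example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class NamedExecutorSketch {
    // Hypothetical helper: names each new thread "<prefix>-<n>", starting at 0.
    static ThreadFactory namedDaemonFactory(final String prefix) {
        final AtomicInteger counter = new AtomicInteger(0);
        return new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r, prefix + "-" + counter.getAndIncrement());
                // Assumption for this sketch: the detector should not keep the JVM alive.
                t.setDaemon(true);
                return t;
            }
        };
    }

    // Schedules one task on a named single-threaded executor and
    // returns the name of the thread that ran it.
    static String threadNameOfScheduledTask() {
        ScheduledExecutorService service = Executors.newSingleThreadScheduledExecutor(
                namedDaemonFactory("LaunchDarkly-HeartbeatDetector"));
        try {
            Callable<String> task = () -> Thread.currentThread().getName();
            return service.schedule(task, 10, TimeUnit.MILLISECONDS).get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            service.shutdownNow();
        }
    }
}
```

The thread name then shows up in thread dumps and in any log pattern that prints the thread, which is the debugging benefit the comment refers to.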


private final FeatureStore store;
private final LDConfig config;
private final String sdkKey;
private final FeatureRequestor requestor;
private EventSource es;
private final ScheduledExecutorService heartbeatDetectorService;
private DateTime lastHeartbeat;
Contributor

I think we may want lastHeartbeat to be volatile, or an AtomicReference. Thoughts?

Contributor Author

That is a good idea; volatile is sufficient.
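A minimal sketch of why volatile is enough here, assuming a single writer (the stream's event handler) and a single reader (the scheduled detector). The class `HeartbeatClock` and its methods are hypothetical, and a `long` timestamp stands in for the `DateTime` field in the diff.

```java
class HeartbeatClock {
    // volatile guarantees that a write made on the stream thread is visible
    // to the detector thread on its next read. No compound read-modify-write
    // happens on this field, so an atomic class is not required.
    private volatile long lastHeartbeatMillis;

    HeartbeatClock(long initialMillis) {
        this.lastHeartbeatMillis = initialMillis;
    }

    // Called from the stream thread whenever a heartbeat arrives.
    void recordHeartbeat(long nowMillis) {
        lastHeartbeatMillis = nowMillis;
    }

    // Called from the detector thread on each scheduled check.
    long getLastHeartbeatMillis() {
        return lastHeartbeatMillis;
    }
}
```

An AtomicReference would only become necessary if the update depended on the previous value, for example a compare-and-set.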


@Override
public void run() {
DateTime fiveMinutesAgo = DateTime.now().minusSeconds(DEAD_CONNECTION_INTERVAL_SECONDS);
Contributor

Let's rename fiveMinutesAgo to reconnectThresholdTime or something else not tied to the value of the constant?
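To illustrate the rename, here is a sketch of the staleness check with the threshold named for its role rather than its value. It uses `java.time` instead of the Joda `DateTime` in the diff, and the class `ReconnectThreshold`, the method `shouldReconnect`, the 300-second value (inferred from the old variable name), and the strict `isBefore` comparison are all assumptions for this example.

```java
import java.time.Duration;
import java.time.Instant;

class ReconnectThreshold {
    // Assumed value, inferred from the variable name "fiveMinutesAgo".
    static final Duration DEAD_CONNECTION_INTERVAL = Duration.ofSeconds(300);

    // Returns true when the last heartbeat is older than the allowed interval.
    static boolean shouldReconnect(Instant lastHeartbeat, Instant now, Duration deadConnectionInterval) {
        // The name describes the role of the value, so it stays accurate
        // if the constant is later changed.
        Instant reconnectThresholdTime = now.minus(deadConnectionInterval);
        return lastHeartbeat.isBefore(reconnectThresholdTime);
    }
}
```

Passing `now` and the interval as parameters also makes the check easy to exercise with a short interval in tests.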

try {
logger.info("Stream stopped receiving heartbeats- reconnecting.");
es.close();
start();
Contributor

What happens if es.close() throws an exception? I don't think we'll restart the stream. I think putting start() in a finally block will help.

Contributor Author

What if we're actually failing to close the previous EventSource? If that's happening every time (for some unknown reason) then we'll be creating a new ES instance every few minutes.

Contributor

True. What about checking the state of es in the finally block?

    if (es.getState() == ReadyState.CLOSED) {  // or whatever state we're in after closing
        es.start();
    } else {
        // this is bad.
    }
}  // end of the finally block

Contributor Author

Looking at the code, it's not really possible. Nothing in es.close() throws an exception; it's simply there to implement the Closeable interface. Nevertheless, calling start() in finally makes sense in case the exception semantics of close() change.

if (es.getState() == ReadyState.SHUTDOWN) {
start();
} else {
logger.warn("Expected ES to be in state SHUTDOWN, but it's currently in state " + es.getState().toString());
Contributor

This is a very unlikely scenario, but we should switch this to an error-level log statement.
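The pattern this thread converges on (close, then restart from finally only if the old stream really shut down) can be sketched as follows. `FakeStream` and `State` are stand-ins invented for this example and are not the EventSource API; `System.err` stands in for an error-level logger call.

```java
import java.io.Closeable;
import java.io.IOException;

class ReconnectSketch {
    enum State { OPEN, SHUTDOWN }

    // Hypothetical stand-in for the event source; can simulate a failing close().
    static class FakeStream implements Closeable {
        private final boolean failOnClose;
        private State state = State.OPEN;

        FakeStream(boolean failOnClose) {
            this.failOnClose = failOnClose;
        }

        State getState() {
            return state;
        }

        @Override
        public void close() throws IOException {
            if (failOnClose) {
                throw new IOException("simulated close failure");
            }
            state = State.SHUTDOWN;
        }
    }

    // Closes the stream and runs `start` only if the stream reached SHUTDOWN.
    // Returns true when a restart was triggered.
    static boolean closeAndRestart(FakeStream es, Runnable start) {
        boolean restarted = false;
        try {
            es.close();
        } catch (IOException e) {
            System.err.println("Failed to close stream: " + e.getMessage());
        } finally {
            if (es.getState() == State.SHUTDOWN) {
                start.run();
                restarted = true;
            } else {
                // Error level, as suggested above: a stream that did not shut
                // down would otherwise leak a new instance on every reconnect.
                System.err.println("Expected SHUTDOWN but stream is in state " + es.getState());
            }
        }
        return restarted;
    }
}
```

Guarding the restart on the stream's state addresses the concern raised earlier in the thread: if close() keeps failing, no additional stream instances are created.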

@jkodumal
Contributor Author

jkodumal commented Nov 9, 2016

I tested this locally by setting the dead connection interval to 15 seconds (much shorter than the interval at which the stream API sends heartbeats) and confirming that everything reconnected. I'm going to publish a snapshot of this and run it through the full integration suite next.

@samhaldane
Contributor

@jkodumal what value do you recommend for 'heartbeatIntervalSecs' in the ld-relay config? At the moment we have it set to 15 seconds. Could we get better performance from the relay by sending heartbeats less frequently?

@jkodumal
Contributor Author

jkodumal commented Nov 9, 2016

@samhaldane we haven't measured the performance impact of more frequent heartbeats, but our stream API sends a heartbeat every 180 seconds. Any interval under 5 minutes is reasonable.

@jkodumal
Contributor Author

The integration suite passes, and testing with aggressive reconnects in VisualVM confirms that there are no resource leaks.

@jkodumal jkodumal merged commit 583f403 into master Nov 10, 2016
@jkodumal jkodumal deleted the jko/heartbeat-reconnect branch November 10, 2016 23:35
eli-darkly added a commit that referenced this pull request Jul 17, 2018
don't give up permanently after a 400 error
3 participants