upgrade connection error log from warn to error #183

mightyguava · 2020-01-17T18:23:53Z

We had a connectivity issue between the client and ld-relay due to some
misconfiguration. It resulted in apps not being able to connect to
LaunchDarkly for a long time.

It took a while to realize that the SDK actually logged a warning. It'd
be nice if this was an error, since at least for us, that gets sent to more noticeable places.

We had a connectivity issue between the client and ld-relay due to some misconfiguration. It resulted in apps not being able to connect to LaunchDarkly for a long time. It took a while to realize that the SDK actually logged a warning. It'd be nice if this was an error.

eli-darkly · 2020-01-21T19:55:49Z

I think we need to be very careful about making a change like this. It's true that logging at ERROR level is more attention-getting, but that's a double-edged sword: we've had complaints before about logging too many things at ERROR that are not really show-stoppers. In this case, a temporary disconnection of the stream is not really all that unusual— it can happen due to random network glitches, or an update being deployed on our end— so customers might not appreciate getting a lot of monitoring alerts for that, as it's almost always brief and inconsequential and not something they can do anything about. I think that in some of the SDKs, some things like that are still being logged as ERROR but in general we've been trying to standardize toward using that very sparingly.

I think what would probably be more useful is the change you requested elsewhere: programmatic access to the current state of the connection. That's something that is implemented in most of our mobile SDKs, but it currently doesn't exist in the server-side ones; we're definitely considering it, but it would be something we would add across the board, not just in Java, and it'll require some architectural changes. The other thing we've considered adding is some logic that would have it make more noise if the connection stays bad for some significant amount of time.

mightyguava · 2020-01-22T01:28:13Z

Thanks for the thoughtful response @eli-darkly! I understand the preference toward something more holistic and I totally agree.

In the meanwhile though, the stream processor is already logging bad http codes on the error level:

java-server-sdk/src/main/java/com/launchdarkly/client/StreamProcessor.java

Line 67 in 11a2eac

logger.error(httpErrorMessage(status, "streaming connection", "will retry"));

. I don't think an error log for a connection issue is materially different in this case than a bad http status code.

An alternative: Maybe we can make some more noise on a failed first attempt, since that might be an indicator that the app is misconfigured or something else is really broken. That's the real thing I want to address in this PR. Happy to update my PR to do that.

misc cleanup in new APIs

eli-darkly · 2020-05-07T17:43:08Z

Update: what we are planning to do for the 5.0.0 release (although it didn't make it into the 5.0.0-rc1 beta) is as follows:

All recoverable errors (that is, any kind of network error, and any kind of HTTP error status that doesn't mean you have an invalid SDK key and should give up) will be logged at WARN level. That includes stream errors, polling request errors, and failed event posts.
There will be a configurable time threshold which, if the stream (or polling) has been failing continuously for that amount of time, will cause the SDK to log a single message at ERROR level. The message will say that updates have been unavailable for X amount of time, and will briefly describe why (network errors, HTTP 503 errors, etc. - any further details would have already been logged in earlier WARN messages). If it then gets a valid connection, it will emit a WARN message saying that the previous outage has been resolved, and then it will reset its state so that if there is another outage it repeats all of the above.
All unrecoverable errors (like invalid SDK key), which by definition can only happen once during the SDK's lifetime, will be logged at ERROR.

eli-darkly · 2020-05-07T19:20:32Z

So, since the 5.0.0 release will be coming fairly soon, and without that new configuration mechanism I think making it log either more or less stuff at ERROR level in 4.x would be undesirable for some users, I think we will close this PR.

I'm still thinking about this suggestion—

Maybe we can make some more noise on a failed first attempt, since that might be an indicator that the app is misconfigured or something else is really broken.

—but it would depend on what's meant by "some more noise". I feel like ERROR level might still be not quite right there, since if it just happens to briefly fail at startup time but immediately recovers then we've generated an unwanted alert. And there is already a way for the application to tell whether the client has successfully initialized (by checking initialized() after calling the constructor— or, starting in 5.0.0, using the new "data source status" API which can provide more details about whether it was a network fault or a server error or what).

eli-darkly · 2020-05-08T17:23:31Z

@mightyguava

Maybe we can make some more noise on a failed first attempt, since that might be an indicator that the app is misconfigured or something else is really broken.

Actually - I just checked the code, and the SDK already does log the "did not successfully startup within the startup timeout" at ERROR level. So I'm not sure we need to make more noise there.

      try {
        startFuture.get(this.config.startWaitMillis, TimeUnit.MILLISECONDS);
      } catch (TimeoutException e) {
        logger.error("Timeout encountered waiting for LaunchDarkly client initialization");
      } catch (Exception e) {
        logger.error("Exception encountered waiting for LaunchDarkly client initialization: {}", e.toString());
        logger.debug(e.toString(), e);
      }

mightyguava requested a review from eli-darkly January 17, 2020 18:23

mightyguava mentioned this pull request Jan 17, 2020

Lack of programmatic feedback on connection issues on service startup #184

Closed

LaunchDarklyCI pushed a commit that referenced this pull request Jan 30, 2020

Merge pull request #183 from launchdarkly/eb/ch62493/cleanup

0f6fdd5

misc cleanup in new APIs

andimarek mentioned this pull request Apr 27, 2020

Differentiate between recoverable and non recoverable connections Problems #190

Closed

eli-darkly closed this May 8, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

upgrade connection error log from warn to error #183

upgrade connection error log from warn to error #183

mightyguava commented Jan 17, 2020

eli-darkly commented Jan 21, 2020

mightyguava commented Jan 22, 2020

eli-darkly commented May 7, 2020

eli-darkly commented May 7, 2020

eli-darkly commented May 8, 2020 •

edited

Loading

upgrade connection error log from warn to error #183

upgrade connection error log from warn to error #183

Conversation

mightyguava commented Jan 17, 2020

eli-darkly commented Jan 21, 2020

mightyguava commented Jan 22, 2020

eli-darkly commented May 7, 2020

eli-darkly commented May 7, 2020

eli-darkly commented May 8, 2020 • edited Loading

eli-darkly commented May 8, 2020 •

edited

Loading