Skip to content
This repository has been archived by the owner on May 30, 2024. It is now read-only.

upgrade connection error log from warn to error #183

Closed
wants to merge 1 commit into from

Conversation

mightyguava
Copy link
Contributor

We had a connectivity issue between the client and ld-relay due to some
misconfiguration. It resulted in apps not being able to connect to
LaunchDarkly for a long time.

It took a while to realize that the SDK actually logged a warning. It'd
be nice if this was an error, since at least for us, that gets sent to more noticeable places.

We had a connectivity issue between the client and ld-relay due to some
misconfiguration. It resulted in apps not being able to connect to
LaunchDarkly for a long time.

It took a while to realize that the SDK actually logged a warning. It'd
be nice if this was an error.
@eli-darkly
Copy link
Contributor

I think we need to be very careful about making a change like this. It's true that logging at ERROR level is more attention-getting, but that's a double-edged sword: we've had complaints before about logging too many things at ERROR that are not really show-stoppers. In this case, a temporary disconnection of the stream is not really all that unusual— it can happen due to random network glitches, or an update being deployed on our end— so customers might not appreciate getting a lot of monitoring alerts for that, as it's almost always brief and inconsequential and not something they can do anything about. I think that in some of the SDKs, some things like that are still being logged as ERROR but in general we've been trying to standardize toward using that very sparingly.

I think what would probably be more useful is the change you requested elsewhere: programmatic access to the current state of the connection. That's something that is implemented in most of our mobile SDKs, but it currently doesn't exist in the server-side ones; we're definitely considering it, but it would be something we would add across the board, not just in Java, and it'll require some architectural changes. The other thing we've considered adding is some logic that would have it make more noise if the connection stays bad for some significant amount of time.

@mightyguava
Copy link
Contributor Author

Thanks for the thoughtful response @eli-darkly! I understand the preference toward something more holistic and I totally agree.

In the meanwhile though, the stream processor is already logging bad http codes on the error level:

logger.error(httpErrorMessage(status, "streaming connection", "will retry"));
. I don't think an error log for a connection issue is materially different in this case than a bad http status code.

An alternative: Maybe we can make some more noise on a failed first attempt, since that might be an indicator that the app is misconfigured or something else is really broken. That's the real thing I want to address in this PR. Happy to update my PR to do that.

@eli-darkly
Copy link
Contributor

Update: what we are planning to do for the 5.0.0 release (although it didn't make it into the 5.0.0-rc1 beta) is as follows:

  • All recoverable errors (that is, any kind of network error, and any kind of HTTP error status that doesn't mean you have an invalid SDK key and should give up) will be logged at WARN level. That includes stream errors, polling request errors, and failed event posts.
  • There will be a configurable time threshold which, if the stream (or polling) has been failing continuously for that amount of time, will cause the SDK to log a single message at ERROR level. The message will say that updates have been unavailable for X amount of time, and will briefly describe why (network errors, HTTP 503 errors, etc. - any further details would have already been logged in earlier WARN messages). If it then gets a valid connection, it will emit a WARN message saying that the previous outage has been resolved, and then it will reset its state so that if there is another outage it repeats all of the above.
  • All unrecoverable errors (like invalid SDK key), which by definition can only happen once during the SDK's lifetime, will be logged at ERROR.

@eli-darkly
Copy link
Contributor

So, since the 5.0.0 release will be coming fairly soon, and without that new configuration mechanism I think making it log either more or less stuff at ERROR level in 4.x would be undesirable for some users, I think we will close this PR.

I'm still thinking about this suggestion—

Maybe we can make some more noise on a failed first attempt, since that might be an indicator that the app is misconfigured or something else is really broken.

—but it would depend on what's meant by "some more noise". I feel like ERROR level might still be not quite right there, since if it just happens to briefly fail at startup time but immediately recovers then we've generated an unwanted alert. And there is already a way for the application to tell whether the client has successfully initialized (by checking initialized() after calling the constructor— or, starting in 5.0.0, using the new "data source status" API which can provide more details about whether it was a network fault or a server error or what).

@eli-darkly eli-darkly closed this May 8, 2020
@eli-darkly
Copy link
Contributor

eli-darkly commented May 8, 2020

@mightyguava

Maybe we can make some more noise on a failed first attempt, since that might be an indicator that the app is misconfigured or something else is really broken.

Actually - I just checked the code, and the SDK already does log the "did not successfully startup within the startup timeout" at ERROR level. So I'm not sure we need to make more noise there.

      try {
        startFuture.get(this.config.startWaitMillis, TimeUnit.MILLISECONDS);
      } catch (TimeoutException e) {
        logger.error("Timeout encountered waiting for LaunchDarkly client initialization");
      } catch (Exception e) {
        logger.error("Exception encountered waiting for LaunchDarkly client initialization: {}", e.toString());
        logger.debug(e.toString(), e);
      }

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants