Skip to content
This repository has been archived by the owner on Nov 22, 2023. It is now read-only.

More robust sync mechanism #63

Merged
merged 3 commits into from
Sep 4, 2018
Merged

More robust sync mechanism #63

merged 3 commits into from
Sep 4, 2018

Conversation

mbyczkowski
Copy link
Contributor

@mbyczkowski mbyczkowski commented Aug 28, 2018

This PR should address issues discussed in #30.

TL;DR: This PR introduces retries with exponential backoff (+ jitter for spreading the load) for each client sync API call

Thoughts/questions:

  • I was on the fence where the retry should happen (in syncer or client), but went with the client to keep the complexity hidden.
  • Currently, we ignore fail counts as long as a retry succeeded. We might care at some point to see how often we have to retry to succeed, but seemed like unnecessary noise at this point.
  • I'm still pondering the best config representation for the backoff, and I'm leaning towards using strings and time.ParseDuration. I've added some simple validation checks for the config, so would be easy to get this changed and handle potential errors. Timeout, MinBackoff and MaxBackoff can now be specified in the same way as PollInterval (so 20s, 100ms etc.)
  • I've relied on a simple 3rd party lib for backoff (that I have used before). It's MIT licensed, so hopefully that's OK.

client.go Outdated
@@ -92,9 +96,17 @@ func (c KeywhizHTTPClient) Logger() *logrus.Entry {

// NewClient produces a read-to-use client struct given PEM-encoded certificate file, key file, and

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this comment should be updated to reflect the change in arguments

Copy link

@stfinney stfinney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

client.go Outdated
for i := 0; i < c.params.maxRetries; i++ {
now := time.Now()
resp, err = c.httpClient.Get(t.String())
if err != nil || resp.StatusCode == http.StatusOK {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if there's any other status codes we shouldn't retry for. 404 maybe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably 401, 403 and generally most (if not all 4xx)? I think we should mainly care about 5xx. I will pull this into isIntermittent method

config.go Outdated
}

if config.MaxBackoff == "" {
config.MaxBackoff = "10000ms"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "10s" is easier to read

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definitely!

config.go Outdated
MaxRetries uint16 `yaml:"max_retries"` // Optional: retry each HTTP call after non-200 response. Defaults to Keysync's config.
Timeout string `yaml:"timeout"` // Optional: timeout HTTP calls. Defaults to Keysync's config.
MinBackoff string `yaml:"min_backoff"` // Optional: wait time before first retry. Defaults to Keysync's config.
MaxBackoff string `yaml:"max_backoff"` // Optional: max wait time before retries. Defaults to Keysync's config.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to be able to configure backoff per-client, and it's just extra complexity. We don't allow configuring servers per-client, and it makes sense to configure retries and backoff with the server.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, will simplify 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mcpherrinm makes sense, I used ConfigClient to refactor NewClient, so I can just make those vars be not readable/settable from YAML client config and just copy them from Config.

MMMMretries, min backoff time and max backoff can be specified in main
keysync config with overrides available for each client. The config
takes a string and later attemps to parse the duration using
`time.ParseDuration` - this behavior is analogous to `poll_interval`.

As part of the refactor, the timeout value is now configurable (defaults
to 60s in absence of keysync's config) and also can be overridden by
each client.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants