-
-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
reverseproxy: SRV dynamic upstream failover #5832
Conversation
Thanks for this fix @mholt I’m wondering why it’s only for SRV and not A lookups for dynamic upstreams? |
It was only requested for SRV upstreams for the moment. If it for sure works we can easily add it for A upstreams too 👍 |
@@ -140,6 +147,12 @@ func (su SRVUpstreams) GetUpstreams(r *http.Request) ([]*Upstream, error) { | |||
// out and an error will be returned alongside the remaining results, if any." Thus, we | |||
// only return an error if no records were also returned. | |||
if len(records) == 0 { | |||
if su.GracePeriod > 0 { | |||
su.logger.Error("SRV lookup failed; using previously cached", zap.Error(err)) | |||
cached.freshness = time.Now().Add(-time.Duration(su.Refresh) - time.Duration(su.GracePeriod)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm second-guessing my math here.
I should write some tests for this. I did a couple in my head but I should probably just write them down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still second guessing, or should we merge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally got a chance to do the math.
This had a mistake. 😅
The refresh period means "no longer fresh when freshness < now-refresh
". So if the refresh period is 5 minutes, and the grace period is 2 minutes, that means that we should try again in 2 minutes (using only freshness and refresh). In other words, we want freshness < now-refresh
to be true in 2 minutes, or freshness < (now+2)-refresh
. Thus we set freshness to now+grace-refresh
, or now+2-5 = -3
. That means freshness will be 5 minutes ago in 2 minutes.
This also works when grace > refresh; we set freshness ahead of now and the extra time is allowed before trying again.
Pushing a commit soon then will merge.
This is missing Caddyfile support for the |
This should fix #5816 - if the SRV lookup fails and a GracePeriod is defined, it will continue to use the previously-cached records until the GracePeriod is up, after which it will try again.
@cds2-stripe @jjiang-stripe Feel free to give this a shot -- let me know if I need to rebase this onto another branch.