reverseproxy: SRV dynamic upstream failover #5832

mholt · 2023-09-21T17:48:32Z

This should fix #5816: if the SRV lookup fails and a GracePeriod is defined, it will keep using the previously cached records until the GracePeriod is up, after which it will try the lookup again.

@cds2-stripe @jjiang-stripe Feel free to give this a shot -- let me know if I need to rebase this onto another branch.

kkroo · 2023-09-21T22:06:12Z

Thanks for this fix, @mholt! I'm wondering why it's only for SRV and not A lookups for dynamic upstreams?

mholt · 2023-09-21T22:13:41Z

It was only requested for SRV upstreams for the moment. Once we know for sure it works, we can easily add it for A upstreams too 👍

mholt · 2023-09-21T22:15:09Z

modules/caddyhttp/reverseproxy/upstreams.go

@@ -140,6 +147,12 @@ func (su SRVUpstreams) GetUpstreams(r *http.Request) ([]*Upstream, error) {
 		// out and an error will be returned alongside the remaining results, if any." Thus, we
 		// only return an error if no records were also returned.
 		if len(records) == 0 {
+			if su.GracePeriod > 0 {
+				su.logger.Error("SRV lookup failed; using previously cached", zap.Error(err))
+				cached.freshness = time.Now().Add(-time.Duration(su.Refresh) - time.Duration(su.GracePeriod))


I'm second-guessing my math here.

I should write some tests for this. I did a couple in my head, but I should probably just write them down.

Still second-guessing, or should we merge?

Finally got a chance to do the math.

This had a mistake. 😅

The refresh period means "no longer fresh when freshness < now-refresh". So if the refresh period is 5 minutes and the grace period is 2 minutes, we should try again in 2 minutes (using only freshness and refresh). In other words, we want freshness < now-refresh to become true in 2 minutes, i.e. freshness < (now+2)-refresh. Thus we set freshness to now+grace-refresh, or now+2-5 = now-3. That means that in 2 minutes, freshness will be 5 minutes old, so the cached records go stale and we try the lookup again.

This also works when grace > refresh: freshness is set ahead of now, and the extra time is allowed before trying again.

Pushing a commit soon, then I'll merge.

francislavoie · 2024-04-22T09:24:01Z

This is missing Caddyfile support for the grace_period option.

mholt added 3 commits September 19, 2023 10:11

Implement grace period, but probably needs sync

a61b6b7

Update cached freshness value

91e5a33

D'oh, actually use the grace period

f759a12

mholt added this to the 2.9.0 milestone Sep 21, 2023

mholt mentioned this pull request Sep 21, 2023

Support falling back to cached upstreams IPs when DNS lookups fail #5816

Closed

mholt commented Sep 21, 2023


Merge branch 'master' into upstream-failover

81d1c57

francislavoie added the feature ⚙️ New feature or request label Jan 13, 2024

francislavoie modified the milestones: v2.9.0, v2.8.0 Jan 13, 2024

francislavoie changed the title from "SRV dynamic upstream failover" to "reverseproxy: SRV dynamic upstream failover" Jan 13, 2024

Fix freshness math

5880c3b

mholt merged commit 72ce78d into master Mar 5, 2024
25 checks passed

mholt deleted the upstream-failover branch March 5, 2024 19:08

francislavoie mentioned this pull request Apr 22, 2024

Add Caddyfile wiring for proxy dynamic srv's grace_period option #6261

Closed
