lb_try_interval/lb_try_duration do not pick up new backends on config reload #4442
Comments
Thanks for the interesting feature request. First I've heard of one like this! As mentioned on Twitter, I'm currently refactoring the reverse proxy, so maybe you're in luck. But I'll have to design a solution first and see if it'll fit.

In the meantime, do you know the new backend addresses ahead of time? You could configure all the future backends that could possibly be used, along with a few lines of health check configuration to ensure they get skipped from the rotation until they're available.

The way it works now: config reloads are graceful, but they're also isolated, meaning that they won't affect anything from a previous config, including active connections. So Caddy does “notice” that there's a new backend configuration, but it won't upset existing connections.
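For illustration, a hedged Caddyfile sketch of that pre-configured-backends idea (the site address, ports, and the /healthz endpoint are assumptions, not taken from this thread): both the current and the future backend are listed up front, and active health checks keep the not-yet-running one out of rotation.

```
example.com {
	reverse_proxy localhost:8003 localhost:8004 {
		# Active health checks: a backend that fails the check is
		# skipped until it starts passing again.
		health_uri /healthz
		health_interval 5s

		# Hold and retry requests while no healthy backend is available.
		lb_try_duration 30s
		lb_try_interval 250ms
	}
}
```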
The way I'm currently planning on building this, I don't think I'll know the backend addresses ahead of time, sadly. My current plan is to build with Kubernetes. I want to be able to do the following:
Oh I've just had the most horrible idea... could something like this work?
The idea being that I can change the definition of
If you feel like watching Matt talk about it for 50 minutes 😅, he streamed the beginning of the work on refactoring the upstreams logic: https://youtu.be/hj7yzXb11jU

tl;dw: right now the upstreams are a static list, but the plan is to make it possible to have a dynamic list of upstreams, and the source could be whatever (you could write a custom module to provide the list on the fly, via SRV, or maybe fetch it from HTTP and cache it for a few seconds, I dunno, whatever you like).

Point of note @mholt: for this to work, though, it would need to fetch the list of upstreams on every retry iteration and not just once before the loop.
Hmm, I don't think so, because Caddy still needs to load a new config, and the new one won't take effect for requests that are still pending. Any config change replaces the entire server's config; it's not targeted.
I tried that locally using this:
But sadly it doesn't seem to work - when I shut down the
(I bet I could get this to work by running two instances of Caddy though...)
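A hedged sketch of that two-instance idea (all ports are assumptions): a front instance, which is never reloaded, holds requests with lb_try_duration, while a back instance that knows the real backend address is simply stopped and restarted with a new config whenever the backend moves.

```
# Front instance's Caddyfile (never reloaded): always points at the back instance.
http://localhost {
	reverse_proxy localhost:7000 {
		lb_try_duration 30s
		lb_try_interval 250ms
	}
}
```

And, run as a separate caddy process:

```
# Back instance's Caddyfile: restart this process with a new backend
# address (e.g. localhost:8004) when the application moves.
:7000 {
	reverse_proxy localhost:8003
}
```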
FWIW, most of the time,
Yeah that's true, probably.
I wonder if an API endpoint that just adds/removes backends without a config reload could be helpful. The other piece of this is that we'd have to get the list of upstreams in each iteration of the retry loop.
I would definitely find that useful!
Hmm, silly question: why not route through services instead of directly to pods?
I'm still getting my head around Kubernetes terminology, but the root challenge I have here is that because my applications use SQLite for persistent data, the pods/containers/whatever are stateful: I can't do the normal Kubernetes thing of bringing up a new version, serving traffic from both versions for a few seconds, and scaling down the old version.

Even if I wasn't using such an unconventional persistence mechanism, I'd still be interested in holding traffic like this - for things like running expensive database schema changes, where having ten seconds without any traffic can help me finish a major upgrade without any visible downtime.
You can still do the same if you use a service on top of your pods. The only difference is that K8s would be the one resolving the IP & port for you; Caddy can simply proxy to the service address without knowing which pod it reaches. When you bring down your only pod, the service is effectively down, and Caddy can do its normal retry until a new pod is online, i.e. the service is back up.
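For illustration, a minimal sketch of that approach, assuming Caddy runs inside the cluster and a Service named myapp in the default namespace exposes port 8003 (all names and ports here are assumptions):

```
example.com {
	# Proxy to the Service's stable DNS name; Kubernetes routes the
	# connection to whichever pod is currently backing the Service.
	reverse_proxy myapp.default.svc.cluster.local:8003 {
		# While no pod is ready, hold and keep retrying the request.
		lb_try_duration 30s
		lb_try_interval 250ms
	}
}
```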
Oh I see what you mean! Yes that's a fantastic idea, I shall try that.
The issue is still valid though. You'd want to proxy directly to pods in many scenarios, so you can take advantage of
Also get upstreams at every retry loop iteration instead of just once before the loop. See #4442.
I've implemented the getting of upstreams “per retry” in #4470. The actual API endpoint to adjust the upstreams specifically will have to come in a future PR.
* reverseproxy: Begin refactor to enable dynamic upstreams. Streamed here: https://www.youtube.com/watch?v=hj7yzXb11jU
* Implement SRV and A/AAAA upstream sources. Also get upstreams at every retry loop iteration instead of just once before the loop. See #4442.
* Minor tweaks from review
* Limit size of upstreams caches
* Add doc notes deprecating LookupSRV
* Provision dynamic upstreams. Still WIP, preparing to preserve health checker functionality.
* Rejigger health checks: move active health check results into handler-specific Upstreams. Improve documentation regarding health checks and upstreams.
* Deprecation notice
* Add Caddyfile support, use `caddy.Duration`
* Interface guards
* Implement custom resolvers, add resolvers to http transport Caddyfile
* SRV: fix Caddyfile `name` inline arg, remove proto condition
* Use pointer receiver
* Add debug logs

Co-authored-by: Francis Lavoie <[email protected]>
I would like to understand the impact of #4470 w.r.t. @simonw's OP.

So, yes, a Kubernetes service would be a good, albeit extremely heavyweight, way to solve the problem he has. That said, I would argue that in the Kubernetes world,

In a simpler world, where I do not run Kubernetes, and thus Caddy is extremely valuable, I am picturing how #4470 helps here:
Let me know if I'm missing any detail.
@gc-ss Yeah, so there's a couple ways to handle those kinds of situations:
(Although, upon inspection, I didn't actually implement a way to disable the caching of the SRV results, but you can set a Refresh value of 1 nanosecond, which will probably do the trick in the meantime.)
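As a hedged illustration of that workaround (the SRV record name is an assumption): with dynamic upstreams, a very small refresh value effectively disables caching of the SRV lookup, so each retry sees freshly resolved backends.

```
example.com {
	reverse_proxy {
		# Resolve the upstream list from SRV records, re-resolving on
		# (effectively) every lookup instead of caching the results.
		dynamic srv _http._tcp.myapp.default.svc.cluster.local {
			refresh 1ns
		}
		lb_try_duration 30s
	}
}
```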
@simonw does the Dynamic Upstreams feature solve your use case? https://caddyserver.com/docs/caddyfile/directives/reverse_proxy#dynamic-upstreams

Also worth noting, we just merged #4756. I think we can probably close this issue now.
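For reference, the simplest Caddyfile shape of that feature using A/AAAA lookups (the hostname, port, and refresh interval below are placeholders): the set of backends is re-resolved from DNS at request time, so it can change without a config reload.

```
example.com {
	reverse_proxy {
		# Look up the backend's A/AAAA records periodically instead of
		# fixing the upstream list when the config is loaded.
		dynamic a myapp.internal 8003 {
			refresh 1s
		}
		lb_try_duration 30s
	}
}
```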
Just tried this with dynamic upstreams for Docker: https://github.com/invzhi/caddy-docker-upstreams
Given the following `Caddyfile`:
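(For illustration, a minimal sketch along the lines described below - plain HTTP proxying to `localhost:8003` with a roughly 30-second retry window; the exact contents are assumed, not quoted.)

```
http://localhost {
	reverse_proxy localhost:8003 {
		lb_try_duration 30s
		lb_try_interval 250ms
	}
}
```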
I use `caddy run` and run a separate server on port 8003 (I'm using `datasette -p 8003` here), and it proxies correctly. If I shut down my 8003 server and try to hit `http://localhost/`, I get the desired behaviour: my browser spins for up to 30s, and if I restart my 8003 server during that time the request is proxied through and returned from the backend.

What I'd really like to be able to do, though, is to start up a new server on another port (actually, in production, on another IP/port combination) and have traffic resume against the new server.
So I tried editing the `Caddyfile` to use `localhost:8004` instead, started up my backend on port 8004, then used `caddy reload` to load in the new configuration... and my request to port 80 continued to spin. It appears Caddy didn't notice that there was now a new configuration for the backend for this `reverse_proxy`.

It would be really cool if the `lb_try_interval`/`lb_try_duration` feature could respond to updated configurations and seamlessly forward paused traffic to the new backend.

(This started as a Twitter conversation: https://twitter.com/mholt6/status/1463656086360051714)