Acme in HA Traefik Scenario #348
+1
@stongo For sure.
I love this idea 👍 We could add Kubernetes Secrets as a storage possibility too.
To take this a step further, there are also complications responding to ACME challenges in a DNS round-robin situation: each node generates its own challenge response, which causes the challenge to fail unless, by some miracle, the same node responds. To get around this, we should store the challenge response certs in the shared backend as well.
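A hedged sketch of that idea, using the HTTP-01 challenge for simplicity (the libkv-backed store and the `acme/challenges/<token>` key layout are assumptions for illustration, not anything Traefik ships):

```go
package acme

import (
	"net/http"
	"path"

	"github.com/docker/libkv/store"
)

// challengeHandler answers HTTP-01 challenges out of a store shared by all
// Traefik instances, so it no longer matters which node Let's Encrypt hits.
func challengeHandler(kv store.Store) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		token := path.Base(r.URL.Path) // /.well-known/acme-challenge/<token>
		pair, err := kv.Get("acme/challenges/" + token)
		if err != nil {
			http.NotFound(w, r)
			return
		}
		// Whichever node requested the certificate wrote this key authorization.
		w.Write(pair.Value)
	})
}
```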
Also, libkv supports BoltDB as a local (non-shared) store; how do we feel about just making all storage go through libkv? So instead of
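A minimal sketch of "everything through libkv", following the calls documented in docker/libkv's README (the file path, bucket, and key names are invented):

```go
package main

import (
	"log"

	"github.com/docker/libkv"
	"github.com/docker/libkv/store"
	"github.com/docker/libkv/store/boltdb"
)

func main() {
	// Register the BoltDB backend with libkv before use.
	boltdb.Register()

	// A local, non-shared store; swapping store.BOLTDB for store.CONSUL or
	// store.ETCD would make the very same code cluster-wide.
	kv, err := libkv.NewStore(store.BOLTDB,
		[]string{"/var/lib/traefik/acme.db"}, // file path; an assumption
		&store.Config{Bucket: "traefik-acme"})
	if err != nil {
		log.Fatal(err)
	}

	if err := kv.Put("acme/account/object", []byte("serialized ACME account"), nil); err != nil {
		log.Fatal(err)
	}
	pair, err := kv.Get("acme/account/object")
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("read %d bytes", len(pair.Value))
}
```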
@nlf: that's basically what I'm thinking about ;)
Your point on DNS round robin is a really good catch!
HashiCorp Vault would be good for the KV store.
@kbroughton As Traefik relies on libkv: docker/libkv#123
Does libkv handle Kubernetes, or does Traefik talk directly to etcd in a Kubernetes cluster?
Traefik talks to the Kubernetes API. It does not have any access to the underlying storage. I think that on Kubernetes we could use Secrets as a backend store...
So all Traefik instances could watch a certain Secret and configure themselves to respond to challenges, and then also dispatch the newly obtained SSL certs to all Traefik instances, again by watching the Secret as the certs become available? That'd be reaaaallly cool :)
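A sketch of the watch side, using a recent client-go (which postdates this thread); the Secret name `traefik-acme` and the `default` namespace are assumptions:

```go
package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Every Traefik instance watches the same Secret; whenever the challenge
	// answers or certificates in it change, each instance reloads its copy.
	w, err := client.CoreV1().Secrets("default").Watch(context.Background(),
		metav1.ListOptions{FieldSelector: "metadata.name=traefik-acme"})
	if err != nil {
		log.Fatal(err)
	}
	for ev := range w.ResultChan() {
		if ev.Type == watch.Added || ev.Type == watch.Modified {
			log.Println("acme secret changed; reloading challenges and certs")
		}
	}
}
```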
Is there anything I could do to help out? Having Traefik as a load-balanced front-end, able to negotiate new certificates, would be exceptionally delightful. It would replace CloudFlare + ELB altogether for me. I see two issues:
I'm thinking more Kubernetes here. Maybe this discussion belongs in a new issue... I'd assume we could store the challenges in a Kubernetes Secret. Traefik could watch the given Secret and keep an in-memory copy of the challenges for when an ACME request arrives. For the certificates, it seems a bit more tricky. Could we listen for a bunch of Secrets? Or a single Secret with prefixed key=value entries that would be translated to Traefik domains? Something like:
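A hypothetical sketch of such a Secret (the name and the prefixed-key scheme here are invented for illustration):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: traefik-acme          # hypothetical name
type: Opaque
stringData:
  challenge.example.com: "<key authorization>"    # pending ACME challenge
  cert.example.com: "<PEM-encoded certificate>"   # issued certificate
  key.example.com: "<PEM-encoded private key>"    # matching private key
```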
Being able to atomically update certain keys in that Secret would be needed, so instances don't clobber each other. Other option: we could have a small redis service somewhere to centralize that information... on Kubernetes it's not that costly if it provides fully HA Traefik with automatic SSL :) What do you think?
Thought of another option: running a single-pod RC and forwarding traffic from Traefik instances for /.well-known/... to that pod, so it can centrally manage challenges and update TLS Secrets that Traefik instances would pick up. For TLS cert storage, I thought maybe annotations could be used to declare the name of the Secret to use when writing Let's Encrypt certs.
Hi @abourget, for now we are working on using KV stores (etcd, Consul, ZooKeeper...) as a global configuration storage: #481.
The way I did this (in a hurry) was to have a top-level rule that sent all /.well-known/acme-challenge traffic to a single service that manages the challenges.
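In Traefik terms, that kind of top-level rule might look roughly like this v1 file-provider snippet (the backend name and URL are invented for illustration):

```toml
# Route all ACME challenge traffic to one central responder, so only that
# service needs to know the current challenge answers.
[frontends]
  [frontends.acme]
  backend = "acme-responder"
    [frontends.acme.routes.challenge]
    rule = "PathPrefix:/.well-known/acme-challenge"

[backends]
  [backends.acme-responder]
    [backends.acme-responder.servers.s1]
    url = "http://acme-responder.default.svc:8080"
```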
@dts HAProxy with some Kubernetes glue in front? Anything publicly available?
Just a heads up, I am in the process of implementing this for Caddy (the abstraction is at caddyserver/caddy#913; the Consul impl is not pushed to a repo yet). I hit some things along the way that y'all might want to consider when developing.
@abourget: WIP, caveat emptor, etc., but the HAProxy component is here: https://github.com/dts/kubernetes-haproxy-lb (originally stolen from kubernetes/contrib). The bit for connecting ACME and Kubernetes Secrets is here: https://github.com/dts/kubernetes-acme. They don't depend on each other; you could easily use the ACME part with Traefik.
The other option would be to implement a leader election where the leader is the only one to attempt to generate certificates and respond to ACME challenges. Not sure if this is any easier, but it's definitely another solution to this problem. It would at least avoid the distributed-locking complexities.
@stongo that's what I was thinking about, using https://github.com/docker/leadership. But in this case, slave nodes will have to forward ACME requests to the master node.
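A sketch of what running for election could look like with docker/leadership, following the usage shown in its README (the Consul address, election key, node name, and TTL are placeholders):

```go
package main

import (
	"log"
	"time"

	"github.com/docker/leadership"
	"github.com/docker/libkv"
	"github.com/docker/libkv/store"
	"github.com/docker/libkv/store/consul"
)

func main() {
	consul.Register()
	client, err := libkv.NewStore(store.CONSUL, []string{"127.0.0.1:8500"}, nil)
	if err != nil {
		log.Fatal(err)
	}

	// Every instance runs for election under a shared key; only the elected
	// leader would talk to Let's Encrypt and answer ACME challenges.
	candidate := leadership.NewCandidate(client, "traefik/acme/leader", "node-1", 15*time.Second)
	electedCh, errCh := candidate.RunForElection()

	for {
		select {
		case elected := <-electedCh:
			if elected {
				log.Println("elected ACME leader")
			} else {
				log.Println("lost leadership; forward ACME work to the leader")
			}
		case err := <-errCh:
			log.Fatal(err)
		}
	}
}
```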
@emilevauge do you know if slaves would be able to abort ACME requests on the assumption that the master node will eventually receive the ACME challenge? It adds more delay to the initial generation of the cert, but reduces implementation complexity.
I've implemented a fully HA Let's Encrypt microservice for Kubernetes that relies upon Secrets to acquire locks on certificate resources, and on ConfigMaps for storing the actual certificates so they can be shared. The locking is implemented through a generic interface with only a Secrets-based implementation for now, but it could be swapped out for anything that allows atomic updates (e.g. etcd). You can see the full implementation here: https://github.com/munnerz/kube-acme. I'd be happy to answer any questions or help out with any dev on this :)
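For a feel of what such a pluggable locking interface might look like, here is a hypothetical sketch (these names are guesses, not kube-acme's actual API):

```go
package acme

// Lock is a hypothetical pluggable locking interface: any backend offering
// an atomic compare-and-swap (a Kubernetes Secret's resourceVersion, etcd
// CAS, ...) could implement it.
type Lock interface {
	// Acquire tries to take the lock for the named certificate resource,
	// returning true if this instance now holds it.
	Acquire(resource string) (bool, error)
	// Release hands the lock back so another instance may renew.
	Release(resource string) error
}
```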
It would be better to do this in a backend-independent way.
This only requires the user to generate and share a key-pair (or better yet, a configuration secret!) across all servers. Additionally, the same key-pair could also be shared using various secure Secret backends and vaults.
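The mechanism isn't spelled out above, but one hedged reading is: encrypt the ACME material with a shared key before writing it to whatever backend is configured, so the backend itself needs no special trust. A minimal sketch with Go's nacl/secretbox, using a shared symmetric key rather than a key-pair (all names are illustrative):

```go
package acme

import (
	"crypto/rand"
	"io"

	"golang.org/x/crypto/nacl/secretbox"
)

// sealCert encrypts PEM data with a 32-byte key shared by all servers, so
// the ciphertext can safely live in any (untrusted) storage backend.
func sealCert(key *[32]byte, pem []byte) ([]byte, error) {
	var nonce [24]byte
	if _, err := io.ReadFull(rand.Reader, nonce[:]); err != nil {
		return nil, err
	}
	// Prepend the nonce so openCert can recover it.
	return secretbox.Seal(nonce[:], pem, &nonce, key), nil
}

// openCert reverses sealCert; ok is false if the data was tampered with.
func openCert(key *[32]byte, box []byte) (pem []byte, ok bool) {
	if len(box) < 24 {
		return nil, false
	}
	var nonce [24]byte
	copy(nonce[:], box[:24])
	return secretbox.Open(nil, box[24:], &nonce, key)
}
```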
Storage built for secrets (and TLS/SSL private keys are secrets), like Vault, could be a good way to go.
ACME integration works great when Traefik is running as a single instance.
When running Traefik as an HA service, using DNS round robin for instance, the ACME integration model breaks down: Let's Encrypt will only issue a certificate to one Traefik instance in a given cluster, making all the other Traefik instances throw errors and shut down.
I think the easiest solution would be to write the certificates to whatever shared backend is already configured (i.e. etcd or Consul) as a secondary source for the certificates.
Any feedback would be appreciated on this.