Acme in HA Traefik Scenario #348

stongo · 2016-05-02T15:04:37Z

Acme integration works great when Traefik is running as a single instance.
When running Traefik as an HA service, for instance behind DNS round robin, the Acme integration model breaks down because Let's Encrypt will only issue a certificate to one Traefik instance in a given cluster, making all other Traefik instances throw an error and shut down.
I think the easiest solution would be to write the certificates to whatever shared backend is already configured (e.g. etcd or consul) as a secondary source for the certificates.
Any feedback would be appreciated on this.

kbroughton · 2016-05-02T16:49:21Z

+1
this might also be useful
https://github.com/alex/letsencrypt-aws

emilevauge · 2016-05-02T21:14:12Z

@stongo For sure.
I would also like to be able to store everything from traefik.toml in a KV store, in order to have a centralized configuration.
Maybe that should be done in the same PR.

ekozan · 2016-05-06T08:54:20Z

I love this idea 👍 we could also add Kubernetes secrets as a storage option

nlf · 2016-05-10T19:32:40Z

to take this a step further, there are also complications responding to acme challenges in a DNS round robin situation. each node generates its own challenge response, which causes the challenge to fail unless by some miracle the same node responds.

to get around this, we should store the challenge response certs in the shared backend as well
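
A minimal sketch of that idea, assuming the challenge response can be treated as an opaque value. A plain dict stands in for the shared backend, and `SharedStore`, `Node` and the `acme/challenges/` key prefix are hypothetical names, not Traefik or libkv APIs:

```python
# Sketch only: a dict stands in for the shared backend (etcd, consul, ...).
# SharedStore, Node and the key prefix are hypothetical, not Traefik APIs.

class SharedStore:
    """Minimal key-value store visible to every node."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class Node:
    """One proxy instance behind DNS round robin."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def start_challenge(self, token, response):
        # The node that asked for the certificate publishes its response.
        self.store.put("acme/challenges/" + token, response)

    def answer_challenge(self, token):
        # Any node can serve the response, whichever one gets the request.
        return self.store.get("acme/challenges/" + token)


store = SharedStore()
node_a, node_b = Node("a", store), Node("b", store)
node_a.start_challenge("token123", "response-from-a")
print(node_b.answer_challenge("token123"))  # response-from-a
```

With a real backend the lookup would be a network read, so the response has to be written before the challenge is announced to the CA.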

nlf · 2016-05-10T19:35:25Z

also libkv supports boltdb as a local (non-shared) store, how do we feel about just making all storage go through libkv? so instead of acme.json we'd create a boltdb with the certificates stored in it, and if you configured a shared backend (currently consul, etcd or zookeeper) they would go there. this means there are no real conditionals, just a configured backend.
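
To illustrate the "no real conditionals, just a configured backend" point, here is a rough sketch. It does not use libkv itself: `FileStore` (a JSON file standing in for a local boltdb file), `MemoryStore` (standing in for a shared backend) and the `acme/certs/` key prefix are all assumptions made for the example:

```python
import json
import os
import tempfile


class FileStore:
    """Local, non-shared store; a JSON file stands in for a boltdb file."""

    def __init__(self, path):
        self.path = path

    def _load(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

    def put(self, key, value):
        data = self._load()
        data[key] = value
        with open(self.path, "w") as f:
            json.dump(data, f)

    def get(self, key):
        return self._load().get(key)


class MemoryStore:
    """Stand-in for a shared backend such as consul, etcd or zookeeper."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def save_certificate(store, domain, pem):
    # The caller never checks which backend it was given.
    store.put("acme/certs/" + domain, pem)


def load_certificate(store, domain):
    return store.get("acme/certs/" + domain)


local = FileStore(os.path.join(tempfile.mkdtemp(), "acme-store.json"))
shared = MemoryStore()
for store in (local, shared):
    save_certificate(store, "example.com", "PEM DATA")
    print(load_certificate(store, "example.com"))  # PEM DATA
```

The certificate code only sees `put` and `get`, so switching from the local default to a shared backend is a configuration change rather than a code path.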

emilevauge · 2016-05-12T14:56:22Z

@nlf: that's basically what I'm thinking about ;)

I would also like to be able to store everything from traefik.toml in a KV store, in order to have a centralized configuration.

Your point on DNS round robin is a really good catch!

kbroughton · 2016-05-12T15:08:19Z

Hashicorp vault would be good for the kv store
emilevauge · 2016-05-12T15:10:49Z

@kbroughton as Traefik relies on libkv: docker/libkv#123

abourget · 2016-05-24T01:36:24Z

does libkv handle Kubernetes or does Traefik talk directly to etcd in a kubernetes cluster ?

errm · 2016-05-26T11:27:11Z

Traefik talks to the kubernetes api. It does not have any access to underlying storage.

I think that on kubernetes we could use secrets as a backend store...

abourget · 2016-05-26T19:49:00Z

so all Traefik instances could watch a certain secret ? and configure to respond to challenges, and then also dispatch the newly gotten SSL certs to all Traefik instances, again, watching the certs as they become available ?

that'd be reaaaallly cool :)

abourget · 2016-07-05T19:57:58Z

Is there anything I could do to help out ? Having Traefik as a load-balanced front-end, able to negotiate new certificates would be exceptionally delightful. It would replace CloudFlare + ELB altogether for me.

I see two issues:

1. Storing the challenges somewhere so that all Traefik instances can reply to the ACME requests.
2. Storing the certificates so that any cert granted for a domain can be accessed (or reloaded) by all the Traefik instances.

I'm thinking more Kubernetes here. Maybe this discussion belongs in a new issue.

I'd assume we could store the challenges in a Kubernetes Secret. Traefik could watch the given secret and keep an in-memory copy of them when an ACME request arrives.

For the certificates, it seems a bit more tricky. Could we listen for a bunch of secrets? Or a single secret with prefixed key=value pairs that would be translated to Traefik domains? Something like:

apiVersion: v1
kind: Secret
metadata:
  name: traefik-secrets
type: Opaque
data:
  acme-ABCDEFGHIJKL123123123213: ABCDEF12345678==
  acme-MNOPQRSTUVWX456456456456: ABCDEF12345678==
  cert-domain.com-ca: ABCDEF12345678...ABCDEF12345678==
  cert-domain.com-cert: ABCDEF12345678...ABCDEF12345678==
  cert-domain.com-key: ABCDEF12345678...ABCDEF12345678==

Being able to atomically update certain keys in data would be great... otherwise we'd risk overwriting things that another Traefik instance attempts to write in there.
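
One common way to avoid that kind of overwrite is optimistic concurrency: every write carries the version that was read, and a stale write is rejected and retried. The sketch below is an assumption about how this could look, not an existing Traefik or Kubernetes client API; `VersionedStore` and `set_key` are hypothetical names:

```python
class VersionedStore:
    """Holds one mapping plus a version that changes on every write."""

    def __init__(self):
        self._data = {}
        self._version = 0

    def read(self):
        # Return a copy so callers cannot mutate the stored mapping.
        return dict(self._data), self._version

    def write(self, new_data, expected_version):
        # Reject the write if someone else updated the data in between.
        if expected_version != self._version:
            return False
        self._data = dict(new_data)
        self._version += 1
        return True


def set_key(store, key, value, retries=5):
    """Read-modify-write one key, retrying when the version is stale."""
    for _ in range(retries):
        data, version = store.read()
        data[key] = value
        if store.write(data, version):
            return True
    return False


store = VersionedStore()
# Two instances read the same version, then both try to write.
data_a, version_a = store.read()
data_b, version_b = store.read()
data_a["cert-domain.com-cert"] = "CERT"
data_b["acme-token"] = "CHALLENGE"
print(store.write(data_a, version_a))  # True
print(store.write(data_b, version_b))  # False
# The losing instance retries through set_key and keeps the other key.
print(set_key(store, "acme-token", "CHALLENGE"))  # True
print(sorted(store.read()[0]))  # ['acme-token', 'cert-domain.com-cert']
```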

Other option: we could have a small redis service somewhere to centralize that information.. on Kubernetes it's not that costly if it provides full HA Traefik with automatic SSL :)

What do you think ?

abourget · 2016-07-06T06:03:01Z

Thought of another option: running a single pod RC and forwarding traffic from traefik instances for /.well-known/... to that pod so it can centrally manage challenges and update TLS secrets that traefik instances would pick up.

For TLS certs storage, I thought maybe annotations could be used to declare the name of the secret to use when writing let's encrypt certs.

emilevauge · 2016-07-06T10:36:40Z

Hi @abourget, for now, we are working on using kv-stores (etcd, consul, zookeeper...) as a global configuration storage #481.
We will first implement ACME HA using these KV stores.
It would also be great to use k8s secrets in the future :) Or we could also use the ConfigMap.
We would have to add kubernetes secrets/configmap support in https://github.com/docker/libkv and then use it directly in traefik.
Sounds good ?

dts · 2016-07-06T15:52:09Z

The way I did this (in a hurry), was to have a top-level rule that sent all .well-known traffic to a container running an acme manager. The long-lived process managed all the certs, I stored them in a Kubernetes Secret, and then killed the LB pods to load the fresh secret. I wound up using HAProxy (because I found better docs for complex rulesets), but it would work exactly the same with Traefik.

abourget · 2016-07-06T20:16:38Z

@dts HAProxy with some Kubernetes glue in front ? Anything publicly available ?

cretz · 2016-07-06T22:05:51Z

Just a heads up, I am in the process of implementing this for Caddy (abstraction is at caddyserver/caddy#913 and Consul impl not pushed to a repo yet). Some things that I hit that y'all might want to consider when developing:

Goes without saying that data needs to be encrypted. Could use Vault, but I just chose to AES encrypt the vals in Consul.
Things can get racy on auto renewal because you do not know which server is doing the renewal so you'll be tempted to do a shared mutex. The problem is the conditions for renewal often hit multiple servers at the same time, so you need to bail quickly if you can't take the lock (a master/slave leader election process resolves some of these issues, but can damage full HA-ness).
Things can get racy on the second-level local cache you undoubtedly have to have for your certs. Essentially you need to send out an event to the cluster when a cert is updated. Consul supports this of course, just food for thought. Just make sure you don't hold up a request waiting on this new cert. So, renewing w/ enough time to spare helps.
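
The "bail quickly if you can't take the lock" point can be sketched as a non-blocking lock attempt. This is only an illustration: a `threading.Lock` stands in for a cluster-wide lock, and `maybe_renew` is a hypothetical helper, not something taken from Caddy or Traefik:

```python
import threading

# Stand-in for a cluster-wide lock (e.g. one held in consul or etcd).
renew_lock = threading.Lock()


def maybe_renew(node, renew):
    """Renew only if the lock is free; never wait for it."""
    if not renew_lock.acquire(blocking=False):
        # Another server is already renewing: bail out immediately.
        return False
    try:
        renew(node)
        return True
    finally:
        renew_lock.release()


renewed = []
renew_lock.acquire()                       # some other server holds the lock
print(maybe_renew("b", renewed.append))    # False
renew_lock.release()
print(maybe_renew("a", renewed.append))    # True
print(renewed)                             # ['a']
```

A server that bails would then wait for the updated certificate to show up in the shared store instead of renewing it itself.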

dts · 2016-07-07T00:36:59Z

@abourget: WIP, caveat emptor, etc, but the HAProxy component is here: https://github.com/dts/kubernetes-haproxy-lb (originally stolen from kubernetes/contrib). The bit for connecting ACME and Kubernetes secrets is here: https://github.com/dts/kubernetes-acme. They don't depend on each other, you could easily use the ACME part with traefik.

stongo · 2016-07-12T14:56:38Z

The other option would be to implement a leadership election where the leader is the only one to attempt to generate certificates and respond to acme challenges. Not sure if this is any easier, but definitely another solution to this problem. It would at least solve distributed locking complexities.

emilevauge · 2016-07-12T15:48:36Z

@stongo that's what I was thinking about using https://github.com/docker/leadership. But in this case, slave nodes will have to forward ACME requests to master node.
I still don't know which solution will be easier to implement.
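
A rough sketch of the leader approach with forwarding, assuming a time-limited lease decides who the leader is. It does not use docker/leadership; `Lease` and `handle_acme_request` are hypothetical names, and time is passed in explicitly to keep the example deterministic:

```python
class Lease:
    """Leadership lease: one holder at a time, expiring after ttl seconds."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, node, now):
        # Free, already ours (renewal), or expired: take or extend the lease.
        if self.holder in (None, node) or now >= self.expires_at:
            self.holder = node
            self.expires_at = now + self.ttl
            return True
        return False


def handle_acme_request(node, lease, now):
    """The leader handles ACME requests; other nodes forward to the leader."""
    if lease.try_acquire(node, now):
        return "handled by " + node
    return "forwarded to " + lease.holder


lease = Lease(ttl=10)
print(handle_acme_request("a", lease, now=0))   # handled by a
print(handle_acme_request("b", lease, now=5))   # forwarded to a
print(handle_acme_request("b", lease, now=11))  # handled by b
```

The last call shows failover: once the lease held by `a` has expired without renewal, `b` takes over.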

stongo · 2016-07-12T16:11:21Z

@emilevauge do you know if slaves would be able to abort acme requests and make the assumption that eventually the master node will receive the acme challenge? it adds more time delay to the initial generation of the cert, but reduces implementation complexity

munnerz · 2016-07-15T18:34:18Z

I've implemented a fully HA letsencrypt microservice for Kubernetes, that relies upon Secrets to acquire locks on certificate resources, and ConfigMaps for storing the actual certificates that can be shared. The locking is implemented through a generic interface with only a Secrets-based implementation, but could be swapped out for anything that'll allow for atomic updating (eg. etcd)

You can see the full implementation here: https://github.com/munnerz/kube-acme

I'd be happy to answer any questions or help out with any dev on this :)

niieani · 2016-08-30T17:40:43Z

It would be better to do this in a backend-independent way.
I see the solution could be as simple as auto-negotiation between all the instances of Traefik:

Traefik does a DNS query for each domain listed in acme.domains
If there is more than one A/AAAA record in the responses, it:
1. waits for all the Traefik instances to be online (all servers must respond properly)
2. contacts all the Traefik instances via the API and synchronizes the certificates in a secure way (e.g. using a hand-generated key pair / secret, which is pre-shared by the user to all the instances)
If no valid certificate is present:
1. it negotiates which server will start an ACME query to get a renewed certificate
2. .well-known traffic is "shared" - i.e. the instance which is sent the query asks the instance that initiated the ACME negotiation for the data to reply to ACME
3. the new certificate is synchronized across the Traefik instances

This only requires the user to generate and share a key-pair (or better yet, a configuration secret!) across all servers. Additionally, the same key-pair could be also shared using various secure Secret backends and vaults.
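
The first step of that proposal (detecting that a domain resolves to several instances) could look roughly like this. It is a sketch under assumptions: `resolve_addresses` and `needs_sync` are hypothetical helpers, and the resolver is injectable so the logic can be tried without real DNS:

```python
import socket


def resolve_addresses(domain, resolver=socket.getaddrinfo):
    """Return the distinct A/AAAA addresses the domain resolves to."""
    infos = resolver(domain, None)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def needs_sync(domain, resolver=socket.getaddrinfo):
    """More than one address suggests several instances serve the domain."""
    return len(resolve_addresses(domain, resolver)) > 1


def fake_resolver(domain, port):
    # Hypothetical round-robin answer with two A records.
    return [
        (socket.AF_INET, socket.SOCK_STREAM, 6, "", ("192.0.2.10", 0)),
        (socket.AF_INET, socket.SOCK_STREAM, 6, "", ("192.0.2.11", 0)),
    ]


print(resolve_addresses("example.com", fake_resolver))
print(needs_sync("example.com", fake_resolver))  # True
```

Calling `needs_sync("example.com")` without the fake resolver performs a real lookup, so the result then depends on the local DNS setup.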

strarsis · 2016-09-24T14:07:30Z

Storages for secrets (and TLS/SSL private keys are secrets) could be a good way, like vault.

emilevauge added the kind/enhancement label May 2, 2016

emilevauge mentioned this issue Aug 5, 2016

Let`s Encrypt enable in etcd #600

Closed

emilevauge mentioned this issue Aug 17, 2016

HA acme support #625

Merged

6 tasks

stongo mentioned this issue Aug 30, 2016

Fronting a domain with DNS A-record round-robin & ACME #654

Closed

niieani mentioned this issue Sep 6, 2016

Potential race condition when using with multiple servers and L7 LB or DNS A record round-robin nginx-proxy/acme-companion#101

Closed

emilevauge closed this as completed in #625 Sep 30, 2016

ldez added the area/acme label Jun 11, 2017

traefik locked and limited conversation to collaborators Sep 1, 2019

traefiker added the status/5-frozen-due-to-age label Sep 1, 2019
