locksmith fails when -etcd-cafile is specified #948

Open
adborden opened this issue Jan 14, 2023 · 10 comments
Labels
kind/bug Something isn't working

Comments

@adborden

Description

When -etcd-cafile is specified without a client cert/key, locksmith fails with the error:

$ locksmithctl -etcd-cafile=/etc/ssl/certs/ca-certificates.crt status
Error initializing etcd client: open : no such file or directory

I've configured etcd with TLS using self-signed certificates but not TLS client authentication. locksmith seems to be looking for a certificate and key, even though these options are not applicable.

Impact

The error message is confusing because it stems from command-line options (client cert/key) that were never specified.

Environment and steps to reproduce

  1. Set-up: Flatcar Linux 3374.2.2
  2. Task: Configuring locksmith with TLS communication and server-only authentication
  3. Action(s):
    a. locksmithctl -etcd-cafile=/etc/ssl/certs/ca-certificates.crt status
  4. Error: Error initializing etcd client: open : no such file or directory

Expected behavior

locksmith uses the specified CA to authenticate the server without client authentication.
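
The reported message, "open : no such file or directory", is what Go prints when a file open is attempted with an empty path, which would be consistent with a client key pair being loaded even though no cert or key was given. The sketch below shows one way the expected behavior could look in Go. It is not locksmith's actual code: clientTLSConfig and its parameters are hypothetical names chosen for illustration, and only standard-library calls are used.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"os"
)

// clientTLSConfig is a hypothetical helper, not locksmith's implementation.
// It trusts caFile (when set) for server verification and loads a client
// key pair only when both certFile and keyFile are set.
func clientTLSConfig(caFile, certFile, keyFile string) (*tls.Config, error) {
	cfg := &tls.Config{MinVersion: tls.VersionTLS12}

	if caFile != "" {
		pem, err := os.ReadFile(caFile)
		if err != nil {
			return nil, fmt.Errorf("reading CA file: %w", err)
		}
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(pem) {
			return nil, errors.New("no PEM certificates found in CA file")
		}
		cfg.RootCAs = pool
	}

	switch {
	case certFile == "" && keyFile == "":
		// Server-only authentication: skip the client key pair entirely.
	case certFile == "" || keyFile == "":
		return nil, errors.New("client cert and key must be given together")
	default:
		pair, err := tls.LoadX509KeyPair(certFile, keyFile)
		if err != nil {
			return nil, fmt.Errorf("loading client key pair: %w", err)
		}
		cfg.Certificates = []tls.Certificate{pair}
	}
	return cfg, nil
}

func main() {
	// No CA, cert, or key: a usable config with no client certificate.
	cfg, err := clientTLSConfig("", "", "")
	fmt.Println(cfg != nil, err)

	// Only one half of the key pair: rejected with an explicit message
	// instead of a confusing "open :" error.
	_, err = clientTLSConfig("", "client.crt", "")
	fmt.Println(err)
}
```

Checking the cert and key paths before touching the filesystem lets the CA-only case succeed and turns a half-specified key pair into an error that names the real problem.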

Additional information

N/A.

@bmbeverst

bmbeverst commented Mar 29, 2024

Seeing this issue with k3s and the embedded etcd as well. k3s with an embedded etcd does work with etcdctl but not locksmithctl.

I noticed a pull request (link) to upgrade locksmith to etcd3; maybe that is the issue?

Impact

Unable to use etcd-based locksmith reboots with k3s.

Environment and steps to reproduce

  1. Set-up: Flatcar Linux 3815.2.1
  2. Task: Install k3s with etcd and configure locksmith with TLS communication
  3. Action(s):
    1. Install k3s: curl -sfL https://get.k3s.io | sh -s - --secrets-encryption --token SuperSecrect --cluster-init
    2. Test with etcdctl: etcdctl --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt --cert=/var/lib/rancher/k3s/server/tls/etcd/server-client.crt --key=/var/lib/rancher/k3s/server/tls/etcd/server-client.key endpoint status
    3. Test with locksmith: locksmithctl --etcd-cafile="/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt" --etcd-certfile="/var/lib/rancher/k3s/server/tls/etcd/server-client.crt" --etcd-keyfile="/var/lib/rancher/k3s/server/tls/etcd/server-client.key" status
  4. Error: Error initializing etcd client: creating etcd lock client: EOF

Edit:

Manually passing endpoints instead of using the defaults got a little further:

locksmithctl --etcd-cafile="/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt" --etcd-certfile="/var/lib/rancher/k3s/server/tls/etcd/server-client.crt" --etcd-keyfile="/var/lib/rancher/k3s/server/tls/etcd/server-client.key" --endpoint https://127.0.0.1:2379,https://10.10.1.41:2379,https://10.10.1.41:2380 status 
Error initializing etcd client: creating etcd lock client: net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x00\x00\x06\x04\x00\x00\x00\x00\x00\x00\x05\x00\x00@\x00"

Also tried the peer files, and those understandably didn't work.
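
The bytes quoted in the "malformed HTTP response" error can be read as an HTTP/2 frame, which would suggest the endpoint answered in HTTP/2 (the transport gRPC uses) while the client expected HTTP/1.x. The Go sketch below decodes them using the 9-byte frame header layout from the HTTP/2 specification. parseFrameHeader and frameHeader are hypothetical names, and this is only one reading of the error, not a confirmed diagnosis.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// frameHeader holds the fixed 9-byte HTTP/2 frame header:
// 24-bit length, 8-bit type, 8-bit flags, 1 reserved bit + 31-bit stream ID.
type frameHeader struct {
	Length   uint32
	Type     uint8
	Flags    uint8
	StreamID uint32
}

// parseFrameHeader decodes the first 9 bytes of b as an HTTP/2 frame header.
func parseFrameHeader(b []byte) (frameHeader, error) {
	if len(b) < 9 {
		return frameHeader{}, fmt.Errorf("need 9 bytes, got %d", len(b))
	}
	return frameHeader{
		Length:   uint32(b[0])<<16 | uint32(b[1])<<8 | uint32(b[2]),
		Type:     b[3],
		Flags:    b[4],
		StreamID: binary.BigEndian.Uint32(b[5:9]) & 0x7fffffff, // drop reserved bit
	}, nil
}

func main() {
	// The bytes exactly as they appear in the error message.
	raw := []byte("\x00\x00\x06\x04\x00\x00\x00\x00\x00\x00\x05\x00\x00@\x00")

	h, err := parseFrameHeader(raw)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("length=%d type=0x%x flags=0x%x stream=%d\n",
		h.Length, h.Type, h.Flags, h.StreamID)

	// A SETTINGS payload is a list of (16-bit identifier, 32-bit value) pairs.
	payload := raw[9:]
	for i := 0; i+6 <= len(payload); i += 6 {
		id := binary.BigEndian.Uint16(payload[i : i+2])
		val := binary.BigEndian.Uint32(payload[i+2 : i+6])
		fmt.Printf("setting id=0x%x value=%d\n", id, val)
	}
}
```

Frame type 0x4 is SETTINGS and identifier 0x5 is SETTINGS_MAX_FRAME_SIZE, so the 15 bytes form a complete SETTINGS frame on stream 0, which is what an HTTP/2 server sends at the start of a connection.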

Expected behavior

locksmith works with TLS-enabled etcd in k3s.

@tormath1
Contributor

@bmbeverst can you configure etcd with --enable-v2 to confirm whether the issue comes from the v2/v3 API difference? I'll try to restart the PR you linked.

@bmbeverst

Thanks @tormath1

I was unable to run etcd with enable-v2. When I set enable-v2: true in the /var/lib/rancher/k3s/server/db/etcd/config file (where I got the TLS config), it was removed when I rebooted. The node is already part of a cluster, and I guess that prevents the setting from being changed. I didn't find any help in the k3s docs or on Google for enabling v2 in the embedded etcd.

So I did the reverse and configured etcdctl to use v2, which results in the same error as locksmith:

ETCDCTL_API=2 etcdctl --ca-file="/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt" --cert-file="/var/lib/rancher/k3s/server/tls/etcd/server-client.crt" --key-file="/var/lib/rancher/k3s/server/tls/etcd/server-client.key" --endpoints https://127.0.0.1:2379 cluster-health
cluster may be unhealthy: failed to list members
Error:  client: etcd cluster is unavailable or misconfigured; error #0: net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x00\x00\x06\x04\x00\x00\x00\x00\x00\x00\x05\x00\x00@\x00"

error #0: net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x00\x00\x06\x04\x00\x00\x00\x00\x00\x00\x05\x00\x00@\x00"

That is the same error that locksmith gave.

          net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x00\x00\x06\x04\x00\x00\x00\x00\x00\x00\x05\x00\x00@\x00"

I grabbed the etcdctl version, and it is 3.5.

# etcdctl version
etcdctl version: 3.5.0
API version: 3.5

Also in the startup log of k3s I see this line

"embed/etcd.go:309","msg":"starting an etcd server","etcd-version":"3.5.9","git-sha":"Not provided

And the latest k3s release shows it as running:

Etcd v3.5.9-k3s1

@tormath1
Contributor

tormath1 commented Apr 2, 2024

@bmbeverst that should be doable with:

$ curl -sfL https://get.k3s.io | sh -s - --secrets-encryption --token SuperSecrect --cluster-init --etcd-arg=-experimental-enable-v2v3=v2 --etcd-arg=-enable-v2=true

but I kept getting the error, even though I can see the flags being processed:

Apr 02 14:36:45 localhost k3s[21600]: {"level":"warn","ts":"2024-04-02T14:36:45.1322Z","caller":"embed/etcd.go:739","msg":"Flag `enable-v2` is deprecated and will get removed in etcd 3.6."}
Apr 02 14:36:45 localhost k3s[21600]: {"level":"warn","ts":"2024-04-02T14:36:45.132252Z","caller":"embed/etcd.go:741","msg":"Flag `experimental-enable-v2v3` is deprecated and will get removed in etcd 3.6."}

@bmbeverst

I see the same issue: the etcd server detects the new configuration, and I see it in the config file, but the etcd clients still cannot connect. The errors are the same as before.

Any luck with the PR?

@tormath1
Contributor

tormath1 commented Apr 3, 2024

@bmbeverst yes, I confirm it works correctly with the upgrade PR:

$ sudo ./locksmithctl --etcd-cafile="/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt" --etcd-certfile="/var/lib/rancher/k3s/server/tls/etcd/server-client.crt" --etcd-keyfile="/var/lib/rancher/k3s/server/tls/etcd/server-client.key" --endpoint=https://127.0.0.1:2379 status
Available: 9
Max: 10

MACHINE ID
1325649ad50e4756bf05107701cfca69

@pothos
Member

pothos commented Apr 3, 2024

Since you are using k3s, which is Kubernetes, I think you could use FLUO https://github.com/flatcar/flatcar-linux-update-operator/ or kured https://github.com/kubereboot/kured/ instead of locksmith, couldn't you?

@bmbeverst

The simplicity of locksmith is what I like: a simple process to reboot nodes without needing any additional Kubernetes configuration. Ideally, Kubernetes should be able to tolerate a node rebooting without any issues.

I didn't like kured because, after creating a cluster, the update service still needs to be deployed and configured in Kubernetes. I did not know about the Flatcar Linux Update Operator, but it also requires Kubernetes setup. Perhaps I am mistaken and this is the best path forward.

I am trying to create a setup where I can fully automate the deployment of a multi-node k3s cluster with automatic updates.

@tormath1 to test the PR, do I build locksmith with your PR and overwrite the binaries in the flatcar OS?

@tormath1
Contributor

tormath1 commented Apr 9, 2024

> I am trying to create a setup where I can fully automate the deployment of a multi-node k3s cluster with automatic updates.

In this case, I would recommend investigating the FLUO or kured approach further. Kured is only a DaemonSet that runs on each node (and is compatible with Flatcar); it is easy to deploy and takes care of cleanly draining the nodes before reboot.

To try the PR, you can build it locally and then upload the binary to your nodes, in /opt/bin for example. You might need to copy /opt/bin/locksmithctl to /opt/bin/locksmithd if you want to update locksmithd.service to use this new binary (by overriding the ExecStart= setting). ⚠️ The PR has not been updated; use this for testing only ⚠️
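
The ExecStart= override mentioned above is usually done with a systemd drop-in file. As a rough illustration, the Go sketch below renders the text such a drop-in could contain. renderDropIn is a hypothetical helper, the drop-in file name in the comment is an assumption, and the binary path is taken from the comment above; none of this is shipped by Flatcar or locksmith.

```go
package main

import (
	"fmt"
	"strings"
)

// renderDropIn returns the text of a systemd drop-in that replaces a unit's
// ExecStart with the given binary and arguments. The empty "ExecStart=" line
// clears the command inherited from the original unit before the new one is set.
func renderDropIn(binary string, args ...string) string {
	cmd := strings.Join(append([]string{binary}, args...), " ")
	return "[Service]\nExecStart=\nExecStart=" + cmd + "\n"
}

func main() {
	// Assumed (hypothetical) location for the drop-in on the node:
	//   /etc/systemd/system/locksmithd.service.d/10-test-binary.conf
	fmt.Print(renderDropIn("/opt/bin/locksmithd"))
}
```

After placing a drop-in like this on a node, systemd needs a `systemctl daemon-reload` and a restart of locksmithd.service before the new binary is used.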

@bmbeverst

Thanks for the advice! Totally understand that the PR is not production ready.

Really appreciate the help with this issue.

4 participants