Hetzner Bare Metal server support #433

Closed
4 of 5 tasks
maaft opened this issue Nov 22, 2022 · 27 comments
Labels
enhancement New feature or request

Comments

@maaft

maaft commented Nov 22, 2022

Has anyone successfully integrated Hetzner bare metal servers into their kube-hetzner cluster?

Bare metal servers are cheap and ideal for long-running, resource-hungry workloads such as databases.

The following steps are needed to make this work
([x] means that the step is already possible today):

  • [x] Install MicroOS on the bare metal server
  • [x] Configure Hetzner VPN access through vSwitch
  • [x] Install k3s
  • [x] Join the cluster
  • [ ] Configure kube-hetzner's CCM to retrieve IPs from bare metal servers for LB configuration

The last point is currently missing. However, I have already integrated Hetzner bare metal servers successfully by using TalosOS and a different Hetzner CCM than the one from the official repo.

Questions:

  • Which CCM does kube-hetzner use?
  • Is it possible to use our own CCM (e.g. this one here with Hetzner bare metal support)?
@mysticaltech
Collaborator

@maaft We use the official CCM https://github.com/hetznercloud/hcloud-cloud-controller-manager, the other is a fork of this one (maybe modified).

Maybe that previous discussion is related; please have a look at #283.

Also, if you find a way, PRs are most welcome, as long as you test it well on your end.

@phaer You may have more ideas on this! 🙏

@ifeulner
Contributor

@maaft I think that should already work if the servers are attached via a vSwitch to the private network of the cloud servers? Or am I missing something?

@maaft
Author

maaft commented Nov 25, 2022

@ifeulner The official CCM does not support bare metal servers (i.e. the Robot API), which currently makes it impossible to provision LBs with bare metal server IPs. (This means that routing traffic from the LB to any service that runs on bare metal, e.g. a database, is also not possible.)

@mysticaltech
Collaborator

mysticaltech commented Nov 25, 2022

@maaft In that case, please do not hesitate to submit a PR that adds the ability to choose the CCM (via a variable, for instance), as long as everything keeps working.

@maaft
Author

maaft commented Nov 25, 2022

@mysticaltech will do! Could you just kindly point me to the location where the current CCM is being configured/setup?

@mysticaltech
Collaborator

mysticaltech commented Nov 25, 2022

@maaft Of course. All of the CCM setup happens in init.tf; just search for "ccm" in it. You also have the template file here.

Since the one from syself is a fork of the main one, it should be quite straightforward.

@mysticaltech added the enhancement (New feature or request) label Nov 28, 2022
@mysticaltech changed the title from "Bare Metal Support" to "Hetzner Bare Metal server support" Nov 28, 2022
@maaft
Author

maaft commented Dec 1, 2022

Small update: I managed to connect my bare metal servers and use my own CCM.

Unfortunately, I now get this error during node syncing:

E1201 08:35:44.575557       1 node_controller.go:215] error syncing 'worker1': failed to get node modifiers from cloud provider: provided node ip for node "worker1" is not valid: failed to get node address from cloud provider that matches ip: 10.1.0.3, requeuing

worker1 is my hetzner bare metal server.

Question: Who exactly is producing that error message? Is it kubernetes itself? Is it my CCM?

@mysticaltech
Collaborator

mysticaltech commented Dec 1, 2022

@maaft It's the CCM that ultimately is the reason for that error, for sure. But well done, you are getting close.

You could maybe research this error on GitHub, and in the CCMs' repos of course.

@maaft
Author

maaft commented Dec 1, 2022

Here is my worker config for k3s:

"flannel-iface": "enp9s0.4000"
"kubelet-arg":
- "cloud-provider=external"
- "volume-plugin-dir=/var/lib/kubelet/volumeplugins"
"node-ip": "10.1.0.2"
"node-label":
- "k3s_upgrade=true"
"node-name": "bm-worker3"
"node-taint": []
"server": "https://<master ip>:6443"
"token": "<token from master>"

The issue is probably this:

  • the CCM has no way to retrieve the 10.1.0.2 IP via the robot API because it's configured manually via nmcli.
  • the only IP it'll get is the public IP of the server

My conclusion is that there needs to be a way to preconfigure the CCM with all bare metal IPs (internal and external). I'm not sure whether this can be done with k3s. I tried --node-external-ip, but kubectl get nodes -o wide still shows external IP: <none> for my bare metal servers.
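
To illustrate the suspected mismatch, here is a minimal Python sketch. It is not the CCM's actual code: find_matching_address and the sample addresses are hypothetical, and it assumes the controller only accepts a node IP that also appears in the address list reported by the cloud provider.

```python
from typing import List, Optional


def find_matching_address(node_ip: str, provider_addresses: List[str]) -> Optional[str]:
    """Return node_ip if the provider reports it, else None.

    Hypothetical stand-in for the check behind the message
    "failed to get node address from cloud provider that matches ip".
    """
    for address in provider_addresses:
        if address == node_ip:
            return address
    return None


# Assumption: the Robot API only knows the server's public IP
# (203.0.113.10 is a documentation address), while the node registers
# with the vSwitch IP that was configured manually via nmcli.
robot_addresses = ["203.0.113.10"]

print(find_matching_address("10.1.0.2", robot_addresses))      # no match
print(find_matching_address("203.0.113.10", robot_addresses))  # match
```

If this assumption holds, the node would only sync once it registers with an IP the provider API can report, which is consistent with the public IP being the only one the CCM gets.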

Side-Note:
Previously I used that CCM in combination with Talos and their KubeSpan feature, which essentially configures a wireguard mesh. Therefore all nodes have only an external IP and no internal IPs. Because no private IPs are involved, the CCM was able to pull all the info and provision the LB, and everything worked flawlessly.

Now I'm trying to set up something similar with kube-hetzner by using these k3s flags:

--node-external-ip <SERVER_EXTERNAL_IP> --flannel-backend wireguard-native --flannel-external-ip

as flannel should already be able to handle wireguard meshing.

@maaft
Author

maaft commented Dec 2, 2022

Got it working.

Setup:

  • no hetzner private network
  • flannel-backend: wireguard-native
  • use the nodes' public IPs as node-ip
  • use this ccm

Changes needed in this repo to support bare metal servers:

  • a way to disable the private network
  • a way to use a custom CCM
  • a new "wireguard-mesh" option: usable with Cilium (untested, but should be possible) and flannel
    • configures the Cilium or flannel options automatically

Not sure if I have the resources to do everything by myself. It would be very helpful if anyone could add an option to disable the private network (and adapt firewall + LB configurations accordingly).

As I'm fairly new to Terraform, I'm unsure how to implement the wireguard-mesh option: just if/else constructs to generate the k3s config.yaml? Also, I'd start with flannel support first, as I got it working and currently see no reason to switch to Cilium.
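
The if/else idea can be sketched in Python to show the branching before it is ported to Terraform. This is only an illustration under assumptions: k3s_network_settings and its parameters are hypothetical names, the wireguard_mesh switch does not exist in kube-hetzner, and flannel-backend is set only on servers on the assumption that it is a server-side k3s option.

```python
from typing import Any, Dict


def k3s_network_settings(wireguard_mesh: bool, is_server: bool,
                         public_ip: str, private_ip: str = "",
                         private_iface: str = "") -> Dict[str, Any]:
    """Build the network-related keys of a k3s config.yaml as a dict."""
    if wireguard_mesh:
        # Mesh mode from the setup above: no private network,
        # the public IP doubles as node-ip.
        settings: Dict[str, Any] = {
            "node-ip": public_ip,
            "node-external-ip": public_ip,
        }
        if is_server:
            settings["flannel-backend"] = "wireguard-native"
        return settings
    # Default mode: pin flannel to the private interface.
    return {"node-ip": private_ip, "flannel-iface": private_iface}


# Documentation addresses and an example interface name, not real values.
print(k3s_network_settings(True, True, "203.0.113.10"))
print(k3s_network_settings(False, False, "203.0.113.20", "10.0.0.5", "eth1"))
```

In Terraform the same branching would probably end up as a conditional expression around the map that gets encoded into config.yaml, but that part is untested here.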

@maaft
Author

maaft commented Dec 2, 2022

I created this PR to support custom CCMs.

Note that the CCM by syself is currently not working, because apparently it cannot handle private networks.

@mysticaltech
Collaborator

@maaft Thank you for your research and efforts, this may be valuable for us one day, but currently using the private network is very important to minimize latency between the nodes and remove the need for wireguard encryption.

Please note that this project locks k3s to the private network via the flannel-iface attribute in the k3s config.

Now if you can make it work with both private networks and public networks together (basically just remove the flannel-iface attribute), I will definitely review the PR carefully and give it an honest chance.

@mysticaltech
Collaborator

mysticaltech commented Dec 2, 2022

Just for clarity @maaft, I do think that we would need to use wireguard for that setup, but keeping private networks in the mix is essential. k3s can work with both at the same time; it actually does this by default, which is why we have to force it to use only the private network with the above iface attribute.

So Hetzner cloud nodes need to keep their private IPs in the k3s config, and if the bare metal server cannot have a private IP, as you explained above, then it needs to use its public one. In that case, we need to create another agent-bare-metal.tf file for it that will not join a private subnet, etc., and pass the external IP to all IP params in the k3s config.

@maaft
Author

maaft commented Dec 5, 2022

@mysticaltech

Thanks, I'll look into the dual-interface approach. It would definitely make things easier if I don't need to implement some disable_private_network functionality!

Do you happen to know if flannel can be configured to use its wireguard backend only for public-facing traffic? I'll dig into this.

Edit:

Is this what we want here?

@Lennix

Lennix commented Dec 5, 2022

@maaft
Sorry for going a little off-topic here, but I've read that the performance of a vSwitch connecting bare metal with cloud servers is quite disappointing. Did you by any chance test that?

Going through the public network, on the other hand, is an order of magnitude faster, which might make wireguard a viable option?

@maaft
Author

maaft commented Dec 5, 2022

@Lennix it's not off-topic at all! I also read about that and it makes sense. But I wasn't able to connect my bare metal server through the vSwitch, due to some issues with the CCM I use (I believe).

Anyway, if I understood @mysticaltech correctly, his suggestion was to use both VPN (on cloud nodes) and wireguard (on bare metal), as k3s supports this.

Btw, it seems to run fine with the hybrid approach. Only issue currently is that my CCM is not able to create a route (?):

I1205 14:13:04.581520       1 event.go:294] "Event occurred" object="bm-worker3" fieldPath="" kind="Node" apiVersion="" type="Warning" reason="FailedToCreateRoute" message="Could not create route 8c6a8bdb-4931-4189-****-******* 10.42.2.0/24 for node bm-worker3 after 201.328998ms: hcloud/CreateRoute: hcops/AllServersCache.ByName: bm-worker3 hcops/AllServersCache.getCache: not found"

@mysticaltech
Collaborator

@maaft The hybrid approach described in the k3s docs seems indeed to be what we need in that case. Basically, it will create a wireguard VPN on top of all interfaces if I understand it correctly. This would be ideal!

@mysticaltech
Collaborator

Also @maaft, did you try with this ccm, the author seems to have solved the issue (found it here): https://github.com/identw/hetzner-cloud-controller-manager

@mysticaltech
Collaborator

Ah no, the above won't work; the syself one seems to be the CCM for the job. But it is interesting to read the docs of the one from @identw, as they say that you need to give names to servers in the Robot console (not sure if it's important). There is definitely a way to make this work, you are close @maaft ! 🍀

@maaft
Author

maaft commented Dec 6, 2022

@mysticaltech Setting the name is indeed important, because in this way the syself CCM is able to retrieve the correct IP (by matching the names). At least this is how I think it works.

Anyway, currently I'm stuck on two, maybe unrelated, points:

  1. The k3s docs state that, for the hybrid approach, agents have to be started with just the --node-external-ip flag. When I do this on my metal servers, they won't connect. Setting --node-ip instead works, but I'm not sure how this will affect traffic encryption.
  2. When I connect my agents with the --node-ip flag, there seems to be a "createRoute" issue. No idea what to do here. Help much appreciated!
I1205 14:13:04.581520       1 event.go:294] "Event occurred" object="bm-worker3" fieldPath="" kind="Node" apiVersion="" type="Warning" reason="FailedToCreateRoute" message="Could not create route 8c6a8bdb-4931-4189-****-******* 10.42.2.0/24 for node bm-worker3 after 201.328998ms: hcloud/CreateRoute: hcops/AllServersCache.ByName: bm-worker3 hcops/AllServersCache.getCache: not found"

If I can't get this to work by the end of this week, I'll have to fall back to my wireguard-only setup, without using any VPN from Hetzner.

What is the real speed benefit of wireguard vs. hetzner VPN anyway? Did anyone measure some latencies?

@mysticaltech
Collaborator

mysticaltech commented Dec 7, 2022

@maaft My understanding is that when you use the hybrid approach you have to:

Set both node-ip and node-external-ip as the external IP for metal servers.
Set node-ip to the internal IP and node-external-ip to the external IP for cloud servers.

Also, don't forget to set the flannel-iface argument to "eth0,eth1" for control plane nodes, or remove it altogether so that k3s scans for available interfaces.
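
As a rough illustration of the two IP rules above, here is a small Python sketch that renders the corresponding k3s config lines. The helper names hybrid_ip_settings and render are hypothetical, the addresses are documentation examples, and whether this split actually works for the hybrid approach is untested.

```python
from typing import Dict, Optional


def hybrid_ip_settings(is_bare_metal: bool, external_ip: str,
                       internal_ip: Optional[str] = None) -> Dict[str, str]:
    """Pick node-ip and node-external-ip per the rules described above."""
    if is_bare_metal:
        # Metal servers: both values are the external IP.
        return {"node-ip": external_ip, "node-external-ip": external_ip}
    if internal_ip is None:
        raise ValueError("cloud servers need an internal IP")
    # Cloud servers: internal IP as node-ip, external IP as node-external-ip.
    return {"node-ip": internal_ip, "node-external-ip": external_ip}


def render(settings: Dict[str, str]) -> str:
    """Render the settings in the quoted key/value style of config.yaml."""
    return "\n".join(f'"{key}": "{value}"' for key, value in settings.items())


print(render(hybrid_ip_settings(True, "203.0.113.10")))
print(render(hybrid_ip_settings(False, "203.0.113.20", "10.0.0.5")))
```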

@mysticaltech
Collaborator

@maaft I understand you finally cracked the code for this. Please do share whenever you can 🙏

@maaft
Author

maaft commented Dec 30, 2022

@mysticaltech I did not bother to run both hetzner VPN and flannel wireguard at the same time. Anyway, here's what I did:

  • use my PR
  • use override_ccm = "https://gist.githubusercontent.com/maaft/3fba8406265aed9b0713452e72bfd385/raw/b2cd9a0c00db70c3071a5a64cf7d3c97ef63112a/kube-hetzner-ccm-wg.yaml"
  • also, I disabled all private-network-related stuff (manually; there should be an option to disable it, but I was too lazy). See this branch for a working setup
  • finally, install any distro on your metal servers, copy /etc/rancher/k3s/config.yaml from the first agent to the bare metal servers, change hostname and IP, install k3s, and it's a done deal
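
The "copy the config, change hostname and IP" step from the last bullet can be sketched as follows. This is a hedged illustration only: adapt_agent_config is a hypothetical helper, the config is handled as a plain dict (reading and writing the YAML file is left out), and the sample values are placeholders.

```python
import copy
from typing import Any, Dict


def adapt_agent_config(config: Dict[str, Any], node_name: str,
                       node_ip: str) -> Dict[str, Any]:
    """Return a copy of an agent config with a new node name and IP."""
    adapted = copy.deepcopy(config)  # leave the first agent's config untouched
    adapted["node-name"] = node_name
    adapted["node-ip"] = node_ip
    return adapted


# Placeholder config modeled on the keys of the worker config posted earlier.
first_agent = {
    "kubelet-arg": ["cloud-provider=external"],
    "node-ip": "203.0.113.20",
    "node-name": "agent-1",
    "server": "https://<master ip>:6443",
    "token": "<token from master>",
}

bare_metal = adapt_agent_config(first_agent, "bm-worker3", "203.0.113.10")
print(bare_metal["node-name"], bare_metal["node-ip"])
```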

If you have any suggestions on how to integrate all of this into the codebase, I'm happy to help. But I need guidance here.

Also, dual stack should of course be tested before adding a disabled_vpn flag. Unfortunately, I haven't had time for this yet.

@mysticaltech
Collaborator

Thanks for sharing @maaft, it's good to have the info here for others.

For now, I will close this issue, as it's too custom to officially support just yet.

@schlichtanders
Contributor

Is there any further progress to include bare metal support directly as part of kube-hetzner?

@mysticaltech
Collaborator

@schlichtanders There is a significant change on hetzner's side. They have included access to bare metal servers via the network and hcloud ccm. That's a game changer! hetznercloud/hcloud-cloud-controller-manager#523

We are not actively working on this at the moment, but PRs welcome.

@Aubermean

Discussion opened here: #1311
