feat: Support for Robot Servers (Syself Fork) #523
great news! <3
@apricote
As mentioned in the Design Doc, this data is not available in the API, and it's not possible for us to provide it without deploying a DaemonSet that reads the info from the nodes:
This utility function was duplicated with nearly the exact same functionality. This commit cleans it up by extracting to a new package (to avoid cyclic imports). These two methods are about to get more complicated with #523, better to clean it up now than to make changes to both locations in the future. --------- Co-authored-by: Jonas L. <[email protected]>
Based on the Fork by Syself[0] and the Design Doc[1]. [0] https://github.com/syself/hetzner-cloud-controller-manager [1] #523 (comment) This ports most features of the fork while refactoring them to match our coding style and the improvements I made in preparation for this. Closes #525 #526 #527 --------- Co-authored-by: janiskemper <[email protected]> Co-authored-by: Mawe Sprenger <[email protected]> Co-authored-by: Thomas Guettler <[email protected]> Co-authored-by: Anurag <[email protected]> Co-authored-by: batistein <[email protected]>
Basic support is merged (#561). I will add documentation and a migration guide before cutting a release with the feature. There are also a number of features from the Syself fork that were removed from the first PR; these will also be submitted before a release.
Forgot to provide an update here: I published a pre-release on Friday (v1.19.0-rc.0) to make it simple for anyone interested to test it. I plan on publishing the proper release on Wednesday (2023-12-06), so if anyone finds any bugs, now is the time to open an issue :) Great to hear it's working for you @jahanson!
Released in
Summary
We intend to merge the changes from syself/hetzner-cloud-controller-manager to provide support for Hetzner Robot (Dedicated/Bare Metal) servers in addition to cloud servers.
This issue will track the various tasks necessary.
Subtasks
Design Doc
We have written an internal Design Doc to figure out what exactly this means and how we want to tackle the different aspects. For transparency you can read it by expanding the section below.
Support Robot Servers in HCCM
Motivation
hcloud-cloud-controller-manager (HCCM) is used to expose certain Cloud functionality to Kubernetes Clusters. This includes Node Metadata, Network Routes & Load Balancers.
Right now, HCCM only supports the Hetzner Cloud API & cloud servers. Many customers have hybrid clusters running on Hetzner Dedicated (Robot) & Cloud servers. The underlying library of HCCM, kubernetes/cloud-provider, only works if all nodes in the cluster can be managed by a single implementation. Effectively, this means that clusters with a current version of HCCM remove any non-Cloud Nodes from the Kubernetes cluster.
The Hetzner Cloud API already works with Robot in a few scenarios:
In addition, we can use the Robot API to:
We have had a number of requests for this feature through GitHub.
There already exists a fork of HCCM by Syself that implements Robot support. They have offered to "donate" this to us, so we can use it as a starting point.
Implementation
Robot Client
The Robot team provides a REST API: https://robot.hetzner.com/doc/webservice/en.html#preface
The Robot team does not publish their own Go API Client. There is an open source client that is partially maintained by Syself. It is currently used in the Syself HCCM fork. As there is no better option, we will continue to use it.
We need to create the client and inject it into our application, similar to the hcloud-go client. This must be optional so that existing setups keep working.
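A minimal sketch of what "optional" could look like: the Robot client is only constructed when Robot support is explicitly enabled, and stays nil otherwise. The environment variable names (ROBOT_ENABLED, ROBOT_USER, ROBOT_PASSWORD), the RobotClient interface and its DatacenterOf method, and the newCloudProvider constructor are all hypothetical names for illustration; they are not the hrobot-go API or HCCM's actual wiring.

```go
package main

import (
	"errors"
	"strconv"
)

// RobotConfig holds the settings for the optional Robot client.
type RobotConfig struct {
	Enabled  bool
	User     string
	Password string
}

// RobotClient is a hypothetical, minimal view of a Robot API client.
// The real hrobot-go client has its own types and methods.
type RobotClient interface {
	// DatacenterOf is a hypothetical method used only for illustration.
	DatacenterOf(serverID int) (string, error)
}

// robotConfigFromEnv reads the Robot settings through a lookup function
// (pass os.LookupEnv in production). ROBOT_ENABLED, ROBOT_USER and
// ROBOT_PASSWORD are assumed variable names for this sketch.
func robotConfigFromEnv(lookup func(string) (string, bool)) (RobotConfig, error) {
	var cfg RobotConfig
	if v, ok := lookup("ROBOT_ENABLED"); ok {
		enabled, err := strconv.ParseBool(v)
		if err != nil {
			return RobotConfig{}, err
		}
		cfg.Enabled = enabled
	}
	if !cfg.Enabled {
		// Disabled by default, so cloud-only setups need no new settings.
		return cfg, nil
	}
	cfg.User, _ = lookup("ROBOT_USER")
	cfg.Password, _ = lookup("ROBOT_PASSWORD")
	if cfg.User == "" || cfg.Password == "" {
		return RobotConfig{}, errors.New("ROBOT_USER and ROBOT_PASSWORD are required when ROBOT_ENABLED is true")
	}
	return cfg, nil
}

// cloudProvider holds the injected API clients; robot stays nil when
// Robot support is disabled, so callers must check it before use.
type cloudProvider struct {
	robot RobotClient
}

// newCloudProvider injects a Robot client only if Robot support is enabled.
// newRobot is a caller-supplied constructor (hypothetical signature).
func newCloudProvider(cfg RobotConfig, newRobot func(user, password string) RobotClient) *cloudProvider {
	p := &cloudProvider{}
	if cfg.Enabled {
		p.robot = newRobot(cfg.User, cfg.Password)
	}
	return p
}
```

Passing the lookup function instead of calling os.Getenv directly keeps the config logic testable without mutating the process environment.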
Testing
We will need to set up a new CI workflow to verify the Robot support with a dedicated server. We might need to add some additional test cases to validate that it works across the different servers.
This test needs to be optional, as many people use HCCM in a cloud-only config.
To make sure that only one pipeline uses the dedicated servers at a time, we will use the GitHub Actions concurrency feature. This means that we can only run this test from GitHub Actions and not from our internal GitLab.
The server will be bootstrapped using installimage with autosetup. The node is then joined to our existing cluster using k3sup, same as the Cloud servers.
Cloud Provider Controllers
Node Controller / InstancesV2
This controller initially adopts the instance and links it to our APIs (via the ProviderID). It also returns metadata about the node.
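As a rough sketch of the metadata this controller could return for a Robot server, following the ProviderID and Zone/Region rules this design doc sets out for InstancesV2: InstanceMetadata here is a trimmed-down, hypothetical stand-in for the Kubernetes struct, robotMetadata is an illustrative helper, and the datacenter name format "FSN1-DC14" (location, dash, datacenter) is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// InstanceMetadata is a trimmed-down, hypothetical stand-in for the
// InstanceMetadata struct returned by the Kubernetes InstancesV2 interface.
type InstanceMetadata struct {
	ProviderID string
	Zone       string
	Region     string
}

// robotMetadata builds the metadata for a Robot server from its ID and the
// datacenter name reported by the Robot API (assumed format "FSN1-DC14").
func robotMetadata(serverID int, datacenter string) (InstanceMetadata, error) {
	// Robot reports upper-case datacenter names; normalize to lower case
	// to match the Cloud API (e.g. "fsn1-dc14").
	zone := strings.ToLower(datacenter)

	// The region (location name) is the part before the first "-".
	region, _, found := strings.Cut(zone, "-")
	if !found || region == "" {
		return InstanceMetadata{}, fmt.Errorf("cannot derive region from datacenter %q", datacenter)
	}

	return InstanceMetadata{
		ProviderID: fmt.Sprintf("hrobot://%d", serverID),
		Zone:       zone,
		Region:     region,
	}, nil
}
```

For example, a Robot server 321 in "FSN1-DC14" would get ProviderID hrobot://321, Zone fsn1-dc14 and Region fsn1.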
We always need to know which "source" (Cloud or Robot) each node belongs to. We can store this in the ProviderID field. Our existing Cloud servers use the pattern hcloud://<SERVER-ID>. For Robot, we will use hrobot://<SERVER-ID>. This differs from the Syself fork, which uses hcloud://bm-<ROBOT-ID>. We will also accept the Syself format when reading, so that users can migrate from the fork to our HCCM.
These fields from the InstancesV2 interface have restrictions for Robot servers:
Zone (Datacenter Name) is lower-case in the Hetzner Cloud API and upper-case in the Hetzner Robot API; we should normalize it to lower case.
Region (Location Name) needs to be parsed from the Zone (Datacenter Name).
If Robot support is not enabled and we encounter a Node that we cannot associate with any Cloud server, we should log a warning. This warning should tell the user that the Node was removed and that, if they are trying to add Robot servers to their cluster, they should enable Robot support.
Route Controller / Routes
Using the native routing feature should be possible since the launch of expose_routes_to_vswitch. However, this can be hard to implement and especially to verify. We will not include support for this in the first release; based on customer demand, we might introduce it later.
Service Controller / LoadBalancer
We can add the Robot servers to the Load Balancer target list through their IP, which we can get from the Node object.
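The sketch below ties the ProviderID scheme from the Node Controller section to Load Balancer target selection: Cloud nodes become server targets, Robot nodes become IP targets. The three ProviderID formats come from this design doc; everything else is hypothetical: parseProviderID, lbTarget, Target and NodeAddress are illustrative stand-ins (NodeAddress mirrors the Type/Address pair of a Kubernetes Node address), and using the node's ExternalIP for Robot targets is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ServerSource identifies which Hetzner API a Node belongs to.
type ServerSource int

const (
	SourceCloud ServerSource = iota
	SourceRobot
)

// parseProviderID decodes the formats from the design doc:
//
//	hcloud://<SERVER-ID>    Cloud server
//	hrobot://<SERVER-ID>    Robot server
//	hcloud://bm-<ROBOT-ID>  Robot server, legacy Syself fork format
func parseProviderID(providerID string) (ServerSource, int64, error) {
	var source ServerSource
	var rawID string
	switch {
	case strings.HasPrefix(providerID, "hrobot://"):
		source, rawID = SourceRobot, strings.TrimPrefix(providerID, "hrobot://")
	// The legacy prefix must be checked before the plain "hcloud://" prefix.
	case strings.HasPrefix(providerID, "hcloud://bm-"):
		source, rawID = SourceRobot, strings.TrimPrefix(providerID, "hcloud://bm-")
	case strings.HasPrefix(providerID, "hcloud://"):
		source, rawID = SourceCloud, strings.TrimPrefix(providerID, "hcloud://")
	default:
		return 0, 0, fmt.Errorf("unknown provider ID format: %q", providerID)
	}
	id, err := strconv.ParseInt(rawID, 10, 64)
	if err != nil {
		return 0, 0, fmt.Errorf("invalid server ID in provider ID %q: %w", providerID, err)
	}
	return source, id, nil
}

// NodeAddress is a hypothetical stand-in for a Kubernetes Node address.
type NodeAddress struct {
	Type    string // e.g. "InternalIP", "ExternalIP"
	Address string
}

// Target is a hypothetical, simplified Load Balancer target:
// exactly one of ServerID (Cloud) or IP (Robot) is set.
type Target struct {
	ServerID int64
	IP       string
}

// lbTarget picks a server target for Cloud nodes and an IP target,
// taken from the Node's ExternalIP (assumption), for Robot nodes.
func lbTarget(providerID string, addrs []NodeAddress) (Target, error) {
	source, id, err := parseProviderID(providerID)
	if err != nil {
		return Target{}, err
	}
	if source == SourceCloud {
		return Target{ServerID: id}, nil
	}
	for _, a := range addrs {
		if a.Type == "ExternalIP" {
			return Target{IP: a.Address}, nil
		}
	}
	return Target{}, fmt.Errorf("robot node %d has no ExternalIP address", id)
}
```

Accepting the legacy hcloud://bm- prefix only on read, while always writing hrobot://, matches the migration path described above.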
Documentation
This is a major new feature that needs to be thoroughly documented:
Alternatives
No support
We could simply decide not to support Robot servers. This sucks from a customer's perspective, because they do not care that Robot & Cloud are different teams/companies. Our Cloud APIs already integrate with Robot on some accounts, so this should also be supported in our integrations.
Forking the hrobot-go client
If we do this in an official manner, customers will assume that we are responsible for maintenance and might demand fixes/features. This is out of scope for our team and might be better owned by Robot, who have no (official) interest in this at this time.
If we encounter issues with hrobot-go, we can still fork it and use a replace directive in go.mod to quickly release fixes.