Kubernetes: Teleport Operator - roles and users #531
Conversation
Meanwhile (or right after) we need to install the CRDs into the cluster.
You can do so by issuing `make kubernetes-install`.
Move this step into the Helm chart installation.
Helm has a `crds` folder that is installed before anything else, so there is no need for kustomize:
https://helm.sh/docs/chart_best_practices/custom_resource_definitions/
I have mixed feelings about this one
I can see three approaches here:
- as is: clone and then install with `kustomize`
- move the CRDs into the chart (it would land here: Add Teleport Operator to `teleport-cluster` Helm chart, teleport#12144)
- add the CRDs installation and upgrade to the operator itself
I think we both agree that the first one is the worst, so let's focus on the other two options
Adding the CRDs into the chart has the following advantages:
- it's the recommended tool for the job (as you linked)

However, I think the disadvantages outweigh the list above:
- we will need to create a PR in `teleport` every time `types.pb` changes
- currently, there's no way to upgrade the CRDs' specs
Embedding the CRD files and using the `teleport-operator` to install them will give us better control over upgrades of the spec.
I can see us going with any of the 3 options for now, but I think we will end up with the 3rd one.
My concern with going with a full-fledged implementation of option number 3 is that it will increase the diff size of the PR.
My suggestion is to keep it as it is now, but open up a PR right after this one is merged to implement option number 3
What do you think?
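To make option 3 a bit more concrete, here is a rough sketch of what installing/upgrading the CRDs at operator boot could look like. This is illustrative only, not final code: the embedded manifest path, package name, and function name are assumptions, and it uses the standard apiextensions clientset.

```go
package operator

import (
	"context"
	_ "embed"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	apiextensionsclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/rest"
	"sigs.k8s.io/yaml"
)

// Hypothetical CRD manifest embedded in the operator binary.
//go:embed crds/resources.teleport.dev_roles.yaml
var roleCRDManifest []byte

// installCRDs creates the embedded CRD if it is missing, or overwrites its spec
// so that the deployed CRD always matches the running operator version.
func installCRDs(ctx context.Context, cfg *rest.Config) error {
	var crd apiextensionsv1.CustomResourceDefinition
	if err := yaml.Unmarshal(roleCRDManifest, &crd); err != nil {
		return err
	}

	clientset, err := apiextensionsclient.NewForConfig(cfg)
	if err != nil {
		return err
	}
	crds := clientset.ApiextensionsV1().CustomResourceDefinitions()

	existing, err := crds.Get(ctx, crd.Name, metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		_, err = crds.Create(ctx, &crd, metav1.CreateOptions{})
		return err
	}
	if err != nil {
		return err
	}

	// Upgrade path: replace the spec of the already-installed CRD.
	existing.Spec = crd.Spec
	_, err = crds.Update(ctx, existing, metav1.UpdateOptions{})
	return err
}
```

This is also where the spec upgrade story would live: bumping the operator image bumps the CRDs with it.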
I am ok with options 2 & 3.
For me, option number 2 is the only approach if we want to automate the user/role creation via Helm.
If we want to automate user and role creation during the installation process, we first need to install the CRD (Helm does that automatically) and then create the custom resources.
It isn't possible to have the same behavior with option number 3, because it requires that the operator is running and has already installed the CRD.
There is also an issue with option number 2: the user might mix a new Helm chart release that contains a different CRD with an old operator version. This doesn't occur with option number 3, since the deployed CRD is always valid for the operator version, and the user cannot mix a new CRD with an old operator just by changing the image tag.
If automatic creation of roles and users via Helm is not something we want, I am ok with using option 3.
@r0mant, do you have any thoughts about this?
Agreed, thank you 🙏
I created a version which relies on the operator to install the CRDs at boot
If you want to play with it, you just need the following:
- check out `teleport` at `marco/plugins-teleport-operator-charts`
- run:

```sh
minikube start

helm install teleport-cluster <teleport repo location>/examples/chart/teleport-cluster \
  --create-namespace --namespace teleport-cluster \
  --set clusterName=teleport-cluster.teleport-cluster.svc.cluster.local \
  --set operator=true --set teleportVersionOverride=9.3.2
```
Now let's wait for the deployment to finish:
`kubectl wait --for=condition=available deployment/teleport-cluster --timeout=2m`
If the wait is necessary, use the helm `--wait` flag.
I just tried to add the `--wait` flag to the `helm` command above, but it got stuck and eventually timed out.
This happens even though the services are up and running.
Maybe there's some conflict in using the `wait` and `install` flags? 🤷‍♂️
After adding the `--debug` flag, I can see it's waiting for:
`ready.go:258: [debug] Service does not have load balancer ingress IP address: teleport-cluster/teleport-cluster`
Most likely this is the chart's LoadBalancer Service, which never gets an ingress IP on minikube.
lib/tctl/tctl.go (Outdated)
@@ -130,6 +131,26 @@ func (tctl Tctl) Get(ctx context.Context, kind, name string) (types.Resource, error) {
	return resources[0], nil
}

// Exists validates a resource existence by its kind and name identifiers.
func (tctl Tctl) Exists(ctx context.Context, kind, name string) (bool, error) {
can this reuse the Get/GetAll function instead of duplicating code?
The `GetAll` and `Get` methods don't work as expected.
They try to parse the output (as YAML) returned by the `tctl` binary, but it has an invalid format (or version, not sure).
I didn't try to fix it as it seemed to be out of scope for this PR (which is already quite large).
We are now using the `Get` method.
There was a missing version in the resource parser: roles V5.
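For reference, once `Get` works, `Exists` can be a thin wrapper over it. A minimal sketch, assuming `Get` surfaces missing resources as a not-found error from `github.com/gravitational/trace`:

```go
// Exists reports whether a resource with the given kind and name exists,
// reusing Get instead of parsing tctl output a second time.
func (tctl Tctl) Exists(ctx context.Context, kind, name string) (bool, error) {
	if _, err := tctl.Get(ctx, kind, name); err != nil {
		if trace.IsNotFound(err) {
			return false, nil
		}
		return false, trace.Wrap(err)
	}
	return true, nil
}
```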
@tigrato Thank you for the review 🙏
To create an Operator we are going to use the Operator SDK to ease the process. This first commit creates the boilerplate necessary to set up the Operator. It was created using the following command:

mkdir kubernetes && \
  cd kubernetes && \
  operator-sdk init --domain teleport.dev --repo github.com/gravitational/teleport-plugins/kubernetes && \
  kubebuilder edit --multigroup=true && \
  operator-sdk create api --group resources --version v5 --kind Role --resource --controller && \
  operator-sdk create api --group resources --version v2 --kind User --resource --controller && \
  go get k8s.io/[email protected] && \
  go get -u ./... && \
  go mod tidy && \
  make test
This commit adds the Teleport User/Role Spec to the CRDs.

We are not communicating with the Teleport cluster yet, but we can already manage Users and Roles using `kubectl apply -f` or `kubectl delete`.

We had to include a custom CRD generator because the default one requires every field within our Specs to have a comment, which we don't currently have.

Files changed: user/role_types.go and the crdgen folder. Everything else was auto-generated.
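To give an idea of the shape of these CRDs, a role custom resource roughly wraps Teleport's own spec type. This is an illustrative sketch only, not the generated code from the PR; it assumes the usual `metav1` import and Teleport's `api/types` package:

```go
// Role is the Kubernetes custom resource that wraps a Teleport role.
// +kubebuilder:object:root=true
type Role struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// Spec reuses Teleport's role specification (roles are v5 in Teleport 9).
	Spec types.RoleSpecV5 `json:"spec,omitempty"`
}
```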
This commit adds a reconciler for the Role and User resources.

Every time we receive an event for one of those resources, we must propagate that change into the Teleport cluster. We have two types of events: update/create and delete.

For creating/updating, we check if the resource exists in Teleport and upsert the entire resource based on the current state on the k8s side. For users, we must do 2 calls: GetUser and then either Create or Update the user. For roles, it's simpler as we have the UpsertRole call available.

For deleting, the recommendation is to use finalizers. Finalizers allow us to map an external resource to a K8S resource. So, when we create or update a resource, we add our own finalizer to the K8S resource's list of finalizers. When we receive a delete event, we check if the finalizer that we added is present; if so, we delete the resource in Teleport and remove the finalizer. We do a no-op if the finalizer is not there. K8S will delete the resource when the list of finalizers is empty.

Each reconciler must have a Teleport API client. So, how do we get the credentials in order to use the API? We must have access to two things:
- the `tctl` binary, so that we can create a dedicated role and user for our interactions
- access to the teleport container's `/var/lib/teleport`, so that we can use `tctl` with full access

For the former: bringing in the whole teleport project is hard, there were a lot of conflicts because of the replace directives (and what not), so we ended up bringing the `tctl` binary from the official release image. For the latter: we created a volume within the pod, so that both containers can reach it and use it as usual.
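The create/update/delete flow with finalizers follows the usual controller-runtime pattern. Here is a condensed, illustrative sketch (not the PR's exact code): it assumes the standard scaffolded reconciler embedding `client.Client`, and `r.teleport`, `resourcesv5.Role`, and `role.ToTeleport()` are hypothetical names standing in for the Teleport API client and the CRD-to-Teleport conversion.

```go
const teleportFinalizer = "resources.teleport.dev/finalizer" // illustrative name

func (r *RoleReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var role resourcesv5.Role
	if err := r.Get(ctx, req.NamespacedName, &role); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	if !role.ObjectMeta.DeletionTimestamp.IsZero() {
		// Delete event: remove the role in Teleport, then drop our finalizer so
		// Kubernetes can garbage-collect the object. No-op if the finalizer is absent.
		if controllerutil.ContainsFinalizer(&role, teleportFinalizer) {
			if err := r.teleport.DeleteRole(ctx, role.Name); err != nil {
				return ctrl.Result{}, err
			}
			controllerutil.RemoveFinalizer(&role, teleportFinalizer)
			return ctrl.Result{}, r.Update(ctx, &role)
		}
		return ctrl.Result{}, nil
	}

	// Create/update event: make sure our finalizer is registered, then upsert
	// the whole role based on the current state on the Kubernetes side.
	if !controllerutil.ContainsFinalizer(&role, teleportFinalizer) {
		controllerutil.AddFinalizer(&role, teleportFinalizer)
		if err := r.Update(ctx, &role); err != nil {
			return ctrl.Result{}, err
		}
	}
	return ctrl.Result{}, r.teleport.UpsertRole(ctx, role.ToTeleport())
}
```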
We updated the Makefile so that it works with go1.18. The changes were copied from here: https://github.com/operator-framework/operator-sdk/blob/f62caa6579d14883155b91e241a98272d7a9c161/testdata/go/v3/memcached-operator/Makefile
@Joerger @tigrato Let's continue the review there
This PR introduces the Teleport Operator
We've tried to streamline the number of features we release in this first PR, and for now, only include Users and Roles Management.
To ease the review, here's the main takeaway from each commit:
The project's README contains a guide on how to test the operator locally.
There's at least one missing part: uploading the Docker container to quay.io. I'll complete that at a later stage.
If you don't feel like reviewing the PR twice, please don't do it now, as I'll probably add that work within this first PR (to have the feature end-to-end).