This repository has been archived by the owner on Jun 4, 2024. It is now read-only.

Kubernetes: Teleport Operator - roles and users #531

Closed
wants to merge 8 commits

Conversation

@marcoandredinis (Contributor) commented Apr 29, 2022

This PR introduces the Teleport Operator.

We've tried to limit the number of features we release in this first PR; for now, it only includes User and Role management.

To ease the review, here's the main takeaway from each commit:

  • the first one contains the project bootstrap / scaffolding
  • the second one contains the Role and User definitions, including the CRD auto-generation
  • the third one contains the actual controller implementation and the reconciliation loop

The project's README contains a guide on how to test the operator locally.

There's at least one missing part: uploading the docker container to quay.io.
I'll complete that at a later stage.
If you don't feel like reviewing the PR twice, please don't do it now, as I'll probably add that work within this first PR (to have the feature end-to-end).

@marcoandredinis force-pushed the marco/k8s-controller branch 4 times, most recently from 4c3534c to dd80708 on May 3, 2022 09:19
@marcoandredinis force-pushed the marco/k8s-controller branch 26 times, most recently from 6e93ba5 to 110ea88 on May 30, 2022 09:54
@marcoandredinis force-pushed the marco/k8s-controller branch 7 times, most recently from fc09be8 to 6c15c2c on June 3, 2022 17:07
@r0mant requested a review from tigrato on June 3, 2022 19:40
@marcoandredinis force-pushed the marco/k8s-controller branch 4 times, most recently from ba27658 to 2058855 on June 6, 2022 10:31
kubernetes/README.md (Outdated)

Meanwhile (or right after) we need to install the CRDs into the cluster.

You can do so by issuing `make kubernetes-install`.
Contributor

Move this step into the Helm chart installation.
Helm supports a `crds` folder whose contents are installed before anything else, so there is no need for kustomize.

https://helm.sh/docs/chart_best_practices/custom_resource_definitions/
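
For reference, a sketch of the chart layout the linked best practices imply (file names here are illustrative): Helm 3 installs any manifests under a top-level `crds/` directory before rendering templates, but it never upgrades or deletes them on `helm upgrade`, which is the upgrade concern raised below.

```
teleport-operator/
├── Chart.yaml
├── crds/                    # plain manifests, installed before templates
│   ├── roles.yaml           # CustomResourceDefinition for Roles
│   └── users.yaml           # CustomResourceDefinition for Users
└── templates/
    └── deployment.yaml      # the operator itself
```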

Contributor Author

I have mixed feelings about this one.

I can see three approaches here:

  • keep the current manual step (`make kubernetes-install`, via kustomize)
  • ship the CRDs in the Helm chart's `crds` folder
  • embed the CRDs in the operator and have it install them at boot

I think we both agree that the first one is the worst, so let's focus on the other two options.

Adding the CRDs to the chart has the following advantage:

  • it's the recommended tool for the job (as you linked)

However, I think the disadvantages outweigh it:

  • we will need to create a PR in teleport every time the types.pb changes
  • currently, there's no way to upgrade the CRD specs

Embedding the CRD files and using the teleport-operator to install them will give us better control over upgrades of the spec (sketched below).

I can see us going with any of the three options for now, but I think we will end up with the third one.
My concern with going for a full-fledged implementation of option 3 is that it would increase the diff size of this PR.
My suggestion is to keep it as it is now, but open a PR right after this one is merged to implement option 3.

What do you think?
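
A minimal sketch of the embedding approach, assuming the generated CRD manifests are compiled into the operator binary with `go:embed`; the `crds/` path and function names are illustrative, not the PR's actual code:

```go
package main

import (
	"context"
	"embed"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/rest"
	"sigs.k8s.io/yaml"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	apiextensionsclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
)

//go:embed crds/*.yaml
var crdFS embed.FS

// installCRDs applies every embedded CRD at operator boot: it creates the
// CRD if it is absent, or updates it to the version this binary ships with.
// This is what keeps the CRDs in lockstep with the operator version.
func installCRDs(ctx context.Context, cfg *rest.Config) error {
	clientset, err := apiextensionsclient.NewForConfig(cfg)
	if err != nil {
		return err
	}
	crds := clientset.ApiextensionsV1().CustomResourceDefinitions()

	entries, err := crdFS.ReadDir("crds")
	if err != nil {
		return err
	}
	for _, entry := range entries {
		raw, err := crdFS.ReadFile("crds/" + entry.Name())
		if err != nil {
			return err
		}
		var crd apiextensionsv1.CustomResourceDefinition
		if err := yaml.Unmarshal(raw, &crd); err != nil {
			return err
		}
		existing, err := crds.Get(ctx, crd.Name, metav1.GetOptions{})
		switch {
		case apierrors.IsNotFound(err):
			_, err = crds.Create(ctx, &crd, metav1.CreateOptions{})
		case err == nil:
			// Updates must carry the current resourceVersion.
			crd.ResourceVersion = existing.ResourceVersion
			_, err = crds.Update(ctx, &crd, metav1.UpdateOptions{})
		}
		if err != nil {
			return err
		}
	}
	return nil
}
```

Because the manifests are embedded, upgrading the operator image upgrades the CRDs with it, which is the upgrade-control property described above.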

Contributor

I am ok with options 2 & 3.

For me, option 2 is the only approach if we want to automate user/role creation via helm: to create the custom resources during the installation process, the CRDs must be installed first, and helm does that automatically.

It isn't possible to have the same behavior with option 3, because it requires the operator to already be running and to have installed the CRDs.

There is also an issue with option 2 that might happen if the user mixes a new helm chart release containing a different CRD with an old operator version. This doesn't occur with option 3, since the deployed CRD is always valid for the operator version, and the user cannot mix a new CRD with an old operator just by changing the image tag.

If automatic creation of roles and users via Helm is not something we want, I am ok with using option 3.

@r0mant, do you have any thoughts about this?

Contributor Author

Agreed, thank you 🙏

I created a version that relies on the operator to install the CRDs at boot.
If you want to play with it, you just need the following:

  • checkout teleport at marco/plugins-teleport-operator-charts
  • minikube start
  • install the chart:

```
helm install teleport-cluster <teleport repo location>/examples/chart/teleport-cluster \
    --create-namespace --namespace teleport-cluster \
    --set clusterName=teleport-cluster.teleport-cluster.svc.cluster.local \
    --set operator=true --set teleportVersionOverride=9.3.2
```

Now let's wait for the deployment to finish:
`kubectl wait --for=condition=available deployment/teleport-cluster --timeout=2m`
Contributor

If the wait is necessary, use helm's `--wait` flag.

Contributor Author (@marcoandredinis, Jun 6, 2022)

I just tried adding the --wait flag to the helm command above, but it got stuck and eventually timed out.
This happens even though the services are up and running.
Maybe there's some conflict between the wait and install flags? 🤷‍♂️

After adding the debug flag, I can see it's waiting for

ready.go:258: [debug] Service does not have load balancer ingress IP address: teleport-cluster/teleport-cluster

(A LoadBalancer service on a bare minikube cluster never receives an ingress IP unless something like `minikube tunnel` is running, so `--wait` blocks until it times out.)

lib/tctl/tctl.go (Outdated)

```go
@@ -130,6 +131,26 @@ func (tctl Tctl) Get(ctx context.Context, kind, name string) (types.Resource, error) {
	return resources[0], nil
}

// Exists validates a resource existence by its kind and name identifiers.
func (tctl Tctl) Exists(ctx context.Context, kind, name string) (bool, error) {
```
Contributor

Can this reuse the Get/GetAll functions instead of duplicating code?

Contributor Author

The GetAll and Get methods don't work as expected.
They try to parse the output (as yaml) returned by the tctl binary, but it has an invalid format (or version, I'm not sure).

I didn't try to fix it, as that seemed out of scope for this PR (which is already quite large).

Contributor Author

We are now using the Get method.
The resource parser was missing a version: roles v5.
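
A minimal sketch of what Exists can look like once it delegates to Get, assuming Get returns an error that `trace.IsNotFound` from `github.com/gravitational/trace` recognizes; this is illustrative, not the PR's actual code:

```go
// Exists reports whether a resource of the given kind and name exists,
// reusing Get instead of duplicating the tctl invocation logic.
func (tctl Tctl) Exists(ctx context.Context, kind, name string) (bool, error) {
	_, err := tctl.Get(ctx, kind, name)
	switch {
	case err == nil:
		return true, nil
	case trace.IsNotFound(err):
		return false, nil
	default:
		return false, trace.Wrap(err)
	}
}
```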

kubernetes/main.go (resolved)
@marcoandredinis (Contributor Author)

@tigrato Thank you for the review 🙏

@marcoandredinis force-pushed the marco/k8s-controller branch 2 times, most recently from 4dceca6 to 8e9cdaf on June 8, 2022 11:21
To create an Operator we are going to use the Operator SDK to ease the process.

This first commit creates the boilerplate necessary to set up the Operator.

It was created using the following commands:

```
mkdir kubernetes && \
    cd kubernetes && \
    operator-sdk init --domain teleport.dev --repo github.com/gravitational/teleport-plugins/kubernetes && \
    kubebuilder edit --multigroup=true && \
    operator-sdk create api --group resources --version v5 --kind Role --resource --controller && \
    operator-sdk create api --group resources --version v2 --kind User --resource --controller && \
    go get k8s.io/[email protected] && \
    go get -u ./... && \
    go mod tidy && \
    make test
```
This commit adds the Teleport User/Role Spec to the CRDs.

We are not communicating with the Teleport cluster yet, but we can already manage Users and Roles using `kubectl apply -f` or `kubectl delete` (a sketch of such a manifest follows below).

We had to include a custom CRD generator because the default one requires every field within our Specs to have a comment, which we don't currently have.

Files changed: user/role_types.go and the crdgen folder.
Everything else was auto-generated.
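
A hypothetical manifest of the kind this enables, assuming the group and version from the scaffolding commands above (resources.teleport.dev, Role v5); the spec fields mirror Teleport's own Role spec, and the exact fields may differ:

```yaml
# kubectl apply -f role.yaml
apiVersion: resources.teleport.dev/v5
kind: Role
metadata:
  name: developer
spec:
  allow:
    logins: ["ubuntu"]
    node_labels:
      env: ["dev"]
```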
This commit adds a reconciler for the Role and User resources.
Every time we receive an event for one of those resources, we must propagate that
change to the Teleport cluster.

We have two types of events: update/create and delete.
For creating/updating, we check if the resource exists in Teleport and
upsert the entire resource based on the current state presented on the
k8s side.
For users, we must make two calls: GetUser and then either CreateUser or UpdateUser.
For roles, it's simpler, as we have the UpsertRole call available.

For deleting, the recommendation is to use finalizers.
Finalizers allow us to map an external resource to a k8s resource.
So, when we create or update a resource, we add our own finalizer to the k8s
resource's list of finalizers.
When we receive a delete event, we check if the finalizer we added is
present; if so, we delete the resource in Teleport and remove the finalizer.
We no-op if the finalizer is not there.
K8s deletes the resource when the list of finalizers is empty.
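
A minimal sketch of that flow using controller-runtime; `RoleReconciler` is assumed to embed `client.Client`, and the `teleport` field, finalizer name, generated-package import path, and `ToTeleport()` helper are illustrative, not the PR's actual code:

```go
package controllers

import (
	"context"

	"github.com/gravitational/teleport/api/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	// Generated by the operator-sdk scaffolding; path is illustrative.
	resourcesv5 "github.com/gravitational/teleport-plugins/kubernetes/apis/resources/v5"
)

const teleportFinalizer = "resources.teleport.dev/delete"

// teleportRoles is the subset of the Teleport API client used here.
type teleportRoles interface {
	UpsertRole(ctx context.Context, role types.Role) error
	DeleteRole(ctx context.Context, name string) error
}

type RoleReconciler struct {
	client.Client
	teleport teleportRoles
}

func (r *RoleReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var role resourcesv5.Role
	if err := r.Get(ctx, req.NamespacedName, &role); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	if !role.ObjectMeta.DeletionTimestamp.IsZero() {
		// Delete event: remove the Teleport resource first, then drop our
		// finalizer; k8s deletes the object once its finalizer list is empty.
		if controllerutil.ContainsFinalizer(&role, teleportFinalizer) {
			if err := r.teleport.DeleteRole(ctx, role.Name); err != nil {
				return ctrl.Result{}, err
			}
			controllerutil.RemoveFinalizer(&role, teleportFinalizer)
			return ctrl.Result{}, r.Update(ctx, &role)
		}
		return ctrl.Result{}, nil // not our finalizer: no-op
	}

	// Create/update event: register the finalizer, then upsert the entire
	// resource based on the current state on the k8s side.
	if !controllerutil.ContainsFinalizer(&role, teleportFinalizer) {
		controllerutil.AddFinalizer(&role, teleportFinalizer)
		if err := r.Update(ctx, &role); err != nil {
			return ctrl.Result{}, err
		}
	}
	return ctrl.Result{}, r.teleport.UpsertRole(ctx, role.ToTeleport())
}
```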

Each reconciler must have a Teleport API client.
So, how do we get the credentials needed to use the API?
We must have access to two things:
- the `tctl` binary, so that we can create a dedicated role and user for our
interactions
- the teleport container's `/var/lib/teleport`, so that we can use
`tctl` with full access

For the former: bringing in the whole teleport project is hard, since there were a lot of
conflicts because of the replace directives (and whatnot), so we ended up
taking the `tctl` binary from the official release image.

For the latter: we created a volume within the pod, so that both containers can
access it and use it as usual (sketched below).
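
A sketch of that shared volume, assuming an `emptyDir` mounted at the same path in both containers; the names are illustrative:

```yaml
spec:
  volumes:
    - name: teleport-data
      emptyDir: {}
  containers:
    - name: teleport
      volumeMounts:
        - name: teleport-data
          mountPath: /var/lib/teleport
    - name: operator     # runs tctl against the shared data directory
      volumeMounts:
        - name: teleport-data
          mountPath: /var/lib/teleport
```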
@marcoandredinis force-pushed the marco/k8s-controller branch 2 times, most recently from 223d6d0 to ae054cf on June 9, 2022 13:26
@marcoandredinis (Contributor Author)

@Joerger @tigrato
We decided to move the operator itself to the main repo.
I just opened the PR there: gravitational/teleport#13331

Let's continue the review there.
It uses the same approach, but some things were changed.

@marcoandredinis deleted the marco/k8s-controller branch on July 14, 2022 07:14