This repository has been archived by the owner on Jun 4, 2024. It is now read-only.

Kubernetes: Teleport Operator - roles and users #531

Closed
wants to merge 8 commits

Conversation

@marcoandredinis (Contributor) commented Apr 29, 2022

This PR introduces the Teleport Operator.

We've tried to limit the number of features we release in this first PR; for now, it only includes User and Role management.

To ease the review, here's the main takeaway from each commit:

  • the first one contains the project bootstrap / scaffolding
  • the second one contains the Role and User definitions, including the CRD auto-generation
  • the third one contains the actual controller implementation and the reconciliation loop

The project's README contains a guide on how to test the operator locally.

There's at least one missing part: uploading the docker container to quay.io.
I'll complete that at a later stage.
If you don't feel like reviewing the PR twice, please don't do it now, as I'll probably add that work within this first PR (to have the feature end-to-end).

@marcoandredinis force-pushed the marco/k8s-controller branch 4 times, most recently from 4c3534c to dd80708 on May 3, 2022 09:19
@marcoandredinis force-pushed the marco/k8s-controller branch 26 times, most recently from 6e93ba5 to 110ea88 on May 30, 2022 09:54
@marcoandredinis force-pushed the marco/k8s-controller branch 7 times, most recently from fc09be8 to 6c15c2c on June 3, 2022 17:07
@r0mant requested a review from tigrato on June 3, 2022 19:40
@marcoandredinis force-pushed the marco/k8s-controller branch 4 times, most recently from ba27658 to 2058855 on June 6, 2022 10:31
kubernetes/README.md (Outdated)

Meanwhile (or right after) we need to install the CRDs into the cluster.

You can do so by issuing `make kubernetes-install`.
Contributor

Move this step into the Helm chart installation.
Helm supports a `crds` folder whose contents are installed before anything else, so there is no need for kustomize.

https://helm.sh/docs/chart_best_practices/custom_resource_definitions/
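
For reference, a sketch of the chart layout the linked best practices imply (file names here are illustrative): Helm 3 installs any manifests under a top-level `crds/` directory before rendering templates, but it never upgrades or deletes them on `helm upgrade`, which is the upgrade concern raised below.

```
teleport-operator/
├── Chart.yaml
├── crds/                    # plain manifests, installed before templates
│   ├── roles.yaml           # CustomResourceDefinition for Roles
│   └── users.yaml           # CustomResourceDefinition for Users
└── templates/
    └── deployment.yaml      # the operator itself
```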

Contributor Author

I have mixed feelings about this one.

I can see three approaches here:

  • keep the current manual step (`make kubernetes-install`, via kustomize)
  • ship the CRDs in the Helm chart's `crds` folder
  • embed the CRDs in the operator and have it install them at boot

I think we both agree that the first one is the worst, so let's focus on the other two options.

Adding the CRDs to the chart has the following advantage:

  • it's the recommended tool for the job (as you linked)

However, I think the disadvantages outweigh it:

  • we will need to create a PR in teleport every time the types.pb changes
  • currently, there's no way to upgrade the CRD specs

Embedding the CRD files and using the teleport-operator to install them will give us better control over upgrades of the spec (sketched below).

I can see us going with any of the three options for now, but I think we will end up with the third one.
My concern with going for a full-fledged implementation of option 3 is that it would increase the diff size of this PR.
My suggestion is to keep it as it is now, but open a PR right after this one is merged to implement option 3.

What do you think?
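
A minimal sketch of the embedding approach, assuming the generated CRD manifests are compiled into the operator binary with `go:embed`; the `crds/` path and function names are illustrative, not the PR's actual code:

```go
package main

import (
	"context"
	"embed"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/rest"
	"sigs.k8s.io/yaml"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	apiextensionsclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
)

//go:embed crds/*.yaml
var crdFS embed.FS

// installCRDs applies every embedded CRD at operator boot: it creates the
// CRD if it is absent, or updates it to the version this binary ships with.
// This is what keeps the CRDs in lockstep with the operator version.
func installCRDs(ctx context.Context, cfg *rest.Config) error {
	clientset, err := apiextensionsclient.NewForConfig(cfg)
	if err != nil {
		return err
	}
	crds := clientset.ApiextensionsV1().CustomResourceDefinitions()

	entries, err := crdFS.ReadDir("crds")
	if err != nil {
		return err
	}
	for _, entry := range entries {
		raw, err := crdFS.ReadFile("crds/" + entry.Name())
		if err != nil {
			return err
		}
		var crd apiextensionsv1.CustomResourceDefinition
		if err := yaml.Unmarshal(raw, &crd); err != nil {
			return err
		}
		existing, err := crds.Get(ctx, crd.Name, metav1.GetOptions{})
		switch {
		case apierrors.IsNotFound(err):
			_, err = crds.Create(ctx, &crd, metav1.CreateOptions{})
		case err == nil:
			// Updates must carry the current resourceVersion.
			crd.ResourceVersion = existing.ResourceVersion
			_, err = crds.Update(ctx, &crd, metav1.UpdateOptions{})
		}
		if err != nil {
			return err
		}
	}
	return nil
}
```

Because the manifests are embedded, upgrading the operator image upgrades the CRDs with it, which is the upgrade-control property described above.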

Contributor

I am ok with options 2 & 3.

For me, option 2 is the only approach if we want to automate user/role creation via helm: to create the custom resources during the installation process, the CRDs must be installed first, and helm does that automatically.

It isn't possible to have the same behavior with option 3, because it requires the operator to already be running and to have installed the CRDs.

There is also an issue with option 2 that might happen if the user mixes a new helm chart release containing a different CRD with an old operator version. This doesn't occur with option 3, since the deployed CRD is always valid for the operator version, and the user cannot mix a new CRD with an old operator just by changing the image tag.

If automatic creation of roles and users via Helm is not something we want, I am ok with using option 3.

@r0mant, do you have any thoughts about this?

Contributor Author

Agreed, thank you 🙏

I created a version that relies on the operator to install the CRDs at boot.
If you want to play with it, you just need the following:

  • checkout teleport at marco/plugins-teleport-operator-charts
  • minikube start
  • install the chart:

```
helm install teleport-cluster <teleport repo location>/examples/chart/teleport-cluster \
    --create-namespace --namespace teleport-cluster \
    --set clusterName=teleport-cluster.teleport-cluster.svc.cluster.local \
    --set operator=true --set teleportVersionOverride=9.3.2
```

Now let's wait for the deployment to finish:
`kubectl wait --for=condition=available deployment/teleport-cluster --timeout=2m`
Contributor

If the wait is necessary, use helm's `--wait` flag.

Contributor Author (@marcoandredinis, Jun 6, 2022)

I just tried adding the --wait flag to the helm command above, but it got stuck and eventually timed out.
This happens even though the services are up and running.
Maybe there's some conflict between the wait and install flags? 🤷‍♂️

After adding the debug flag, I can see it's waiting for

ready.go:258: [debug] Service does not have load balancer ingress IP address: teleport-cluster/teleport-cluster

(A LoadBalancer service on a bare minikube cluster never receives an ingress IP unless something like `minikube tunnel` is running, so `--wait` blocks until it times out.)

lib/tctl/tctl.go (Outdated)

```go
@@ -130,6 +131,26 @@ func (tctl Tctl) Get(ctx context.Context, kind, name string) (types.Resource, error) {
	return resources[0], nil
}

// Exists validates a resource existence by its kind and name identifiers.
func (tctl Tctl) Exists(ctx context.Context, kind, name string) (bool, error) {
```
Contributor

Can this reuse the Get/GetAll functions instead of duplicating code?

Contributor Author

The GetAll and Get methods don't work as expected.
They try to parse the output (as yaml) returned by the tctl binary, but it has an invalid format (or version, I'm not sure).

I didn't try to fix it, as that seemed out of scope for this PR (which is already quite large).

Contributor Author

We are now using the Get method.
The resource parser was missing a version: roles v5.
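
A minimal sketch of what Exists can look like once it delegates to Get, assuming Get returns an error that `trace.IsNotFound` from `github.com/gravitational/trace` recognizes; this is illustrative, not the PR's actual code:

```go
// Exists reports whether a resource of the given kind and name exists,
// reusing Get instead of duplicating the tctl invocation logic.
func (tctl Tctl) Exists(ctx context.Context, kind, name string) (bool, error) {
	_, err := tctl.Get(ctx, kind, name)
	switch {
	case err == nil:
		return true, nil
	case trace.IsNotFound(err):
		return false, nil
	default:
		return false, trace.Wrap(err)
	}
}
```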

kubernetes/main.go (resolved)
@marcoandredinis (Contributor Author)

@tigrato Thank you for the review 🙏

@marcoandredinis force-pushed the marco/k8s-controller branch 2 times, most recently from 4dceca6 to 8e9cdaf on June 8, 2022 11:21
To create an Operator we are going to use the Operator SDK to ease the process.

This first commit creates the boilerplate necessary to set up the Operator.

It was created using the following commands:

```
mkdir kubernetes && \
    cd kubernetes && \
    operator-sdk init --domain teleport.dev --repo github.com/gravitational/teleport-plugins/kubernetes && \
    kubebuilder edit --multigroup=true && \
    operator-sdk create api --group resources --version v5 --kind Role --resource --controller && \
    operator-sdk create api --group resources --version v2 --kind User --resource --controller && \
    go get k8s.io/[email protected] && \
    go get -u ./... && \
    go mod tidy && \
    make test
```
This commit adds the Teleport User/Role Spec to the CRDs.

We are not communicating with the Teleport cluster yet, but we can already manage Users and Roles using `kubectl apply -f` or `kubectl delete` (a sketch of such a manifest follows below).

We had to include a custom CRD generator because the default one requires every field within our Specs to have a comment, which we don't currently have.

Files changed: user/role_types.go and the crdgen folder.
Everything else was auto-generated.
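
A hypothetical manifest of the kind this enables, assuming the group and version from the scaffolding commands above (resources.teleport.dev, Role v5); the spec fields mirror Teleport's own Role spec, and the exact fields may differ:

```yaml
# kubectl apply -f role.yaml
apiVersion: resources.teleport.dev/v5
kind: Role
metadata:
  name: developer
spec:
  allow:
    logins: ["ubuntu"]
    node_labels:
      env: ["dev"]
```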
This commit adds a reconciler for the Role and User resources.
Every time we receive an event for one of those resources, we must propagate that
change to the Teleport cluster.

We have two types of events: update/create and delete.
For creating/updating, we check if the resource exists in Teleport and
upsert the entire resource based on the current state presented on the
k8s side.
For users, we must make two calls: GetUser and then either CreateUser or UpdateUser.
For roles, it's simpler, as we have the UpsertRole call available.

For deleting, the recommendation is to use finalizers.
Finalizers allow us to map an external resource to a k8s resource.
So, when we create or update a resource, we add our own finalizer to the k8s
resource's list of finalizers.
When we receive a delete event, we check if the finalizer we added is
present; if so, we delete the resource in Teleport and remove the finalizer.
We no-op if the finalizer is not there.
K8s deletes the resource when the list of finalizers is empty.
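
A minimal sketch of that flow using controller-runtime; `RoleReconciler` is assumed to embed `client.Client`, and the `teleport` field, finalizer name, generated-package import path, and `ToTeleport()` helper are illustrative, not the PR's actual code:

```go
package controllers

import (
	"context"

	"github.com/gravitational/teleport/api/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	// Generated by the operator-sdk scaffolding; path is illustrative.
	resourcesv5 "github.com/gravitational/teleport-plugins/kubernetes/apis/resources/v5"
)

const teleportFinalizer = "resources.teleport.dev/delete"

// teleportRoles is the subset of the Teleport API client used here.
type teleportRoles interface {
	UpsertRole(ctx context.Context, role types.Role) error
	DeleteRole(ctx context.Context, name string) error
}

type RoleReconciler struct {
	client.Client
	teleport teleportRoles
}

func (r *RoleReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var role resourcesv5.Role
	if err := r.Get(ctx, req.NamespacedName, &role); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	if !role.ObjectMeta.DeletionTimestamp.IsZero() {
		// Delete event: remove the Teleport resource first, then drop our
		// finalizer; k8s deletes the object once its finalizer list is empty.
		if controllerutil.ContainsFinalizer(&role, teleportFinalizer) {
			if err := r.teleport.DeleteRole(ctx, role.Name); err != nil {
				return ctrl.Result{}, err
			}
			controllerutil.RemoveFinalizer(&role, teleportFinalizer)
			return ctrl.Result{}, r.Update(ctx, &role)
		}
		return ctrl.Result{}, nil // not our finalizer: no-op
	}

	// Create/update event: register the finalizer, then upsert the entire
	// resource based on the current state on the k8s side.
	if !controllerutil.ContainsFinalizer(&role, teleportFinalizer) {
		controllerutil.AddFinalizer(&role, teleportFinalizer)
		if err := r.Update(ctx, &role); err != nil {
			return ctrl.Result{}, err
		}
	}
	return ctrl.Result{}, r.teleport.UpsertRole(ctx, role.ToTeleport())
}
```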

Each reconciler must have a Teleport API client.
So, how do we get the credentials needed to use the API?
We must have access to two things:
- the `tctl` binary, so that we can create a dedicated role and user for our
interactions
- the teleport container's `/var/lib/teleport`, so that we can use
`tctl` with full access

For the former: bringing in the whole teleport project is hard, since there were a lot of
conflicts because of the replace directives (and whatnot), so we ended up
taking the `tctl` binary from the official release image.

For the latter: we created a volume within the pod, so that both containers can
access it and use it as usual (sketched below).
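
A sketch of that shared volume, assuming an `emptyDir` mounted at the same path in both containers; the names are illustrative:

```yaml
spec:
  volumes:
    - name: teleport-data
      emptyDir: {}
  containers:
    - name: teleport
      volumeMounts:
        - name: teleport-data
          mountPath: /var/lib/teleport
    - name: operator     # runs tctl against the shared data directory
      volumeMounts:
        - name: teleport-data
          mountPath: /var/lib/teleport
```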
@marcoandredinis force-pushed the marco/k8s-controller branch 2 times, most recently from 223d6d0 to ae054cf on June 9, 2022 13:26
@marcoandredinis (Contributor Author)

@Joerger @tigrato
We decided to move the operator itself to the main repo.
I just opened the PR there: gravitational/teleport#13331

Let's continue the review there.
It uses the same approach, but some things were changed.

@marcoandredinis deleted the marco/k8s-controller branch on July 14, 2022 07:14