
[MIMO] Move cluster certificate functionality to ClientHelper #3736

Merged
merged 5 commits into master from hawkowl/cleanups-cluster-tls on Sep 5, 2024

Conversation

@hawkowl (Collaborator) commented Jul 30, 2024

Which issue this PR addresses:

Split out of the MIMO M1 work

What this PR does / why we need it:

Retrofits clienthelper support into some functions that MIMO tasks require

Test plan for issue:

CI, E2E

Is there any documentation that needs to be updated for this PR?

N/A

How do you know this will function as expected in production?

E2E should cover it, since it applies during install as well

@hawkowl hawkowl changed the title from "move over TLS applying, as well as some clienthelper work" to "[MIMO] Move cluster certificate functionality to ClientHelper" on Jul 30, 2024
@hawkowl hawkowl force-pushed the hawkowl/cleanups-cluster-tls branch from 1895b7b to 4c797bc on July 30, 2024 06:39
@hawkowl (Collaborator, Author) commented Jul 30, 2024

/azp run ci, e2e

Azure Pipelines successfully started running 2 pipeline(s).

@mociarain (Collaborator) left a comment

A few things:

  • ClientHelper.New is never used. I think we can kill it without trouble.
  • Can we collapse the New stuff here so we have private constructors and a public GetClient that builds or returns a client as needed? Basically, should this be a singleton? (See the sketch after this list.)
  • A general ask to add some more descriptive naming. Can we say what kind of client it is? I think k8sClient, but I'm not sure. Maybe a better name for clientHelper is AROK8sClient, i.e. our wrapped version of the k8s client.
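A minimal sketch of that GetClient/singleton idea, assuming the helper is built lazily from whatever inputs the private constructor holds (all names here are hypothetical, not from this PR):

import (
    "sync"

    "k8s.io/client-go/rest"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

type clientHelper struct {
    restConfig *rest.Config // hypothetical input held by the private constructor

    once sync.Once
    c    client.Client
    err  error
}

// GetClient builds the underlying controller-runtime client on first use and
// returns the same instance (or construction error) on every subsequent call.
func (h *clientHelper) GetClient() (client.Client, error) {
    h.once.Do(func() {
        h.c, h.err = client.New(h.restConfig, client.Options{})
    })
    return h.c, h.err
}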

return err
}

// initializeKubernetesClients initializes clients which are used
// once the cluster is up later on in the install process.
func (m *manager) initializeOperatorDeployer(ctx context.Context) (err error) {
-	m.aroOperatorDeployer, err = deploy.New(m.log, m.env, m.doc.OpenShiftCluster, m.arocli, m.client, m.extensionscli, m.kubernetescli, m.operatorcli)
+	m.aroOperatorDeployer, err = deploy.New(m.log, m.env, m.doc.OpenShiftCluster, m.arocli, m.ch.Client(), m.extensionscli, m.kubernetescli, m.operatorcli)
Contributor

Wrapping the client into the helper only to ask the helper to give the client right back is awkward. Either the client helper should close over the client, or we should hold on to both values, IMO. The in-between seems like an incorrect abstraction that closes over nothing.

I agree with that. My question: is there a good reason to do it here that we are not aware of?

I would simply make the client public instead of providing a getter, as this indirect access doesn't bring much value (especially when we have multiple things called "Client": methods, struct fields, etc.) IMHO.

Collaborator Author

I thought about it a bit, and I've made it close over the whole client instead, since that was pretty easy. This means that in the places where we currently use a controller-runtime client, we can upgrade to the clienthelper fairly easily.
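A hedged sketch of what "closing over the whole client" can look like: the helper embeds the controller-runtime client, so it satisfies client.Client itself, and call sites that want a raw client can take the helper directly (names assumed, not copied from the diff):

import (
    "context"

    kruntime "k8s.io/apimachinery/pkg/runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// clientHelper embeds client.Client, so Get/List/Create/Update/Patch/Delete
// come for free and no Client() getter round-trip is needed.
type clientHelper struct {
    client.Client
}

func NewWithClient(c client.Client) *clientHelper {
    return &clientHelper{Client: c}
}

// Ensure is the helper's own behavior layered on top of the embedded client.
func (ch *clientHelper) Ensure(ctx context.Context, objs ...kruntime.Object) error {
    // ... create-or-update each object using the embedded client ...
    return nil
}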

@AldoFusterTurpin left a comment

Thank you @hawkowl for this, I like that! 🙂
I just left some comments in case any of them make sense to you; none are blockers.

pkg/cluster/apply.go (outdated)
utilpem "github.com/Azure/ARO-RP/pkg/util/pem"
)

func EnsureTLSSecretFromKeyvault(ctx context.Context, env env.Interface, ch clienthelper.Interface, target types.NamespacedName, certificateName string) error {

In the same direction as my previous comment, and talking about the ch clienthelper.Interface parameter: there is something I have seen multiple times in ARO-RP that I would consider a bad pattern. We tend to pass big interfaces where just one of the methods is used, something pretty common in our beloved friend Java, but easily avoided in Go.

In those cases, I think it would be a good idea to define an interface with just the method this function needs, and update the signature of EnsureTLSSecretFromKeyvault to use that new one-method interface. That is one of the beauties of implicit interfaces in Go: we can define the interface IN THE CONSUMER package, not the implementer's, and still fulfill this function's requirement. A sketch follows.
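A minimal sketch of that suggestion, with a hypothetical one-method interface defined in the consumer package (secretEnsurer is an invented name; the Ensure signature is assumed from how ch is used later in this function):

import (
    "context"

    kruntime "k8s.io/apimachinery/pkg/runtime"
)

// secretEnsurer is the narrow, consumer-defined view of clienthelper:
// only the single method this function actually calls.
type secretEnsurer interface {
    Ensure(ctx context.Context, objs ...kruntime.Object) error
}

// The concrete clienthelper still satisfies secretEnsurer implicitly, so
// callers do not change, but this function no longer depends on the big
// clienthelper.Interface.
func EnsureTLSSecretFromKeyvault(ctx context.Context, ch secretEnsurer /* , ... */) error {
    var secret kruntime.Object
    // ... build the TLS secret from the keyvault certificate ...
    return ch.Ensure(ctx, secret)
}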

Collaborator Author

I've altered the code to have Reader and Writer interfaces, similar to the controller-runtime client. This should let us define things a bit more nicely if we want to restrict the interface. I'm not too eager to narrow the interfaces down further than this, though.

I think that making single-method interfaces (especially defining them in consumers) is overly restrictive and can cause a lot of work if any of those interfaces need to change in the future. Every time I've gone to use that pattern, I've later wished I hadn't. More granular interfaces from the provider can mitigate some of this duplication, but I'm also not sure how often we can get away with having super-granular interfaces. A Reader/Writer split should hopefully do for now.

I also don't think it makes code easier to test when we have verified fakes (e.g. the controller-runtime client). It does make things easier when we're writing mocks for interfaces, since the mocks (and what the code might call) are therefore smaller, but I firmly believe that mocks are an anti-pattern, and that even big interfaces like env.Interface would be a lot more tolerable if they were designed with a proper fake in parallel.
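For context, a sketch of the Reader/Writer shape being described, mirroring the split in controller-runtime's client package (the method sets are assumed from this thread, not copied from the final diff):

import (
    "context"

    kruntime "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/runtime/schema"
    "k8s.io/apimachinery/pkg/types"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// Reader is the read-only view, delegating to controller-runtime.
type Reader interface {
    client.Reader
}

// Writer is the mutating view, plus the helper's own convenience methods.
type Writer interface {
    client.Writer

    Ensure(ctx context.Context, objs ...kruntime.Object) error
    EnsureDeleted(ctx context.Context, gvk schema.GroupVersionKind, key types.NamespacedName) error
}

// Interface composes both; narrower call sites can accept just Reader or Writer.
type Interface interface {
    Reader
    Writer
}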


var cb []byte
for _, cert := range certs {
    cb = append(cb, pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: cert.Raw})...)

Super opinionated and optional: could we please move the expression pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: cert.Raw}) into an auxiliary variable, for readability?
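A sketch of that refactor, continuing the snippet above (the encoded name is illustrative):

var cb []byte
for _, cert := range certs {
    // name the encoded PEM block before appending, for readability
    encoded := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: cert.Raw})
    cb = append(cb, encoded...)
}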

Type: corev1.SecretTypeTLS,
}

return ch.Ensure(ctx, secret)

As previously mentioned, since we only use the Ensure method of ch here, maybe we could define an interface with just this method, to stick to the Interface Segregation Principle.

@hawkowl hawkowl force-pushed the hawkowl/cleanups-cluster-tls branch from 4c797bc to 5aefdaf on August 5, 2024 03:56
@hawkowl (Collaborator, Author) commented Aug 5, 2024

/azp run ci, e2e

Azure Pipelines successfully started running 2 pipeline(s).

    EnsureDeleted(ctx context.Context, gvk schema.GroupVersionKind, key types.NamespacedName) error

type Writer interface {
    client.Writer

    // Ensure applies self-contained objects to a Kubernetes API, merging
Contributor

I know you're not changing this in the PR, so this is not a blocker here, but we should use server-side apply instead of whatever client-side logic exists, as it's guaranteed to be a better implementation :)

Collaborator Author

I would love to use server-side apply, but the fake implementation of controller-runtime doesn't yet support it :(

Refs:

Contributor

No comment, just waving at @hawkowl 👋🏽

Contributor

Yes, the controller-runtime fakes are horrendous and IMO moving away from them as soon as possible is also a great boon to the project :)

SSA is very complex, and faking it is likely not going to happen, not just in some soon time-frame but perhaps ever. Client fakes have fallen out of favor, and everyone has a horror story of working around the mess that is envtest without gaining any confidence in their application functioning in the real world against a real API server. Even past that, the concerns Alvaro has in Troy's PR are a mountain of work that the upstream SIGs are not interested in taking up or sponsoring, from my understanding.

This is neither here nor there, but you might want to consider not making SSA support in the controller-runtime fakes a precondition to using this (amazingly useful and beneficial) technology in your production deployments.

See an example of how to test controllers and the like using static data in, static data out here: https://github.com/openshift/hypershift/blob/main/control-plane-pki-operator/certificaterevocationcontroller/certificaterevocationcontroller_test.go

Contributor

The issue, beyond the requisite work, is the apply types that are not scaffolded by crd/operator-sdk/kubebuilder. That work is a side effect of controller-runtime not implementing apply, whereas client-go has the necessary logic.

The interface client being used here is the mechanism that prevents using it. I have no context around what this is trying to accomplish, but if it is indeed using corev1 types, where a kubernetes.Interface can be used, those applyConfigurations are available to fulfill server-side apply.

The fake clients are backed by a "tracker", and trying to track a server-side apply in any client would mean holding state for things outside the client's responsibility. Having a test suite where that can happen seems outside the scope of what the client/fakes could do.
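For illustration, a minimal sketch of server-side apply via client-go apply configurations for a corev1 Secret; the function name and field manager here are hypothetical, not part of this PR:

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    applycorev1 "k8s.io/client-go/applyconfigurations/core/v1"
    "k8s.io/client-go/kubernetes"
)

func applyTLSSecret(ctx context.Context, cs kubernetes.Interface, ns, name string, certPEM, keyPEM []byte) error {
    // the intent is encoded as static data: an apply configuration
    secret := applycorev1.Secret(name, ns).
        WithType(corev1.SecretTypeTLS).
        WithData(map[string][]byte{
            corev1.TLSCertKey:       certPEM,
            corev1.TLSPrivateKeyKey: keyPEM,
        })

    // the server merges this with existing state, tracked per field manager
    _, err := cs.CoreV1().Secrets(ns).Apply(ctx, secret, metav1.ApplyOptions{
        FieldManager: "aro-rp", // hypothetical field-manager name
        Force:        true,
    })
    return err
}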

Contributor

I have worked with the controller-runtime ecosystem and upstream Kubernetes client-sets for many years, so I'm pretty familiar with the structure. I don't think I communicated the "static data in, static data out" link well: my suggestion is not that some client helper hold on to apply configurations, nor is it to somehow teach the controller-runtime client how to apply.

In case we get too far into the weeds: I just want to reiterate that the comments here are not review comments for this PR, just general commentary. I'm not suggesting any changes be made right now.


What I am suggesting, though, is that operator reconciliation loops are written as pure functions: either take some set of cluster state as input, or allow dependency injection to provide producers of cluster state, and produce intended state as output. Apply configurations have a wonderful property in that they encode your intent in static data, so testing your lower-case-r reconcile() is straightforward, and no mocks of any persuasion are necessary.

The top-level Reconcile() might look like:

func (r *reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // assemble the data necessary by using listers, etc
    input := r.getInput()

    // transform input state into intended state
    intent, err := r.reconcile(input)
    if err != nil {
        return ctrl.Result{}, err
    }

    // realize intended state onto the cluster
    return r.execute(intent)
}

Where r.execute(intent) takes apply configurations and uses a kubernetes.Clientset to call Apply.
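Under that structure, the pure part can be unit-tested with static data and no fakes at all; a self-contained sketch with invented types:

import (
    "testing"

    "github.com/google/go-cmp/cmp"
)

// hypothetical pure business logic: static state in, static intent out
type input struct{ Replicas int }
type intent struct{ DesiredReplicas int }

func reconcile(in input) (intent, error) {
    return intent{DesiredReplicas: in.Replicas}, nil
}

func TestReconcile(t *testing.T) {
    got, err := reconcile(input{Replicas: 3})
    if err != nil {
        t.Fatal(err)
    }
    // intent is plain data, so it diffs cleanly against an expected value
    if diff := cmp.Diff(intent{DesiredReplicas: 3}, got); diff != "" {
        t.Errorf("unexpected intent (-want +got):\n%s", diff)
    }
}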


In my experience, relying heavily on mocks leads many teams to stumble into deeply frustrating issues, often many years down the line. Examples I've seen:

  • implementing highly-performant caching using metadata-only watches requires 100% of tests to be rewritten, as the fakes don't know how to propagate data between the metadata fake and the "real" fake
  • simplifying controller code to reduce maintenance burden and eliminate incorrect client-side patching by using server-side apply means all unit tests fail
  • fragile assumptions about the number of times your reconciliation is called and when, in what order, etc, lead integration tests to be a poor substitute for end-to-end testing against actual API servers, leading to test duplication
  • fakes make many highly surprising decisions, taking valuable mental overhead and productivity away from the team, like:
    • faking out field selectors (usually handled in the server) with indices, which requires test code to be written to create indices
    • objects with deletion timestamps but no finalizers cause some internal panic even though it's totally valid for your controller to see one in production
    • fake clients need subresources registered manually

Teams that spend time writing unit tests for unit-level concerns and do not reach for mocks, in my experience, move faster and get a higher ROI on their tests. End-to-end tests and chaos tests validate production modalities that integration tests attempt to, but cannot.

@AldoFusterTurpin commented Aug 7, 2024

@stevekuznetsov I completely agree with that. 🥇

Even though I am by far the least experienced person in this conversation about this domain, I have seen similar problems in other domains, which made your comment resonate a lot with me as I read it.

Sometimes we focus too much on trying to reproduce the state of the world where our app will run, and we forget that we should primarily test what our app does and what our business logic is. It is almost always much simpler, and easier to understand, to keep our business logic in pure functions that expect input data and produce output data. We can then forget about how that data is gathered and what will be done with the result, and simply ensure that domainLogicFn(x) == expectedOutput.

This is a common concern that applies to a lot of different domains, but it is especially common in k8s, because most of the time the business logic gets mixed up or lost in the effort to understand/simulate the "state" where all that business logic will happen (due to the stateful nature of k8s controllers). So I just want to reaffirm that avoiding complex mocks with pure functions, when possible, is a great idea, and not just in this context 🙂

@hawkowl hawkowl force-pushed the hawkowl/cleanups-cluster-tls branch from 9906f6d to 4126dbc on August 14, 2024 03:06
@hawkowl (Collaborator, Author) commented Aug 14, 2024

/azp run ci, e2e

Azure Pipelines successfully started running 2 pipeline(s).

@github-actions github-actions bot added the needs-rebase branch needs a rebase label Aug 27, 2024
Please rebase pull request.

@github-actions github-actions bot removed the needs-rebase branch needs a rebase label Sep 3, 2024
@hawkowl (Collaborator, Author) commented Sep 3, 2024

/azp run ci, e2e

Azure Pipelines successfully started running 2 pipeline(s).

@hawkowl (Collaborator, Author) commented Sep 3, 2024

/azp run e2e

Azure Pipelines successfully started running 1 pipeline(s).

@hawkowl (Collaborator, Author) commented Sep 5, 2024

/azp run ci, e2e

Azure Pipelines successfully started running 2 pipeline(s).

@jaitaiwan (Contributor) left a comment

LGTM!

@jaitaiwan jaitaiwan merged commit 44bc3cc into master Sep 5, 2024
24 checks passed
gouthamMN pushed a commit that referenced this pull request Sep 10, 2024
@hawkowl hawkowl deleted the hawkowl/cleanups-cluster-tls branch October 7, 2024 01:17