Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(reconciler): refactors Reconciler to simplify handling dependent resources #185

Merged

Conversation

bartoszmajsak
Copy link
Contributor

@bartoszmajsak bartoszmajsak commented Apr 2, 2024

Description

This improvement is built on top of Reconciler interface to provide a unified approach for handling dependent resources. For this purpose, from the original Reconciler interface, SubResourceReconciler interface has been derived.

type Reconciler interface {
	// Reconcile ensures the resource related to given InferenceService is in the desired state.
	Reconcile(ctx context.Context, log logr.Logger, isvc *kservev1beta1.InferenceService) error
}

type SubResourceReconciler interface {
	Reconciler
	// Delete removes subresource owned by InferenceService.
	Delete(ctx context.Context, log logr.Logger, isvc *kservev1beta1.InferenceService) error
	// Cleanup ensures singleton resource (such as ServiceMonitor) is removed
	// when there is no InferenceServices left in the namespace.
	Cleanup(ctx context.Context, log logr.Logger, isvcNs string) error
}

It includes the Reconcile function from the parent and in addition provides two additional contracts for handling dependent resources:

Cleanup

This is expected to be called when there is no relevant ISVC in the namespace. This allows handling "singleton" resources used for all ISVCs across the namespace, e.g. PodMonitor or PeerAuthentication.

Delete

Allows to remove resources no longer needed when associated ISVC is deleted. This is currently only used by AuthConfig and Route reconcilers. For the latter, it is needed as the route is in another namespace, therefore we cannot use ISVC as the owner reference.

Other considerations

Alternatively, we could make a distinction between sub-resource being dependent on the ISVC instance (e.g. auth config) and one-per namespace (pod monitor) and split SubResourceReconciler to two for separating concerns.

Enhancements

Compile-time interface check

All reconcilers now include the "compile-time assertion of interface implementation" idiom to ensure consistency across the reconcilers (and actually to leverage the idea of interface contracts).

var _ Reconciler = (*ModelMeshInferenceServiceReconciler)(nil)

Traits

To simplify and reduce code duplication there are two default implementations (called traits) - NoResourceRemoval and SingleResourcePerNamespace which can be included in a concrete SubResourceReconciler.

// NoResourceRemoval is a trait to indicate that given reconciler
// is not supposed to delete any resources left.
type NoResourceRemoval struct{}

func (r *NoResourceRemoval) Delete(_ context.Context, _ logr.Logger, _ *kservev1beta1.InferenceService) error {
	// NOOP
	return nil
}

func (r *NoResourceRemoval) Cleanup(_ context.Context, _ logr.Logger, _ string) error {
	// NOOP
	return nil
}

// SingleResourcePerNamespace is a trait to indicate that given reconciler is only supposed to
// clean up owned resources when there is no relevant ISVC left.
type SingleResourcePerNamespace struct{}

func (r *SingleResourcePerNamespace) Delete(_ context.Context, _ logr.Logger, _ *kservev1beta1.InferenceService) error {
	// NOOP it needs to be cleaned up when no ISVCs left in the Namespace
	return nil
}

No schema passing

When constructing a manager the defined scheme instance is passed to the manager's client. Therefore calling manager.GetClient().Scheme(), or in fact client.Scheme() on injected one, makes passing scheme to reconcilers redundant.

This change simplifies the code by removing explicit coupling with the scheme instance.

Reconcilers in the slice

With this approach, there is no need for direct calls in main reconcilers (i.e. KServe and ModelMesh) to handle reconcile/delete/cleanup logic of related resources. The only requirement now is to add an instance of the subresource reconciler to the slice, the rest is handled uniformly.

I also process all sub-reconcilers now instead of failing fast on the first error, following the eventual consistency approach.

Other minor changes

  • reconciler.go becomes types.go as it holds more than one interface now.
  • introduced go-multierror library which is also found in odh-operator to accumulate errors before returning.

Out-of-scope considerations

There are a few things I noted, but felt like a scope creep to work on them as part of this PR (it's already big and I wanted to refactor, so preserve the behavior and only change the internal structure):

  • we could think about enriching ctx with the log instead of passing it down the call chain. logf allows to easily attach and extract log from ctx. We already use it in webhook

How Has This Been Tested?

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

This way we can stick to the contract introduced by `Reconciler`
interface and keep the code simpler.

Additionally adds an idiom to ensure that each type conforms to the
interface definition at compile time.

Signed-off-by: bartoszmajsak <[email protected]>
@bartoszmajsak
Copy link
Contributor Author

This can also help with #183 (comment)

This improvement is built on top of `Reconciler` interface to provide a unified approach for handling dependent resources. For this purpose, from the original `Reconciler` interface, `SubResourceReconciler` interface has been derived.

It includes the `Reconcile` function from the parent and in addition provides two additional contracts for handling dependent resources:

>  `Cleanup`

This is expected to be called when there is no relevant ISVC in the namespace. This allows handling "singleton" resources used for all ISVCs across the namespace, e.g. `PodMonitor` or `PeerAuthentication`.

> `Delete`

Allows to remove resources no longer needed when associated ISVC is deleted. This is currently only used by AuthConfig and Route reconcilers. For the latter, it is needed as the route is in another namespace, therefore we cannot use ISVC as the owner reference.

> Enhancements

- All reconcilers now include the "compile-time assertion of interface implementation" idiom to ensure consistency across the reconcilers (and to actually leverage the idea of interface contracts).
- To simplify and reduce code duplication there are two default implementations (called traits) - `NoResourceRemoval` and `SingleResourcePerNamespace` which can be included in a concrete `SubResourceReconciler`.

Signed-off-by: bartoszmajsak <[email protected]>
As it now holds more than a single interface

Signed-off-by: bartoszmajsak <[email protected]>
When constructing a manager the defined scheme instance is passed to
manager's client. Therefore calling manager.GetClient().Scheme(), or in
fact client.Scheme() on injected one, makes passing scheme to
reconcilers redundant.

This simplifies the code by removing explicit coupling with the
scheme instance.

Signed-off-by: bartoszmajsak <[email protected]>
With this approach there is no need for extract code in main reconcilers
(i.e. KServe and ModelMesh) to handle reconcile/delete/cleanup logic of
related resources. The only requirement now is to add instance of the
subresource reconciler to the slice, the rest is handled uniformly.

It also process all subreconcilers now instead of failing fast on the
first occured error, following the eventual consistency approach.

This is handled by using go-multierror library which is also found in odh-operator.

Signed-off-by: bartoszmajsak <[email protected]>
@bartoszmajsak bartoszmajsak marked this pull request as ready for review April 3, 2024 12:50
@bartoszmajsak bartoszmajsak requested a review from Jooho April 3, 2024 12:51
@bartoszmajsak bartoszmajsak changed the title chore(reconciler): splits Raw and Serverless reconcilers chore(reconciler): refactors Reconciler interface to simplify handling dependent resources Apr 3, 2024
@bartoszmajsak bartoszmajsak changed the title chore(reconciler): refactors Reconciler interface to simplify handling dependent resources chore(reconciler): refactors Reconciler to simplify handling dependent resources Apr 3, 2024
@bartoszmajsak
Copy link
Contributor Author

Removing a need for passing scheme (1ab78df) can be a separate PR if you want to keep the scope more focused.

@bartoszmajsak bartoszmajsak requested review from aslakknutsen and removed request for Jooho and terrytangyuan April 3, 2024 13:04
Copy link
Contributor

@VedantMahabaleshwarkar VedantMahabaleshwarkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea a lot. Thanks for the PR @bartoszmajsak. I was wondering if it'd be a good idea to document the implementation expectation for new subresource reconcilers? The explanation you've provided in the PR is excellent, maybe we can add it in a more permanent location (docs/DEVELOPERS_GUIDE.MD perhaps?) Might help the repo stay consistent in the future. WDYT

@bartoszmajsak
Copy link
Contributor Author

I like this idea a lot. Thanks for the PR @bartoszmajsak. I was wondering if it'd be a good idea to document the implementation expectation for new subresource reconcilers? The explanation you've provided in the PR is excellent, maybe we can add it in a more permanent location (docs/DEVELOPERS_GUIDE.MD perhaps?) Might help the repo stay consistent in the future. WDYT

Oh totally! Excellent suggestion.

That's more or less the reason I went one step further with PR description :)

Once we get this PR finalized, and I expect some changes based on feedback, I will follow up with refined dev docs.

return err
}
return nil
}

func (r *KserveAuthConfigReconciler) Delete(ctx context.Context, log logr.Logger, isvc *kservev1beta1.InferenceService) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need explicit Delete function implementation for authorinov1beta2.AuthConfig type resource. As I understood, it already connected with owner reference of InferenceService?

Copy link
Contributor Author

@bartoszmajsak bartoszmajsak Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I thought about it too. It's been like that before and I wonder if we need it. We have an internal store of processed templates, that's why there's this hook here, though it's not really about Deleting stuff in the cluster. @aslakknutsen can this be simplified somehow?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The store is just a wrapper around Client and has no logic per say. We could leave clean up to Kubernetes GC as suggested.

Copy link
Contributor Author

@bartoszmajsak bartoszmajsak Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok then it would only leave route reconciler left with Delete, as it's a special case as route sits in istio ns. I will clean it up.

That's this piece in main:

func (r *KserveRouteReconciler) DeleteRoute(ctx context.Context, isvc *kservev1beta1.InferenceService) error {
return r.routeHandler.DeleteRoute(ctx, types.NamespacedName{Name: getKServeRouteName(isvc), Namespace: constants2.IstioNamespace})
}

controllers/reconcilers/kserve_authconfig_reconciler.go Outdated Show resolved Hide resolved
"k8s.io/apimachinery/pkg/types"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
)

var _ SubResourceReconciler = (*KserveAuthConfigReconciler)(nil)

type KserveAuthConfigReconciler struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KserveAuthConfigReconciler should implement NoResourceRemoval interface as it have owner relationship with inference resource.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

controllers/reconcilers/mm_inferenceservice_reconciler.go Outdated Show resolved Hide resolved
Copy link
Member

@vaibhavjainwiz vaibhavjainwiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes looks good to me. I didn't get a chance to test these changes by running a inference on cluster. Could you please verify it once.

@openshift-ci openshift-ci bot added the approved label Apr 4, 2024
Copy link
Member

@spolti spolti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice @bartoszmajsak

Copy link
Contributor

@VedantMahabaleshwarkar VedantMahabaleshwarkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link
Contributor

openshift-ci bot commented Apr 5, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bartoszmajsak, spolti, vaibhavjainwiz, VedantMahabaleshwarkar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [VedantMahabaleshwarkar,spolti,vaibhavjainwiz]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 8a2f788 into opendatahub-io:main Apr 5, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants