Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(tracker): allows to define ownerReferences for FeatureTrackers #1228

Merged

Conversation

bartoszmajsak
Copy link
Contributor

@bartoszmajsak bartoszmajsak commented Sep 10, 2024

Description

Introduces the ability to define owner reference for FeatureTracker (internal custom resource). This enables garbage collection of resources belonging to the given Feature when its owner such as DSC or DSCI has been removed.

When Features are grouped through means of FeatureHandlers, the ownership is defined under the hood, linking it to DSCI or DSC, depending on the type used.

In addition, this PR simplifies how Feature relies on client.Client instance. Instead of creating its own instance, it expects to have one passed as part of the builder call chain. It creates a default one otherwise. With that we can rely on a client configured for controllers with the right schemas, caching rules, etc. As a consequence the change of retry logic for status needed to be adjusted. Besides a conflict, it needs to be triggered on the NotFound error, as a shared client's cache might not have yet an instance of FeatureTracker created just before the condition is reported. The reason why it was working before is the fact that Feature has its own simple client instance w/o caching.

Important

With DSCI and DSC becoming owners of particular FeatureTrackers, func (c *Component) Cleanup defined for ComponentInterface becomes somewhat speculative. It's been used in the finalizer block for respective controllers to ensure that FTs have been removed, but it also calls any other custom cleanup code attached to a Feature.

The only use case where the custom function is invoked at this moment is unpatching/removing custom authorization provider added to SMCP. This was based on an early assumption that we might want to plug into customers' existing SMCP instance. It's unclear to me if this is still long-term thinking so we might want to revisit the need for this functionality.

From the completeness point of view, if we allow to create/manipulate resources programmatically as part of the Feature DSL, we should be able to register cleanup counter-part functions, especially if we cannot simply remove them (as this is handled through ownership of associated FeatureTracker).

There is, however, a need to perform cleanup when the particular component is "Removed" (not managed anymore). Right now this is handled inside its Reconcile function.

Fixes RHOAIENG-8629

Relates to #1192 (comment)

How Has This Been Tested?

make test

but also:

  • create/remove DSCI to validate if FeatureTrackers are removed
  • create/remove DSC with Kserve

Screenshot or short clip

Merge criteria

  • You have read the contributors guide.
  • Commit messages are meaningful - have a clear and concise summary and detailed explanation of what was changed and why.
  • Pull Request contains a description of the solution, a link to the JIRA issue, and to any dependent or related Pull Request.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

Introduces ability to define owner reference for FeatureTracker (internal
custom resource). This enables garbage collection of resources belonging
to the given Feature when its owner such as DSC or DSCI has been removed.

When Features are grouped through means of `FeatureHandler`s, the
ownership is defined under the hood, linking it to DSCI or DSC,
depending on the type used.

In addition, this PR simplifies how Feature relies on `client.Client`
instance. Instead of creating its own instance it expects to have one
passed as part of the builder call chain. It creates a default one otherwise.
With that we can rely on a client configured for controllers with the
right schemas, caching rules etc. As a consequence the change of retry
logic for status needed to adjusted. Besides a conflict it needs to be
triggered on NotFound error, as shared client's cache might not have yet
an instance of FeatureTracker created just before the condition is
reported. The reason why it was working before is the fact that Feature
has its own simple client instance w/o caching.

> [!IMPORTANT]
>
> With DSCI and DSC becoming owners of particular FeatureTrackers
> `func (c *Component) Cleanup` called in finalizer block becomes
> somewhat speculative. The only use case where custom function is
> invoked is unpatching SMCP authorization provider. This was based
> on early assumption that we might want to plug into customer's
> existing SMCP instance. It's unclear to me if this is still long-term
> thinking so we might want to revisit need for this function.
>
> From the completness point of view, if we allow to create/manipulate
> resources programatically as part of the Feature DSL, we should be able
> to register cleanup counter-part functions, especially if we cannot simply
> remove them (as this is handled through ownership of associated
> FeatureTracker).
>
> There is, however, a need to perform cleanup when particular
> component is "Removed" (not managed anymore). Right now this is
> handled inside its Reconcile function.

Fixes [RHOAIENG-8629](https://issues.redhat.com/browse/RHOAIENG-8629)

Relates to opendatahub-io#1192 (comment)

Co-authored-by: Bartosz Majsak <[email protected]>
@lburgazzoli
Copy link
Contributor

I don't have the full context but the PR looks sensible to me.

One minor question would be if, instead of storing the client and eventually creating a new one as part of the Feature struct , we should pass it to the Apply and Cleanup function as, with my limited understanding, a client would always be available at that point in time.

@bartoszmajsak
Copy link
Contributor Author

bartoszmajsak commented Sep 11, 2024

I don't have the full context but the PR looks sensible to me.

One minor question would be if, instead of storing the client and eventually creating a new one as part of the Feature struct , we should pass it to the Apply and Cleanup function as, with my limited understanding, a client would always be available at that point in time.

@lburgazzoli We could pass it along, and at the surface, this will require changes to Apply and Cleanup. We will need to adjust some internals too, but not too much.

There's a type Action func(ctx context.Context, f *Feature) error which allows to define closures to be used in the Feature as preconditions (e.g. checking if given dependency exists before proceeding - a glorified if in a sense) or programmatic resource creation. In these functions, we rely on f.Client and access data associated with a given Feature instance. Making the client a part of the function signature would make it explicit/obvious it can be used and presumably more idiomatic as it will now have ctx, client as the first two arguments. It will also remove the need of exporting client for not good enough reason. From that angle, decoupling the client, it makes sense to me. I am wondering if there is any other argument in favor.

But with all that said, I would prefer to make it a separate PR.

@lburgazzoli
Copy link
Contributor

But with all that said, I would prefer to make it a separate PR.

That would be perfectly fine with me.

I let others to chime in for an actual review as I don't have the full context.

@bartoszmajsak
Copy link
Contributor Author

@lburgazzoli In the same line of thinking we should pass the logger along, so the origins of Apply/Cleanup are more accurate. Also another PR. Can do both.

@ykaliuta
Copy link
Contributor

I'm wondering how that cache works. For my understanding of caching in general, cache miss causes look up to the main storage and reporting absence only after that.

return cli.Update(ctx, smcp)
// Update the ServiceMeshControlPlane with the removed extension provider.
// As it could have been updated by another controller in the meantime, we need to retry on conflict.
return retry.RetryOnConflict(retry.DefaultBackoff, func() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an additional fix, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, discovered during testing of this change. We can extract it to a tiny PR but it's a bit overdoing in my eyes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mostly wanted to confirm that I did not myself miss the point from the patch :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS: and it deserve to be separated just because it is not part of the PR's change, not because of the size
/me hides :)))

})
dsci.APIVersion = dsciv1.GroupVersion.String()
dsci.Kind = gvk.DSCInitialization.Kind
return dsci, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not get why you need to return dsci if you modifying it inplace.

Copy link
Contributor Author

@bartoszmajsak bartoszmajsak Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me neither :D I think we missed it in the first run with @cam-garrison.

Fixed in 5d25dc8

Thanks!

Spec: dsciv1.DSCInitializationSpec{
}

_, errCreate := controllerutil.CreateOrUpdate(ctx, cli, dsci, func() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why it's needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To have proper UID created by kube-apiserver, as DSCI is now used as owner-ref.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I'm not big fan of mixing allocating and k8s creation in New* function (even if it causes less code changes), but it's up to your conventions :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, but that's a different question :) We can split it up.

@bartoszmajsak
Copy link
Contributor Author

bartoszmajsak commented Sep 13, 2024

I'm wondering how that cache works. For my understanding of caching in general, cache miss causes look up to the main storage and reporting absence only after that.

The cache in controller-runtime is a bit of a weird beast. IIUC this is how it works under the hood:

  • Reads: The controller retrieves data from the cache.
  • Writes: When the controller writes, it communicates directly with the API server, bypassing the cache. The cache then updates asynchronously.

This means the controller might read stale data, leading to operations based on an outdated state. This behavior is precisely why the retry logic needed adjustment, to address potential inconsistencies between the cached data and the actual state in the API server.

Copy link

openshift-ci bot commented Sep 13, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ykaliuta

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 10dd554 into opendatahub-io:incubation Sep 13, 2024
8 checks passed
Copy link
Member

@zdtsw zdtsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

bartoszmajsak added a commit to bartoszmajsak/opendatahub-operator that referenced this pull request Sep 13, 2024
Including the client in the function signature makes its usage
explicit and more idiomatic. This approach clarifies that the
client is intended for use within the function. Decoupling the
client from Feature struct  enhances code readability.

It's a follow-up to PR
[opendatahub-io#1228](opendatahub-io#1228 (comment)).
bartoszmajsak added a commit to bartoszmajsak/opendatahub-operator that referenced this pull request Sep 13, 2024
Including the client in the function signature makes its usage
explicit and more idiomatic. This approach clarifies that the
client is intended for use within the function. Decoupling the
client from Feature struct  enhances code readability.

It's a follow-up to PR
[opendatahub-io#1228](opendatahub-io#1228 (comment)).
@bartoszmajsak bartoszmajsak deleted the feature-owner-refs branch September 13, 2024 17:38
bartoszmajsak added a commit to bartoszmajsak/opendatahub-operator that referenced this pull request Sep 13, 2024
Ensures each tests uses different DSCI instance as a test fixture.
Without this tests are failing (as they do in latest incubation commit),
as opendatahub-io#1222 introduced validation check to ensure AppNamespace is
immutable, but opendatahub-io#1228 is trying to update DSCI violating new constraint.

With this change each test uses dedicated DSCI.
bartoszmajsak added a commit to bartoszmajsak/opendatahub-operator that referenced this pull request Sep 13, 2024
Ensures each tests uses different DSCI instance as a test fixture.
Without this tests are failing (as they do in latest incubation commit),
as opendatahub-io#1222 introduced validation check to ensure AppNamespace is
immutable, but opendatahub-io#1228 is trying to update DSCI violating new constraint.

With this change each test uses dedicated DSCI.
bartoszmajsak added a commit to bartoszmajsak/opendatahub-operator that referenced this pull request Sep 13, 2024
Ensures each tests uses different DSCI instance as a test fixture.
Without this tests are failing (as they do in latest incubation commit),
as opendatahub-io#1222 introduced validation check to ensure AppNamespace is
immutable, but opendatahub-io#1228 is trying to update DSCI violating new constraint.

With this change each test uses dedicated DSCI.
openshift-merge-bot bot pushed a commit that referenced this pull request Sep 14, 2024
Ensures each tests uses different DSCI instance as a test fixture.
Without this tests are failing (as they do in latest incubation commit),
as #1222 introduced validation check to ensure AppNamespace is
immutable, but #1228 is trying to update DSCI violating new constraint.

With this change each test uses dedicated DSCI.
openshift-merge-bot bot pushed a commit that referenced this pull request Sep 16, 2024
Including the client in the function signature makes its usage
explicit and more idiomatic. This approach clarifies that the
client is intended for use within the function. Decoupling the
client from Feature struct  enhances code readability.

It's a follow-up to PR
[#1228](#1228 (comment)).
VaishnaviHire pushed a commit to VaishnaviHire/opendatahub-operator that referenced this pull request Sep 16, 2024
…pendatahub-io#1228)

* feat(tracker): feature tracker can have ownerReferences

Introduces ability to define owner reference for FeatureTracker (internal
custom resource). This enables garbage collection of resources belonging
to the given Feature when its owner such as DSC or DSCI has been removed.

When Features are grouped through means of `FeatureHandler`s, the
ownership is defined under the hood, linking it to DSCI or DSC,
depending on the type used.

In addition, this PR simplifies how Feature relies on `client.Client`
instance. Instead of creating its own instance it expects to have one
passed as part of the builder call chain. It creates a default one otherwise.
With that we can rely on a client configured for controllers with the
right schemas, caching rules etc. As a consequence the change of retry
logic for status needed to adjusted. Besides a conflict it needs to be
triggered on NotFound error, as shared client's cache might not have yet
an instance of FeatureTracker created just before the condition is
reported. The reason why it was working before is the fact that Feature
has its own simple client instance w/o caching.

> [!IMPORTANT]
>
> With DSCI and DSC becoming owners of particular FeatureTrackers
> `func (c *Component) Cleanup` called in finalizer block becomes
> somewhat speculative. The only use case where custom function is
> invoked is unpatching SMCP authorization provider. This was based
> on early assumption that we might want to plug into customer's
> existing SMCP instance. It's unclear to me if this is still long-term
> thinking so we might want to revisit need for this function.
>
> From the completness point of view, if we allow to create/manipulate
> resources programatically as part of the Feature DSL, we should be able
> to register cleanup counter-part functions, especially if we cannot simply
> remove them (as this is handled through ownership of associated
> FeatureTracker).
>
> There is, however, a need to perform cleanup when particular
> component is "Removed" (not managed anymore). Right now this is
> handled inside its Reconcile function.

Fixes [RHOAIENG-8629](https://issues.redhat.com/browse/RHOAIENG-8629)

Relates to opendatahub-io#1192 (comment)

Co-authored-by: Bartosz Majsak <[email protected]>

* fix: no need to return dsci

---------

Co-authored-by: Cameron Garrison <[email protected]>
(cherry picked from commit 10dd554)
VaishnaviHire pushed a commit to VaishnaviHire/opendatahub-operator that referenced this pull request Sep 16, 2024
…1235)

Ensures each tests uses different DSCI instance as a test fixture.
Without this tests are failing (as they do in latest incubation commit),
as opendatahub-io#1222 introduced validation check to ensure AppNamespace is
immutable, but opendatahub-io#1228 is trying to update DSCI violating new constraint.

With this change each test uses dedicated DSCI.

(cherry picked from commit df1add0)
VaishnaviHire pushed a commit to VaishnaviHire/opendatahub-operator that referenced this pull request Sep 16, 2024
Including the client in the function signature makes its usage
explicit and more idiomatic. This approach clarifies that the
client is intended for use within the function. Decoupling the
client from Feature struct  enhances code readability.

It's a follow-up to PR
[opendatahub-io#1228](opendatahub-io#1228 (comment)).

(cherry picked from commit fde5d69)
VaishnaviHire pushed a commit to VaishnaviHire/opendatahub-operator that referenced this pull request Sep 16, 2024
…pendatahub-io#1228)

* feat(tracker): feature tracker can have ownerReferences

Introduces ability to define owner reference for FeatureTracker (internal
custom resource). This enables garbage collection of resources belonging
to the given Feature when its owner such as DSC or DSCI has been removed.

When Features are grouped through means of `FeatureHandler`s, the
ownership is defined under the hood, linking it to DSCI or DSC,
depending on the type used.

In addition, this PR simplifies how Feature relies on `client.Client`
instance. Instead of creating its own instance it expects to have one
passed as part of the builder call chain. It creates a default one otherwise.
With that we can rely on a client configured for controllers with the
right schemas, caching rules etc. As a consequence the change of retry
logic for status needed to adjusted. Besides a conflict it needs to be
triggered on NotFound error, as shared client's cache might not have yet
an instance of FeatureTracker created just before the condition is
reported. The reason why it was working before is the fact that Feature
has its own simple client instance w/o caching.

> [!IMPORTANT]
>
> With DSCI and DSC becoming owners of particular FeatureTrackers
> `func (c *Component) Cleanup` called in finalizer block becomes
> somewhat speculative. The only use case where custom function is
> invoked is unpatching SMCP authorization provider. This was based
> on early assumption that we might want to plug into customer's
> existing SMCP instance. It's unclear to me if this is still long-term
> thinking so we might want to revisit need for this function.
>
> From the completness point of view, if we allow to create/manipulate
> resources programatically as part of the Feature DSL, we should be able
> to register cleanup counter-part functions, especially if we cannot simply
> remove them (as this is handled through ownership of associated
> FeatureTracker).
>
> There is, however, a need to perform cleanup when particular
> component is "Removed" (not managed anymore). Right now this is
> handled inside its Reconcile function.

Fixes [RHOAIENG-8629](https://issues.redhat.com/browse/RHOAIENG-8629)

Relates to opendatahub-io#1192 (comment)

Co-authored-by: Bartosz Majsak <[email protected]>

* fix: no need to return dsci

---------

Co-authored-by: Cameron Garrison <[email protected]>
(cherry picked from commit 10dd554)
VaishnaviHire pushed a commit to VaishnaviHire/opendatahub-operator that referenced this pull request Sep 16, 2024
…1235)

Ensures each tests uses different DSCI instance as a test fixture.
Without this tests are failing (as they do in latest incubation commit),
as opendatahub-io#1222 introduced validation check to ensure AppNamespace is
immutable, but opendatahub-io#1228 is trying to update DSCI violating new constraint.

With this change each test uses dedicated DSCI.

(cherry picked from commit df1add0)
VaishnaviHire pushed a commit to VaishnaviHire/opendatahub-operator that referenced this pull request Sep 16, 2024
Including the client in the function signature makes its usage
explicit and more idiomatic. This approach clarifies that the
client is intended for use within the function. Decoupling the
client from Feature struct  enhances code readability.

It's a follow-up to PR
[opendatahub-io#1228](opendatahub-io#1228 (comment)).

(cherry picked from commit fde5d69)
openshift-merge-bot bot pushed a commit to red-hat-data-services/rhods-operator that referenced this pull request Sep 18, 2024
…pendatahub-io#1228)

* feat(tracker): feature tracker can have ownerReferences

Introduces ability to define owner reference for FeatureTracker (internal
custom resource). This enables garbage collection of resources belonging
to the given Feature when its owner such as DSC or DSCI has been removed.

When Features are grouped through means of `FeatureHandler`s, the
ownership is defined under the hood, linking it to DSCI or DSC,
depending on the type used.

In addition, this PR simplifies how Feature relies on `client.Client`
instance. Instead of creating its own instance it expects to have one
passed as part of the builder call chain. It creates a default one otherwise.
With that we can rely on a client configured for controllers with the
right schemas, caching rules etc. As a consequence the change of retry
logic for status needed to adjusted. Besides a conflict it needs to be
triggered on NotFound error, as shared client's cache might not have yet
an instance of FeatureTracker created just before the condition is
reported. The reason why it was working before is the fact that Feature
has its own simple client instance w/o caching.

> [!IMPORTANT]
>
> With DSCI and DSC becoming owners of particular FeatureTrackers
> `func (c *Component) Cleanup` called in finalizer block becomes
> somewhat speculative. The only use case where custom function is
> invoked is unpatching SMCP authorization provider. This was based
> on early assumption that we might want to plug into customer's
> existing SMCP instance. It's unclear to me if this is still long-term
> thinking so we might want to revisit need for this function.
>
> From the completness point of view, if we allow to create/manipulate
> resources programatically as part of the Feature DSL, we should be able
> to register cleanup counter-part functions, especially if we cannot simply
> remove them (as this is handled through ownership of associated
> FeatureTracker).
>
> There is, however, a need to perform cleanup when particular
> component is "Removed" (not managed anymore). Right now this is
> handled inside its Reconcile function.

Fixes [RHOAIENG-8629](https://issues.redhat.com/browse/RHOAIENG-8629)

Relates to opendatahub-io#1192 (comment)

Co-authored-by: Bartosz Majsak <[email protected]>

* fix: no need to return dsci

---------

Co-authored-by: Cameron Garrison <[email protected]>
(cherry picked from commit 10dd554)
openshift-merge-bot bot pushed a commit to red-hat-data-services/rhods-operator that referenced this pull request Sep 18, 2024
…1235)

Ensures each tests uses different DSCI instance as a test fixture.
Without this tests are failing (as they do in latest incubation commit),
as opendatahub-io#1222 introduced validation check to ensure AppNamespace is
immutable, but opendatahub-io#1228 is trying to update DSCI violating new constraint.

With this change each test uses dedicated DSCI.

(cherry picked from commit df1add0)
openshift-merge-bot bot pushed a commit to red-hat-data-services/rhods-operator that referenced this pull request Sep 18, 2024
Including the client in the function signature makes its usage
explicit and more idiomatic. This approach clarifies that the
client is intended for use within the function. Decoupling the
client from Feature struct  enhances code readability.

It's a follow-up to PR
[opendatahub-io#1228](opendatahub-io#1228 (comment)).

(cherry picked from commit fde5d69)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

5 participants