Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ossm): adds OSSM annotations to the relevant cluster resources #1088

Merged
merged 20 commits into from
Oct 4, 2023

Conversation

cam-garrison
Copy link

@cam-garrison cam-garrison commented Apr 4, 2023

Closes: #1087

Description

This change takes care of populating the following annotations to resources created through ODH Dashboard

  • opendatahub.io/service-mesh can be set to true or false and will be used to adjust ODH components and make them part of the Service Mesh. This is applied on both Project and Notebook resources
  • The dashboard will now link to notebooks through the mesh if feature flag disableServiceMesh=false

In addition, the code hides service-mesh-specific changes to annotations behind disableServiceMesh flag which is added in opendatahub-io/opendatahub-operator#217

How Has This Been Tested?

We have tested this manually to ensure that when disableServiceMesh=false the resources mentioned above contain the corresponding annotations. Also have tested manually to ensure that when disableServiceMesh=false notebook links send the user through the mesh ingress-gateway, and when disableServiceMesh=true the user is sent through the OpenShift route as before.

The cluster adjustments to enable Service Mesh with ODH are still in progress (PR in manifests here) .

One visible change even without the Service Mesh deployed is that when disableServiceMesh=false the oauth-proxy sidecar is not injected anymore. This is intended.

Until this PR is merged and a new bundle is released you can use my build of the ODH Operator that includes the disableServiceMesh flag as part of the ODHDashboardConfig. Do so by running operator-sdk run bundle quay.io/cgarriso/opendatahub-operator-bundle:dev-0.0.2 --namespace $OPERATOR_NAMESPACE (req operator-sdk v1.24.1 )

Either build and push this dashboard image or use quay.io/maistra-dev/odh-dashboard:ossm_annotations image.

Test Impact

If you can offer us some hints on how to test feature flags and resource creation through Jest tests we are happy to extend this PR with it.

Request review criteria:

  • The commits have meaningful messages (squashes happen on merge by the bot).
  • Included any necessary screenshots or gifs if it was a UI change.
  • Included tags to the UX team if it was a UI/UX change.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work
  • The developer has added tests or explained why testing cannot be added (unit tests & storybook for related changes)

@openshift-ci openshift-ci bot added the needs-ok-to-test The openshift bot needs to label PRs from non members to avoid strain on the CI label Apr 4, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 4, 2023

Hi @cam-garrison. Thanks for your PR.

I'm waiting for a opendatahub-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@andrewballantyne
Copy link
Member

/ok-to-test

@openshift-ci openshift-ci bot added ok-to-test The openshift bot needs `ok-to-test` to allow non member PRs to run the tests. and removed needs-ok-to-test The openshift bot needs to label PRs from non members to avoid strain on the CI labels Apr 5, 2023
@cam-garrison
Copy link
Author

I also have this recorded demo showing the changes which may be useful for review.

Copy link
Contributor

@lucferbux lucferbux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm requesting changes, this could be solved just with some clarification about the code, I'm lacking of the context about how it will be handled by the controller, but so far I'm worried about what happens when you enable/disable the feature flag with existing notebooks, which is the most common scenario we'll get.

backend/src/routes/api/namespaces/namespaceUtils.ts Outdated Show resolved Hide resolved
backend/src/routes/api/notebooks/utils.ts Outdated Show resolved Hide resolved
@lucferbux
Copy link
Contributor

@bartoszmajsak @cam-garrison Hi! We've added new features that might conflict with the current PR, if you have any questions let me know and we can get through them.

@openshift-merge-robot openshift-merge-robot removed the needs-rebase PR needs to be rebased label May 16, 2023
cam-garrison and others added 5 commits June 28, 2023 07:33
This change takes care of populating the following annotations to resources created through ODH Dashboard

- `opendatahub.io/service-mesh` can be set to `true` or `false` and will be used to alternate ODH components and make them part of the Service Mesh. This is applied on both `Project` and `Notebook` resources
- `opendatahub.io/hub-url` is added to the `Notebook` resources at this point and is inteded for use with Authorino's `AuthConfig` host
- The dashboard will now link to notebooks through the mesh if feature flag `disableServiceMesh=false`

In addition, the code hides service-mesh-specific changes to annotations behind `disableServiceMesh` flag which is added in opendatahub-io/opendatahub-operator#217

We have tested this manually. If you can offer us some hints on how to test feature flags and resource creation through Jest tests we are happy to extend this PR with it.

Until this [PR](opendatahub-io/opendatahub-operator#217) is merged and a new bundle is released you can use my build of the ODH Operator that includes the `disableServiceMesh` flag as part of the `ODHDashboardConfig`.
Do so by running `operator-sdk run bundle quay.io/cgarriso/opendatahub-operator-bundle:dev-0.0.1 --namespace $OPERATOR_NAMESPACE` (req operator-sdk v1.24.1 )

Either build and push this dashboard image or use `quay.io/maistra-dev/odh-dashboard:ossm_annotations` image.

Co-authored-by: bartoszmajsak <[email protected]>
* use project annotation for ds project nb routing

* chore: passes host instead of the entire route

and removes hub-url annotation as it is not needed

* remove reference to hub-host

---------

Co-authored-by: bartoszmajsak <[email protected]>
* update existing annotation reference

* use ns annotation in backend route fetching
@cam-garrison
Copy link
Author

Hi @lucferbux,

This PR is now ready for your re-review.

Changes since you last reviewed:

  • Added a service-mesh annotation to the created data science projects. This annotation is utilized in our odh-project-controller to add service-mesh resources and annotations to data science projects).
  • Modified the notebook linking logic. Now, instead of fetching the ingress-gateway route from the istio-system namespace, we are fetching the host from an annotation created by the odh-project-controller. This change simplifies the logic when linking to notebooks and prevents permissions issues when trying to fetch a route from the istio-system namespace.
  • Regarding migration - notebooks that were initially created with disableServiceMesh=true and later had disableServiceMesh set to false will have their annotations (inject-oauth and service-mesh) "migrated" upon restart. The commit adding this change can be found here. Please note, this PR does not touch the data science project annotation; this task will be handled elsewhere (currently in our init-job in manifests we will switch all previously created dsp to be a part of the service mesh).

I believe these modifications, particularly concerning the migration of old notebooks' annotations, should address your prior concerns. Please take a look and don't hesitate to reach out if you have any questions!

@lucferbux
Copy link
Contributor

Hi @lucferbux,

This PR is now ready for your re-review.

Changes since you last reviewed:

  • Added a service-mesh annotation to the created data science projects. This annotation is utilized in our odh-project-controller to add service-mesh resources and annotations to data science projects).
  • Modified the notebook linking logic. Now, instead of fetching the ingress-gateway route from the istio-system namespace, we are fetching the host from an annotation created by the odh-project-controller. This change simplifies the logic when linking to notebooks and prevents permissions issues when trying to fetch a route from the istio-system namespace.
  • Regarding migration - notebooks that were initially created with disableServiceMesh=true and later had disableServiceMesh set to false will have their annotations (inject-oauth and service-mesh) "migrated" upon restart. The commit adding this change can be found here. Please note, this PR does not touch the data science project annotation; this task will be handled elsewhere (currently in our init-job in manifests we will switch all previously created dsp to be a part of the service mesh).

I believe these modifications, particularly concerning the migration of old notebooks' annotations, should address your prior concerns. Please take a look and don't hesitate to reach out if you have any questions!

Ok, I think I'll try to clear my morning tomorrow to give a deep look at the changes, thank you very much!

@lucferbux
Copy link
Contributor

@cam-garrison In the meantime, take a look at this warnings that you need to address before merging the PR.
Runing locally npm test should tell you the linting warnings the CI is raising.

@lucferbux
Copy link
Contributor

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold This PR is hold for some reason label Jul 7, 2023
@cam-garrison
Copy link
Author

Thx for the review! I think I've addressed all points now - left one followup question where I wasn't sure.

Re:

I've added #1838 as a tracker, this should go to incubation first, changed targeted branch to f/ossm

What does "moving to incubation" mean exactly? testing by QE? Will there be some publicly available build available? Just want to know what the incubation phase entails.

Thanks!

@cam-garrison
Copy link
Author

cam-garrison commented Oct 4, 2023

@lucferbux can we re-target this to incubation branch? - going to be adding one more commit as well with manifests updates.

@cam-garrison
Copy link
Author

Added manifests changes from #1900 into this PR:
See closed PR desc for full reasoning, TLDR: since v2 operator pulls manifests from dashboard, need to add requisite service mesh overlay.

@andrewballantyne
Copy link
Member

As I noted in Slack: https://github.com/opendatahub-io/odh-dashboard/blob/main/docs/process-definition/branches.md our branching strategy uses feature branches first. Going directly to incubation removes the layer of indirection that allows it to ever merge back to main & onto downstream products like RHODS.

Copy link
Contributor

@lucferbux lucferbux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just small nits, I'm not 100% sure about all the config files, i lack a little bit more context for that, and I haven't tested that yet, but so far code looks great.

Please squash before we can merge, thanks a lot for all the work @cam-garrison

enablePipelines?: boolean,
): Promise<NotebookKind> => {
const patches: Patch[] = [];
patches.push(startPatch);
const enableServiceMesh = featureFlagEnabled(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm talking about something like this: https://github.com/opendatahub-io/odh-dashboard/blob/main/frontend/src/pages/projects/projectSharing/utils.ts#L7 is not mandatory but it might be nice to have

@@ -0,0 +1,10 @@
# Each list is sorted alphabetically, additions should maintain that order
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we have this OWNERS files in the manifest, I don't understand

Copy link
Member

@andrewballantyne andrewballantyne Oct 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OWNERS files can be placed at lower levels to support "just these files" changing. So if something under the overlays/service-mesh folder is the only thing to change, they would need need the repo ownership to step in to review it. It's additive on

I think this is fine for this use-case... We should definitely avoid OWNERS files (with approve access) over code so that we can still approve the holistic nature of that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we have this OWNERS files in the manifest, I don't understand

Our line of thinking is that we offer help and knowledge to keep this thing up-to-date and evolving. So if anything will be proposed to change here we will be pinged

@@ -0,0 +1,4 @@
## Dashboard Minimal overlay
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the harm on having a minimal overlay, but maybe we would want to have this in other PR?

cc @andrewballantyne @christianvogt what do you think? Not sure if we should stick this, which is related to a deployment of our repo with this same PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a neat idea... I'm not against new overlays like this. @cam-garrison was this something you used in your testing/coding?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this can be removed. It is anyway not possible to combine overlays like it was with KfDef magic, so better to make it separate and shrink this PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we used this for testing/running in CRC specifically. +1 on bartosz's point though, can be removed especially since in v2 operator it is not as easy to apply multiple overlays.

Comment on lines +36 to +47
if (enableServiceMesh) {
host = await getServiceMeshGwHost(fastify, namespace).catch((e) => {
fastify.log.warn(`Failed getting route ${notebookName}: ${e.message}`);
return undefined;
});
} else {
const route = await getRoute(fastify, namespace, notebookName).catch((e) => {
fastify.log.warn(`Failed getting route ${notebookName}: ${e.message}`);
return undefined;
});
host = route?.spec.host;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we fall back to retrieving the host through the route?
By enabling the flag without service mesh, we'll get an invalid url.

Suggested change
if (enableServiceMesh) {
host = await getServiceMeshGwHost(fastify, namespace).catch((e) => {
fastify.log.warn(`Failed getting route ${notebookName}: ${e.message}`);
return undefined;
});
} else {
const route = await getRoute(fastify, namespace, notebookName).catch((e) => {
fastify.log.warn(`Failed getting route ${notebookName}: ${e.message}`);
return undefined;
});
host = route?.spec.host;
}
if (enableServiceMesh) {
host = await getServiceMeshGwHost(fastify, namespace).catch((e) => {
fastify.log.warn(`Failed getting service mesh route ${notebookName}: ${e.message}`);
return undefined;
});
}
if (!host) {
const route = await getRoute(fastify, namespace, notebookName).catch((e) => {
fastify.log.warn(`Failed getting route ${notebookName}: ${e.message}`);
return undefined;
});
host = route?.spec.host;
}

Comment on lines +593 to +595
export const featureFlagEnabled = (disabledSettingState?: boolean): boolean =>
disabledSettingState === false;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding the same comment as we have present in the frontend code.

Suggested change
export const featureFlagEnabled = (disabledSettingState?: boolean): boolean =>
disabledSettingState === false;
/**
* Feature flags are required in the config -- but upgrades can be mixed and omission of the property
* usually ends up being enabled. This will prevent that as a general utility.
*/
export const featureFlagEnabled = (disabledSettingState?: boolean): boolean =>
disabledSettingState === false;

Comment on lines +29 to +31
const getRoutePromise = !enableServiceMesh
? getRoute(notebookName, projectName).then((route) => route?.spec.host)
: getServiceMeshGwHost(projectName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should getRoute become a fallback instead of a conditional?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting idea... using the getRoute when getServiceMeshGwHost returns an invalid value?

);
// if not using service mesh fetch openshift route, otherwise get Istio Ingress Gateway route
const getRoutePromise = !enableServiceMesh
? getRoute(notebookName, projectName).then((route) => route?.spec.host)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
? getRoute(notebookName, projectName).then((route) => route?.spec.host)
? getRoute(notebookName, projectName).then((route) => route.spec.host)

if (cancelled) {
return;
}
setRoute(`https://${route.spec.host}/notebook/${projectName}/${notebookName}`);
setRoute(`https://${host}/notebook/${projectName}/${notebookName}/`);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getServiceMeshGwHost can return null creating an invalid url.

Comment on lines +245 to +247
getInjectOAuthPatch(enableServiceMesh),
getServiceMeshPatch(enableServiceMesh),
getProxyInjectPatch(enableServiceMesh),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These seem all very related, we may want to look at cleaning that up in the future to make it one "set of patches" rather than 3 separate ones. Just for code cleanness.

Comment on lines +29 to +31
const getRoutePromise = !enableServiceMesh
? getRoute(notebookName, projectName).then((route) => route?.spec.host)
: getServiceMeshGwHost(projectName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting idea... using the getRoute when getServiceMeshGwHost returns an invalid value?

frontend/src/utilities/notebookControllerUtils.ts Outdated Show resolved Hide resolved
@andrewballantyne andrewballantyne added do-not-merge/hold This PR is hold for some reason lgtm approved labels Oct 4, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 4, 2023

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cam-garrison cam-garrison merged commit e94a848 into opendatahub-io:f/ossm Oct 4, 2023
3 of 4 checks passed
@andrewballantyne
Copy link
Member

andrewballantyne commented Oct 4, 2023

For the record -- I asked Cameron to merge it with a squash to save the time working around the merge commits from main.

He had powers because of the org-management permissions -- I'll look to be removing this shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved lgtm ok-to-test The openshift bot needs `ok-to-test` to allow non member PRs to run the tests. pr/no-tests-allowed Added by an official approver - this PR is allowed no tests. Omitted, a test must accompany the PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request]: Support use of Service Mesh for routing
6 participants