Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manifest readiness for Operator v2 #237

Merged
merged 6 commits into from
Oct 19, 2023

Conversation

VedantMahabaleshwarkar
Copy link

@VedantMahabaleshwarkar VedantMahabaleshwarkar commented Oct 16, 2023

Description

  • structured manfests such that the new operator can fetch the manifests from config/.
  • made it easier to maintain sync with upstream by keeping all manifest overrides in overlays/odh so as to minimize merge conflicts while syncing with upstream.
  • separated out modelmesh manifests from odh-model-controller manifests to enable the opendatahub-operator fetching manifests from each individual repo

Testing

The testing for this PR is to be done along with the testing for this and this PR since they target the same set of overall changes.
The ODH operator image has been built with the following changes :

How Has This Been Tested?

ModelMesh testing

  • cluster login
  • git clone --branch transition [email protected]:VedantMahabaleshwarkar/opendatahub-operator.git
  • cd opendatahub-operator
  • make deploy -e IMG=quay.io/vedantm/opendatahub-operator:version_builtin
  • This will install the ODH operator in NS opendatahub-operator-system with the image quay.io/vedantm/opendatahub-operator:version_builtin
  • create the following DSC in NS opendatahub
cat <<EOF |oc create -f - 
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: example
  namespace: opendatahub
spec:
  components:
    modelmeshserving:
      managementState: Managed
EOF
  • deploy and test a model with modelmesh

Kserve testing

Note: Cleanup kserve CRDs before testing modelmesh
  • cluster login
  • install ODH Kserve + dependencies stack by following instructions
Note: Only install the dependencies, the ODH operator will be a custom install according to next steps
  • git clone [email protected]:VedantMahabaleshwarkar/opendatahub-operator.git
  • cd opendatahub-operator
  • switch to branch manifests_transition_mlserving
  • make deploy -e IMG=quay.io/vedantm/opendatahub-operator:version_builtin
  • This will install the ODH operator in NS opendatahub-operator-system with the image quay.io/vedantm/opendatahub-operator:version_builtin
  • create the following DSC in NS opendatahub
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default
spec:
  components:
    kserve:
      managementState: Managed

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

@openshift-ci
Copy link

openshift-ci bot commented Oct 16, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: VedantMahabaleshwarkar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [VedantMahabaleshwarkar]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Jooho
Copy link

Jooho commented Oct 16, 2023

@VedantMahabaleshwarkar do we need to add these two lines to deploy modelmesh?

  components:
    modelmeshserving:
      managementState: Managed
      devFlags:
        manifests:
          - manifestsFolder: config/overlays/odh
            uri: https://github.com/VedantMahabaleshwarkar/modelmesh-serving/tarball/manifests_overlay
          - manifestsFolder: config
            uri: https://github.com/VedantMahabaleshwarkar/modelmesh-serving/tarball/manifests_overlay

@VedantMahabaleshwarkar
Copy link
Author

Custom ODH operator image was built with the following settings in get_all_manifests.sh for the operator :

["odh-model-controller"]="VedantMahabaleshwarkar:odh-model-controller:manifests_transition_test:config:odh-model-controller"
["model-mesh"]="VedantMahabaleshwarkar:modelmesh-serving:manifests_overlay:config:model-mesh"

@israel-hdez
Copy link

Custom ODH operator image was built with the following settings in get_all_manifests.sh for the operator :

["odh-model-controller"]="VedantMahabaleshwarkar:odh-model-controller:manifests_transition_test:config:odh-model-controller"
["model-mesh"]="VedantMahabaleshwarkar:modelmesh-serving:manifests_overlay:config:model-mesh"

I echo Jooho's comment.
Isn't it possible to test using devFlags in the operator? Or is there a specific reason for using your build?

By using your build, I feel like I'm not testing this PR, but potentially something completely different.

@israel-hdez
Copy link

In general and when comparing with results from odh-manifests repo, I'm see a different set of metadata.labels. Is this expected?

Also, nodeAffinity was removed from the controller. Is this expected?

  • made it easier to maintain sync with upstream by keeping all manifest overrides in overlays/odh so as to minimize merge conflicts while syncing with upstream.

I'm not fully in favor of copying files. It will reduce conflicts, but it will also reduce awareness of customizations when syncing code. If we go forward with this way, please make sure all devs are aware that when doing code syncs upstream changes to config/ must be checked against our overrides.

@VedantMahabaleshwarkar
Copy link
Author

I echo Jooho's comment.
Isn't it possible to test using devFlags in the operator? Or is there a specific reason for using your build?
By using your build, I feel like I'm not testing this PR, but potentially something completely different.

@israel-hdez not possible to test without changes to the operator, which are in the operator PR. Discussed this with the operator team today as well and because of the nature of the changes, these cannot be tested with devFlags

@VedantMahabaleshwarkar
Copy link
Author

In general and when comparing with results from odh-manifests repo, I'm see a different set of metadata.labels. Is this expected?

Earlier, somehow the metadatalabeltransformer was only applied on some sections of the manifests and not others, not sure why that was. Now is is uniform and all the generated resources will have consistent labels. So yes the few extra labels you might see in a comparison are expected

@Jooho
Copy link

Jooho commented Oct 18, 2023

Modelmesh case

  • I can deploy odh-model-controller and modelmesh-controller
    odh-model-controller: quay.io/vedantm/odh-model-controller:remove-kserve-flag
    modelmesh-controller: quay.io/opendatahub/modelmesh-controller:fast
    ==> Could you please explain why quay.io/vedantm/odh-model-controller:remove-kserve-flag is used?
    ==> Do we need to update PR to change this image tag?
  • The replica of modelmesh-controller is just 1.
    • Have you differed between original manifests and new manifests?
  • sample model is deployed and works fine
  • when I changed the Managed to Removed, odh-model-controller was not uninstalled. @VaishnaviHire I think this would be operator issue. could you please take a look at this?

KServe case

  • I can deploy odh-model-controller and modelmesh-controller
    odh-model-controller: quay.io/vedantm/odh-model-controller:remove-kserve-flag
    kserve-controller: quay.io/opendatahub/kserve-controller:v0.11.0
    ==> The same question with modelmesh case "Could you please explain why quay.io/vedantm/odh-model-controller:remove-kserve-flag is used?"
    ==> Do we need to update PR to change this image tag?
  • sample model is deployed and works fine
  • Same issue - when I changed the Managed to Removed, odh-model-controller was not uninstalled. @VaishnaviHire I think this would be operator issue. could you please take a look at this?

@Jooho
Copy link

Jooho commented Oct 18, 2023

Earlier, somehow the metadatalabeltransformer was only applied on some sections of the manifests and not others, not sure why that was. Now is is uniform and all the generated resources will have consistent labels. So yes the few extra labels you might see in a comparison are expected

No, we must make it the same. The label difference can stop the upgrade

@VedantMahabaleshwarkar
Copy link
Author

@Jooho

"Could you please explain why quay.io/vedantm/odh-model-controller:remove-kserve-flag is used?"

quay.io/vedantm/odh-model-controller:remove-kserve-flag this image contains the code changes from opendatahub-io/odh-model-controller#92

@israel-hdez
Copy link

not possible to test without changes to the operator, which are in the operator PR.

OK, I will do a review of all together. What's the operator PR? The one in the description is closed.

@VedantMahabaleshwarkar
Copy link
Author

@israel-hdez

What's the operator PR?

opendatahub-io/opendatahub-operator#639

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>
@VedantMahabaleshwarkar
Copy link
Author

@israel-hdez corrected the label mismatch

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>
@israel-hdez
Copy link

Similarly to the odh-model-controller PR, there is a ClusterRoleBinding that is being renamed here:

modelmesh-controller-rolebinding vs modelmesh-controller-rolebinding-opendatahub

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>
@openshift-ci
Copy link

openshift-ci bot commented Oct 19, 2023

@VedantMahabaleshwarkar: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/fvt-odh-manifests 73b2399 link true /test fvt-odh-manifests

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@Jooho Jooho merged commit a22619c into opendatahub-io:main Oct 19, 2023
3 of 5 checks passed
@Jooho
Copy link

Jooho commented Oct 19, 2023

fvt failed because of transition manifests so I manually merged and update images

  • quay.io/opendatahub/modelmesh-controller:fast-73b2399
  • quay.io/opendatahub/modelmesh-controller:fast

VedantMahabaleshwarkar added a commit to VedantMahabaleshwarkar/modelmesh-serving that referenced this pull request Oct 19, 2023
* Use config/ as the manifests source for ODH operator v2

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

* move all changes into an overlay

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

* restructure manifests so that kustomize does not need LoadRestrictorNone flag

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

* correct label mismatch

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

* final modifications

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

* remove openvino variable as it is unused

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

---------

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>
VedantMahabaleshwarkar added a commit to VedantMahabaleshwarkar/modelmesh-serving that referenced this pull request Oct 19, 2023
* Use config/ as the manifests source for ODH operator v2

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

* move all changes into an overlay

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

* restructure manifests so that kustomize does not need LoadRestrictorNone flag

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

* correct label mismatch

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

* final modifications

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

* remove openvino variable as it is unused

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

---------

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>
Jooho pushed a commit that referenced this pull request Oct 19, 2023
* Manifest readiness for Operator v2 (#237)

* Use config/ as the manifests source for ODH operator v2

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

* move all changes into an overlay

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

* restructure manifests so that kustomize does not need LoadRestrictorNone flag

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

* correct label mismatch

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

* final modifications

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

* remove openvino variable as it is unused

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

---------

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

* change image tags to 0.11.1

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>

---------

Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>
@VedantMahabaleshwarkar VedantMahabaleshwarkar linked an issue Oct 26, 2023 that may be closed by this pull request
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[STORY] ModelMesh manifests transition readiness for ODH v2
3 participants