Enable Nvidia NIM as an application #2959

Merged
merged 1 commit into from
Sep 16, 2024

@xieshenzh xieshenzh commented Jun 26, 2024

Description

Create an OdhApplication CR to add a tile for the Nvidia NIM application.
Support enabling the NIM application with an Nvidia NGC API key.
Create a CronJob to validate the API key and enable the NIM application.

[Screenshot: 2024-06-25, 5:06:16 PM]

How Has This Been Tested?

Negative scenario:

  1. Enter an invalid NGC API key
  2. Check that the Job for validating the API key fails
  3. Check that the NIM application is not enabled
  4. Check that the CronJob (executed daily) for validating the API key fails
  5. Check that the Secret for pulling NIM images is not created
  6. Check that the ConfigMap that stores NIM image data is not created

Positive scenario:

  1. Enter a valid NGC API key
  2. Check that the Job for validating the API key succeeds
  3. Check that the NIM application is enabled
  4. Check that the CronJob (executed daily) for validating the API key succeeds
  5. Check that the Secret for pulling NIM images is created correctly
  6. Check that the ConfigMap that stores NIM image data is created correctly
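
The two scenarios above boil down to one rule: every check should pass for a valid key and fail for an invalid one. A minimal sketch of that expectation table follows. It is an illustration only, not code from this PR; the check names and both helper functions are hypothetical, and the observed values are assumed to be collected by hand from the cluster.

```python
from typing import Dict, List

# Hypothetical check names mirroring steps 2-6 of each scenario.
CHECKS = [
    "validation_job_succeeded",
    "app_enabled",
    "cronjob_succeeded",
    "pull_secret_created",
    "images_configmap_created",
]


def expected_state(key_is_valid: bool) -> Dict[str, bool]:
    """Every check is expected to be True for a valid key, False otherwise."""
    return {check: key_is_valid for check in CHECKS}


def failed_checks(key_is_valid: bool, observed: Dict[str, bool]) -> List[str]:
    """Return the checks whose observed value differs from the expectation.

    A check missing from `observed` is reported as failed.
    """
    expected = expected_state(key_is_valid)
    return [name for name in CHECKS if observed.get(name) != expected[name]]
```

For example, if an invalid key was entered but the pull Secret still exists, `failed_checks(False, observed)` would return only `"pull_secret_created"`.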

Test Impact

No impact on existing code

Request review criteria:

Self checklist (all need to be checked):

  • The developer has manually tested the changes and verified that the changes work
  • Commits have been squashed into descriptive, self-contained units of work (e.g. 'WIP' and 'Implements feedback' style messages have been removed)
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has added tests or explained why testing cannot be added (unit or cypress tests for related changes)

If you have UI changes:

  • Included any necessary screenshots or gifs if it was a UI change.
  • Included tags to the UX team if it was a UI/UX change (find relevant UX in the SMEs section).

After the PR is posted & before it merges:

  • The developer has tested their solution on a cluster by using the image produced by the PR to main

@openshift-ci openshift-ci bot requested review from dpanshug and mturley June 26, 2024 21:18
@openshift-ci openshift-ci bot added the needs-ok-to-test The openshift bot needs to label PRs from non members to avoid strain on the CI label Jun 26, 2024

openshift-ci bot commented Jun 26, 2024

Hi @xieshenzh. Thanks for your PR.

I'm waiting for an opendatahub-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@xieshenzh
Contributor Author

@andrewballantyne Please take a look at this PR, which adds NIM as an application. Thanks.

@@ -12,3 +12,4 @@ resources:
 - ./pachyderm
 - ./watson-x
 - ./rhoai
+- ./nvidia-nim
Member

Do we know if this is a partner connection? I'd imagine this is only for RHOAI and not for ODH by default 🤔

If so, is it for Managed or Self Managed or both?

We are reworking our manifest folder, but currently it's hard to navigate. Answers to these questions will help decide where to put it. I'd imagine, though, that you want to find the anaconda stuff and place this effort next to that.

Member

On a side note, we may want to look at not directly including it (in the kustomization of where you want to deploy it) and manually installing it for the short term while we work out everything else.

We don't have a flag for disabling manifest files unfortunately. But we could build an overlay that includes it and you could use the DSC devFlags 🤔 Give me a moment to see if I can pen that together.

Contributor Author

@mpaulgreen Could you please answer Andrew's questions? Thanks.

@xieshenzh Please reach out to the PM for this; I think it will be Adam. Add this query to the Pending Questions doc.

Member

@andrewballantyne andrewballantyne Jul 2, 2024

Okay, so we just reworked the manifest files directory to make it easier to maintain (you got some conflicts) -- Do we have an answer yet on where the PM wants this included?

Contributor Author

@andrewballantyne I have rebased.
We haven't got the answers to the questions in this PR.
But it was confirmed earlier by Adam: NIM will be a base offering, not an add-on.

@andrewballantyne
Member

/ok-to-test

@openshift-ci openshift-ci bot added ok-to-test The openshift bot needs `ok-to-test` to allow non member PRs to run the tests. and removed needs-ok-to-test The openshift bot needs to label PRs from non members to avoid strain on the CI labels Jun 27, 2024
@andrewballantyne
Member

You have conflicts, @xieshenzh. Please rebase (not merge) -- we want to get down to 1 commit when we are done and it will be easier in the long run if you rebase.

@openshift-merge-robot openshift-merge-robot added the needs-rebase PR needs to be rebased label Jul 2, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase PR needs to be rebased label Jul 2, 2024
Member

@andrewballantyne andrewballantyne left a comment

Waiting for a temporary cluster to test the installation on -- my only initial comment is to make sure everything gets deployed.

Member

You'll need to add this next to anaconda-ce-validator-cron.yaml in ./kustomization.yaml; otherwise it won't be deployed to the cluster.
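
A quick way to catch this kind of omission is to compare the manifests you expect to deploy against the `resources:` list of the kustomization file. The sketch below is only an illustration under stated assumptions: `listed_resources` and `missing_resources` are hypothetical helpers, the parser is a naive line-based one that only understands a flat `resources:` list (it is not a YAML parser), and the entries are taken from the hunk shown earlier in this PR rather than from the file the comment refers to.

```python
from typing import Iterable, List


def _normalize(path: str) -> str:
    """Drop a leading './' so './rhoai' and 'rhoai' compare equal."""
    return path[2:] if path.startswith("./") else path


def listed_resources(kustomization_text: str) -> List[str]:
    """Naively collect the '- item' entries under the top-level 'resources:' key."""
    items: List[str] = []
    in_resources = False
    for raw in kustomization_text.splitlines():
        line = raw.rstrip()
        stripped = line.lstrip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if stripped.startswith("- "):
            if in_resources:
                items.append(stripped[2:].strip())
        elif line == stripped:
            # An unindented, non-list line is a top-level key.
            in_resources = stripped == "resources:"
    return items


def missing_resources(kustomization_text: str, manifests: Iterable[str]) -> List[str]:
    """Return the manifests that are not listed under 'resources:'."""
    listed = {_normalize(item) for item in listed_resources(kustomization_text)}
    return [m for m in manifests if _normalize(m) not in listed]


before = """resources:
- ./pachyderm
- ./watson-x
- ./rhoai
"""
print(missing_resources(before, ["./rhoai", "./nvidia-nim"]))  # ['./nvidia-nim']
```

Once the missing entry is appended to the list, the same call returns an empty list.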

Contributor Author

Sure, I've added the yamls to the kustomization files.

Previously, the lines were removed to unblock the PR: #2959 (comment)

Member

@andrewballantyne andrewballantyne left a comment

Wasn't able to test this -- cluster bot failed me. But I don't want to hold things up while I am off tomorrow. Their cluster looked good from a "post execution" point of view. I'll test this more next week.

@openshift-ci openshift-ci bot added the lgtm label Sep 12, 2024

openshift-ci bot commented Sep 12, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andrewballantyne

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@andrewballantyne
Member

/retest

@andrewballantyne
Member

/test images

@andrewballantyne
Member

Miscommunication with @xieshenzh -- he will merge his other PRs together (and not with this one)... approving again based on my approval message.

@andrewballantyne
Member

/lgtm

@xieshenzh let's not do any more updates and get this in. It seems it missed the boat on Thursday.

@openshift-ci openshift-ci bot added the lgtm label Sep 16, 2024

codecov bot commented Sep 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.32%. Comparing base (22be73c) to head (3dab16e).
Report is 2 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2959      +/-   ##
==========================================
+ Coverage   85.26%   85.32%   +0.05%     
==========================================
  Files        1270     1270              
  Lines       27900    27900              
  Branches     7422     7422              
==========================================
+ Hits        23790    23805      +15     
+ Misses       4110     4095      -15     

see 6 files with indirect coverage changes
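
The percentages in the table can be reproduced from the Hits and Lines rows. The short sketch below does the arithmetic with integers. Note one inference that is an assumption, not documented behaviour: the reported figures match truncation to two decimals rather than rounding (23790 / 27900 is about 85.2688%, shown as 85.26%), and the +0.05% delta matches the unrounded difference rather than the difference of the two displayed values. The helper name `coverage_hundredths` is hypothetical.

```python
def coverage_hundredths(hits: int, lines: int) -> int:
    """Coverage in hundredths of a percent, truncated (8526 means 85.26%)."""
    return hits * 10_000 // lines


LINES = 27900
BASE_HITS, HEAD_HITS = 23790, 23805
BASE_MISSES, HEAD_MISSES = 4110, 4095

# Hits and misses should add up to the total line count on both sides.
assert BASE_HITS + BASE_MISSES == LINES
assert HEAD_HITS + HEAD_MISSES == LINES

base = coverage_hundredths(BASE_HITS, LINES)               # 8526, i.e. 85.26%
head = coverage_hundredths(HEAD_HITS, LINES)               # 8532, i.e. 85.32%
delta = coverage_hundredths(HEAD_HITS - BASE_HITS, LINES)  # 5, i.e. +0.05%

print(f"base {base / 100:.2f}%  head {head / 100:.2f}%  delta +{delta / 100:.2f}%")
```

The 15 extra hits with an unchanged line count are consistent with the "indirect coverage changes" noted above: no covered lines were added, but existing lines are now exercised.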


@openshift-merge-bot openshift-merge-bot bot merged commit f046e94 into opendatahub-io:main Sep 16, 2024
8 checks passed
Labels
approved lgtm ok-to-test The openshift bot needs `ok-to-test` to allow non member PRs to run the tests.
4 participants