This repository has been archived by the owner on Jan 31, 2024. It is now read-only.

feat: Ray cluster deployment with ODH Jupyterhub #573

Closed
wants to merge 1 commit into from

harshad16
Member

feat: Ray cluster deployment with ODH Jupyterhub
Signed-off-by: Harshad Reddy Nalla [email protected]

Description:

openshift-ci bot commented Jun 14, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: harshad16
To complete the pull request process, please assign vaishnavihire after the PR has been reviewed.
You can assign the PR to them by writing /assign @vaishnavihire in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@harshad16 harshad16 requested a review from LaVLaS June 14, 2022 22:22
Member

@anishasthana anishasthana left a comment

Should we be tying Ray in as an overlay for other components?
I think scoping this to just deploying Ray would make more sense, no?

We can then create an overlay for deploying custom Ray images, or whatever else is needed.

@@ -94,6 +94,10 @@ Contains build chain manifest for CUDA 11.0.3 enabled ubi 8 based images with py

*NOTE:* Builds in this overlay require 4-6 GB of memory

#### [odh-ray-cluster](notebook-images/overlays/odh-ray-cluster/)

Contains deployment manifests for Ray setup with ODH jupyterhub.
Member

I think with our move towards KFNBC as the default notebook provider, we should probably gear Ray towards KFNBC as well. @crobby / @LaVLaS wdyt?

Contributor

I agree, but we need to define and document how to do that with KFNBC.

Contributor

@anishasthana @LaVLaS

Currently we are using a Single User Profile as part of our Ray/JupyterHub deployment that spawns a Ray cluster for the user when they start the "ray notebook image" from the spawner page. Is it possible to do something similar with KFNBC?

Is there an analogous spawner page UI with KFNBC right now, or is it still done with oc apply?

containers:
  - name: ray-operator
    imagePullPolicy: Always
    image: 'quay.io/thoth-station/ray-operator:v0.1.2'
Member

Should the image be in the Open Data Hub organization?

Member

(even if it's just a mirror of the image in Thoth-station)

Member Author

It's just that, if we keep an Open Data Hub organization image here, people facing issues would have to look for the image here.
I'm happy to make the change if that is the way we want to go.
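
One way a mirror could be swapped in without editing the operator manifest itself is a kustomize `images` override. The following is only a sketch under stated assumptions: the mirror name `quay.io/opendatahub/ray-operator` and the `../base` path are hypothetical and do not appear in this PR. Only the Thoth image name and tag are taken from the manifest quoted above.

```yaml
# kustomization.yaml (sketch; the base path and the mirror name are assumptions)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../base   # hypothetical location of the ray-operator manifest
images:
  - name: quay.io/thoth-station/ray-operator    # image as referenced in the manifest
    newName: quay.io/opendatahub/ray-operator   # hypothetical mirror, not confirmed to exist
    newTag: v0.1.2                              # tag from the manifest
```

With an override like this, users debugging an issue could still trace the mirror back to the upstream Thoth image, since the original name stays visible in the base manifest.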

@@ -0,0 +1,4320 @@
apiVersion: apiextensions.k8s.io/v1
Member

So with this we're effectively going to continue requiring cluster admin for the ODH operator if you want to install Ray.

Would it instead be better to have these CRDs installed by the user before deployment of Ray artifacts?

Member

@LaVLaS I'm thinking about our previous discussion where we said ODH wants to get out of the business of installing operators, instead focusing on deploying custom resources for already installed operators

Contributor

ODH will no longer install operators that are available in OperatorHub. After doing a quick check, I don't see any OperatorHub packageManifest so this manual operator install would be the only option for now.

Member Author

Yes, this is not available as an operator on OperatorHub.

Contributor

LaVLaS commented Jun 28, 2022

@harshad16 Thx for providing this. Since we have plans to deprecate JupyterHub and migrate to KFNBC, can you separate the Ray operator into a standalone manifest so we can use the deployment with JH or KFNBC.

@harshad16
Member Author

Since we have plans to deprecate JupyterHub and migrate to KFNBC, can you separate the Ray operator into a standalone manifest so we can use the deployment with JH or KFNBC.

Thanks for the review @LaVLaS
Should I move the Ray operator manifest to the base, or to some other directory?

Contributor

LaVLaS commented Jun 29, 2022

Since we have plans to deprecate JupyterHub and migrate to KFNBC, can you separate the Ray operator into a standalone manifest so we can use the deployment with JH or KFNBC.

Thanks for the review @LaVLaS Should I move the Ray operator manifest to the base, or to some other directory?

The base repo will work.

Also, is this ray-operator manifest customized in any way from the helm-chart to run on OCP?
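
For illustration, a standalone Ray operator base of the kind discussed here might look roughly like the following kustomization. This is a sketch only: the directory and file names are hypothetical, since the PR does not specify how the manifests would be split.

```yaml
# ray-operator/base/kustomization.yaml (hypothetical path)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ray-cluster-crd.yaml          # hypothetical name for the apiextensions.k8s.io/v1 CRD
  - ray-operator-rbac.yaml        # hypothetical name for the operator's RBAC objects
  - ray-operator-deployment.yaml  # hypothetical name for the ray-operator Deployment
```

A JupyterHub overlay and a KFNBC overlay could then each list this base under `resources`, so the operator would be deployed the same way for either notebook provider.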

Contributor

@LaVLaS LaVLaS left a comment

Before we start the official review, we'll need some basic tests to verify that it deploys successfully, plus a simple "hello world" test.

@MichaelClifford
Contributor

Also, is this ray-operator manifest customized in any way from the helm-chart to run on OCP?

The operator deployment is slightly different than the helm-chart, but I'm not sure if the changes were OCP-specific or specific to the Thoth images we're using. @erikerlandson would know.

@harshad16
Member Author

Based on the conversation, it seems we need to make adjustments for Ray to function on both the notebook controller and JupyterHub.
@MichaelClifford will be taking this work forward in a different PR.
Thanks everyone for the help and thoughts 💯

@harshad16 harshad16 closed this Jul 29, 2022
@harshad16 harshad16 deleted the include-ray branch August 1, 2023 15:56