Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable configuration of New Notebook UI #345

Merged
merged 14 commits into from
Apr 18, 2024

Conversation

ca-scribner
Copy link
Contributor

@ca-scribner ca-scribner commented Apr 6, 2024

This commit adds configuration for:

  • the default number of GPUs and the GPUs available
  • the default PodDefaults selected
  • the Toleration configurations available
  • the Affinity configurations available

These configurations are enabled through newly exposed charm configs. These configs are lightly validated to ensure they're valid yaml, but not validated enough to ensure things like Tolerations or Affinities are proper Kubernetes yaml

Closes kubeflow/kubeflow#336.

Reviewer notes

Code is ready to review, but I still have to flesh out the readme (that is bookkept separately in kubeflow/kubeflow#337). Example config inputs for testing are:

For testing, I build my environment with:

# Test env:
juju deploy kubeflow-dashboard --channel latest/edge --trust
juju deploy kubeflow-profiles --channel latest/edge --trust 
juju relate kubeflow-profiles kubeflow-dashboard


# So the dashboard is provided the `kubeflow-userid` header, and the notebook page is served within the kubeflow dashboard
juju deploy istio-pilot --channel latest/edge --config default-gateway=kubeflow-gateway --trust
juju deploy istio-gateway --config kind=ingress --channel latest/edge --trust istio-ingressgateway
juju relate istio-pilot istio-ingressgateway

juju deploy dex-auth --channel latest/edge --trust
juju deploy oidc-gatekeeper --channel latest/edge

juju relate dex-auth:oidc-client  oidc-gatekeeper:oidc-client
juju relate istio-pilot:ingress dex-auth:ingress

juju config dex-auth public-url="http://10.64.140.43.nip.io" 
juju config oidc-gatekeeper public-url="http://10.64.140.43.nip.io"
juju config dex-auth static-username=user
juju config dex-auth static-password=user

juju relate istio-pilot:ingress kubeflow-dashboard:ingress
juju relate istio-pilot:ingress oidc-gatekeeper:ingress
juju relate istio-pilot:ingress-auth oidc-gatekeeper:ingress-auth

# for permissions
juju deploy kubeflow-roles --channel 1.8/stable --trust
# for poddefaults
juju deploy admission-webhook --trust --channel 1.8/stable

# in case we want to actually deploy a notebook
juju deploy jupyter-controller --channel latest/edge --trust

# Log into the dashboard, enter a username, and let the profile create.  Then 
# create these poddefaults in that namespace
kubectl create -f sample_poddefaults.yaml -n user

# Application under test:
juju deploy ./*.charm --resource oci-image=docker.io/kubeflownotebookswg/jupyter-web-app:v1.8.0 --trust

juju relate kubeflow-dashboard:links jupyter-ui:dashboard-links
juju relate istio-pilot:ingress jupyter-ui:ingress

juju config jupyter-ui affinity-options=@sample_affinity_config.yaml
juju config jupyter-ui gpu-vendors=@sample_gpu_vendors.yaml
juju config jupyter-ui tolerations-options=@sample_tolerations_config.yaml

sample_gpu_vendors.yaml:

- {limitsKey: nvidia.com/gpu, uiName: NVIDIA}
- {limitsKey: amd.com/gpu, uiName: AMD}
- {limitsKey: andrew.com/gpu, uiName: AndrewPU}

sample_affinity_config.yaml:

- configKey: "az_us-east1"
  displayName: "Availability Zone us-east1"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - us-east1
- configKey: "az_us-west1"
  displayName: "Availability Zone us-west1"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - us-west1

sample_tolerations_config.yaml:

- groupKey: "group_1"
  displayName: "4 CPU 8Gb Mem at ~$0.50 USD per day"
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "kubeflow-c5.xlarge"
      effect: "NoSchedule"
- groupKey: "group_2"
  displayName: "8 CPU 16Gb Mem at ~$1.2 USD per day"
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "kubeflow-c5.xxlarge"
      effect: "NoSchedule"

sample_poddefaults.yaml:

apiVersion: kubeflow.org/v1alpha1
kind: PodDefault
metadata:
  name: add-s3-credentials
spec:
  desc: Add default S3 credentials
  selector:
    matchLabels:
      not-a-real-example: "true"
  env:
    - name: NOT_A_REAL_EXAMPLE
      value: /
---
apiVersion: kubeflow.org/v1alpha1
kind: PodDefault
metadata:
  name: add-mlflow-credentials
spec:
  desc: Add default MLflow credentials
  selector:
    matchLabels:
      not-a-real-example: "true"
  env:
    - name: NOT_A_REAL_EXAMPLE
      value: /
---
apiVersion: kubeflow.org/v1alpha1
kind: PodDefault
metadata:
  name: configure-vault
spec:
  desc: Configure Vault for this user
  selector:
    matchLabels:
      not-a-real-example: "true"
  env:
    - name: NOT_A_REAL_EXAMPLE
      value: /

@ca-scribner ca-scribner requested a review from a team as a code owner April 6, 2024 16:41
This commit adds configuration for:
* the default number of GPUs and the GPUs available
* the default PodDefaults selected
* the Toleration configurations available
* the Affinity configurations available

These configurations are enabled through newly exposed charm configs.  These configs are lightly validated to ensure they're valid yaml, but not validated enough to ensure things like Tolerations or Affinities are proper Kubernetes yaml
@ca-scribner ca-scribner force-pushed the KF-5434-enable-spawner-ui-configuration branch from 57be5ca to 5c2f1d4 Compare April 6, 2024 16:48
Copy link
Contributor

@orfeas-k orfeas-k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I left some commments + tests seem to still fail.

charms/jupyter-ui/src/charm.py Outdated Show resolved Hide resolved
charms/jupyter-ui/src/charm.py Outdated Show resolved Hide resolved
charms/jupyter-ui/src/charm.py Outdated Show resolved Hide resolved
charms/jupyter-ui/src/charm.py Outdated Show resolved Hide resolved
charms/jupyter-ui/src/charm.py Outdated Show resolved Hide resolved
charms/jupyter-ui/src/charm.py Outdated Show resolved Hide resolved
charms/jupyter-ui/tests/integration/test_charm.py Outdated Show resolved Hide resolved
charms/jupyter-ui/tests/unit/test_operator.py Outdated Show resolved Hide resolved
Previously, _get_image_config incorrectly passed inputs of "True", ""a string"", and "{}" as allowed inputs even though they were not acceptable.  This commit fixes these validation errors and improves the tests around them
Renames _render_jwa_file_with_images_config to _render_jwa_spawner_inputs to reflect its broader scope, and makes the method static
…escriptive variable names

Renames the variables within _get_options_with_default_from_config to be more descriptive of their content
@ca-scribner
Copy link
Contributor Author

ca-scribner commented Apr 16, 2024

Note to reviewers: due to this upstream bug, the jwa expects the default PodDefaults to be picked using the key of their first matchLabels entry, not their name like mentioned in the spawner_ui_config.yaml comments. I think we've implemented everything here, but the input is being interpreted in the jwa differently than documented there

Copy link
Contributor

@orfeas-k orfeas-k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good job, a slight comment. Other than that, LGTM!

charms/jupyter-ui/src/charm.py Outdated Show resolved Hide resolved
@ca-scribner ca-scribner merged commit 3ec71e9 into main Apr 18, 2024
14 checks passed
@ca-scribner ca-scribner deleted the KF-5434-enable-spawner-ui-configuration branch April 18, 2024 17:31
ca-scribner added a commit that referenced this pull request May 1, 2024
(cherry pick of 3ec71e9 from main)

* Enable configuration of New Notebook UI

This commit adds configuration for:
* the default number of GPUs and the GPUs available
* the default PodDefaults selected
* the Toleration configurations available
* the Affinity configurations available

These configurations are enabled through newly exposed charm configs.  These configs are lightly validated to ensure they're valid yaml, but not validated enough to ensure things like Tolerations or Affinities are proper Kubernetes yaml

(cherry picked from commit 3ec71e9)
kimwnasptd pushed a commit that referenced this pull request May 2, 2024
* Enable configuration of New Notebook UI (#345)

(cherry pick of 3ec71e9 from main)

* Enable configuration of New Notebook UI

This commit adds configuration for:
* the default number of GPUs and the GPUs available
* the default PodDefaults selected
* the Toleration configurations available
* the Affinity configurations available

These configurations are enabled through newly exposed charm configs.  These configs are lightly validated to ensure they're valid yaml, but not validated enough to ensure things like Tolerations or Affinities are proper Kubernetes yaml

(cherry picked from commit 3ec71e9)

* Fix spawner_ui_config.yaml to use correct null values for configurations (#361)

Previously, the spawner_ui_config.yaml was rendered with empty strings and lists being rendered as null values in the configuration file.  For example if the GPU vendors list was empty and the default vendor was `""`,  the config file would have (shown truncated):

```
  gpus:
    value:
      vendor:

      vendors:
```

whereas jupyter web app expected:
```
  gpus:
    value:
      vendor: ""

      vendors: []
```

This commit updates the template to ensure we always output the correct empty values.

Closes #360

(cherry picked from commit 211a37b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Jupyter pod stuck at ContainerCreating when spawning
2 participants