
Install JupyterHub in the cern-vre cluster via tf helm provider #21

Closed
17 tasks done
goseind opened this issue Dec 1, 2022 · 20 comments
Assignees
Labels
cern-vre-infra (Things only related to and dependent on our team component), component/images (Container images for jhub profiles), component/jhub, enhancement (New feature or request), priority/critical (Needs to be done very soon)

Comments

@goseind

goseind commented Dec 1, 2022

@goseind goseind self-assigned this Dec 1, 2022
@goseind goseind removed the terraform label Jan 20, 2023
@goseind goseind removed their assignment Jan 20, 2023
@goseind goseind added the enhancement, component/jhub and priority/critical labels Jan 20, 2023
@goseind

goseind commented Jan 30, 2023

Why did the ESCAPE team decide to go with singleuser.storage.static instead of dynamically provisioned PVCs through a storage class and how does Jhub provide individual storage with a static PVC?

IMG_20230130_172300.jpg

tag @garciagenrique

@garciagenrique

  • JupyterHub was installed following the zero2jupyterhub documentation, using helm charts.

    • The config.yaml file is empty so that z2jh starts with the default values.
    • However, the deployment won't succeed because no storage is assigned to the cluster. See below.
  • Currently the hub pod is stuck in Pending with the following error:

Type     Reason            Age                  From               Message
----     ------            ----                 ----               -------
Warning  FailedScheduling  19m (x309 over 26h)  default-scheduler  0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.

This error is connected with the problem @goseind raised above: why static vs. dynamic storage for the cluster.

@garciagenrique

The following commands were used for the deployment (helm needs to be installed, of course):

$ helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
$ helm repo update

To install or update, just modify the config.yaml file, save it, and run:

$ helm upgrade --cleanup-on-fail \
  --install z2jh jupyterhub/jupyterhub \
  --namespace jupyterhub \
  --create-namespace \
  --version=2.0.0 \
  --values config.yaml

Uninstalling the helm chart does not work reliably. The way it's currently done is by logging into k9s, selecting the jupyterhub release (or whatever it was called), and deleting it completely.
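Uninstalling through helm directly might be a cleaner alternative to deleting things in k9s. A sketch, assuming the release name (z2jh) and namespace (jupyterhub) from the install command above:

```shell
# Remove the release installed above (release name and namespace assumed
# from the helm upgrade command).
helm uninstall z2jh --namespace jupyterhub

# helm does not delete the per-user PVCs created by the spawner; remove
# them separately if a full cleanup is wanted.
kubectl --namespace jupyterhub delete pvc --all
```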

@goseind goseind self-assigned this Feb 6, 2023
@goseind

goseind commented Feb 6, 2023

I have requested a quota change for the number of shares, so we can use dynamically provisioned storage.

@goseind

goseind commented Feb 6, 2023

Done, quota updated to 200 shares.

Image

goseind added a commit that referenced this issue Feb 6, 2023
goseind added a commit that referenced this issue Feb 6, 2023
goseind added a commit that referenced this issue Feb 6, 2023
goseind added a commit that referenced this issue Feb 6, 2023
goseind added a commit that referenced this issue Feb 6, 2023
@goseind

goseind commented Feb 6, 2023

JupyterHub is running under http://137.138.226.35 (IP subject to change). Current status:

tf commands:

terraform apply -target=kubernetes_namespace_v1.ns_jupyterhub
terraform apply -target=kubernetes_storage_class_v1.sc_manila-meyrin-cephfs # sc with Delete policy did not exist yet
terraform destroy -target=helm_release.jupyterhub-chart

@garciagenrique see if you can follow my changes; the extra values I set are described in the customization docs. As I have no idea why the value does not get set, I'll have to wait for an answer in the forum.
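For reference, the storage class targeted above could look roughly like this in Terraform. This is a hypothetical sketch: the provisioner name and parameters are assumptions based on the upstream Manila CSI driver, not the verified CERN values.

```terraform
# Hypothetical sketch: storage class with a Delete reclaim policy for
# dynamically provisioned Manila CephFS shares. Provisioner and parameters
# are assumptions, not the exact CERN values.
resource "kubernetes_storage_class_v1" "sc_manila-meyrin-cephfs" {
  metadata {
    name = "manila-meyrin-cephfs-delete"
  }
  storage_provisioner = "cephfs.manila.csi.openstack.org"
  reclaim_policy      = "Delete"
  parameters = {
    type = "meyrin-cephfs" # Manila share type (assumed)
  }
}
```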

@goseind

goseind commented Feb 7, 2023

My workaround of setting fsGroup to 0 in order to run with root access didn't work, as the following error then occurs: Running as root is not recommended. Use --allow-root to bypass. This has been a problem for other users too, see: jupyterhub/zero-to-jupyterhub-k8s#562 or jupyterhub/zero-to-jupyterhub-k8s#2177

Also asking the SWAN team how they worked around this, ref.: https://github.com/swan-cern/swan-charts/blob/master/swan/values.yaml#L22

Another idea would be to use extraConfig to directly modify KubeSpawner.
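That idea could look roughly like this in the helm values. A hypothetical sketch, not our tested config: the snippet key and the group id 100 are placeholders, and it assumes KubeSpawner's extra_pod_config trait gets merged into the pod spec.

```yaml
# Hypothetical sketch: set the pod securityContext through KubeSpawner
# directly via hub.extraConfig instead of singleuser.extraPodConfig.
# The snippet name and fsGroup value are placeholders.
hub:
  extraConfig:
    pod-security-context: |
      c.KubeSpawner.extra_pod_config = {
          "securityContext": {
              "fsGroup": 100,
              "fsGroupChangePolicy": "OnRootMismatch",
          },
      }
```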

  • Delete PVCs used for testing

@goseind

goseind commented Feb 10, 2023

To solve the issue with the extraPodConfig not being set, I created a bug report in the JupyterHub repo: jupyterhub/zero-to-jupyterhub-k8s#3021

@garciagenrique

@goseind could we just not set up a podSecurityContext for the moment, finish configuring JHub, and implement this later?

goseind added a commit that referenced this issue Feb 13, 2023
goseind added a commit that referenced this issue Feb 13, 2023
goseind added a commit that referenced this issue Feb 13, 2023
@goseind

goseind commented Feb 13, 2023

Set up a meeting with Diogo from SWAN for next Monday to check their storage configuration with eosxd: https://gitlab.cern.ch/kubernetes/automation/charts/cern/-/tree/master/eosxd

goseind added a commit that referenced this issue Feb 13, 2023
goseind added a commit that referenced this issue Feb 13, 2023
goseind added a commit that referenced this issue Feb 13, 2023
@goseind goseind reopened this Feb 13, 2023
@goseind

goseind commented Feb 13, 2023

For an update see #34, the service is now reachable from within CERN but needs further configuration, as listed in the issue description.
@garciagenrique can you add a PR with the cvmfs configuration?

goseind added a commit that referenced this issue Feb 14, 2023
goseind added a commit that referenced this issue Feb 14, 2023
@goseind goseind added the cern-vre-infra and component/images labels Feb 14, 2023
@garciagenrique

The base image of the ESCAPE-VRE is this one: https://gitlab.cern.ch/escape-wp2/docker-images/-/tree/master/datalake-singleuser
There are some features that could be improved (maybe the Python version and the latest Rucio client versions?), but all the configuration for adding the rucio-jupyterlab plugin is linked there.

I suggest either starting with this image or basing a new one on it.

@wiegerthefarmer

> @goseind could we just not set up a podSecurityContext for the moment, finish configuring JHub, and later implement this ?

Is there a solution to this? I've tried everything to get fsGroupChangePolicy: "OnRootMismatch" set, but nothing gets passed to the pod. Only setting the values in a YAML used to start the pod directly works; nothing set through the spawner does.

@goseind

goseind commented Apr 17, 2023

> @goseind could we just not set up a podSecurityContext for the moment, finish configuring JHub, and later implement this ?
>
> is there a solution to this? I've tried everything to get fsGroupChangePolicy: "OnRootMismatch" set. Nothing gets passed to the pod. Only setting the values in a yaml that is used to start the pod works. But nothing in the spawner works.

So far we haven't found a solution either: some values get set through the YAML values, but not the ones we need, and setting them with the extra config script doesn't work either. We need to debug this further.

@goseind

goseind commented Jun 5, 2023

  • Adjust the K8s network policy of JHub for URLs to work (optional?)
  • Use a Let's Encrypt cert with cert-manager
  • Change single-user storage to a share once the fsGroup issue is solved
  • Rebuild the main single-user images on GH (see my example repo)
  • Move the deployment to ArgoCD once it is set up
  • Redo the EOS FUSE mount with the CERN-provided image or rebuild the old one here (use a private image as the keytab needs to stay secret), ref.: Enable EOS mount OpenStack #130

@goseind

goseind commented Jun 7, 2023

@garciagenrique I now merged the config, from my side this looks fine. Do you want to take charge of redoing the images, also the EOS image? References can be found in the config of the images I was using/creating.

@goseind

goseind commented Aug 21, 2023

I think this issue could be closed and the remaining tasks split up into separate tasks as they are not directly linked to the initial goal of this issue. What do you think @garciagenrique ?

@garciagenrique

I agree @goseind, although this issue became too big and touched on plenty of subjects...

Could we start an interaction in this thread to summarize the remaining tasks?

@goseind goseind removed their assignment Sep 22, 2023
@goseind

goseind commented Sep 22, 2023

In my opinion, this topic can now be closed as all the tasks have been completed.
