🐛 [BUG]: jupyterlab server fails to spawn due to read-only volume mount #392

Closed
mishaschwartz opened this issue Oct 17, 2023 · 17 comments · Fixed by #481

@mishaschwartz (Collaborator):

Summary

The jupyterlab server fails to spawn when cowbird settings that mount the public/wps_outputs directory are enabled.

Details

A new jupyterlab container tries to mount a volume at the /notebook_dir/public/wps_outputs directory inside the jupyterlab container, and Docker complains that it cannot mount to that location.

This is possibly because it is a read-only bind-mount and the mount location is a nested directory that does not exist in the container (i.e. Docker needs to create /notebook_dir/public before it creates /notebook_dir/public/wps_outputs, and it may be creating /notebook_dir/public as read-only as well).
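
For illustration only, a minimal sketch of the kind of DockerSpawner volume layout that triggers this failure (paths are taken from the traceback below and from the public-share sample discussed later in this thread; the real configuration is generated by birdhouse-deploy, so treat the exact entries as assumptions):

    # jupyterhub_config.py excerpt (hypothetical, only to illustrate the conflict)

    # Parent bind mount: /notebook_dir/public is read-only inside the container.
    c.DockerSpawner.volumes["/data/jupyterhub_user_data/public-share"] = {
        "bind": "/notebook_dir/public",
        "mode": "ro",
    }

    # Nested bind mount: Docker must first create the /notebook_dir/public/wps_outputs
    # mountpoint, but its parent is already a read-only mount, so the mkdir fails
    # with "read-only file system" and the spawn aborts.
    c.DockerSpawner.volumes["/data/user_workspaces/public/wps_outputs"] = {
        "bind": "/notebook_dir/public/wps_outputs",
        "mode": "ro",
    }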

Traceback (in jupyterhub container):

    Traceback (most recent call last):
      File "/usr/local/lib/python3.10/dist-packages/jupyterhub/user.py", line 798, in spawn
        url = await gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
      File "/usr/local/lib/python3.10/dist-packages/dockerspawner/dockerspawner.py", line 1304, in start
        await self.start_object()
      File "/usr/local/lib/python3.10/dist-packages/dockerspawner/dockerspawner.py", line 1162, in start_object
        await self.docker("start", self.container_id)
      File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
        result = self.fn(*self.args, **self.kwargs)
      File "/usr/local/lib/python3.10/dist-packages/dockerspawner/dockerspawner.py", line 948, in _docker
        return m(*args, **kwargs)
      File "/usr/local/lib/python3.10/dist-packages/docker/utils/decorators.py", line 19, in wrapped
        return f(self, resource_id, *args, **kwargs)
      File "/usr/local/lib/python3.10/dist-packages/docker/api/container.py", line 1127, in start
        self._raise_for_status(res)
      File "/usr/local/lib/python3.10/dist-packages/docker/api/client.py", line 270, in _raise_for_status
        raise create_api_error_from_http_exception(e) from e
      File "/usr/local/lib/python3.10/dist-packages/docker/errors.py", line 39, in create_api_error_from_http_exception
        raise cls(e, response=response, explanation=explanation) from e
    docker.errors.APIError: 500 Server Error for http+docker://localhost/v1.43/containers/4cea3661f3131dbccb662a9eda2b0e49f8e06a7435ef64680ab510b6d5aeab18/start: Internal Server Error ("failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/data/user_workspaces/public/wps_outputs" to rootfs at "/notebook_dir/public/wps_outputs": mkdir /var/lib/docker/overlay2/83b35d1bc9c7db553a0392f0deb855ccc7057e7a52025360be859cb9402d4894/merged/notebook_dir/public/wps_outputs: read-only file system: unknown")

docker version: Docker version 24.0.2, build cb74dfc

Note that this problem goes away if we set the PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR variable to a non-nested directory:

# env.local
PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR=public-wps-outputs

To Reproduce

Steps to reproduce the behavior:

  1. start birdhouse deploy with the cowbird and jupyterhub components enabled
  2. log in to jupyterhub and spawn a new jupyterlab server
  3. inspect the jupyterhub logs for the error message: docker logs -f jupyterhub

Environment

Server/Platform URL: daccs.cs.toronto.edu
Version Tag/Commit: 1.35.0
Related issues/PR:
Related components: jupyterhub, cowbird
Custom configuration:
docker version: Docker version 24.0.2, build cb74dfc

Concerned Organizations

mishaschwartz added the bug label on Oct 17, 2023

@tlvu (Collaborator) commented Oct 18, 2023:

I am guessing /notebook_dir/public/ is read-only and possibly causing problems because this sample config has been enabled?

#export JUPYTERHUB_CONFIG_OVERRIDE="
#
# Sample below will allow for sharing notebooks between Jupyter users.
# Note all shares are public.
#
### public-read paths
#
## /data/jupyterhub_user_data/public-share/
#public_read_on_disk = join(jupyterhub_data_dir, 'public-share')
#
## /notebook_dir/public/
#public_read_in_container = join(notebook_dir, 'public')
#
#c.DockerSpawner.volumes[public_read_on_disk] = {
#    'bind': public_read_in_container,
#    'mode': 'ro',
#}
#
### public-share paths
#
## /data/jupyterhub_user_data/public-share/{username}-public
#public_share_on_disk = join(public_read_on_disk, '{username}-public')
#
## /notebook_dir/mypublic
#public_share_in_container = join(notebook_dir, 'mypublic')
#
#c.DockerSpawner.volumes[public_share_on_disk] = {
#    'bind': public_share_in_container,
#    'mode': 'rw',
#}
#
### create dir with proper permissions
#
#def custom_create_dir_hook(spawner):
#    username = spawner.user.name
#
#    perso_public_share_dir = public_share_on_disk.format(username=username)
#
#    for dir_to_create in [public_read_on_disk, perso_public_share_dir]:
#        if not os.path.exists(dir_to_create):
#            os.mkdir(dir_to_create, 0o755)
#
#    subprocess.call(['chown', '-R', '1000:1000', public_read_on_disk])
#
#    # call original create_dir_hook() function
#    create_dir_hook(spawner)
#
#c.Spawner.pre_spawn_hook = custom_create_dir_hook
#"

This sample config was our poor-man's sharing solution between Jupyter users from before Cowbird existed, so maybe Cowbird can replace it? Then we would not have to enable that sharing solution together with Cowbird, so they would not clash with each other.

If we need to keep both sharing mechanisms and they actually clash with each other, I think it would be better for Cowbird to bind to /notebook_dir/public-wps-outputs/, so how about changing the default value to avoid surprises for future users?

Just curious about the Cowbird sharing workflow.

Currently, with the poor-man sharing solution, everyone sees the public share of everyone, not configurable by each user.

With Cowbird, I assume each user will decide to which users specifically they want to share. So how do they "enable" this "a la carte" sharing? Via Magpie?

@Nazim-crim (Collaborator):

@fmigneault, the spawning error in DAC-584 - Error spawning jupyter notebook images is related to the mount of the jupyterlab google-drive extension in .jupyter. Docker automatically sets the user to root when you mount a volume on a directory that does not exist in the image (see jupyterhub/dockerspawner#453). Regarding this bug, I think @tlvu is right: it is because a mount is made to a nested directory whose parent is read-only.

@fmigneault (Collaborator):

@tlvu

We would need to adjust the sample config when using Cowbird.
The /notebook_dir/public/ location, along with /wps_outputs/public, should be mounted together under ~/public/ for easy access by the user in the spawned docker container. The extra /data/jupyterhub_user_data/public-share could be added in there as well if needed. We just need to establish how all these directories should be combined under ~/public/ in the container.

The general structure for WPS outputs is as follows:

/data/wps_outputs/
   <bird-wps>/
       <output-files>
   weaver/
       public/
           <jobID>/
               <output-files>
       users/
            <user_id>/
               <jobID>/
                   <output-files>

Cowbird understands that WPS-output structure and aligns permissions on the /wpsoutputs endpoint with corresponding files.
When the notebook is started with Cowbird support (which adds the hardlinks), only the public and user-specific WPS outputs are mounted, in respective locations indicating that they are "public" or "my-outputs".

All WPS outputs volumes are purposely mounted ro, since allowing modification of their contents would mean their process results would no longer be guaranteed to be valid (anyone could have modified or deleted them).
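
As a rough illustration only (the real logic lives in Cowbird's FileSystem handler; the destination layout is an assumption based on the structure above and on the /data/user_workspaces/public/wps_outputs path from the traceback):

    import os

    WPS_OUTPUTS = "/data/wps_outputs"
    PUBLIC_WORKSPACE = "/data/user_workspaces/public/wps_outputs"  # assumed target

    def link_public_weaver_job(job_id: str) -> None:
        """Hardlink a public Weaver job's output files into the shared public workspace.

        Sketch only: Cowbird also handles bird-WPS outputs, per-user outputs and
        permission/file-system events, none of which are shown here.
        """
        src_dir = os.path.join(WPS_OUTPUTS, "weaver", "public", job_id)
        dst_dir = os.path.join(PUBLIC_WORKSPACE, "weaver", job_id)
        os.makedirs(dst_dir, exist_ok=True)
        for name in os.listdir(src_dir):
            src, dst = os.path.join(src_dir, name), os.path.join(dst_dir, name)
            if os.path.isfile(src) and not os.path.exists(dst):
                # The files end up read-only for users because the workspace is
                # bind-mounted 'ro' into the Jupyter container.
                os.link(src, dst)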

@Nazim-crim (Collaborator):

@mishaschwartz @tlvu Were you able to reproduce this bug? The default config on cowbird already uses a nested directory and I didn't have the error you mentioned. export PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR="public/wps_outputs" . Do you have more steps to reproduce it other than adding ./components/cowbird in EXTRA_CONF_DIRS and the change in env.local?

@tlvu (Collaborator) commented Oct 24, 2023:

@mishaschwartz @tlvu Were you able to reproduce this bug? The default config on cowbird already uses a nested directory and I didn't have the error you mentioned. export PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR="public/wps_outputs" . Do you have more steps to reproduce it other than adding ./components/cowbird in EXTRA_CONF_DIRS and the change in env.local?

@Nazim-crim I have not tried to reproduce it. Are you saying @mishaschwartz and you end up with different results when trying to reproduce this? This is odd! Maybe I should try to reproduce it myself.

Since I have your attention: how does this workflow work? "With Cowbird, I assume each user will decide to which users specifically they want to share. So how do they "enable" this "a la carte" sharing? Via Magpie?"

@tlvu (Collaborator) commented Oct 24, 2023:

mounted together under ~/public/

@fmigneault is this inside each jupyter container? This is new to me.

Previously, I understood any new mounts inside jupyter containers should be under /notebook_dir/ since this is the root dir visible in the left panel of the Jupyter env.

If a new mount is at ~/public/, the user will never see it visually and can only access it via the terminal or code. Is that intended to hide it visually?

@fmigneault (Collaborator) commented Oct 24, 2023:

@tlvu
My bad if it wasn't clear. The ~ I used there meant the "current notebook home", or the root dir shown by the jupyter interface. Effectively, the /notebook_dir/ you mention.

@tlvu (Collaborator) commented Oct 25, 2023:

@tlvu My bad if it wasn't clear. The ~ I used there meant the "current notebook home", or the root dir shown by the jupyter interface. Effectively, the /notebook_dir/ you mention.

@fmigneault then I am more confused by your comment #392 (comment)

Where should /notebook_dir/public/ and /wps_outputs/public appear in the following structure?

/data/wps_outputs/
   <bird-wps>/
       <output-files>
   weaver/
       public/
           <jobID>/
               <output-files>
       users/
            <user_id>/
               <jobID>/
                   <output-files>

@mishaschwartz (Collaborator, Author):

@Nazim-crim

To reproduce the issue:

@fmigneault (Collaborator):

@tlvu
/notebook_dir/public/ is populated by Cowbird using a combination of sources, including /data/wps_outputs/<bird-wps>, /data/wps_outputs/public and /data/wps_outputs/weaver/public. They are not added "blindly": Cowbird checks with Magpie whether those locations are marked public (or rather, are not restricted by https://github.com/bird-house/birdhouse-deploy/tree/master/birdhouse/optional-components/secure-data-proxy), and adds the necessary hardlinks if they are permitted for anonymous.

This logic was added to handle these combinations for backward compatibility with the existing WPS outputs data structure, which assumed a lot of items were fully open.
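
As a hedged sketch of that decision flow (the Magpie lookup is a hypothetical stub here; the real check goes through Magpie and the secure-data-proxy service definition):

    import os

    def is_public_in_magpie(wps_output_path: str) -> bool:
        """Hypothetical stub: ask Magpie whether this /wpsoutputs location is readable
        by the anonymous group (i.e. not restricted by secure-data-proxy)."""
        raise NotImplementedError("real lookup is done against the Magpie API")

    def expose_if_public(wps_output_path: str, public_workspace_path: str) -> None:
        # Only locations reported as anonymous-readable get hardlinked into the
        # shared public workspace; restricted ones are simply skipped.
        if is_public_in_magpie(wps_output_path):
            os.makedirs(os.path.dirname(public_workspace_path), exist_ok=True)
            if not os.path.exists(public_workspace_path):
                os.link(wps_output_path, public_workspace_path)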

@tlvu (Collaborator) commented Feb 17, 2024:

FYI, I was able to reproduce this problem as well, while trying to test #415.

However, I think it was fixed by #401: after I set export PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR="publiccowbird" and ran ./pavics-compose.sh up -d so that the jupyterhub container is re-created, I am now able to start the Jupyterlab server.

@mishaschwartz I will let you close this issue to confirm whether the fix is fully working.

This allows the Jupyterlab server to start. However, I have not confirmed that this variable is fully respected by Cowbird and/or Weaver, nor whether they can function properly with this variable changed from its public default value.

Now that Jupyterlab can start, I am faced with another problem: all the data from my existing users under /notebook_dir/writable-workspace has disappeared. This is because without Cowbird enabled, /notebook_dir/writable-workspace is bound to /data/jupyterhub_user_data/$USER, but with Cowbird enabled, /notebook_dir/writable-workspace is bound to /data/user_workspaces/$USER. I also had to manually create /data/user_workspaces/$USER, otherwise Jupyterlab won't start either. Basically, activating Cowbird with existing Jupyter users is fairly laborious. This probably deserves a separate issue of its own.
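
For illustration, a rough sketch of the kind of manual preparation described above (the paths and the 1000:1000 ownership are assumptions, not something birdhouse-deploy guarantees; existing notebook data is not migrated by this, see #425):

    import os

    JUPYTERHUB_DATA = "/data/jupyterhub_user_data"   # old per-user storage (without Cowbird)
    USER_WORKSPACES = "/data/user_workspaces"        # new per-user storage (with Cowbird)

    # Pre-create a Cowbird-style workspace for every existing Jupyter user so that
    # their Jupyterlab server can spawn. Ownership 1000:1000 is an assumption about
    # the jupyter user inside the container; adjust for your deployment.
    for user in sorted(os.listdir(JUPYTERHUB_DATA)):
        old_dir = os.path.join(JUPYTERHUB_DATA, user)
        if user == "public-share" or not os.path.isdir(old_dir):
            continue  # skip the shared folder and stray files
        new_dir = os.path.join(USER_WORKSPACES, user)
        os.makedirs(new_dir, exist_ok=True)
        os.chown(new_dir, 1000, 1000)
        # Note: existing notebooks under old_dir are not copied or moved here.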

@tlvu (Collaborator) commented Feb 17, 2024:

Now that Jupyterlab can start, I am faced with another problem: all the data from my existing users under /notebook_dir/writable-workspace has disappeared. This is because without Cowbird enabled, /notebook_dir/writable-workspace is bound to /data/jupyterhub_user_data/$USER, but with Cowbird enabled, /notebook_dir/writable-workspace is bound to /data/user_workspaces/$USER. I also had to manually create /data/user_workspaces/$USER, otherwise Jupyterlab won't start either. Basically, activating Cowbird with existing Jupyter users is fairly laborious. This probably deserves a separate issue of its own.

#425

@tlvu (Collaborator) commented Feb 17, 2024:

Currently, with the poor-man sharing solution, everyone sees the public share of everyone, not configurable by each user.

With Cowbird, I assume each user will decide to which users specifically they want to share. So how do they "enable" this "a la carte" sharing? Via Magpie?

@fmigneault @Nazim-crim @ChaamC Not sure if you noticed my question above, first asked in comment #392 (comment).

@fmigneault (Collaborator):

For Cowbird, it is harder to tell whether PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR changes are effective. Adding anonymous group permissions to files under a given user-workspace should trigger Magpie/Cowbird webhooks that lead to hardlink creation to share the corresponding files publicly. Files accessible under /wpsoutputs only for a specific user should then gradually become accessible when not logged in.

For Weaver, you should see the public part of the job result URL become publiccowbird when submitting a job, because of the hook:

You can see this Job URL in the last cell output from:
https://github.com/Ouranosinc/pavics-sdi/blob/master/docs/source/notebook-components/weaver_example.ipynb
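
A quick, hedged way to check the Cowbird side of this (the URL below is a placeholder; substitute your own host and a real file path listed under /wpsoutputs):

    import requests

    # Placeholder URL: replace the host, job id and file name with real values.
    url = "https://pavics.example.com/wpsoutputs/weaver/public/<jobID>/output/<file>"

    # Without a login session, files restricted by secure-data-proxy should return
    # 401/403, while files permitted to the anonymous group should return 200.
    print(requests.get(url).status_code)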

@fmigneault (Collaborator):

Currently, with the poor-man sharing solution, everyone sees the public share of everyone, not configurable by each user.
With Cowbird, I assume each user will decide to which users specifically they want to share. So how do they "enable" this "a la carte" sharing? Via Magpie?

The public directories are intentionally open for anyone, as they are attributed the Magpie anonymous group.

A user-protected wps-output should have a form similar to:
https://pavics.ouranos.ca/wpsoutputs/weaver/user/THE_USER/91b62b44-fb06-4be9-ad2b-43d5265d0048/output/some-file.txt

Using secure-data-proxy, you should get a Magpie structure as follows:
[screenshot of the resulting Magpie resource structure]

Using the same structure as defined here: #392 (comment) (you need to create child resources for the user-specific structure that is desired), you can set individual user/group permissions on each specific sub-dir/file. When user/group permissions are created in Magpie, this triggers webhooks.

Cowbird receives these webhooks to perform various operations, such as creating hardlinks to make the corresponding files "visible" to users. The tricky aspect of all this is that the files you see in the user-workspace do not themselves get attributed Magpie permissions. There is no Magpie "user-workspace" service. Instead, files placed in the workspace are mapped back to the services they originate from. Therefore, for WPS outputs, which are accessed via /wpsoutputs on the proxy service, the Magpie "REST API" secure-data-proxy permissions are used. For shapefiles coming from GeoServer, permissions under geoserver are used, and so on.

The "mapping" of service-specific permissions to the corresponding user-workspace contents depends on the sync_permissions configuration:

sync_permissions:
  # Friendly name to identify a sync point (The value is not used by Cowbird so this can be any relevant keyword)
  user_workspace:
    # [Required] This section defines a list of services and resources that exists in Magpie.
    # For more info on the services available in Magpie :
    # https://pavics-magpie.readthedocs.io/en/latest/services.html#available-services
    # https://pavics-magpie.readthedocs.io/en/latest/autoapi/magpie/services/index.html
    services:  # Contains the different resources that can be synchronized, ordered by service type
      thredds:  # Service type, which should also exist in Magpie
        # Resource key (ex.: thredds_workspace): Custom name to represent a resource path.
        #
        # Example of resource that uses variables and a `MULTI_TOKEN`.
        # Here, with the config below, if we have an input resource path
        # `/geoserver/workspaces/user_xyz/dir1/dir2/file_abc` that matches with the `geoserver_workspace` resource key,
        # the `user` variable name would be matched with `user_xyz` and `synced_file`, with `file_abc`.
        # Also, this key would need to sync permissions with the `thredds_workspace` resource key, considering the
        # `permissions_mapping` defined below. The `thredds_workspace` would be deduced to the resource path
        # `/thredds/catalog/workspaces/user_xyz/dir1/dir2/subdir/file_abc`.
        # The types of each segment of this target resource path would be deduced
        # from the `thredds_workspace` config below.
        thredds_workspace:
          - name: thredds
            type: service
          # not a resource in Magpie
          # 'catalog' is the file/view format specifier for the rest of the path
          # - name: catalog
          #   type: directory
          - name: workspaces
            type: directory
          - name: "{user}"
            type: directory
          - name: "**"
            type: directory
          - name: subdir
            type: directory
          - name: "{synced_file}"
            type: file
      geoserver:
        geoserver_workspace:
          - name: geoserver
            type: service
          - name: workspaces
            type: workspace
          - name: "{user}"
            type: workspace
          - name: "**"
            type: workspace
          - name: "{synced_file}"
            type: workspace
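
As a toy illustration of how the `{user}`, `**` and `{synced_file}` tokens in the config above turn one resource path into another (illustrative only, not Cowbird's actual implementation):

    import re

    # Toy version of the matching described in the config comments above.
    SRC = re.compile(r"^/geoserver/workspaces/(?P<user>[^/]+)/(?P<middle>.+)/(?P<synced_file>[^/]+)$")

    def map_to_thredds(geoserver_path: str) -> str:
        m = SRC.match(geoserver_path)
        if not m:
            raise ValueError(f"does not match geoserver_workspace: {geoserver_path}")
        # '{user}' and '{synced_file}' are single segments, while '**' (the MULTI_TOKEN)
        # matches any number of intermediate segments.
        return f"/thredds/catalog/workspaces/{m['user']}/{m['middle']}/subdir/{m['synced_file']}"

    # Reproduces the example given in the config comments:
    assert map_to_thredds("/geoserver/workspaces/user_xyz/dir1/dir2/file_abc") == \
        "/thredds/catalog/workspaces/user_xyz/dir1/dir2/subdir/file_abc"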

It also depends on the FileSystem handler, which uses the on_created, on_modified, on_deleted, permission_created and permission_deleted events triggered by either Magpie permission webhooks or file-system monitoring of the user workspaces:
https://github.com/Ouranosinc/cowbird/blob/e2aa5337e32cd87efb5600f3fe62882d8d4d8b1f/cowbird/handlers/impl/filesystem.py#L226

Currently, users cannot themselves set permissions for their user-workspace files unless they have Magpie admin privileges.
Magpie could employ user-context requests (such as when a user edits their own profile at /magpie/ui/users/current) to allow sharing their own files. However, it is tricky to display a partial resource hierarchy without leaking resources of other users (this is why admin-only API/UI are used for now). There is a concept of "owner" (in the DB) for Magpie resources, but it is not currently used to check access to them. Non-trivial adjustments (new UI pages, new API endpoints) would have to be made in Magpie to support user-owned permission editing.
Relates to Ouranosinc/Magpie#170

@mishaschwartz (Collaborator, Author):

Discussion continues here:

#425 (comment)

mishaschwartz added a commit that referenced this issue Dec 3, 2024
## Overview
 
The recommended public share folders in the `env.local.example` file create a conflict with the default `PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR` path when both are enabled and mounted on a Jupyterlab container. This change updates the recommended paths for the public share folders to avoid the conflict and adds a warning to help users avoid it.

Note: the conflict arises when the public share folder is mounted into the container as a read-only volume and JupyterHub then tries to mount the `PUBLIC_WORKSPACE_WPS_OUTPUTS_SUBDIR` volume nested inside it. Since the parent volume is read-only, the mount point for the nested volume cannot be created and the second mount fails.

## Changes

**Non-breaking changes**
None, documentation only

**Breaking changes**
None

## Related Issue / Discussion

- Resolves #392


## CI Operations


birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false