
Additional packages for module 1 ESPM-157 #5827

Labels: package-request (package addition request for a hub)

@cboettig

cboettig commented Jun 30, 2024

Package Name

altair, ibis-framework[duckdb], leafmap[libremap], myst

Hub URL

nature.datahub.berkeley.edu

Course Name

ESPM 157

Semester Details

Fall 2024

Installation Deadline

Start of Semester.

Let me know if it would be better to do separate requests. Also, I know we can install most of these ourselves, but it looks like myst will require a working nodejs setup.

@balajialg
Contributor

balajialg commented Jul 25, 2024

@cboettig Added all the requested packages to https://staging.datahub.berkeley.edu/ via PR1 & PR2. I tested this notebook from the ESPM 157 GitHub repo; it seems ibis-framework[pandas] is the package required to get the notebook to execute successfully. Do you also need the ibis-framework package, which has a larger scope than ibis-framework[pandas]? See https://ibis-project.org/install

With ibis-framework & without ibis-framework[pandas], I get the following error.
[Screenshot of the resulting error, 2024-07-25]

@balajialg
Contributor

Closing this issue as changes got merged to prod. @cboettig Please reopen this issue if that is not the case.

@cboettig
Author

cboettig commented Aug 7, 2024

@balajialg thanks much! Apologies, I'm looping back here and catching up. Yes, can we have both the pandas and duckdb backends for ibis, i.e. ibis-framework[pandas,duckdb]? (We may be able to stick with pure duckdb if duckdb/duckdb#13121 is merged soon 😁)

I don't think I have permissions to re-open.

  • Also, this maybe should be a separate request, but would it be possible to have a 4 GB allocation for all ESPM-157 students? Testing right now, on my allocation I see just 1 GB on the instance. We can probably make that work if necessary, but it is rather slim...
  • I see we have the code-server proxy on datahub.berkeley.edu, but the Jupyter extensions aren't installed. Would it be possible to add these? I tried adding them manually but get an error about a version conflict with the code-server version currently on the hub.

@balajialg
Contributor

@cboettig Sure, I am assuming you want the https://pypi.org/project/jupyter-vscode-server/ extension. Are there any other code-server extensions you want as part of the image?

For increasing RAM, we need the bCourses id for your FA 24 course. Can you either share the bCourses id here or create a separate request via this GitHub issue? I will create a PR to address all the above requests.

@balajialg balajialg reopened this Aug 7, 2024
@cboettig
Author

cboettig commented Aug 7, 2024

@balajialg Thanks much!! It looks like jupyter-vscode-server is already installed on datahub.berkeley.edu. However, code-server itself has its own extension system, and it can't install:

[Screenshot of the extension install error]

For whatever reason, code-server also doesn't detect a Python installation (probably something about how conda is configured?). It would be great if users could go to e.g. https://datahub.berkeley.edu/user/cboettig/vscode/ (or the appropriate link) and be able to start running an ipynb notebook there.

@balajialg
Contributor

Thanks @cboettig, will look into it.

@cboettig
Author

cboettig commented Aug 7, 2024

@balajialg One more Python module request for you. (Sorry, I'm just finding so much awesome stuff in the Python ecosystem while migrating from the R version of this course. I know we could use quarto, but Myst is 😻)

  • install jupyterlab_myst

@balajialg
Contributor

@cboettig No worries, keep it coming :P

@ryanlovett
Collaborator

ryanlovett commented Aug 7, 2024

@cboettig Do you mean pre-install the Jupyter and Python extensions?

@balajialg I think this can be done by adding something like this to the docker image:

export VSCODE_EXTENSIONS=${CONDA_DIR}/share/code-server/extensions
code-server --extensions-dir ${VSCODE_EXTENSIONS} --install-extension ms-python.python --force
code-server --extensions-dir ${VSCODE_EXTENSIONS} --install-extension ms-toolsai.jupyter

To be reproducible, we'd need to first download specific versions and then install, e.g.

export VSCODE_EXTENSIONS=${CONDA_DIR}/share/code-server/extensions

VSIX=2024.13.2024080701
wget https://marketplace.visualstudio.com/_apis/public/gallery/publishers/ms-python/vsextensions/python/${VSIX}/vspackage -O /tmp/ms-python.vsix
code-server --extensions-dir ${VSCODE_EXTENSIONS} --install-extension /tmp/ms-python.vsix 
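On top of pinning the version, a checksum check could make the image build fail loudly if the marketplace artifact ever changes upstream. A sketch (the scratch file below stands in for the real downloaded /tmp/ms-python.vsix):

```shell
# Sketch: record a sha256 for the downloaded VSIX once, then verify it on
# every rebuild so a silently changed artifact aborts the build.
VSIX_FILE=$(mktemp)
printf 'placeholder vsix bytes' > "$VSIX_FILE"              # stand-in artifact
EXPECTED_SHA256=$(sha256sum "$VSIX_FILE" | cut -d' ' -f1)   # record this once
echo "${EXPECTED_SHA256}  ${VSIX_FILE}" | sha256sum -c -    # verify on rebuild
```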

@balajialg
Contributor

Thanks @ryanlovett! These are helpful instructions. I will get to this ASAP!

@balajialg
Contributor

balajialg commented Aug 8, 2024

@ryanlovett Just created a PR - #5915 which addresses Carl's requirement

@balajialg
Contributor

balajialg commented Aug 16, 2024

@cboettig Just wanted to give you a heads up about where things currently stand. We are seeing 3 different challenges in getting the extension setup working in VSCode as expected before the start of instruction:

a) Downloading VSIX files for the required VSCode extensions via wget/curl is failing: there is a discrepancy in the size of the VSIX files when downloaded via the GUI vs. the CLI. We might need to download them via the UI, place them in the datahub directory, and reference them in the docker image. Ref: microsoft/vscode-python#17491
b) Ensuring that a single Python kernel is selected in the VSCode interface among multiple kernels requires 3rd-party extensions which are not available in code-server, aka the open-source build of VSCode. Ref: https://coder.com/docs/code-server/FAQ#how-does-the-config-file-work. We are looking at other options to circumvent this.
c) Hiding other kernels so that a default kernel is selected needs to be done on a per-user/workspace basis, and from the limited information we have currently, it might be hard to change the default settings. Ref: microsoft/vscode-jupyter#13032 (comment)
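A possible explanation for (a), offered as a guess: the Marketplace vspackage endpoint serves gzip-encoded responses, which a browser decodes transparently but wget saves as-is, so the CLI file would be the compressed bytes. A quick sketch for telling the two apart (the `is_vsix` helper is illustrative, not an existing tool):

```shell
# A valid .vsix is a zip archive and starts with the magic bytes "PK" (0x504b);
# a gzip-wrapped download starts with 0x1f 0x8b instead.
# is_vsix is an illustrative helper, not an existing tool.
is_vsix() {
  [ "$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')" = "504b" ]
}

printf 'PK\003\004' > /tmp/fake.vsix       # zip magic: looks like a real VSIX
printf '\037\213\010' > /tmp/fake.vsix.gz  # gzip magic: needs decompressing
is_vsix /tmp/fake.vsix && echo "zip: ok"
is_vsix /tmp/fake.vsix.gz || echo "gzip: decompress first"
```

If the CLI download turns out to be gzip-wrapped, `curl --compressed` (or a `gunzip` pass after downloading) may be all that's needed.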

Jonathan, Ryan and I met yesterday for a debugging session to get this to work (Jonathan and I had a marathon meeting lasting 3+ hours). We will be meeting again next week to hash out the next steps.

Being transparent: if we cannot get it to work the way you want during the next meeting, we might need to postpone this request by a month or so, since the start of the semester is a really hectic time to deep-dive into a single issue.

@cboettig
Author

OK, thanks for the heads up. I think we'll try to stick with the pure JupyterHub interface then.

@cboettig
Author

@balajialg @ryanlovett

Hopefully a quick/small thing here. I'd like students to be able to delete folders/directories even if they aren't empty. (I find most GUIs allow this, and not being able to delete non-empty dirs confuses students.) I gather it's possible, but I couldn't figure out the configuration option (see https://discourse.jupyter.org/t/recursive-delete-directories-in-file-browser/27468/2). Any ideas? Is this something we could enable out of the box?

(I know this is potentially destructive, but I really emphasize that students should treat hub storage as ephemeral anyway, and I want them to be good stewards of that space, cleaning up storage without my having to teach bash commands.)

@felder
Contributor

felder commented Aug 19, 2024

@cboettig one concern that I'd have is that we're currently talking about the datahub hub, so a config change for one class will impact all others on the same hub. I wholeheartedly support the idea of being a good steward of resources and informing students to consider hub storage as ephemeral, including encouraging folks to do their own backups. I wish more people did this!

However, as you've indicated, this is potentially destructive and represents a departure from how non-empty directories are currently treated. It could present a problem for other courses where less focus is spent on resource management. What I'd love to see here is an upstream change where a recursive delete requires a confirmation, perhaps even with a checkbox to acknowledge that one is deleting a non-empty directory. That would allow for this functionality with less chance of a destructive mistake.

@ryanlovett
Collaborator

I think it is possible to mount code containing a FileContentsManager.always_delete_dir setting into the Jupyter config directory just for a specific course. We do something like this in hub/values.yml:

singleuser:
  extraFiles:
    jupyter_server_config.json:
      mountPath: /usr/local/etc/jupyter/jupyter_server_config.json
      data:
        FileContentsManager:
          always_delete_dir: true

it would need to be ported to kubespawner override form in order to take effect for just the classes we want. I think it would need to be a volumeMount, e.g.

volumeMounts:
- name: home
  subPath: _espm-157/espm-157.json
  mountPath: /usr/local/etc/jupyter/jupyter_server_config.d/espm-157.json

where we put the config in the filestore in a course subdirectory. We do this for shared directories, but instead of sharing data, we'd "share" configuration. The contents of the file would be:

{"FileContentsManager":{"always_delete_dir":true}}

We may or may not need to pre-create /usr/local/etc/jupyter/jupyter_server_config.d. (I don't know if Kubernetes would create that for us when it performs the mount.)

See /srv/conda/etc/jupyter/jupyter_server_config.d for config files installed by packages.

Note that I'd rather this config be deployed from the repo than from a directory on the filestore. One way to do that would be to put the configuration data inside a ConfigMap. There's no existing example of that, but it may not be too hard. The above should work, though.
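As a concrete sanity check, the per-course file could be generated and validated before it lands on the filestore. A sketch (the temp directory stands in for the `_espm-157/` course subdirectory):

```shell
# Sketch: write the per-course Jupyter config that the volumeMount above would
# expose at /usr/local/etc/jupyter/jupyter_server_config.d/espm-157.json,
# then confirm it is well-formed JSON before deploying.
COURSE_DIR=$(mktemp -d)   # stand-in for the course subdirectory on the filestore
cat > "$COURSE_DIR/espm-157.json" <<'EOF'
{"FileContentsManager": {"always_delete_dir": true}}
EOF
python3 -m json.tool "$COURSE_DIR/espm-157.json"
```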

@cboettig
Author

@balajialg for some reason import seaborn seems broken for me on the hub? I get this traceback:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[1], line 2
      1 import ibis
----> 2 import seaborn as sns

File /srv/conda/lib/python3.11/site-packages/seaborn/__init__.py:9
      7 from .categorical import *  # noqa: F401,F403
      8 from .distributions import *  # noqa: F401,F403
----> 9 from .matrix import *  # noqa: F401,F403
     10 from .miscplot import *  # noqa: F401,F403
     11 from .axisgrid import *  # noqa: F401,F403

File /srv/conda/lib/python3.11/site-packages/seaborn/matrix.py:12
      9 import pandas as pd
     10 from scipy.cluster import hierarchy
---> 12 from . import cm
     13 from .axisgrid import Grid
     14 from .utils import (
     15     despine,
     16     axis_ticklabels_overlap,
   (...)
     19     _draw_figure,
     20 )

File /srv/conda/lib/python3.11/site-packages/seaborn/cm.py:1582
   1579     _cmap_r = colors.ListedColormap(_lut[::-1], _name + "_r")  
   1580     locals()[_name + "_r"] = _cmap_r
-> 1582     mpl_cm.register_cmap(_name, _cmap)
   1583     mpl_cm.register_cmap(_name + "_r", _cmap_r)
   1585 del colors, mpl_cm

AttributeError: module 'matplotlib.cm' has no attribute 'register_cmap'

Also, lemme know if we might get Myst markdown back. And let me know if you have a reproducible example of the issue; I saw the forum post you linked and tried asking around a bit, and I only hear that myst should work well with LaTeX.

@cboettig
Author

(It looks like I can resolve the seaborn issue by upgrading via conda install seaborn, but it would be much preferable to have seaborn working out of the box on day one of class next week... 😟)

@felder
Contributor

felder commented Aug 20, 2024

@cboettig currently seaborn is version locked on datahub:

datahub/images/default/environment.yml:27:- seaborn=0.11.*

When I look:

jovyan@jupyter-felder:~$ mamba list | grep -i seaborn

seaborn                   0.11.2               hd8ed1ab_0    conda-forge
seaborn-base              0.11.2             pyhd8ed1ab_0    conda-forge
jovyan@jupyter-felder:~$ mamba install seaborn

Looking for: ['seaborn']

conda-forge/noarch                                  16.1MB @  29.0MB/s  0.9s
conda-forge/linux-64                                37.1MB @  28.9MB/s  2.4s

Pinned packages:
  - python 3.11.*


Transaction

  Prefix: /srv/conda

  All requested packages already installed

jovyan@jupyter-felder:~$ mamba update seaborn

Looking for: ['seaborn']

conda-forge/linux-64                                        Using cache
conda-forge/noarch                                          Using cache

Pinned packages:
  - python 3.11.*


Transaction

  Prefix: /srv/conda

  Updating specs:

   - seaborn
   - ca-certificates
   - certifi
   - openssl


  Package         Version  Build         Channel          Size
────────────────────────────────────────────────────────────────
  Upgrade:
────────────────────────────────────────────────────────────────

  - seaborn-base   0.11.2  pyhd8ed1ab_0  conda-forge     229kB
  + seaborn-base   0.13.2  pyhd8ed1ab_2  conda-forge     235kB
  - seaborn        0.11.2  hd8ed1ab_0    conda-forge       4kB
  + seaborn        0.13.2  hd8ed1ab_2    conda-forge       7kB

  Summary:

  Upgrade: 2 packages

  Total download: 242kB

────────────────────────────────────────────────────────────────


Confirm changes: [Y/n] y
seaborn                                              7.0kB @  47.6kB/s  0.1s
seaborn-base                                       234.6kB @   1.4MB/s  0.2s

Downloading and Extracting Packages:

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
jovyan@jupyter-felder:~$ 

It is definitely not recommended to have students run conda to update packages manually, as those updates will not persist between server launches. Can you please confirm that seaborn 0.13.2 meets your needs (i.e., is the version that gets deployed when you run conda install)? If so, we can probably adjust the version lock for seaborn to deploy 0.13.2.

Also, for the future: we version-lock packages so that subsequent deployments during the semester will not result in unintended package changes, which have a habit of breaking things. It's really helpful if, when specifying packages, you also specify versions.
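The pin bump itself would be a one-line change to the image spec. A sketch against a scratch copy (the real file is datahub/images/default/environment.yml, per the grep output above):

```shell
# Sketch: bump the seaborn pin from 0.11.* to 0.13.* in an environment.yml.
ENV_YML=$(mktemp)
printf -- '- seaborn=0.11.*\n' > "$ENV_YML"              # scratch copy
sed -i 's/seaborn=0\.11\.\*/seaborn=0.13.*/' "$ENV_YML"
grep 'seaborn=' "$ENV_YML"   # now reads: - seaborn=0.13.*
```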

@shaneknapp
Contributor

@cboettig this is live on nature-staging, please let me know if things are working properly for you. thanks!

@cboettig
Author

all lookin good!

@cboettig
Author

another!

  • exactextract (from conda-forge)

@shaneknapp
Contributor

another!

* exactextract (from conda-forge)

live on nature-staging!

@shaneknapp
Contributor

also live on prod... if you need anything changed pls re-open and let us know

@cboettig
Author

could we add mystmd?

@shaneknapp
Contributor

@cboettig this is now live on nature-staging.d.b.e, please test and let me know if it looks good to you!

@cboettig
Author

@shaneknapp thanks! Just testing: for PDF output it wants latexmk. (Seems that adding this from conda should do it, I think?)

@shaneknapp
Contributor

alright this is live on nature-staging! give it a test drive and let me know if you need anything else. :)

@cboettig
Author

all good!

@cboettig
Author

another one:

  • langchain_chroma

I think this one needs to be pip installed. Thanks!

@shaneknapp
Contributor

this is live on nature-staging!

@cboettig
Author

🎉 looks great!

@cboettig
Author

whoops, and found another -- duckdb-engine

(I tested with pip install, though I think this one should be available with conda too(?). I've noticed we can pip install from the hub, but I guess the conda solver is more RAM-hungry and any install request just crashes the pod.)

@shaneknapp
Contributor

whoops, and found another -- duckdb-engine

(I tested with pip install; though I think this one should be available with conda too(?). I've noticed we can pip install from the hub but I guess the conda-solver is more RAM hungry and any install request just crashes the pod).

yeah, the conda solver is definitely RAM-hungry. this built locally, so i've created a PR.

@shaneknapp
Contributor

duckdb-engine is live on nature-staging

@cboettig
Author

perfect!
