Hub for Univ. Grenoble satellite team #207

Closed
choldgraf opened this issue Feb 9, 2021 · 32 comments

Comments

@choldgraf
Member

choldgraf commented Feb 9, 2021

We need to deploy a Pangeo-style hub for the University of Grenoble. Here's a rough list of "extra" packages that they listed in their questionnaire:

cartopy, dask, distributed, gsw, intake, mamba, matplotlib, numpy, scipy, xarray, xesmf, xgcm, xhistogram,
xmitgcm, xrft, xscale, zarr

They are also going to have like 10-20TB of data stored in Wasabi hot storage somewhere (@rabernat maybe you have context on this?)

Hub Info

@choldgraf
Member Author

I've added some metadata about this hub after hearing back from the Grenoble team. I wonder what @yuvipanda or @GeorgianaElena thinks about using that hub info section as a GitHub Issue template for when we add new hubs, so that we can keep track of what we are running (either in this repo, or in a hubs/ repo or something?).

Note that the "link to config" section also reminds me of a question @yuvipanda had a few weeks back, about whether hub configs should be in a sub-directory rather than in a single YAML file. One benefit of this would be that we can more easily create permalinks to the config...

@choldgraf
Member Author

Is the process for creating this hub as simple as just adding a new entry to the hubs.yaml file? If so I can try this tomorrow and send to them

@choldgraf choldgraf added Hub and removed Needs Hub labels Mar 4, 2021
yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue May 25, 2021
- traefik tag bump was required to get LE working. It's
  already bumped in newer z2jh versions
- NFS server was again set up manually, and needed the `insecure`
  flag - even though other hubs are setup the same way and didn't
  need this. NFS situation needs to be sorted.

Ref 2i2c-org#207
@yuvipanda
Member

yuvipanda commented May 27, 2021

Adding more context - @roxyboy and others have gotten a GCP project + billing account where we're moving the hub to (with #429). This will let them scale up more than they can now.

@roxyboy

roxyboy commented May 27, 2021

The JupyterLab option that was available during the pilot version seems to be gone, and only the notebook interface is available now. Could we add the lab option back?

@yuvipanda
Member

@roxyboy we default to opening in JupyterLab now - can you give it a shot?

@roxyboy

roxyboy commented May 28, 2021

@roxyboy we default to opening in JupyterLab now - can you give it a shot?

Yes, it is now JupyterLab :)

@yuvipanda
Member

@roxyboy yay, ok. You have a 1GB memory limit now, and dask-gateway limits aren't set by default. What would you like these to be?

@roxyboy

roxyboy commented May 28, 2021

@roxyboy yay, ok. You have a 1GB memory limit now, and dask-gateway limits aren't set by default. What would you like these to be?

@auraoupa @lesommer Do you think 100GB of RAM and a maximum of 32 cores would be enough?

@yuvipanda
Member

Yeah, something from https://cloud.google.com/compute/vm-instance-pricing#e2_predefined - the 'e2' series of machines. e2-standard-32 - 32 cores + 128GB? We can also make a few smaller flavors available so you don't have to spend that many resources unless you need them
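
For illustration, a set of "flavors" like the ones described here could be expressed as a KubeSpawner profile list. This is a minimal sketch, not the hub's actual configuration: the two smaller machine types, the labels, and the helper `make_profile` are assumptions chosen for the example. Only e2-standard-32 (32 cores, 128GB) comes from the comment above.

```python
# Hypothetical size options. Only e2-standard-32 is named in the thread;
# the smaller e2 machine types are assumptions added for illustration.
MACHINE_TYPES = {
    "small": ("e2-standard-4", 4, 16),
    "medium": ("e2-standard-16", 16, 64),
    "large": ("e2-standard-32", 32, 128),
}


def make_profile(label, machine, cores, mem_gb):
    """Build one KubeSpawner profile entry for a given machine type."""
    return {
        "display_name": f"{label}: {cores} cores, {mem_gb} GB ({machine})",
        "kubespawner_override": {
            "cpu_limit": cores,
            "mem_limit": f"{mem_gb}G",
            # Schedule the user pod onto a node of the matching machine type.
            "node_selector": {"node.kubernetes.io/instance-type": machine},
        },
    }


profile_list = [make_profile(label, *spec) for label, spec in MACHINE_TYPES.items()]

for profile in profile_list:
    print(profile["display_name"])
```

In a `jupyterhub_config.py`, a list of this shape would typically be assigned to `c.KubeSpawner.profile_list`, so that users pick a size when starting their server.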

@roxyboy

roxyboy commented May 28, 2021

Yeah, something from https://cloud.google.com/compute/vm-instance-pricing#e2_predefined - the 'e2' series of machines. e2-standard-32 - 32 cores + 128GB? We can also make a few smaller flavors available so you don't have to spend that many resources unless you need them

Sounds great to me :) Having options for resource allocation would be nice.

@yuvipanda
Member

Alright, for now I'll just wait for you to stop using it and then bump up the sizes to not disrupt what you're doing now :)

@roxyboy

roxyboy commented May 28, 2021

Alright, for now I'll just wait for you to stop using it and then bump up the sizes to not disrupt what you're doing now :)

Ok, I just logged out :)

@yuvipanda
Member

I'm fiddling with it some more :)

@roxyboy

roxyboy commented Jun 1, 2021

@yuvipanda @choldgraf Could we set up a storage system for analysis data? @rabernat suggested that either an NSF storage or Pangeo bucket would be the way to go: pangeo-forge/staged-recipes#14 (comment)

@lesommer Do we have funding to pay for such additional storage?

The largest chunk of storage would come from the post-processing of LLC4320 hourly data to daily averages.

@yuvipanda
Member

@roxyboy we'll set up a SCRATCH_BUCKET that can function exactly like PANGEO_BUCKET. Do you have an estimate of how much data you might store?

@roxyboy

roxyboy commented Jun 1, 2021

@roxyboy we'll set up a SCRATCH_BUCKET that can function exactly like PANGEO_BUCKET. Do you have an estimate of how much data you might store?

I think it'll be around 100-500 GB.

@yuvipanda
Member

@roxyboy cool! I'll try to get a writeable SCRATCH_BUCKET working soon.

@yuvipanda
Member

@roxyboy ok so there are now two environment variables:

  1. DATA_BUCKET, pointing to a GCS bucket anyone can write to, common to everyone in the hub. This can be for hub-wide processed common data. This will be the same value for every user.
  2. SCRATCH_BUCKET, pointing to a GCS bucket (+ prefix) specific to each user. This is the same as PANGEO_SCRATCH in the Pangeo-maintained hubs: it provides storage space for users to stash intermediate results in. This value will be different for every user.

Try it out and let me know if this works for your purpose?

@roxyboy

roxyboy commented Jun 3, 2021

@roxyboy ok so there are now two environment variables:

  1. DATA_BUCKET, pointing to a GCS bucket anyone can write to, common to everyone in the hub. This can be for hub-wide processed common data. This will be the same value for every user.
  2. SCRATCH_BUCKET, pointing to a GCS bucket (+ prefix) specific to each user. This is same as PANGEO_SCRATCH in the pangeo maintained hubs - provides a storage space for users to stash intermediary results in. This value will be different for every user.

Try it out and let me know if this works for your purpose?

Awesome! Thanks @yuvipanda. Could you provide an example of the syntax for writing to the buckets? I tried the following but got an error...


import os
SCRATCH = os.environ['SCRATCH_BUCKET']
DATA = os.environ['DATA_BUCKET']
# -> gs://pangeo-scratch/<username>
import fsspec
mapper = fsspec.get_mapper(f'{SCRATCH}/roxyboy')
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-3-38b48cecfc2a> in <module>
      1 import os
----> 2 SCRATCH = os.environ['SCRATCH_BUCKET']
      3 DATA = os.environ['DATA_BUCKET']
      4 # -> gs://pangeo-scratch/<username>
      5 import fsspec

/srv/conda/envs/notebook/lib/python3.8/os.py in __getitem__(self, key)
    673         except KeyError:
    674             # raise KeyError with the original key value
--> 675             raise KeyError(key) from None
    676         return self.decodevalue(value)
    677 

KeyError: 'SCRATCH_BUCKET'
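
The traceback shows that `SCRATCH_BUCKET` is not present in this server's environment at all, so the failure happens before fsspec is involved. Below is a minimal sketch of a check that reports every missing variable at once. The helper name `bucket_urls` and the stand-in values are hypothetical, not part of the hub.

```python
import os

REQUIRED = ("DATA_BUCKET", "SCRATCH_BUCKET")


def bucket_urls(environ=None):
    """Return both bucket URLs, or raise an error naming what is missing."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED if name not in environ]
    if missing:
        raise RuntimeError(
            "not set in this server's environment: " + ", ".join(missing)
        )
    return {name: environ[name] for name in REQUIRED}


# Stand-in environment with made-up values, so the sketch runs anywhere.
print(bucket_urls({
    "DATA_BUCKET": "gcs://example-data",
    "SCRATCH_BUCKET": "gcs://example-scratch/someuser",
}))
```

Called without arguments on the hub, `bucket_urls()` reads the real environment of the running server.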

@yuvipanda
Member

You might have to stop and start your server again. I had to do a couple more tweaks. Now, I can test with:

>>> import gcsfs
>>> import os
>>> fs.ls(os.environ['DATA_BUCKET'])
[]
>>> fs.put('Untitled.ipynb', os.environ['SCRATCH_BUCKET'] + '/test')
>>> fs.ls(os.environ['SCRATCH_BUCKET'])
['meom-ige-scratch/yuvipanda/test']

Does that work for you?

yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Jun 3, 2021
@roxyboy

roxyboy commented Jun 3, 2021

You might have to stop and start your server again. I had to do a couple more tweaks. Now, I can test with:

>>> import gcsfs
>>> import os
>>> fs.ls(os.environ['DATA_BUCKET'])
[]
>>> fs.put('Untitled.ipynb', os.environ['SCRATCH_BUCKET'] + '/test')
>>> fs.ls(os.environ['SCRATCH_BUCKET'])
['meom-ige-scratch/yuvipanda/test']

Does that work for you?

Sorry, what's the module fs? It seems that I have access now to the buckets :)

SCRATCH = os.environ['SCRATCH_BUCKET']
SCRATCH
'gcs://meom-ige-scratch/roxyboy'

Apologies for the elementary question, but to save files (Zarr stores, for example), do I use fs.put()?

@yuvipanda
Member

Copy paste fail!

fs = gcsfs.GCSFileSystem()

fs.put is probably a good start, but I'm guessing there might be more optimized ones when you're uploading a lot of them. I am very much a n00b at the moment, though.
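
Putting the pieces from this exchange together, here is a minimal sketch of uploading a file and writing a Zarr store to the per-user scratch prefix. The helper names (`scratch_path`, `upload`, `write_zarr`) are hypothetical. The Zarr part assumes the mapper-based idiom that xarray and fsspec supported around the time of this thread, and the heavier imports are deferred so that the path helper runs without them.

```python
import os


def scratch_path(name, environ=None):
    """Join a relative name onto the per-user SCRATCH_BUCKET prefix."""
    environ = os.environ if environ is None else environ
    return environ["SCRATCH_BUCKET"].rstrip("/") + "/" + name.lstrip("/")


def upload(local_file, name):
    """Copy one local file into the scratch prefix, then list the prefix."""
    import gcsfs  # deferred import: only needed on the hub

    fs = gcsfs.GCSFileSystem()
    fs.put(local_file, scratch_path(name))
    return fs.ls(os.environ["SCRATCH_BUCKET"])


def write_zarr(ds, name):
    """Write an xarray Dataset straight to the bucket as a Zarr store."""
    import fsspec  # deferred import: only needed on the hub

    # mode="w" overwrites an existing store at the same path.
    ds.to_zarr(fsspec.get_mapper(scratch_path(name)), mode="w")


# Stand-in environment so the path helper can be tried anywhere.
print(scratch_path("test", {"SCRATCH_BUCKET": "gcs://meom-ige-scratch/roxyboy"}))
```

Writing through a mapper sends the chunks directly to the bucket, whereas `fs.put` copies something that already exists on local disk. A Zarr store is a directory, so uploading one with `fs.put` would additionally need `recursive=True`.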

@rabernat
Contributor

rabernat commented Jun 3, 2021

The Pangeo Cloud Docs provide documentation on writing to cloud object storage.

yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Jun 8, 2021
yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Jun 9, 2021
yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Jun 10, 2021
yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Jun 10, 2021
yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Jun 16, 2021
@roxyboy

roxyboy commented Jun 18, 2021

Yeah, something from https://cloud.google.com/compute/vm-instance-pricing#e2_predefined - the 'e2' series of machines. e2-standard-32 - 32 cores + 128GB? We can also make a few smaller flavors available so you don't have to spend that many resources unless you need them

@yuvipanda Would bumping up the maximum to 64 cores + 256GB result in a significant increase in operational cost? If not, I think it would be nice to increase it, based on my experience so far on the JupyterHub.

@yuvipanda
Member

@roxyboy added that node size too. You can keep an eye on costs in https://console.cloud.google.com/billing/015AF3-346967-3DD18B/reports;grouping=GROUP_BY_SKU;projects=meom-ige-cnrs?project=meom-ige-cnrs&organizationId=353771151905 as well :)

@roxyboy

roxyboy commented Jun 22, 2021

The Pangeo Cloud Docs provide documentation on writing to cloud object storage.

I think I've managed to save some files to the SCRATCH_BUCKET but now I can't seem to figure out the syntax to read it... I've tried the following:

endpoint_url = 'https://meom-ige.2i2c.cloud/'
import s3fs
fs_osn = s3fs.S3FileSystem(anon=True, client_kwargs={'endpoint_url': endpoint_url},)
xr.open_zarr(fs_osn.get_mapper(f"gcs://meom-ige-scratch/roxyboy/region01/sep/W_06.zarr"))

which is similar to how I've been opening data sitting on OSN. I have a hunch that the endpoint_url is wrong but any suggestions what I should put there?

@roxyboy

roxyboy commented Jun 22, 2021

The Pangeo Cloud Docs provide documentation on writing to cloud object storage.

I think I've managed to save some files to the SCRATCH_BUCKET but now I can't seem to figure out the syntax to read it... I've tried the following:

endpoint_url = 'https://meom-ige.2i2c.cloud/'
import s3fs
fs_osn = s3fs.S3FileSystem(anon=True, client_kwargs={'endpoint_url': endpoint_url},)
xr.open_zarr(fs_osn.get_mapper(f"gcs://meom-ige-scratch/roxyboy/region01/sep/W_06.zarr"))

which is similar to how I've been opening data sitting on OSN. I have a hunch that the endpoint_url is wrong but any suggestions what I should put there?

Never mind, I realized I was using the wrong mapper. The following solved it :)

import gcsfs
import xarray as xr
gcs = gcsfs.GCSFileSystem(requester_pays=True)
xr.open_zarr(gcs.get_mapper("gcs://meom-ige-scratch/roxyboy/region01/sep/W_06.zarr"))
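
The earlier attempt failed because an S3 filesystem was pointed at a `gcs://` URL; the URL scheme is what determines which filesystem implementation applies. Here is a minimal sketch of letting fsspec choose the filesystem from the URL. The helper names `protocol_of` and `open_scratch_zarr` are hypothetical, and the mapper-based call assumes xarray, zarr, and fsspec versions comparable to those in use in this thread.

```python
from urllib.parse import urlparse


def protocol_of(url):
    """Return the URL scheme, which tells fsspec which filesystem to use."""
    return urlparse(url).scheme


def open_scratch_zarr(url, **storage_options):
    """Open a Zarr store lazily via the filesystem matching the URL scheme."""
    import fsspec  # deferred imports: only needed when opening real data
    import xarray as xr

    # Extra keyword arguments are forwarded to the filesystem constructor.
    return xr.open_zarr(fsspec.get_mapper(url, **storage_options))


url = "gcs://meom-ige-scratch/roxyboy/region01/sep/W_06.zarr"
print(protocol_of(url))
# On the hub: ds = open_scratch_zarr(url, requester_pays=True)
```

With this approach no `endpoint_url` is needed for GCS; that setting was specific to the S3-compatible OSN storage.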

@roxyboy

roxyboy commented Jul 26, 2021

Could I ask for an update for dask and distributed to 2021.7?

@choldgraf
Member Author

choldgraf commented Jul 26, 2021

@roxyboy we're trying out a new workflow that uses FreshDesk as a "point of contact" for support. To use it, simply send an email to:

[email protected]

Could you send a request for these updates to that address so we can see how this works out? Hopefully this will make it easier to keep track of requests.

On that note, I'd suggest that we close this issue, since the hub is now "deployed", and I think we are in maintenance mode. We can open up more issues to resolve specific updates/requests that come in via the support channel. Does anybody disagree?

@roxyboy

roxyboy commented Aug 3, 2021

@roxyboy we're trying out a new workflow that uses FreshDesk as a "point of contact" for support. To use it, simply send an email to:

[email protected]

Could you send a request for these updates to that address so we can see how this works out? Hopefully this will make it easier to keep track of requests.

On that note, I'd suggest that we close this issue, since the hub is now "deployed", and I think we are in maintenance mode. We can open up more issues to resolve specific updates/requests that come in via the support channel. Does anybody disagree?

I sent an email but am I supposed to receive a confirmation email...?

@choldgraf
Member Author

Ah I guess not - will send a response to you today to make sure it works as expected

@yuvipanda
Member

Closing as this hub has been functional for a while!
