Hub for Univ. Grenoble satellite team #207

choldgraf · 2021-02-09T03:40:14Z

We need to deploy a Pangeo-style hub for the University of Grenoble. Here's a rough list of "extra" packages that they listed in their questionnaire:

cartopy, dask, distributed, gsw, intake, mamba, matplotlib, numpy, scipy, xarray, xesmf, xgcm, xhistogram, xmitgcm, xrft, xscale, zarr

They are also going to have roughly 10-20TB of data stored in Wasabi hot storage (@rabernat, maybe you have context on this?)

Hub Info

Hub Type: Pangeo
Authentication Type: GitHub
Administrators:
- roxyboy
- lesommer
- auraoupa
Timeline: ASAP
Link to config: TODO
Link to URL: https://grenoble-swot.pilot.2i2c.cloud

choldgraf · 2021-02-10T20:36:57Z

I've added some metadata about this hub after hearing back from the Grenoble team. I wonder what @yuvipanda or @GeorgianaElena thinks about using that hub info section as a GitHub Issue template for when we add new hubs, so that we can keep track of what we are running (either in this repo, or in a hubs/ repo or something?).

Note that the "link to config" section also reminds me of a question @yuvipanda had a few weeks back, about whether hub configs should be in a sub-directory rather than in a single YAML file. One benefit of this would be that we can more easily create permalinks to the config...

choldgraf · 2021-02-17T00:58:58Z

Is the process for creating this hub as simple as adding a new entry to the hubs.yaml file? If so, I can try this tomorrow and send it to them.

- traefik tag bump was required to get LE working. It's already bumped in newer z2jh versions
- NFS server was again set up manually, and needed the `insecure` flag - even though other hubs are set up the same way and didn't need this. NFS situation needs to be sorted. Ref 2i2c-org#207

yuvipanda · 2021-05-27T06:29:49Z

Adding more context - @roxyboy and others have gotten a GCP project + billing account where we're moving the hub to (with #429). This will let them scale up more than they can now.

roxyboy · 2021-05-27T18:59:37Z

The JupyterLab option (which was available in the pilot version) seems to be gone, and only the notebook interface is available. Could we add the lab option back?

yuvipanda · 2021-05-28T02:19:49Z

@roxyboy we default to opening in JupyterLab now - can you give it a shot?

roxyboy · 2021-05-28T07:26:02Z

@roxyboy we default to opening in JupyterLab now - can you give it a shot?

Yes, it is now JupyterLab :)

yuvipanda · 2021-05-28T07:42:26Z

@roxyboy yay, ok. You have a 1GB memory limit now, and dask-gateway limits aren't set by default. What would you like these to be?

roxyboy · 2021-05-28T07:48:57Z

@roxyboy yay, ok. You have a 1GB memory limit now, and dask-gateway limits aren't set by default. What would you like these to be?

@auraoupa @lesommer Do you think 100GB of RAM and a maximum of 32 cores would be enough?

yuvipanda · 2021-05-28T08:37:23Z

Yeah, something from https://cloud.google.com/compute/vm-instance-pricing#e2_predefined - the 'e2' series of machines. e2-standard-32 - 32 cores + 128GB? We can also make a few smaller flavors available so you don't have to spend that many resources unless you need them
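
As a rough illustration of the sizing arithmetic: e2-standard machine types provide 4 GB of memory per vCPU, so a request for 32 cores and 100GB of RAM lands on e2-standard-32. The helper below is a hypothetical sketch for picking a flavor, not part of the hub's configuration; the function name is an assumption, and the size list reflects the public e2-standard family.

```python
# e2-standard-N machine types provide N vCPUs and 4 GB of memory per vCPU.
E2_STANDARD_VCPUS = [2, 4, 8, 16, 32]
GB_PER_VCPU = 4


def smallest_e2_standard(min_cores, min_memory_gb):
    """Return the smallest e2-standard type meeting both requirements, or None."""
    for vcpus in E2_STANDARD_VCPUS:
        if vcpus >= min_cores and vcpus * GB_PER_VCPU >= min_memory_gb:
            return f"e2-standard-{vcpus}"
    return None  # nothing in the e2-standard family is large enough


print(smallest_e2_standard(32, 100))
print(smallest_e2_standard(4, 16))
```

A request that exceeds 32 vCPUs returns None here, which signals that a different machine family would be needed.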

roxyboy · 2021-05-28T08:39:00Z

Yeah, something from https://cloud.google.com/compute/vm-instance-pricing#e2_predefined - the 'e2' series of machines. e2-standard-32 - 32 cores + 128GB? We can also make a few smaller flavors available so you don't have to spend that many resources unless you need them

Sounds great to me :) Having options for resource allocation would be nice.

yuvipanda · 2021-05-28T08:48:09Z

Alright, for now I'll just wait for you to stop using it and then bump up the sizes to not disrupt what you're doing now :)

roxyboy · 2021-05-28T09:01:49Z

Alright, for now I'll just wait for you to stop using it and then bump up the sizes to not disrupt what you're doing now :)

Ok, I just logged out :)

yuvipanda · 2021-05-28T19:27:04Z

I'm fiddling with it some more :)

roxyboy · 2021-06-01T18:39:23Z

@yuvipanda @choldgraf Could we set up a storage system for analysis data? @rabernat suggested that either an NSF storage or Pangeo bucket would be the way to go: pangeo-forge/staged-recipes#14 (comment)

@lesommer Do we have funding to pay for such additional storage?

The largest chunk of storage would come from the post-processing of LLC4320 hourly data to daily averages.

yuvipanda · 2021-06-01T19:45:18Z

@roxyboy we'll set up a SCRATCH_BUCKET that can function exactly like PANGEO_BUCKET. Do you have an estimate of how much data you might store?

roxyboy · 2021-06-01T20:22:27Z

@roxyboy we'll set up a SCRATCH_BUCKET that can function exactly like PANGEO_BUCKET. Do you have an estimate of how much data you might store?

I think it'll be around 100-500 GB.

yuvipanda · 2021-06-02T19:20:18Z

@roxyboy cool! I'll try to get a writeable SCRATCH_BUCKET working soon.

yuvipanda · 2021-06-03T19:33:30Z

@roxyboy ok so there are now two environment variables:

- DATA_BUCKET, pointing to a GCS bucket anyone can write to, common to everyone in the hub. This can be for hub-wide processed common data. This will be the same value for every user.
- SCRATCH_BUCKET, pointing to a GCS bucket (+ prefix) specific to each user. This is the same as PANGEO_SCRATCH in the Pangeo-maintained hubs: it provides a storage space for users to stash intermediary results in. This value will be different for every user.

Try it out and let me know if this works for your purpose?
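
For illustration, here is a minimal sketch of how the two variables might be turned into object paths. The helper name `bucket_url` and the example bucket values are assumptions made for this sketch; on the hub the values come from the real environment, which is populated when the user server starts.

```python
import os


def bucket_url(variable, *parts, environ=os.environ):
    """Join path components onto the bucket URL stored in an environment variable."""
    base = environ.get(variable)
    if base is None:
        # A server started before the variables were configured will not
        # have them, so give a clearer hint than a bare KeyError.
        raise RuntimeError(f"{variable} is not set; try restarting your server")
    return "/".join([base.rstrip("/"), *parts])


# Example values only (hypothetical bucket names).
example_env = {
    "SCRATCH_BUCKET": "gcs://example-scratch/some-user",
    "DATA_BUCKET": "gcs://example-data",
}
print(bucket_url("SCRATCH_BUCKET", "region01", "W_06.zarr", environ=example_env))
print(bucket_url("DATA_BUCKET", "shared", environ=example_env))
```

Because SCRATCH_BUCKET already ends in the per-user prefix, there should be no need to append the username again.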

roxyboy · 2021-06-03T20:13:36Z

@roxyboy ok so there are now two environment variables:

DATA_BUCKET, pointing to a GCS bucket anyone can write to, common to everyone in the hub. This can be for hub-wide processed common data. This will be the same value for every user.

SCRATCH_BUCKET, pointing to a GCS bucket (+ prefix) specific to each user. This is same as PANGEO_SCRATCH in the pangeo maintained hubs - provides a storage space for users to stash intermediary results in. This value will be different for every user.

Try it out and let me know if this works for your purpose?

Awesome! Thanks @yuvipanda. Could you provide example syntax for writing to the buckets? I tried the following but got an error...


import os
SCRATCH = os.environ['SCRATCH_BUCKET']
DATA = os.environ['DATA_BUCKET']
# -> gs://pangeo-scratch/<username>
import fsspec
mapper = fsspec.get_mapper(f'{SCRATCH}/roxyboy')
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-3-38b48cecfc2a> in <module>
      1 import os
----> 2 SCRATCH = os.environ['SCRATCH_BUCKET']
      3 DATA = os.environ['DATA_BUCKET']
      4 # -> gs://pangeo-scratch/<username>
      5 import fsspec

/srv/conda/envs/notebook/lib/python3.8/os.py in __getitem__(self, key)
    673         except KeyError:
    674             # raise KeyError with the original key value
--> 675             raise KeyError(key) from None
    676         return self.decodevalue(value)
    677 

KeyError: 'SCRATCH_BUCKET'

yuvipanda · 2021-06-03T21:11:51Z

You might have to stop and start your server again. I had to do a couple more tweaks. Now, I can test with:

>>> import gcsfs
>>> import os
>>> fs.ls(os.environ['DATA_BUCKET'])
[]
>>> fs.put('Untitled.ipynb', os.environ['SCRATCH_BUCKET'] + '/test')
>>> fs.ls(os.environ['SCRATCH_BUCKET'])
['meom-ige-scratch/yuvipanda/test']

Does that work for you?

roxyboy · 2021-06-03T21:32:09Z

You might have to stop and start your server again. I had to do a couple more tweaks. Now, I can test with:
>>> import gcsfs
>>> import os
>>> fs.ls(os.environ['DATA_BUCKET'])
[]
>>> fs.put('Untitled.ipynb', os.environ['SCRATCH_BUCKET'] + '/test')
>>> fs.ls(os.environ['SCRATCH_BUCKET'])
['meom-ige-scratch/yuvipanda/test']
Does that work for you?

Sorry, what's the module fs? It seems that I have access now to the buckets :)

SCRATCH = os.environ['SCRATCH_BUCKET']
SCRATCH
'gcs://meom-ige-scratch/roxyboy'

Apologies for the elementary question, but to save files (for example, zarr files), do I use fs.put()?

yuvipanda · 2021-06-03T22:01:58Z

Copy paste fail!

fs = gcsfs.GCSFileSystem()

fs.put is probably a good start, but I'm guessing there might be more optimized ones when you're uploading a lot of them. I am very much a n00b at the moment, though.
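
As one possible alternative to uploading a local copy with fs.put, a dataset can be written straight to the bucket as a zarr store through an fsspec mapper. The sketch below assumes xarray, zarr, fsspec and gcsfs are installed (as on a Pangeo-style image); the function names `scratch_store_url` and `save_dataset` are hypothetical.

```python
import os


def scratch_store_url(name, scratch=None):
    """Build the URL of a zarr store under the per-user scratch prefix."""
    scratch = scratch or os.environ["SCRATCH_BUCKET"]
    return f"{scratch.rstrip('/')}/{name}"


def save_dataset(ds, name, scratch=None):
    """Write an xarray Dataset to the scratch bucket as a zarr store."""
    # Imported here so the URL helper above works without fsspec installed.
    import fsspec

    url = scratch_store_url(name, scratch)
    # fsspec picks the filesystem (gcsfs for gcs:// URLs) from the protocol.
    store = fsspec.get_mapper(url)
    # mode="w" overwrites an existing store at the same location.
    ds.to_zarr(store, mode="w")
    return url


# Usage on the hub, assuming `ds` is an xarray.Dataset:
#   save_dataset(ds, "region01/sep/W_06.zarr")
```

Writing through the mapper sends each chunk to the bucket as it is produced, so no intermediate local copy is required.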

rabernat · 2021-06-03T22:05:36Z

The Pangeo Cloud Docs provide documentation on writing to cloud object storage.

roxyboy · 2021-06-18T09:31:44Z

Yeah, something from https://cloud.google.com/compute/vm-instance-pricing#e2_predefined - the 'e2' series of machines. e2-standard-32 - 32 cores + 128GB? We can also make a few smaller flavors available so you don't have to spend that many resources unless you need them

@yuvipanda Would bumping up the maximum to 64 cores + 256GB result in a significant increase in operational cost? If not, I think it would be nice to increase it, based on my experience so far on the JupyterHub.

yuvipanda · 2021-06-18T12:36:50Z

@roxyboy added that node size too. You can keep an eye on costs in https://console.cloud.google.com/billing/015AF3-346967-3DD18B/reports;grouping=GROUP_BY_SKU;projects=meom-ige-cnrs?project=meom-ige-cnrs&organizationId=353771151905 as well :)

roxyboy · 2021-06-22T14:09:26Z

The Pangeo Cloud Docs provide documentation on writing to cloud object storage.

I think I've managed to save some files to the SCRATCH_BUCKET but now I can't seem to figure out the syntax to read it... I've tried the following:

endpoint_url = 'https://meom-ige.2i2c.cloud/'
import s3fs
fs_osn = s3fs.S3FileSystem(anon=True, client_kwargs={'endpoint_url': endpoint_url},)
xr.open_zarr(fs_osn.get_mapper(f"gcs://meom-ige-scratch/roxyboy/region01/sep/W_06.zarr"))

which is similar to how I've been opening data sitting on OSN. I have a hunch that the endpoint_url is wrong but any suggestions what I should put there?

roxyboy · 2021-06-22T14:13:47Z

The Pangeo Cloud Docs provide documentation on writing to cloud object storage.

I think I've managed to save some files to the SCRATCH_BUCKET but now I can't seem to figure out the syntax to read it... I've tried the following:
endpoint_url = 'https://meom-ige.2i2c.cloud/'
import s3fs
fs_osn = s3fs.S3FileSystem(anon=True, client_kwargs={'endpoint_url': endpoint_url},)
xr.open_zarr(fs_osn.get_mapper(f"gcs://meom-ige-scratch/roxyboy/region01/sep/W_06.zarr"))
which is similar to how I've been opening data sitting on OSN. I have a hunch that the endpoint_url is wrong but any suggestions what I should put there?

Nevermind, I realized I was using the wrong mapper. The following solved it :)

import gcsfs
import xarray as xr
gcs = gcsfs.GCSFileSystem(requester_pays=True)
xr.open_zarr(gcs.get_mapper("gcs://meom-ige-scratch/roxyboy/region01/sep/W_06.zarr"))
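
The mismatch above was an S3 filesystem object being handed a gcs:// URL. A minimal sketch of checking which fsspec backend a URL calls for is shown below; the `backend_for` helper and the s3 example bucket are hypothetical, and the protocol table is deliberately not exhaustive.

```python
from urllib.parse import urlsplit

# fsspec protocol names and the package implementing each (not exhaustive).
BACKENDS = {"gcs": "gcsfs", "gs": "gcsfs", "s3": "s3fs"}


def backend_for(url):
    """Return the fsspec backend package that handles this URL's protocol."""
    protocol = urlsplit(url).scheme
    try:
        return BACKENDS[protocol]
    except KeyError:
        raise ValueError(f"no known backend for protocol {protocol!r}") from None


print(backend_for("gcs://meom-ige-scratch/roxyboy/region01/sep/W_06.zarr"))
print(backend_for("s3://some-bucket/some/key.zarr"))
```

In practice, fsspec.get_mapper(url) makes the same choice from the protocol prefix, which avoids constructing the filesystem object by hand.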

roxyboy · 2021-07-26T09:50:33Z

Could I ask for an update for dask and distributed to 2021.7?
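
For reference, a small sketch of how the installed versions could be checked from a notebook before and after such an update. It assumes dask and distributed use calendar versions (for example 2021.7.0) and are expected to match; the helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def calver(text):
    """Parse the leading numeric fields of a version like '2021.7.0' into ints."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


def check_dask_versions(minimum="2021.7"):
    """Report whether dask and distributed are installed, matching, and recent enough."""
    try:
        dask_v, dist_v = version("dask"), version("distributed")
    except PackageNotFoundError as err:
        return f"not installed: {err}"
    if dask_v != dist_v:
        return f"mismatch: dask {dask_v}, distributed {dist_v}"
    if calver(dask_v) < calver(minimum):
        return f"dask {dask_v} is older than {minimum}"
    return f"ok: dask and distributed {dask_v}"


print(check_dask_versions())
```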

choldgraf · 2021-07-26T10:54:26Z

@roxyboy we are trying out a new workflow that uses FreshDesk as a "point of contact" for support. To use it, simply send an email to:

[email protected]

Could you send a request for these updates to that address so we can see how this works out? Hopefully this will make it easier to keep track of requests.

On that note, I'd suggest that we close this issue, since the hub is now "deployed", and I think we are in maintenance mode. We can open up more issues to resolve specific updates/requests that come in via the support channel. Does anybody disagree?

roxyboy · 2021-08-03T15:17:53Z

@roxyboy we are trying out a new workflow that uses FreshDesk as a "point of contact" for support. To use it, simply send an email to:

[email protected]

Could you send a request for these updates to that address so we can see how this works out? Hopefully this will make it easier to keep track of requests.

On that note, I'd suggest that we close this issue, since the hub is now "deployed", and I think we are in maintenance mode. We can open up more issues to resolve specific updates/requests that come in via the support channel. Does anybody disagree?

I sent an email but am I supposed to receive a confirmation email...?

choldgraf · 2021-08-04T06:26:50Z

Ah I guess not - will send a response to you today to make sure it works as expected

yuvipanda · 2021-08-23T09:33:31Z

Closing as this hub has been functional for a while!

choldgraf added the New Hub label Feb 9, 2021

choldgraf added Hub and removed Needs Hub labels Mar 4, 2021

yuvipanda mentioned this issue Mar 29, 2021

Fully turnkey cloud resources setup via Terraform for GCP #332

Closed

7 tasks

yuvipanda mentioned this issue Apr 7, 2021

Add docs on setting up a new project #340

Merged

yuvipanda mentioned this issue May 25, 2021

Refactor GCP terraform code + add MOEM-IGE hub #429

Merged

4 tasks

yuvipanda mentioned this issue May 31, 2021

Team Sync - May 31, 2021 2i2c-org/team-compass#111

Closed

choldgraf added the support label Aug 16, 2021

yuvipanda closed this as completed Aug 23, 2021

jnywong mentioned this issue Jun 19, 2024

[Decommission Hub, meom-ige] SWOT Ocean Pangeo Team #4254

Closed

19 tasks
