This repo contains (codified) instructions for deploying the eWaterCycle platform. The target audience of these instructions is system administrators. For more information on the eWaterCycle platform (and how to deploy it) see the eWaterCycle documentation.
For instructions on how to use the machine as deployed by this repo see the User guide.
These instructions assume you have some basic knowledge of Vagrant and Ansible.
The hardware environment used by the eWaterCycle platform development team is the SURF Research Cloud. Starting a machine on the SURF Research Cloud requires that you have a research budget with SURF; for more info see the website of SURF. Once running, access to the machine can be shared with anyone.
The setup instructions in this repo will create an eWaterCycle application (a sort-of VM template) that, when started, will create a machine with:
- Explorer: web visualization of available models / parameter sets combinations and a way to generate Jupyter notebooks
- Jupyter Hub: to interactively generate forcings and perform experiments on hydrological models using the eWaterCycle Python package
- ERA5 and ERA-Interim global climate data, which can be used to generate forcings
- Installed models and their example parameter sets
An application on the SURF Research Cloud is provisioned by running an Ansible playbook (research-cloud-plugin.yml).
In addition to the standard VM storage, read-only datasets are mounted at /mnt/data from dCache using rclone. They may contain things like:
- climate data, see https://ewatercycle.readthedocs.io/en/latest/system_setup.html#download-climate-data
- observation
- parameter-sets
- singularity-images of hydrological models wrapped in grpc4bmi servers
Previously the eWaterCycle platform consisted of multiple VMs on SURF HPC Cloud; see the v0.1.2 release for that code.
Deploying a local test VM is mostly useful for developing the SURF Research Cloud applications. This Vagrant setup creates a virtual machine with 8 GB memory, 4 virtual cores, and 70 GB storage. This should work on any Linux or Windows machine.
To set up an Explorer/Jupyter server on your local machine with Vagrant and Ansible:
Create the config file research-cloud-plugin.vagrant.vars with
---
dcache_ro_token: <dcache macaroon with read permission>
rclone_cache_dir: /data/volume_2
# Directory where /home should point to
alt_home_location: /data/volume_3
The token can be found in the eWaterCycle password manager.
vagrant --version
# Vagrant 2.4.1
vagrant plugin install vagrant-vbguest
# Installed the plugin 'vagrant-vbguest (0.32.0)'
vagrant up
Visit site
# Get ip of server with
vagrant ssh -c 'ifconfig eth1'
Go to http://<ip of eth1> and log in with vagrant:vagrant.
You will get some complaints about insecure serving; this is OK for local testing and will not happen on Research Cloud.
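The address lookup above can be scripted. The following is a minimal sketch, not part of this repo: `first_ipv4` is a hypothetical helper that pulls the first IPv4 address out of `ifconfig` output, assuming either the newer `inet <address>` or the older `inet addr:<address>` layout.

```shell
# Hypothetical helper: print the first IPv4 address found in ifconfig output on stdin.
# Handles both "inet 192.0.2.10" and the older "inet addr:192.0.2.10" layouts.
first_ipv4() {
    awk '$1 == "inet" { sub(/^addr:/, "", $2); print $2; exit }'
}

# Usage (requires the running Vagrant machine):
#   ip="$(vagrant ssh -c 'ifconfig eth1' | first_ipv4)"
#   echo "http://${ip}"
```

Lines starting with `inet6` are skipped, so only the IPv4 address of the interface is printed.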
WSL2 users should follow steps on https://www.vagrantup.com/docs/other/wsl.
Importantly:
- Work in a folder on the Windows file system.
- Export VAGRANT_WSL_WINDOWS_ACCESS_USER_HOME_PATH="/mnt/c/.../infra"
export PATH="$PATH:/mnt/c/Program Files/Oracle/VirtualBox"
vagrant up --provider virtualbox
- Approve the firewall popup
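Whether a checkout sits on the Windows file system can be checked before running Vagrant. This is a small sketch under the assumption that WSL uses its default automount root `/mnt/<drive letter>/`; `is_on_windows_fs` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds when the given path lies under a WSL drive mount.
# Assumes the default WSL automount root /mnt/<drive letter>/.
is_on_windows_fs() {
    case "$1" in
        /mnt/[a-z]/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (inside the checked-out repo):
#   is_on_windows_fs "$(pwd -P)" || echo "Move this folder to the Windows file system first" >&2
```

If the automount root was changed in /etc/wsl.conf, the pattern needs to be adjusted accordingly.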
This chapter is intended for catalog item developers.
On the Research Cloud a developer can add a catalog item for other people to use. The generic steps to do this are documented here.
For the eWaterCycle component the following specializations were made:
- Use Ansible playbook as component script type
- Use https://github.com/eWaterCycle/infra.git as repository URL
- Use research-cloud-plugin.yml as script path
- Use main as tag
- Component parameters, all fixed source type and non-overwritable unless otherwise stated
  - Add dcache_ro_token parameter for the dCache read-only token, aka macaroon. The token can be found in the eWaterCycle password manager. This token has an expiration date, so it needs to be updated every now and then.
  - Add alt_home_location parameter with value /data/volume_2. This is the mount point of the storage item that holds the home directories.
  - Add rclone_cache_dir parameter with value /data/volume_3. This is the directory where rclone can store its cache.
  - Add rclone_max_gsize parameter with value 45. This is the maximum size of the cache on the rclone_cache_dir volume, in GB.
- Set documentation URL to
https://github.com/eWaterCycle/infra
- Do not allow every org to use this component. Data on the dcache should not be made public.
- Select the organizations (CO) that are allowed to use the component.
For the eWaterCycle catalog item the following specializations were made:
- Select the following components:
  - SRC-OS
  - SRC-CO
  - SRC-Nginx
  - SRC-External plugin
  - eWatercycle
- Set documentation URL to https://github.com/eWaterCycle/infra
- Add SURF HPC Cloud as cloud provider
  - Set Operating Systems to Ubuntu 22.04
  - Set Sizes to all non-gpu and non-disabled sizes
- In the parameter settings step keep all values as is, except
  - Set co_irods to false, as we do not use irods
  - Set co_research_drive to false, as we do not use research drive
- Set boot disk size to 150 GB, as the default size will be mostly used by the conda environment and will trigger out-of-space warnings.
- Set workspace access button behavior to Webinterface (https:), so clicking on the ACCESS button will open the eWaterCycle experiment explorer web interface
- Select the organizations (CO) that are allowed to use the catalog item.
To become root on a VM, the user needs to be a member of the src_co_admin group on SRAM. See docs.
This chapter is intended for application deployers.
- Log into Research Cloud
- Create new storage item for home directories
  - To store user files
  - Use 50 GB size for simple experiments, or bigger when the experiment requires it.
  - As each storage item can only be used by a single workspace, give it a name and description so you know which workspace and storage items go together.
- Create new storage item for cache
  - To store files from dCache cached by rclone
  - Use 50 GB as size
  - As each storage item can only be used by a single workspace, give it a name and description so you know which workspace and storage items go together.
- Create a new workspace
  - Select eWaterCycle application
  - Select collaborative organisation (CO), for example ewatercycle-nlesc
  - Select size of VM (cpus/memory) based on use case
  - Select home storage item.
    - The order in which the storage items are selected is important; make sure to select the home storage item before the cache storage item.
  - Select cache storage item
- Wait for machine to be running
- Visit URL/IP
- When done delete machine
For a new CO make sure
- the application is allowed to be used by the CO. See Sharing catalog items
- the data storage item and home dir are created for the CO
End users should be invited to the CO so they can log in.
See the User guide for what users have to do to log in or use GitHub repository.
To get example notebooks, end users should use the following URL (replace <workspace id> with the id of your currently running workspace):
https://<workspace id>.workspaces.live.surfresearchcloud.nl/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2FeWaterCycle%2Fewatercycle&urlpath=lab%2Ftree%2Fewatercycle%2Fdocs%2Fexamples%2FMarrmotM01.ipynb&branch=main
TODO add this link to home page of server at
This link uses nbgitpuller to sync a git repo and open a notebook in it.
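Filling in the workspace id by hand is error-prone, so the link can be generated. A minimal sketch follows; `example_notebook_url` is a hypothetical helper, and the host pattern and query string are copied unchanged from the URL above.

```shell
# Hypothetical helper: print the example-notebook link for a given workspace id.
# The repo and urlpath values are the percent-encoded strings from the URL above.
example_notebook_url() {
    workspace_id="$1"
    printf 'https://%s.workspaces.live.surfresearchcloud.nl/jupyter/hub/user-redirect/git-pull?repo=%s&urlpath=%s&branch=main\n' \
        "$workspace_id" \
        'https%3A%2F%2Fgithub.com%2FeWaterCycle%2Fewatercycle' \
        'lab%2Ftree%2Fewatercycle%2Fdocs%2Fexamples%2FMarrmotM01.ipynb'
}

# Usage:
#   example_notebook_url "<workspace id>"
```

The encoded values are passed as printf arguments rather than placed in the format string, so their percent signs are not interpreted as format directives.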
This chapter is intended for application data preparers.
The eWaterCycle system setup requires a lot of data files. For the Research Cloud virtual machines we mount a dCache bucket.
To fill the dcache bucket you can run
ansible-playbook \
  -e cds_uid=1234 -e cds_api_key=<cds api key> \
  -e dcache_rw_token=<dcache macaroon with read/write permissions> \
  shared-data-disk.yml
Running this script will download all data files to /mnt/data and upload them to dCache.
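Because the playbook needs several secrets, it can help to fail early when one is missing. The sketch below is an assumption about how one might wrap the call, not something this repo provides; `require_vars` and the environment variable names CDS_UID, CDS_API_KEY and DCACHE_RW_TOKEN are hypothetical.

```shell
# Hypothetical helper: fail early when a required variable is unset or empty.
# Arguments are variable names, not values.
require_vars() {
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "Missing required variable: $name" >&2
            return 1
        fi
    done
}

# Usage (the variable names are assumptions, pick your own):
#   require_vars CDS_UID CDS_API_KEY DCACHE_RW_TOKEN && ansible-playbook \
#     -e cds_uid="$CDS_UID" -e cds_api_key="$CDS_API_KEY" \
#     -e dcache_rw_token="$DCACHE_RW_TOKEN" \
#     shared-data-disk.yml
```

Reading the values from the environment keeps the macaroon and API key out of the shell history.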
The steps above fetch the data from original sources. If you want to sync some files from another location, say, Snellius, you can use rclone directly. In our experience, it works better to sync entire directories than to try and copy single files.
Create the file ~/.config/rclone/rclone.conf and add the following content:
[dcache]
type = webdav
url = https://webdav.grid.surfsara.nl:2880
vendor = other
user =
pass =
bearer_token = <dcache macaroon with read/write permissions>
You can verify your access by running a harmless rclone ls dcache:parameter-sets.
The command to sync directories is rclone copy somedir dcache:parameter-sets/somedir.
Beware that this will overwrite any existing files that differ!
Note: password manager can be used for exchanging macaroons.
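Since the config file holds a bearer token, it is worth creating it with restrictive permissions. The following is a sketch under that assumption; `write_rclone_conf` is a hypothetical helper and DCACHE_TOKEN is a hypothetical variable name. The remote definition itself is the one shown above.

```shell
# Hypothetical helper: write an rclone remote named "dcache" to the given file.
# $1 = config file path, $2 = macaroon. An existing file is overwritten.
write_rclone_conf() {
    conf_file="$1"
    token="$2"
    mkdir -p "$(dirname "$conf_file")"
    : > "$conf_file"          # create or truncate
    chmod 600 "$conf_file"    # restrict access before the token is written
    cat >> "$conf_file" <<EOF
[dcache]
type = webdav
url = https://webdav.grid.surfsara.nl:2880
vendor = other
user =
pass =
bearer_token = $token
EOF
}

# Usage, assuming the macaroon is in the (hypothetical) DCACHE_TOKEN variable:
#   write_rclone_conf ~/.config/rclone/rclone.conf "$DCACHE_TOKEN"
#   rclone ls dcache:parameter-sets
```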
Create the file ~/.config/rclone/rclone.conf and add the following content:
[dcache]
type = webdav
url = https://webdav.grid.surfsara.nl:2880
vendor = other
user =
pass =
bearer_token = <dcache macaroon with read permissions>
Install rclone and run the following commands to mount dCache at the ~/dcache directory.
mkdir ~/dcache
rclone mount --read-only --cache-dir /tmp/rclone-cache --vfs-cache-max-size 30G --vfs-cache-mode full dcache:/ ~/dcache
In ESMValTool config files you can use ~/dcache/climate-data/obs6 for rootpath:OBS6.
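Before pointing ESMValTool at the mount, it can be useful to confirm that the mount is actually active. A minimal sketch, assuming a Linux host where the `mountpoint` utility from util-linux is installed; `require_mount` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only when the given directory is an active mount point.
# Relies on `mountpoint` from util-linux, which is an assumption about the host.
require_mount() {
    if mountpoint -q "$1"; then
        return 0
    fi
    echo "$1 is not mounted; start rclone mount first" >&2
    return 1
}

# Usage:
#   require_mount ~/dcache && ls ~/dcache/climate-data/obs6
#   fusermount -u ~/dcache   # unmount when done (Linux, FUSE)
```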
In the eWaterCycle project we make Docker images. The images are hosted on Docker Hub. A project member can create issues here for permission to push images to Docker Hub.
All services are running with systemd. Their logs can be viewed with journalctl.
The log of the Jupyter server for each user can be followed with
journalctl -f -u jupyter-vagrant-singleuser.service
(replace vagrant with your own username)
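The username substitution can be wrapped in a small function. This is a sketch only: `jupyter_unit` is a hypothetical helper, and the jupyter-<username>-singleuser.service pattern is inferred from the example above, so it may not hold for usernames that contain special characters.

```shell
# Hypothetical helper: print the systemd unit name of a user's Jupyter server.
# The naming pattern is inferred from the jupyter-vagrant-singleuser.service example.
jupyter_unit() {
    printf 'jupyter-%s-singleuser.service\n' "$1"
}

# Usage:
#   journalctl -f -u "$(jupyter_unit "$USER")"
#   journalctl -u "$(jupyter_unit "$USER")" --since "1 hour ago"
```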