OceanBench is a unifying framework that provides standardized processing steps that comply with domain-expert standards. It is designed with a flexible and pedagogical abstraction that:
- provides plug-and-play data and pre-configured pipelines for ML researchers to benchmark their models against ML and domain-related baselines
- provides a transparent and configurable framework for researchers to customize and extend the pipeline for their tasks
The core functionality is lightweight: we keep the code base simple and focus on how the user can combine each piece. We adopt a strict functional style because pure, sequential transformations are easier to maintain and compose, as the sketch below illustrates.
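For example, here is an illustrative sketch (not the OceanBench API) of how pure functions over xarray objects chain into a pipeline; the region bounds and resampling frequency are arbitrary:

```python
# Illustrative sketch only, not the OceanBench API: pure functions over
# xarray objects compose naturally into sequential pipelines.
import xarray as xr

def select_region(ds: xr.Dataset) -> xr.Dataset:
    # Subset a region of interest (arbitrary example bounds).
    return ds.sel(lon=slice(-65.0, -55.0), lat=slice(33.0, 43.0))

def daily_mean(ds: xr.Dataset) -> xr.Dataset:
    # Resample to daily temporal resolution.
    return ds.resample(time="1D").mean()

def pipeline(ds: xr.Dataset) -> xr.Dataset:
    # Because every step maps Dataset -> Dataset, steps can be reordered,
    # swapped, or configured independently.
    for step in (select_region, daily_mean):
        ds = step(ds)
    return ds
```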
There are five features we would like to highlight about OceanBench:
- Data availability and version control with DVC
- An agnostic suite of geoprocessing tools for xarray datasets aggregated from different sources
- Hydra integration to pipe sequential transformations (see the sketch after this list)
- xrpatcher, a flexible multi-dimensional array generator from xarray datasets, compatible with common deep learning (DL) frameworks
- A JupyterBook that offers library tutorials and demonstrates use cases

In the following sections, we highlight these components in more detail.
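To give a flavor of the Hydra integration first, here is a hypothetical sketch (the actual OceanBench config schema may differ, and the two xarray methods chosen here are arbitrary): each transformation is declared as a `_target_` entry, instantiated into a partial function with `hydra.utils.instantiate`, and applied in sequence.

```python
# Hypothetical sketch of Hydra-style piping; the actual OceanBench config
# schema may differ. Each step is declared in YAML and instantiated into a
# partial function, then applied in sequence.
from hydra.utils import instantiate
from omegaconf import OmegaConf

cfg = OmegaConf.create(
    """
    steps:
      - _target_: xarray.Dataset.sortby
        _partial_: true
        variables: time
      - _target_: xarray.Dataset.fillna
        _partial_: true
        value: 0.0
    """
)

def apply_steps(ds):
    # `_partial_: true` makes instantiate() return functools.partial(fn, **kwargs);
    # calling the partial with the dataset applies the transformation.
    for step in cfg.steps:
        ds = instantiate(step)(ds)
    return ds
```

Because each step resolves to a plain function, a pipeline can be reconfigured from YAML without touching code.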
We have a fully fledged JupyterBook available to showcase how OceanBench can be used in practice. There are quickstart tutorials as well as more detailed tutorials that highlight some of the intricacies of OceanBench. Some highlighted tutorials are listed in the next section.
We have an open data registry located at the oceanbench-data-registry GitHub repository. There you can find metadata about the available datasets as well as instructions for downloading them yourself.
Our utility functions make up the backbone of the preprocessing, postprocessing, and some of the plotting; these are the pieces we combine to create recipes, pipelines, and tasks. This repo is located at jejjohnson/ocn-tools. See the docs (TODO) for more information.
We have a set of tasks related to sea surface height (SSH) interpolation, and they come readily integrated in a digestible, ML-ready format.
We use our custom xrpatcher package to pipe xarray data structures to PyTorch datasets/dataloaders, as sketched below.
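The following is a minimal sketch, assuming xrpatcher's `XRDAPatcher` API (argument names and defaults may differ across versions): the patcher slices a DataArray into (possibly overlapping) patches, and a thin adapter exposes them as a PyTorch dataset.

```python
# Minimal sketch, assuming xrpatcher's XRDAPatcher API; argument names may
# differ across versions. The file name and patch sizes are hypothetical.
import torch
import xarray as xr
from xrpatcher import XRDAPatcher

da = xr.open_dataarray("ssh.nc")  # hypothetical SSH field with time/lat/lon

patcher = XRDAPatcher(
    da=da,
    patches={"time": 5, "lat": 120, "lon": 120},  # patch size per dimension
    strides={"time": 1, "lat": 60, "lon": 60},    # stride < patch size => overlap
)

class PatchDataset(torch.utils.data.Dataset):
    """Thin adapter: each item is one patch, converted to a torch tensor."""

    def __init__(self, patcher):
        self.patcher = patcher

    def __len__(self):
        return len(self.patcher)

    def __getitem__(self, idx):
        return torch.from_numpy(self.patcher[idx].values)

loader = torch.utils.data.DataLoader(PatchDataset(patcher), batch_size=4)
```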
For ML researchers who want to get started quickly, take a look at our Task-to-Patcher demo.
For more information about the datasets, see the oceanbench-data-registry.
OceanBench can be used to generate the leaderboard for our different interpolation challenges. To generate the leaderboards for the different tasks with the available data, see our LeaderBoard demo.
Currently, the most successful algorithm for the SSH challenges is a Bi-Level Optimization algorithm (4DVarNet). For a reproducible end-to-end example of how a SOTA method is used in conjunction with OceanBench, see our End-to-End demo.
We use conda/mamba as our package manager. To install from the provided environment files, run the following commands:

```bash
git clone https://github.com/jejjohnson/oceanbench.git
cd oceanbench
mamba env create -f environments/linux.yaml
```
If you want to add the oceanbench conda environment as a Jupyter kernel, you need to set the ESMFMKFILE environment variable:

```bash
conda activate oceanbench
mamba install ipykernel -y
python -m ipykernel install --user --name=oceanbench --env ESMFMKFILE "$ESMFMKFILE"
```
We can also install it directly via pip from the GitHub repository:

```bash
pip install "git+https://github.com/jejjohnson/oceanbench.git"
```
Note: there are some known dependency issues related to pyinterp and xesmf. You may need to manually install some of the dependencies before installing oceanbench via pip. See the pyinterp and xesmf packages for more information.
For developers who want all of the dependencies, we can use poetry to install the package:

```bash
git clone https://github.com/jejjohnson/oceanbench.git
cd oceanbench
conda create -n oceanbench python=3.10 poetry
conda activate oceanbench
poetry install
```
We would like to acknowledge the Ocean-Data-Challenge Group for their work providing open-source data and a tutorial of metrics for SSH interpolation.