Add documentation to examples (NVIDIA#73)

* add Ahmed body documentation
* fix spelling
* update wand instructions
* adding vortex shedding doc
* AFNO documentation
* minor edits
* afno getting started
* first attempt to document SFNO
* graphcast documentation
* minor change
* address review comments
* address more review comments
* Update AFNO doc
* Update SFNO doc
* Update GraphCast doc
NickGeneva · Aug 10, 2023 · 966516f
1 parent b3dc226
commit 966516f
Showing 11 changed files with 572 additions and 1 deletion.
diff --git a/.gitattributes b/.gitattributes
@@ -2,3 +2,5 @@
 *.h5 filter=lfs diff=lfs merge=lfs -text
 *.pickle filter=lfs diff=lfs merge=lfs -text
 *.npy filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
+*.gif filter=lfs diff=lfs merge=lfs -text
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -28,6 +28,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 - Added a CHANGELOG.md
 - Ahmed body recipe
+- Documentation for SFNO, GraphCast, vortex shedding, and Ahmed body
 
 ### Changed
 

diff --git a/docs/img/FourCastNet.gif b/docs/img/FourCastNet.gif
diff --git a/docs/img/ahmed_body_results.png b/docs/img/ahmed_body_results.png
diff --git a/docs/img/vortex_shedding.gif b/docs/img/vortex_shedding.gif
diff --git a/examples/cfd/ahmed_body_mgn/README.md b/examples/cfd/ahmed_body_mgn/README.md
@@ -0,0 +1,119 @@
+# AeroGraphNet for external aerodynamic evaluation
+
+This example demonstrates how to train the AeroGraphNet model for external aerodynamic
+analysis of simplified (Ahmed body-type) car geometries. AeroGraphNet is based on the
+MeshGraphNet architecture. It achieves good accuracy on predicting the pressure and
+wall shear stresses on the surface mesh of the Ahmed body-type geometries, as well as
+the drag coefficient.
+
+## Problem overview
+
+The goal is to develop an AI surrogate model that can use simulation data to learn the
+external aerodynamic flow over parameterized Ahmed body shapes. This serves as a baseline
+for more refined models for realistic car geometries. The trained model can be used to
+predict the change in drag coefficient, surface pressure, and wall shear stresses due
+to changes in the car geometry. This is a stepping stone to applying similar approaches
+to other application areas such as aerodynamic analysis of aircraft wings, real car
+geometries, etc.
+
+## Dataset
+
+Industry-standard Ahmed-body geometries are characterized by six design parameters:
+length, width, height, ground clearance, slant angle, and fillet radius. Refer
+to the [wiki](https://www.cfd-online.com/Wiki/Ahmed_body) for details on Ahmed
+body geometry. In addition to these design parameters, we include the inlet velocity to
+address a wide variation in Reynolds number. We identify the design points using the
+Latin hypercube sampling scheme for space-filling design of experiments and generate
+around 500 design points.
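
The sampling step described above can be illustrated with a minimal, standard-library-only sketch of Latin hypercube sampling. This is not the script used to build the dataset: the function name `latin_hypercube`, the `BOUNDS` dictionary, and every numeric range in it are hypothetical placeholders chosen only to make the example runnable.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points so that, in every dimension, each of the
    n_samples equal-width strata contains exactly one point."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One random location inside each stratum of the unit interval.
        unit = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so the strata are paired randomly across dimensions.
        rng.shuffle(unit)
        columns.append([low + u * (high - low) for u in unit])
    # Transpose: one row per design point, one column per parameter.
    return [list(point) for point in zip(*columns)]


# Hypothetical parameter ranges, for illustration only
# (NOT the ranges used to generate the Ahmed body dataset).
BOUNDS = {
    "length": (0.8, 1.2),
    "width": (0.3, 0.5),
    "height": (0.25, 0.35),
    "ground_clearance": (0.03, 0.07),
    "slant_angle": (10.0, 40.0),
    "fillet_radius": (0.08, 0.12),
    "inlet_velocity": (20.0, 60.0),
}

designs = latin_hypercube(500, list(BOUNDS.values()))
print(len(designs), "design points with", len(designs[0]), "parameters each")
```

Compared with plain uniform random sampling, stratifying each parameter keeps the roughly 500 points spread across the full range of every input, which is the space-filling property the text refers to.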
+
+The aerodynamic simulations were performed using the GPU-accelerated OpenFOAM solver
+for steady-state analysis, applying the k-omega SST turbulence model. These simulations
+consist of 7.2 million mesh points on average, but only the surface mesh, which has
+roughly 70k nodes, is used as the training input.
+
+To request access to the full dataset, please reach out to the
+[NVIDIA Modulus team]([email protected]).
+
+## Model overview and architecture
+
+The AeroGraphNet model is based on the MeshGraphNet architecture which is instrumental
+for learning from mesh-based data using GNNs. The inputs to the model are:
+
+- Ahmed body surface mesh
+- Reynolds number
+- Geometry parameters (optional, including length, width, height, ground clearance,
+  slant angle, and fillet radius)
+- Surface normals (optional)
+
+The outputs of the model are:
+
+- Surface pressure
+- Wall shear stresses
+- Drag coefficient
+
+![Comparison between the AeroGraphNet prediction and the
+ground truth for surface pressure, wall shear stresses, and the drag coefficient for one
+of the samples from the test dataset.](../../../docs/img/ahmed_body_results.png)
+
+The input to the model is in the form of a `.vtp` file and is then converted to
+bi-directional DGL graphs in the dataloader. The final results are also written in the
+form of `.vtp` files in the inference code. A hidden dimensionality of 256 is used in
+the encoder, processor, and decoder. The encoder and decoder consist of two hidden
+layers, and the processor includes 15 message passing layers. Batch size per GPU is
+set to 1. Summation aggregation is used in the
+processor for message aggregation. A learning rate of 0.0001 is used, decaying
+exponentially with a rate of 0.99985. Training is performed on 8 NVIDIA A100
+GPUs, leveraging data parallelism. Total training time is 4 hours, and training is
+performed for 500 epochs.
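
To make the exponential decay concrete, here is a small sketch of the schedule `lr_n = lr * lr_decay_rate ** n`, using the `lr` and `lr_decay_rate` values quoted above. The helper name `decayed_lr` is hypothetical, and whether the decay is applied once per optimizer step or once per epoch is an assumption that should be checked against `train.py`.

```python
def decayed_lr(n_updates, base_lr=1e-4, decay_rate=0.99985):
    """Learning rate after `n_updates` applications of exponential decay.

    Assumption: one decay update per scheduler call; whether that call
    happens per step or per epoch is not stated in the text above.
    """
    return base_lr * decay_rate ** n_updates


for n in (0, 1_000, 10_000, 50_000):
    print(f"after {n:>6} updates: lr = {decayed_lr(n):.3e}")
```

Because the rate is so close to 1, the learning rate shrinks noticeably only after many thousands of updates, so the effective schedule depends strongly on how often the decay is applied.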
+
+## Getting Started
+
+The dataset for this example is not publicly available. To get access, please reach out
+to the [NVIDIA Modulus team]([email protected]).
+
+This example requires the `pyvista` and `vtk` libraries. Install with
+
+```bash
+pip install pyvista vtk
+```
+
+To train the model, run
+
+```bash
+python train.py
+```
+
+Data parallelism is also supported with multi-GPU runs. To launch a multi-GPU training,
+run
+
+```bash
+mpirun -np <num_GPUs> python train.py
+```
+
+If running in a Docker container, you may need to include the `--allow-run-as-root` flag
+in the multi-GPU run command.
+
+Progress and loss logs can be monitored using Weights & Biases. To enable this,
+set `wandb_mode` to `online` in `constants.py`. This requires an active
+Weights & Biases account. You also need to provide your API key. There are multiple ways
+to provide the API key, but the simplest is to export it as an environment variable:
+
+```bash
+export WANDB_API_KEY=<your_api_key>
+```
+
+The URL to the dashboard will be displayed in the terminal after the run is launched.
+Alternatively, the logging utility in `train.py` can be switched to MLflow.
+
+Once the model is trained, run
+
+```bash
+python inference.py
+```
+
+This will save the predictions for the test dataset in `.vtp` format in the `results`
+directory. Use ParaView to open and explore the results.
+
+## References
+
+- [Learning Mesh-Based Simulation with Graph Networks](https://arxiv.org/abs/2010.03409)
diff --git a/examples/cfd/ahmed_body_mgn/constants.py b/examples/cfd/ahmed_body_mgn/constants.py
@@ -42,7 +42,6 @@ class Constants(BaseModel):
 
     lr: float = 1e-4
     lr_decay_rate: float = 0.99985
-    drag_loss_weight: float = 1.0
 
     amp: bool = False
     jit: bool = False

diff --git a/examples/cfd/vortex_shedding_mgn/README.md b/examples/cfd/vortex_shedding_mgn/README.md
@@ -0,0 +1,130 @@
+# MeshGraphNet for transient vortex shedding
+
+This example is a re-implementation of DeepMind's vortex shedding example
+<https://github.com/deepmind/deepmind-research/tree/master/meshgraphnets> in PyTorch.
+It demonstrates how to train a Graph Neural Network (GNN) for evaluation of the
+transient vortex shedding on parameterized geometries.
+
+## Problem overview
+
+Mesh-based simulations play a central role in modeling complex physical systems across
+various scientific and engineering disciplines. They offer robust numerical integration
+methods and allow for adaptable resolution to strike a balance between accuracy and
+efficiency. Machine learning surrogate models have emerged as powerful tools to reduce
+the cost of tasks like design optimization, design space exploration, and what-if
+analysis, which involve repetitive high-dimensional scientific simulations.
+
+However, some existing machine learning surrogate models, such as CNN-type models,
+are constrained by structured grids,
+making them less suitable for complex geometries or shells. The homogeneous fidelity of
+CNNs is a significant limitation for many complex physical systems that require an
+adaptive mesh representation to resolve multi-scale physics.
+
+Graph Neural Networks (GNNs) present a viable approach for surrogate modeling in science
+and engineering. They are data-driven and capable of handling complex physics. Being
+mesh-based, GNNs can handle geometry irregularities and multi-scale physics,
+making them well-suited for a wide range of applications.
+
+## Dataset
+
+We rely on DeepMind's vortex shedding dataset for this example. The dataset includes
+1000 training, 100 validation, and 100 test samples that are simulated using COMSOL
+with irregular triangle 2D meshes, each for 600 time steps with a time step size of
+0.01s. These samples vary in the size and the position of the cylinder. Each sample
+has a unique mesh due to geometry variations across samples, and the meshes have 1885
+nodes on average. Note that the model can handle different meshes with different numbers
+of nodes and edges as the input.
+
+## Model overview and architecture
+
+The model is free-running and auto-regressive. It takes the initial condition as the
+input and predicts the solution at the first time step. It then takes the prediction at
+the first time step to predict the solution at the next time step. The model continues
+to use the prediction at time step $t$ to predict the solution at time step $t+1$, until
+the rollout is complete. Note that the model is also able to predict beyond the
+simulation time span and extrapolate in time. However, the accuracy of the prediction
+might degrade over time and if possible, extrapolation should be avoided unless
+the underlying data patterns remain stationary and consistent.
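
The free-running rollout described above can be sketched as a simple loop that feeds each prediction back in as the next input. The names `rollout` and `toy_model` are hypothetical; `toy_model` is a stand-in for the trained network, not the MeshGraphNet model itself.

```python
from typing import Callable, List, Sequence

State = Sequence[float]


def rollout(step_fn: Callable[[State], State],
            initial_state: State,
            num_steps: int) -> List[State]:
    """Auto-regressive rollout: the prediction at step t is the input at t+1."""
    states = [initial_state]
    for _ in range(num_steps):
        states.append(step_fn(states[-1]))
    return states


def toy_model(state: State) -> State:
    """Stand-in for the trained network: damp every value by 10%."""
    return [0.9 * value for value in state]


trajectory = rollout(toy_model, [1.0, -2.0], num_steps=3)
for t, state in enumerate(trajectory):
    print(t, state)
```

Since the model never sees ground truth after the initial condition, any error made at one step is carried into all later steps, which is why accuracy can degrade over long rollouts.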
+
+The model uses the input mesh to construct a bi-directional DGL graph for each sample.
+The node features include (6 in total):
+
+- Velocity components at time step $t$, i.e., $u_t$, $v_t$
+- One-hot encoded node type (interior node, no-slip node, inlet node, outlet node)
+
+The edge features for each sample are time-independent and include (3 in total):
+
+- Relative $x$ and $y$ distance between the two end nodes of an edge
+- L2 norm of the relative distance vector
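
The node and edge features listed above can be illustrated with a small, framework-free sketch. The helper names, the ordering of the node types, and the sign convention for the relative distance (destination minus source) are assumptions made for illustration; the dataloader in this example may differ.

```python
import math

# Assumed ordering of the four node types used for one-hot encoding.
NODE_TYPES = ("interior", "no_slip", "inlet", "outlet")


def node_features(u, v, node_type):
    """2 velocity components + 4 one-hot entries = 6 features per node."""
    one_hot = [1.0 if node_type == name else 0.0 for name in NODE_TYPES]
    return [u, v] + one_hot


def edge_features(positions, mesh_edges):
    """Relative x/y distance and its L2 norm, for both directions of each edge."""
    features = {}
    for i, j in mesh_edges:
        for src, dst in ((i, j), (j, i)):  # bi-directional graph
            dx = positions[dst][0] - positions[src][0]
            dy = positions[dst][1] - positions[src][1]
            features[(src, dst)] = [dx, dy, math.hypot(dx, dy)]
    return features


# A single triangle as a toy mesh.
positions = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
mesh_edges = [(0, 1), (1, 2), (0, 2)]

print(node_features(0.5, -0.1, "inlet"))
print(edge_features(positions, mesh_edges)[(0, 1)])
```

The edge features depend only on node positions, which is why they are time-independent and can be computed once per sample.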
+
+The output of the model is the velocity components at time step $t+1$, i.e.,
+$u_{t+1}$, $v_{t+1}$, as well as the pressure $p_{t+1}$.
+
+![Comparison between the MeshGraphNet prediction and the
+ground truth for the horizontal velocity for different test samples.
+](../../../docs/img/vortex_shedding.gif)
+
+A hidden dimensionality of 128 is used in the encoder,
+processor, and decoder. The encoder and decoder consist of two hidden layers, and
+the processor includes 15 message passing layers. Batch size per GPU is set to 1.
+Summation aggregation is used in the
+processor for message aggregation. A learning rate of 0.0001 is used, decaying
+exponentially with a rate of 0.9999991. Training is performed on 8 NVIDIA A100
+GPUs, leveraging data parallelism for 25 epochs.
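
As a rough illustration of summation aggregation, the sketch below sums the messages carried by all edges that point at each node. It shows only the aggregation idea in plain Python; the function name `sum_aggregate` and the toy data are hypothetical, and the actual processor uses learned edge and node updates on DGL graphs.

```python
def sum_aggregate(num_nodes, edges, messages):
    """Sum the message vectors of all edges arriving at each node.

    edges:    list of (src, dst) pairs
    messages: one feature vector per edge, in the same order as `edges`
    """
    dim = len(messages[0])
    aggregated = [[0.0] * dim for _ in range(num_nodes)]
    for (_, dst), message in zip(edges, messages):
        for k, value in enumerate(message):
            aggregated[dst][k] += value
    return aggregated


edges = [(0, 1), (2, 1), (1, 0)]
messages = [[1.0, 2.0], [0.5, -1.0], [3.0, 3.0]]
aggregated = sum_aggregate(3, edges, messages)
print(aggregated)
```

Summation does not depend on the order of the incoming edges and works for any number of neighbors, which suits meshes whose nodes have varying connectivity.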
+
+## Getting Started
+
+This example requires the `tensorflow` library to load the data in the `.tfrecord`
+format. Install with
+
+```bash
+pip install tensorflow
+```
+
+To download the data from DeepMind's repo, run
+
+```bash
+cd raw_dataset
+sh download_dataset.sh cylinder_flow
+```
+
+To train the model, run
+
+```bash
+python train.py
+```
+
+Data parallelism is also supported with multi-GPU runs. To launch a multi-GPU training,
+run
+
+```bash
+mpirun -np <num_GPUs> python train.py
+```
+
+If running in a Docker container, you may need to include the `--allow-run-as-root` flag
+in the multi-GPU run command.
+
+Progress and loss logs can be monitored using Weights & Biases. To enable this,
+set `wandb_mode` to `online` in `constants.py`. This requires an active
+Weights & Biases account. You also need to provide your API key. There are multiple ways
+to provide the API key, but the simplest is to export it as an environment variable:
+
+```bash
+export WANDB_API_KEY=<your_api_key>
+```
+
+The URL to the dashboard will be displayed in the terminal after the run is launched.
+Alternatively, the logging utility in `train.py` can be switched to MLflow.
+
+Once the model is trained, run
+
+```bash
+python inference.py
+```
+
+This will save the predictions for the test dataset in `.gif` format in the `animations`
+directory.
+
+## References
+
+- [Learning Mesh-Based Simulation with Graph Networks](https://arxiv.org/abs/2010.03409)
diff --git a/examples/weather/fcn_afno/README.md b/examples/weather/fcn_afno/README.md
@@ -0,0 +1,115 @@
+# Adaptive Fourier Neural Operator (AFNO) for weather forecasting
+
+This repository contains the code used for [FourCastNet: A Global Data-driven
+High-resolution Weather Model using Adaptive Fourier Neural
+Operators](https://arxiv.org/abs/2202.11214)
+
+The code was developed by the authors of the preprint:
+Jaideep Pathak, Shashank Subramanian, Peter Harrington, Sanjeev Raja,
+Ashesh Chattopadhyay, Morteza Mardani, Thorsten Kurth, David Hall, Zongyi Li,
+Kamyar Azizzadenesheli, Pedram Hassanzadeh, Karthik Kashinath, Animashree Anandkumar
+
+## Problem overview
+
+FourCastNet, short for Fourier Forecasting Neural Network, is a global data-driven
+weather forecasting model that provides accurate short to medium-range global
+predictions at 0.25° resolution. FourCastNet accurately forecasts high-resolution,
+fast-timescale variables such as the surface wind speed, precipitation, and atmospheric
+water vapor. It has important implications for planning wind energy resources,
+predicting extreme weather events such as tropical cyclones, extra-tropical cyclones,
+and atmospheric rivers. FourCastNet matches the forecasting accuracy of the ECMWF
+Integrated Forecasting System (IFS), a state-of-the-art Numerical Weather Prediction
+(NWP) model, at short lead times for large-scale variables, while outperforming IFS
+for variables with complex fine-scale structure, including precipitation. FourCastNet
+generates a week-long forecast in less than 2 seconds, orders of magnitude faster than
+IFS. The speed of FourCastNet enables the creation of rapid and inexpensive
+large-ensemble forecasts with thousands of ensemble-members for improving probabilistic
+forecasting. We discuss how data-driven deep learning models such as FourCastNet are a
+valuable addition to the meteorology toolkit to aid and augment NWP models.
+
+FourCastNet is based on the [vision transformer architecture with Adaptive Fourier
+Neural Operator (AFNO) attention](https://openreview.net/pdf?id=EXHG-A3jlM)
+
+![Comparison between the FourCastNet and the ground truth (ERA5) for $u-10$ for
+different lead times.](../../../docs/img/FourCastNet.gif)
+
+## Dataset
+
+The model is trained on a 20-channel subset of the ERA5 reanalysis data on single levels and
+pressure levels that is pre-processed and stored into HDF5 files.
+The subset of the ERA5 training data that FourCastNet (FCN) was trained on is hosted at the
+National Energy Research Scientific Computing Center (NERSC). For convenience,
+[it is available to all via Globus](https://app.globus.org/file-manager?origin_id=945b3c9e-0f8c-11ed-8daf-9f359c660fbd&origin_path=%2F~%2Fdata%2F).
+You will need a Globus account and will need to be logged in to your account in order
+to access the data. You may also need [Globus Connect](https://www.globus.org/globus-connect)
+to transfer data. The full dataset that this version of FourCastNet was trained on is
+approximately 5 TB in size.
+
+## Model overview and architecture
+
+Please refer to the [reference paper](https://arxiv.org/abs/2202.11214) to learn about
+the model architecture.
+
+## Getting Started
+
+To train the model, run
+
+```bash
+python train_era5.py
+```
+
+Progress can be monitored using MLflow. Open a new terminal and navigate to the training
+directory, then run:
+
+```bash
+mlflow ui -p 2458
+```
+
+View progress in a browser at <http://127.0.0.1:2458>
+
+Data parallelism is also supported with multi-GPU runs. To launch a multi-GPU training,
+run
+
+```bash
+mpirun -np <num_GPUs> python train_era5.py
+```
+
+If running inside a Docker container, you may need to include the `--allow-run-as-root`
+flag in the multi-GPU run command.
+
+## References
+
+If you find this work useful, cite it using:
+
+```text
+@article{pathak2022fourcastnet,
+  title={Fourcastnet: A global data-driven high-resolution weather model 
+         using adaptive fourier neural operators},
+  author={Pathak, Jaideep and Subramanian, Shashank and Harrington, Peter 
+          and Raja, Sanjeev and Chattopadhyay, Ashesh and Mardani, Morteza 
+          and Kurth, Thorsten and Hall, David and Li, Zongyi and Azizzadenesheli, Kamyar
+          and Hassanzadeh, Pedram and Kashinath, Karthik and Anandkumar, Animashree},
+  journal={arXiv preprint arXiv:2202.11214},
+  year={2022}
+}
+```
+
+ERA5 data was downloaded from the Copernicus Climate Change Service (C3S)
+Climate Data Store.
+
+```text
+Hersbach, H., Bell, B., Berrisford, P., Biavati, G., Horányi, A., Muñoz Sabater, J.,
+Nicolas, J., Peubey, C., Radu, R., Rozum, I., Schepers, D., Simmons, A., Soci, C., 
+Dee, D., Thépaut, J-N. (2018): ERA5 hourly data on pressure levels from 1959 to present.
+Copernicus Climate Change Service (C3S) Climate Data Store (CDS). 10.24381/cds.bd0915c6
+
+Hersbach, H., Bell, B., Berrisford, P., Biavati, G., Horányi, A., Muñoz Sabater, J.,
+Nicolas, J., Peubey, C., Radu, R., Rozum, I., Schepers, D., Simmons, A., Soci, C.,
+Dee, D., Thépaut, J-N. (2018): ERA5 hourly data on single levels from 1959 to present.
+Copernicus Climate Change Service (C3S) Climate Data Store (CDS). 10.24381/cds.adbb2d47
+```
+
+Other references:
+
+[Adaptive Fourier Neural Operators:
+Efficient Token Mixers for Transformers](https://openreview.net/pdf?id=EXHG-A3jlM)