
Commit: fix word
rogerkuou committed Aug 20, 2024
1 parent b79ddd0 commit 9792a16
Showing 1 changed file with 1 addition and 1 deletion.
paper/paper.md (2 changes: 1 addition & 1 deletion)
@@ -46,7 +46,7 @@ In the DA process, observations are integrated into the physical model through t

## Statement of Need

-The surrogate MO, trained as an ML model, is generally considered valid within a specific spatio-temporal range [@zhou2008ensemble; @REICHLE20081411; @shan:2022]. When dealing with a large spatio-temporal scale, multiple mapping processes may exist, motivating the training of separate MOs for distinct spatial and/or temporal partitions of the dataset. As the number of partitions increases, a challenge arises in distributing these training tasks effectively among the partitions.
+A surrogate MO, trained as an ML model, is generally considered valid within a specific spatio-temporal range [@zhou2008ensemble; @REICHLE20081411; @shan:2022]. When dealing with a large spatio-temporal scale, multiple mapping processes may exist, motivating the training of separate MOs for distinct spatial and/or temporal partitions of the dataset. As the number of partitions increases, a challenge arises in distributing these training tasks effectively among the partitions.

To address this challenge, we developed a novel approach for distributed training of MOs. We present the open Python library `MOTrainer`, which, to the best of our knowledge, is the first Python library catering to researchers who need to train independent MOs across extensive spatio-temporal coverage in a distributed manner. `MOTrainer` leverages Xarray's [@Hoyer_xarray_N-D_labeled_2017] support for multi-dimensional datasets to accommodate the spatio-temporal features of the training tasks' input/output data. It provides user-friendly functionalities implemented with the Dask [@Rocklin2015DaskPC] library, facilitating the partitioning of large spatio-temporal data into independent model training tasks. Additionally, it streamlines the train-test data split based on customized spatio-temporal coordinates. The Jackknife method [@mccuen1998hydrologic] is implemented as an external cross-validation method for Deep Neural Network (DNN) training, with support for Dask parallelization. This feature enables the scaling of training tasks across various computational infrastructures.
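The per-partition training workflow described above can be sketched in plain Python (a minimal, hypothetical illustration, not MOTrainer's actual API; in real use, Xarray datasets and a Dask scheduler would replace this standard-library stand-in, and the per-partition model would be a DNN rather than a linear fit):

```python
from concurrent.futures import ThreadPoolExecutor  # stand-in for a Dask scheduler

def fit_linear(partition):
    """Fit y = a*x + b by ordinary least squares on one spatio-temporal partition."""
    xs, ys = zip(*partition)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in partition) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

def train_partitions(partitions):
    """Train one independent surrogate model per partition, in parallel."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(fit_linear, partitions))

# Two toy partitions, each governed by its own input-output mapping.
models = train_partitions([
    [(x, 2 * x + 1) for x in range(5)],
    [(x, -1 * x + 3) for x in range(5)],
])
# models ≈ [(2.0, 1.0), (-1.0, 3.0)]: one independent model per partition
```

Because the partitions share no state, the same map-over-partitions pattern scales from a laptop thread pool to a Dask cluster without changing the per-partition training code.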

