From 9792a16311501d1d5fc85d5c769a7cd8b14678f4 Mon Sep 17 00:00:00 2001
From: Ou Ku
Date: Tue, 20 Aug 2024 13:48:24 +0200
Subject: [PATCH] fix word

---
 paper/paper.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/paper/paper.md b/paper/paper.md
index 6e170fb..cf82ea9 100644
--- a/paper/paper.md
+++ b/paper/paper.md
@@ -46,7 +46,7 @@ In the DA process, observations are integrated into the physical model through t
 
 ## Statement of Need
 
-The surrogate MO, trained as a ML model, is generally considered valid within a specific spatio-temporal range. [@zhou2008ensemble; @REICHLE20081411; @shan:2022] When dealing with a large spatio-temporal scale, multiple mapping processes may exist, prompting consideration for training separate MOs for distinct spatial and/or temporal partitions of the dataset. As the number of partitions increases, a challenge arises in distributing these training tasks effectively among the partitions.
+A surrogate MO, trained as a ML model, is generally considered valid within a specific spatio-temporal range. [@zhou2008ensemble; @REICHLE20081411; @shan:2022] When dealing with a large spatio-temporal scale, multiple mapping processes may exist, prompting consideration for training separate MOs for distinct spatial and/or temporal partitions of the dataset. As the number of partitions increases, a challenge arises in distributing these training tasks effectively among the partitions.
 
 To address this challenge, we developed a novel approach for distributed training of MOs. We present the open Python library `MOTrainer`, which to the best of our knowledge, is the first Python library catering to researchers requiring training independent MOs across extensive spatio-temporal coverage in a distributed manner. `MOTrainer` leverages Xarray's [@Hoyer_xarray_N-D_labeled_2017] support for multi-dimensional datasets to accommodate spatio-temporal features of input/output data of the training tasks. It provides user-friendly functionalities implemented with the Dask [@Rocklin2015DaskPC] library, facilitating the partitioning of large spatio-temporal data for independent model training tasks. Additionally, it streamlines the train-test data split based on customized spatio-temporal coordinates. The Jackknife method [@mccuen1998hydrologic] is implemented as an external Cross-Validation method for Deep Neural Network (DNN) training, with support for Dask parallelization. This feature enables the scaling of training tasks across various computational infrastructures.
 
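
The patched paragraph describes the workflow at a high level: partition a spatio-temporal dataset with Xarray/Dask, then train one independent surrogate MO per partition. Below is a minimal sketch of that pattern using only plain xarray/dask primitives. It is not MOTrainer's actual API; the dataset, partition size, and the toy training routine are all illustrative assumptions.

```python
# Sketch (not MOTrainer's API): partition a spatio-temporal dataset
# and train one independent model per partition in parallel with dask.
import numpy as np
import xarray as xr
import dask

# Toy spatio-temporal dataset: input feature x and target y on a
# (space, time) grid. Shapes are arbitrary for illustration.
ds = xr.Dataset(
    {
        "x": (("space", "time"), np.random.rand(100, 365)),
        "y": (("space", "time"), np.random.rand(100, 365)),
    }
)

# Partition along the spatial dimension: each slice becomes an
# independent training task (here, 10 cells per partition).
partitions = [ds.isel(space=slice(i, i + 10)) for i in range(0, 100, 10)]

@dask.delayed
def train_one(partition):
    # Stand-in for a real training routine (e.g. a DNN fit); a
    # least-squares line per partition keeps the sketch self-contained.
    x = partition["x"].values.ravel()
    y = partition["y"].values.ravel()
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept

# The tasks share no state, so dask can schedule them across whatever
# computational infrastructure backs the scheduler.
models = dask.compute(*[train_one(p) for p in partitions])
```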