Distributed ESMValTool #1128

bouweandela · 2021-05-14T13:43:39Z

In the IS-ENES3 project, one of the goals is to allow ESMValTool to run distributed across machines attached to multiple different ESGF nodes.

An idea by @nielsdrost to accomplish this would be to split an existing recipe into multiple smaller sub-recipes, each containing one or more preprocessing tasks. These parts could then be run remotely and the results combined in a final recipe that runs the diagnostic tasks. Running remotely could be done conveniently by using a WPS, but initially we aim for manually running the various sub-recipes from the command line.

This issue is to collect ideas on how this could be implemented. We will need at least the following:

* code to split an existing recipe into smaller sub-recipes
* code to check that the input of a run from a sub-recipe is identical to what is expected from the main recipe -> simplistic version implemented in Add support for re-using preprocessor output from previous runs #1321 (it currently checks that the entire recipe is the same; this could be refined to check only that the preprocessing task is the same)
* provenance of preprocessing tasks should be saved to disk in the preprocessing task directories, so it can be loaded for later use -> implemented in Add support for re-using preprocessor output from previous runs #1321
* provenance of preprocessing tasks should be loaded from disk when using the data from a run that was done remotely -> implemented in Add support for re-using preprocessor output from previous runs #1321
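
To make the first item more concrete, here is a minimal sketch of what splitting could look like. It is not the ESMValTool implementation: it assumes the recipe has already been loaded into a plain dict with a top-level `diagnostics` section whose entries contain `variables` and `scripts`, and it assumes one preprocessing task per diagnostic variable. The function name `split_recipe` and the example recipe content are hypothetical.

```python
import copy


def split_recipe(recipe):
    """Split a recipe dict into preprocessing-only sub-recipes.

    Returns a dict mapping a task name ("diagnostic/variable") to a
    sub-recipe that keeps only that variable and drops the scripts.
    The input recipe is not modified.
    """
    sub_recipes = {}
    for diag_name, diag in recipe.get("diagnostics", {}).items():
        for var_name, var in (diag.get("variables") or {}).items():
            # Copy everything except the diagnostics section as-is.
            sub = {
                key: copy.deepcopy(value)
                for key, value in recipe.items()
                if key != "diagnostics"
            }
            # Keep other diagnostic settings, but only one variable
            # and no diagnostic scripts.
            sub_diag = {
                key: copy.deepcopy(value)
                for key, value in diag.items()
                if key not in ("variables", "scripts")
            }
            sub_diag["variables"] = {var_name: copy.deepcopy(var)}
            sub_diag["scripts"] = None
            sub["diagnostics"] = {diag_name: sub_diag}
            sub_recipes[f"{diag_name}/{var_name}"] = sub
    return sub_recipes


# Hypothetical example recipe, already parsed into a dict.
recipe = {
    "documentation": {"title": "Example"},
    "datasets": [{"dataset": "MODEL-A"}],
    "preprocessors": {"annual_mean": {"annual_statistics": {"operator": "mean"}}},
    "diagnostics": {
        "diag1": {
            "variables": {
                "tas": {"preprocessor": "annual_mean"},
                "pr": {"preprocessor": "annual_mean"},
            },
            "scripts": {"plot": {"script": "examples/diagnostic.py"}},
        }
    },
}

parts = split_recipe(recipe)
for task_name in sorted(parts):
    print(task_name)
```

Each sub-recipe could then be written to its own file and run on the machine that hosts the corresponding data. The final recipe with the diagnostic scripts is left out of this sketch.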

As part of this distributed compute task, we would also like to implement improved support for intake-esm (#31), if needed augmented with support for OpenDAP access (#1131) and download support using esgf-pyclient (#1130).


senesis · 2021-05-14T16:14:29Z

We will need at least the following:

* code to split an existing recipe into smaller sub-recipes

* code to check that the ...

And code (+ data) to decide which machine should process which data ....

bouweandela · 2021-05-17T09:33:13Z

To find out which machine hosts which data, we could use either esgf-pyclient or a collection of intake catalogs, e.g. those hosted at https://github.com/NCAR/intake-esm-datastore/tree/master/catalogs
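
As a rough illustration, once search results are available they could be grouped by host. The sketch below works on plain dicts and does not call esgf-pyclient or intake. It assumes each record carries a `data_node` and an `instance_id` key, as an assumption about what the search results look like. The function name, hostnames, and dataset identifiers are made up.

```python
from collections import defaultdict


def group_by_data_node(records):
    """Map each data node hostname to the sorted dataset ids it hosts."""
    hosts = defaultdict(list)
    for record in records:
        hosts[record["data_node"]].append(record["instance_id"])
    return {node: sorted(ids) for node, ids in hosts.items()}


# Hypothetical search results; hostnames and ids are placeholders.
records = [
    {"instance_id": "CMIP6.example.tas.v1", "data_node": "node-a.example.org"},
    {"instance_id": "CMIP6.example.pr.v1", "data_node": "node-b.example.org"},
    {"instance_id": "CMIP6.example.pr.v1", "data_node": "node-a.example.org"},
]

hosts = group_by_data_node(records)
for node in sorted(hosts):
    print(node, hosts[node])
```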

senesis · 2021-05-17T12:49:00Z

To find out which machine hosts which data, we could use either esgf-pyclient or a collection of intake catalogs, e.g. those hosted at https://github.com/NCAR/intake-esm-datastore/tree/master/catalogs

Using ESGF infrastructure alone seems more robust.
ESGF compute logic would tend to send compute tasks preferentially to the data nodes that host the original version of the data (rather than a duplicate at an index node), but of course only if they expose the compute function. This also makes it possible to reach data that may not be duplicated. The fallback solution would be to select a compute node which is close, in network terms, to the data node.
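
A minimal sketch of that preference order, under assumptions: each copy of a dataset is a dict with a `data_node` hostname and a `replica` flag, the set of nodes exposing compute is known, and some network cost between hosts is available as a lookup table. Trying replicas with compute before the network fallback is my reading of "preferentially". None of this is an existing ESGF API, and all names and hostnames are hypothetical.

```python
def choose_compute_node(copies, compute_nodes, distance):
    """Pick a compute node for one dataset.

    copies: list of dicts with "data_node" (str) and "replica" (bool).
    compute_nodes: set of hostnames that expose the compute function.
    distance: dict mapping (compute_node, data_node) to a network cost.
    """
    if not compute_nodes:
        raise ValueError("no compute nodes available")

    # 1. Prefer a node hosting the original data that also exposes compute.
    for copy_ in copies:
        if not copy_["replica"] and copy_["data_node"] in compute_nodes:
            return copy_["data_node"]

    # 2. Otherwise accept a node hosting a replica that exposes compute.
    for copy_ in copies:
        if copy_["data_node"] in compute_nodes:
            return copy_["data_node"]

    # 3. Fall back to the compute node closest to any node hosting the data.
    def cost(compute_node):
        return min(
            distance.get((compute_node, copy_["data_node"]), float("inf"))
            for copy_ in copies
        )

    return min(sorted(compute_nodes), key=cost)


# Hypothetical example data.
copies = [
    {"data_node": "origin.example.org", "replica": False},
    {"data_node": "mirror.example.org", "replica": True},
]
latency = {
    ("near.example.org", "origin.example.org"): 5,
    ("far.example.org", "origin.example.org"): 80,
}

print(choose_compute_node(copies, {"origin.example.org", "far.example.org"}, latency))
print(choose_compute_node(copies, {"mirror.example.org"}, latency))
print(choose_compute_node(copies, {"near.example.org", "far.example.org"}, latency))
```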

zklaus · 2021-09-14T12:26:42Z

This seems too vague at the moment to be included in 2.4.0. @bouweandela, are you ok with bumping this to 2.5.0?

bouweandela · 2021-09-14T12:37:22Z

I think so. It is deliverable 9.3 of the IS-ENES3 project, which is due by the end of the year. It would therefore have been nice if it had been ready in time for v2.4, because then it would be in a released version. But I guess they'll just have to accept it as being in main and included in the next release. I will try, but it seems unlikely that I'll be able to completely implement this in time.

valeriupredoi · 2021-11-23T12:08:49Z

hey @bouweandela I'd like to help with this in the new year, man! 🍺

bouweandela · 2022-02-03T14:47:01Z

This is mostly done, except for automatic splitting of recipes. Manual splitting has been supported since #1264. I will try to do something about automatic splitting for v2.6, but I'm not sure how important this feature is.

bouweandela · 2024-01-15T10:19:31Z

Closing this issue as it is mostly done. Automatically splitting a recipe based on which machine hosts what data has so far not been requested by any users, so it is probably best to postpone developing such a feature until there is actual demand.

bouweandela added the enhancement New feature or request label May 14, 2021

bouweandela assigned nielsdrost, bouweandela and jvegreg May 14, 2021

bouweandela assigned remi-kazeroni May 17, 2021

bouweandela added this to the v2.4.0 milestone May 19, 2021

This was referenced Sep 13, 2021

Allow wildcard searches when specifying fx variables in preprocessor #1082

Closed

Optional flag that stops the run if any of data files not found #1282

Open

zklaus modified the milestones: v2.4.0, v2.5.0 Sep 14, 2021

This was referenced Sep 14, 2021

Convert a recipe to a synda selection file #1129

Closed

Add support for re-using preprocessor output from previous runs #1321

Merged

bouweandela mentioned this issue Nov 18, 2021

Filter tasks earlier #1264

Merged

10 tasks

bouweandela modified the milestones: v2.5.0, v2.6.0 Feb 3, 2022

sloosvel removed this from the v2.6.0 milestone Jun 7, 2022

bouweandela closed this as completed Jan 15, 2024
