Distributed ESMValTool #1128

Closed
bouweandela opened this issue May 14, 2021 · 8 comments
Labels
enhancement New feature or request

Comments

@bouweandela
Member

bouweandela commented May 14, 2021

In the IS-ENES3 project, one of the goals is to allow ESMValTool to run distributed across machines attached to multiple different ESGF nodes.

An idea by @nielsdrost to accomplish this would be to split an existing recipe into multiple smaller sub-recipes, each containing one or more preprocessing tasks. These parts could then be run remotely and the results combined in a final recipe that runs the diagnostic tasks. Running remotely could be done conveniently by using a WPS, but initially we aim for manually running the various sub-recipes from the command line.
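
To make the splitting idea concrete, here is a minimal sketch that operates on a recipe already loaded as a plain dictionary. It assumes the usual recipe layout with a top-level `diagnostics` section whose entries have `variables` and `scripts`. The helper `split_recipe` is hypothetical and not part of ESMValTool, and how the final recipe would locate the remotely computed output is deliberately left open.

```python
import copy


def split_recipe(recipe):
    """Split a recipe dict into preprocessing sub-recipes and a final recipe.

    Hypothetical helper for illustration only; not an ESMValTool function.
    Each sub-recipe holds one variable of one diagnostic and no scripts,
    so running it would only do preprocessing. The final recipe keeps only
    the diagnostic scripts.
    """
    # Everything except the diagnostics is shared by all parts.
    shared = {k: v for k, v in recipe.items() if k != "diagnostics"}
    diagnostics = recipe.get("diagnostics") or {}

    sub_recipes = []
    for diag_name, diag in diagnostics.items():
        for var_name, var in (diag.get("variables") or {}).items():
            sub = copy.deepcopy(shared)
            sub["diagnostics"] = {
                diag_name: {
                    "variables": {var_name: copy.deepcopy(var)},
                    "scripts": None,  # preprocessing only
                }
            }
            sub_recipes.append(sub)

    final = copy.deepcopy(shared)
    final["diagnostics"] = {
        diag_name: {"scripts": copy.deepcopy(diag.get("scripts"))}
        for diag_name, diag in diagnostics.items()
    }
    return sub_recipes, final


# Made-up example recipe; names and values are placeholders.
example_recipe = {
    "documentation": {"description": "example"},
    "preprocessors": {"global_mean": {"area_statistics": {"operator": "mean"}}},
    "diagnostics": {
        "diag1": {
            "variables": {
                "tas": {"mip": "Amon", "preprocessor": "global_mean"},
                "pr": {"mip": "Amon", "preprocessor": "global_mean"},
            },
            "scripts": {"plot": {"script": "examples/diagnostic.py"}},
        }
    },
}

sub_recipes, final_recipe = split_recipe(example_recipe)
```

Splitting per variable is just one possible granularity; splitting per dataset or per hosting machine would follow the same pattern.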

This issue is to collect ideas on how this could be implemented. We will need at least the following:

As part of this distributed compute task, we would also like to implement improved support for intake-esm (#31), if needed augmented with support for OpenDAP access (#1131) and download support using esgf-pyclient (#1130).

@bouweandela bouweandela added the enhancement New feature or request label May 14, 2021
@senesis
Contributor

senesis commented May 14, 2021

We will need at least the following:

* code to split an existing recipe into smaller sub-recipes

* code to check that the ...

And code (+ data) to decide which machine should process which data ....

@bouweandela
Member Author

To find out which machine hosts which data, we could use either esgf-pyclient or a collection of intake catalogs, e.g. those hosted at https://github.com/NCAR/intake-esm-datastore/tree/master/catalogs
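
As a rough illustration of the bookkeeping this would involve, the sketch below groups dataset identifiers by the machine that hosts them. It assumes the catalog or search results have already been reduced to simple records with a `data_node` field (ESGF search results carry a field of that name) and a `dataset` field, which is a made-up key for this example. The helper name and the host names are hypothetical, and no actual query is performed.

```python
from collections import defaultdict


def group_by_data_node(records):
    """Map each data node to the sorted list of datasets it hosts.

    `records` is an iterable of dicts with the keys 'data_node' and
    'dataset'. Hypothetical helper for illustration only.
    """
    hosts = defaultdict(set)
    for record in records:
        hosts[record["data_node"]].add(record["dataset"])
    return {node: sorted(names) for node, names in hosts.items()}


# Hand-written records standing in for real search or catalog results.
example_records = [
    {"data_node": "node-a.example.org", "dataset": "model1.tas"},
    {"data_node": "node-a.example.org", "dataset": "model1.pr"},
    {"data_node": "node-b.example.org", "dataset": "model2.tas"},
    {"data_node": "node-b.example.org", "dataset": "model1.tas"},
]

datasets_per_node = group_by_data_node(example_records)
```

A dataset that appears under several nodes, such as `model1.tas` here, could be processed on any of them, which is where a placement decision becomes necessary.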

@senesis
Contributor

senesis commented May 17, 2021

To find out which machine hosts which data, we could use either esgf-pyclient or a collection of intake catalogs, e.g. those hosted at https://github.com/NCAR/intake-esm-datastore/tree/master/catalogs

Using ESGF infrastructure alone seems more robust.
ESGF compute logic would tend to favor sending compute tasks to those data nodes which host the original version of the data (rather than a duplicate at an index node), but of course only if they expose the compute function. This also makes it possible to reach data that may not be duplicated. The fallback solution would be to select a compute node which is close, network-wise, to the data node.
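
The selection rule described above can be sketched as follows. This is only an illustration under stated assumptions: each copy of the data is a dict with `data_node` and a boolean `replica` field (modeled on ESGF search results), the set of nodes exposing a compute function is known in advance, and `distance` is a caller-supplied function standing in for whatever measure of network closeness is available. The function name and all host names are hypothetical.

```python
def choose_compute_node(copies, compute_nodes, distance):
    """Pick a node to run a compute task on.

    copies: list of dicts with 'data_node' (str) and 'replica' (bool).
    compute_nodes: collection of node names exposing a compute function.
    distance: callable (node_a, node_b) -> number, lower means closer.
    Returns a node name, or None if no choice can be made.
    """
    originals = [c["data_node"] for c in copies if not c["replica"]]

    # Preferred: a node hosting the original data that can also compute.
    for node in originals:
        if node in compute_nodes:
            return node

    # Fallback: the compute node closest to a node hosting the data,
    # measured against the originals if there are any.
    targets = originals or [c["data_node"] for c in copies]
    if not targets or not compute_nodes:
        return None
    return min(
        sorted(compute_nodes),  # sorted for a deterministic tie-break
        key=lambda node: min(distance(node, t) for t in targets),
    )


# Made-up example: one original copy and one replica.
example_copies = [
    {"data_node": "orig.example.org", "replica": False},
    {"data_node": "mirror.example.org", "replica": True},
]
# Made-up distances between compute nodes and the original data node.
example_distances = {
    ("far.example.org", "orig.example.org"): 5,
    ("near.example.org", "orig.example.org"): 2,
}


def example_distance(a, b):
    return example_distances[(a, b)]


direct = choose_compute_node(
    example_copies, {"orig.example.org", "far.example.org"}, example_distance
)
fallback = choose_compute_node(
    example_copies, {"far.example.org", "near.example.org"}, example_distance
)
```

In the first call the original data node exposes compute and is chosen directly; in the second it does not, so the closest compute node is chosen instead.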

@zklaus

zklaus commented Sep 14, 2021

This seems too vague at the moment to be included in 2.4.0. @bouweandela, are you ok with bumping this to 2.5.0?

@bouweandela
Member Author

I think so. It is deliverable 9.3 of the IS-ENES3 project, which is due by the end of the year. It would therefore have been nice if it had been ready in time for v2.4, because then it would be in a released version. But I guess they'll just have to accept it as being in main and included in the next release. I will try, but it seems unlikely that I'll be able to completely implement this in time.

@valeriupredoi
Contributor

hey @bouweandela I'd like to help with this in the new year, man! 🍺

@bouweandela bouweandela modified the milestones: v2.5.0, v2.6.0 Feb 3, 2022
@bouweandela
Member Author

This is pretty far along, except for automatic splitting of recipes. Manual splitting has been supported since #1264. I will try to do something about automatic splitting for v2.6, but I'm not sure how important this feature is.

@sloosvel sloosvel removed this from the v2.6.0 milestone Jun 7, 2022
@bouweandela
Member Author

Closing this issue as it is mostly done. Automatically splitting a recipe based on which machine hosts what data has so far not been requested by any users, so it is probably best to postpone developing such a feature until there is actual demand.
