Distributed ESMValTool #1128
Comments
And code (+ data) to decide which machine should process which data ...
To find out which machine hosts which data, we could use either esgf-pyclient or a collection of intake catalogs, e.g. those hosted at https://github.com/NCAR/intake-esm-datastore/tree/master/catalogs
Using ESGF infrastructure alone seems more robust.
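The idea in the comments above — code that decides which machine should process which data — can be sketched as a simple grouping step. This is a minimal illustration, not ESMValTool code: the catalog entries are made up, and in practice the dataset-to-node mapping would come from an esgf-pyclient search or an intake-esm catalog.

```python
# Hypothetical catalog: dataset ID -> ESGF data node hosting it.
# In a real setup this mapping would be built from an esgf-pyclient
# search or an intake-esm catalog; these entries are illustrative.
CATALOG = {
    "CMIP6.CMIP.MPI-M.MPI-ESM1-2-HR.historical.tas": "esgf.dkrz.de",
    "CMIP6.CMIP.NCAR.CESM2.historical.tas": "esgf-data.ucar.edu",
    "CMIP6.CMIP.MPI-M.MPI-ESM1-2-HR.historical.pr": "esgf.dkrz.de",
}


def assign_datasets(catalog):
    """Group dataset IDs by the data node that hosts them, so each
    machine can be handed the subset of work local to its data."""
    assignment = {}
    for dataset, node in catalog.items():
        assignment.setdefault(node, []).append(dataset)
    return assignment


print(assign_datasets(CATALOG))
```

Each machine would then run only the preprocessing tasks for the datasets assigned to its local node, which is the motivation for splitting the recipe in the first place.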
This seems too vague at the moment to be included in 2.4.0. @bouweandela, are you ok with bumping this to 2.5.0?
I think so. It is deliverable 9.3 of the IS-ENES3 project, which is due by the end of the year. It would have been nice to have it ready in time for v2.4, because then it would be in a released version. But I guess they'll just have to accept it as is.
hey @bouweandela I'd like to help with this in the new year, man! 🍺
This is mostly done, except for automatic splitting of recipes. Manual splitting has been supported since #1264. I will try to do something about automatic splitting for v2.6, but I'm not sure how important this feature is.
Closing this issue as it is mostly done. Automatically splitting a recipe based on which machine hosts what data has so far not been requested by any users, so it is probably best to postpone developing such a feature until there is actual demand.
In the IS-ENES3 project, one of the goals is to allow ESMValTool to run distributed across machines attached to multiple different ESGF nodes.
An idea by @nielsdrost to accomplish this would be to split an existing recipe into multiple smaller sub-recipes (each containing one or more preprocessing tasks); these parts could then be run remotely and the results combined in a final recipe that runs the diagnostic tasks. Running remotely could be done conveniently by using a WPS, but initially we aim for manually running the various sub-recipes from the command line.
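The proposed splitting could look roughly like the sketch below, assuming a recipe represented as a dict in the shape of an ESMValTool YAML recipe. The helper name and the example recipe are illustrative, not part of ESMValTool's actual API.

```python
# Sketch of splitting one recipe into per-diagnostic sub-recipes.
# Each sub-recipe keeps the shared top-level sections (documentation,
# preprocessors) so it can be run independently on the machine that
# hosts the relevant data; a final recipe would combine the results
# and run the diagnostic scripts.


def split_recipe(recipe):
    """Yield (name, sub_recipe) pairs, one per diagnostic."""
    shared = {k: v for k, v in recipe.items() if k != "diagnostics"}
    for name, diagnostic in recipe["diagnostics"].items():
        sub = dict(shared)
        sub["diagnostics"] = {name: diagnostic}
        yield name, sub


# Illustrative recipe with two diagnostics sharing one preprocessor.
recipe = {
    "documentation": {"title": "Example"},
    "preprocessors": {"mean": {"climate_statistics": {"operator": "mean"}}},
    "diagnostics": {
        "diag_tas": {"variables": {"tas": {"preprocessor": "mean"}}},
        "diag_pr": {"variables": {"pr": {"preprocessor": "mean"}}},
    },
}

for name, sub in split_recipe(recipe):
    print(name, sorted(sub))
```

Splitting per diagnostic is the coarsest granularity; a real implementation might split per preprocessing task instead, so that a single diagnostic needing data from several ESGF nodes can also be distributed.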
This issue is to collect ideas on how this could be implemented. We will need at least the following:
As part of this distributed compute task, we would also like to implement improved support for intake-esm (#31), if needed augmented with support for OpenDAP access (#1131) and download support using esgf-pyclient (#1130).