Weights calculation with parallel=True still performed locally #399
That would be incredible, if the weights computation could be lazy. It would also solve some issues where the source and/or destination grids are so large that weight generation runs out of memory.

The main reason it is not implemented is that the weight generation happens within ESMF: it is non-python and uses MPI for parallelization. Also, from xESMF's perspective, the weight generation is one monolithic black box; it is not easily divided into dask tasks. (Another important reason is that xESMF doesn't have a real team. We are only three maintainers, and these days we don't do much more than maintain...)

You can go look at the PR: it was part of an internship of @charlesgauthier-udm, and maybe he remembers other limitations in the parallelization of ESMF that I haven't noted?

However, in my personal opinion, this might not be the best way to go. Depending on a quite heavy non-python dependency is a burden (ESMF is not available on Windows, nor on PyPI). Even though I haven't experimented much there, I have the feeling that a better path for regridding would be to go through a pythonic geometry module.
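To make the geometry-based idea concrete, here is a minimal, hypothetical sketch (not xESMF or ESMF code): conservative regridding weights are just normalized overlap measures between source and destination cells, which in 2D a library like shapely could compute per chunk. In 1D the idea reduces to interval overlaps:

```python
# Hypothetical sketch of geometry-based conservative regridding weights.
# In 1D, the weight of source cell j for destination cell i is the overlap
# length of the two intervals, normalized by the destination cell length.
# A 2D version would use polygon intersections (e.g. via shapely) and could
# be split into one dask task per block of destination cells.

def overlap(a0, a1, b0, b1):
    """Length of the overlap between intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))

def conservative_weights_1d(src_edges, dst_edges):
    """Weight matrix W such that dst_means = W @ src_means."""
    weights = []
    for i in range(len(dst_edges) - 1):
        d0, d1 = dst_edges[i], dst_edges[i + 1]
        row = [
            overlap(src_edges[j], src_edges[j + 1], d0, d1) / (d1 - d0)
            for j in range(len(src_edges) - 1)
        ]
        weights.append(row)
    return weights

# Four unit-width source cells remapped onto two double-width destination
# cells: each destination cell draws half its value from two source cells.
w = conservative_weights_1d([0, 1, 2, 3, 4], [0, 2, 4])
```

Because each row of the weight matrix depends only on the source cells that overlap one destination cell, this formulation chunks naturally along the destination grid, unlike a single monolithic ESMF call.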
I think @aulemahal gave a great overview of the limitations of ESMF parallelization. While we were figuring out parallel weight generation, we did consider dask-mpi as a potential way to couple ESMF's MPI implementation with Dask's lazy arrays, but it ended up being out of reach for the internship. Perhaps it could be of use here.
I was looking into the regridder API for some benchmarks that we are running (see here) and noticed that the weights calculation is always performed client-side and is also materialized before it is put into the graph.
This creates a pretty large task graph if you want to send the computation to a remote cluster, and it is also not very efficient, since you are restricted to your local resources.
I'd be happy to take a stab at keeping the weights calculation lazy and only executing it when needed. I just wanted to check in beforehand to see whether there are reasons for the current implementation that I am not aware of.
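A minimal sketch of what "keeping the weights lazy" could look like, assuming the generation step can be wrapped in a single function. `generate_weights` is a hypothetical stand-in for the ESMF call, not xESMF's actual internals:

```python
# Illustrative sketch: instead of materializing the weight matrix on the
# client and embedding it in the task graph, wrap the generation in
# dask.delayed so the graph carries only a small callable node and the
# matrix is built on whichever worker executes the task.
import dask
import numpy as np

def generate_weights(n_in, n_out):
    """Stand-in for expensive weight generation (here: block averaging)."""
    w = np.zeros((n_out, n_in))
    block = n_in // n_out
    for i in range(n_out):
        w[i, i * block:(i + 1) * block] = 1.0 / block
    return w

# Eager (current behaviour): the full matrix lives on the client and is
# serialized into every graph sent to the scheduler.
w_eager = generate_weights(8, 2)

# Lazy (proposed): only a deferred node enters the graph.
w_lazy = dask.delayed(generate_weights)(8, 2)
out = dask.delayed(np.dot)(w_lazy, np.arange(8.0))
result = out.compute()  # weight generation happens only here
```

The trade-off is that error reporting moves from regridder construction time to compute time, which may be worth flagging in the API discussion.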