Improve memory load efficiency for shape_availability calculation #243
Great @calvintr, still need to test it. Some comments on the code below.
Co-authored-by: Fabian Hofmann <[email protected]>
f3db671 to 283369f
Comparison with multiple rasters and geometries:
Looking very good, @calvintr!
Running @FabianHofmann's recent commit 31f9b0d (left) against the previous one c67bebd (centre) and one where we convert to float before returning the value (right): Converting to float obviously increases the memory consumption, but only once at the end of the function. For the sake of backwards compatibility, I believe that is something we should do.
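To put rough numbers on the dtype question, here is a minimal sketch that computes the size of one dense mask for the array shape quoted in the PR description. The helper `mask_nbytes` is hypothetical and not part of atlite; it only multiplies the cell count by the item size, so nothing large is allocated.

```python
import numpy as np


def mask_nbytes(shape, dtype):
    """Bytes needed for one dense array of the given shape and dtype.

    Hypothetical helper for illustration only, not an atlite function.
    """
    n_cells = 1
    for dim in shape:
        n_cells *= dim  # plain Python ints, so no overflow
    return n_cells * np.dtype(dtype).itemsize


# Array shape quoted in the PR description (50 m raster bounded by Germany).
shape = (17086, 12679)

for dtype in (bool, np.int32, np.float64):
    size_gb = mask_nbytes(shape, dtype) / 1e9
    print(f"{np.dtype(dtype).name:>8}: {size_gb:.2f} GB")
```

Under these assumptions a single `float64` copy is eight times the size of the `bool` mask, which is the one-off cost at the end of the function that is being discussed here.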
What are the breaks you saw or have in mind? The follow-up functions run can work with it as well as |
I agree that returning the mask as boolean makes sense from a conceptual point of view. Also, the transformation of the final mask on the return to In my (limited) workflow I did not encounter any breaking errors with the boolean arrays in plotting or writing it to a new .tif file with GDAL. There is, however, one issue I ran into that might also be relevant for the examples in the documentation: when summing the boolean array for large masks and multiplying it (to calculate the eligible area, for example), this creates very large integers that lead to an overflow warning [ |
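One possible way around the overflow when computing an eligible area from a boolean mask is to pick the accumulator dtype explicitly. This is only a sketch under assumptions: the tiny mask and the 50 m resolution are made up for illustration, and it is not taken from the atlite documentation.

```python
import numpy as np

# Hypothetical small stand-in for a boolean eligibility mask.
mask = np.zeros((4, 5), dtype=bool)
mask[1:3, 1:4] = True  # 6 eligible cells

res = 50.0  # assumed raster resolution in metres
cell_area = res**2  # area of one cell in square metres

# Accumulate the count in float64 so that count * cell_area cannot
# overflow a fixed-width integer on platforms with a 32-bit default int.
eligible_area = mask.sum(dtype=np.float64) * cell_area
print(eligible_area)
```

Passing `dtype=np.int64` to `sum` would work as well if an integer count is preferred; the point is only that the accumulator width is chosen explicitly instead of relying on the platform default.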
I get your arguments. @calvintr Your issue is related to what my issue with changing to My issue is how this feature might usually be used in regular code: Take e.g. assigning
for a Example: We can:
We should: |
Thinking about a middle ground where the return dtype is an |
I don't see any benefits of the middle ground solution, more like only downsides:
(btw.:
Based on that, I think option 2 is the better one to go with.
Alright, then let's go for pure option 2. Just for the background of the "middle ground" option: For the availability matrix computation, the boolean masks have to be transformed to |
Profiles and availabilities in |
Changes proposed in this Pull Request
Improving the efficiency of `shape_availability()` with respect to memory load through changes in dtypes and the method of mask summation.
Description
Two main changes are made to reduce the memory load while running `shape_availability()` within gis.py:
1. Calls to `np.astype()` to `int32` are removed on several occasions to keep matrices returned from functions such as `rasterio.features.geometry_mask()` and `scipy.ndimage.morphology.binary_dilation()` in dtype `bool`. The dtype transformation to `float64`, applied to the final mask that is returned by the function, is removed.
2. Masks are combined with the `|` OR operator, to keep the single mask in memory as dtype `bool`
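The second change can be illustrated with a minimal sketch. It assumes that `True` marks an excluded cell and uses a hypothetical helper `combine_masks`; it is not the code from gis.py, only an instance of the in-place OR idea.

```python
import numpy as np


def combine_masks(masks):
    """OR a sequence of boolean masks into one bool array.

    Hypothetical helper for illustration, not the atlite implementation.
    """
    masks = iter(masks)
    combined = next(masks).astype(bool, copy=True)  # copy so inputs stay untouched
    for mask in masks:
        combined |= mask  # in-place OR keeps a single bool accumulator
    return combined


# Three tiny stand-in exclusion masks (True = excluded, an assumption here).
a = np.array([[True, False], [False, False]])
b = np.array([[False, False], [True, False]])
c = np.array([[True, False], [False, False]])

excluded = combine_masks([a, b, c])

# Summing int32 copies and thresholding marks the same cells,
# but needs a 4-byte-per-cell intermediate for every mask.
summed = (a.astype(np.int32) + b.astype(np.int32) + c.astype(np.int32)) > 0
```

Both approaches mark the same cells; the OR variant just never leaves one byte per cell, which is what matters once the masks reach the sizes discussed in this PR.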
Motivation and Context
With higher-resolution rasters or greater land area covered, the underlying matrices used when calculating the eligible area via `shape_availability()` quickly grow in size. For example, a raster with 50 meter resolution bounded by the shape of Germany produces an array of shape (17086, 12679). With the current method of storing an individual mask in dtype `int32` for every raster added to the `ExclusionContainer()`, this can lead to infeasible memory usage for conventional systems. The changes make both the storage of information and the method of combining information from multiple rasters more efficient.
How Has This Been Tested?
So far tested locally with the latest version of atlite. Tests included rasters with buffers, inverted rasters, and geometries. `pytest` did not bring up unexpected errors.
Type of change
Checklist
`pytest` inside the repository and no unexpected problems came up.
`doc/`.
`environment.yaml` file.
`doc/release_notes.rst`.
`pre-commit run --all` to lint/format/check my contribution