Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regrid: force crop to avoid going out of memory #3517

Closed
wants to merge 1 commit into from

Conversation

jdries
Copy link
Contributor

@jdries jdries commented Aug 7, 2023

force cropping when regrid leads to much smaller tiles, to avoid memory multiplication in serialized rdd

Overview

Issue: Open-EO/openeo-geotrellis-extensions#191
Regrid uses cropping to create new tiles, but by default, cropping retains the original ArrayTile. So when regrid is used to go from large tile sizes (e.g. 1024 pixels) to smaller ones (e.g. 64 pixels) we get many cropped tiles all referencing these larger original tiles. When Spark then serializes these cropped tiles, the original arrays are all separately serialized, and we get a multiplication effect on the RDD size. Deserializing then can make executors go out of memory.

Checklist

  • ./CHANGELOG.md updated, if necessary. Link to the issue if closed, otherwise the PR.
  • Unit tests added for bug-fix or new feature

Demo

Optional. Screenshots/REPL

force cropping when regrid leads to much smaller tiles, to avoid memory multiplication in serialized rdd
@jdries jdries changed the title force crop Regrid: force crop to avoid going out of memory Aug 7, 2023
@pomadchin
Copy link
Member

Thanks for a nice PR! I'll take a look a bit later!

Could I ask you to sign ECA please to make Eclipse license checker happy?

(Eclipse grabs your name and email from the commit, so doublecheck that you commited under the correct email (that was used to sign ECA), right now the ECA is required for (jns@****.be) email).

@pomadchin pomadchin added the bug label Aug 19, 2023
Copy link
Member

@pomadchin pomadchin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for a late review, but it looks like some tiny changes required to make compiler happy.

newKey ->
(oldTile.crop((xSpan.start - oldXstart).toInt,
(ySpan.start - oldYstart).toInt,
(xSpan.end - oldXstart).toInt,
(ySpan.end - oldYstart).toInt),
(ySpan.end - oldYstart).toInt),Crop.Options(force=forceCrop)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops CI feels a bit unhappy about it. (2.12)

/usr/local/src/spark/src/main/scala/geotrellis/spark/regrid/Regrid.scala:126:92: ';' expected but ',' found         
(ySpan.end - oldYstart).toInt),Crop.Options(force=forceCrop)),

Comment on lines +121 to 128
val forceCrop = newW < 0.6*oldW || newH < 0.6*oldH
newKey ->
(oldTile.crop((xSpan.start - oldXstart).toInt,
(ySpan.start - oldYstart).toInt,
(xSpan.end - oldXstart).toInt,
(ySpan.end - oldYstart).toInt),
(ySpan.end - oldYstart).toInt),Crop.Options(force=forceCrop)),
((xSpan.start - newXrange.start).toInt, (ySpan.start - newYrange.start).toInt)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not allow me to add a code change suggestion, but should be smth like:

              val forceCrop = newW < 0.6 * oldW || newH < 0.6 * oldH
              newKey ->
                (
                  oldTile.crop(
                    (xSpan.start - oldXstart).toInt,
                    (ySpan.start - oldYstart).toInt,
                    (xSpan.end - oldXstart).toInt,
                    (ySpan.end - oldYstart).toInt,
                    Crop.Options(force = forceCrop)
                  ),
                  ((xSpan.start - newXrange.start).toInt, (ySpan.start - newYrange.start).toInt)
                )

Crop options is an argument to the crop function.

Copy link
Member

@pomadchin pomadchin Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made up a #3518 to try out the CI, but please feel free to commit changes as a part of this PR as well.

U can also let me pushing into your PR to speed up the progress! (checkbox)

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants