-
Notifications
You must be signed in to change notification settings - Fork 361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regrid: force crop to avoid going out of memory #3517
Conversation
force cropping when regrid leads to much smaller tiles, to avoid memory multiplication in serialized rdd
Thanks for a nice PR! I'll take a look a bit later! Could I ask you to sign ECA please to make Eclipse license checker happy? (Eclipse grabs your name and email from the commit, so doublecheck that you commited under the correct email (that was used to sign ECA), right now the ECA is required for (jns@****.be) email). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for a late review, but it looks like some tiny changes required to make compiler happy.
newKey -> | ||
(oldTile.crop((xSpan.start - oldXstart).toInt, | ||
(ySpan.start - oldYstart).toInt, | ||
(xSpan.end - oldXstart).toInt, | ||
(ySpan.end - oldYstart).toInt), | ||
(ySpan.end - oldYstart).toInt),Crop.Options(force=forceCrop)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops CI feels a bit unhappy about it. (2.12)
/usr/local/src/spark/src/main/scala/geotrellis/spark/regrid/Regrid.scala:126:92: ';' expected but ',' found
(ySpan.end - oldYstart).toInt),Crop.Options(force=forceCrop)),
val forceCrop = newW < 0.6*oldW || newH < 0.6*oldH | ||
newKey -> | ||
(oldTile.crop((xSpan.start - oldXstart).toInt, | ||
(ySpan.start - oldYstart).toInt, | ||
(xSpan.end - oldXstart).toInt, | ||
(ySpan.end - oldYstart).toInt), | ||
(ySpan.end - oldYstart).toInt),Crop.Options(force=forceCrop)), | ||
((xSpan.start - newXrange.start).toInt, (ySpan.start - newYrange.start).toInt) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not allow me to add a code change suggestion, but should be smth like:
val forceCrop = newW < 0.6 * oldW || newH < 0.6 * oldH
newKey ->
(
oldTile.crop(
(xSpan.start - oldXstart).toInt,
(ySpan.start - oldYstart).toInt,
(xSpan.end - oldXstart).toInt,
(ySpan.end - oldYstart).toInt,
Crop.Options(force = forceCrop)
),
((xSpan.start - newXrange.start).toInt, (ySpan.start - newYrange.start).toInt)
)
Crop options is an argument to the crop function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made up a #3518 to try out the CI, but please feel free to commit changes as a part of this PR as well.
U can also let me pushing into your PR to speed up the progress! (checkbox)
force cropping when regrid leads to much smaller tiles, to avoid memory multiplication in serialized rdd
Overview
Issue: Open-EO/openeo-geotrellis-extensions#191
Regrid uses cropping to create new tiles, but by default, cropping retains the original ArrayTile. So when regrid is used to go from large tile sizes (e.g. 1024 pixels) to smaller ones (e.g. 64 pixels) we get many cropped tiles all referencing these larger original tiles. When Spark then serializes these cropped tiles, the original arrays are all separately serialized, and we get a multiplication effect on the RDD size. Deserializing then can make executors go out of memory.
Checklist
Demo
Optional. Screenshots/REPL