-
Notifications
You must be signed in to change notification settings - Fork 361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generate Windows That Conform To GeoTiff Segments #2402
Generate Windows That Conform To GeoTiff Segments #2402
Conversation
bc6626c
to
860cf98
Compare
bb9d4d1
to
bccb415
Compare
val segments: RDD[(String, Array[GridBounds])] = | ||
sourceGeoTiffInfo.segmentsByPartitionBytes(partitionBytes, windowSize) | ||
case (_, Some(partitionBytes)) => { | ||
val maxSize = math.min(options.maxTileSize.getOrElse(1<<10), windowSize.getOrElse(1<<10)) // XXX is windowSize a length or an area? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
windowSize
is length, implicitly for square windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Why can't we write here just
1024
instead of1 << 10
? - Don't think that
getOrElse
is a good idea here, as it can hide a possible user error. What logic do you want to follow here? Mb we can add these attributes into thematch
function above?match
supports logicalor
too. windowSize
is length
It seems strange but the thinking went is that if you don't have an opinion about the shape of tiles you want you're probably using this RDD as input to tiler. From the perspective of optimizing for IO reading the segments as skinny tiles is fine, from the perspective of tiler chunking a skinny tile into multiple tiles is not problematic either. |
Sounds capital. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works fine! A couple of comments and most of them are related to code style / api questions.
val segments: RDD[(String, Array[GridBounds])] = | ||
sourceGeoTiffInfo.segmentsByPartitionBytes(partitionBytes, windowSize) | ||
case (_, Some(partitionBytes)) => { | ||
val maxSize = math.min(options.maxTileSize.getOrElse(1<<10), windowSize.getOrElse(1<<10)) // XXX is windowSize a length or an area? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Why can't we write here just
1024
instead of1 << 10
? - Don't think that
getOrElse
is a good idea here, as it can hide a possible user error. What logic do you want to follow here? Mb we can add these attributes into thematch
function above?match
supports logicalor
too. windowSize
is length
val layout = sourceGeoTiffInfo.getGeoTiffInfo(s"s3://$bucket/$key").segmentLayout.tileLayout | ||
|
||
RasterReader | ||
.listWindows(cols, rows, options.maxTileSize.getOrElse(1<<10), layout.tileCols, layout.tileRows) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same is here:
Looks not safe:
options.maxTileSize.getOrElse(1<<10)
Mb to throw an exception or to handle it in a way user will know that smth is used by default (at least to add some warning). In addition, somewhere above you already used getOrElse
two times with 1024
default value, mb it makes sense to add these default values somewhere? As we can easily forget about these default values.
Finally the question, mb Options
should contain 1024
value for maxTileSize
and windowSize
by default? (everything is defined in S3GeoTiffRDD
object).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as changing the default maxTileSize
is not construed to constitute an API change, I prefer that.
@@ -59,15 +60,70 @@ private [geotrellis] trait GeoTiffInfoReader extends LazyLogging { | |||
} | |||
} | |||
|
|||
def windowsByBytes( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function description is missing, would be good to add.
Btw have you compared it to segmentsByPartitionBytes
? Should we remove segmentsByPartitionBytes
at all?
(options.maxTileSize, options.partitionBytes) match { | ||
case (_, Some(partitionBytes)) => { | ||
val windows: RDD[(String, Array[GridBounds])] = | ||
sourceGeoTiffInfo.windowsByBytes(partitionBytes, options.maxTileSize.getOrElse(1<<10)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same comments about maxTileSize
and windowSize
and 1 << 10
are valid for Hadoop too.
options: Options | ||
)(implicit sc: SparkContext, rr: RasterReader[Options, (I, V)]): RDD[(K, V)] = { | ||
|
||
val conf = new SerializableConfiguration(configuration(path, options)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why conf
should be wrapped here? Looks like it's only used for input formats arguments => can be used a common configuration without extra wrapper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That variable makes its way into a sc.parallelize
inside of HadoopGeoTiffInfoReader
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, my bad.
)(implicit sc: SparkContext, rr: RasterReader[Options, (I, V)]): RDD[(K, V)] = { | ||
|
||
val conf = new SerializableConfiguration(configuration(path, options)) | ||
val path2 = path.toString |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well it's always a bit confusing when you see in the code base such variable names. In fact you can write just: // to avoid variable naming confusion
HadoopGeoTiffInfoReader(path.toString, conf, options.tiffExtensions)
val layout = info.getGeoTiffInfo(objectRequest.toString).segmentLayout.tileLayout | ||
|
||
RasterReader | ||
.listWindows(cols, rows, options.maxTileSize.getOrElse(1<<10), layout.tileCols, layout.tileRows) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maxTileSize
comment here too
These are given in the respective Options classes to avoid changing semantics.
80cdadb
to
395fe2d
Compare
All comments have been addressed, I believe. |
💯 |
windowsSize
a length or an area in this context?)Connects #2173