Rasters FeatureExtraction #3117

Merged
merged 10 commits into from
Oct 17, 2019

Conversation

@pomadchin pomadchin commented Oct 6, 2019

Overview

This PR adds a type class for extracting PointFeatures from Rasters within a given Geometry.

Checklist

  • docs/CHANGELOG.rst updated, if necessary
  • Unit tests added for bug-fix or new feature

Demo

scala> val ext = Extent(0.0, 0.0, 3.0, 3.0)
ext: geotrellis.vector.Extent = Extent(0.0, 0.0, 3.0, 3.0)

scala>       val data = Array(
     |         1, 2, 3,
     |         4, 5, 6,
     |         7, 8, 9
     |       )
data: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9)

scala> val raster = Raster(ArrayTile(data, 3, 3), ext)
raster: geotrellis.raster.Raster[geotrellis.raster.IntConstantNoDataArrayTile] = Raster(IntConstantNoDataArrayTile([I@5e8a55b6,3,3),Extent(0.0, 0.0, 3.0, 3.0))

scala> val features = raster.cellFeaturesAsPoint[Int](ext.toPolygon)
features: Iterator[geotrellis.vector.Feature[geotrellis.vector.Point,Int]] = non-empty iterator

scala> features.toList
res0: List[geotrellis.vector.Feature[geotrellis.vector.Point,Int]] = List(
Feature(POINT (0.5 2.5),1), 
Feature(POINT (1.5 2.5),2), 
Feature(POINT (2.5 2.5),3), 
Feature(POINT (0.5 1.5),4), 
Feature(POINT (1.5 1.5),5), 
Feature(POINT (2.5 1.5),6), 
Feature(POINT (0.5 0.5),7), 
Feature(POINT (1.5 0.5),8), 
Feature(POINT (2.5 0.5),9))

scala> val areaFeatures = raster.cellFeaturesAs
cellFeaturesAsArea   cellFeaturesAsPoint

scala> val areaFeatures = raster.cellFeaturesAsArea[Int](ext.toPolygon)
areaFeatures: Iterator[geotrellis.vector.Feature[geotrellis.vector.Polygon,Int]] = non-empty iterator

scala> areaFeatures.toList
res1: List[geotrellis.vector.Feature[geotrellis.vector.Polygon,Int]] = List(
Feature(POLYGON ((0 2, 0 3, 1 3, 1 2, 0 2)),1), 
Feature(POLYGON ((1 2, 1 3, 2 3, 2 2, 1 2)),2), 
Feature(POLYGON ((2 2, 2 3, 3 3, 3 2, 2 2)),3), 
Feature(POLYGON ((0 1, 0 2, 1 2, 1 1, 0 1)),4), 
Feature(POLYGON ((1 1, 1 2, 2 2, 2 1, 1 1)),5), 
Feature(POLYGON ((2 1, 2 2, 3 2, 3 1, 2 1)),6), 
Feature(POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0)),7), 
Feature(POLYGON ((1 0, 1 1, 2 1, 2 0, 1 0)),8), 
Feature(POLYGON ((2 0, 2 1, 3 1, 3 0, 2 0)),9))
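The coordinates in the two listings above follow from ordinary grid-to-map arithmetic, with row 0 at the top of the extent. A minimal sketch of that arithmetic, using a simplified stand-in `Extent` case class and hypothetical helper names (`cellCenter`, `cellExtent`) rather than the GeoTrellis types:

```scala
// Simplified stand-ins for illustration only; these are not the GeoTrellis classes.
object CellGeometrySketch {
  final case class Extent(xmin: Double, ymin: Double, xmax: Double, ymax: Double)

  /** Map-space center of the cell at (col, row); row 0 is the top row. */
  def cellCenter(e: Extent, cols: Int, rows: Int, col: Int, row: Int): (Double, Double) = {
    val cellWidth  = (e.xmax - e.xmin) / cols
    val cellHeight = (e.ymax - e.ymin) / rows
    (e.xmin + (col + 0.5) * cellWidth, e.ymax - (row + 0.5) * cellHeight)
  }

  /** Map-space bounds of the same cell. */
  def cellExtent(e: Extent, cols: Int, rows: Int, col: Int, row: Int): Extent = {
    val cellWidth  = (e.xmax - e.xmin) / cols
    val cellHeight = (e.ymax - e.ymin) / rows
    Extent(
      e.xmin + col * cellWidth,
      e.ymax - (row + 1) * cellHeight,
      e.xmin + (col + 1) * cellWidth,
      e.ymax - row * cellHeight
    )
  }
}
```

For the 3x3 raster in the demo, cell (0, 0) has its center at (0.5, 2.5) and spans x from 0 to 1 and y from 2 to 3, which agrees with the first point and the first polygon listed.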

Notes

The user can "easily" provide new instances of the CellFeatures type class for in-memory rasters by calling CellFeatures.make with two functions: getGrid: R => GridExtent[Int] and cellValue: (R, Int, Int) => D.

R is unbounded, so any kind of crazy thing that should be treated as a Raster will pass muster here.
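To illustrate the shape of such an instance without depending on the library, here is a self-contained sketch. Everything in it is an assumption made for illustration: `Grid` stands in for GridExtent[Int], `CellValues` is a cut-down stand-in for the type class, and `FlatRaster` is a made-up "raster" type. Only the two-function shape of `make` is taken from this PR.

```scala
// Illustrative stand-ins only; the real type class lives in GeoTrellis and differs in detail.
object CellFeaturesSketch {
  /** Stand-in for GridExtent[Int]: just the grid dimensions. */
  final case class Grid(cols: Int, rows: Int)

  /** Cut-down stand-in for the type class: enumerates (col, row, value) triples. */
  trait CellValues[R, D] {
    def grid(raster: R): Grid
    def values(raster: R): Iterator[(Int, Int, D)]
  }

  /** Mirrors the two-function shape of `make` described in the notes. */
  def make[R, D](getGrid: R => Grid, cellValue: (R, Int, Int) => D): CellValues[R, D] =
    new CellValues[R, D] {
      def grid(raster: R): Grid = getGrid(raster)
      def values(raster: R): Iterator[(Int, Int, D)] = {
        val g = getGrid(raster)
        for {
          row <- Iterator.range(0, g.rows)
          col <- Iterator.range(0, g.cols)
        } yield (col, row, cellValue(raster, col, row))
      }
    }

  /** A hypothetical "raster": a row-major array plus its width. */
  final case class FlatRaster(data: Array[Int], cols: Int)

  /** Instance for FlatRaster built from the two functions. */
  val flatRasterInstance: CellValues[FlatRaster, Int] =
    make[FlatRaster, Int](
      r => Grid(r.cols, r.data.length / r.cols),
      (r, col, row) => r.data(row * r.cols + col)
    )
}
```

Because R is a plain type parameter, nothing here requires FlatRaster to extend or resemble a Raster; the two functions are the whole contract.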
Closes #2942

@pomadchin pomadchin force-pushed the feature/feature-extractor branch 2 times, most recently from 56deb7e to aba9c53 Compare October 7, 2019 19:32
@CloudNiner CloudNiner left a comment

The core interface looks good to my relatively untrained eye. This gets at the core desired functionality.

Don't forget additions to the module hierarchy document and the CHANGELOG for this feature.

pomadchin and others added 9 commits October 17, 2019 01:07
Signed-off-by: Grigory Pomadchin <[email protected]>
The Rasterizer is defined on Geometry, so any type variance on the feature we're using to mask the raster is pretty much wasted
An Array of features from a sizable raster can produce a very high load on the heap, likely blowing out a Spark job if it is used in that context. The expectation is that this feature will be used in the context of RDD map/flatMap, where an Iterator is very beneficial, or that the first operation will be a filter or foldLeft on the generated features.
This is important to decide how to capture or ignore pixels on the border of the geometry.

Also, geom and raster switch places. Since the operation is conceptually on the raster, it takes the first parameter place, that of self.
Previous type-class seemed too wide open for interpretation. This implementation narrows down the meaning for greater clarity.
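The heap argument in the commit notes above can be sketched with the standard library alone. The names below (`CellFeature`, `features`, `sumAbove`) are hypothetical and unrelated to the GeoTrellis API; the point is only that an Iterator produces one feature at a time, so a filter followed by a foldLeft never holds the full set in memory.

```scala
// Standard-library-only illustration; names are hypothetical.
object LazyFeaturesSketch {
  final case class CellFeature(col: Int, row: Int, value: Int)

  /** Lazily enumerate one feature per cell; each feature is created only when consumed. */
  def features(cols: Int, rows: Int)(value: (Int, Int) => Int): Iterator[CellFeature] =
    for {
      row <- Iterator.range(0, rows)
      col <- Iterator.range(0, cols)
    } yield CellFeature(col, row, value(col, row))

  /** Sum the values above a threshold in one pass, without building a collection. */
  def sumAbove(cols: Int, rows: Int, threshold: Int)(value: (Int, Int) => Int): Long =
    features(cols, rows)(value)
      .filter(_.value > threshold)
      .foldLeft(0L)(_ + _.value)
}
```

Calling `.toArray` or `.toList` on the iterator instead would materialize every feature at once, which is the situation the commit note warns about.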
@echeipesh
@pomadchin this PR is ready for re-review. This is the result of our conversation last night.
I tried to keep the interface open enough to be able to implement it for RasterSource, where the raster would be fetched and featurized in chunks, but I don't want to do that here.



/** Type class to convert a raster into features of cell geometries */
trait CellFeatures[R, D] {
R is unconstrained because I want to be able to do this to RasterSource which doesn't have much in common with Raster[Tile]

trait CellFeatures[R, D] {

/** Describe cell grid of the source raster */
def cellGrid(raster: R): GridExtent[Long]
This very clearly is crying out to be a type class of its own. RasterLike or something like that. That could be a meaningful filter on the Methods as well. If something can't be described with a GridExtent[*] we don't want to be talking about cell features.

IMO we can deal with that later. This method can be removed without impacting subclasses that may spring up in the meantime.
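As a rough idea of what such a split might look like, here is a sketch under stated assumptions: `RasterLike`, `Grid`, `cellCount`, and `InMemory` are all hypothetical names, and nothing like this exists in the PR. It only shows how an implicit instance can act as the filter on which methods are available.

```scala
// Hypothetical design sketch; not part of this PR or of GeoTrellis.
object RasterLikeSketch {
  /** Stand-in for GridExtent[Long]. */
  final case class Grid(cols: Long, rows: Long)

  /** Anything that can describe itself as a cell grid. */
  trait RasterLike[R] {
    def cellGrid(raster: R): Grid
  }

  /** Compiles only for types R that have a RasterLike instance in scope. */
  def cellCount[R](raster: R)(implicit ev: RasterLike[R]): Long = {
    val g = ev.cellGrid(raster)
    g.cols * g.rows
  }

  /** A made-up in-memory raster: row-major data plus its width. */
  final case class InMemory(data: Array[Int], cols: Int)

  implicit val inMemoryRasterLike: RasterLike[InMemory] = new RasterLike[InMemory] {
    def cellGrid(raster: InMemory): Grid =
      Grid(raster.cols.toLong, (raster.data.length / raster.cols).toLong)
  }
}
```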

/** Produce a CellFeatures instance given functions to retrieve cell values and describe the cell grid.
* This function is suitable for use with rasters that fit entirely in memory.
*/
def make[R, D](getGrid: R => GridExtent[Int], cellValue: (R, Int, Int) => D): CellFeatures[R, D] =
cellValue is a Function3 and is not specialized, causing boxing of col/row here. Elsewhere in the code we would make a specific trait. However, since the main purpose of this interface is to turn a single double cell value into a nested class with a whole geometry, it's fine; we'll let the boxing slide here.

It is itself a slow operation; I would like to see someday how bad the boxing is here.
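For reference, the "specific trait" alternative mentioned in this thread might look roughly like the sketch below. `CellReader` and `FlatRaster` are hypothetical names, not code from this PR. With a dedicated trait, col and row are declared as primitive Int in the method signature, whereas the generic `Function3.apply` takes erased type parameters and so boxes them.

```scala
// Hypothetical alternative to (R, Int, Int) => D; not code from this PR.
object CellReaderSketch {
  /** col and row are primitive Int here, so passing them does not box.
    * The generic result D may still be boxed when called through this trait. */
  trait CellReader[R, D] {
    def read(raster: R, col: Int, row: Int): D
  }

  /** A made-up in-memory raster: row-major data plus its width. */
  final case class FlatRaster(data: Array[Double], cols: Int)

  /** Dedicated-trait version. */
  val flatReader: CellReader[FlatRaster, Double] = new CellReader[FlatRaster, Double] {
    def read(raster: FlatRaster, col: Int, row: Int): Double =
      raster.data(row * raster.cols + col)
  }

  /** Function3 version, as accepted by `make`; col and row are boxed on each call. */
  val boxedReader: (FlatRaster, Int, Int) => Double =
    (r, col, row) => r.data(row * r.cols + col)
}
```

Both versions return the same values; how much the boxing costs relative to building a geometry per cell would need to be measured with a benchmark.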

@pomadchin pomadchin left a comment

LGTM, I can't approve it due to github limitations, but merging it in. 💯

Rasterizer.foreachCellByGeometry(geom, grid.toRasterExtent) { case (col, row) => mask.set(col, row, 1) }
for {
row <- Iterator.range(0, grid.rows)
col <- Iterator.range(0, grid.cols) if mask.get(col, row) == 1
Nice catch 👍
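The snippet under review marks covered cells in a mask first and then walks the grid in row-major order, yielding only the marked cells. A self-contained sketch of that pattern follows, with a plain callback (`visitCovered`, a hypothetical parameter) standing in for the rasterizer and a BitSet standing in for the mask tile:

```scala
import scala.collection.mutable

// Illustration of the mask-then-iterate pattern; the callback stands in for the rasterizer.
object MaskIterationSketch {
  /** `visitCovered` must call its argument once for each (col, row) covered by the geometry. */
  def maskedCells(cols: Int, rows: Int)(
      visitCovered: ((Int, Int) => Unit) => Unit
  ): Iterator[(Int, Int)] = {
    val mask = mutable.BitSet.empty
    // Step 1: eagerly record every covered cell in the mask.
    visitCovered { (col, row) => mask.add(row * cols + col); () }
    // Step 2: lazily walk the grid in row-major order, keeping only marked cells.
    for {
      row <- Iterator.range(0, rows)
      col <- Iterator.range(0, cols) if mask(row * cols + col)
    } yield (col, row)
  }
}
```

The resulting iteration order is row-major regardless of the order in which the callback reports cells, and a cell reported more than once is still yielded once.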

@echeipesh echeipesh assigned pomadchin and unassigned echeipesh Oct 17, 2019
@pomadchin pomadchin merged commit 3901c7f into locationtech:master Oct 17, 2019
Successfully merging this pull request may close these issues.

Add getPixelValue feature
3 participants