-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GeoMedianMethod and MaxIndexMethod for pixel selection #270
GeoMedianMethod and MaxIndexMethod for pixel selection #270
Conversation
"""Add data to tile.""" | ||
self.tile.append(tile) | ||
|
||
class MaxIndexMethod(MosaicMethodBase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel this method is a bit complex. I have used custom methods called CustomLastBand...
The idea is to use expression
like B1,B2,B3,{NDVI}
which will create a 4 bands array and then within the mosaic method, we will exclude the last band from the array.
class CustomLastBandHigh(MosaicMethodBase):
"""Feed the mosaic tile using the last band as decision factor."""
@property
def data(self):
"""Return data and mask."""
if self.tile is not None:
return self.tile.data[:-1], ~self.tile.mask[0] * 255
else:
return None, None
def feed(self, tile: numpy.ma.array):
"""Add data to tile."""
if self.tile is None:
self.tile = tile
return
pidex = (
numpy.bitwise_and(tile.data[-1] > self.tile.data[-1], ~tile.mask)
| self.tile.mask
)
mask = numpy.where(pidex, tile.mask, self.tile.mask)
self.tile = numpy.ma.where(pidex, tile, self.tile)
self.tile.mask = mask
class CustomLastBandLow(MosaicMethodBase):
"""Feed the mosaic tile using the last band as decision factor."""
@property
def data(self):
"""Return data and mask."""
if self.tile is not None:
return self.tile.data[:-1], ~self.tile.mask[0] * 255
else:
return None, None
def feed(self, tile: numpy.ma.array):
"""Add data to tile."""
if self.tile is None:
self.tile = tile
return
pidex = (
numpy.bitwise_and(tile.data[-1] < self.tile.data[-1], ~tile.mask)
| self.tile.mask
)
mask = numpy.where(pidex, tile.mask, self.tile.mask)
self.tile = numpy.ma.where(pidex, tile, self.tile)
self.tile.mask = mask
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes a lot of sense.
The reason I didn't go for a shipped selection band as part of the asset is in the case of feeding STAC assets directly to mosaic_reader
from an S3 bucket, this selection criterium band is not available, and that would require computing them upfront.
I guess this could be done in the tiler
method passed to mosaic_reader
, as in example below (pseudo-code to illustrate the idea, I haven't tested it):
from satsearch import Search
from rio_tiler_pds.sentinel.aws import S2COGReader
from rio_tiler.mosaic import mosaic_reader
from rio_tiler.mosaic.methods import defaults
from rasterio.crs import CRS
def normalized_difference(a, b):
nd = (a - b) / (a + b)
return nd
def tiler(src_path: str, *args, **kwargs) -> Tuple[np.ndarray, np.ndarray]:
with S2COGReader(src_path) as cog:
data, mask = cog.part(*args, **kwargs)
data[-1] = normalized_difference(data[-2], data[-1])
return data, mask
search = Search(bbox=bbox,
url="https://earth-search.aws.element84.com/v0",
datetime=f'{time_interval[0]}/{time_interval[1]}',
query={'eo:cloud_cover': {'lt': 90}},
collections=['sentinel-s2-l2a-cogs'])
sentinel, assets_used = mosaic_reader(assets, tiler, bbox,
bounds_crs=CRS.from_epsg(bbox.crs),
pixel_selection=defaults.CustomLastBandHigh(),
bands=('B02', 'B03', 'B04', 'B08'))
The problem is that we lose the string reference to the bands in the tiler
method, which is not really an issue as it is custom set by the user, so it can always be aligned with the bands set in mosaic_reader
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wouellette I think you could use expression
for what you are trying to do.
from satsearch import Search
from rio_tiler_pds.sentinel.aws import S2COGReader
from rio_tiler.mosaic import mosaic_reader
from rio_tiler.mosaic.methods import defaults
from rasterio.crs import CRS
def tiler(src_path: str, *args, **kwargs) -> Tuple[np.ndarray, np.ndarray]:
with S2COGReader(src_path) as cog:
return cog.part(*args, **kwargs)
search = Search(
bbox=bbox,
url="https://earth-search.aws.element84.com/v0",
datetime=f'{time_interval[0]}/{time_interval[1]}',
query={'eo:cloud_cover': {'lt': 90}},
collections=['sentinel-s2-l2a-cogs']
)
(data, mask), assets_used = mosaic_reader(
assets, tiler, bbox,
bounds_crs=CRS.from_epsg(bbox.crs),
pixel_selection=defaults.CustomLastBandHigh(),
expression="B02,B03,B04,(B04-B08/B04+B08)"
)
# Note: data would be a (3xnxn) array of type float because of the NDVI band
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I completely overlooked the fact that an expression field was available...
That's sorted then! What's the plan with this? Crystallize these methods in the library? Or was your idea to keep them separate.
# TODO: How to implement geomedian without introducing too many new dependencies? | ||
for xid in range(x): # for each x-axis | ||
for yid in range(y): # for each y-axis | ||
tile[:, yid, xid] = hd.geomedian(stacked_tile[:, :, yid, xid]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's won't be fast 😬
I wonder if we could vectorize the calculation with numpy ufunc (cc @kylebarron)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this won't be fast.
I think the main problem here is that hd.geomedian
only supports a two-dimensional array as input. So it can't natively broadcast over a higher dimensional array.
Also how does this work with missing data? Does it automatically mask out portions of the array that are undefined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About the missing data, It returns 0s
currently, which you can then convert to NaNs
a posteriori, or directly use the hd.nangeomedian
, which handles missing data directly (should have used that one for rigor's sake).
As I said, I believe the n-dimensional (at least 4-dimensional) implementation of their geomedian sits somewhere at Geoscience Australia already developed, as I can see a geomedian_pcm
(and even a nangeomedian_pcm
) called here,
That's why I'm reluctant to make a PR on hdmedians
as I'm pretty certain the work's already been done.
Paper for reference: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8004469
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems to import hdstats
, which isn't on PyPI... I can't find anything about that package
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got confirmation from Dale Roberts that hdstats
will be made available through PyPI in the coming months so we can stall this for now.
@wouellette I'm going to close this PR for now. Feel free to re-open when things move on hdstats |
I added
GeoMedianMethod
andMaxIndexMethod
as pixel selection methods.The former is based on Dale Roberts'
hdmedians
package, a geometric median implementation in cython.For
GeoMedianMethod
, I didn't want to add cython dependencies to rio-tiler, so currently it's just a naive numpy array assignment which is rather slow, but left it like this to spark the discussion on how to best go forward on this one, as I think pixel selection using the geomedian is rather useful. I contacted the Geoscience Australia project, who seem to internally use anhdstats
package that offers an interface to n-dimensional array geomedian compositing: GeoscienceAustralia/digitalearthau#260, but it's currently not on PyPI.For the
MaxIndexMethod
, the idea is to expand the behavior of a max NDVI composite (which is itself missing) to any possible band combination.