Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Band expression for MultiBaseReader #524

Closed
vincentsarago opened this issue Sep 7, 2022 · 3 comments
Closed

Band expression for MultiBaseReader #524

vincentsarago opened this issue Sep 7, 2022 · 3 comments

Comments

@vincentsarago
Copy link
Member

ref #523 (comment)

Right now when passing expression to MultiBaseReader (STAC) methods, we assume each asset is one band e.g:

with STACReader("stac.json") as stac:
    ima = stac.tile(701, 102, 8, expression="green/red")
    print(ima.count)
    >>> 1  # it returns a 1 band image

if one asset has more than one band and if no other option is set, the result could be unexpected.

The way expression works is by using

def apply_expression(
    blocks: Sequence[str], bands: Sequence[Union[str, int]], data: numpy.ndarray,
) -> numpy.ndarray:
    return numpy.array(
        [
            numpy.nan_to_num(
                numexpr.evaluate(bloc.strip(), local_dict=dict(zip(bands, data)))
            )
            for bloc in blocks
            if bloc
        ]
    )

where we assign a name to each band within the numpy array

data = numpy.zeros((3, 256, 256))
list(zip(["b1", "b2"], data))
>>> [('b1',
  array([[0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         ...,
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.]])),
 ('b2',
  array([[0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         ...,
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.]]))]

In ☝️ we have a 3 bands data but a 2 bands list ("b1", "b2"), and we construct the dict that will be passed numexpr. We can see that no errors is raised while we have 3 bands and only 2 names.

Proposal

with #523 we now have bands names all the way during the processes, e.g

with STACReader("stac.json") as stac:
    ima = stac.tile(701, 102, 8, assets=("green", "red"))
    print(ima.band_names)
    >>> ["green_b1", "red_b1"]

It means that in theory expression could be in form of expression="green_b1/red_b1"

with STACReader("stac.json") as stac:
    ima = stac.tile(701, 102, 8, expression="green_b1/red_b1")
    print(ima.band_names)
    >>> ["green_b1/red_b1"]

internally rio-tiler will know that for expression "green_b1/red_b1" we want to read assets green and red. Then we will have an image with band_names="green_b1", "red_b1". Those names will be used in the apply_expresion directly because apply_expression is a new method of the ImageData class

def apply_expression(self, expression: str, bands: Optional[List[str]] = None):
"""Apply expression in place."""
blocks = get_expression_blocks(expression)
bands = bands or self.band_names
self.data = apply_expression(blocks, bands, self.data)
self.band_names = blocks

potential issue

Another question is what happens when we use asset_expression?

with STACReader("stac.json") as stac:
    ima = stac.tile(701, 102, 8, assets=("green", "red"), asset_expression={"green": "b1+2"})
    print(ima.band_names)
    >>> ["green_b1+2", "red_b1"]

☝️ we have then a band name that is "green_b1+2" so if we wanted to use expression we would have to write
expression= "green_b1+2/red_b1" but that won't work because rio-tiler will try to parse the main expression and re-apply! +2 to green_b1+2!!!
One solution would be to use _b1 as the first band of the asset even if there is expression applied.

Status Quo ?

While I feel ☝️ is better, I understand that it's a big breaking change.

@vincentsarago
Copy link
Member Author

In ☝️ we have a 3 bands data but a 2 bands list ("b1", "b2"), and we construct the dict that will be passed numexpr. We can see that no errors is raised while we have 3 bands and only 2 names.

I've added a check in d883df0 to make sure the number of bands and data shape match

@geospatial-jeff
Copy link
Member

geospatial-jeff commented Sep 8, 2022

It means that in theory expression could be in form of expression="green_b1/red_b1"

I like this. We are breaking things anyways so might as well get it right now. Representing the full hierarchy of asset::band seems like the best way to support applying expressions across multi-band assets.

One solution would be to use _b1 as the first band of the asset even if there is expression applied.

I think appending _b1 is a good way to normalize applying of an expression to single and multi-band images.

Another question is what happens when we use asset_expression?

Can asset expressions be removed entirely if a user can pass assets to the expression? It seems to me the difference between expression and asset_expression is the former may be used to combine data from multiple assets while the latter applies one expression to each asset individually. Which means everything you can do with asset_expression you could accomplish with multiple calls to expression:

# one asset_expression
asset_expression={"green": "b1+2", "red": "b1+3"}

# two expressions
expression="green_b1+2"
expression="green_b1+3"

Applying multiple expressions like this would be much easier if ImageData.apply_expression did not mutate the ImageData instance but instead created a new ImageData instance with modified data. This way a user could do something like:

with STACReader("stac.json") as stac:
    ima = stac.tile(701, 102, 8)
    first_expression = ima.apply_expression("green_b1+2")
    second_expression = ima.apply_expression("green_b1+2")
    third_exression = ima.apply_expression("green_b1+2/red_b1")

This example ☝️ got me thinking that maybe including expression in the COGReader.tile function call is overloaded now that it is an instance method on the object returned by that call 🤷

@vincentsarago
Copy link
Member Author

🙏 thanks @geospatial-jeff

I like this. We are breaking things anyways so might as well get it right now. Representing the full hierarchy of asset::band seems like the best way to support applying expressions across multi-band assets.

+1

Can asset expressions be removed entirely if a user can pass assets to the expression

I think yes! +1

Applying multiple expressions like this would be much easier if ImageData.apply_expression did not mutate the ImageData instance but instead created a new ImageData instance with modified data.

Yes, just one thing: in your example you still need a way to tell the tile method which assets to read

with STACReader("stac.json") as stac:
    ima = stac.tile(701, 102, 8, assets=("green", "red"))
    first_expression = ima.apply_expression("green_b1+2")
    second_expression = ima.apply_expression("green_b1+2")
    third_exression = ima.apply_expression("green_b1+2/red_b1")

This example ☝️ got me thinking that maybe including expression in the COGReader.tile function call is overloaded now that it is an instance method on the object returned by that call 🤷

Sadly we still need to pass the expression to tile so it parse it and know which asset to read

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants