Band expression for MultiBaseReader #524

vincentsarago · 2022-09-07T11:14:16Z

Right now when passing expression to MultiBaseReader (STAC) methods, we assume each asset is one band e.g:

with STACReader("stac.json") as stac:
    ima = stac.tile(701, 102, 8, expression="green/red")
    print(ima.count)
    >>> 1  # it returns a 1 band image

if one asset has more than one band and if no other option is set, the result could be unexpected.

The way expression works is by using

def apply_expression(
    blocks: Sequence[str], bands: Sequence[Union[str, int]], data: numpy.ndarray,
) -> numpy.ndarray:
    return numpy.array(
        [
            numpy.nan_to_num(
                numexpr.evaluate(bloc.strip(), local_dict=dict(zip(bands, data)))
            )
            for bloc in blocks
            if bloc
        ]
    )

where we assign a name to each band within the numpy array

data = numpy.zeros((3, 256, 256))
list(zip(["b1", "b2"], data))
>>> [('b1',
  array([[0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         ...,
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.]])),
 ('b2',
  array([[0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         ...,
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.]]))]

In ☝️ we have a 3 bands data but a 2 bands list ("b1", "b2"), and we construct the dict that will be passed numexpr. We can see that no errors is raised while we have 3 bands and only 2 names.

Proposal

with #523 we now have bands names all the way during the processes, e.g

with STACReader("stac.json") as stac:
    ima = stac.tile(701, 102, 8, assets=("green", "red"))
    print(ima.band_names)
    >>> ["green_b1", "red_b1"]

It means that in theory expression could be in form of expression="green_b1/red_b1"

with STACReader("stac.json") as stac:
    ima = stac.tile(701, 102, 8, expression="green_b1/red_b1")
    print(ima.band_names)
    >>> ["green_b1/red_b1"]

internally rio-tiler will know that for expression "green_b1/red_b1" we want to read assets green and red. Then we will have an image with band_names="green_b1", "red_b1". Those names will be used in the apply_expresion directly because apply_expression is a new method of the ImageData class

rio-tiler/rio_tiler/models.py

Lines 278 to 283 in 55ae897

    
           def apply_expression(self, expression: str, bands: Optional[List[str]] = None): 
        
               """Apply expression in place.""" 
        
               blocks = get_expression_blocks(expression) 
        
               bands = bands or self.band_names 
        
               self.data = apply_expression(blocks, bands, self.data) 
        
               self.band_names = blocks

potential issue

Another question is what happens when we use asset_expression?

with STACReader("stac.json") as stac:
    ima = stac.tile(701, 102, 8, assets=("green", "red"), asset_expression={"green": "b1+2"})
    print(ima.band_names)
    >>> ["green_b1+2", "red_b1"]

☝️ we have then a band name that is "green_b1+2" so if we wanted to use expression we would have to write
expression= "green_b1+2/red_b1" but that won't work because rio-tiler will try to parse the main expression and re-apply! +2 to green_b1+2!!!
One solution would be to use _b1 as the first band of the asset even if there is expression applied.

Status Quo ?

While I feel ☝️ is better, I understand that it's a big breaking change.

The text was updated successfully, but these errors were encountered:

vincentsarago · 2022-09-07T12:09:20Z

In ☝️ we have a 3 bands data but a 2 bands list ("b1", "b2"), and we construct the dict that will be passed numexpr. We can see that no errors is raised while we have 3 bands and only 2 names.

I've added a check in d883df0 to make sure the number of bands and data shape match

geospatial-jeff · 2022-09-08T15:50:14Z

It means that in theory expression could be in form of expression="green_b1/red_b1"

I like this. We are breaking things anyways so might as well get it right now. Representing the full hierarchy of asset::band seems like the best way to support applying expressions across multi-band assets.

One solution would be to use _b1 as the first band of the asset even if there is expression applied.

I think appending _b1 is a good way to normalize applying of an expression to single and multi-band images.

Another question is what happens when we use asset_expression?

Can asset expressions be removed entirely if a user can pass assets to the expression? It seems to me the difference between expression and asset_expression is the former may be used to combine data from multiple assets while the latter applies one expression to each asset individually. Which means everything you can do with asset_expression you could accomplish with multiple calls to expression:

# one asset_expression
asset_expression={"green": "b1+2", "red": "b1+3"}

# two expressions
expression="green_b1+2"
expression="green_b1+3"

Applying multiple expressions like this would be much easier if ImageData.apply_expression did not mutate the ImageData instance but instead created a new ImageData instance with modified data. This way a user could do something like:

with STACReader("stac.json") as stac:
    ima = stac.tile(701, 102, 8)
    first_expression = ima.apply_expression("green_b1+2")
    second_expression = ima.apply_expression("green_b1+2")
    third_exression = ima.apply_expression("green_b1+2/red_b1")

This example ☝️ got me thinking that maybe including expression in the COGReader.tile function call is overloaded now that it is an instance method on the object returned by that call 🤷

vincentsarago · 2022-09-08T19:01:52Z

🙏 thanks @geospatial-jeff

I like this. We are breaking things anyways so might as well get it right now. Representing the full hierarchy of asset::band seems like the best way to support applying expressions across multi-band assets.

+1

Can asset expressions be removed entirely if a user can pass assets to the expression

I think yes! +1

Applying multiple expressions like this would be much easier if ImageData.apply_expression did not mutate the ImageData instance but instead created a new ImageData instance with modified data.

Yes, just one thing: in your example you still need a way to tell the tile method which assets to read

with STACReader("stac.json") as stac:
    ima = stac.tile(701, 102, 8, assets=("green", "red"))
    first_expression = ima.apply_expression("green_b1+2")
    second_expression = ima.apply_expression("green_b1+2")
    third_exression = ima.apply_expression("green_b1+2/red_b1")

This example ☝️ got me thinking that maybe including expression in the COGReader.tile function call is overloaded now that it is an instance method on the object returned by that call 🤷

Sadly we still need to pass the expression to tile so it parse it and know which asset to read

vincentsarago added bug enhancement breaking breaking change 4.0.0 and removed bug labels Sep 7, 2022

vincentsarago mentioned this issue Sep 7, 2022

move apply_expression into ImageData class and use b{ix} for band names #523

Merged

vincentsarago mentioned this issue Sep 9, 2022

Apply expression part2 #525

Merged

vincentsarago added the resolved-by-pr label Sep 9, 2022

vincentsarago closed this as completed Oct 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Band expression for MultiBaseReader #524

Band expression for MultiBaseReader #524

vincentsarago commented Sep 7, 2022

vincentsarago commented Sep 7, 2022

geospatial-jeff commented Sep 8, 2022 •

edited

Loading

vincentsarago commented Sep 8, 2022

Band expression for MultiBaseReader #524

Band expression for MultiBaseReader #524

Comments

vincentsarago commented Sep 7, 2022

Proposal

potential issue

Status Quo ?

vincentsarago commented Sep 7, 2022

geospatial-jeff commented Sep 8, 2022 • edited Loading

vincentsarago commented Sep 8, 2022

geospatial-jeff commented Sep 8, 2022 •

edited

Loading