Proposal: optionally return unique pixel counts under each polygon #270

phargogh · 2022-08-03T01:00:54Z

The InVEST HRA model has a couple of columns in an output table (R_%LOW, R_%MED, R_%HIGH in SUMMARY_STATISTICS.csv) that are derived from the pixel counts of each classification in a raster that has pixel values in the set {0, 1, 2, 3}. As it stands, the only way to get accurate reports of this information is to basically reimplement zonal_statistics, adding in the counting. These outputs cannot be derived from the current output of zonal_statistics.

It would be very handy if pygeoprocessing.zonal_statistics allowed for pixel counts under each polygon to be reported. The assumption is that this only makes sense on discrete (integer) rasters. Recording these values on a floating-point raster could easily exhaust available memory, which is why this is best kept as an optional parameter that defaults to False.
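The memory concern can be illustrated with a small standard-library sketch. The data here is synthetic, and the names `integer_pixels` and `float_pixels` are made up for illustration; nothing below is part of pygeoprocessing.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the sketch is repeatable

N_PIXELS = 10_000

# A classified raster: every pixel is one of a handful of integer codes.
integer_pixels = [random.choice((0, 1, 2, 3)) for _ in range(N_PIXELS)]

# A continuous raster: nearly every pixel holds a distinct float value.
float_pixels = [random.random() for _ in range(N_PIXELS)]

integer_counts = Counter(integer_pixels)
float_counts = Counter(float_pixels)

# The integer tally never needs more than 4 keys, however many pixels
# there are; the float tally needs close to one key per pixel.
print(len(integer_counts), len(float_counts))
```

On a real floating-point raster with millions of pixels under a polygon, the second tally would hold close to one dictionary entry per pixel, which is the case the default of False is meant to guard against.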

I propose modifying the function signature of pygeoprocessing.zonal_statistics to include a new parameter, include_pixel_counts:

def zonal_statistics(
        base_raster_path_band, aggregate_vector_path,
        aggregate_layer_name=None, ignore_nodata=True,
        polygons_might_overlap=True, include_pixel_counts=False,
        working_dir=None):

Using the HRA example, FID 60 covers 11, 62, and 4 pixels with values 1, 2, and 3 respectively, so its return value would be:

{60: {
    'min': 1,
    'max': 3,
    'sum': 147,
    'count': 77,
    'nodata_count': 0,
    'pixel_counts': {
        1: 11,
        2: 62,
        3: 4,
    }
}}
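As a rough sketch of how the HRA percentage columns could then be derived from such a result: the mapping of raster values 1, 2, 3 to the LOW, MED, HIGH columns and the choice of denominator (all counted pixels) are assumptions made for illustration, and `risk_percentages` is a hypothetical helper, not an InVEST function.

```python
# Hypothetical mapping of raster values to HRA columns (an assumption).
RISK_COLUMNS = {1: 'R_%LOW', 2: 'R_%MED', 3: 'R_%HIGH'}

# Stats for FID 60 in the shape proposed in this issue.
stats = {60: {
    'min': 1,
    'max': 3,
    'sum': 147,
    'count': 77,
    'nodata_count': 0,
    'pixel_counts': {1: 11, 2: 62, 3: 4},
}}


def risk_percentages(polygon_stats):
    """Convert per-value pixel counts into percentages of counted pixels."""
    counts = polygon_stats['pixel_counts']
    total = sum(counts.values())
    if total == 0:
        # No valid pixels under the polygon: report zeros, avoid dividing by 0.
        return {column: 0.0 for column in RISK_COLUMNS.values()}
    return {
        column: 100.0 * counts.get(value, 0) / total
        for value, column in RISK_COLUMNS.items()
    }


percentages = risk_percentages(stats[60])
for column, percent in percentages.items():
    print(f'{column}: {percent:.1f}')
```

For FID 60 this works out to roughly 14.3, 80.5, and 5.2 percent for the low, medium, and high classes.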

phargogh · 2022-08-10T20:05:41Z

We discussed this on a software team call and approved the feature without the need for a DD, with the understanding that the pixel_counts field name should be renamed to value_counts.

Future: consider the real meaning of `statistics` in `zonal_statistics`

We had an interesting conversation about the meaning of the term statistics in zonal_statistics: a value_counts dictionary isn't really a statistic so much as a report. ArcGIS has a zonal_histogram that produces this same kind of output, and we might want to do the same in the future.

Future: consider a `zonal_reduce` with arbitrary reduction operator

We also talked about wanting to provide a custom function to a zonal_statistics-style function. This could be implemented later as a zonal_reduce or something like that, with zonal_statistics and zonal_histogram being aliases for specific reduce callables.

I'm trying to clarify the intent of the function and the created dict ... "sample" isn't a good descriptor for what's going on here. RE:natcap#270

phargogh self-assigned this Aug 3, 2022

phargogh added the in progress and enhancement labels Aug 10, 2022

phargogh added a commit to phargogh/pygeoprocessing that referenced this issue Aug 10, 2022

Adding this feature and starting a test for it. RE:natcap#270

9e2820f

phargogh added a commit to phargogh/pygeoprocessing that referenced this issue Aug 10, 2022

Updating Rich's email address to mine since his no longer works. RE:natcap#270

3746e45

phargogh added a commit to phargogh/pygeoprocessing that referenced this issue Aug 10, 2022

Minor grammar fix, object access fix and improving assertions. RE:natcap#270

58cc652

phargogh mentioned this issue Aug 10, 2022

Add value counts under polygons in zonal statistics #271

Merged

phargogh added a commit to phargogh/pygeoprocessing that referenced this issue Aug 11, 2022

Noting new feature in HISTORY. RE:natcap#270

e5cf607

phargogh added a commit to phargogh/pygeoprocessing that referenced this issue Sep 9, 2022

Correcting HISTORY - not part of 2.3.4 release. RE:natcap#270

07100c3

phargogh added a commit to phargogh/pygeoprocessing that referenced this issue Sep 9, 2022

More explicitly handling possibility of no matches in a block. RE:natcap#270

ee83051

phargogh mentioned this issue Sep 10, 2022

HRA summary statistics table not producing correct outputs natcap/invest#1080

Closed

phargogh added a commit to phargogh/pygeoprocessing that referenced this issue Sep 14, 2022

Reworking defaultdict initializer to use a Counter.

039563c

A `Counter` object allows for succinct tallying of landuse codes, so it really makes sense to use in this case. RE:natcap#270
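To illustrate the kind of tallying this commit message describes, here is a minimal sketch using a `defaultdict` of `Counter` objects. The `blocks` data is made up, and this is not the code from the commit; it only shows why `Counter` suits the job, including how `Counter.update()` adds to existing counts where `dict.update()` would overwrite them.

```python
from collections import Counter, defaultdict

# Hypothetical per-block results: (feature ID, pixel values found under
# that feature within one raster block).
blocks = [
    (60, [1, 2, 2, 3]),
    (60, [2, 2, 1]),
    (61, [3, 3]),
]

# Each feature ID gets its own Counter the first time it is seen.
value_counts = defaultdict(Counter)
for fid, block_values in blocks:
    # Counter.update() ADDS the tallies from this block to the running
    # totals instead of replacing them.
    value_counts[fid].update(block_values)

# Contrast with a plain dict, where update() replaces the stored value.
plain = {2: 2}
plain.update({2: 2})

print(dict(value_counts[60]))
print(plain)
```

After both blocks for feature 60 are processed, value 2 has a count of 4 in the `Counter`, while the plain dict still reports 2.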

phargogh added a commit to phargogh/pygeoprocessing that referenced this issue Sep 14, 2022

Minor tweak to language in logging message. RE:natcap#270

3a88812

phargogh added a commit to phargogh/pygeoprocessing that referenced this issue Sep 14, 2022

Correcting variable name. RE:natcap#270

4df95f3

phargogh added a commit to phargogh/pygeoprocessing that referenced this issue Sep 14, 2022

Minor linting for line length. RE:natcap#270

b0f5ce6

phargogh added a commit to phargogh/pygeoprocessing that referenced this issue Sep 16, 2022

Updating example stats output to be consistent. RE:natcap#270

e32064c

phargogh added a commit to phargogh/pygeoprocessing that referenced this issue Sep 16, 2022

Adding a comment about how update() is different on a Counter object. RE:natcap#270

7dccc77

emlys closed this as completed in #271 Sep 28, 2022
