
Correct HRA summary statistics table #1083

Merged

Conversation

phargogh
Member

Description

This PR remains a work in progress until natcap/pygeoprocessing#271 is resolved and released.

Fixes #1080

The user's guide will need an update for this change.

Checklist

  • Updated HISTORY.rst (if these changes are user-facing)
  • Updated the user's guide (if needed)
  • Tested the affected models' UIs (if relevant)

phargogh and others added 25 commits September 10, 2022 15:29
Adding this field seems right because it means we can make sure all 4 of
the fields add up to 100%. RE:natcap#1080
Mostly just division by zero or missing keys. RE:natcap#1080
…080-hra-summary-statistcs-table-incorrect

Conflicts:
	HISTORY.rst
…080-hra-summary-statistcs-table-incorrect

Conflicts:
	HISTORY.rst
…ithub.com:phargogh/invest into bugfix/1080-hra-summary-statistcs-table-incorrect
@phargogh phargogh requested a review from davemfish December 14, 2022 04:00
@phargogh phargogh marked this pull request as ready for review December 14, 2022 04:00
@phargogh
Member Author

Tests are currently failing pending #1134

Contributor

@davemfish davemfish left a comment


Hey @phargogh, I had a couple of questions in the test to try to clarify why the expected values are what they are. Thanks for taking a look!

HISTORY.rst Outdated
  standard error emitted from an invest model.
* Added a new "Save as" dialog window to handle different save options, and
  allow the option to use relative paths in a JSON datastack
  (`#1088 <https://github.com/natcap/invest/issues/1088>`_)
Contributor


Somehow these bullets got duplicated here.

Member Author


Huh, wonder how that happened! Patched in 395a757

reclassified_count = max(stats['R_N_ANY'], 1)
subregion_stats[fieldname] = reduce_func([
    subregion_stats[fieldname],
    stats_under_feature[opname.lower()]])
Contributor


This kind of confused me at first... why are we calling the reducing functions like min and max here? Don't we already have those stats in raster_stats? But I guess we're allowing for multiple features with the same name to be treated as the same subregion, and so we would need to further accumulate those stats here. Is that right?

Member Author


Yes, the intent is to support multiple features with the same name being treated as the same subregion.

I'm not sure I understand what you mean by accumulating them further... since subregion_stats_by_name is a collections.defaultdict(), we're initializing the values the first time a subregion is identified, so this reduce_func accumulation should be the only accumulation that's needed.

The only other sort of accumulation that should be needed would be if no subregion names are provided, in which case we default to all regions having the name "Total Region". In this case, the same reduce_func accumulation applies, we're just accumulating over all the features' stats.

I think we're handling these cases as expected in the tests, too, under test_summary_stats(), but I could absolutely be missing something.

Could you elaborate a little on the further accumulation?

Contributor

@davemfish davemfish Dec 15, 2022


I'm not sure I understand what you mean by accumulating them further

I was referring to the zonal_stats operation as the first accumulation/reduction -- across pixels under a feature. So I was trying to process why we would still need to be calling a reducing function after we already found the min, max, etc from zonal_stats. But you've answered that...it's the need to reduce (or accumulate, to re-use my poorly chosen term) across features with the same name.
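
For anyone following along, here is a minimal sketch of that two-stage reduction with made-up stats and hypothetical names (raster_stats, reduce_funcs), not the actual model code: zonal statistics reduce pixels to per-feature stats, and a second pass reduces across features that share a subregion name.

import collections

# Hypothetical per-feature stats (e.g. as returned by a zonal statistics call);
# two features share the subregion name "North", so their stats need a second
# reduction into a single subregion record.
raster_stats = {
    1: {'name': 'North', 'min': 0.2, 'max': 0.9, 'sum': 12.0, 'count': 30},
    2: {'name': 'North', 'min': 0.1, 'max': 0.7, 'sum': 8.0, 'count': 20},
    3: {'name': 'South', 'min': 0.3, 'max': 0.6, 'sum': 5.0, 'count': 10},
}

# Initial values that any real stat will immediately replace or absorb.
subregion_stats_by_name = collections.defaultdict(
    lambda: {'min': float('inf'), 'max': float('-inf'), 'sum': 0.0, 'count': 0})

reduce_funcs = {'min': min, 'max': max, 'sum': sum, 'count': sum}

for stats_under_feature in raster_stats.values():
    subregion_stats = subregion_stats_by_name[stats_under_feature['name']]
    for opname, reduce_func in reduce_funcs.items():
        # Fold this feature's stat into the subregion's running stat.
        subregion_stats[opname] = reduce_func(
            [subregion_stats[opname], stats_under_feature[opname]])

for name, stats in subregion_stats_by_name.items():
    print(name, stats['min'], stats['max'], stats['sum'] / stats['count'])
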

self.workspace_dir, 'summary_classes.tif')
}
pygeoprocessing.numpy_array_to_raster(
numpy.array([[2, 3, 2, 3]], dtype=numpy.int8),
Contributor


Are these values in the array directly related to the expected values of the summary stats? If so, is there a way -- either a variable or just a comment -- to make that connection?

Member Author

@phargogh phargogh Dec 14, 2022


Yes, absolutely. In edaa069 I reworked the test so that the variable names are clearer and the expected values are derived from the arrays that produce them. I hope that'll help make the connection... I think this should make the test easier to maintain too, but let me know what you think!

Contributor


Thanks that's very helpful!

'R_%HIGH': 50.0,
'R_%MEDIUM': 50.0,
'R_%LOW': 0,
'R_%NONE': 0,
Contributor


I'm a bit confused by these numbers. Somehow the percentages from all stressors are different from the percentages from one stressor, even though in this test there is only one stressor, right?

Member Author

@phargogh phargogh Dec 14, 2022


Yes, if we were running the model with only one habitat/stressor pair, we would expect the cumulative risk to match the pairwise risk. In this test, I'm putting in different values for the cumulative risk classes to exercise the summary table construction only -- these values are definitely not real or reasonable model outputs!

I've added a note to clarify this (and improved variable names a bit) in 43a37ec.

Contributor


Gotcha, thanks for adding that clarification in the test

I wasn't being clear about which classification arrays were affecting
which records in the tests.  This commit changes that by renaming a few
array variables, clarifying which values are derived from which arrays,
and adding a helper function to clearly derive values from arrays.

RE:natcap#1080
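
A hypothetical helper along those lines (not the actual test code) could derive the expected percentages straight from the classification array. Assuming class codes 0/1/2/3 correspond to NONE/LOW/MEDIUM/HIGH, an array like [[2, 3, 2, 3]] gives 50% MEDIUM and 50% HIGH; a real helper would also need to exclude nodata pixels from the denominator.

import numpy

def percent_of_pixels(array, class_value):
    """Percent of pixels in ``array`` equal to ``class_value``."""
    return numpy.count_nonzero(array == class_value) / array.size * 100

classes = numpy.array([[2, 3, 2, 3]], dtype=numpy.int8)
assert percent_of_pixels(classes, 3) == 50.0  # R_%HIGH
assert percent_of_pixels(classes, 2) == 50.0  # R_%MEDIUM
assert percent_of_pixels(classes, 1) == 0.0   # R_%LOW
assert percent_of_pixels(classes, 0) == 0.0   # R_%NONE
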
@phargogh phargogh requested a review from davemfish December 14, 2022 23:08
Contributor

@davemfish davemfish left a comment


Some unrelated test is still failing simply because the Actions here are not checking out the latest commit on main, and triggering a re-run doesn't seem to make that happen. I'm confident all these will pass on the merge commit, so I'm going to merge.

@davemfish davemfish merged commit 99dc889 into natcap:main Dec 15, 2022