Correct HRA summary statistics table #1083
Conversation
Adding this field seems right because it means we can make sure all 4 of the fields add up to 100%. RE:natcap#1080
Mostly just division by zero or missing keys. RE:natcap#1080
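The division-by-zero fix mentioned in this commit message shows up later in the diff as `max(stats['R_N_ANY'], 1)`. A minimal sketch of that guard, assuming `R_N_ANY` is the total classified-pixel count for a subregion (the helper name and dict layout here are illustrative, not from the PR):

```python
# Hedged sketch of the division-by-zero guard: if a subregion contains no
# classified pixels, clamp the denominator to 1 so the percentage comes out
# as 0.0 instead of raising ZeroDivisionError.  Field names are illustrative.
def percent_of_pixels(class_count, total_count):
    denominator = max(total_count, 1)
    return 100.0 * class_count / denominator

stats = {'R_N_HIGH': 0, 'R_N_ANY': 0}  # an empty subregion
print(percent_of_pixels(stats['R_N_HIGH'], stats['R_N_ANY']))  # 0.0
print(percent_of_pixels(2, 4))  # 50.0
```

With the clamp, an empty subregion simply reports 0% in every class rather than crashing the summary-table construction.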
…080-hra-summary-statistcs-table-incorrect Conflicts: HISTORY.rst
…ithub.com:phargogh/invest into bugfix/1080-hra-summary-statistcs-table-incorrect
Tests are currently failing pending #1134
Hey @phargogh , I had a couple questions in the test to try to clarify why the expected values are what they are. Thanks for taking a look!
HISTORY.rst
Outdated
standard error emitted from an invest model.
* Added a new "Save as" dialog window to handle different save options, and
  allow the option to use relative paths in a JSON datastack
  (`#1088 <https://github.com/natcap/invest/issues/1088>`_)
Somehow these bullets got duplicated here.
Huh, wonder how that happened! Patched in 395a757
reclassified_count = max(stats['R_N_ANY'], 1)
subregion_stats[fieldname] = reduce_func([
    subregion_stats[fieldname],
    stats_under_feature[opname.lower()]])
This kind of confused me at first... why we're calling the reducing functions like `min`, `max` here. Don't we already have those stats in `raster_stats`? But I guess we're allowing for multiple features with the same name to be treated as the same subregion, and so would need to further accumulate those stats here. Is that right?
Yes, the intent is to support multiple features having the same name being treated as the same subregion.

I'm not sure I understand what you mean by accumulating them further... since `subregion_stats_by_name` is a `collections.defaultdict()`, we're initializing the values the first time a subregion is identified, so this `reduce_func` accumulation should be the only accumulation that's needed.

The only other sort of accumulation that should be needed would be if no subregion names are provided, in which case we default to all regions having the name "Total Region". In this case, the same `reduce_func` accumulation applies; we're just accumulating over all the features' stats.

I think we're handling these cases as expected in the tests, too, under `test_summary_stats()`, but I could absolutely be missing something.

Could you elaborate a little on the further accumulation?
> I'm not sure I understand what you mean by accumulating them further

I was referring to the `zonal_stats` operation as the first accumulation/reduction -- across pixels under a feature. So I was trying to process why we would still need to be calling a reducing function after we already found the min, max, etc. from `zonal_stats`. But you've answered that... it's the need to reduce (or accumulate, to re-use my poorly chosen term) across features with the same name.
tests/test_hra.py
Outdated
        self.workspace_dir, 'summary_classes.tif')
}
pygeoprocessing.numpy_array_to_raster(
    numpy.array([[2, 3, 2, 3]], dtype=numpy.int8),
Are these values in the array directly related to the expected values of the summary stats? If so, is there a way -- either a variable or just a comment -- to make that connection?
Yes, absolutely. In edaa069 I reworked the test so that variable names could be more clear and also to derive test values from the arrays they're generated from. I hope that'll help make the connection ... I think this should make the test easier to maintain too, but let me know what you think!
Thanks, that's very helpful!
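The idea of deriving expected values from the input array itself (rather than hard-coding them) can be sketched like this. The helper name and the class-code mapping (3=HIGH, 2=MEDIUM, 1=LOW, 0=NONE) are assumptions for illustration; the `[[2, 3, 2, 3]]` array is the one from the test snippet above:

```python
import numpy

# Hypothetical helper illustrating "derive the expected summary percentage
# from the classification array" instead of hard-coding 50.0 in the test.
# Assumed class codes: 3=HIGH, 2=MEDIUM, 1=LOW, 0=NONE.
def expected_percent(classes_array, class_code):
    n_pixels = classes_array.size
    return 100.0 * numpy.count_nonzero(classes_array == class_code) / n_pixels

classes = numpy.array([[2, 3, 2, 3]], dtype=numpy.int8)
print(expected_percent(classes, 3))  # 50.0 -> share of HIGH pixels
print(expected_percent(classes, 2))  # 50.0 -> share of MEDIUM pixels
print(expected_percent(classes, 1))  # 0.0  -> share of LOW pixels
```

Tying the expected values to the array this way keeps the test self-documenting: changing the input array automatically updates the expectations.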
tests/test_hra.py
Outdated
'R_%HIGH': 50.0,
'R_%MEDIUM': 50.0,
'R_%LOW': 0,
'R_%NONE': 0,
I'm a bit confused by these numbers. Somehow the % from all stressors are different than the % from one stressor, even though in this test there is only one stressor, right?
Yes, if we were running the model with only one habitat/stressor pair, we would expect the cumulative risk to match the pairwise risk. In this test, though, I'm putting in different values for the cumulative risk classes to exercise the summary table construction only -- these values are definitely not real or reasonable model outputs!
I've added a note to clarify this (and improved variable names a bit) in 43a37ec.
Gotcha, thanks for adding that clarification in the test
I wasn't being clear about which classification arrays were affecting which records in the tests. This commit changes that by renaming a few array variables, clarifying which values are derived from which arrays, and adding a helper function to clearly derive values from arrays. RE:natcap#1080
Some unrelated test is still failing simply because the Actions here are not checking out the latest commit on `main`, and triggering a re-run doesn't seem to make that happen. I'm confident all these will pass on the merge commit, so I'm going to merge.
Description
This PR remains a work in progress until natcap/pygeoprocessing#271 is resolved and released.
Fixes #1080
The user's guide will need an update for this change.
Checklist