Add pytests for preprocessing #67
Conversation
Codecov Report

@@             Coverage Diff              @@
##           dscim-v0.4.0      #67      +/-   ##
================================================
+ Coverage      39.94%    46.40%    +6.46%
================================================
  Files             17        17
  Lines           1865      1866        +1
================================================
+ Hits             745       866      +121
+ Misses          1120      1000      -120

See 4 files with indirect coverage changes.
dscim/src/dscim/preprocessing/preprocessing.py, lines 163 to 188 in cda25bc
@@ -301,107 +301,3 @@ def subset_USA_ssp_econ(
        consolidated=True,
        mode="w",
    )


def clip_damages(
I didn't find a reference to this function anywhere except dscim-cil, which only references it in this script. If this script were updated to be operational, this function would be removed entirely.
@JMGilbert can you create a new issue about removing this, then do a new PR that removes this function entirely? Then we can reference the PR in dscim-cil where your link points. Thank you!
Noting here that we decided this doesn't need an issue; it's noted in the CHANGELOG. See #71.
This is great, @JMGilbert thank you! I know these tests are beasts. I have a couple questions and flagged where we will probably need updates due to recent other merges.
I was asked to have a look at this because testing untested legacy code can be such a headache, especially when it deals with IO and files like this. This is awesome work. Great use of the tmp_path fixture for temporary files. I left a few inline comments with some pointers to help make these cleaner to read and understand when they fail.
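For anyone reading along who hasn't used it, here's a minimal sketch of the tmp_path pattern being praised; the dataset and file names are hypothetical, not from dscim's actual tests:

```python
import xarray as xr

def test_writes_and_reads_zarr(tmp_path):
    # pytest injects tmp_path as a unique pathlib.Path that is cleaned up
    # automatically, so tests never collide on shared output files
    out = tmp_path / "damages.zarr"
    ds = xr.Dataset(
        {"damages": ("year", [1.0, 2.0])},
        coords={"year": [2020, 2021]},
    )
    ds.to_zarr(out, consolidated=True, mode="w")
    # round-trip: what we read back should equal what we wrote
    xr.testing.assert_equal(ds, xr.open_zarr(out))
```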
    xr.testing.assert_equal(ds_out_expected, ds_out_actual)


def test_sum_AMEL(tmp_path):
Add a docstr here giving one line saying what behavior this test is trying to check. At this stage, testing legacy code, I think it's okay if it's something like "Tests the basic functionality of [insert function name here]". But try to have each test check one behavior or have one reason to fail.
There are a couple of other missing docstrs below. I'll try to call them all out.
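A sketch of the suggested style, with the body elided:

```python
def test_sum_AMEL_basic(tmp_path):
    """Tests the basic functionality of sum_AMEL on small synthetic inputs."""
    ...  # build inputs, call the function, assert on the output
```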
tests/test_preprocessing.py (Outdated)
("risk_aversion", 10, 15), | ||
], | ||
) | ||
def test_reduce_damages(tmp_path, recipe, eta, batchsize): |
Add a docstr here saying what you're testing.
("risk_aversion", 10, "cheese", True), | ||
], | ||
) | ||
def test_ce_from_chunk(tmp_path, recipe, eta, reduction, zero): |
Add docstr briefly saying what you're testing.
tests/test_preprocessing.py (Outdated)

    ],
)
def test_reduce_damages(tmp_path, recipe, eta, batchsize):
    if recipe == "adding_up" and eta is not None:
The if/elses here hint that this test is testing more than one behavior. I'd consider breaking it down into multiple test functions so that each one is more clearly checking for one behavior or has one reason to fail. In this case, maybe make one test function that checks that this error gets raised, and another that checks "the happy path": that the function spits out the correct output. It becomes harder and harder to reason about why a test failed, or what a test is actually testing, if it starts checking for too many different things.
There are a couple of other places that might have this issue. Happy to discuss this more if needed.
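A sketch of the suggested split, using a stand-in function since dscim's actual signature isn't reproduced here; the error type and message are assumptions for illustration only:

```python
import pytest

def reduce(recipe, eta=None):
    # stand-in for the real function under test
    if recipe == "adding_up" and eta is not None:
        raise ValueError("adding_up does not accept an eta")
    return "reduced"

def test_reduce_rejects_eta_for_adding_up():
    """reduce raises when eta is combined with the adding_up recipe."""
    with pytest.raises(ValueError, match="adding_up"):
        reduce("adding_up", eta=2.0)

def test_reduce_happy_path():
    """reduce returns output for a valid recipe/eta combination."""
    assert reduce("risk_aversion", eta=2.0) == "reduced"
```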
tests/test_preprocessing.py (Outdated)

def test_sum_AMEL(tmp_path):
    """
    Test that sum_AMEL outputs a Zarr file returns a file with damage function coefficients summed
@JMGilbert is this doc string correct? I thought it would be summing spatial damages, not df coefs.
Oops, yeah, this is wrong.
tests/test_preprocessing.py (Outdated)

        dummy_soecioeconomics_file, consolidated=True, mode="w"
    )

    if batchsize != 15:
Is this block actually getting tested?
Yup, one of the pytest fixtures makes batchsize equal to 5, which triggers this conditional.
Oh jeez I actually checked the fixture and saw only 15s. Obviously I misread it. Thanks!
tests/test_preprocessing.py (Outdated)

        )
    )

    if reduction not in ["cc", "no_cc"]:
@JMGilbert this is another spot where the test is testing two things. This conditional testing of the error should be another test function.
For sure. I was somewhat reluctant to separate a few of these tests because the conditional here still takes some of the synthetic inputs that I generate above this line. I was considering doing a lot of generalization (because I learned later on how to make tests share inputs), but wasn't sure how much extra time to put into cleaning these tests up given that the functionality won't actually change. I could also just copy and paste the synthetic inputs and this conditional into a new test, but that feels wrong.
(happy to do either, just wanted to make sure that you want the time put in on this)
I see what you mean. I think going the easy route and copying the synthetic inputs to another test func is probably fine - what do you think @brews ?
@kemccusker I think sparing copy-and-paste is fine for an early test for legacy code; just don't make it too much of a habit. Doing a nice pytest fixture to help set up input data or something might be a better way of doing it, but it's not worth the effort if we do it sloppily or if we're likely to turn around and refactor the code we are testing anyway. I'd let y'all be the judge of that, and of whether it's worth cleaning up the test so you can read its intentions later. I would argue the important thing for now, and for this PR, is to get some kind of test in place for this code/behavior.
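A sketch of the fixture approach mentioned above, with hypothetical names and data: both tests share one synthetic input without copy-paste.

```python
import pytest
import xarray as xr

@pytest.fixture
def synthetic_damages():
    # function-scoped by default: built fresh for each test that requests it
    return xr.Dataset(
        {"delta": ("batch", [0.1, 0.2, 0.3])},
        coords={"batch": [0, 1, 2]},
    )

def test_happy_path(synthetic_damages):
    """The valid-reduction path sees the shared synthetic input."""
    assert float(synthetic_damages["delta"].sum()) == pytest.approx(0.6)

def test_bad_reduction_raises(synthetic_damages):
    """The error path reuses the same input without duplicating setup."""
    with pytest.raises(KeyError):
        synthetic_damages["not_a_variable"]
```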
OK @JMGilbert, let's go with a new test the "easy" way for now since it should be quick, keeping in mind Brewster's advice for future refactors.
tests/test_preprocessing.py (Outdated)

        ce_batch_coords=ce_batch_coords,
    )

    if not zero or reduction == "no_cc":
Could also consider splitting these two cases into two tests.
@kemccusker I made the tests case by case in all of the places where it makes sense. The only remaining test with a meaningful if/else is testing the same line of code in both conditions, so I think it makes sense to keep it like that. Should be good to go now.
Awesome, looks good to merge, @JMGilbert! Thanks for all the effort!