
Add pytests for preprocessing #67

Merged
kemccusker merged 27 commits into dscim-v0.4.0 from preprocessing_pytests on Apr 25, 2023

Conversation

JMGilbert
Contributor

No description provided.

@codecov-commenter

codecov-commenter commented Jan 20, 2023

Codecov Report

Merging #67 (e0244d8) into dscim-v0.4.0 (3d1242b) will increase coverage by 6.46%.
The diff coverage is n/a.

@@               Coverage Diff                @@
##           dscim-v0.4.0      #67      +/-   ##
================================================
+ Coverage         39.94%   46.40%   +6.46%     
================================================
  Files                17       17              
  Lines              1865     1866       +1     
================================================
+ Hits                745      866     +121     
+ Misses             1120     1000     -120     

see 4 files with indirect coverage changes


@JMGilbert
Contributor Author

def reformat_climate_files():
    from dscim.preprocessing.climate.reformat import (
        convert_old_to_newformat_AR,
        stack_gases,
    )

    # convert AR6 files
    bd = "/shares/gcp/integration/float32/dscim_input_data/climate/AR6"
    pathdt = {
        "median": f"{bd}/ar6_fair162_medianparams_control_pulse_2020-2080_10yrincrements_conc_rf_temp_lambdaeff_emissions-driven_2naturalfix_v4.0_Jan212022.nc",
        "sims": f"{bd}/ar6_fair162_control_pulse_2020-2030-2040-2050-2060-2070-2080_emis_conc_rf_temp_lambdaeff_emissions-driven_naturalfix_v4.0_Jan212022.nc",
    }
    newds = convert_old_to_newformat_AR(
        pathdt,
        gas="CO2_Fossil",
        var="temperature",
    )
    newds.to_netcdf(
        f"{bd}/ar6_fair162_sim_and_medianparams_control_pulse_2030-2040-2050-2060-2070-2080_emis_conc_rf_temp_lambdaeff_emissions-driven_naturalfix_v4.0_Jan212022.nc"
    )

    # convert RFF files
    gases = {"CO2_Fossil": "Feb072022", "CH4": "Feb072022", "N2O": "Feb072022"}
    stack_gases(gas_dict=gases)
This function should either be generalized or removed. If generalized, we just need one more test to finish the tests for dscim/preprocessing/preprocessing.py.
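For illustration, a generalized version could take the input paths and gas dictionary as arguments so a test can point it at small synthetic files; this is only a sketch, and the parameter names are hypothetical rather than the actual dscim API:

def reformat_climate_files(median_path, sims_path, out_path, gas_dict):
    # Hypothetical generalization: the hard-coded AR6/RFF paths above become
    # arguments, so a test can pass tiny synthetic NetCDF files instead.
    from dscim.preprocessing.climate.reformat import (
        convert_old_to_newformat_AR,
        stack_gases,
    )

    newds = convert_old_to_newformat_AR(
        {"median": median_path, "sims": sims_path},
        gas="CO2_Fossil",
        var="temperature",
    )
    newds.to_netcdf(out_path)
    stack_gases(gas_dict=gas_dict)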

@@ -301,107 +301,3 @@ def subset_USA_ssp_econ(
        consolidated=True,
        mode="w",
    )


def clip_damages(
Contributor Author

I didn't find a reference to this function anywhere except dscim-cil, which only references it in this script. If this script were updated to be operational, this function would be removed entirely.

Member

@JMGilbert can you create a new issue about removing this, then do a new PR that removes this function entirely? Then we can reference the PR in dscim-cil where your link points. Thank you!

Member

Noting here that we decided this doesn't need an Issue. It's noted in the CHANGELOG

@JMGilbert marked this pull request as ready for review February 7, 2023 16:55
@JMGilbert changed the title from "Add pytests for USA subsets and sum_AMEL" to "Add pytests for preprocessing" Feb 10, 2023
@JMGilbert
Contributor Author

See #71

Member

@kemccusker left a comment

This is great, @JMGilbert, thank you! I know these tests are beasts. I have a couple of questions and flagged where we will probably need updates due to other recent merges.

@kemccusker requested review from brews and removed request for brews April 11, 2023 00:54
Member

@brews left a comment

I was asked to have a look at this because testing untested legacy code can be such a headache, especially when it deals with IO and files like this. This is awesome work. Great use of the tmp_path fixture for temporary files. I left a few inline comments with some pointers to help make these cleaner to read and understand when they fail.
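(For readers unfamiliar with it, tmp_path is the built-in pytest fixture that gives each test its own temporary directory as a pathlib.Path; a minimal, self-contained sketch of the pattern, not code from this PR:)

import xarray as xr


def test_writes_zarr(tmp_path):
    """Sketch: write test output into a per-test temporary directory."""
    out = tmp_path / "out.zarr"  # unique directory per test invocation
    xr.Dataset({"x": ("a", [1.0, 2.0])}).to_zarr(out, consolidated=True, mode="w")
    assert out.exists()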

    xr.testing.assert_equal(ds_out_expected, ds_out_actual)


def test_sum_AMEL(tmp_path):
Member

@brews Apr 11, 2023

Add a docstr here giving one line saying the one behavior this test is trying to check. At this stage, testing legacy code, I think it's okay if it's something like “Tests the basic functionality of [insert function name here]”. But try to have each test check one behavior or have one reason to fail.

There are a couple of other missing docstrs below. I'll try to call them all out.
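(For example, a one-line docstring along the lines brews suggests; the wording is just an illustration:)

def test_sum_AMEL(tmp_path):
    """Tests the basic functionality of sum_AMEL."""
    ...  # test body unchanged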

("risk_aversion", 10, 15),
],
)
def test_reduce_damages(tmp_path, recipe, eta, batchsize):
Member

Add a docstr here saying what you're testing.

("risk_aversion", 10, "cheese", True),
],
)
def test_ce_from_chunk(tmp_path, recipe, eta, reduction, zero):
Member

Add docstr briefly saying what you're testing.

    ],
)
def test_reduce_damages(tmp_path, recipe, eta, batchsize):
    if recipe == "adding_up" and eta is not None:
Member

@brews Apr 11, 2023

The if/elses here hint this test is testing more than one behavior. I'd consider breaking it down into multiple test functions so that each one is more clearly checking for one behavior or has one reason to fail. So, in this case, maybe make one test function that checks that this error gets raised, and then another test function checking "the happy path" - testing that the function spits out the correct output, etc. It becomes harder and harder to reason about why a test failed or what a test is actually testing if a test starts checking for too many different things.

There are a couple of other places that might have this issue. Happy to discuss this more if needed.
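(A rough sketch of that split; the helper names and the exception type below are placeholders, not the actual dscim behavior:)

import pytest
import xarray as xr


def test_reduce_damages_rejects_eta_for_adding_up(tmp_path):
    """Error path only: adding_up with a non-None eta should raise."""
    with pytest.raises(AssertionError):  # placeholder exception type
        run_reduce_damages(tmp_path, recipe="adding_up", eta=2.0)  # hypothetical wrapper around the call under test


def test_reduce_damages_happy_path(tmp_path):
    """Happy path only: valid inputs produce the expected reduced damages."""
    actual = run_reduce_damages(tmp_path, recipe="risk_aversion", eta=2.0)
    xr.testing.assert_equal(build_expected(tmp_path), actual)  # hypothetical helper building the expectation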


def test_sum_AMEL(tmp_path):
"""
Test that sum_AMEL outputs a Zarr file returns a file with damage function coefficients summed
Member

@JMGilbert is this doc string correct? I thought it would be summing spatial damages, not df coefs.

Contributor Author

Oops yeah this is wrong

        dummy_soecioeconomics_file, consolidated=True, mode="w"
    )

    if batchsize != 15:
Member

is this block actually getting tested?

Contributor Author

Yup, one of the pytest fixtures makes batchsize equal to 5 which triggers this conditional.
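(For context, a parametrization along these lines is what exercises that branch; the values here are illustrative, not the actual test parameters:)

import pytest


@pytest.mark.parametrize(
    "recipe, eta, batchsize",
    [
        ("adding_up", None, 15),
        ("risk_aversion", 10, 5),  # a batchsize of 5 hits the `batchsize != 15` branch
    ],
)
def test_reduce_damages(tmp_path, recipe, eta, batchsize):
    ...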

Member

Oh jeez I actually checked the fixture and saw only 15s. Obviously I misread it. Thanks!

        )
    )

    if reduction not in ["cc", "no_cc"]:
Member

@JMGilbert this is another spot where the test is testing two things. This conditional testing of the error should be another test function.

Contributor Author

For sure -- I was somewhat reluctant to separate a few of these tests because the conditional here still takes some of the synthetic inputs that I generate above this line. I was considering doing a lot of generalization (because I learned about how to make tests share inputs later on), but wasn't sure how much extra time to put in to clean these tests up given that the functionality won't actually change. I could also just copy and paste the synthetic inputs and this conditional into a new test but that feels wrong.

Contributor Author

(happy to do either, just wanted to make sure that you want the time put in on this)

Member

I see what you mean. I think going the easy route and copying the synthetic inputs to another test func is probably fine - what do you think @brews ?

Member

@kemccusker I think sparing copy-and-paste is fine for an early test for legacy code -- like, don't make it too much of a habit. Doing a nice pytest fixture to help set up input data or something might be a better way of doing it, but it's not worth the effort if we do it sloppily or if we're likely to turn around and refactor the code we are testing anyways. I'd let y'all be the judge of that and of whether it's worth cleaning up the test so you can read its intentions later. I would argue the important thing for now, and for this PR, is to get some kind of test in place for this code/behavior.

Member

ok @JMGilbert let's go w/ a new test the "easy" way for now since it should be quick, keeping in mind Brewster's advice for future refactors.
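(For future reference, the shared-fixture approach brews mentions could look roughly like the following; the dataset contents and names are placeholders, not the real synthetic inputs:)

import numpy as np
import pytest
import xarray as xr


@pytest.fixture
def synthetic_damages_store(tmp_path):
    """Write a tiny synthetic damages Zarr per test and return its path."""
    ds = xr.Dataset(
        {"damages": (("year", "region"), np.ones((3, 2)))},
        coords={"year": [2020, 2021, 2022], "region": ["a", "b"]},
    )
    store = tmp_path / "damages.zarr"
    ds.to_zarr(store, consolidated=True, mode="w")
    return store


def test_with_shared_inputs(synthetic_damages_store):
    """Both the error-path and the happy-path tests can request the same fixture."""
    ds = xr.open_zarr(synthetic_damages_store)
    assert "damages" in ds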

        ce_batch_coords=ce_batch_coords,
    )

    if not zero or reduction == "no_cc":
Member

could also consider splitting these two cases into two tests.

@kemccusker mentioned this pull request Apr 20, 2023
@JMGilbert
Contributor Author

@kemccusker I made the tests case by case in all of the places where it makes sense. The only remaining test that has a meaningful if/else is testing the same line of code in both conditions, so I think it makes sense to keep it like that. Should be good to go now.

Member

@kemccusker left a comment

Awesome, looks good to merge, @JMGilbert ! Thanks for all the effort!

@kemccusker merged commit 58fdca2 into dscim-v0.4.0 Apr 25, 2023
@kemccusker deleted the preprocessing_pytests branch April 25, 2023 00:11