Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert prsn data from mm to cm #113

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Convert prsn data from mm to cm #113

wants to merge 3 commits into from

Conversation

eyvorchuk
Copy link
Contributor

The generate_climos.py and generate_prsn.py files have been modified to convert the prsn data from mm to cm (equivalently, kg/m2 to g/cm2). The test_generate_prsn.py file has an additional assert statement to ensure the correct units.

@eyvorchuk eyvorchuk requested a review from corviday May 11, 2020 23:59
@eyvorchuk eyvorchuk changed the title Convert prsn data from cm to mm Convert prsn data from mm to cm May 11, 2020
@eyvorchuk eyvorchuk linked an issue May 12, 2020 that may be closed by this pull request
Copy link
Contributor

@nikola-rados nikola-rados left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few styling comments, otherwise looks good @eyvorchuk!

Comment on lines 447 to 449
logger.info("Converting {} variable to units mm/day".format(flux_var))
else:
logger.info("Converting {} variable to units cm/day".format(flux_var))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The repo has recently been updated to drop support for python <3.6 so that we have access to f-strings. While it may not make a huge difference here, it can make things easier to read overall.

logger.info(f"Converting {flux_var} variable to units mm/day")

Comment on lines 167 to 178
valid_units = [
Unit('kg / m**2 / s'),
Unit('mm / s'),
Unit('mm / d'),
Unit('kg / d / m**2'),
Unit('kg / m**2 / d')
Unit('kg / m**2 / d'),
Unit('g / cm**2 / s'),
Unit('cm / s'),
Unit('cm / d'),
Unit('g / d / cm**2'),
Unit('g / cm**2 / d')
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be helpful to order these in some manner (maybe grouping by mm, cm, g, kg) just to make it easier to follow. You may want to double check with @corviday to ensure we aren't missing any cases here.

Comment on lines 235 to 237
prsn_data = prsn_data * Q_(1.0, prsn_units_from).to(prsn_units_to).magnitude

output_dataset.variables['prsn'][chunk] = prsn_data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered skipping the extra variable assignment and assigning directly?

output_dataset.variables['prsn'][chunk] = prsn_data * Q_(1.0, prsn_units_from).to(prsn_units_to).magnitude

Copy link
Contributor

@jameshiebert jameshiebert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent, way to dive into this code base and tackle the problem! I think that you have essentially added the logic necessary to solve the problem, so you clearly understood what needed to be done. Good work!

I have a couple comments on making the code a little better, and some comments for giving us confidence in the results. And the units are confusing me a bit, so maybe you can answer my question there.

You can push more commits to this branch, and it will update the PR.

@@ -166,8 +167,14 @@ def check_pr_units(pr_units):
valid_units = [
Unit('kg / m**2 / s'),
Unit('mm / s'),
Unit('mm / d'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you're mixing whitespace here... the existing file uses spaces and you've inserted lines with tabs. You'll should conform to what's there already and make it consistent.

Unit('cm / s'),
Unit('cm / d'),
Unit('g / d / cm**2'),
Unit('g / cm**2 / d')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With all of the valid units that you've added to this list, it's probably worth while to eliminate some redundancy by instantiate the Unit objects in a loop. E.g. define the list as strings first and then loop over them.

valid_units = ['kg / m**2 / s' ...]
valid_units = [Unit(x) for x in valid_units]

@@ -442,11 +442,14 @@ def convert_flux_var_units(input_file, climo_data):
flux_variable = input_file.variables[flux_var]
units = Unit.from_udunits_str(flux_variable.units)

if units in [Unit('kg / m**2 / s'), Unit('mm / s')]:
logger.info("Converting {} variable to units mm/day".format(flux_var))
if units in [Unit('kg / m**2 / s'), Unit('mm / s'), Unit('g / cm**2 / s'), Unit('cm / s')]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An option that would make this more readable is to define each set of units as a variable to use in your conditional. For example:

precip_units = {Unit('kg / m**2 / s'), Unit('mm / s'), Unit('g / cm**2 / s'), Unit('cm / s')}
snow_units = {Unit('g / m**2 / s'), Unit('cm / s')}
if units in precip_units:
    ...

if units in [Unit('kg / m**2 / s'), Unit('mm / s')]:
logger.info("Converting {} variable to units mm/day".format(flux_var))
if units in [Unit('kg / m**2 / s'), Unit('mm / s'), Unit('g / cm**2 / s'), Unit('cm / s')]:
if units in [Unit('kg / m**2 / s'), Unit('mm / s')]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need this conditional? It only changes one thing... the output unit in the log message. But you already have the string for the output unit string below: (units * Unit('s / day')).to_udunits_str(). Save that here, then use it both for your log message and for your attributes. Then the conditional can go away.

See "If Statements are a Code Smell".

@@ -222,6 +229,11 @@ def process_to_prsn(variables, output_dataset, max_chunk_len):
}
means = np.mean([chunk_data['tasmin'], chunk_data['tasmax']], axis=0)
prsn_data = np.where(means < freezing, chunk_data['pr'], 0)

prsn_units_from = ureg.parse_units('kg/m^2/s')
prsn_units_to = ureg.parse_units('g/cm^2/s')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you tell me more about using this as the target unit? You've changed both the mass (kg -> g) and the area (per square meter to per square centimeter). Is that definitely what we want? And is it definitely correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converting from kg -> g multiplies the data by 1000 and converting from /m^2 -> /cm^2 divides the data by 10000, giving the net effect of dividing the data by 10. Since converting from mm to cm also divides by 10, I think this will work. I can check with Lee when he's back.

@@ -67,7 +67,7 @@ def test_create_prsn_netcdf_from_source(tiny_dataset, fake_dataset):
assert np.shape(fake_dataset.variables['prsn']) == (11688, 4, 2)
assert fake_dataset.variables['prsn'].standard_name == 'snowfall_flux'
assert fake_dataset.variables['prsn'].long_name == 'Precipitation as Snow'

assert fake_dataset.variables['prsn'].units == 'g cm-2 s-1'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's not actually any check on the data values in this test, so it's hard to know whether it's doing the conversion correctly. Could you work out what the sum snowfall in this data should be an add that to the test?

Copy link
Contributor Author

@eyvorchuk eyvorchuk May 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the dataset used for test_process_to_prsn results in a prsn array of all zeros since these tests pass.

result = fake_dataset.variables['prsn'][:] 
result = np.where(result != 0)
for array in result:
    assert len(array) == 0

Is there possibly another dataset I could use to test the conversion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, you had mentioned that. Well that's not a great testing dataset, then is it?! :)

Maybe you could throw a few strategic low temperature values into the test data set... it might throw some other tests out of whack, but you could adjust as necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're only concerned about the conversion, would it be ok if I store the pr dataset in two variables, perform the conversion on one of them, and check to make sure that each value in the two datasets differs by a factor of 10? Or would you prefer I modify the data set and tests where needed?

@eyvorchuk eyvorchuk requested a review from jameshiebert May 19, 2020 23:26
@rod-glover
Copy link
Contributor

rod-glover commented Jun 19, 2020

I have a question: Why convert units? P2A, at least, converts units on the frontend; that is a presentation issue, which is the responsibility of the UI, not of the data backend. Possibly the Marmot may need this, possibly lacking units conversions? (That can be remedied, particularly given the utilities now in P2A.)

@eyvorchuk
Copy link
Contributor Author

I have a question: Why convert units? P2A, at least, converts units on the frontend; that is a presentation issue, which is the responsibility of the UI, not of the data backend. Possibly the Marmot may need this, possibly lacking units conversions? (That can be remedied, particularly given the utilities now in P2A.)

I'm not sure. @corviday wrote the original issue.

@corviday
Copy link
Contributor

I forgot I'd written that issue. At the time I wrote it, we did not yet have p2a, just PCEX, so it didn't occur to me that we might do unit conversion in the front end instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

prsn data should be in cm, not mm
5 participants