-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert prsn data from mm to cm #113
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few styling comments, otherwise looks good @eyvorchuk!
dp/generate_climos.py
Outdated
logger.info("Converting {} variable to units mm/day".format(flux_var)) | ||
else: | ||
logger.info("Converting {} variable to units cm/day".format(flux_var)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The repo has recently been updated to drop support for python <3.6
so that we have access to f-strings. While it may not make a huge difference here, it can make things easier to read overall.
logger.info(f"Converting {flux_var} variable to units mm/day")
dp/generate_prsn.py
Outdated
valid_units = [ | ||
Unit('kg / m**2 / s'), | ||
Unit('mm / s'), | ||
Unit('mm / d'), | ||
Unit('kg / d / m**2'), | ||
Unit('kg / m**2 / d') | ||
Unit('kg / m**2 / d'), | ||
Unit('g / cm**2 / s'), | ||
Unit('cm / s'), | ||
Unit('cm / d'), | ||
Unit('g / d / cm**2'), | ||
Unit('g / cm**2 / d') | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be helpful to order these in some manner (maybe grouping by mm
, cm
, g
, kg
) just to make it easier to follow. You may want to double check with @corviday to ensure we aren't missing any cases here.
dp/generate_prsn.py
Outdated
prsn_data = prsn_data * Q_(1.0, prsn_units_from).to(prsn_units_to).magnitude | ||
|
||
output_dataset.variables['prsn'][chunk] = prsn_data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered skipping the extra variable assignment and assigning directly?
output_dataset.variables['prsn'][chunk] = prsn_data * Q_(1.0, prsn_units_from).to(prsn_units_to).magnitude
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent, way to dive into this code base and tackle the problem! I think that you have essentially added the logic necessary to solve the problem, so you clearly understood what needed to be done. Good work!
I have a couple comments on making the code a little better, and some comments for giving us confidence in the results. And the units are confusing me a bit, so maybe you can answer my question there.
You can push more commits to this branch, and it will update the PR.
dp/generate_prsn.py
Outdated
@@ -166,8 +167,14 @@ def check_pr_units(pr_units): | |||
valid_units = [ | |||
Unit('kg / m**2 / s'), | |||
Unit('mm / s'), | |||
Unit('mm / d'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you're mixing whitespace here... the existing file uses spaces and you've inserted lines with tabs. You'll should conform to what's there already and make it consistent.
dp/generate_prsn.py
Outdated
Unit('cm / s'), | ||
Unit('cm / d'), | ||
Unit('g / d / cm**2'), | ||
Unit('g / cm**2 / d') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With all of the valid units that you've added to this list, it's probably worth while to eliminate some redundancy by instantiate the Unit
objects in a loop. E.g. define the list as strings first and then loop over them.
valid_units = ['kg / m**2 / s' ...]
valid_units = [Unit(x) for x in valid_units]
dp/generate_climos.py
Outdated
@@ -442,11 +442,14 @@ def convert_flux_var_units(input_file, climo_data): | |||
flux_variable = input_file.variables[flux_var] | |||
units = Unit.from_udunits_str(flux_variable.units) | |||
|
|||
if units in [Unit('kg / m**2 / s'), Unit('mm / s')]: | |||
logger.info("Converting {} variable to units mm/day".format(flux_var)) | |||
if units in [Unit('kg / m**2 / s'), Unit('mm / s'), Unit('g / cm**2 / s'), Unit('cm / s')]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An option that would make this more readable is to define each set of units as a variable to use in your conditional. For example:
precip_units = {Unit('kg / m**2 / s'), Unit('mm / s'), Unit('g / cm**2 / s'), Unit('cm / s')}
snow_units = {Unit('g / m**2 / s'), Unit('cm / s')}
if units in precip_units:
...
dp/generate_climos.py
Outdated
if units in [Unit('kg / m**2 / s'), Unit('mm / s')]: | ||
logger.info("Converting {} variable to units mm/day".format(flux_var)) | ||
if units in [Unit('kg / m**2 / s'), Unit('mm / s'), Unit('g / cm**2 / s'), Unit('cm / s')]: | ||
if units in [Unit('kg / m**2 / s'), Unit('mm / s')]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need this conditional? It only changes one thing... the output unit in the log message. But you already have the string for the output unit string below: (units * Unit('s / day')).to_udunits_str()
. Save that here, then use it both for your log message and for your attributes. Then the conditional can go away.
@@ -222,6 +229,11 @@ def process_to_prsn(variables, output_dataset, max_chunk_len): | |||
} | |||
means = np.mean([chunk_data['tasmin'], chunk_data['tasmax']], axis=0) | |||
prsn_data = np.where(means < freezing, chunk_data['pr'], 0) | |||
|
|||
prsn_units_from = ureg.parse_units('kg/m^2/s') | |||
prsn_units_to = ureg.parse_units('g/cm^2/s') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you tell me more about using this as the target unit? You've changed both the mass (kg -> g) and the area (per square meter to per square centimeter). Is that definitely what we want? And is it definitely correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Converting from kg -> g multiplies the data by 1000 and converting from /m^2 -> /cm^2 divides the data by 10000, giving the net effect of dividing the data by 10. Since converting from mm to cm also divides by 10, I think this will work. I can check with Lee when he's back.
@@ -67,7 +67,7 @@ def test_create_prsn_netcdf_from_source(tiny_dataset, fake_dataset): | |||
assert np.shape(fake_dataset.variables['prsn']) == (11688, 4, 2) | |||
assert fake_dataset.variables['prsn'].standard_name == 'snowfall_flux' | |||
assert fake_dataset.variables['prsn'].long_name == 'Precipitation as Snow' | |||
|
|||
assert fake_dataset.variables['prsn'].units == 'g cm-2 s-1' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's not actually any check on the data values in this test, so it's hard to know whether it's doing the conversion correctly. Could you work out what the sum snowfall in this data should be an add that to the test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the dataset used for test_process_to_prsn
results in a prsn array of all zeros since these tests pass.
result = fake_dataset.variables['prsn'][:]
result = np.where(result != 0)
for array in result:
assert len(array) == 0
Is there possibly another dataset I could use to test the conversion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, you had mentioned that. Well that's not a great testing dataset, then is it?! :)
Maybe you could throw a few strategic low temperature values into the test data set... it might throw some other tests out of whack, but you could adjust as necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're only concerned about the conversion, would it be ok if I store the pr
dataset in two variables, perform the conversion on one of them, and check to make sure that each value in the two datasets differs by a factor of 10? Or would you prefer I modify the data set and tests where needed?
I have a question: Why convert units? P2A, at least, converts units on the frontend; that is a presentation issue, which is the responsibility of the UI, not of the data backend. Possibly the Marmot may need this, possibly lacking units conversions? (That can be remedied, particularly given the utilities now in P2A.) |
I'm not sure. @corviday wrote the original issue. |
I forgot I'd written that issue. At the time I wrote it, we did not yet have p2a, just PCEX, so it didn't occur to me that we might do unit conversion in the front end instead. |
The
generate_climos.py
andgenerate_prsn.py
files have been modified to convert the prsn data frommm
tocm
(equivalently,kg/m2
tog/cm2
). Thetest_generate_prsn.py
file has an additional assert statement to ensure the correct units.