-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generalize derivation of variables #667
Generalize derivation of variables #667
Conversation
…on2_generalize_derive
…es' into version2_generalize_derive
Fixing tests... |
The remaining Codacy issues cannot be resolved. Is there a way to ignore some lines? |
… variable derivation)
I've added the possibility to access the |
…on2_generalize_derive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it a good idea to organize the variable derivation module better, but it should not add extra complication. This PR adds 500 lines of code, almost doubling the size of the module, without adding new functionality. Can you try to reduce the amount of boilerplate/duplicated code and documentation so we get back a to a number of lines of code similar to what we had before?
Please put everything in the _derive
module (i.e. make a directory _derive and put all files in there) instead of spreading derive functionality over two modules _derive
and derived_variables
.
Wouldn't it make more sense to organize the derived variables in the same way as the cmor tables, i.e. in a cmor_table/mip structure?
One file per derived variable seems very fine grained, some derivation functions are really are just one cube minus another one, that's just four lines of code.
For use in programs or Jupyter notebooks and documentation purposes, it would be good if there was a way of asking the _derive
module which variables it can derive, instead of try some short name and hope you're lucky.
self.variable = {} | ||
else: | ||
self.variable = dict(variable) | ||
if 'short_name' not in self.variable: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems very unlikely to ever happen, because the only way to obtain a DerivedVariable subclass object is by knowing the 'short_name'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this is case indeed (and we hit it not long ago) but then the damn netCDF file wouldn't load
at all 😁
from ._derived_variable import DerivedVariable | ||
|
||
|
||
class clhmtisccp(DerivedVariable): # noqa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use class names that conform with PEP8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, we need to name the class the same way as the variable - this is what is in fixes
as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can rename the base class to DerivedVariableBase
and every child class DerivedVariable
; the name of variable is already the name of the file/module. Is this an option?
I hardly added any code (I did not change the actual derive functions), so those 500 lines more are mostly docstrings.
But which variables should be combined in one file? All of them? I thought we wanted to separate them? |
bunching the variables in mip files is a good idea but it will lead to confusion when introducing new variables and lead to large scripts back again - I thought the point of this PR was to shrink scripts that one needs to scroll down forever and ever |
bunching them in mip directories - now that's be a good idea 😁 |
and there may be also cases of derived variables resulting from the combination of variables from different mips. |
I fixed all things except for the fx files, but that won't be a big problem. In addtion, I added the function
to get all derived variables. However, Codacy still complains about Regarding the variable files...should I order them in mip directorys? Right know, almost all derived variables are |
I would not do that, as I said there may be cases of "inter-mips" derived varibles. |
To cut down on code for all the variables which are only differences - would it be possible to write general derive_sum / derive_difference functions which only take the variables as input, and then uses the CMOR tables to supply the rest of the information? |
Are further changes needed for this PR to get merged? I also added a derived variable The ~800 lines I added in this PR include ~350 lines of new features (get all derived variables, include fx_files, complexer example recipe, ...) and about 450 lines of docstrings, so it's not really possible to shrink the number of lines in this PR. |
Yes, I think you need to reflect on the design a bit to make it more readable. The PR contains a lot of duplicated documentation and code.
def get_required(self, frequency):
return tuple((
var['short_name'],
var['field_prefix'] + frequency,
var.get('fx_files'),
) for var in self._input) and set the class property
Variables that use (almost) the same code for derivation, this avoids code duplication which leads to easier maintainability, testing and fewer bugs. For example:
|
"""Automatically derive variables.""" | ||
|
||
|
||
import importlib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you can just use the variable short name (with the first letter in uppercase of course) as the class name and import everything here with a normal import
statement? That would be much easier to read than all this importlib usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still use importlib
to dynamically search the _derive
directory for variables, otherwise I would need to add the import statements manually which is not really convenient in my opinion.
ok guys, I have tested this PR with a recipe that involves var derivation and: it works and it produces identical results (previous derivation mechanism vs this derivation mechanism). So far so gut 😁 I do, however, have a question: do we need |
agreed!
a big NO here, herr @bouweandela - we don't want to trade clarity and variable independence for less code duplication; this thing needs to be as clear as possible so that n00bs (scientists) can put their own variable derivations. Don't trust the fact that if you are a specialist and know exactly what you doing, the other (scientific devs) will as well; plus, it's the problem of variables from completely different mips that, just because they share the same derivation mechanism, they sit in the same file - that will be confusing. I personally like the PR and find it very useful, but as Bouwe says, maybe reduce the docstringing. Aye, brewski time now 🍺 |
@bouweandela Thanks for your comments, I will address them now.
@valeriupredoi Yes, I need |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I iz happy with it, if @bouweandela has more comments - up to you guys to sort them out 😁
Done!
Done, every derived class now contains the class member ESMValTool/esmvaltool/preprocessor/_derive/lwcre.py Lines 13 to 14 in 73c1d31
Like @valeriupredoi I also think that for the sake of clarity we should put every derived variable in its own file. In my opinion it's not really useful to write a separate simple math function which is called for many variables if the variables are not connected in any way (in addition, I don't think this would reduce the amount of code since it's always one line...). However, for more complex functions shared among different (similar) derived variables I created the file ESMValTool/esmvaltool/preprocessor/_derive/_shared.py Lines 1 to 21 in 73c1d31
fx files can now be simply accessed via the ESMValTool/esmvaltool/preprocessor/_derive/nbp_grid.py Lines 32 to 33 in 73c1d31
I hope everything is fine now, I removed ~500 lines of code compared to the last commit and also tested all derived variables again and got no differences to the old version. |
The remaining Codacy issue is a known issue of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for cleaning up, it looks much more readable now.
As suggested in #643, this PR simplifies variable derivation by moving them into a designated directory with a python file for each variable.
I moved all existing derived variables into the new recipe_preprocessor_derive_test.yml recipe and compared the
preproc
files in the old and new derivation scheme, all files are identical.Closes #643 and closes #685.