Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core functionality for fx variables handling in volume and area statistics in light of PR 439 (PR 439 child 2) #511

Closed
wants to merge 39 commits into from

Conversation

valeriupredoi
Copy link
Contributor

@valeriupredoi valeriupredoi commented Feb 21, 2020

Before you start, please read CONTRIBUTING.md.

Tasks

  • Create an issue to discuss what you are going to do, if you haven't done so already (and add the link at the bottom)
  • This pull request has a descriptive title that can be used in a changelog
  • Add unit tests
  • Public functions should have a numpy-style docstring so they appear properly in the API documentation. For all other functions a one line docstring is sufficient.
  • If writing a new/modified preprocessor function, please update the documentation
  • Circle/CI tests pass. Status can be seen below your pull request. If the tests are failing, click the link to find out why.
  • Codacy code quality checks pass. Status can be seen below your pull request. If there is an error, click the link to find out why. If you suspect Codacy may be wrong, please ask by commenting.
  • Please use yamllint to check that your YAML files do not contain mistakes
  • If you make backward incompatible changes to the recipe format, make a new pull request in the ESMValTool repository and add the link below

If you need help with any of the tasks above, please do not hesitate to ask by commenting in the issue or pull request.


READ BEFORE PROCEEDING

Issue #436 lays down the need for preprocessing fx variables as integrated with area_atatistics and volume_statistics; PR #439 implements that; reviewers expressed their consternation that that PR is huuuge so it got broken down; the layout for the PR 439 children is outlined in #436 (comment)
This is the 2nd child which contains the bulk of the new functionality.

Features

  • allows for fx variables preprocessing tasks to be ancestors of the main variable task;
  • runs the same preprocessing steps in the main variable preprocessor on each specified fx variable, asked for area and volume statistics (note: bar the time preprocessors)
  • writes the Preprocessor files for the fx vars in the preproc directory of the main variable
  • allows for data loading from that dir, as if the fx preprocessed file was attached to the main variable from the get going
  • implements a filter on ancestors with identical names (within one task) to eliminate duplication

Copy link

@zklaus zklaus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I managed about a quarter for now. Still pretty big. The feature list in the PR description probably would be a good starting point to break this up further. But maybe we can take it like this for this time.

esmvalcore/_data_finder.py Outdated Show resolved Hide resolved
esmvalcore/_recipe.py Show resolved Hide resolved
esmvalcore/_recipe.py Outdated Show resolved Hide resolved
esmvalcore/_recipe.py Outdated Show resolved Hide resolved
esmvalcore/_recipe.py Show resolved Hide resolved
Copy link

@zklaus zklaus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with _recipe.py for now. Will deal with tests in a later round.

esmvalcore/_recipe.py Outdated Show resolved Hide resolved
esmvalcore/_recipe.py Outdated Show resolved Hide resolved
esmvalcore/_recipe.py Outdated Show resolved Hide resolved
esmvalcore/_recipe.py Outdated Show resolved Hide resolved
esmvalcore/_recipe.py Outdated Show resolved Hide resolved
esmvalcore/_recipe.py Outdated Show resolved Hide resolved
esmvalcore/_recipe.py Outdated Show resolved Hide resolved
esmvalcore/_recipe.py Show resolved Hide resolved
esmvalcore/_recipe.py Outdated Show resolved Hide resolved
esmvalcore/_recipe.py Show resolved Hide resolved
esmvalcore/_recipe.py Outdated Show resolved Hide resolved
@valeriupredoi
Copy link
Contributor Author

review totally appreciated @zklaus - I'll address the suggestions today 🍺

esmvalcore/_recipe.py Outdated Show resolved Hide resolved
@valeriupredoi
Copy link
Contributor Author

OK so due to all these changes, even if the tests cover most of the thing, I retested with a comprehensive functional test (a big recipe) and all is well 👍

@mattiarighi mattiarighi added preprocessor Related to the preprocessor enhancement New feature or request labels Feb 26, 2020
@valeriupredoi
Copy link
Contributor Author

@zklaus @bouweandela dudes what's your say - we are starting to rely even more on this to be merged and am aware of its shortcomings and kluges but at the moment it'll prove more useful if merged than sitting in a corner - can we try and see about this? Muchos gracias 🍺

@zklaus
Copy link

zklaus commented Mar 4, 2020

That really underscores the importance of smaller and cleaner PRs to begin with. After all, nobody needs a code review to know that 90 lines of code and 7 levels of indentation in one function are not appropriate.

Anyway, I'll copy here the open discussion from the side PR #522:

@zklaus wrote:

Next step: unify _update_fx_settings_mask and _update_fx_settings_stats.

For this it might be helpful to note that _get_spatial_stats_fx_vars really has nothing about spatial or stats things, but rather just gets the requested fx vars from the setting.

Looking at line 534 we notice that when the update method is called, the settings parameter already contains the pertinent subsection so that the manual distinction in lines 520, 524, and 528 is unnecessary.

Indeed, this automatic determination of fx variables makes one wonder if that could replace the manual listing of sft{lf,of,gif} entirely.

@valeriupredoi wrote:

but _get_spatial_stats_fx_vars needs to be called because it's the only function that gets the fx_files attr from the step (the others - mask landsea and weighted - don't need to). Kinda hard to unify the two funcs coz one need the call to get_output_file - ideas? beer

@zklaus wrote:

Well, in one we place the fx file as determined by _get_correct_fx_file in the fx_dict, in the other the result of get_output_file. What's the reason for this difference?

@valeriupredoi wrote:

get_output_file is used as a patch to get the already preprocessed and written to disk fx file, since the preprocessing and file writing is done independently of the whole fx workflow and is done like for any other variable

@zklaus wrote:

But why is that needed for some files, but not for others? What if a custom order is applied and both mask and area files have been regridded, for instance?

@valeriupredoi
Copy link
Contributor Author

But why is that needed for some files, but not for others? What if a custom order is applied and both mask and area files have been regridded, for instance?

That is a very good question! The workflow for fx files that are used for area/volume/zonal_statistics is fundamentally different and herein lies the tricksy nature of this PR (well, the part of it that deals with preprocessing these data):

  • the input fx data needed for those preprocessors are preprocessed entirely and firstly (as ancestors to the main variable) and these preprocessed files are written to disk, the very first thing the workflow does;
  • then, and only then, after all work's been done on those files only do the area/volume/zonal_statistics preproc step should look for files for fx_files - and those are U-turned from the get_output_files() assuming (and knowing) that they are there

This is a kludge - just because variable-fxvariable coupling is unavailable in a true sense now, given the current architecture. @bouweandela and myself we can work next using iris' new cell_measure functionality so we can do this in an elegant and practical way. Note that this method is relatively foolproof ie it is not prone to fail unexpectedly - if it fails, it fails due to an issue in the preprocessing step that's easy to trace.

About regridding - at the moment both the mask and model data will be regridded which is bad, but then again we can always put the area/volume/zonal_statistics preprocessor before the regridding step; we can also include the regrid as a prohibited step for the fx data just as are the time preproc steps. I left that out so I can talk to you about it - and here we are, talking about it 😁

@valeriupredoi
Copy link
Contributor Author

also to note: the preprocessor steps applied to fx data are a subset of the main variable they're associated with: [step1, step2...stepN] where stepN is the last step just before area/volume/zonal_statistics, everything that follows after that for the main var preprocessor is not applied to the fx variable; also step1-N do not include any time preprocessors

@zklaus
Copy link

zklaus commented Mar 5, 2020

Really like these last changes at a glance! Will do in-depth review tomorrow. Of course, the for-loop needs to be turned into a dict comprehension.

@valeriupredoi
Copy link
Contributor Author

Really like these last changes at a glance! Will do in-depth review tomorrow. Of course, the for-loop needs to be turned into a dict comprehension.

cheers dude, done it, and also simplified and added test - now the user can define his own fx variables no matter what mask preproc

@valeriupredoi
Copy link
Contributor Author

this is a terribly outdated branch, together with the other sisters, they awere useful to @ledm and @LisaBock I believe, but I think it's about time to retire them (close them) - what do you think @zklaus and @bouweandela - do we maybe want to recycle some of this functionality towards implementing the cell measures stuffs with new iris?

@bouweandela
Copy link
Member

Maybe keep it around until #999 is merged?

@schlunma
Copy link
Contributor

schlunma commented May 4, 2021

I believe we can close this since #999 got merged, can't we? @valeriupredoi

@zklaus
Copy link

zklaus commented May 5, 2021

Since V wanted to close them for a long time, I'll do it now. @valeriupredoi, @bouweandela please reopen if you disagree.

@zklaus zklaus closed this May 5, 2021
@bouweandela bouweandela deleted the PR439_child2_main_functionality branch May 5, 2021 09:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request preprocessor Related to the preprocessor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants