Development dry run #307

Closed
wants to merge 46 commits into from
Conversation

valeriupredoi
Contributor

@valeriupredoi commented Oct 11, 2019

OK chaps, this is the start of ESMValGroup/ESMValTool#1365. Currently it does pretty much all it needs to do:

  • run the tool with the --dry-run flag and it will go through the recipe and run a default preprocessor, without stopping where data is missing;
  • it will tell you where data is missing via MISSING DATA messages in the logger;
  • it runs the CMOR checks and fixes without writing out the fixed files, then exits without doing anything else (no diagnostics) - writing to disk is switched off completely, apart from the main_log, main_log_debug and resources files (what's in run);
  • it works seamlessly with derived variables and with fx variables too;

It is pretty much the standard esmvaltool workflow, but with everything beyond the basics removed, and obviously no diagnostics.

CAVEAT: if there are issues with the CMOR checks (either data or metadata) it will stop there, just as a production esmvaltool run would; I need @jvegasbsc's advice on how to just log those errors and let the run carry on. I suggest using the same dry_check flag that gets passed through _recipe.py and enables the dry run, but that means changing the error handling in cmor/ a bit - what say you? 🍺

@valeriupredoi
Contributor Author

BTW guys - this, a rather complicated workflow rerouting, came in at +42/-21 lines -> a mark of what a robust codebase we have - cheers to you all 🍺 See you in Germany next week, pub time now 🍺

@valeriupredoi
Contributor Author

OK, this one now works without saving output and properly without running (or trying to run) any diagnostic; ready for review, chaps 🍺

@valeriupredoi
Contributor Author

still waiting on #374 - giddyup guys! 🍺

@stefsmeets
Contributor

stefsmeets commented May 21, 2021

Hi @valeriupredoi , I just had a look at this PR, and I'm wondering if it is still relevant with #917 merged. I merged master, and although I'm not super confident I got everything right (there were a lot of code changes generating merge conflicts), I did get it to work... although I'm not sure what kind of output to expect 😅.

@valeriupredoi
Contributor Author

Cheers muchly, @stefsmeets 🍺 You shouldn't get any output, since the purpose of this functionality is not to do any analysis but rather to alert the user to missing data, issues with the data (CMOR hiccups), etc. I am not sure about the future of this PR, though: @bouweandela was not keen to get it in the last time I spoke to him, but @axel-lauer was. I am on the fence myself, since we can now throttle the CMOR checks and we get better error messages via #917 - what do people think?

@stefsmeets
Contributor

That's why I was asking if it is still relevant. Personally, I like the idea, because it would take away some of my own frustrations with missing data. Having looked at the code, however, the changes are quite convoluted and they do not make the existing code easier to understand. I'm questioning whether this is the right implementation.

@jvegreg
Contributor

jvegreg commented May 21, 2021

I think #917 is good enough for when data is missing. I also think we still need a way to check the data without running the full recipe.

About the implementation, it would be possible to just add a dry_run parameter to the task._run function, so that we only need to modify the PreprocessingTask._run implementation to run the critical preprocessor steps (basically load and checking, maybe area / level selection) when that flag is set. For DiagnosticTask it would simply be a pass.

I have some doubts about how this would interact with the multi-model statistics, though, mostly because I don't have a clear idea of how that is implemented.
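
A minimal sketch of that idea, assuming simplified task classes and a dry_run flag (the class, attribute, and step names below are illustrative assumptions, not the actual ESMValCore task API):

```python
# Illustrative sketch only: class, attribute, and step names are
# assumptions, not the actual ESMValCore task API.
class PreprocessingTask:
    def __init__(self, settings, dry_run=False):
        self.settings = settings  # ordered mapping of preprocessor steps
        self.dry_run = dry_run

    def _run(self, input_files):
        if self.dry_run:
            # Keep only the "critical" steps: loading, fixes and CMOR
            # checks; skip regridding, statistics, saving, etc.
            critical = ('load', 'fix_metadata', 'cmor_check_metadata',
                        'fix_data', 'cmor_check_data')
            steps = {k: v for k, v in self.settings.items() if k in critical}
        else:
            steps = self.settings
        return self._apply_steps(steps, input_files)

    def _apply_steps(self, steps, input_files):
        ...  # apply each preprocessor step in order (omitted here)


class DiagnosticTask:
    def __init__(self, script, dry_run=False):
        self.script = script
        self.dry_run = dry_run

    def _run(self, input_files):
        if self.dry_run:
            return []  # dry run: do not launch the diagnostic script
        ...  # normal behaviour: run self.script on the input files
```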

@bouweandela
Member

The implementation here indeed looks a bit too complicated. All that would be needed now to just run the CMOR checks is some way to disable saving the preprocessor output to file, right? We already have a nice report for missing data and a way to disable running diagnostics.
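
For illustration, a rough sketch of what skipping the save step could look like, assuming hypothetical function and flag names (this is not existing ESMValCore code or configuration):

```python
# Illustrative only: the function and the save_output flag are assumed
# names for this sketch, not existing ESMValCore code or configuration.
import iris


def check_only_preprocess(filename_in, filename_out, save_output=True):
    """Load the data, run (placeholder) checks, and optionally save."""
    cubes = iris.load(filename_in)
    # ... CMOR fixes and checks would run on `cubes` here ...
    if save_output:
        # Writing the preprocessed data is the part a "checks only"
        # mode could skip to save time and disk space.
        iris.save(cubes, filename_out)
    return cubes
```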

@valeriupredoi
Contributor Author

I reckon Bouwe is right, Javi - all we need is a pass through the data to find the missing files and detect CMOR issues (depending on the chosen check level), not a full run of the preprocessor - that takes time and memory, and if we don't save any data, what's the point? Gonna have a look at it now 🍺

@valeriupredoi
Contributor Author

Out of curiosity, do we have a separate report just for missing data? I don't think so, and the relevant messages are buried in all that screen output (or the log), which, even when not in debug mode, is still a lot to sift through.

@valeriupredoi
Contributor Author

Sadly this has been sidelined for too long and now it's as obsolete as a phone booth. Closing it 😢

Labels: enhancement (New feature or request)
8 participants