Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for CMOR checks but DO NOT raise exceptions (at the user's risk!) #338

Closed
valeriupredoi opened this issue Oct 29, 2019 · 35 comments · Fixed by #374
Closed

Allow for CMOR checks but DO NOT raise exceptions (at the user's risk!) #338

valeriupredoi opened this issue Oct 29, 2019 · 35 comments · Fixed by #374
Assignees
Labels
cmor Related to the CMOR standard enhancement New feature or request fix for dataset Related to dataset-specific fix files

Comments

@valeriupredoi
Copy link
Contributor

Not good, guys, not good - analyses that need to go through past nit-picky CMOR exceptions still stumble at them and return exceptions with the tool stopping. I thought the whole point of cmor_strict: false was to bypass all these roadblocks 🍺 Running an IPCC stuff for @LisaBock and am getting pestered by height2m not present. Who cares about bloody height2m? 😠

@valeriupredoi valeriupredoi added bug Something isn't working fix for dataset Related to dataset-specific fix files cmor Related to the CMOR standard labels Oct 29, 2019
@mattiarighi
Copy link
Contributor

Which recipe are you trying?
Is it due to the recent merge of #305?

@valeriupredoi
Copy link
Contributor Author

recipe_ipccwg1ar6ch3_highres

@mattiarighi
Copy link
Contributor

Can you try a few another ones to see whether it is a general issue or specific to that recipe?

@schlunma
Copy link
Contributor

I think cmor_strict does not affect any CMOR check, it only affects variable reading, for example allowing the use of the custom tables (at first I also thought it would allow the skipping of some tests, but it doesn't).

@valeriupredoi
Copy link
Contributor Author

check_metadata() has no provision of not raising CMORCheckError (fail_on_error set to False will eventually raise it at the end anyway), it's a fundamental problem with our CMOR issues handling not an issue resulting from a PR, man 🍺

@valeriupredoi
Copy link
Contributor Author

yep, @schlunma is correct - I want the check to find the issues, fix as many as possible but don't die on me if it couldn't fix everything, just report them and march on. Something like the flag in #307 that if set, it should go on at the user's risk, but go on until it hits rock bottom (or not)

@valeriupredoi
Copy link
Contributor Author

...which in my case, it went on and finished the bloody recipe no problemo, without any height2m, coz that's just like Brexit, nobody needs it but it's there for some reason 😁

@valeriupredoi valeriupredoi changed the title cmor_strict: false for CMIPX still performs checks and throws exceptions Allow for CMOR checks but DO NOT raise exceptions (at the user's risk!) Oct 29, 2019
@bouweandela
Copy link
Member

bouweandela commented Nov 1, 2019

If you want no CMOR checks (see also #88), you can disable them in the recipe by making sure that every preprocessor contains:

fix_metadata: {cmor_table: null}
fix_data: {cmor_table: null}
cmor_check_metadata: false
cmor_check_data: false

However, if you're missing a 2m height coordinate for a dataset I would recommend implementing a fix for it, this shouldn't be much work.

Note however that many preprocessor functions and diagnostic scripts depend on the data being in the format specified by the CMOR standard, so disabling the checks is not recommended.

@valeriupredoi
Copy link
Contributor Author

@bouweandela the issue I am raising is slightly more fundamental than simply adding a few lines in the recipe setting the checks to false and skipping those preprocessor steps: this is equivalent to a tank commander stripping the whole armor off the tank to allow for greater mobility. Instead I'd propose a mission-driven installation of armor:

  • allow the user to choose their own level of strictness for CMOR checks;
  • I would propose three levels set at runtime:
    • no checks at all (super risky but good for testing), the tool should still perform the checks and log the errors but should go on and don't raise the exceptions, what Development dry run #307 does;
    • basic checks and fixes - something like if your longitude is called long cows and time is measured in chicken; log but not raise if eg height2m is missing;
    • full checks and fixes as we have now;

This is something that I am sure @axel-lauer would agree with from the perspective of science friendliness 🍺

@axel-lauer
Copy link
Contributor

The 3-level idea proposed by @valeriupredoi sounds absolutely fantastic! The basic checks could be the default setting for doing "exploratory science", full checks the recommended setting for publications and no checks a "last aid" setting for "I know what I do, just read that data set" cases. I also particularly like the idea of logging all errors put not throwing exceptions in the "basic checks and fixes" and the "no checks" cases.

@veyring
Copy link
Member

veyring commented Nov 5, 2019

There has been a lot of discussion regarding the strict CMOR checks that are currently implemented in ESMValTool v2. What I am hearing from many of you is that “It is incredibly frustrating that ESMValTool can't handle any minor deviation from CMOR data”. While from a “good practice” coding point of view the checks make a lot of sense, in the meantime this really turns out to be a huge obstacle in actually using the tool for what it is built for, namely facilitating routine and rapid evaluation of the CMIP ensemble.

I therefore strongly support the proposal from @valeriupredoi to implement three different levels of checks to be selected by the user. I would like to suggest to implement this immediately. If you (particularly @ESMValGroup/esmvaltool-coreteam) have further suggestion on this proposal, please ensure to comment here by the end of this week. I suggest that then early next week @valeriupredoi is implementing his proposal if there are no further strong convincing arguments against this.

@valeriupredoi
Copy link
Contributor Author

I see ❤️ 's all round, I should candidate for the next elections in the UK in December 🤣 Cheers @veyring for your note! I reckon it's best if @jvegasbsc and myself will be working on this, rather than just myself, since Javi is the authority when it comes to CMOR+ESMValTool. Javi will hopefully recover from his illness by next week (get better soon, bud!!) so we can start the work 🍺

@ledm
Copy link
Contributor

ledm commented Nov 5, 2019

“It is incredibly frustrating that ESMValTool can't handle any minor deviation from CMOR data”

I said this, but I would like to clarify what I mean. I'm not frustrated that we have strict CMOR compliance built in, but rather that there is so much variability in the standardisation of CMIP datasets. This variability combined with the strictness of ESMValTools CMOR checks means that almost no data can be read as is (in my experience). I've spent far more time working on fixes than on diagnostics or recipes.

I wish that checks had been made earlier in the process perhaps by the modelling groups or by WCRP before submitting or publishing non-standardised data. This is a model-intercomparison project (cMIP!), so it shouldn't be too much to ask to require the data to be inter-comparable.

@valeriupredoi
Copy link
Contributor Author

@ledm asking scientists to use (and not just observe) a set of standards is like herding cats, man

@ledm
Copy link
Contributor

ledm commented Nov 6, 2019

Just to make sure it more widely known, this is from @veyring's email:

There is a formal way to catch and report errors provided by ES-Doc, see: https://errata.es-doc.org/static/index.html?project=CMIP6

I have a feeling that once the SOD is done, I'm going to need a few days to report all the issues with msftyz.

@valeriupredoi
Copy link
Contributor Author

valeriupredoi commented Nov 6, 2019 via email

@veyring
Copy link
Member

veyring commented Nov 6, 2019

Information from Karl Taylor: "There is extensive QC in place to check the global attributes and some of the variable attributes, which are used extensively throughout the CMIP6 infrastructure, but the QC of the data itself is left to those contributing data. Those groups using CMOR to write their data have subjected it to additional QC checks not part of the ESGF basic checks, and I suspect that fewer groups may be using CMOR this round than used it in previous phases, so this might explain some of the problems. See https://goo.gl/NmuENr for documentation on what QC is done by PrePARE as part of the publication procedure on ESGF."

Please report issues to the responsible modeling groups as described at https://pcmdi.llnl.gov/CMIP6/Guide/dataUsers.html#6-reporting-suspected-errors. The modeling groups will verify the problems and then enter the information into the errata data base at https://errata.es-doc.org/static/index.html.

@valeriupredoi
Copy link
Contributor Author

I am going to create a dedicated issue for reporting dataset issues and stop the conversation here since this issue deals with implementing the three-tiered checks and should not blow up on multiple fronts 🍺

@bouweandela
Copy link
Member

bouweandela commented Nov 8, 2019

Before starting to code the intermediate level of checking, it would be really useful to know what @valeriupredoi actually going to implement. That is, we need to have a list of what errors are serious enough to stop and/or what errors will only cause a warning. @ledm and @axel-lauer you seem to have an opinion on this, so maybe you could help out by starting to compile such a list?

It would also be good to know what to do with errors not in the list, i.e. should the default behaviour be to stop if an error is encountered, unless it's on the list of errors that are hopefully safe to ignore?

@valeriupredoi
Copy link
Contributor Author

valeriupredoi commented Nov 8, 2019

So this is best laid down if we actually list the types of metadata checks (note I don't propose tiering any of the data checks) performed by the tool. I propose naming Checks Levels: NONE, MEDIUM, STRICT and listing the checks I propose the following actions depending on check level. Note that for None we should log all issues but don't report any error (don't raise any exception):

CMORCheck.check_metadata: _check_var_metadata():

  • standard_name not CMOR: log if MEDIUM, report error if STRICT
  • long_name not CMOR: log if MEDIUM, report error if STRICT
  • units not CMOR: log if MEDIUM, report error if STRICT
  • attributes not present: log if MEDIUM, report error if STRICT
  • rank not == CMOR rank: log if MEDIUM, report error if STRICT
  • coordinates:
    • standard_name not CMOR (1d, 2d [latitude, longitude] is treated as exceptional for CMIP6): log if MEDIUM, report error if STRICT
    • does not exist: report error if MEDIUM, report error if STRICT
    • units: report error if MEDIUM, report error if STRICT
    • monotonicity and direction: report error if MEDIUM, report error if STRICT
    • values (longitude, latitude: monotinic, CMOR range): report error if MEDIUM, report error if STRICT
    • values (time: monotonic, reference time, frequency): report error if MEDIUM, report error if STRICT

I invite people to chime in. Also @jvegasbsc or anyone else pls add any other check that I may have omitted 🍺

@valeriupredoi
Copy link
Contributor Author

valeriupredoi commented Nov 8, 2019

Basically I am trying to replicate iris's behavior towards a Regular Joe netCDF file for MEDIUM with the addition of extra security for units and monotonic coord values; STRICT is strict like hell 😈

@zklaus
Copy link

zklaus commented Nov 8, 2019

Just a table version of @valeriupredoi list above; no other changes.

CMORCheck.check_metadata MEDIUM STRICT
standard_name not CMOR log error
long_name not CMOR log error
units not CMOR log error
attributes not present log error
rank not == CMOR rank log error
coordinates
standard_name not CMOR (1d, 2d [latitude, longitude] is treated as exceptional for CMIP6) log error
does not exist error error
units error error
monotonicity and direction error error
values (longitude, latitude: monotinic, CMOR range) error error
values (time: monotonic, reference time, frequency) error error

@valeriupredoi
Copy link
Contributor Author

darn swanky table @zklaus cheers 🍺

@valeriupredoi
Copy link
Contributor Author

the implementation should be very straightforward since report_error() in check.py for each of these functions should be replaced with a more flexible report() function that'll take both report_error() and report_warning() and use them according to case. Prob a piece of cake for @jvegasbsc or @sloosvel 🍺

@sloosvel
Copy link
Contributor

Some of the checks on the table deal with the errors with automatic fixes. Would in this cases be necessary to add this intermediate level of reporting?

Prob a piece of cake for @jvegasbsc or @sloosvel

Well thanks for assuming that. But I currently have more than enough cake with the primavera metrics. So you are more than welcome to help out if you can / want to.

@valeriupredoi
Copy link
Contributor Author

valeriupredoi commented Nov 11, 2019 via email

@sloosvel
Copy link
Contributor

Enjoy the retreat! Wednesday is fine for me to start working on that.

@axel-lauer
Copy link
Contributor

The error handling proposed by Klaus is great. 3 small issues with datasets have turned out to be very annoying in the past, so I would propose to write warnings to the log file instead of handling them as errors when cmor check is set to medium, i.e.:

CMORCheck.check_metadata MEDIUM STRICT
monotonicity of time coordinate log error
frequency of time coordinate log error
longitude = -180...180 instead of 0...360 automatic fix + log error

@bouweandela
Copy link
Member

I would like to propose a different naming scheme for the different levels of checking, e.g. none, relaxed, default, as there is the possibility for confusion with the cmor_strict option in config-developer.yml and I think that the default behaviour should remain unchanged. Also, I think there is no need to write the levels in capital letters.

Regarding the technical implementation, I would propose to add the keyword argument relaxed=true to the function cmor_check_metadata which sets the level of checking to relaxed. This keyword could then be set by a command line option which modifies the preprocessors defined in the recipe, this option could e.g. be called --cmor-checks and takes as argument one of the 3 different levels none, relaxed or default.

@valeriupredoi
Copy link
Contributor Author

@bouweandela I like the naming convention, especially relaxed 😁 Note that default may be slightly misleading ie it doesn't give the user info on the level of strictness, but if we explain it well in the documentation then it'll be a-ok.

Do you not think the cmor_checks could be an option in config-user.yml? Or that file is too crowded anyway?

@bouweandela
Copy link
Member

Do you not think the cmor_checks could be an option in config-user.yml? Or that file is too crowded anyway?

I think that using --cmor-checks none or --cmor-checks relaxed fits in well with other command line options like e.g. --skip-nonexistent and `--max-years`` which also modify the recipe in some way for the convenience of the user who doesn't care too much about getting the exact same results and is prepared to accept the risk that the recipe crashes because they're trying to use it for something that it wasn't designed for.

@valeriupredoi
Copy link
Contributor Author

valeriupredoi commented Nov 13, 2019

sounds about right, man 🍺 I also found this one picture guide to be quite useful to show what the relationship is between ESMValTool and CMIP6 data
cmip6

@sloosvel
Copy link
Contributor

I also prefer the command line option, I think it's safer. So in summary, the report_error functions should be adapted to general report functions that give either an error, a warning or nothing (as @valeriupredoi pointed out) depending on the level of checking given by --cmor-checks?

@valeriupredoi
Copy link
Contributor Author

yup, started work here #374

@bouweandela bouweandela added enhancement New feature or request and removed bug Something isn't working labels Nov 14, 2019
@valeriupredoi
Copy link
Contributor Author

I am going to create a dedicated issue for reporting dataset issues and stop the conversation here since this issue deals with implementing the three-tiered checks and should not blow up on multiple fronts

the data issue template has its first use in #384

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cmor Related to the CMOR standard enhancement New feature or request fix for dataset Related to dataset-specific fix files
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants