Allow for CMOR checks but DO NOT raise exceptions (at the user's risk!) #338

valeriupredoi · 2019-10-29T12:57:14Z

Not good, guys, not good - analyses that need to go through past nit-picky CMOR exceptions still stumble at them and return exceptions with the tool stopping. I thought the whole point of cmor_strict: false was to bypass all these roadblocks 🍺 Running an IPCC stuff for @LisaBock and am getting pestered by height2m not present. Who cares about bloody height2m? 😠

The text was updated successfully, but these errors were encountered:

mattiarighi · 2019-10-29T13:18:04Z

Which recipe are you trying?
Is it due to the recent merge of #305?

valeriupredoi · 2019-10-29T13:41:56Z

recipe_ipccwg1ar6ch3_highres

mattiarighi · 2019-10-29T13:42:55Z

Can you try a few another ones to see whether it is a general issue or specific to that recipe?

schlunma · 2019-10-29T13:55:07Z

I think cmor_strict does not affect any CMOR check, it only affects variable reading, for example allowing the use of the custom tables (at first I also thought it would allow the skipping of some tests, but it doesn't).

valeriupredoi · 2019-10-29T13:58:32Z

check_metadata() has no provision of not raising CMORCheckError (fail_on_error set to False will eventually raise it at the end anyway), it's a fundamental problem with our CMOR issues handling not an issue resulting from a PR, man 🍺

valeriupredoi · 2019-10-29T14:04:13Z

yep, @schlunma is correct - I want the check to find the issues, fix as many as possible but don't die on me if it couldn't fix everything, just report them and march on. Something like the flag in #307 that if set, it should go on at the user's risk, but go on until it hits rock bottom (or not)

valeriupredoi · 2019-10-29T14:13:49Z

...which in my case, it went on and finished the bloody recipe no problemo, without any height2m, coz that's just like Brexit, nobody needs it but it's there for some reason 😁

bouweandela · 2019-11-01T09:36:09Z

If you want no CMOR checks (see also #88), you can disable them in the recipe by making sure that every preprocessor contains:

fix_metadata: {cmor_table: null}
fix_data: {cmor_table: null}
cmor_check_metadata: false
cmor_check_data: false

However, if you're missing a 2m height coordinate for a dataset I would recommend implementing a fix for it, this shouldn't be much work.

Note however that many preprocessor functions and diagnostic scripts depend on the data being in the format specified by the CMOR standard, so disabling the checks is not recommended.

valeriupredoi · 2019-11-01T11:17:16Z

@bouweandela the issue I am raising is slightly more fundamental than simply adding a few lines in the recipe setting the checks to false and skipping those preprocessor steps: this is equivalent to a tank commander stripping the whole armor off the tank to allow for greater mobility. Instead I'd propose a mission-driven installation of armor:

allow the user to choose their own level of strictness for CMOR checks;
I would propose three levels set at runtime:
- no checks at all (super risky but good for testing), the tool should still perform the checks and log the errors but should go on and don't raise the exceptions, what Development dry run #307 does;
- basic checks and fixes - something like if your longitude is called long cows and time is measured in chicken; log but not raise if eg height2m is missing;
- full checks and fixes as we have now;

This is something that I am sure @axel-lauer would agree with from the perspective of science friendliness 🍺

axel-lauer · 2019-11-04T06:05:22Z

The 3-level idea proposed by @valeriupredoi sounds absolutely fantastic! The basic checks could be the default setting for doing "exploratory science", full checks the recommended setting for publications and no checks a "last aid" setting for "I know what I do, just read that data set" cases. I also particularly like the idea of logging all errors put not throwing exceptions in the "basic checks and fixes" and the "no checks" cases.

veyring · 2019-11-05T12:02:00Z

There has been a lot of discussion regarding the strict CMOR checks that are currently implemented in ESMValTool v2. What I am hearing from many of you is that “It is incredibly frustrating that ESMValTool can't handle any minor deviation from CMOR data”. While from a “good practice” coding point of view the checks make a lot of sense, in the meantime this really turns out to be a huge obstacle in actually using the tool for what it is built for, namely facilitating routine and rapid evaluation of the CMIP ensemble.

I therefore strongly support the proposal from @valeriupredoi to implement three different levels of checks to be selected by the user. I would like to suggest to implement this immediately. If you (particularly @ESMValGroup/esmvaltool-coreteam) have further suggestion on this proposal, please ensure to comment here by the end of this week. I suggest that then early next week @valeriupredoi is implementing his proposal if there are no further strong convincing arguments against this.

valeriupredoi · 2019-11-05T12:13:25Z

I see ❤️ 's all round, I should candidate for the next elections in the UK in December 🤣 Cheers @veyring for your note! I reckon it's best if @jvegasbsc and myself will be working on this, rather than just myself, since Javi is the authority when it comes to CMOR+ESMValTool. Javi will hopefully recover from his illness by next week (get better soon, bud!!) so we can start the work 🍺

ledm · 2019-11-05T14:02:58Z

“It is incredibly frustrating that ESMValTool can't handle any minor deviation from CMOR data”

I said this, but I would like to clarify what I mean. I'm not frustrated that we have strict CMOR compliance built in, but rather that there is so much variability in the standardisation of CMIP datasets. This variability combined with the strictness of ESMValTools CMOR checks means that almost no data can be read as is (in my experience). I've spent far more time working on fixes than on diagnostics or recipes.

I wish that checks had been made earlier in the process perhaps by the modelling groups or by WCRP before submitting or publishing non-standardised data. This is a model-intercomparison project (cMIP!), so it shouldn't be too much to ask to require the data to be inter-comparable.

valeriupredoi · 2019-11-05T14:09:06Z

@ledm asking scientists to use (and not just observe) a set of standards is like herding cats, man

ledm · 2019-11-06T09:07:51Z

Just to make sure it more widely known, this is from @veyring's email:

There is a formal way to catch and report errors provided by ES-Doc, see: https://errata.es-doc.org/static/index.html?project=CMIP6

I have a feeling that once the SOD is done, I'm going to need a few days to report all the issues with msftyz.

valeriupredoi · 2019-11-06T09:09:43Z

Poor SODs, they gonna hate us 😁 Dr Valeriu Predoi. Computational scientist NCAS-CMS University of Reading Department of Meteorology Reading RG6 6BB United Kingdom

…

On Wed, 6 Nov 2019, 09:07 Lee de Mora, ***@***.***> wrote: Just to make sure it more widely known, this is from @veyring <https://github.com/veyring>'s email: There is a formal way to catch and report errors provided by ES-Doc, see: https://errata.es-doc.org/static/index.html?project=CMIP6 I have a feeling that once the SOD is done, I'm going to need a few days to report all the issues with msftyz. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#338?email_source=notifications&email_token=AG5EFIZHFPMF2J6KXUQ4SPLQSKCOVA5CNFSM4JGIEFR2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEDFZ6ZA#issuecomment-550215524>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AG5EFIYAEGYJV5JYTILIHMDQSKCOVANCNFSM4JGIEFRQ> .

veyring · 2019-11-06T10:07:45Z

Information from Karl Taylor: "There is extensive QC in place to check the global attributes and some of the variable attributes, which are used extensively throughout the CMIP6 infrastructure, but the QC of the data itself is left to those contributing data. Those groups using CMOR to write their data have subjected it to additional QC checks not part of the ESGF basic checks, and I suspect that fewer groups may be using CMOR this round than used it in previous phases, so this might explain some of the problems. See https://goo.gl/NmuENr for documentation on what QC is done by PrePARE as part of the publication procedure on ESGF."

Please report issues to the responsible modeling groups as described at https://pcmdi.llnl.gov/CMIP6/Guide/dataUsers.html#6-reporting-suspected-errors. The modeling groups will verify the problems and then enter the information into the errata data base at https://errata.es-doc.org/static/index.html.

valeriupredoi · 2019-11-06T12:27:22Z

I am going to create a dedicated issue for reporting dataset issues and stop the conversation here since this issue deals with implementing the three-tiered checks and should not blow up on multiple fronts 🍺

bouweandela · 2019-11-08T13:29:55Z

Before starting to code the intermediate level of checking, it would be really useful to know what @valeriupredoi actually going to implement. That is, we need to have a list of what errors are serious enough to stop and/or what errors will only cause a warning. @ledm and @axel-lauer you seem to have an opinion on this, so maybe you could help out by starting to compile such a list?

It would also be good to know what to do with errors not in the list, i.e. should the default behaviour be to stop if an error is encountered, unless it's on the list of errors that are hopefully safe to ignore?

valeriupredoi · 2019-11-08T15:11:27Z

So this is best laid down if we actually list the types of metadata checks (note I don't propose tiering any of the data checks) performed by the tool. I propose naming Checks Levels: NONE, MEDIUM, STRICT and listing the checks I propose the following actions depending on check level. Note that for None we should log all issues but don't report any error (don't raise any exception):

CMORCheck.check_metadata: _check_var_metadata():

standard_name not CMOR: log if MEDIUM, report error if STRICT
long_name not CMOR: log if MEDIUM, report error if STRICT
units not CMOR: log if MEDIUM, report error if STRICT
attributes not present: log if MEDIUM, report error if STRICT
rank not == CMOR rank: log if MEDIUM, report error if STRICT
coordinates:
- standard_name not CMOR (1d, 2d [latitude, longitude] is treated as exceptional for CMIP6): log if MEDIUM, report error if STRICT
- does not exist: report error if MEDIUM, report error if STRICT
- units: report error if MEDIUM, report error if STRICT
- monotonicity and direction: report error if MEDIUM, report error if STRICT
- values (longitude, latitude: monotinic, CMOR range): report error if MEDIUM, report error if STRICT
- values (time: monotonic, reference time, frequency): report error if MEDIUM, report error if STRICT

I invite people to chime in. Also @jvegasbsc or anyone else pls add any other check that I may have omitted 🍺

valeriupredoi · 2019-11-08T15:15:02Z

Basically I am trying to replicate iris's behavior towards a Regular Joe netCDF file for MEDIUM with the addition of extra security for units and monotonic coord values; STRICT is strict like hell 😈

zklaus · 2019-11-08T15:42:45Z

Just a table version of @valeriupredoi list above; no other changes.

`CMORCheck.check_metadata`	MEDIUM	STRICT
standard_name not CMOR	log	error
long_name not CMOR	log	error
units not CMOR	log	error
attributes not present	log	error
rank not == CMOR rank	log	error
coordinates
standard_name not CMOR (1d, 2d [latitude, longitude] is treated as exceptional for CMIP6)	log	error
does not exist	error	error
units	error	error
monotonicity and direction	error	error
values (longitude, latitude: monotinic, CMOR range)	error	error
values (time: monotonic, reference time, frequency)	error	error

valeriupredoi · 2019-11-08T15:48:37Z

darn swanky table @zklaus cheers 🍺

valeriupredoi · 2019-11-08T16:31:46Z

the implementation should be very straightforward since report_error() in check.py for each of these functions should be replaced with a more flexible report() function that'll take both report_error() and report_warning() and use them according to case. Prob a piece of cake for @jvegasbsc or @sloosvel 🍺

sloosvel · 2019-11-11T12:22:00Z

Some of the checks on the table deal with the errors with automatic fixes. Would in this cases be necessary to add this intermediate level of reporting?

Prob a piece of cake for @jvegasbsc or @sloosvel

Well thanks for assuming that. But I currently have more than enough cake with the primavera metrics. So you are more than welcome to help out if you can / want to.

valeriupredoi · 2019-11-11T13:22:32Z

Sure thing, I will! In a work retreat til Wed but will start helping myself with cake on Wed 😁 Dr Valeriu Predoi. Computational scientist NCAS-CMS University of Reading Department of Meteorology Reading RG6 6BB United Kingdom

…

On Mon, 11 Nov 2019, 12:22 sloosvel, ***@***.***> wrote: Some of the checks on the table deal with the errors with automatic fixes. Would in this cases be necessary to add this intermediate level of reporting? Prob a piece of cake for @jvegasbsc <https://github.com/jvegasbsc> or @sloosvel <https://github.com/sloosvel> Well thanks for assuming that. But I currently have more than enough cake with the primavera metrics. So you are more than welcome to help out if you can / want to. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#338?email_source=notifications&email_token=AG5EFI25XT7G7ZDEOM7DVU3QTFE6XA5CNFSM4JGIEFR2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEDWVADI#issuecomment-552423437>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AG5EFI5XSCRT4C2LPOUIQOLQTFE6XANCNFSM4JGIEFRQ> .

sloosvel · 2019-11-11T13:25:55Z

Enjoy the retreat! Wednesday is fine for me to start working on that.

axel-lauer · 2019-11-11T13:37:01Z

The error handling proposed by Klaus is great. 3 small issues with datasets have turned out to be very annoying in the past, so I would propose to write warnings to the log file instead of handling them as errors when cmor check is set to medium, i.e.:

CMORCheck.check_metadata	MEDIUM	STRICT
monotonicity of time coordinate	log	error
frequency of time coordinate	log	error
longitude = -180...180 instead of 0...360	automatic fix + log	error

bouweandela · 2019-11-13T13:44:09Z

I would like to propose a different naming scheme for the different levels of checking, e.g. none, relaxed, default, as there is the possibility for confusion with the cmor_strict option in config-developer.yml and I think that the default behaviour should remain unchanged. Also, I think there is no need to write the levels in capital letters.

Regarding the technical implementation, I would propose to add the keyword argument relaxed=true to the function cmor_check_metadata which sets the level of checking to relaxed. This keyword could then be set by a command line option which modifies the preprocessors defined in the recipe, this option could e.g. be called --cmor-checks and takes as argument one of the 3 different levels none, relaxed or default.

valeriupredoi · 2019-11-13T13:52:27Z

@bouweandela I like the naming convention, especially relaxed 😁 Note that default may be slightly misleading ie it doesn't give the user info on the level of strictness, but if we explain it well in the documentation then it'll be a-ok.

Do you not think the cmor_checks could be an option in config-user.yml? Or that file is too crowded anyway?

bouweandela · 2019-11-13T14:18:12Z

Do you not think the cmor_checks could be an option in config-user.yml? Or that file is too crowded anyway?

I think that using --cmor-checks none or --cmor-checks relaxed fits in well with other command line options like e.g. --skip-nonexistent and `--max-years`` which also modify the recipe in some way for the convenience of the user who doesn't care too much about getting the exact same results and is prepared to accept the risk that the recipe crashes because they're trying to use it for something that it wasn't designed for.

valeriupredoi · 2019-11-13T14:24:59Z

sounds about right, man 🍺 I also found this one picture guide to be quite useful to show what the relationship is between ESMValTool and CMIP6 data

sloosvel · 2019-11-13T15:06:54Z

I also prefer the command line option, I think it's safer. So in summary, the report_error functions should be adapted to general report functions that give either an error, a warning or nothing (as @valeriupredoi pointed out) depending on the level of checking given by --cmor-checks?

valeriupredoi · 2019-11-13T15:22:59Z

yup, started work here #374

valeriupredoi · 2019-12-02T15:56:55Z

I am going to create a dedicated issue for reporting dataset issues and stop the conversation here since this issue deals with implementing the three-tiered checks and should not blow up on multiple fronts

the data issue template has its first use in #384

valeriupredoi added bug Something isn't working fix for dataset Related to dataset-specific fix files cmor Related to the CMOR standard labels Oct 29, 2019

valeriupredoi assigned jvegreg and mattiarighi Oct 29, 2019

mattiarighi assigned axel-lauer Oct 29, 2019

valeriupredoi changed the title ~~cmor_strict: false for CMIPX still performs checks and throws exceptions~~ Allow for CMOR checks but DO NOT raise exceptions (at the user's risk!) Oct 29, 2019

valeriupredoi mentioned this issue Nov 6, 2019

Reporing CMIP6 dataset issues to ESGF/model groups #360

Closed

valeriupredoi mentioned this issue Nov 13, 2019

Implement three levels of CMOR checks strictness #374

Merged

9 tasks

bouweandela added enhancement New feature or request and removed bug Something isn't working labels Nov 14, 2019

mattiarighi closed this as completed in #374 Mar 12, 2020

Allow for CMOR checks but DO NOT raise exceptions (at the user's risk!) #338

Allow for CMOR checks but DO NOT raise exceptions (at the user's risk!) #338

Comments

valeriupredoi commented Oct 29, 2019

mattiarighi commented Oct 29, 2019

valeriupredoi commented Oct 29, 2019

mattiarighi commented Oct 29, 2019

schlunma commented Oct 29, 2019

valeriupredoi commented Oct 29, 2019

valeriupredoi commented Oct 29, 2019

valeriupredoi commented Oct 29, 2019

bouweandela commented Nov 1, 2019 • edited Loading

valeriupredoi commented Nov 1, 2019

axel-lauer commented Nov 4, 2019

veyring commented Nov 5, 2019

valeriupredoi commented Nov 5, 2019

ledm commented Nov 5, 2019

valeriupredoi commented Nov 5, 2019

ledm commented Nov 6, 2019

valeriupredoi commented Nov 6, 2019 via email

veyring commented Nov 6, 2019

valeriupredoi commented Nov 6, 2019

bouweandela commented Nov 8, 2019 • edited Loading

valeriupredoi commented Nov 8, 2019 • edited Loading

valeriupredoi commented Nov 8, 2019 • edited Loading

zklaus commented Nov 8, 2019

valeriupredoi commented Nov 8, 2019

valeriupredoi commented Nov 8, 2019

sloosvel commented Nov 11, 2019

valeriupredoi commented Nov 11, 2019 via email

sloosvel commented Nov 11, 2019

axel-lauer commented Nov 11, 2019

bouweandela commented Nov 13, 2019

valeriupredoi commented Nov 13, 2019

bouweandela commented Nov 13, 2019

valeriupredoi commented Nov 13, 2019 • edited Loading

sloosvel commented Nov 13, 2019

valeriupredoi commented Nov 13, 2019

valeriupredoi commented Dec 2, 2019

bouweandela commented Nov 1, 2019 •

edited

Loading

bouweandela commented Nov 8, 2019 •

edited

Loading

valeriupredoi commented Nov 8, 2019 •

edited

Loading

valeriupredoi commented Nov 8, 2019 •

edited

Loading

valeriupredoi commented Nov 13, 2019 •

edited

Loading