Fx files are selected differently depending on what OS the code runs on #1159

valeriupredoi · 2021-06-07T14:54:17Z

Bug summary

This line loops through all possible MIPs. The explanation for the test failure is as follows: on a Debian-based OS (Ubuntu, as on my machine and on GitHub Actions) the order in which the tables, and hence the MIP list, are passed is arbitrary (but the same every time the test runs), and it happens that IyrAnt is the last valid MIP in the loop. On CentOS (JASMIN and the CI machine), the tables, and hence the MIPs, are sorted in the usual UNIX order (numerals, then capitals, then lower case), so fx is the last valid MIP and the test passes.

This, however, uncovers a potential vulnerability: selecting the MIP this way is rather random and not robust, in my view. As in the test case, for sftgif we have both fx and IyrAnt files available, and which file is selected depends on the OS the code runs on, which is dodgy. Any ideas how to select the file we need in a way that is consistent across OSs? This is potentially impactful, since fx is not time-dependent, whereas e.g. Oyr or IyrAnt are time-dependent!
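To make the ordering argument concrete: Python compares strings by code point, so digits sort before capitals, which sort before lower-case letters, i.e. the same numerals-capitals-lower-case order described above. The sketch below mimics a loop without a `break` and shows that, with the tables visited in sorted order, fx ends up as the last valid MIP, while another equally valid order picks IyrAnt. The MIP names and the set of "valid" MIPs are illustrative assumptions, not the actual test data, and `last_valid_mip` is a hypothetical helper.

```python
# Hypothetical subset of MIP table names; in the real code they come from the
# CMOR tables, and their order depends on how those tables were loaded.
mips = ["fx", "Omon", "IyrAnt", "Amon", "Ofx"]

# Hypothetical: MIPs that define the fx variable (e.g. sftgif) AND have files.
valid_mips = {"fx", "IyrAnt"}


def last_valid_mip(ordered_mips, valid):
    """Mimic a loop without `break`: each valid MIP overwrites the previous one."""
    selected = None
    for mip in ordered_mips:
        if mip in valid:
            selected = mip
    return selected


# Code-point order puts upper case before lower case, so 'fx' sorts last.
print(sorted(mips))  # ['Amon', 'IyrAnt', 'Ofx', 'Omon', 'fx']
print(last_valid_mip(sorted(mips), valid_mips))  # fx

# A different (but equally legitimate) iteration order selects another MIP.
print(last_valid_mip(["fx", "Amon", "IyrAnt"], valid_mips))  # IyrAnt
```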

Test fail that discovered the issue

This is an odd one: that test (test_recipe.test_fx_list_mip_change_cmip6) fails only on Ubuntu and OSX machines. The Ubuntu failure is reported in this closed test PR, with the test failing both on my machine and on GitHub Actions (both Ubuntu); the OSX failure is here. The test fails because the filename key returns a file from a list of files that does not have fx in it. I am not sure whether this is an issue with the actual preprocessor or just the test object misbehaving on different platforms. I'd like to investigate more, but I need to write some slides now instead 😖 This is, however, a sneaky bug 🪲 @zklaus the anaconda package build may fail because of this (depending on what machine the tests are run on).

valeriupredoi · 2021-06-08T12:01:53Z

OK I found the bugger: this line loops through all possible MIPs, and which MIP ends up selected depends on the OS the code runs on (IyrAnt on Ubuntu and GitHub Actions, fx on CentOS). See the bug summary above for the full explanation and for why this matters (fx is not time-dependent, whereas e.g. Oyr or IyrAnt are).

@sloosvel @schlunma could you guys please give priority to this issue? It's quite important for the fx functionality, and my apologies for not spotting this at review. @zklaus you may want to flag this for the release, mate.

sloosvel · 2021-06-08T12:06:30Z

I don't know; as far as I have seen, in our experiments we only generate the fx vars in one of the possible mips. I think this test is more complicated because somehow it returns files for all of the possible mips, whereas in a "real life" application the code would go through all mips but find only one possible output. I am not sure, though, whether other institutes output the fx vars in all the possible mips. In any case, this should be solvable by specifying in the recipe which one you want to use.

valeriupredoi · 2021-06-08T12:09:33Z

@sloosvel the data is inherently different: there are sftgif files in the fx mip and sftgif files in the IyrAnt mip. They are not mutually exclusive; models produce data for both mips. Selecting the file more or less at random, based on how the mip names happen to be ordered on different OSs, is NOT the right way.

zklaus · 2021-06-08T12:21:14Z

I have assigned the milestone. I will still cut the release branch soon, but this week is reserved for testing and fixing exactly this kind of issue before next week's release, so I agree that it would be good to address this soon.

I also agree that the order can not be relied upon since it is arbitrary in dicts. You may be interested to learn that it is arbitrary in globbing as well, see, e.g., here.
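A minimal sketch of how a table loader could make its order deterministic, assuming the tables are discovered with `glob` (the `*_*.json` pattern, the file names, and `load_table_names` are assumptions made for this example, not necessarily how the CMOR tables are actually read). `glob.glob` returns entries in whatever order the file system lists them, and a dict built from that list preserves exactly that insertion order, so sorting the file list before filling the dict removes the dependence on the OS.

```python
import glob
import os
import tempfile


def load_table_names(folder):
    """Return {table name: path} in a deterministic (sorted) order.

    Hypothetical helper: glob.glob() gives no ordering guarantee, so the
    paths are sorted before the dict is filled (dicts keep insertion order).
    """
    paths = sorted(glob.glob(os.path.join(folder, "*_*.json")))
    tables = {}
    for path in paths:
        # e.g. 'CMIP6_fx.json' -> 'fx'
        name = os.path.splitext(os.path.basename(path))[0].split("_", 1)[1]
        tables[name] = path
    return tables


with tempfile.TemporaryDirectory() as folder:
    for name in ["fx", "IyrAnt", "Omon"]:
        open(os.path.join(folder, f"CMIP6_{name}.json"), "w").close()
    print(list(load_table_names(folder)))  # ['IyrAnt', 'Omon', 'fx']
```

Sorting fixes the order but not the choice itself: a plain sort still favours whatever happens to come last (or first) alphabetically, which is why an explicit preference is discussed further down the thread.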

valeriupredoi · 2021-06-08T12:25:47Z

cheers @zklaus. I would suggest we first try to understand the functional behaviour we want when selecting the fx mip: do we want to load the first available fx mip, or do we want to select a preferred one? I think this will influence how #1160 gets solved too. Unfortunately I don't have the data knowledge to propose a solution, which is why I'm asking you guys, who have worked more closely with this sort of data than meself. Definitely not the last-one-in-the-loop selection, though 😁

valeriupredoi · 2021-06-08T12:28:07Z

possibly ping @ledm, since he deals with fx stuff for a living, in oceans 🐟

zklaus · 2021-06-08T12:34:57Z

Looking at the data request for (IyrAnt, sftgif) as an example (here), this variable lives on a polar stereographic grid. In other words, I think the *Ant* and *Gre* files are applicable if and only if the grid_label of the data variable carries the a or g suffix, cf. the last point of note 11 on page 11 of "CMIP6 Global Attributes, DRS, Filenames, Directory Structure, and CV’s".
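Under that reading, whether an *Ant* or *Gre* table applies could be decided from the grid_label of the data variable. A minimal sketch, assuming a trailing a marks an Antarctic grid and a trailing g a Greenland grid (the helper names and the example labels such as gna are hypothetical; the CMIP6 document cited above is the authority on the exact convention). Non-polar tables are deliberately left unrestricted, since the rule above only says when the polar tables apply.

```python
def polar_region(grid_label):
    """Return 'Ant', 'Gre', or None depending on the grid_label suffix.

    Hypothetical helper: assumes polar stereographic ice-sheet grids are
    marked by a trailing 'a' (Antarctica) or 'g' (Greenland).
    """
    if grid_label.endswith("a"):
        return "Ant"
    if grid_label.endswith("g"):
        return "Gre"
    return None


def polar_table_applies(mip, grid_label):
    """*Ant*/*Gre* tables apply only to data on the matching polar grid."""
    region = polar_region(grid_label)
    if "Ant" in mip:
        return region == "Ant"
    if "Gre" in mip:
        return region == "Gre"
    return True  # non-polar tables are not restricted by this rule


print(polar_table_applies("IyrAnt", "gna"))  # True
print(polar_table_applies("IyrAnt", "gn"))   # False
print(polar_table_applies("fx", "gn"))       # True
```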

zklaus · 2021-06-08T12:42:06Z

As for fx vs mon/dec, this depends on the model, and the data should appear only in either *fx or *mon. This is somewhat documented in the "processing" instructions of the variables in the data request, see for example cell thickness thkcello in Ofx and in Omon.

valeriupredoi · 2021-06-08T14:11:47Z

cheers for the clarifications @zklaus 🍺 Here's an idea: can we set up a "preferred" mip dictionary mapping each variable mip to a set of preferred fx mips? That way we'd always select the ones that make sense first, and only then look for more exotic fx mips.
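A minimal sketch of that idea, assuming the preference is stored as an ordered tuple per variable mip (a Python set has no order, so a tuple or list is needed to express "first"). The mapping `PREFERRED_FX_MIPS` and the helper `fx_search_order` are hypothetical and purely illustrative, not an agreed-upon table; any mip not listed is tried afterwards in sorted order so the result stays deterministic.

```python
# Hypothetical preference table: variable mip -> fx mips to try first, in order.
PREFERRED_FX_MIPS = {
    "Omon": ("Ofx", "fx", "Omon"),
    "Amon": ("fx", "Amon"),
    "IyrAnt": ("IyrAnt", "fx"),
}


def fx_search_order(var_mip, available_mips):
    """Preferred fx mips first (if available), then the rest in sorted order."""
    preferred = [mip for mip in PREFERRED_FX_MIPS.get(var_mip, ())
                 if mip in available_mips]
    rest = sorted(mip for mip in available_mips if mip not in preferred)
    return preferred + rest


print(fx_search_order("Omon", {"fx", "IyrAnt", "Ofx", "Omon"}))
# ['Ofx', 'fx', 'Omon', 'IyrAnt']
```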

schlunma · 2021-06-08T14:51:52Z

I had a more detailed look at this. I think that #999 introduced a bug here:

ESMValCore/esmvalcore/_recipe.py, lines 344 to 360 in fc87d72:

    def _search_fx_mip(tables, found_mip, variable, fx_info, config_user):
        fx_files = None
        for mip in tables:
            fx_cmor = tables[mip].get(fx_info['short_name'])
            if fx_cmor:
                found_mip = True
                fx_info['mip'] = mip
                fx_info = _add_fxvar_keys(fx_info, variable)
                logger.debug(
                    "For fx variable '%s', found table '%s'",
                    fx_info['short_name'], mip)
                fx_files = _get_input_files(fx_info, config_user)[0]
                if fx_files:
                    logger.debug(
                        "Found fx variables '%s':\n%s",
                        fx_info['short_name'], pformat(fx_files))
        return found_mip, fx_info, fx_files

In the old implementation, there used to be a break at l. 359 when fx files were found, which ensured that the first fx files actually found were returned. Right now, fx_files is overwritten for every table searched, and only the last value is returned, regardless of whether it is empty or not. We should definitely fix that first.
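A self-contained sketch of the behaviour described above, in which the search stops at the first table that actually yields files. It is not the ESMValCore code: the recipe helpers (`_add_fxvar_keys`, `_get_input_files`) are replaced by a hypothetical `find_files` callable, and `tables` is assumed to map each mip name to that table's variables.

```python
def search_fx_mip(tables, short_name, find_files):
    """Return (mip, files) for the first table that defines the variable
    and for which files are found, or (None, []) if there is none.

    `tables` maps mip -> {short_name: cmor_info}; `find_files(mip)` is a
    hypothetical stand-in for the real file search.
    """
    for mip, table in tables.items():
        if short_name not in table:
            continue
        files = find_files(mip)
        if files:
            # Stop at the first hit instead of letting later tables overwrite it.
            return mip, files
    return None, []


tables = {"fx": {"sftgif": {}}, "IyrAnt": {"sftgif": {}}, "Amon": {"tas": {}}}
available = {"fx": ["sftgif_fx.nc"], "IyrAnt": ["sftgif_IyrAnt.nc"]}
print(search_fx_mip(tables, "sftgif", lambda mip: available.get(mip, [])))
# ('fx', ['sftgif_fx.nc'])
```

Stopping at the first hit still depends on the iteration order of `tables`, so on its own it only turns "last valid table wins" into "first valid table wins"; the order of the tables has to be made deterministic as well.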

I agree that a default order for checking the mip tables would be good. Maybe fx first and then simply alphabetically, to ensure consistent behavior across machines? In the old implementation we also preferred the original mip of the variable. That could be achieved by modifying this line (e.g. with a sorted()):

ESMValCore/esmvalcore/_recipe.py, line 376 in fc87d72:

    project_tables = CMOR_TABLES[var_project].tables
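One possible way to express that order with `sorted()`, as a sketch assuming `var_mip` is the mip of the requested variable and the table names are plain strings: the variable's own mip first, then fx, then everything else alphabetically (here plain code-point order, which is the same on every OS). The key function `fx_table_key` and the example tables are hypothetical, and the exact priority is illustrative, not necessarily what was eventually implemented.

```python
def fx_table_key(var_mip):
    """Build a sort key: the variable's own mip, then 'fx', then the rest by name."""
    def key(mip):
        # False sorts before True, so matching mips come first.
        return (mip != var_mip, mip != "fx", mip)
    return key


project_tables = {"Omon": {}, "fx": {}, "IyrAnt": {}, "Ofx": {}, "Amon": {}}
print(sorted(project_tables, key=fx_table_key("Omon")))
# ['Omon', 'fx', 'Amon', 'IyrAnt', 'Ofx']
```

Iterating over `sorted(...)` instead of the dict itself keeps the loop body unchanged while removing any dependence on how the dict was filled.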

valeriupredoi · 2021-06-08T15:03:43Z

@schlunma very cool, I remember our old implementation now (and the discussion we had over it). I fully agree with your points. What do you think, @sloosvel: would you be able/willing to implement this, please? Cheers guys, this issue looks to be fully elucidated 👻

zklaus · 2021-06-09T08:48:21Z

In terms of preferred order, I think we should use fx files from *Ant* tables for *Ant* variables, from *Gre* tables for *Gre* variables, and from *fx, *mon, *dec tables for the rest. This is because the Ant and Gre tables live on South and North polar stereographic grids, respectively, which should also be identifiable from a or g as the last letter of the grid_label, but probably isn't.
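A sketch of that preference as a ranking function, assuming the variable's polar region ('Ant', 'Gre', or None) is already known and that table names can be matched by substring and suffix. The helper names, the rank numbers, the *fx, *mon, *dec precedence among the remaining tables, and the alphabetical tie-break are all illustrative assumptions.

```python
POLAR_REGIONS = ("Ant", "Gre")


def rank_fx_table(mip, var_region=None):
    """Lower rank = preferred. var_region is 'Ant', 'Gre', or None."""
    table_region = next((r for r in POLAR_REGIONS if r in mip), None)
    if table_region is not None:
        # Polar tables: first for variables of the same region, otherwise last.
        return 0 if table_region == var_region else 9
    # Remaining tables: assumed precedence *fx, then *mon, then *dec.
    for rank, suffix in enumerate(("fx", "mon", "dec"), start=1):
        if mip.endswith(suffix):
            return rank
    return 4


def order_fx_tables(mips, var_region=None):
    """Sort by rank, breaking ties alphabetically for a deterministic result."""
    return sorted(mips, key=lambda mip: (rank_fx_table(mip, var_region), mip))


tables = ["IyrAnt", "Omon", "Ofx", "fx", "Odec", "IyrGre"]
print(order_fx_tables(tables))
# ['Ofx', 'fx', 'Omon', 'Odec', 'IyrAnt', 'IyrGre']
print(order_fx_tables(tables, "Ant"))
# ['IyrAnt', 'Ofx', 'fx', 'Omon', 'Odec', 'IyrGre']
```

As the next comment points out, rules like these are tied to CMIP6 table names, so a real implementation would likely need to be per project.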

sloosvel · 2021-06-09T12:36:22Z

I don't know. The mips depend on the tables, which change from project to project, and some variables exist in some projects but not in others. These criteria would work fine for CMIP6, but may be too specific for the other projects.

valeriupredoi · 2021-06-09T14:27:52Z

OK, I suggest we fix that test so that it doesn't fail on any OS, and keep the discussion going here. If the build tests run on a Debian machine, Klaus will have issues building the package 👍

sloosvel · 2021-06-09T14:58:50Z

If it's just to fix the test, specifying the mip in the content of the recipe should be enough. Would that work for you?

valeriupredoi · 2021-06-11T10:28:51Z

silly GH, closing the issue on its own 🤭 I just merged #1169, a temporary fix that skips the failing tests (cheers for reviewing it, Saskia!). This issue is still ongoing, though...

zklaus · 2021-07-07T09:14:51Z

To summarize the issue:

- This occurs when a requested variable exists in more than one "fx" table (which may even be a yr, mon, or dec table in reality).
- This ambiguity can not be resolved in general.
- Specifying the table manually, per @sloosvel's comment above, resolves the problem.

Is this accurate? If so, I suggest improving the documentation to clarify this, and fixing the test permanently by specifying the fx table explicitly there as well. I think @schlunma's #1216 is a good starting point for documentation improvements.
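The summary implies two behaviours that can be sketched together: an explicitly given mip is used as-is, and without one the search reports when more than one table has matching files instead of silently picking one. `resolve_fx_files` is a hypothetical helper for illustration (`find_files` stands in for the real file search, and raising an error is only one option; logging a warning would be another), not the code from #1216.

```python
def resolve_fx_files(short_name, tables, find_files, mip=None):
    """Return (mip, files) for an fx variable.

    If `mip` is given (e.g. from the recipe), only that table is searched.
    Otherwise every table defining `short_name` is searched in sorted order,
    and an error is raised if files are found in more than one of them.
    """
    if mip is not None:
        return mip, find_files(mip)
    hits = {}
    for candidate in sorted(tables):
        if short_name in tables[candidate]:
            files = find_files(candidate)
            if files:
                hits[candidate] = files
    if len(hits) > 1:
        raise ValueError(
            f"Ambiguous fx variable '{short_name}': files found in tables "
            f"{sorted(hits)}; please specify the mip explicitly")
    if hits:
        return next(iter(hits.items()))
    return None, []


tables = {"fx": {"sftgif": {}}, "IyrAnt": {"sftgif": {}}}
available = {"fx": ["sftgif_fx.nc"], "IyrAnt": ["sftgif_IyrAnt.nc"]}


def find(mip):
    return available.get(mip, [])


print(resolve_fx_files("sftgif", tables, find, mip="fx"))
# ('fx', ['sftgif_fx.nc'])
```

Calling `resolve_fx_files("sftgif", tables, find)` without a mip raises a `ValueError` in this example, because both fx and IyrAnt have files; that is the case the summary says cannot be resolved in general.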

valeriupredoi added bug Something isn't working testing labels Jun 7, 2021

valeriupredoi assigned zklaus, schlunma and sloosvel Jun 7, 2021

zklaus added this to the v2.3.0 milestone Jun 8, 2021

valeriupredoi changed the title ~~test_recipe.test_fx_list_mip_change_cmip6 fails on Ubuntu and OSX~~ Fx files are selected differently depending on what OS the code runs on Jun 8, 2021

valeriupredoi mentioned this issue Jun 10, 2021

Fix test failing due to fx files chosen differently on different OS's #1169

Merged

valeriupredoi closed this as completed in #1169 Jun 11, 2021

valeriupredoi reopened this Jun 11, 2021

zklaus modified the milestones: [locked] v2.3.0, v2.4.0 Jun 11, 2021

valeriupredoi mentioned this issue Jul 1, 2021

Circle test picks up release package installed from conda and not current branch esmvalcore #1063

Closed

schlunma mentioned this issue Jul 6, 2021

Fixed search for fx files when no mip is given #1216

Merged

zklaus closed this as completed in #1216 Jul 12, 2021
