Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong error message for duplicate coordinate #373

Closed
schlunma opened this issue Nov 11, 2019 · 24 comments · Fixed by #558
Closed

Wrong error message for duplicate coordinate #373

schlunma opened this issue Nov 11, 2019 · 24 comments · Fixed by #558
Assignees
Labels
bug Something isn't working cmor Related to the CMOR standard

Comments

@schlunma
Copy link
Contributor

The error message which appears in #371 is:

esmvalcore.cmor.check.CMORCheckError: There were errors in variable tas:
height2m: does not exist
in cube:
air_temperature / (K)               (time: 1200; latitude: 180; longitude: 288)
     Dimension coordinates:
          time                           x               -               -
          latitude                       -               x               -
          longitude                      -               -               x
     Scalar coordinates:
          height: 2.0 m
             description='~2 m standard surface air temperature and surface humidity  height'
          height: 2.0 m
     Attributes:
          Conventions: CF-1.7 CMIP-6.0 UGRID-1.0
          activity_id: CMIP
          branch_method: standard
          branch_time_in_child: -674885.0
          branch_time_in_parent: 0.0
          comment: <null ref>
          contact: [email protected]
          data_specs_version: 01.00.27
          experiment: pre-industrial control
          experiment_id: piControl
          external_variables: areacella
          forcing_index: 1
          frequency: monC
          further_info_url: https://furtherinfo.es-doc.org/CMIP6.NOAA-GFDL.GFDL-CM4.piControl.none...
          grid: atmos data regridded from Cubed-sphere (c96) to 180,288; interpolation...
          grid_label: gr1
          initialization_index: 1
          institution: National Oceanic and Atmospheric Administration, Geophysical Fluid Dynamics...
          institution_id: NOAA-GFDL
          interp_method: conserve_order2
          license: CMIP6 model data produced by NOAA-GFDL is licensed under a Creative Commons...
          mip_era: CMIP6
          nominal_resolution: 100 km
          original_name: tas
          parent_activity_id: CMIP
          parent_experiment_id: piControl-spinup
          parent_mip_era: CMIP6
          parent_source_id: GFDL-CM4
          parent_time_units: days since 1850-1-1 00:00:00
          parent_variant_label: r1i1p1f1
          physics_index: 1
          product: model-output
          realization_index: 1
          realm: atmos
          references: see further_info_url attribute
          source: GFDL-CM4 (2018): 
aerosol: interactive
atmos: GFDL-AM4.0.1 (Cubed-sphere...
          source_file: /mnt/lustre02/work/bd0854/DATA/ESMValTool2/CMIP6_DKRZ/CMIP/NOAA-GFDL/G...
          source_id: GFDL-CM4
          source_type: AOGCM
          sub_experiment: none
          sub_experiment_id: none
          table_id: Amon
          title: NOAA GFDL GFDL-CM4 model output prepared for CMIP6 pre-industrial cont...
          variable_id: tas
          variant_info: N/A
          variant_label: r1i1p1f1
     Cell methods:
          mean: area, time

However, the coordinate height is not missing, but appears twice (because it was additionally added by the fix). This can be very confusing.

The reason for this is that iris also raises an CoordinateNotFoundError for cube.coord('coord_name') if 'coord_name' appears more than once in cube:

iris.exceptions.CoordinateNotFoundError: 'Expected to find exactly 1 coordinate, but found 2. They were: ..., ....'
@schlunma schlunma added bug Something isn't working cmor Related to the CMOR standard labels Nov 11, 2019
@valeriupredoi
Copy link
Contributor

but it's impossible to add the same coordinate twice in a cube - even if one is DimCoord and the other is AuxCoord, no?

@schlunma
Copy link
Contributor Author

schlunma commented Dec 4, 2019

Apparently it's possible if the two coordinates are slightly different, e.g. if one has additional attributes and the other one doesn't, like it's the case for the example posted above (one height coordinate has a description, the other one doesn't).

@valeriupredoi
Copy link
Contributor

valeriupredoi commented Dec 4, 2019 via email

@schlunma
Copy link
Contributor Author

schlunma commented Dec 4, 2019

I don't think so. Why would they add an error message like this then?

iris.exceptions.CoordinateNotFoundError: 'Expected to find exactly 1 coordinate, but found 2. They were: ..., ....'

@valeriupredoi
Copy link
Contributor

so the error message takes care of the already created problem; the problem should not happen in the first place ie adding two longitude's as both DimCoords should not be allowed methinks

@schlunma
Copy link
Contributor Author

schlunma commented Dec 4, 2019

Adding two DimCoords for the same index is not possible anyway, but height is an AuxCoord

@valeriupredoi
Copy link
Contributor

yeah but what happens if the data is 3D and you remove say, time and add longitude in the place of time (so index 0), with the data now having longitude-longitude-latitude coords -> will that work? Coz if it does, it's bonkers 🍺

@schlunma
Copy link
Contributor Author

schlunma commented Dec 4, 2019

It works if the metadata of the DimCoord is different (changing the points is not enough):

Python 3.7.3 | packaged by conda-forge | (default, Jul  1 2019, 21:52:21) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import iris
>>> d1 = iris.coords.DimCoord([1, 2], standard_name='latitude')
>>> d2 = d1.copy()
>>> d3 = d1.copy([2, 4])
>>> d4 = iris.coords.DimCoord([1, 2], standard_name='latitude', var_name='lat')
>>> cube = iris.cube.Cube([[1, 2], [3, 4]])
>>> print(cube)
unknown / (unknown)                 (-- : 2; -- : 2)
>>> cube.add_dim_coord(d1, 0)
>>> cube.add_dim_coord(d1, 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mnt/lustre02/work/bd0854/b309141/miniconda3/envs/mlr/lib/python3.7/site-packages/iris/cube.py", line 1054, in add_dim_coord
    raise ValueError('The coordinate already exists on the cube. '
ValueError: The coordinate already exists on the cube. Duplicate coordinates are not permitted.
>>> cube.add_dim_coord(d1, 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mnt/lustre02/work/bd0854/b309141/miniconda3/envs/mlr/lib/python3.7/site-packages/iris/cube.py", line 1054, in add_dim_coord
    raise ValueError('The coordinate already exists on the cube. '
ValueError: The coordinate already exists on the cube. Duplicate coordinates are not permitted.
>>> cube.add_dim_coord(d2, 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mnt/lustre02/work/bd0854/b309141/miniconda3/envs/mlr/lib/python3.7/site-packages/iris/cube.py", line 1054, in add_dim_coord
    raise ValueError('The coordinate already exists on the cube. '
ValueError: The coordinate already exists on the cube. Duplicate coordinates are not permitted.
>>> cube.add_dim_coord(d3, 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mnt/lustre02/work/bd0854/b309141/miniconda3/envs/mlr/lib/python3.7/site-packages/iris/cube.py", line 1054, in add_dim_coord
    raise ValueError('The coordinate already exists on the cube. '
ValueError: The coordinate already exists on the cube. Duplicate coordinates are not permitted.
>>> cube.add_dim_coord(d4, 1)
>>> cube
<iris 'Cube' of unknown / (unknown) (latitude: 2; latitude: 2)>
>>> 

@valeriupredoi
Copy link
Contributor

Bahahah, you got a Frankencube 🤣 I think we should probably tell the iris guys about this - ping @bjlittle

@bjlittle
Copy link
Contributor

bjlittle commented Dec 6, 2019

@schlunma and @valeriupredoi

An interesting question, and quite topical... see SciTools/iris#3583 which is part of an overhaul of iris cube-arithmetic that I'm doing at the moment (also see SciTools/iris#3478) and aiming to land in the forthcoming 3.0.0.

This all comes down to the question of "What is the identity of a coordinate?".

To be fair we need to communicate this more clearly, but essentially the answer comes down to the metadata of the coordinate, and in this case that is a combination of the coordinate:

  • standard_name
  • long_name
  • var_name
  • units
  • attributes
  • coord_system
  • climatological (new in 2.3.0 and 3.0.0)

The identify of a coordinate has nothing to do with the value of its points and/or bounds. It's all about the associated metadata.

In your example, only the DimCoords d1 and d4 (has a var_name) differ in metadata, therefore their identity is different. Whereas, d1, d2 (exact copy of d1), and d3 (differs only in points, which is not part of the metadata, and therefore not part of the coordinate identity contract) are in metadata terms all the same - therefore that is why iris raises the exception.

Hope this helps to clarify.

I want to make this a slicker (and configurable) experience for the user and SciTools/iris#3583 is the first step in that direction - plus we need to clearly communicate all of this in the user guide to avoid any on-going confusion.

So IMHO this isn't a bug 😃

@rcomer
Copy link

rcomer commented Dec 6, 2019

For info, I have previously queried the Frankencubes: SciTools/iris#2917

@bjlittle
Copy link
Contributor

bjlittle commented Dec 9, 2019

"Frankencubes" -- it's now officially a thing 🤣

I've been thinking about Frankencubes over the weekend, and given that I'm kinda actively working in this area of iris at the moment, I may be able to address this behaviour in time for 3.0.0. No promises though...

I stand by my last comment that this isn't a bug, but then again, the double height coordinate in the cube behaviour is surprising for users and takes some explaining to justify... which hints to me that perhaps a less surprising compromise could be had.

Ideally, what you might want to happen here is for iris to raise an exception to say that the height coordinate already exists in that cube instance... and that comes down to relaxing the contract for coordinate metadata equality. I think there is a common sense compromise to be had here, that's not too upsetting for all concerned...I just need to figure that out and see if it make sense.

So, if you're happy, leave this issue open, and let's see if I can get the Frankencube to learn some table manners 😄

That said, your temporary workaround here might be something along the lines of:

height = iris.coords.DimCoord(2.0, standard_name="height", units="m", attributes=dict(description="...")
coords = cube.coords(height.name())
if not coords:
    # This isn't a Frankencube...
    cube.add_aux_coord(height)
else:
    # A "height-like" coordinate already exists in the cube...
    if len(coords)>1:
        raise ValueError("oops")
    # Add the extra metadata (what ever it is) to the existing "height" coordinate on the cube...
    coords[0].attributes.update(height.attributes)

@schlunma
Copy link
Contributor Author

schlunma commented Dec 9, 2019

That sounds great @bjlittle, thanks for your help!!

@rcomer
Copy link

rcomer commented Dec 9, 2019

I should probably say that, since posting that issue, I've come across cases where two coords with the same name appears to be deliberate. Have emailed you @bjlittle .

@valeriupredoi
Copy link
Contributor

valeriupredoi commented Dec 10, 2019 via email

@bjlittle
Copy link
Contributor

I should probably say that, since posting that issue, I've come across cases where two coords with the same name appears to be deliberate.

After investigating, this wasn't an issue, given the proposed approach that I wish to adopt here. All good.

@valeriupredoi
Copy link
Contributor

@schlunma have you had the chance to look for Frankencubes with iris=3.0?

@schlunma
Copy link
Contributor Author

@schlunma have you had the chance to look for Frankencubes with iris=3.0?

Do you mean 2.3.0? I think 3.0 is not released yet.

@valeriupredoi
Copy link
Contributor

yes yes, I meant 2.3.0, sorry, I meant the 3rd iteration of 2 😁

@schlunma
Copy link
Contributor Author

No, but I don't think that it is expected to change, since @bjlittle talked about version 3.0.0 😁

@bjlittle
Copy link
Contributor

@valeriupredoi and @schlunma The latest scoop is that we've just released iris 2.3.0, but that doesn't contain the changes that you need to address frankencubes(:tm:).

We've also decided to push out iris 2.4.0, as a final 2.x release (this is imminent)... again that won't contain the magic you need, soz.

The iris 3.0.0 release has been slightly delayed due to the massive amount of content we're attempting to cram into it. This release will however, thankfully, have new lenient behaviour that will allow you to optionally tame the beast that is the frankencube(:tm:) :tada:

Apologies for the delay. I'm hoping that we can make 3.0.0rc0 available in February/March. If I get a firm date, then I'll be sure to let you know. Ideally, just follow us on https://twitter.com/scitools_iris so that you don't miss any release/candidate announcements 😉

HTH

@schlunma
Copy link
Contributor Author

I found another issue regarding Frankencubes:

The following file has two latitude and longitude (regarding the standard_name) coordinates:

netcdf tos_Omon_BCC-CSM2-MR_historical_r1i1p1f1_gn_185001-201412 {
dimensions:
        time = UNLIMITED ; // (1980 currently)
        lat = 232 ;
        lon = 360 ;
        bnds = 2 ;
variables:
        double time(time) ;
                time:bounds = "time_bnds" ;
                time:units = "days since 1850-1-1" ;
                time:calendar = "365_day" ;
                time:axis = "T" ;
                time:long_name = "time" ;
                time:standard_name = "time" ;
        double time_bnds(time, bnds) ;
        double lat(lat) ;
                lat:bounds = "lat_bnds" ;
                lat:units = "degrees_north" ;
                lat:axis = "Y" ;
                lat:long_name = "latitude" ;
                lat:standard_name = "latitude" ;
        double lat_bnds(lat, bnds) ;
        double lon(lon) ;
                lon:bounds = "lon_bnds" ;
                lon:units = "degrees_east" ;
                lon:axis = "X" ;
                lon:long_name = "Longitude" ;
                lon:standard_name = "longitude" ;
        double lon_bnds(lon, bnds) ;
        float latitude(lat, lon) ;
                latitude:standard_name = "latitude" ;
                latitude:long_name = "latitude" ;
                latitude:units = "degrees_north" ;
                latitude:missing_value = 1.e+20f ;
                latitude:_FillValue = 1.e+20f ;
        float longitude(lat, lon) ;
                longitude:standard_name = "longitude" ;
                longitude:long_name = "longitude" ;
                longitude:units = "degrees_east" ;
                longitude:missing_value = 1.e+20f ;
                longitude:_FillValue = 1.e+20f ;
        float tos(time, lat, lon) ;
                tos:standard_name = "sea_surface_temperature" ;
                tos:long_name = "Sea Surface Temperature" ;
                tos:comment = "Temperature of upper boundary of the liquid ocean, including temperatures below sea-ice and floating ice shelves." ;
                tos:units = "degC" ;
                tos:original_name = "tos" ;
                tos:cell_methods = "area: mean where sea time: mean (interval: 30 minutes)" ;
                tos:cell_measures = "area: areacello" ;
                tos:missing_value = 1.e+20f ;
                tos:_FillValue = 1.e+20f ;
                tos:coordinates = "latitude longitude" ;

This causes the tool to fail with tos: does not match coordinate rank, which is definitely not helping, because the rank of this file is just fine.

The problematic part is here

try:
for dim in self._cube.coord_dims(coordinate.standard_name):
dimensions.append(dim)
except iris.exceptions.CoordinateNotFoundError:
# Error reported at other stages
pass

which runs into the except part for latitude and longitude. It would be really nice if we could detect duplicate coordinates during _check_dim_names() or _check_coords() and get a nice error message. It took me 1 hr to find out what the problem was 😬

@bouweandela
Copy link
Member

@jvegasbsc or @sloosvel Could one of you have a look at this?

@jvegreg
Copy link
Contributor

jvegreg commented Mar 6, 2020

Yes, I will do it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cmor Related to the CMOR standard
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants