Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESMF returns garbage data after CORDEX regridding #772

Closed
thomascrocker opened this issue Sep 8, 2020 · 26 comments
Closed

ESMF returns garbage data after CORDEX regridding #772

thomascrocker opened this issue Sep 8, 2020 · 26 comments
Assignees

Comments

@thomascrocker
Copy link
Contributor

Hello, I'm attempting to use ESMValTool with data from the Met Office HadGEM3-RA regional model.
ESMValTool_issue_files.zip

I have successfully run my model output through an in house CMORization / mip_convert suite and added project lines to read the files to a custom config-developer.yml file (using the same CMOR tables as for CORDEX). My simple recipe and diagnostic runs without any errors, I've also not noticed any relevant warnings.

However, something is going wrong with the regridding steps (and if I remove regridding steps from the recipe, I get problems with area extraction, as I'll explain later).

I set save_intermediary_cubes to true in my config_user.yml in order to examine the state of the data as it moves through the pre processor. Regardless of regridding scheme chosen, the data created immediately after the regridding step (in 05_regrid.nc) has all been set to zero, across every single grid point. (The data prior to this step is still intact)

I wondered if this is perhaps related to this open pull request #185 (If so, is the code under review likely to fix my problem?)

I also tried removing the regridding steps from my recipe entirely. In this case although again my recipe ran without any warnings or errors, the output was not as expected and I noticed something goes wrong at the point that the region extraction is carried out. This time the data immediately after the region extraction has all been set to 1e20 (masking?). Again, the preprocessed data immediately prior to this step is fine.

Any help appreciated, I'm hoping to use ESMValTool to perform analysis of some downscaled climate projections for the Caribbean that we have recently completed.

Thanks

@thomascrocker
Copy link
Contributor Author

From stepping into the souce code of _area.py using pdb I worked out the problem with the area extraction. My data was defining the longitude coord in the range -180 to +180 rather than 0 to 360 as the code clearly expects. I can fix this by adapting my data preparation stage to ensure longitude is stored in the 0 to 360 range.
Am about to take a closer look at the regridding now to see if I can work out what is going wrong there.

@Peter9192
Copy link
Contributor

Hi @thomascrocker ,
Thanks for reaching out! For some datasets, that 0-360 thing is automatically fixed in the preprocessor. We might want to look into issuing a warning and adding a fix for your data. In the meantime, did you have any luck after changing your longitude coordinate?

@valeriupredoi you've been looking quite a bit into the regridding lately. Any thoughts?

@thomascrocker
Copy link
Contributor Author

thomascrocker commented Sep 11, 2020

In the meantime, did you have any luck after changing your longitude coordinate?

Thanks for the reply. I added some nco commands to my data preparation code (which needs to run anyway to convert the data from pp format to CF compliant netCDF format) so that any longitude points (and bounds) < 0 have 360 added to them, forcing longitude into the range 0 to 360. After doing that area extraction worked as expected.

I then spent some time stepping through the source code in _regid_esmpy.py to try and isolate where the problem was with the regridding. It looks like the issue is with the ESMF.Regrid object itself.

field_regridder = ESMF.Regrid(src_mask_values=np.array([1]),

regr_field = field_regridder(src_field, dst_field)

Stepping through the code, at this point in execution, (line 186) the data in src_field is fine, but the data returned in regr_field is all zeros.
I've never used ESMPY before, so may well have missed something obvious in the setup of the src_field and dst_field due to my inexperience with it. (I normally use IRIS when working with regular grids, and CDO if I have to use an irregular one).

One thing that might be significant is the co-ordinate system of this dataset. Since it's from a regional model the main dim coords for the variable (rlon and rlat) are defined on a rotated grid, (i.e. with a different north pole location) and the longitude and latitude coords are aux coords that span both these dimensions. Due to the fact that the rotated pole is in a different location, the rotated longitudes of this data span the meridian, and therefore include values greater than 360.

One last thing I noticed when working with the debugger was that in construction of the target grid, if the target grid is being specified from a string specification (e.g. "2.5.x2.5") then the code assumes that the co-ordinate system of the target grid should match that of the source grid.

xcoord.coord_system = src_cs

In my use case this behaviour is undesirable, since the source grid has a rotated pole co-ordinate system, but the grid specified ("2.5x2.5") is a global grid which one would assume to be on a regular unrotated co-ordinate system. This is probably relevant to this issue #493
To get around this in case it was an issue I tried specifying explicitly the grid from a GCM to regrid to in my recipe (HadGEM2-ES in this case) but I still saw the same behaviour of the regridded data all being set to zero,

Hope this information is helpful. I can upload a sample file if that would help? The one I've been using is around 100 mb in size, just let me know where to upload to.

@valeriupredoi
Copy link
Contributor

@thomascrocker cheers for raising this and thanks for the very detailed debugging and problem-tracing! One suggestion would be to run with --check_level=strict so you can see if there are any CMOR-related issues with the data - the code will stop and raise where the problem occurs; also, are you running on Jasmin? If so, can you pls point me to the data and the config/recipe you running? Cheers 🍺

@thomascrocker
Copy link
Contributor Author

@valeriupredoi I tried with --check_level=strict but the data passed all the checks.

I'm not running on JASMIN, I have a local install. My recipe file (and debug output) are attached in the zip in my original post. I've also attached my config files here:
esmvaltool_config.zip
I did have JASMIN credentials but they expired a couple of months ago since I've not used it for some time. I've reapplied for access.

Do you have access to the MASS service on JASMIN? My processed RCM data is stored in MASS. If I understand MASS permissions correctly (there's a non zero chance I don't....) it should be read accessible to any MASS user.
See:
moose:/adhoc/users/thomas.crocker/cmor/u-bx459/CEFAS_Antigua/output/CEFAS_Antigua/MOHC/MOHC-HadGEM2-ES/historical/r1i1p1/MOHC-HadREM3-GA7-05/v1/v1/pr/pr_CEFAS_Antigua_MOHC-HadGEM2-ES_historical_r1i1p1_MOHC-HadREM3-GA7-05_v1_mon_197912-198812.nc

One other thing. I've been attempting to regrid the data from my high resolution (~12km grid) RCM to a coarser GCM scale grid. This uses the RCM as the source grid, and GCM as the destination. Since ESMValTool examines the lat and lon coords in the source grid and sees they are irregular it opts for ESMpy to handle regridding. Earlier I tried the other way around, i.e. regridding GCM data from the GCM grid onto the RCM grid. This time, ESMValTool sees that the lat and lon of the source grid are regularly spaced, and uses iris for the regridding. However, this also fails, in the case of the area_weighted scheme with the message:

ValueError: The horizontal grid coordinates of both the source and grid cubes must have the same coordinate system.

As far as I'm aware this is a known limitation of iris. I.e. area weighted regridding is not currently supported for differing coordinate systems.
With the linear scheme there is a different error:

ValueError: The rectilinear grid coordinates of the given cube and target grid must either both have coordinate systems or both have no coordinate system but with matching coordinate metadata.

Looking at the data, the CMIP5 GCM data does not have a coordinate system specified. My RCM data does have the coordinate system specified on the rotated coords (rlon and rlat), but not on the standard lat and lon coords.
In any case I really need area weighted regridding for my analysis anyway.
Thanks

@thomascrocker
Copy link
Contributor Author

I've not gotten any further with this yet, since I've been focussing on a couple of other bits of work. However in the meantime I have got my JASMIN access sorted again. @valeriupredoi Could you let me know how to access the installations there? I can then transfer my data to JASMIN and we'll be singing from the same hymn sheet at least.

@bouweandela
Copy link
Member

friendly ping @valeriupredoi

@valeriupredoi
Copy link
Contributor

blast! this fell off the radar big way, sorry. @thomascrocker on JASMIN just do module load esmvaltool (on any of the sci nodes) and you'll have the latest esmvaltool, I'll have a closer look at the issue today, need lunch now 🍺

@valeriupredoi
Copy link
Contributor

@thomascrocker - right - am afraid without the actual data I can't do much; it would be great if you could put the data (or a sample of it) on a shared group workspace, eg esmeval where we store the OBS data - could you maybe do that and let me know pls (you'll have to apply for membership of that gws but I'll grant it right away), after that I will attempt at running a test 🍺 Quick question - is this a dataset that will eventually be on ESGF? If so, we need to talk - the official cmorization is done via CDDS and that's what you guys should be running 👍

@thomascrocker
Copy link
Contributor Author

thomascrocker commented Nov 9, 2020

Hi @valeriupredoi thanks very much for looking into this. I've applied for access to the esmval workspace so will upload some data once it's approved.

On the ESGF question. No it won't be ending up there (at least not in the short term). The budget for the project is very limited and hosting on ESGF isn't part of it. However, I have been doing my cmorization using an adapted version of the CDDS suites that are being used internally to prep data from other projects for upload to ESGF.
There has been a lot of interest in data from the project from others so it's highly likely it will be used in further research in the future, hopefully that might result in an official hosting on ESGF at some point.

@valeriupredoi
Copy link
Contributor

@thomascrocker just approved your membership for esmeval 👍 If you could tell me where you put the data that'd be awesome! About CDDS - good stuff, let me know if you want to use the Jasmin installation, I am managing the project on Jasmin 🍺

@thomascrocker
Copy link
Contributor Author

OK, I've dropped some monthly data in:
/group_workspaces/jasmin4/esmeval/gh_issue772/mi-ba795

I've also uploaded the custom CMOR tables that I've been using (these are the ones that were used by the suite I used to CMORize my data) into:

/group_workspaces/jasmin4/esmeval/gh_issue772/cmor_tables

Finally, here is the entry I used in my config-developer.yml file for reading the data in recipes:

Antigua_NAP_raw:
  input_dir:
    default: '{dataset}'
  input_file: '{short_name}_CEFAS_Antigua_{driver}_{exp}_r1i1p1_MOHC-HadREM3-GA7-05_v1_{mip}*.nc'
  output_file: '{short_name}_{dataset}_{exp}_{mip}'
  cmor_path: '/project/ciid/projects/Antigua/CMOR_tables/cordex'
  cmor_type: 'CMIP6'
  cmor_default_table_prefix: 'CMIP6_'

Hopefully that should be sufficient for you to read in the data and attempt to regrid it to a regular lat/lon grid. Let me know if you have any issues or questions. Cheers

@valeriupredoi
Copy link
Contributor

great! cheers, mate - I'll have a stab at testing it tomorrow 👍 🍺

@valeriupredoi
Copy link
Contributor

@thomascrocker here's what I found out: I tried running a recipe (which I post below), and immediately run into this CMOR checker error:

esmvalcore.cmor.check.CMORCheckError: There were errors in variable pr:
Coordinate longitude has var name longitude instead of lon
 Coordinate latitude has var name latitude instead of lat
in cube:
precipitation_flux / (kg m-2 s-1)   (time: 120; grid_latitude: 310; grid_longitude: 656)
     Dimension coordinates:
          time                           x                   -                    -
          grid_latitude                  -                   x                    -
          grid_longitude                 -                   -                    x
     Auxiliary coordinates:
          latitude                       -                   x                    x
          longitude                      -                   x                    x
     Attributes:
          CORDEX_domain: CEFAS_Antigua
          Conventions: CF-1.4
          c3s_disclaimer: This data has been produced in the context of the C3S_34b_Lot1 and Lot...
          comment: at surface; includes both liquid and solid phases from all types of clouds...
          contact: [email protected]
          driving_experiment: MOHC-HadGEM2-ES,historical,r1i1p1
          driving_experiment_name: historical
          driving_model_ensemble_member: r1i1p1
          driving_model_id: MOHC-HadGEM2-ES
          experiment: Historical run using HadGEM2-ES as a driving model
          experiment_id: historical
          frequency: mon
          institute_id: MOHC
          institution: MetOffice, Hadley Centre, UK
          model_id: MOHC-HadREM3-GA7-05
          original_name: mo: (stash: m01s05i216, lbproc: 128)
          product: output
          project_id: CEFAS_Antigua
          rcm_version_id: v1
          references: https://www.metoffice.gov.uk/climate-guide/science/science-behind-climate-change/hadley;...
          source_file: /group_workspaces/jasmin4/esmeval/gh_issue772/mi-ba795/pr_CEFAS_Antigu...
     Cell methods:
          mean: time

Note that this is a not such a serious issue since if one runs with --check_level relaxed the warning will pop:

2020-11-11 14:24:02,380 UTC [20873] WARNING There were warnings in variable pr:
Coordinate longitude has var name longitude instead of lon
 Coordinate latitude has var name latitude instead of lat

but the code will not crash as it's supposed to do when there are serious issues. Note that this should be fixed since the default setting for cmor checks does crash and spits out this error, and we want data to ideally be compliant to the default level. But anyway, we can run with relaxed for now...

OK, now the recipe:

---
documentation:
  description: |
    Check 0 values from regrid

  authors:
    - predoi_valeriu

preprocessors:
  regridp:
    regrid:
      target_grid: 1x1
      scheme: linear

diagnostics:
  BerkeleyEarth:
    description: Antigua check
    variables:
      pr:
        preprocessor: regridp
    additional_datasets:
      - {dataset: MOHC-HadREM3-GA7-05, domain: CEFAS_Antigua, project: CORDEX, driver: MOHC-HadGEM2-ES, mip: mon, exp: historical, ensemble: r1i1p1, rcm_version: v1,
         start_year: 2000, end_year: 2008}
    scripts: null

you can use this recipe and use the CORDEX project right out the box, not needing to add projects to the config-developer file nor paths to custom cmor dirs; see the config-user:

rootpath:
  CORDEX: /group_workspaces/jasmin4/esmeval/gh_issue772/mi-ba795
drs:
  CORDEX: default

and it'll work right away! Also note that your custom cmor tables are almost identical to the ones in esmvaltool (CORDEX ones), bar a couple more vars in your tables, but not affecting the current tests.

Now, on to the issue at hand - why the regridded data is 0: here's what I found:

  • this is ESMPY regridding (irregular regridding)
  • the source data that comes in is small (order 1e-5) and the data that comes out from the field regridder is even smaller (order 1e-300) and this is where the problem kicks in: in _regrid_esmpy.py:
    def regridder(src):
        """Regrid 2d for irregular grids."""
        res = get_empty_data(dst_rep.shape, src.dtype)
        data = src.data
        if np.ma.is_masked(data):
            data = data.data
        src_field.data[...] = data.T
        print("source data", np.mean(src_field.data))
        regr_field = field_regridder(src_field, dst_field)
        print("regridder field data", np.mean(regr_field.data))
        res.data[...] = regr_field.data[...].T
        res.mask[...] = dst_mask
        return res

the source data is O(1e-5) and the regr data is O(1e-300) - so it means that the ESMF regridder:

    field_regridder = ESMF.Regrid(src_mask_values=np.array([1]),
                                  dst_mask_values=np.array([1]),
                                  **regridding_arguments)

is spitting out garbage; note that this is what I got for a regrid on a 1x1 degree grid, increasing the target grid to 10x10 or 20x20 etc does not improve the supertiny result numbers, nor does multiplying the source data by any given of orders of magnitudes. It is as if the ESMF regridder hits an overflow and returns garbage instead of actual values (1.3e-322 for each data point). I am not familiar with ESMF though, but @zklaus is and maybe he can help? 🍺

@valeriupredoi valeriupredoi changed the title Unexpected behaviour when regridding / extracting a region ESMF returns garbage data after regridding Nov 11, 2020
@valeriupredoi valeriupredoi changed the title ESMF returns garbage data after regridding ESMF returns garbage data after CORDEX regridding Nov 11, 2020
@valeriupredoi
Copy link
Contributor

also just noticed that some of my findings have already been listed by @thomascrocker in an earlier comment - meh, good thing we reached the same conclusion 😁

@bouweandela
Copy link
Member

It might be a good idea to send @zklaus an email if you want a reaction from him, he hasn't been very actively reading the notifications from the ESMValTool project recently..

@valeriupredoi
Copy link
Contributor

let's see if it works: Klauuuussssss @zklaus @zklaus @zklaus 😁 Yeah, I'll send him an email 📫

@thomascrocker
Copy link
Contributor Author

@valeriupredoi thanks for taking a look and good to see that we both came to the same conclusion at least.
I also encountered the issue with the co-ordinate names, but I think I got around it using my custom CMOR tables. Good to know that I can use the standard CORDEX project settings though provided I just change the lat and lon naming in my use of the CDDS suite.
Let's hope someone can shed some light on the ESMF behaviour

@valeriupredoi
Copy link
Contributor

I will be emailing Klaus in 2min

@zklaus
Copy link

zklaus commented Nov 18, 2020

Alright, had a look, here is what I found:

  • @thomascrocker is completely right that the treatment of coordinate systems in _regrid.regrid is problematic. Basically it just says to interpret the latitudes and longitudes created in the stock cube (often 1x1 or similar) according to the coord_system of the data cube. This kind-of worked up to now mostly because everyone is using geodetic cs anyway, but is quite dangerous when we are mixing in other coordinate systems, like the rotated one here and probably some projections.

  • The main problem is that the regridder doesn't check whether source and target cover the same domain. If the target is not covered by the source, we need to employ mask handling, even if neither source nor target are masked.

@valeriupredoi
Copy link
Contributor

valeriupredoi commented Nov 18, 2020

good stuff @zklaus 👍 - what do you recommend - implementing a check on the CS and subsequent treatment depending on it or this is something that will work out of the box in iris 3+ - also do you think this is worth plopping on to SciTools GitHub? 🍺

@zklaus
Copy link

zklaus commented Nov 18, 2020

The coordinate system issue has to be fixed regardless of this regridding problem, I think. It is a bit hairy though for two reasons: One is that a general approach with cartopy.crs.CRS and iris.coord_systems.CoordSystem can be involved. Perhaps some changes to iris (e.g. to complement the CoordSystem.as_cartopy_crs function, a CoordSystem.from_cartopy_crs classmethod could be added) would be most sensible.

The regridding should be taken care of with the new regridding project. But perhaps an interims fix is sensible?

@zklaus
Copy link

zklaus commented Nov 19, 2020

@thomascrocker, @valeriupredoi, I put up PR #865 that should fix the regridding issue (at least I was able to run your recipe and data with it. Could you give it a test run?

@thomascrocker
Copy link
Contributor Author

@thomascrocker, @valeriupredoi, I put up PR #865 that should fix the regridding issue (at least I was able to run your recipe and data with it. Could you give it a test run?

Thanks for developing a fix so quickly..

Looks like there is a problem though, I found that while the regridding appears to work correctly, it has the side effect of doing something to the latitude and longitude co-ordinates such that Iris can't read them..

This is what a print(cube) in python returns on the regrid cube produced by the processor

precipitation_flux / (kg m-2 s-1)   (time: 360; -- : 72; -- : 144)
     Dimension coordinates:
          time                           x         -        -

However, it looks like the coords are still there in the netcdf file

$ ncdump -h 06_regrid.nc 
netcdf \06_regrid {
dimensions:
	time = 360 ;
	lat = 72 ;
	lon = 144 ;
	bnds = 2 ;
variables:
	float pr(time, lat, lon) ;
		pr:_FillValue = 1.e+20f ;
		pr:standard_name = "precipitation_flux" ;
		pr:long_name = "Precipitation" ;
		pr:units = "kg m-2 s-1" ;
		pr:cell_methods = "time: mean" ;
		pr:grid_mapping = "rotated_latitude_longitude" ;
	int rotated_latitude_longitude ;
		rotated_latitude_longitude:grid_mapping_name = "rotated_latitude_longitude" ;
		rotated_latitude_longitude:grid_north_pole_latitude = 68.5 ;
		rotated_latitude_longitude:grid_north_pole_longitude = 112.5 ;
		rotated_latitude_longitude:north_pole_grid_longitude = 0. ;
	double time(time) ;
		time:axis = "T" ;
		time:bounds = "time_bnds" ;
		time:units = "days since 1850-1-1 00:00:00" ;
		time:standard_name = "time" ;
		time:long_name = "time" ;
		time:calendar = "360_day" ;
	double time_bnds(time, bnds) ;
	double lat(lat) ;
		lat:axis = "Y" ;
		lat:bounds = "lat_bnds" ;
		lat:units = "degrees_north" ;
		lat:standard_name = "latitude" ;
	double lat_bnds(lat, bnds) ;
	double lon(lon) ;
		lon:axis = "X" ;
		lon:bounds = "lon_bnds" ;
		lon:units = "degrees_east" ;
		lon:standard_name = "longitude" ;
	double lon_bnds(lon, bnds) ;
...

Could it be that the grid_mapping attribute needs removing from the variable?

@zklaus
Copy link

zklaus commented Nov 23, 2020

Could it be that the grid_mapping attribute needs removing from the variable?

Yes, that does seem to do the trick. WIll have a look how it snuck through...

@zklaus
Copy link

zklaus commented Nov 25, 2020

Turns out this is a resurfacing of #479.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants