Add support for ESACCI Ocean Color (Chlorophyll) observations #2055

Merged: @axel-lauer merged 21 commits into main from CMUG_ESACCI_oc_chlor_a on Jan 28, 2022

@ulrikaw-cloud (Contributor) commented Feb 25, 2021

Description

  • Closes #issue_number
  • Link to documentation:

Before you get started

Checklist

It is the responsibility of the author to make sure the PR is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.

New or updated recipe/diagnostic:

New or updated data reformatting script:


To help with the number of pull requests:

@valeriupredoi (Contributor) left a comment

cheers for this! A couple of minor technicalities to look at, please 🍺

@valeriupredoi (Contributor) commented

also please have a look at the failing test, there is a recipe argument ("squeeze") that's not standard 👍

@axel-lauer (Contributor) commented May 19, 2021

I just tried to run this recipe, but it failed early on. When running the recipe as is, I get the following error message:

[...]
  File "/mnt/lustre02/work/bd0854/b380103/ESMValCore/esmvalcore/preprocessor/__init__.py", line 218, in check_preprocessor_settings
    raise ValueError(
ValueError: Invalid argument(s): squeeze encountered for preprocessor function extract_levels.
Valid arguments are: [levels, scheme, coordinate]

After removing squeeze: true from the extract_levels preprocessor, the preprocessor finishes successfully, but the diagnostic crashes with the following error:

Traceback (most recent call last):
  File "/mnt/lustre02/work/bd0854/b380103/ESMValTool/esmvaltool/diag_scripts/ocean/diagnostic_model_vs_obs.py", line 517, in <module>
    main(config)
  File "/mnt/lustre02/work/bd0854/b380103/ESMValTool/esmvaltool/diag_scripts/ocean/diagnostic_model_vs_obs.py", line 507, in main
    make_scatter(cfg, metadatas, filename, obs_filename)
  File "/mnt/lustre02/work/bd0854/b380103/ESMValTool/esmvaltool/diag_scripts/ocean/diagnostic_model_vs_obs.py", line 404, in make_scatter
    model_data = np.ma.masked_where(mask, model_data).compressed()
  File "/mnt/lustre02/work/bd0854/b380103/miniconda3/envs/esm22/lib/python3.9/site-packages/numpy/ma/core.py", line 1929, in masked_where
    raise IndexError("Inconsistent shape between the condition and the input"
IndexError: Inconsistent shape between the condition and the input (got (1, 90, 180) and (90, 180))

The reason seems to be that the ESACCI data produced by the preprocessor have a vertical level coordinate (dim = 1), while the model data do not. I presume that is what "squeeze" does in "extract_levels"? @ulrikaw-cloud @zklaus could you please check and possibly open a pull request for the "squeeze" feature?

I also noticed that the example recipes "recipe_cmug_perfmetrics_example.yml" and "recipe_cmug_example.yml" have been included. These should simply be deleted.

So far, there is no scientific documentation available. Please add such documentation following one of the examples in https://github.com/ESMValGroup/ESMValTool/tree/CMUG_ESACCI_oc_chlor_a/doc/sphinx/source/recipes or using the template provided here: https://github.com/ESMValGroup/ESMValTool/blob/CMUG_ESACCI_oc_chlor_a/doc/sphinx/source/recipes/recipe_template.rst.template

@axel-lauer (Contributor) commented

I also tested the CMORizer for the ESACCI-OC v5.0 data. The results look fine, but I noticed that the time bounds for the monthly means extend from the middle of the month at 12:00:00 to the middle of the next month at 12:00:00. I would expect them to extend from the 1st of the month at 00:00:00 to the 1st of the next month at 00:00:00. @ulrikaw-cloud @zklaus could you please check and correct this if needed?
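For illustration, the expected monthly bounds can be sketched with Python's standard `datetime` module. This is a minimal sketch, not the CMORizer's code: it works on plain `datetime` objects (proleptic Gregorian calendar) rather than the numeric "days since" time coordinate that CMOR-style files actually store, and the helper names `monthly_bounds` and `bounds_midpoint` are hypothetical. The midpoint helper shows one common choice for placing the time point inside its bounds.

```python
from datetime import datetime
from typing import Tuple


def monthly_bounds(year: int, month: int) -> Tuple[datetime, datetime]:
    """Return (start, end) time bounds for one calendar month.

    start is the 1st of the month at 00:00:00 and end is the 1st of the
    following month at 00:00:00.
    """
    start = datetime(year, month, 1)
    # December rolls over to January of the following year.
    if month == 12:
        end = datetime(year + 1, 1, 1)
    else:
        end = datetime(year, month + 1, 1)
    return start, end


def bounds_midpoint(start: datetime, end: datetime) -> datetime:
    """Return the instant halfway between two bounds (one common choice for the time point)."""
    return start + (end - start) / 2


if __name__ == "__main__":
    for month in (1, 2, 12):
        start, end = monthly_bounds(2020, month)
        print(start, "->", end, "midpoint:", bounds_midpoint(start, end))
```

Because months differ in length, the midpoint is not a fixed day of the month (for example, it falls on the 15th at 12:00 in a leap-year February but on the 16th at 12:00 in a 31-day month).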

@zklaus commented Dec 14, 2021

> The reason seems to be that the ESACCI data produced by the preprocessor have a vertical level coordinate (dim = 1), while the model data do not.

It is not quite like that, but close. It is actually the model data that have a vertical level coordinate, because chlorophyll concentration is a 3d field in the models. ESA's satellites, on the other hand, only have information about the top layer, so their data start out as a 2d surface field by nature.

To make it conform to the CMIP6 data request, we format it as a 3d field with a single layer. This makes sense because there is some information about the thickness of that surface layer, and some indication that further analysis on the part of ESA might extend this with more layers or different depth information in the future.

The problem arises because the extract_levels preprocessor effectively squeezes the model data after extracting the single layer, while it returns the observations unchanged because they already have only a single layer.

I think this is in some sense a bug in the extract_levels preprocessor, but for now, we modify the diagnostic script so that it can deal with this.
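A minimal sketch of such a workaround, assuming both inputs are NumPy (masked) arrays on the same horizontal grid. The helper names `drop_singleton_axes` and `paired_values` are hypothetical and this is not the change that was actually made to diagnostic_model_vs_obs.py; it only illustrates dropping the length-1 level axis before building a common mask, which avoids the (1, 90, 180) vs (90, 180) shape mismatch in `np.ma.masked_where`.

```python
import numpy as np


def drop_singleton_axes(data):
    """Remove all length-1 axes, e.g. turn a (1, lat, lon) field into (lat, lon).

    np.ma.squeeze returns a masked array and keeps any existing mask.
    """
    return np.ma.squeeze(data)


def paired_values(model_data, obs_data):
    """Return model and obs values at points where neither is masked, as 1d arrays."""
    model_data = drop_singleton_axes(model_data)
    obs_data = drop_singleton_axes(obs_data)
    if model_data.shape != obs_data.shape:
        raise ValueError(
            f"Shapes still differ after squeezing: {model_data.shape} vs {obs_data.shape}"
        )
    # Combine both masks so a scatter plot only uses points valid in both fields.
    mask = np.ma.getmaskarray(model_data) | np.ma.getmaskarray(obs_data)
    model_values = np.ma.masked_where(mask, model_data).compressed()
    obs_values = np.ma.masked_where(mask, obs_data).compressed()
    return model_values, obs_values
```

Note that `np.ma.squeeze` removes every length-1 axis, so a field with a single time step would lose its time axis as well; selecting the level axis explicitly would be the stricter choice.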

It is probably worth sorting this out more generally, since other variables might have a similar problem: 3d model data that need to be compared with surface observations by extracting the top layer of the models. That said, none of the other ESACCI variables that we are looking at right now seem to be affected.

Comment on lines 60 to 66
*Note: (1) obs4MIPs data can be used directly without any preprocessing;
(2) see headers of reformat scripts for non-obs4MIPs data for download
instructions.*

* http://dx.doi.org/10.5285/00b5fc99f9384782976a4453b0148f49

*Reformat script:* <myreformatscript.py>

Suggested change
- *Note: (1) obs4MIPs data can be used directly without any preprocessing;
- (2) see headers of reformat scripts for non-obs4MIPs data for download
- instructions.*
- * http://dx.doi.org/10.5285/00b5fc99f9384782976a4453b0148f49
- *Reformat script:* <myreformatscript.py>
+ ESACCI-OC (chlor_a - esmvaltool/cmorizers/obs/cmorize_obs_esacci_oc.py)

Comment on lines 32 to 48
*Required settings for script*

* xxx: zzz

*Optional settings for script*

*Required settings for variables*

*Optional settings for variables*

*Required settings for preprocessor*

*Optional settings for preprocessor*

*Color tables*

* list required color tables (if any) here

This should be populated with the corresponding variables or removed if not needed.

Comment on lines 76 to 60
.. _fig_mynewdiag_1:
.. figure:: /recipes/figures/<mynewdiagnostic>/awesome1.png

An example figure would be great but is still missing.

@axel-lauer (Contributor) commented

Thanks for working on this PR and for adding some documentation. Please see some suggestions for the documentation above.

I also tested the CMORizer and the new recipe. The CMORizer works fine, but the time bounds still look strange to me: the time bounds of the monthly means extend from the middle of the month at 12:00:00 to the middle of the next month at 12:00:00, whereas I would expect them to extend from the 1st of the month at 00:00:00 to the 1st of the next month at 00:00:00.

I could not run the recipe successfully. The preprocessor finishes but the diagnostic script crashes with the following error message:

Traceback (most recent call last):
  File "/mnt/lustre02/work/bd0854/b380103/ESMValTool/esmvaltool/diag_scripts/ocean/diagnostic_model_vs_obs.py", line 516, in <module>
    main(config)
  File "/mnt/lustre02/work/bd0854/b380103/ESMValTool/esmvaltool/diag_scripts/ocean/diagnostic_model_vs_obs.py", line 506, in main
    make_scatter(cfg, metadatas, filename, obs_filename)
  File "/mnt/lustre02/work/bd0854/b380103/ESMValTool/esmvaltool/diag_scripts/ocean/diagnostic_model_vs_obs.py", line 408, in make_scatter
    zrange = diagtools.get_array_range([model_data, obs_data])
  File "/mnt/lustre02/work/bd0854/b380103/ESMValTool/esmvaltool/diag_scripts/ocean/diagnostic_tools.py", line 693, in get_array_range
    mins.append(arr.min())
  File "/mnt/lustre02/work/bd0854/b380103/mambaforge/envs/esmvaltool/lib/python3.9/site-packages/numpy/core/_methods.py", line 44, in _amin
    return umr_minimum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation minimum which has no identity
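As an illustration of how the range computation could fail more gracefully, here is a hedged sketch of a variant that skips empty (for example, fully masked) inputs. `safe_array_range` is a hypothetical name; it is not the `get_array_range` function in diagnostic_tools.py. Skipping empty inputs only hides the symptom: if a dataset is empty because its input data are bad, excluding that dataset from the recipe may be the more honest fix.

```python
import numpy as np


def safe_array_range(arrays):
    """Return [min, max] over all unmasked values in ``arrays``.

    Inputs that are empty after removing masked points are skipped instead
    of raising "zero-size array to reduction operation minimum".
    """
    mins, maxs = [], []
    for arr in arrays:
        # compressed() drops masked points; plain arrays are just flattened.
        values = np.ma.asarray(arr).compressed()
        if values.size == 0:
            continue
        mins.append(values.min())
        maxs.append(values.max())
    if not mins:
        raise ValueError("All input arrays are empty; nothing to compute a range from.")
    return [min(mins), max(maxs)]
```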

@axel-lauer (Contributor) commented

@ulrikaw-cloud Thanks for updating the documentation. I did some more polishing; I hope that's fine. Two main problems remain, though.

  1. I cannot run the diagnostic successfully. The script crashes with 'ValueError: zero-size array to reduction operation minimum which has no identity' (see my comment above for more details). @zklaus could you possibly help @ulrikaw-cloud with this?

  2. The time bounds in the cmorized ESACCI data seem weird. @zklaus, as you are usually very careful about such issues, could you please take a look and let me know what you think?

@zklaus commented Jan 12, 2022

Thanks, @axel-lauer. I have corrected point 2. Point 1 was due to a combination of bad data (see CMIP6 erratum) and somewhat unfortunate coding in the shared diagnostic script. Since the erratum already exists and the data are being republished, I am not writing a fix. Instead, I have deactivated the offending datasets for now.

@axel-lauer (Contributor) commented

@zklaus This is really cool! Thank you for working on this PR. The CMORizer works and the results look fine to me. The time coordinate of the output is not centered within the time bounds, but I guess that is fine. Time bounds look good now.

I have been able to produce a scatter plot with the recipe:

[Figure: model_vs_obs_MassConcentrationofTotalPhytoplanktonExpressedasChlorophyllinSeaWater_MPI-ESM1-2-LR_ESACCI-OC__scatter]

After creating the plot, the diagnostic script hangs and does not exit. I am not sure what the problem could be.

I noticed that the recipe originally produced four plots (as described in the documentation of the diagnostic). It seems that the three missing plots have been deactivated in the current recipe. The missing plots should look like this:

[Figure: oc_1]

Was this on purpose? We already included those three plots in the last CMUG deliverable. I tried to re-enable these diagnostics by uncommenting them, but then they fail.

@ulrikaw-cloud @zklaus Could you please take a look at the hanging/missing plots or let me know if we do not need these diagnostics any longer (could be removed then, I guess)?

It would be great to get this merged very soon so that we can meet the deadline for our CMUG deliverable.

@zklaus commented Jan 14, 2022

@axel-lauer, you are right that the pdb line shouldn't be there. It slipped past me in an earlier commit. But I had also already fixed it. Perhaps something went wrong when you updated your local copy, and one of those merge commits reintroduced it?

I have cleaned up the history again. The simplest way to get you onto a clean, current version would be:

# Fetch the server state (assuming that you have only the relevant remote as "origin")
git fetch
# Make sure that you are on your local branch
git checkout <your-local-branch-name>
# Reset your local branch to the server state
# (WARNING: This will destroy any commits and uncommitted changes that you added after your last push)
git reset --hard origin/CMUG_ESACCI_oc_chlor_a

@axel-lauer (Contributor) commented

@zklaus Thanks! The recipe runs fine now, and the results and the documentation look fine to me. The only question left (from my point of view) is the missing provenance. Are there plans to add provenance?

I think this PR can now be converted from draft to a normal PR.

@axel-lauer self-requested a review on January 17, 2022 09:02
@axel-lauer (Contributor) left a comment

Recipe runs fine, output looks good, documentation also looks good.

@axel-lauer (Contributor) commented

@zklaus @ulrikaw-cloud If possible, could you please let me know what you think about adding the provenance and changing this pull request from draft to normal? It would be great to get this merged this week. Apart from the missing provenance, I think this is ready to be merged! The CMUG meeting is on Monday (24 January 2022)...

@zklaus marked this pull request as ready for review on January 20, 2022 10:26
@zklaus commented Jan 20, 2022

I have turned it into a regular PR now. The tricky bit with the provenance is that this recipe uses the diagnostic from the ocean toolbox, which offers more functionality for 3d data and other things, so it is not immediately clear to me how to do the full provenance. But I will have a closer look today; maybe it's not too difficult.
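For reference, a provenance record in ESMValTool diagnostics is a plain dictionary that is typically passed to `ProvenanceLogger.log(filename, record)` from `esmvaltool.diag_scripts.shared`. The sketch below only assembles such a dictionary for one scatter plot and does not import ESMValTool. The helper name `build_scatter_provenance` is hypothetical, and the tag values (`scatter`, `ocnBgchem`) are assumptions that should be checked against the allowed entries in ESMValTool's config-references file.

```python
from typing import Dict, List


def build_scatter_provenance(caption: str, ancestors: List[str]) -> Dict[str, object]:
    """Assemble a provenance record for one scatter plot (hypothetical helper).

    The tag values below are placeholders to verify against ESMValTool's
    config-references file before use.
    """
    return {
        "caption": caption,
        # Input files (e.g. preprocessed model and obs data) the plot was made from.
        "ancestors": list(ancestors),
        # Assumed plot-type tag.
        "plot_types": ["scatter"],
        # Assumed realm tag for ocean biogeochemistry.
        "realms": ["ocnBgchem"],
    }
```

Keeping the record construction in a small function like this makes it easy to attach the same metadata to each of the plots the script produces.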

Alternatively, since there are already two open issues about this diagnostic script and its maintenance, I think it might be defensible to relegate the provenance to the maintenance PR/issue of the script and merge this PR without. Just don't tell @bouweandela.

@axel-lauer (Contributor) commented

Thanks, @zklaus! I think even if the provenance were not 100% complete, having something in place would be a lot better than nothing at all.

@axel-lauer (Contributor) commented

@zklaus Did you have a chance to decide on how to proceed with this? Thanks.

@zklaus commented Jan 26, 2022

@axel-lauer, I intend to look at the provenance. It's just a matter of finding the time. I hope to look at it today.

@axel-lauer (Contributor) commented

@zklaus Really cool! Thanks for adding provenance to the diagnostic! Works nicely but I would suggest three little changes regarding "themes" and "realm" in the recipe and "OBS6" instead of "OBS" in recipe_check_obs.yml. Would that be OK? Then we could finally merge this PR. Yay!

Klaus Zimmermann and others added 2 commits January 28, 2022 06:06
@axel-lauer merged commit 996bdde into main on Jan 28, 2022
@axel-lauer deleted the CMUG_ESACCI_oc_chlor_a branch on January 28, 2022 12:30
@bouweandela (Member) commented Feb 2, 2022

Maybe it's just me, but does the title of this pull request look descriptive to you? @axel-lauer @remi-kazeroni @valeriupredoi @zklaus

@valeriupredoi (Contributor) commented

Hahahah, I actually LOL-ed in the office when I read the title 😆 Now you're just being smug @bouweandela 😁

@zklaus changed the title from "Cmug esacci oc chlor a" to "Add support for ESACCI Ocean Color (Chlorophyll) observations" on Feb 2, 2022