Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merging IPCC analyses into a public branch #2068

Open
ledm opened this issue Mar 4, 2021 · 9 comments
Open

Merging IPCC analyses into a public branch #2068

ledm opened this issue Mar 4, 2021 · 9 comments

Comments

@ledm
Copy link
Contributor

ledm commented Mar 4, 2021

This is a continuation of the discussion that we started at yesterday's monthly developer meeting.

We're now approaching the "final" version of IPCC analyses, and these works need to be made available to the wider public. I suspect that this is going to be a painful process.

I see three main issues:

  1. Record keeping,
  2. Updating,
  3. Maintenance.

Record keeping

Before trying to update or merge any code, we need to record:

  • the version of ESMValCore and ESMValTool used to make the final figure.
  • We need to record whether any additional packages have been installed beyond those included in the ESMValTool conda.
  • What auxiliary data sets were used, where they live, and how they can be accessed in the future (as well as any ownership issues).

Updating

This has been a complex process and my final code is fragile, inefficient, hacky and does not adhere to any kind of code standards. Unfortunately, it also uses outdated core and iris methods. In order to be merged and made accessible:

  • The code will need to be updated to use the latest version of iris and esmvalcore,
  • the code will need to be editted for style and clarity,
  • the code will need to be moved from the ESMValTool-AR6 private fork to the public one.
  • The recipe and diagnostics code will need to be documented, described, and linked to the final figure.
  • Some kind of code-review process will be needed
  • Some kind of testing will be required as well.

Personally, one of my recipes takes several weeks to run, so there's no chance I'm going to go back and test the full version. It may be possible to create a "minimal" functioning version for testing though.

Another problem here is that all these updates could mean that the public facing code does not produce the same figure or results as those included in the IPCC report.

Maintenance

Once all this has been updated, reviewed and merged, who will be responsible for maintenance? This will be a non-trivial task, and the original authors may not have the time in the next few years to keep up to date with the twists and turns of ESMValTool development until AR7 rolls around.

Perhaps instead of merging the recipe into the full ESMValTool master branch, we could create a public fork with these recipes and diagnostics? If not, the master could become very bloated! On the other hand, I've spent more than a year producing this work, it would be a waste for it to disappear into the either and have to start again from scratch for AR7!

Finally, are there any ways to automate some of this and make it easy for everyone? If any of the record keeping, the updating or the maintenance could be updated, that would be great.

Timing wise, the final IPCC AR6 report should be out this Autumn. Ideally, we want everything available by then. Approx 6 months from now - start the clock! 🕐

@valeriupredoi
Copy link
Contributor

cheers for this issue @ledm - I think the key here is modularization - not in the sense of Python modules (even if some of the bits from the whole IPCC codebase can go into the Core and we can even create an ar6 esmvalcore module) but in terms of the diagnostics - breaking down the codebase in easy-to-digest and maintain bits is the way forward IMHO. This is just waffling on my part so far, but shall we do it this way, in line with a timeline:

  • first identify a list of diagnostics that will have to become public and open an issue for each of these diagnostics in the private AR6 github project; documenting each of the diagnostics can begin at issue opening stage;
  • it would be of great use if each issue would list and describe the pre-processing steps and implementations that were done for each of these diagnostics, specifically if any of the preprocessor stock functionality was changed in any way and/or if new pre-processing functionality was developed - this needs to be discussed first and issues and PR's be open in the Core github repo; this last step my suffer from a bit of a delay because the Core is public and I guess you can't really say "Implementing BLA-H for IPCC AR6 computations"; but it's very important to first disentangle whatever can go to Core since that can be merged much quicker (since it's purely technical and in lower volume (I assume)) and the only extra bit of work would be to write tests (since I assume little testing functionality was written). I think we can create an esmvalcore.ipcc or esmvalcore.ar6 dedicated module for those preprocessing functions that won't see much use apart from the IPCC analyses -> @bouweandela @jvegasbsc @schlunma what you guys stink?
  • assemble the code for each of those diagnostics in a separate PR, per diagnostic, in the private AR6 project repo - it's here we'd need heavy lifting from scientific reviewers, probably best to have people involved both from the IPCC and non-IPCC community to avoid opinion biases
  • while this is happening documentation for each of these diagnostics would have to be written, but am guessing that's gonna be the easiest bit since that's just drag&drop from the IPCC report

We must be very careful how we integrate the IPCC diagnostics with the aim of reusing them next time - AR7 - and minimal re-coding/re-factoring would be needed, ie the use of dictionaries would be highly encouraged 😁 How's this sound for a plan?

@jvegreg
Copy link
Contributor

jvegreg commented Mar 12, 2021

I think we can create an esmvalcore.ipcc or esmvalcore.ar6 dedicated module for those preprocessing functions that won't see much use apart from the IPCC analyses -> @bouweandela @jvegasbsc @schlunma what you guys stink?

I do not like to have the preprocessor tied to modules. If we end with too many preprocessor functions or too big files (I am looking at you, preprocessor._time.py), we should refine our theme organization to something like preprocessor.time.extraction, preprocessor.time.statistics. We already have some unintuitive names in the tool, let's try the core free of them

@ledm
Copy link
Contributor Author

ledm commented Mar 12, 2021

I've been rethinking my position on this issue. To be completely honest, it's been a hell of a long road, and I do not want update or maintain my ESMValTool-AR6 code anymore.

The pragmatic solution would be to preserve an original version of the code as it is. Ie: this is the code that we used, warts and all, this is the version of esmvalcore that I used, these are my recipes. Any additional changes to the code from that baseline would not represent what the figures in the Assessment Report. This is important for reproducibility and all that.

On the other hand, I can understand that someone (else - not me) would want to re-use this code (in 5 years for AR7 perhaps?). But I am not convinced that it would be worth the effort to update this code to iris 3, esmvaltool 2.2, let alone run it again it test it against each new version of each package that changes. Remember that it takes 4-6 weeks non-stop to run this analysis on jasmin, More likely that not, no one is ever going to run it ever again.

I don't think that it would be a quick and easy effort to prepare all IPCC AR6 recipes for merging into the ESMValTool master. If we must, that would be a wasteful top-down decision. Preserving the snapshot of where it is when the plots were when they were finalise seems like a much better and lazier option.

@bettina-gier
Copy link
Contributor

Should we do it on a case-by-case basis then? I know some code for chapters 3 (and 5) figures would not need much adaptation to include in the current version (I think mine could work out of the box - not counting codacy issues) and don't run very long. There's also cases where existing diagnostics from the e.g. flato13ipcc figure were reused and had some improvements - Lisa added some stuff for the perfmetrics to support multiple projects with spaces between them etc. At least these types of improvements would be nice to include in the public branch.

I'd suggest to archive the code used by ipcc for all figures as lee suggests with "I used this version to run this code" and cherry-pick stuff like improvements or "easier" plots to include in the public esmvaltool.

@malininae
Copy link
Contributor

@ledm and @bettina-gier Thanks for raising this topic. IPCC requirements include public, reproducible code, so we'd have to save the code in the form we created the figures and give it them within the next month or so (Guess some clean up is possible, but it's impossible to redo the code in non-hacky fashion, or improve it drastically). They'll store it, as I understand it correctly, on their host, or in their git directory, so no worries, the code will be available for AR7 :-) I agree, that ESMValTool community, the recipes/diagnostics should come in a somewhat different form and should be updated/improved. Personally, one of my extreme diagnostics for the final set of data takes about 10 days to run o_O , but there's not much one can do about it. However, the final basins figure (@ledm and @valeriupredoi know what I mean, and yes, my eye is also twitching), might be a nice addition to the tool and should improve with iris 3. So, I'm on board with @bettina-gier , do it on a case by case basis.

@nielsdrost
Copy link
Member

nielsdrost commented Mar 29, 2021

Hi all,

Great that we are having this discussion!

At the very least I think we should know what everyone was missing from ESMValTool to create the diagnostics so that in AR7 there is less need for custom ESMValCore things.

As a first step, would it be possible to start creating PRs for all needed additions to the ESMValCore and ESMValTool outside of the diagnostics? So this would be preprocessors, CMORizers, fixes, etc.

The recipe we use to test could be something like the check obs recipe , but with all needed AR6 datasets.

This we should be able to do in the public repo, right?

@remi-kazeroni
Copy link
Contributor

As a first step, would it be possible to start creating PRs for all needed additions to the ESMValCore and ESMValTool outside of the diagnostics? So this would be preprocessors, CMORizers, fixes, etc.

Yes that is definitely possible for the ESMValCore and should also be fine for ESMValTool, expect for the diagnostics. @LisaBock will contact IPCC developers to ask tehm to push their developments into the public repositories.

The figure code (diagnostic scripts) can not be made public before the report is published (August/September). There is still an uncertainty about what is meant by publishing the code. @LisaBock will enquire about and after we'll have an idea what this means for us in terms of ESMValTool releases.

Regarding the diagnostics, would it make sense to open PR in the AR6 and start the merging process in that repo before the publication of the IPCC report? How tricky would it be to move the code from the AR6 master branch to the public master branch?

@nielsdrost
Copy link
Member

Regarding the diagnostics, would it make sense to open PR in the AR6 and start the merging process in that repo before the publication of the IPCC report? How tricky would it be to move the code from the AR6 master branch to the public master branch?

The problem is that all the code would become one large PR that would be impossible to review, let alone merge. So I would strongly recommend not merging anything to the AR6 master. But perhaps having a PR in the AR6 repo (but never merging it!) would already help getting things organized?

@bouweandela
Copy link
Member

On a side note: the code quality services like CircleCI and Codacy that we rely on for maintaining code quality are only free for public repositories.

@LisaBock LisaBock self-assigned this May 3, 2021
@zklaus zklaus removed their assignment Feb 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants