-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merging IPCC analyses into a public branch #2068
Comments
cheers for this issue @ledm - I think the key here is modularization - not in the sense of Python modules (even if some of the bits from the whole IPCC codebase can go into the Core and we can even create an
We must be very careful how we integrate the IPCC diagnostics with the aim of reusing them next time - AR7 - and minimal re-coding/re-factoring would be needed, ie the use of dictionaries would be highly encouraged 😁 How's this sound for a plan? |
I do not like to have the preprocessor tied to modules. If we end with too many preprocessor functions or too big files (I am looking at you, |
I've been rethinking my position on this issue. To be completely honest, it's been a hell of a long road, and I do not want update or maintain my ESMValTool-AR6 code anymore. The pragmatic solution would be to preserve an original version of the code as it is. Ie: this is the code that we used, warts and all, this is the version of esmvalcore that I used, these are my recipes. Any additional changes to the code from that baseline would not represent what the figures in the Assessment Report. This is important for reproducibility and all that. On the other hand, I can understand that someone (else - not me) would want to re-use this code (in 5 years for AR7 perhaps?). But I am not convinced that it would be worth the effort to update this code to iris 3, esmvaltool 2.2, let alone run it again it test it against each new version of each package that changes. Remember that it takes 4-6 weeks non-stop to run this analysis on jasmin, More likely that not, no one is ever going to run it ever again. I don't think that it would be a quick and easy effort to prepare all IPCC AR6 recipes for merging into the ESMValTool master. If we must, that would be a wasteful top-down decision. Preserving the snapshot of where it is when the plots were when they were finalise seems like a much better and lazier option. |
Should we do it on a case-by-case basis then? I know some code for chapters 3 (and 5) figures would not need much adaptation to include in the current version (I think mine could work out of the box - not counting codacy issues) and don't run very long. There's also cases where existing diagnostics from the e.g. flato13ipcc figure were reused and had some improvements - Lisa added some stuff for the perfmetrics to support multiple projects with spaces between them etc. At least these types of improvements would be nice to include in the public branch. I'd suggest to archive the code used by ipcc for all figures as lee suggests with "I used this version to run this code" and cherry-pick stuff like improvements or "easier" plots to include in the public esmvaltool. |
@ledm and @bettina-gier Thanks for raising this topic. IPCC requirements include public, reproducible code, so we'd have to save the code in the form we created the figures and give it them within the next month or so (Guess some clean up is possible, but it's impossible to redo the code in non-hacky fashion, or improve it drastically). They'll store it, as I understand it correctly, on their host, or in their git directory, so no worries, the code will be available for AR7 :-) I agree, that ESMValTool community, the recipes/diagnostics should come in a somewhat different form and should be updated/improved. Personally, one of my extreme diagnostics for the final set of data takes about 10 days to run o_O , but there's not much one can do about it. However, the final basins figure (@ledm and @valeriupredoi know what I mean, and yes, my eye is also twitching), might be a nice addition to the tool and should improve with iris 3. So, I'm on board with @bettina-gier , do it on a case by case basis. |
Hi all, Great that we are having this discussion! At the very least I think we should know what everyone was missing from ESMValTool to create the diagnostics so that in AR7 there is less need for custom ESMValCore things. As a first step, would it be possible to start creating PRs for all needed additions to the ESMValCore and ESMValTool outside of the diagnostics? So this would be preprocessors, CMORizers, fixes, etc. The recipe we use to test could be something like the check obs recipe , but with all needed AR6 datasets. This we should be able to do in the public repo, right? |
Yes that is definitely possible for the ESMValCore and should also be fine for ESMValTool, expect for the diagnostics. @LisaBock will contact IPCC developers to ask tehm to push their developments into the public repositories. The figure code (diagnostic scripts) can not be made public before the report is published (August/September). There is still an uncertainty about what is meant by publishing the code. @LisaBock will enquire about and after we'll have an idea what this means for us in terms of ESMValTool releases. Regarding the diagnostics, would it make sense to open PR in the AR6 and start the merging process in that repo before the publication of the IPCC report? How tricky would it be to move the code from the AR6 |
The problem is that all the code would become one large PR that would be impossible to review, let alone merge. So I would strongly recommend not merging anything to the AR6 master. But perhaps having a PR in the AR6 repo (but never merging it!) would already help getting things organized? |
On a side note: the code quality services like CircleCI and Codacy that we rely on for maintaining code quality are only free for public repositories. |
This is a continuation of the discussion that we started at yesterday's monthly developer meeting.
We're now approaching the "final" version of IPCC analyses, and these works need to be made available to the wider public. I suspect that this is going to be a painful process.
I see three main issues:
Record keeping
Before trying to update or merge any code, we need to record:
Updating
This has been a complex process and my final code is fragile, inefficient, hacky and does not adhere to any kind of code standards. Unfortunately, it also uses outdated core and iris methods. In order to be merged and made accessible:
Personally, one of my recipes takes several weeks to run, so there's no chance I'm going to go back and test the full version. It may be possible to create a "minimal" functioning version for testing though.
Another problem here is that all these updates could mean that the public facing code does not produce the same figure or results as those included in the IPCC report.
Maintenance
Once all this has been updated, reviewed and merged, who will be responsible for maintenance? This will be a non-trivial task, and the original authors may not have the time in the next few years to keep up to date with the twists and turns of ESMValTool development until AR7 rolls around.
Perhaps instead of merging the recipe into the full ESMValTool master branch, we could create a public fork with these recipes and diagnostics? If not, the master could become very bloated! On the other hand, I've spent more than a year producing this work, it would be a waste for it to disappear into the either and have to start again from scratch for AR7!
Finally, are there any ways to automate some of this and make it easy for everyone? If any of the record keeping, the updating or the maintenance could be updated, that would be great.
Timing wise, the final IPCC AR6 report should be out this Autumn. Ideally, we want everything available by then. Approx 6 months from now - start the clock! 🕐
The text was updated successfully, but these errors were encountered: