Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slight refactoring of diag galytska23/select_variables_for_tigramite.py for generality and portability (for Changelog v2.10: authors: @valeriupredoi and @egalytska) #3298

Merged
merged 10 commits into from
Aug 11, 2023

Conversation

valeriupredoi
Copy link
Contributor

@valeriupredoi valeriupredoi commented Jul 18, 2023

Description

@egalytska and myself are trying to refactor galytska23/select_variables_for_tigramite.py diagnostic so it is:


Before you get started

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.

New or updated recipe/diagnostic

@valeriupredoi
Copy link
Contributor Author

@egalytska I have slightly refactored the diag, and made some variable names changes (code variables, not actual analysis variables eg "key" -> "dataset" etc - so that the code reads better), so that the thing is more readable, also simplified some functionality, and, more importantly, I pulled all your references to specific variables out of the bellows of code, so that they can be set at a higher level. I have not tested it, but I don't expect it to fail, or to produce different results, since I've kept the functionality as it was before. Note the variables dictionaries at the very top - you can pull those out and make them recipe options now, so you allow contributors to add/remove variables etc. If you are happy with these changes, please take it from here and tune it more 👍

Copy link
Contributor

@egalytska egalytska left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @valeriupredoi
Thanks a lot for the amazing suggestions.
Unfortunately, the code crashed (please see the comments), but I suggested some alternatives.
Now the test run was successful.
But Levante is super slow today, so the full run takes forever (to compare the results).
P.S. This is my first review, I hope I did not mess things up.

@valeriupredoi
Copy link
Contributor Author

valeriupredoi commented Jul 20, 2023

excellent review work @egalytska 🎖️ Please commit the suggested changes straight to the branch here - me not having run the actual thing was not aware of the subtleties of previous implementation - this way we can still keep the higher degree of generality, and it also works (which is rather important 😁 ) 🍺

Also, I'll leave it up to you how to bring forth those variables and expose them to the API outside the code (if you think that's useful - I reckon so) ie put them straight into the recipe, or create a configuration file for the diag etc. Let me know if I can help in any way 👍

@valeriupredoi
Copy link
Contributor Author

hi @egalytska you got a bit of time operate the changes you requested - no worries for time, people are mostly on holidays around this time, will go on hols myself in a couple weeks 🏖️

@egalytska
Copy link
Contributor

Hi @valeriupredoi
I just came back from vacation myself. Planning to fix this next days.

@egalytska
Copy link
Contributor

Hi @valeriupredoi , I did several tests to check if the output is the same. All looks good. Should I now go for "Ready for review"?

@valeriupredoi valeriupredoi marked this pull request as ready for review August 7, 2023 15:15
@valeriupredoi
Copy link
Contributor Author

stellar work @egalytska - many thanks for your changes and checks, I made this RfR, and if you are happy with it, please approve, so we can get it in 🍺

Copy link
Contributor

@egalytska egalytska left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @valeriupredoi. I am approving this.
Though there is an issue: for a few days this diagnostics is incredibly slow for me in Levante. And I am concerned whether its the fault of Levante, my diagnostics, or smth in ESMVAlTool?
To be on a safe side, I tested my original diagnostics (before its improvements) and it is now also incredibly slow (which it was not before).
So I wanted to ask whether you could do a test run (just with a few models for a short time, and without cvdp), so we could hopefully track down the reason.

@valeriupredoi valeriupredoi changed the title Slight refactoring of diag galytska23/select_variables_for_tigramite.py for generality and portability Slight refactoring of diag galytska23/select_variables_for_tigramite.py for generality and portability (for Changelog v2.10: authors: @valeriupredoi and @egalytska) Aug 8, 2023
@valeriupredoi
Copy link
Contributor Author

Many thanks, Evgenia! I will, tomorrow - not too fussed about it since it's most probably Levante acting up, but a test won't hurt 👍

@remi-kazeroni
Copy link
Contributor

Hi @valeriupredoi. I am approving this. Though there is an issue: for a few days this diagnostics is incredibly slow for me in Levante. And I am concerned whether its the fault of Levante, my diagnostics, or smth in ESMVAlTool? To be on a safe side, I tested my original diagnostics (before its improvements) and it is now also incredibly slow (which it was not before). So I wanted to ask whether you could do a test run (just with a few models for a short time, and without cvdp), so we could hopefully track down the reason.

Could this be related to the new Dask configuration? recipe_galytska23jgr was run successfully for the release of v2.9 and it took about 2h on a Levante compute node (see this table). Maybe this is one of the recipes for which the default configuration leads to longer runtimes?

@valeriupredoi
Copy link
Contributor Author

cheers @remi-kazeroni - running a minitest now myself; don't think it's the Dask issue since @egalytska could run fine this very recipe using this branch until the very last time she tested, unless she's updated her env fully to bring in the new Core and iris for the last run. Think it's ready to merge, would you have a minute to have a quick high-level look at this PR, Remi, pls mate? 🍺

@valeriupredoi
Copy link
Contributor Author

OK my minitest is indeed taking for ages, bit it's the same with the latest main - the problem is the preprocessing and not the diagnostic (it didn't even reach diag running) - it seems almost 100% that @remi-kazeroni is correct about the new Dask+iris being the roadblock here, so this can be merged (I would literally eat my hat if the changes we made here did indeed slow down the show). @egalytska when you tested this PR and it ran fast enough for you, do you remember which esmvalcore/iris version did you use? At any rate, I am very happy with this PR (from a technical standpoint, no clue about the science, but that's Evgenia's land) 🍺

@egalytska
Copy link
Contributor

So my environment has ESMValCore v2.9.0 and iris 3.6.0 and I did not update the environment between the "fast" runs and "slow" runs. I think the runs became this slow for me after I pulled the main branch into my local branch. Does it make sense?

@valeriupredoi
Copy link
Contributor Author

valeriupredoi commented Aug 10, 2023

So my environment has ESMValCore v2.9.0 and iris 3.6.0 and I did not update the environment between the "fast" runs and "slow" runs. I think the runs became this slow for me after I pulled the main branch into my local branch. Does it make sense?

cheers @egalytska - only Thor knows, it could be the file system is just slow on Levante since a while ago, it could be the shared node is busier than normal, all manners of causes, defo not this poor branch here 😁

@remi-kazeroni
Copy link
Contributor

@egalytska and @valeriupredoi, I have just run successfully recipe_galytska23jgr on a Levante compute node using the latest main branches for Tool and Core and got the same runtime and memory consumption as we had for the testing of v2.9. I'm not saying the performance is "optimal", but simply that this has not changed since the release of version v2.9. I will make another test using the updated diagnostic from this branch. If the performance is still unchanged, this can be merged.

For reference, we have a utility script that produces batch scripts for DKRZ-Levante with parameters automatically tweaked for each recipe. This is what I have just used again to get something like:

#!/bin/bash -l

#SBATCH --job-name=recipe_galytska23jgr.%J
#SBATCH --output=recipe_galytska23jgr.%J.out
#SBATCH --error=recipe_galytska23jgr.%J.err
#SBATCH --account=bdXXXX
#SBATCH --partition=compute
#SBATCH --time=04:00:00
#SBATCH --mem=0

set -eo pipefail
unset PYTHONPATH

. PATH_TO/mambaforge/etc/profile.d/conda.sh
conda activate my_env

esmvaltool run recipe_galytska23jgr.yml

@valeriupredoi
Copy link
Contributor Author

thanks, Remi, breathing a statistics-induced sigh of relief 😁

@remi-kazeroni
Copy link
Contributor

Running the recipe with this branch leads to very similar runtime (Time for running the recipe was: 2:05:49.256957), so nothing to worry about. I agree with @valeriupredoi, it's good to see that there is no further performance decrease since v2.9.0 release.

I understand from @egalytska 's comments that the recipe is slower than what it used to be with earlier versions of the Core. But this is part of a larger issue related to the ongoing daskification that has some side-effects in terms of performance...

This PR is ready to be merged imo

Copy link
Contributor

@remi-kazeroni remi-kazeroni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution @valeriupredoi and @egalytska! I could run the recipe successfully and the time series produced look identical to that of results from v2.9 testing 👍

@valeriupredoi
Copy link
Contributor Author

brilliant, very many thanks @remi-kazeroni for spending time test it 🍺 And many thanks to @egalytska for the work here too 🍺

@valeriupredoi valeriupredoi merged commit a8186dd into main Aug 11, 2023
2 checks passed
@valeriupredoi valeriupredoi deleted the improve_diag_select_vars_tigramite_galytska23 branch August 11, 2023 10:37
ehogan added a commit that referenced this pull request Aug 30, 2023
…_RTW

* recipe_test_workflow_prototype: (237 commits)
  Add version to dataset in python example recipe to avoid "Unknown file format" issue on JASMIN (#3322)
  CMORizer for NASA MERRA reanalysis (#3039)
  Add `OBS-maintainers` team to documentation on OBS data maintenance and CMORizer reviews (#3335)
  Fixed provenance tracking for NCL multipanel PNGs (#3332)
  Cmorizer for NOAA-CIRES-20CR v3 reanalysis (clt, clwvi, hus, prw, rlut, rlutcs, rsut, rsutcs) (#3137)
  [Condalock] Update Linux condalock file (#3321)
  Slight refactoring of diag `galytska23/select_variables_for_tigramite.py` for generality and portability (for Changelog v2.10: authors: @valeriupredoi and @egalytska) (#3298)
  Removed recipe_carvalhais14nat from list of broken recipes (#3319)
  add Romain Beucher to CITATION as contributor (#3318)
  update `mamba` version in readthedocs configuration docs builds (#3310)
  [Github Actions] Compress all bash shell setters into one default option per workflow (#3315)
  [condalock] update conda lock creation Github Action workflow and ship updated (bot-generated) conda-lock file (#3307)
  Allow NCL unit conversion `kg s-1` -> `GtC y-1` (#3300)
  Add list of failing recipes for v2.9.0 release (#3294)
  Update diag_shapeselect.py to work with shapely v2 (#3283)
  Update release schedule after release of v2.9.0 (#3289)
  Add merge instructions to release instructions (#3292)
  Made sklearn test backwards-compatible with sklearn < 1.3 (#3285)
  Add release notes for v2.9 (#3266)
  Add release notes for v2.9 (#3266)
  ...
jvegreg pushed a commit that referenced this pull request Jan 14, 2024
….py` for generality and portability (for Changelog v2.10: authors: @valeriupredoi and @egalytska) (#3298)

Co-authored-by: Evgenia Galytska <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Further improvement of the diagnostic select_variables_for_tigramite.py
3 participants