Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support test data in benchmark workflows #4402

Merged

Conversation

wjbenfold
Copy link
Contributor

@wjbenfold wjbenfold commented Nov 8, 2021

🚀 Pull Request

Description

Adds test data support to github action benchmarks (by downloading the iris test data and setting the OVERRIDE_TEST_DATA_REPOSITORY environment variable).
Adds an example benchmark (of area weighted regridding) that uses the test data

Currently github action benchmarks don't have access to the test data. I'm aiming to improve this.

To do:

  • Enable caching (by uncommenting) once the test data download is tested
  • Revert noxfile logging changes
  • Change the example benchmark to use new test data that shows significant slowness (once it's in the iris-test-data repo)
  • What's new

Consult Iris pull request check list

@trexfeathers
Copy link
Contributor

trexfeathers commented Nov 9, 2021

Change the example benchmark to use new test data that shows significant slowness (once it's in the iris-test-data repo)

@wjbenfold if you figure out a decent pattern to host full size PP files online, and without slowing the benchmark process, that will potentially solve some other benchmarking headaches so I'd be interested in the result!

@wjbenfold
Copy link
Contributor Author

wjbenfold commented Nov 9, 2021

I think this PR is now ready for review (save a what's new).

Reviewer questions:

  • Are the changes to benchmarks/nox_asv_plugin.py, benchmarks/benchmarks/mixin.py and benchmarks/benchmarks/metadata_manager_factory.py as a result of changing .pre-commit-config.yaml acceptable? (This was done because the ci precommit was checking more aggressively than my local precommit and thus forcing me to pull the changes down to my local copy.)
  • I've used the test data to access files for benchmarking, ignoring the code in benchmarks/benchmarks/__init__.py that sets a data path. Is this ok, and if so should __init__.py be tidied at all (given I don't think it's doing anything to do with test data for us)?

@wjbenfold wjbenfold marked this pull request as ready for review November 9, 2021 16:15
@wjbenfold wjbenfold marked this pull request as draft November 9, 2021 16:16
@wjbenfold wjbenfold marked this pull request as ready for review November 9, 2021 17:40
Copy link
Contributor

@trexfeathers trexfeathers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wjbenfold! Some comments for you.

I don't get how an performance improvement has been detected for this PR, which makes no changes to the regridding code? It may be a sign that performance of this one is highly variable; I'll try re-running it.

benchmarks/benchmarks/regridding.py Outdated Show resolved Hide resolved
benchmarks/benchmarks/regridding.py Outdated Show resolved Hide resolved
.github/workflows/benchmark.yml Show resolved Hide resolved
.github/workflows/benchmark.yml Outdated Show resolved Hide resolved
@trexfeathers
Copy link
Contributor

I don't get how an performance improvement has been detected for this PR, which makes no changes to the regridding code? It may be a sign that performance of this one is highly variable; I'll try re-running it.

From investigation this looks like it was an aberration, so no changes needed.

(For the record I was going to suggest adjusting the repeat attribute to force ASV to spend longer / do more repeats. Not needed here but a good tool to have).

@trexfeathers
Copy link
Contributor

  • Are the changes to benchmarks/nox_asv_plugin.py, benchmarks/benchmarks/mixin.py and benchmarks/benchmarks/metadata_manager_factory.py as a result of changing .pre-commit-config.yaml acceptable? (This was done because the ci precommit was checking more aggressively than my local precommit and thus forcing me to pull the changes down to my local copy.)

👍

  • I've used the test data to access files for benchmarking, ignoring the code in benchmarks/benchmarks/__init__.py that sets a data path. Is this ok, and if so should __init__.py be tidied at all (given I don't think it's doing anything to do with test data for us)?

👍 The data location stuff in __init__ will very likely get re-written as we establish new patterns to handle benchmark data as part of remote CI. A lot of this was written to support a fixed directory containing very large files, and that just isn't workable in CI.

Will Benfold and others added 2 commits November 11, 2021 10:41
Copy link
Contributor

@trexfeathers trexfeathers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy that you've actioned my comments. Still need that What's New entry though!

@trexfeathers trexfeathers merged commit 94c1a6a into SciTools:main Nov 11, 2021
@wjbenfold wjbenfold deleted the wjbenfold-ci-benchmarks-test-data branch November 11, 2021 15:38
tkknight added a commit to tkknight/iris that referenced this pull request Dec 7, 2021
* main: (23 commits)
  Suggest type hinting (SciTools#4390)
  area weight regrid test fixes (SciTools#4432)
  Update latest.rst (SciTools#4425)
  Added @wjbenfold to the core dev list (SciTools#4423)
  Removed addition of period from wrap_lons. (SciTools#4421)
  Add release docs sections describing the role of a Release Manager (SciTools#4413)
  Subset should always return None if no value matches are found (SciTools#4417)
  What's new for SciTools#4400 (SciTools#4422)
  `iris.analysis.AreaWeighted` regrid speedup (SciTools#4400)
  [pre-commit.ci] pre-commit autoupdate (SciTools#4419)
  Remove newline to satisfy setuptools (SciTools#4418)
  Updated environment lockfiles (SciTools#4416)
  NAME loader fixes (SciTools#4411)
  Updated whatsnew for PR 4402 (SciTools#4410)
  Support test data in benchmark workflows (SciTools#4402)
  What's new for pr 4387 (SciTools#4405)
  Make concat mismatch warning for scalar coords more accurate (SciTools#4387)
  Added line to latest release notes for updates to pp_save_rules.py (SciTools#4404)
  Update pp_save_rules.py (SciTools#4391)
  [pre-commit.ci] pre-commit autoupdate (SciTools#4403)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants