Merge remote-tracking branch 'upstream/main' into theme_play
* upstream/main: (28 commits)
  Improve Small Benchmark Accuracy (SciTools#4636)
  Sperf & Cperf Benchmarks (SciTools#4621)
  Votable Issues (SciTools#4617)
  Overnight benchmarks - find a valid issue assignee (SciTools#4627)
  Updated environment lockfiles (SciTools#4624)
  Scalar Scatter Plot (SciTools#4616)
  Remove no_clobber task from Refresh lockfiles Action (SciTools#4618)
  purge deploy key (SciTools#4615)
  Overnight benchmarks remove ambiguity between file and commit names. (SciTools#4620)
  Final offline benchmark migration (SciTools#4562)
  Bump actions/checkout from 2 to 3 (SciTools#4611)
  Bump actions/stale from 4.1.0 to 5 (SciTools#4612)
  Revert plotting-vs-y (SciTools#4601)
  Bump peter-evans/create-pull-request from 3.13.0 to 3.14.0 (SciTools#4608)
  Support false-easting and false-northing when loading Mercator-projected data (SciTools#4524)
  Bump peter-evans/create-pull-request from 3.12.1 to 3.13.0 (SciTools#4607)
  Stop using nc_time_axis.CalendarDateTime (SciTools#4584)
  Utility class in netcdf loader should not be public. (SciTools#4592)
  Overnight benchmarks (SciTools#4583)
  Yaml fixes + clarifications. (SciTools#4594)
  ...
tkknight committed Mar 16, 2022
2 parents 98c5d9b + 3ee7c56 commit 14f7d19
Showing 70 changed files with 3,176 additions and 409 deletions.
2 changes: 1 addition & 1 deletion .cirrus.yml
@@ -38,7 +38,7 @@ env:
# Conda packages to be installed.
CONDA_CACHE_PACKAGES: "nox pip"
# Git commit hash for iris test data.
IRIS_TEST_DATA_VERSION: "2.5"
IRIS_TEST_DATA_VERSION: "2.7"
# Base directory for the iris-test-data.
IRIS_TEST_DATA_DIR: ${HOME}/iris-test-data

1 change: 0 additions & 1 deletion .github/deploy_key.scitools-docs.enc

This file was deleted.

71 changes: 56 additions & 15 deletions .github/workflows/benchmark.yml
@@ -1,10 +1,11 @@
# This is a basic workflow to help you get started with Actions
# Use ASV to check for performance regressions in the last 24 hours' commits.

name: benchmark-check

on:
# Triggers the workflow on push or pull request events but only for the master branch
pull_request:
schedule:
# Runs every day at 23:00.
- cron: "0 23 * * *"

jobs:
benchmark:
@@ -22,13 +23,9 @@ jobs:

steps:
# Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
- uses: actions/checkout@v2

- name: Fetch the PR base branch too
run: |
git fetch --depth=1 origin ${{ github.event.pull_request.base.ref }}
git branch _base FETCH_HEAD
echo PR_BASE_SHA=$(git rev-parse _base) >> $GITHUB_ENV
- uses: actions/checkout@v3
with:
fetch-depth: 0

- name: Install Nox
run: |
@@ -65,11 +62,56 @@ jobs:
run: |
echo "OVERRIDE_TEST_DATA_REPOSITORY=${GITHUB_WORKSPACE}/${IRIS_TEST_DATA_PATH}/test_data" >> $GITHUB_ENV
- name: Run CI benchmarks
- name: Run overnight benchmarks
run: |
first_commit=$(git log --after="$(date -d "1 day ago" +"%Y-%m-%d") 23:00:00" --pretty=format:"%h" | tail -n 1)
if [ "$first_commit" != "" ]
then
nox --session="benchmarks(overnight)" -- $first_commit
fi
- name: Create issues for performance shifts
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
mkdir --parents benchmarks/.asv
set -o pipefail
nox --session="benchmarks(ci compare)" | tee benchmarks/.asv/ci_compare.txt
if [ -d benchmarks/.asv/performance-shifts ]
then
cd benchmarks/.asv/performance-shifts
for commit_file in *
do
commit="${commit_file%.*}"
pr_number=$(git log "$commit"^! --oneline | grep -o "#[0-9]*" | tail -1 | cut -c 2-)
author=$(gh pr view $pr_number --json author -q '.["author"]["login"]' --repo $GITHUB_REPOSITORY)
merger=$(gh pr view $pr_number --json mergedBy -q '.["mergedBy"]["login"]' --repo $GITHUB_REPOSITORY)
# Find a valid assignee from author/merger/nothing.
if curl -s https://api.github.com/users/$author | grep -q "login"; then
assignee=$author
elif curl -s https://api.github.com/users/$merger | grep -q "login"; then
assignee=$merger
else
assignee=""
fi
title="Performance Shift(s): \`$commit\`"
body="
Benchmark comparison has identified performance shifts at commit \
$commit (#$pr_number). Please review the report below and \
take corrective/congratulatory action as appropriate \
:slightly_smiling_face:
<details>
<summary>Performance shift report</summary>
\`\`\`
$(cat $commit_file)
\`\`\`
</details>
Generated by GHA run [\`${{github.run_id}}\`](https://github.com/${{github.repository}}/actions/runs/${{github.run_id}})
"
gh issue create --title "$title" --body "$body" --assignee $assignee --label "Bot" --label "Type: Performance" --repo $GITHUB_REPOSITORY
done
fi
- name: Archive asv results
if: ${{ always() }}
@@ -78,4 +120,3 @@ jobs:
name: asv-report
path: |
benchmarks/.asv/results
benchmarks/.asv/ci_compare.txt
53 changes: 9 additions & 44 deletions .github/workflows/refresh-lockfiles.yml
@@ -2,7 +2,7 @@
# available packages and dependencies.
#
# Environment specifications are given as conda environment.yml files found in
# `requirements/ci/py**.yml`. These state the pacakges required, the conda channels
# `requirements/ci/py**.yml`. These state the packages required, the conda channels
# that the packages will be pulled from, and any versions of packages that need to be
# pinned at specific versions.
#
@@ -14,50 +14,14 @@ name: Refresh Lockfiles

on:
workflow_dispatch:
inputs:
clobber:
description: |
Force the workflow to run, potentially clobbering any commits already made to the branch.
Enter "yes" or "true" to run.
default: "no"
schedule:
# Run once a week on a Saturday night
- cron: 1 0 * * 6
# N.B. "should" be quoted, according to
# https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#onschedule
- cron: "1 0 * * 6"


jobs:

no_clobber:
if: "github.repository == 'SciTools/iris'"
runs-on: ubuntu-latest
steps:
# check if the auto-update-lockfiles branch exists. If it does, and someone other than
# the lockfile bot has made the head commit, abort the workflow.
# This job can be manually overridden by running directly from the github actions panel
# (known as a "workflow_dispatch") and setting the `clobber` input to "yes".
- uses: actions/[email protected]
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
script: |
if (context.eventName == "workflow_dispatch") {
const clobber = context.payload.inputs.clobber || "no";
if (["yes", "true", "y"].includes(clobber.trim().toLowerCase())) {
core.info("Manual override, continuing workflow, potentially overwriting previous commits to auto-update-lockfiles");
return
}
}
github.repos.getBranch({...context.repo, branch: "auto-update-lockfiles"}).then(res => {
const committer = res.data.commit.commit.committer;
if (committer && committer.name === "Lockfile bot") {
core.info("Lockfile bot was the last to push to auto-update-lockfiles. Continue.");
} else {
core.setFailed("New commits to auto-update-lockfiles since bot last ran. Abort!");
}
}).catch(err => {
if (err.status === 404) {
core.info("auto-update-lockfiles branch not found, continue");
}
})

gen_lockfiles:
# this is a matrix job: it splits to create new lockfiles for each
@@ -67,14 +31,13 @@ jobs:
# ref: https://tomasvotruba.com/blog/2020/11/16/how-to-make-dynamic-matrix-in-github-actions/
if: "github.repository == 'SciTools/iris'"
runs-on: ubuntu-latest
needs: no_clobber

strategy:
matrix:
python: ['38']

steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3
- name: install conda-lock
run: |
source $CONDA/bin/activate base
@@ -96,7 +59,7 @@
needs: gen_lockfiles

steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3
- name: get artifacts
uses: actions/download-artifact@v2
with:
@@ -109,7 +72,7 @@
- name: Create Pull Request
id: cpr
uses: peter-evans/create-pull-request@f22a7da129c901513876a2380e2dae9f8e145330
uses: peter-evans/create-pull-request@18f7dc018cc2cd597073088f7c7591b9d1c02672
with:
commit-message: Updated environment lockfiles
committer: "Lockfile bot <[email protected]>"
@@ -119,6 +82,8 @@
title: "[iris.ci] environment lockfiles auto-update"
body: |
Lockfiles updated to the latest resolvable environment.
If the CI test suite fails, create a new branch based off this pull request and add the required fixes to that branch.
labels: |
New: Pull Request
Bot
12 changes: 8 additions & 4 deletions .github/workflows/stale.yml
@@ -1,16 +1,20 @@
# See https://github.com/actions/stale

name: Stale issues and pull-requests

on:
schedule:
- cron: 0 0 * * *
# Run once a day
# N.B. "should" be quoted, according to
# https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#onschedule
- cron: "0 0 * * *"

jobs:
stale:
if: "github.repository == 'SciTools/iris'"
runs-on: ubuntu-latest
steps:
- uses: actions/stale@v4.1.0
- uses: actions/stale@v5
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}

@@ -59,11 +63,11 @@ jobs:
stale-pr-label: Stale

# Labels on issues exempted from stale.
exempt-issue-labels: |
exempt-issue-labels:
"Status: Blocked,Status: Decision Required,Peloton 🚴‍♂️,Good First Issue"

# Labels on prs exempted from stale.
exempt-pr-labels: |
exempt-pr-labels:
"Status: Blocked,Status: Decision Required,Peloton 🚴‍♂️,Good First Issue"

# Max number of operations per run.
97 changes: 97 additions & 0 deletions benchmarks/README.md
@@ -0,0 +1,97 @@
# Iris Performance Benchmarking

Iris uses an [Airspeed Velocity](https://github.com/airspeed-velocity/asv)
(ASV) setup to benchmark performance. This is primarily designed to check for
performance shifts between commits using statistical analysis, but can also
be easily repurposed for manual comparative and scalability analyses.

The benchmarks are automatically run overnight
[by a GitHub Action](../.github/workflows/benchmark.yml), with any notable
shifts in performance being flagged in a new GitHub issue.

## Running benchmarks

`asv ...` commands must be run from this directory. You will need to have ASV
installed, as well as Nox (see
[Benchmark environments](#benchmark-environments)).

[Iris' noxfile](../noxfile.py) includes a `benchmarks` session that provides
conveniences for setting up before benchmarking, and can also replicate the
automated overnight run locally. See the session docstring for detail.

### Environment variables

* `OVERRIDE_TEST_DATA_REPOSITORY` - required - some benchmarks use
`iris-test-data` content, and your local `site.cfg` is not available for
benchmark scripts.
* `DATA_GEN_PYTHON` - required - path to a Python executable that can be
used to generate benchmark test objects/files; see
[Data generation](#data-generation). The Nox session sets this automatically,
but will defer to any value already set in the shell.
* `BENCHMARK_DATA` - optional - path to a directory for benchmark synthetic
test data, which the benchmark scripts will create if it doesn't already
exist. Defaults to `<root>/benchmarks/.data/` if not set.
* `ON_DEMAND_BENCHMARKS` - optional - when set (to any value): benchmarks
decorated with `@on_demand_benchmark` are included in the ASV run. Usually
coupled with the ASV `--bench` argument to only run the benchmark(s) of
interest. Is set during the Nox `cperf` and `sperf` sessions.
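
For a local run, these variables can be set in the shell before invoking ASV or Nox. A minimal sketch follows; the paths are illustrative assumptions only, not prescribed values:

```shell
# Illustrative local setup only; adjust paths for your machine.
export OVERRIDE_TEST_DATA_REPOSITORY="$HOME/iris-test-data/test_data"
export DATA_GEN_PYTHON="$(command -v python3)"
export BENCHMARK_DATA="${BENCHMARK_DATA:-$PWD/benchmarks/.data}"
mkdir -p "$BENCHMARK_DATA"
# Uncomment to include @on_demand_benchmark suites in the run:
# export ON_DEMAND_BENCHMARKS=1
```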

## Writing benchmarks

[See the ASV docs](https://asv.readthedocs.io/) for full detail.

### Data generation
**Important:** be sure not to use the benchmarking environment to generate any
test objects/files, as this environment changes with each commit being
benchmarked, creating inconsistent benchmark 'conditions'. The
[generate_data](./benchmarks/generate_data/__init__.py) module offers a
solution; read more detail there.

### ASV re-run behaviour

Note that ASV re-runs a benchmark multiple times between calls to its `setup()` routine.
This is a problem for benchmarking certain Iris operations such as data
realisation, since the data will no longer be lazy after the first run.
Consider writing extra steps to restore objects' original state _within_ the
benchmark itself.

If adding steps to the benchmark will skew the result too much then re-running
can be disabled by setting an attribute on the benchmark: `number = 1`. To
maintain result accuracy this should be accompanied by increasing the number of
repeats _between_ `setup()` calls using the `repeat` attribute.
`warmup_time = 0` is also advisable since ASV performs independent re-runs to
estimate run-time, and these will still be subject to the original problem.
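
The attributes above can be combined in a sketch like the following (class and method names are illustrative, not actual Iris benchmarks; the iterator stands in for a lazy data object):

```python
# Sketch of an ASV-style benchmark where re-running must be disabled,
# because the timed operation consumes state created in setup().

class RealisationBenchmark:
    # Run the timed routine only once per setup() call ...
    number = 1
    # ... and compensate with more setup()/run repeats for accuracy.
    repeat = 10
    # Skip ASV's warm-up runs, which would also consume the lazy state.
    warmup_time = 0

    def setup(self):
        # Stand-in for creating a lazy Iris cube.
        self.data = iter(range(1_000_000))

    def time_realise(self):
        # Stand-in for data realisation; this exhausts the iterator,
        # so it could not be timed twice without re-running setup().
        list(self.data)
```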

### Scaling / non-Scaling Performance Differences

When comparing performance between commits, file types, or other parameters,
it can be helpful to know whether the differences lie in scaling or
non-scaling parts of the Iris functionality in question. This can be done
using a size parameter, setting
one value to be as small as possible (e.g. a scalar `Cube`), and the other to
be significantly larger (e.g. a 1000x1000 `Cube`). Performance differences
might only be seen for the larger value, or the smaller, or both, getting you
closer to the root cause.
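
In ASV this is expressed with the `params` / `param_names` class attributes; a minimal sketch (names and the summed-list workload are illustrative stand-ins for a real Iris operation):

```python
# Sketch of a size-parametrised ASV benchmark. Comparing results at
# both sizes hints whether a shift lies in per-call overhead
# (non-scaling) or per-element work (scaling).

class ScalingBenchmark:
    # One near-zero size and one significantly larger size.
    params = [1, 1_000_000]
    param_names = ["n_points"]

    def setup(self, n_points):
        # Stand-in for building a Cube of the requested size.
        self.data = list(range(n_points))

    def time_operation(self, n_points):
        # Stand-in for the Iris operation under test.
        sum(self.data)
```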

### On-demand benchmarks

Some benchmarks provide useful insight but are inappropriate to be included in
a benchmark run by default, e.g. those with long run-times or requiring a local
file. These benchmarks should be decorated with `@on_demand_benchmark`
(see [benchmarks init](./benchmarks/__init__.py)), which
sets the benchmark to only be included in a run when the `ON_DEMAND_BENCHMARKS`
environment variable is set. Examples include the CPerf and SPerf benchmark
suites for the UK Met Office NG-VAT project.
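
One way such a decorator can work is by hiding the benchmark from ASV's discovery when the variable is unset; this is a sketch of the mechanism only, an assumption rather than necessarily Iris' exact implementation:

```python
import os

def on_demand_benchmark(benchmark_object):
    """Return the benchmark only when ON_DEMAND_BENCHMARKS is set.

    When the variable is unset, None is returned instead, so ASV's
    discovery never sees the decorated class/function.
    """
    if "ON_DEMAND_BENCHMARKS" in os.environ:
        return benchmark_object
```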

## Benchmark environments

We have disabled ASV's standard environment management, instead using an
environment built using the same Nox scripts as Iris' test environments. This
is done using ASV's plugin architecture - see
[asv_delegated_conda.py](asv_delegated_conda.py) and the extra config items in
[asv.conf.json](asv.conf.json).

(ASV is written to control the environment(s) that benchmarks are run in -
minimising external factors and also allowing it to compare between a matrix
of dependencies (each in a separate environment). We have chosen to sacrifice
these features in favour of testing each commit with its intended dependencies,
controlled by Nox + lock-files).
1 change: 1 addition & 0 deletions benchmarks/asv.conf.json
@@ -5,6 +5,7 @@
"repo": "..",
"environment_type": "conda-delegated",
"show_commit_url": "http://github.com/scitools/iris/commit/",
"branches": ["upstream/main"],

"benchmark_dir": "./benchmarks",
"env_dir": ".asv/env",