
Add serialised data to ci #338

Merged: 18 commits merged into main from serialised-data-ci on Jan 5, 2024

Conversation

@samkellerhals (Contributor) commented Dec 14, 2023

Description

Use the ICON grid as the mesh when running CPU and GPU benchmarks.

Note:
A small number of stencils are skipped because running them with the icon_grid triggers a fatal Python error, most likely because the execution domain needs to be restricted or the boundary taken into account for those stencils. A few datatests are also currently failing (possibly due to outdated serialised data?); we should address this in a future PR.

Other changes

  • A single pre-commit YAML file with hooks that run all other pre-commit configs in each namespace package.

@samkellerhals (Contributor Author): cscs-ci run

@samkellerhals (Contributor Author): cscs-ci run

@samkellerhals (Contributor Author): cscs-ci run

@samkellerhals (Contributor Author): cscs-ci run

@samkellerhals (Contributor Author): cscs-ci run

@samkellerhals (Contributor Author): cscs-ci run

@samkellerhals (Contributor Author): cscs-ci run

@samkellerhals (Contributor Author): cscs-ci run

@samkellerhals (Contributor Author): launch jenkins spack

ci/cscs.yml Outdated
extends: .test_template
stage: benchmark
script:
- pip install dace==$DACE_VERSION
- tox -r -e stencil_tests -c model/ --verbose -- --benchmark-only --backend=dace_cpu --grid=simple_grid
- tox -r -e stencil_tests -c model/ --verbose -- --benchmark-only --backend=dace_cpu --grid=icon_grid
Contributor:

I noticed that we use the same test command (stencil_tests) for both the test and benchmark stages. This test command was added in your previous PR and contains the flags --cov --cov-append. I think these coverage flags should not be used in benchmark mode. Maybe --verbose could also be skipped on this line.

Contributor Author:

I added a new command for benchmarks now.

if env_base_path:
return Path(env_base_path)
else:
return common_path.parent.joinpath("serialized_data")
Contributor:

You are changing the default path of serialized data (was testdata before). I did a quick git grep testdata and found some places (e.g. README, .gitignore and code comments) that need to be updated.

Contributor Author:

Good catch, thanks. I've updated it.

@edopao (Contributor) commented Dec 20, 2023

Another problem I have observed (in the baseline benchmarks) is that the benchmark executed a single run per stencil (check the Rounds column).

@edopao (Contributor) commented Dec 20, 2023

Another problem I have observed (in the baseline benchmarks) is that the benchmark executed a single run per stencil (check the Rounds column).

I found why... we are using benchmark.pedantic with overriding of iterations and rounds here: model/common/src/icon4py/model/common/test_utils/helpers.py

def _test_execution_benchmark(self, pytestconfig, grid, backend, input_data, benchmark):
    if pytestconfig.getoption(
        "--benchmark-disable"
    ):  # skipping as otherwise program calls are duplicated in tests.
        pytest.skip("Test skipped due to 'benchmark-disable' option.")
    else:
        input_data = allocate_data(backend, input_data)
        benchmark.pedantic(
            self.PROGRAM.with_backend(backend),
            args=(),
            kwargs={**input_data, "offset_provider": grid.get_all_offset_providers()},
            iterations=1,
            rounds=1,
        )

We should probably try to avoid benchmark.pedantic and instead use normal benchmark:

diff --git a/model/common/src/icon4py/model/common/test_utils/helpers.py b/model/common/src/icon4py/model/common/test_utils/helpers.py
index 883a0aa7..f7c1445a 100644
--- a/model/common/src/icon4py/model/common/test_utils/helpers.py
+++ b/model/common/src/icon4py/model/common/test_utils/helpers.py
@@ -176,12 +176,10 @@ if pytest_benchmark:
             pytest.skip("Test skipped due to 'benchmark-disable' option.")
         else:
             input_data = allocate_data(backend, input_data)
-            benchmark.pedantic(
+            benchmark(
                 self.PROGRAM.with_backend(backend),
-                args=(),
-                kwargs={**input_data, "offset_provider": grid.get_all_offset_providers()},
-                iterations=1,
-                rounds=1,
+                **input_data,
+                offset_provider=grid.get_all_offset_providers()
             )
 
 else:

@samkellerhals (Contributor Author): cscs-ci run

@samkellerhals (Contributor Author):

Another problem I have observed (in the baseline benchmarks) is that the benchmark executed a single run per stencil (check the Rounds column).

I found why... we are using benchmark.pedantic with overriding of iterations and rounds here: model/common/src/icon4py/model/common/test_utils/helpers.py

def _test_execution_benchmark(self, pytestconfig, grid, backend, input_data, benchmark):
    if pytestconfig.getoption(
        "--benchmark-disable"
    ):  # skipping as otherwise program calls are duplicated in tests.
        pytest.skip("Test skipped due to 'benchmark-disable' option.")
    else:
        input_data = allocate_data(backend, input_data)
        benchmark.pedantic(
            self.PROGRAM.with_backend(backend),
            args=(),
            kwargs={**input_data, "offset_provider": grid.get_all_offset_providers()},
            iterations=1,
            rounds=1,
        )

We should probably try to avoid benchmark.pedantic and instead use normal benchmark:

diff --git a/model/common/src/icon4py/model/common/test_utils/helpers.py b/model/common/src/icon4py/model/common/test_utils/helpers.py
index 883a0aa7..f7c1445a 100644
--- a/model/common/src/icon4py/model/common/test_utils/helpers.py
+++ b/model/common/src/icon4py/model/common/test_utils/helpers.py
@@ -176,12 +176,10 @@ if pytest_benchmark:
             pytest.skip("Test skipped due to 'benchmark-disable' option.")
         else:
             input_data = allocate_data(backend, input_data)
-            benchmark.pedantic(
+            benchmark(
                 self.PROGRAM.with_backend(backend),
-                args=(),
-                kwargs={**input_data, "offset_provider": grid.get_all_offset_providers()},
-                iterations=1,
-                rounds=1,
+                **input_data,
+                offset_provider=grid.get_all_offset_providers()
             )
 
 else:

So the reason I fixed the iterations to 1 was to see whether that would decrease the runtime of the benchmarks as they seemed to be taking very long to run. But I agree that having only limited iterations makes benchmarks less useful. I think we still need to find a way to decrease the runtime, for example by making sure we only compile stencils once. Let's address that soon.
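
One possible way to keep the runtime bounded without hard-coding iterations=1 in helpers.py would be to cap the measurement from the command line, assuming pytest-benchmark's standard --benchmark-min-rounds and --benchmark-max-time options; the job line and values below are only an illustrative sketch, not part of this PR:

# Hypothetical sketch: bound benchmark time per stencil from the CI command
# instead of pinning iterations/rounds inside helpers.py.
script:
  - tox -r -e run_benchmarks -c model/ -- --backend=dace_cpu --grid=icon_grid --benchmark-min-rounds=3 --benchmark-max-time=2.0

This would not remove the cost of repeated stencil compilation mentioned above; that still needs its own fix.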

@samkellerhals (Contributor Author): cscs-ci run

.gitignore Outdated
@@ -4,7 +4,7 @@ _local
_external_src
_reports
tmp
testdata
serialized_data
Contributor:

There are also the grid files used in test_gridmanager.py in that same folder: serialized data gets downloaded to testdata/ser_icondata and grid files to testdata/grids, so I would rather not refer to serialized... in the top-level folder name.

@samkellerhals (Contributor Author), Jan 3, 2024:

Leaving it as testdata makes sense then; I wasn't aware we store anything other than the serialized test data.

extends: .test_template
stage: benchmark
script:
- pip install dace==$DACE_VERSION
- tox -r -e stencil_tests -c model/ --verbose -- --benchmark-only --backend=dace_cpu --grid=simple_grid
- tox -r -e run_benchmarks -c model/ -- --benchmark-only --backend=dace_cpu --grid=icon_grid
only:
Contributor:

I was wondering: does our cscs-ci run count as a manual trigger? I guess so... Also, does the only: - main refer to the target branch of the PR, or are these options ignored in our setup since we run it from outside GitLab?

(I don't know how it works; currently it always runs all of the jobs, and the benchmarks take quite long... once we add the datatests that will get worse, response-time wise...)

Contributor Author:

To be honest I am not sure, since @edopao added these dace jobs; maybe he can explain more. I would assume these benchmarks run only on main but have to be triggered manually, though I am not sure how. Currently the dace jobs do not seem to run when using cscs-ci run.

Contributor:

As commented in today's standup meeting, the intention of this setting was to run the dace benchmarks on main after the PR is merged. However, this setting is ignored in our setup, as also noted above. I agree that we could have a separate CI pipeline for benchmarking, automatically triggered after the PR is merged or by a daily job.
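
A minimal sketch of how such a trigger could be expressed with GitLab rules is shown here; the job name is hypothetical, and whether such rules are honored by the cscs-ci mirror setup is exactly the open question raised above:

# Hypothetical sketch: run the dace benchmarks only on post-merge pipelines on main
# or on scheduled (e.g. daily) pipelines, instead of on every PR trigger.
benchmark_model_dace_cpu:
  extends: .test_template
  stage: benchmark
  script:
    - tox -r -e run_benchmarks -c model/ -- --backend=dace_cpu --grid=icon_grid
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
    - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'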

@@ -81,7 +81,12 @@ def reference(
return dict(theta_v=theta_v, exner=exner)

@pytest.fixture
def input_data(self, grid):
def input_data(self, grid, uses_icon_grid_with_otf):
Contributor:

I have merged the verification of the global (EXCLAIM Aquaplanet) run; that means there is an additional serialized dataset (which we would need to upload to the server for the datatest), but it contains a global grid. Maybe using that one instead of the mch_ch_r04b09 local experiment would solve some of these issues. Let's discuss this...

Contributor Author:

OK, sounds good, let's discuss it tomorrow. Uploading to the server should be relatively straightforward; Andreas can help us.

@@ -35,21 +35,6 @@ jobs:
python -m pip install --upgrade pip setuptools wheel
python -m pip install -r ./requirements-dev.txt
python -m pip list
- name: Run checks in icon4pytools
- name: Run checks
run: |
Contributor:

nice...

ci/cscs.yml Outdated
extends: .test_template
stage: benchmark
script:
- pip install dace==$DACE_VERSION
- tox -r -e stencil_tests -c model/ --verbose -- --benchmark-only --backend=dace_gpu --grid=simple_grid
- tox -r -e run_benchmarks -c model/ -- --benchmark-only --backend=dace_gpu --grid=icon_grid
Contributor:

Could you add --benchmark-only for run_benchmarks and --benchmark-skip for run_stencil_tests in the tox.ini file instead of having them repeated here in the CI jobs?

Contributor Author:

Done

@samkellerhals (Contributor Author): cscs-ci run

@halungge (Contributor) commented Jan 3, 2024

So the reason I fixed the iterations to 1 was to see whether that would decrease the runtime of the benchmarks as they seemed to be taking very long to run. But I agree that having only limited iterations makes benchmarks less useful. I think we still need to find a way to decrease the runtime, for example by making sure we only compile stencils once. Let's address that soon.

Agree. Maybe we could also split the pipelines? Which test jobs does it make sense to run for each PR? Benchmarks? How many backends? ... Are there some that we could run in scheduled daily builds?

@samkellerhals (Contributor Author): cscs-ci run

@samkellerhals (Contributor Author): cscs-ci run

github-actions bot commented Jan 3, 2024

Mandatory Tests

Please make sure you run these tests via comment before you merge!

  • cscs-ci run
  • launch jenkins spack

Optional Tests

In case your change might affect downstream icon-exclaim, please consider running

  • launch jenkins icon

For more detailed information please look at CI in the EXCLAIM universe.

@samkellerhals (Contributor Author): cscs-ci run

@samkellerhals (Contributor Author): launch jenkins spack

@samkellerhals (Contributor Author) commented Jan 4, 2024

@edopao @halungge I've opened this issue to keep track of the recompilation problem, which I think we should solve at some point regardless of whether we move the benchmarking into a separate daily CI pipeline or not.

@halungge (Contributor) left a comment:

One small comment: there are some -- without options in cscs.yml. I guess they can be deleted, or do they serve any purpose?

extends: .test_template
stage: benchmark
script:
- pip install dace==$DACE_VERSION
- tox -r -e stencil_tests -c model/ --verbose -- --benchmark-only --backend=dace_gpu --grid=simple_grid
- tox -r -e run_benchmarks -c model/ -- --backend=dace_gpu --grid=icon_grid
Contributor:

Do you need this doubled double-dash, or did you simply forget to delete it?

Contributor Author:

Yes, the bare double dash denotes the end of the arguments passed to tox itself; any subsequent arguments are treated as positional arguments passed to whatever command tox invokes, in this case pytest.
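
To make the split concrete, here is a hedged reading of one of the CI lines above, assuming (as is the usual tox convention) that the run_benchmarks env's pytest command ends with {posargs}:

# Everything before the bare `--` is parsed by tox itself; everything after it is
# forwarded via {posargs} to the command tox runs, here pytest.
script:
  - tox -r -e run_benchmarks -c model/ -- --backend=dace_gpu --grid=icon_grid
  # tox options: -r (recreate the env), -e run_benchmarks (select the env), -c model/ (config location)
  # forwarded to pytest, roughly: pytest --benchmark-only --backend=dace_gpu --grid=icon_grid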

@samkellerhals samkellerhals requested a review from halungge January 4, 2024 16:45
@@ -49,14 +52,14 @@ test_model_job_roundtrip_simple_grid:
extends: .test_template
Contributor:

Just leaving my 2 cents here: All of these jobs could be easily expressed using https://docs.gitlab.com/ee/ci/yaml/#needsparallelmatrix

Contributor Author:

I will try this in a new PR.
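
A minimal sketch of what such a matrix could look like (illustrative only: the job name and the value lists are hypothetical, not taken from this PR; the needs:parallel:matrix form linked above additionally lets downstream jobs depend on individual matrix jobs):

# Hypothetical sketch: fan out backend/grid combinations with parallel:matrix
# instead of writing one near-identical job per combination.
test_model_stencils:
  extends: .test_template
  stage: test
  parallel:
    matrix:
      - BACKEND: [roundtrip, dace_cpu, dace_gpu]
        GRID: [simple_grid, icon_grid]
  script:
    - tox -r -e run_stencil_tests -c model/ -- --backend=$BACKEND --grid=$GRID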

@edopao (Contributor) left a comment:

Approving this PR, since its original objective (adding serialized data) is achieved. Agreed to open a new PR, after this one is merged, to pick up some leftovers:

  • create a separate CI pipeline for the benchmark jobs
  • remove only:main from dace jobs
  • implement Till’s comment about using needs:parallel:matrix

@halungge (Contributor) left a comment:

Agree, let's add further improvements in a follow-up PR.

@samkellerhals samkellerhals merged commit 588987f into main Jan 5, 2024
5 checks passed
@samkellerhals samkellerhals deleted the serialised-data-ci branch January 5, 2024 08:48