Add serialised data to ci #338
Conversation
cscs-ci run
launch jenkins spack
ci/cscs.yml (outdated)
```yaml
extends: .test_template
stage: benchmark
script:
  - pip install dace==$DACE_VERSION
  - tox -r -e stencil_tests -c model/ --verbose -- --benchmark-only --backend=dace_cpu --grid=simple_grid
  - tox -r -e stencil_tests -c model/ --verbose -- --benchmark-only --backend=dace_cpu --grid=icon_grid
```
I noticed that we use the same test command (stencil_tests) for both the test and benchmark stages. This test command was added in your previous PR and contains the flags --cov --cov-append. I think these coverage flags should not be used in benchmark mode. Maybe --verbose could also be skipped on this line.
I added a new command for benchmarks now.
```python
if env_base_path:
    return Path(env_base_path)
else:
    return common_path.parent.joinpath("serialized_data")
```
You are changing the default path of the serialized data (it was testdata before). I did a quick git grep testdata and found some places (e.g. README, .gitignore and code comments) that need to be updated.
Good catch, thanks. I've updated it.
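For context, a minimal self-contained sketch of the logic in the hunk above (the surrounding function, the environment-variable name, and common_path are assumptions for illustration, not the repo's actual code):

```python
import os
from pathlib import Path

def data_base_path(common_path: Path) -> Path:
    # Hypothetical env-var name; allows overriding the download location.
    env_base_path = os.environ.get("SERIALIZED_DATA_PATH")
    if env_base_path:
        return Path(env_base_path)
    else:
        # Default: a sibling folder of common_path (the folder name is
        # changed back to "testdata" later in this thread).
        return common_path.parent.joinpath("serialized_data")
```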
Another problem I have observed (in baseline benchmarks) is that the benchmark executed a single run per stencil (check the rounds and iterations columns in the benchmark output). I found why: we are fixing the number of iterations to 1. We should probably try to avoid that, since a single measurement per stencil gives no meaningful statistics.
cscs-ci run
So the reason I fixed the iterations to 1 was to see whether that would decrease the runtime of the benchmarks, as they seemed to be taking very long to run. But I agree that having only limited iterations makes the benchmarks less useful. I think we still need to find a way to decrease the runtime, for example by making sure we only compile stencils once. Let's address that soon.
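To illustrate the trade-off, a minimal pytest-benchmark sketch (the workload is a placeholder; whether the repo uses pedantic mode is an assumption based on this exchange):

```python
# Requires pytest-benchmark; run with `pytest --benchmark-only`.
def run_stencil():
    sum(range(10_000))  # placeholder workload, not a real stencil

def test_calibrated(benchmark):
    # Default mode: pytest-benchmark calibrates rounds/iterations itself,
    # so the reported statistics are meaningful.
    benchmark(run_stencil)

def test_single_run(benchmark):
    # Pinned to one round and one iteration: fast, but the result is a
    # single measurement with no statistics.
    benchmark.pedantic(run_stencil, rounds=1, iterations=1)
```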
cscs-ci run
.gitignore (outdated)
```diff
@@ -4,7 +4,7 @@ _local
 _external_src
 _reports
 tmp
-testdata
+serialized_data
```
There are also the grid files that are used in test_gridmanager.py in that same folder: serialized data gets downloaded to testdata/ser_icondata, grid files to testdata/grids, so I would rather not refer to serialized... in the top-level folder name.
Leaving it as testdata then makes sense; I wasn't aware we store anything other than the serialized test data.
```yaml
extends: .test_template
stage: benchmark
script:
  - pip install dace==$DACE_VERSION
  - tox -r -e stencil_tests -c model/ --verbose -- --benchmark-only --backend=dace_cpu --grid=simple_grid
  - tox -r -e run_benchmarks -c model/ -- --benchmark-only --backend=dace_cpu --grid=icon_grid
only:
```
I was wondering: does our cscs-ci run count as a manual trigger? I guess so... Also, does the only: - main refer to the target branch of the PR, or are these options ignored in our setup since we run it from outside GitLab? (I don't know how it works; currently it always runs all of the jobs and the benchmarks take quite long... once we add the datatest that will get worse, response-time wise...)
I am not sure, to be honest, since @edopao added these dace jobs; maybe he can explain more. I would assume these benchmarks run only on main but have to be manually triggered, though how I am not sure. Currently the dace jobs seem not to run when using cscs-ci run.
As commented in today's standup meeting, the intention of this setting was to run the dace benchmark on main after a PR is merged. However, this setting is ignored in our setup, as also noted above. I agree that we could have a separate CI pipeline for benchmarking, automatically triggered after a PR is merged or by a daily job.
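A hedged sketch of how a main-only or scheduled benchmark job could be expressed with GitLab CI rules (the job name is illustrative; whether such rules are honoured when the pipeline is launched via cscs-ci is exactly the open question above):

```yaml
benchmark_model_dace_cpu:
  extends: .test_template
  stage: benchmark
  rules:
    # Run on pushes to the default branch (i.e. after a PR is merged)...
    - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'
    # ...or from a scheduled (e.g. daily) pipeline.
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
```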
```diff
@@ -81,7 +81,12 @@ def reference(
         return dict(theta_v=theta_v, exner=exner)

     @pytest.fixture
-    def input_data(self, grid):
+    def input_data(self, grid, uses_icon_grid_with_otf):
```
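A hedged guess at what the new fixture argument might do, based on the skipping behaviour mentioned in the PR description (the body is an assumption; only the name uses_icon_grid_with_otf comes from the hunk):

```python
import pytest

@pytest.fixture
def uses_icon_grid_with_otf(grid):
    # Assumption: skip tests known to crash when run on the icon grid
    # (see the note about Fatal Python Error in the PR description).
    if getattr(grid, "name", "") == "icon_grid":
        pytest.skip("stencil not yet supported on the icon grid")
```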
I have merged the verification of the global (EXCLAIM Aquaplanet) run; that means there is an additional serialized dataset (which, for the datatest, we would need to upload to the server), but it contains a global grid. Maybe using that one instead of the mch_ch_r04b09 local experiment would solve some of these issues. Let's discuss this...
OK, sounds good, let's discuss it tomorrow. Uploading to the server should be relatively straightforward; Andreas can help us.
```diff
@@ -35,21 +35,6 @@ jobs:
          python -m pip install --upgrade pip setuptools wheel
          python -m pip install -r ./requirements-dev.txt
          python -m pip list
-      - name: Run checks in icon4pytools
+      - name: Run checks
        run: |
```
nice...
ci/cscs.yml (outdated)
```yaml
extends: .test_template
stage: benchmark
script:
  - pip install dace==$DACE_VERSION
  - tox -r -e stencil_tests -c model/ --verbose -- --benchmark-only --backend=dace_gpu --grid=simple_grid
  - tox -r -e run_benchmarks -c model/ -- --benchmark-only --backend=dace_gpu --grid=icon_grid
```
Could you add the --benchmark-only for the run_benchmarks and the --benchmark-skip for the run_stencil_tests in the tox.ini file, instead of having them repeated here in the CI jobs?
Done
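A hedged sketch of what that change could look like in tox.ini (the env names and benchmark flags come from this thread, and it also reflects the earlier point about keeping the coverage flags out of the benchmark env; everything else in each section is an assumption):

```ini
# Sketch only; the repo's actual tox.ini will differ.
[testenv:run_stencil_tests]
commands =
    pytest --benchmark-skip --cov --cov-append {posargs}

[testenv:run_benchmarks]
commands =
    pytest --benchmark-only {posargs}
```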
cscs-ci run
Agree. Maybe we could also split the pipelines? Which test jobs does it make sense to run for each PR? Benchmarks? How many backends? ... Are there some that we could run in scheduled daily builds?
cscs-ci run
Mandatory Tests: please make sure you run these tests via comment before you merge!
Optional Tests: in case your change might affect downstream icon-exclaim, please consider running them as well.
For more detailed information please look at CI in the EXCLAIM universe.
cscs-ci run
launch jenkins spack
@edopao @halungge I've opened this issue to keep track of the recompilation problem, which I think we should solve at some point regardless of whether we move the benchmarking into a separate daily CI pipeline or not.
One small comment: there are some -- without an option in cscs.yml; I guess they can be deleted, or do they serve any purpose?
```yaml
extends: .test_template
stage: benchmark
script:
  - pip install dace==$DACE_VERSION
  - tox -r -e stencil_tests -c model/ --verbose -- --benchmark-only --backend=dace_gpu --grid=simple_grid
  - tox -r -e run_benchmarks -c model/ -- --backend=dace_gpu --grid=icon_grid
```
Do you need these double dashes, or did you simply forget to delete them?
Yes, the double dashes denote the end of the arguments passed to tox itself; any subsequent arguments are treated as positional arguments and passed on to whatever command tox invokes, in this case pytest.
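For illustration, using one of the CI commands above (the split shown is how tox's {posargs} forwarding behaves):

```shell
# Everything before `--` is parsed by tox; everything after is forwarded
# to the invoked command (pytest) via {posargs} in tox.ini.
tox -r -e run_benchmarks -c model/ -- --backend=dace_gpu --grid=icon_grid
#   tox sees:    -r -e run_benchmarks -c model/
#   pytest gets: --backend=dace_gpu --grid=icon_grid
```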
```diff
@@ -49,14 +52,14 @@ test_model_job_roundtrip_simple_grid:
 extends: .test_template
```
Just leaving my 2 cents here: All of these jobs could be easily expressed using https://docs.gitlab.com/ee/ci/yaml/#needsparallelmatrix
I will try this in a new PR.
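A hedged sketch of the parallel:matrix approach (job, variable, and value names are illustrative, not taken from this PR):

```yaml
test_model_job:
  extends: .test_template
  parallel:
    matrix:
      # One job instance per backend/grid combination.
      - BACKEND: [roundtrip, dace_cpu, dace_gpu]
        GRID: [simple_grid, icon_grid]
  script:
    - tox -r -e run_stencil_tests -c model/ -- --backend=$BACKEND --grid=$GRID
```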
Approving this PR, since its original objective (adding serialized data) is achieved. Agreed to open a new PR, after this one is merged, to pick up some leftovers:
- create a separate CI pipeline for the benchmark jobs
- remove only: main from the dace jobs
- implement Till's comment about using needs:parallel:matrix
Agreed, let's add further improvements in a follow-up PR.
Description
Use the icon grid as the mesh when running CPU and GPU benchmarks.
Note:
A small number of stencils are skipped because running them with the icon_grid causes a Fatal Python Error, most likely because the execution domain needs to be restricted or boundaries taken into account for those stencils. Also, a few datatests are currently failing (maybe due to outdated serialised data?); we should address this in a future PR.

Other changes