Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add serialised data to ci #338

Merged
merged 18 commits into from
Jan 5, 2024
Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 11 additions & 8 deletions ci/cscs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -34,13 +34,16 @@ variables:
- pyversion_no_dot="${PYTHON_VERSION//./}"
- pip install tox clang-format
- python -c "import cupy"
- ls ${SERIALIZED_DATA_PATH}
variables:
SLURM_JOB_NUM_NODES: 1
SLURM_NTASKS: 1
SLURM_TIMELIMIT: '06:00:00'
CRAY_CUDA_MPS: 1
NUM_PROCESSES: auto
VIRTUALENV_SYSTEM_SITE_PACKAGES: 1
CSCS_NEEDED_DATA: icon4py
SERIALIZED_DATA_PATH: "/apps/daint/UES/jenkssl/ciext/icon4py"

build_job:
extends: .build_template
Expand Down Expand Up @@ -89,34 +92,34 @@ test_tools_job:
script:
- tox -r -c tools/ --verbose

benchmark_model_dace_cpu_simple_grid:
benchmark_model_dace_cpu_icon_grid:
extends: .test_template
stage: benchmark
script:
- pip install dace==$DACE_VERSION
- tox -r -e stencil_tests -c model/ --verbose -- --benchmark-only --backend=dace_cpu --grid=simple_grid
- tox -r -e stencil_tests -c model/ --verbose -- --benchmark-only --backend=dace_cpu --grid=icon_grid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that we use the same test command (stencil_tests) for both test and benchmark stages. This test command was added in your previous PR and contains the flags --cov --cov-append. I think that these coverage flags should not be used in benchmark mode. Maybe, --verbose could also be skipped on this line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a new command for benchmarks now.

only:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering: does our cscs-ci run count as manual trigger? I guess so... Also what does the only: -main refer to the target branch of the PR or are these options ignored in our setup since we run it from outside gitlab?

(don't know how it works and currentlyit runs always all of the jobs and the benchmarks take quite long... once we add the datatest that will get worse, response time wise...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure to be honest since @edopao added these dace jobs, maybe he can explain more. I would assume these benchmarks run only on main but have to be manually triggered, how I am not sure. Currently the dace jobs seem to be not run when using cscs-ci run.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As commented in today's standup meeting, the intention of this setting was to run the dace benchmark on main after PR is merged. However, this setting is ignored in our setup, as also noted above. I agree that we could have a separate CI pipeline for benchmarking, automatically triggered after PR is merged or by a daily job.

- main
when: manual

benchmark_model_dace_gpu_simple_grid:
benchmark_model_dace_gpu_icon_grid:
extends: .test_template
stage: benchmark
script:
- pip install dace==$DACE_VERSION
- tox -r -e stencil_tests -c model/ --verbose -- --benchmark-only --backend=dace_gpu --grid=simple_grid
- tox -r -e stencil_tests -c model/ --verbose -- --benchmark-only --backend=dace_gpu --grid=icon_grid
only:
- main
when: manual

benchmark_model_gtfn_cpu_simple_grid:
benchmark_model_gtfn_cpu_icon_grid:
extends: .test_template
stage: benchmark
script:
- tox -r -e stencil_tests -c model/ --verbose -- --benchmark-only --backend=gtfn_cpu --grid=simple_grid
- tox -r -e stencil_tests -c model/ --verbose -- --benchmark-only --backend=gtfn_cpu --grid=icon_grid

benchmark_model_gtfn_gpu_simple_grid:
benchmark_model_gtfn_gpu_icon_grid:
extends: .test_template
stage: benchmark
script:
- tox -r -e stencil_tests -c model/ --verbose -- --benchmark-only --backend=gtfn_gpu --grid=simple_grid
- tox -r -e stencil_tests -c model/ --verbose -- --benchmark-only --backend=gtfn_gpu --grid=icon_grid
1 change: 1 addition & 0 deletions model/atmosphere/diffusion/tests/conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,4 +16,5 @@
)
from icon4py.model.common.test_utils.helpers import ( # noqa : F401 # fixtures from test_utils
backend,
uses_icon_grid_with_otf,
)
Original file line number Diff line number Diff line change
Expand Up @@ -81,7 +81,12 @@ def reference(
return dict(theta_v=theta_v, exner=exner)

@pytest.fixture
def input_data(self, grid):
def input_data(self, grid, uses_icon_grid_with_otf):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have merged the verification of the global (EXCLAIM Aquaplanet) run, that means there is an additional serialized dataset (which for the datatest we would need to upload to the server) but it contains a global grid. Maybe using that one instead of the mch_ch_r04b09 local experiment would solve some of these issues. Lets discuss this...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok sounds good, let's discuss it tomorrow, uploading to the server should be relatively straightforward, Andreas can help us.

if uses_icon_grid_with_otf:
pytest.skip(
"Execution domain needs to be restricted or boundary taken into account in stencil."
)

kh_smag_e = random_field(grid, EdgeDim, KDim)
inv_dual_edge_length = random_field(grid, EdgeDim)
theta_v_in = random_field(grid, CellDim, KDim)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -98,7 +98,12 @@ def reference(
return dict(vn=vn)

@pytest.fixture
def input_data(self, grid):
def input_data(self, grid, uses_icon_grid_with_otf):
if uses_icon_grid_with_otf:
pytest.skip(
"Execution domain needs to be restricted or boundary taken into account in stencil."
)

edge = indices_field(EdgeDim, grid, is_halfdim=False, dtype=int32)

u_vert = random_field(grid, VertexDim, KDim)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,6 @@
import numpy as np
import pytest
from gt4py.next.ffront.fbuiltins import int32
from gt4py.next.program_processors.otf_compile_executor import OTFCompileExecutor

from icon4py.model.atmosphere.diffusion.stencils.calculate_nabla2_for_z import (
calculate_nabla2_for_z,
Expand Down Expand Up @@ -55,11 +54,12 @@ def reference(
return dict(z_nabla2_e=z_nabla2_e)

@pytest.fixture
def input_data(self, grid, backend):
if isinstance(backend, OTFCompileExecutor):
def input_data(self, grid, uses_icon_grid_with_otf):
if uses_icon_grid_with_otf:
pytest.skip(
"Execution domain needs to be restricted or boundary taken into account in stencil."
)

kh_smag_e = random_field(grid, EdgeDim, KDim, dtype=vpfloat)
inv_dual_edge_length = random_field(grid, EdgeDim, dtype=wpfloat)
theta_v = random_field(grid, CellDim, KDim, dtype=wpfloat)
Expand Down
1 change: 1 addition & 0 deletions model/atmosphere/dycore/tests/conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,4 +16,5 @@
)
from icon4py.model.common.test_utils.helpers import ( # noqa : F401 # fixtures from test_utils
backend,
uses_icon_grid_with_otf,
)
Original file line number Diff line number Diff line change
Expand Up @@ -116,7 +116,12 @@ def reference(
return dict(ddt_vn_apc=ddt_vn_apc)

@pytest.fixture
def input_data(self, grid):
def input_data(self, grid, uses_icon_grid_with_otf):
if uses_icon_grid_with_otf:
pytest.skip(
"Execution domain needs to be restricted or boundary taken into account in stencil."
)

z_kin_hor_e = random_field(grid, EdgeDim, KDim)
coeff_gradekin = random_field(grid, EdgeDim, E2CDim)
coeff_gradekin_new = as_1D_sparse_field(coeff_gradekin, ECDim)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -192,7 +192,12 @@ def reference(
)

@pytest.fixture
def input_data(self, grid):
def input_data(self, grid, uses_icon_grid_with_otf):
if uses_icon_grid_with_otf:
pytest.skip(
"Execution domain needs to be restricted or boundary taken into account in stencil."
)

c_intp = random_field(grid, VertexDim, V2CDim)
vn = random_field(grid, EdgeDim, KDim)
rbf_vec_coeff_e = random_field(grid, EdgeDim, E2C2EDim)
Expand Down
2 changes: 0 additions & 2 deletions model/common/src/icon4py/model/common/grid/icon.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,6 @@
C2E2CDim,
C2E2CODim,
C2EDim,
C2VDim,
CECDim,
CEDim,
CellDim,
Expand Down Expand Up @@ -56,7 +55,6 @@ def __init__(self):
"E2C2V": (self._get_offset_provider, E2C2VDim, EdgeDim, VertexDim),
"V2E": (self._get_offset_provider, V2EDim, VertexDim, EdgeDim),
"V2C": (self._get_offset_provider, V2CDim, VertexDim, CellDim),
"C2V": (self._get_offset_provider, C2VDim, CellDim, VertexDim),
"E2ECV": (self._get_offset_provider_for_sparse_fields, E2C2VDim, EdgeDim, ECVDim),
"C2CEC": (self._get_offset_provider_for_sparse_fields, C2E2CDim, CellDim, CECDim),
"C2CE": (self._get_offset_provider_for_sparse_fields, C2EDim, CellDim, CEDim),
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@
from .data_handling import download_and_extract
from .datatest_utils import (
DATA_URIS,
SER_DATA_BASEPATH,
SERIALIZED_DATA_BASEPATH,
create_icon_serial_data_provider,
get_datapath_for_ranked_data,
get_processor_properties_for_run,
Expand All @@ -35,7 +35,7 @@ def processor_props(request):

@pytest.fixture(scope="session")
def ranked_data_path(processor_props):
return get_ranked_data_path(SER_DATA_BASEPATH, processor_props)
return get_ranked_data_path(SERIALIZED_DATA_BASEPATH, processor_props)


@pytest.fixture(scope="session")
Expand Down
20 changes: 14 additions & 6 deletions model/common/src/icon4py/model/common/test_utils/datatest_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,25 +10,33 @@
# distribution for a copy of the license or check <https://www.gnu.org/licenses/>.
#
# SPDX-License-Identifier: GPL-3.0-or-later

import os
from pathlib import Path

from icon4py.model.common.decomposition.definitions import get_processor_properties


TEST_UTILS_PATH = Path(__file__).parent
MODEL_PATH = TEST_UTILS_PATH.parent.parent
COMMON_PATH = MODEL_PATH.parent.parent.parent.parent
BASE_PATH = COMMON_PATH.parent.joinpath("testdata")
def get_serialized_data_path() -> Path:
test_utils_path = Path(__file__).parent
model_path = test_utils_path.parent.parent
common_path = model_path.parent.parent.parent.parent
env_base_path = os.getenv("SERIALIZED_DATA_PATH")

if env_base_path:
return Path(env_base_path)
else:
return common_path.parent.joinpath("serialized_data")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are changing the default path of serialized data (was testdata before). I did a quick git grep testdata and found some places (e.g. README, .gitignore and code comments) that need to be updated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch thanks, I've updated it.



SERIALIZED_DATA_PATH = get_serialized_data_path()
SERIALIZED_DATA_BASEPATH = SERIALIZED_DATA_PATH.joinpath("ser_icondata")

# TODO: a run that contains all the fields needed for dycore, diffusion, interpolation fields needs to be consolidated
DATA_URIS = {
1: "https://polybox.ethz.ch/index.php/s/re46l1xnJZ4uCMx/download",
2: "https://polybox.ethz.ch/index.php/s/YyC5qDJWyC39y7u/download",
4: "https://polybox.ethz.ch/index.php/s/UIHOVJs6FVPpz9V/download",
}
SER_DATA_BASEPATH = BASE_PATH.joinpath("ser_icondata")


def get_processor_properties_for_run(run_instance):
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@

from icon4py.model.common.decomposition.definitions import SingleNodeRun
from icon4py.model.common.test_utils.datatest_utils import (
SER_DATA_BASEPATH,
SERIALIZED_DATA_BASEPATH,
create_icon_serial_data_provider,
get_datapath_for_ranked_data,
get_processor_properties_for_run,
Expand All @@ -25,7 +25,7 @@

def get_icon_grid():
processor_properties = get_processor_properties_for_run(SingleNodeRun())
ranked_path = get_ranked_data_path(SER_DATA_BASEPATH, processor_properties)
ranked_path = get_ranked_data_path(SERIALIZED_DATA_BASEPATH, processor_properties)
data_path = get_datapath_for_ranked_data(ranked_path)
icon_data_provider = create_icon_serial_data_provider(data_path, processor_properties)
grid_savepoint = icon_data_provider.from_savepoint_grid()
Expand Down
14 changes: 14 additions & 0 deletions model/common/src/icon4py/model/common/test_utils/helpers.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,8 +21,10 @@
from gt4py.next import common as gt_common
from gt4py.next import constructors
from gt4py.next.ffront.decorator import Program
from gt4py.next.program_processors.otf_compile_executor import OTFCompileExecutor

from ..grid.base import BaseGrid
from ..grid.icon import IconGrid


try:
Expand Down Expand Up @@ -219,3 +221,15 @@ def __init_subclass__(cls, **kwargs):
super().__init_subclass__(**kwargs)
setattr(cls, f"test_{cls.__name__}", _test_validation)
setattr(cls, f"test_{cls.__name__}_benchmark", _test_execution_benchmark)


@pytest.fixture
def uses_icon_grid_with_otf(backend, grid):
"""Check whether we are using a compiled backend with the icon_grid.

Is needed to skip certain stencils where the execution domain needs to be restricted or boundary taken into account.
"""
if hasattr(backend, "executor") and isinstance(grid, IconGrid):
if isinstance(backend.executor, OTFCompileExecutor):
return True
return False
4 changes: 2 additions & 2 deletions model/common/tests/grid_tests/conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,10 +25,10 @@
processor_props,
ranked_data_path,
)
from icon4py.model.common.test_utils.datatest_utils import BASE_PATH
from icon4py.model.common.test_utils.datatest_utils import SERIALIZED_DATA_PATH


grids_path = BASE_PATH.joinpath("grids")
grids_path = SERIALIZED_DATA_PATH.joinpath("grids")
r04b09_dsl_grid_path = grids_path.joinpath("mch_ch_r04b09_dsl")
r04b09_dsl_data_file = r04b09_dsl_grid_path.joinpath("mch_ch_r04b09_dsl_grids_v1.tar.gz").name
r02b04_global_grid_path = grids_path.joinpath("r02b04_global")
Expand Down
1 change: 1 addition & 0 deletions model/tox.ini
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ skipsdist = true
passenv =
PIP_USER
PYTHONUSERBASE
SERIALIZED_DATA_PATH
deps =
-r {toxinidir}/requirements-dev.txt
commands =
Expand Down