Update CoCiP-grid data access patterns to reduce duplicate chunk downloads #171

thabbott · 2024-04-02T20:09:00Z

Changes

Updates data access patterns in CocipGrid to avoid discarding and reloading met data after it has already been loaded into memory.

Benchmarks for the API preprocessor (https://github.com/contrailcirrus/api-preprocessor/tree/triage/coarse-fine-zarr-chunking):

	`main`	`feature/cocip-grid-access-patterns`
Remote zarr store	Wall time: 386 s Peak RAM: 3369 MiB Ingress: 70.4 GiB Model-level variables: 2.15 downloads/chunk Single-level variables: 2.60 downloads/chunk	Wall time: 202 s Peak RAM: 4037 MiB Ingress: 38.3 GiB Model-level variables: 1.17 downloads/chunk Single-level variables: 1.80 downloads/chunk
Remote zarr store, no dask	Wall time: 235 s Peak RAM: 25298 MiB Ingress: 58.8 GiB Model-level variables: 1.55 downloads/chunk Single-level variables: 1.97 downloads/chunk	Wall time: 202 s Peak RAM: 28050 MiB Ingress: 37.9 GiB Model-level variables: 1.16 downloads/chunk Single-level variables: 1.97 downloads/chunk
Local zarr store	Wall time: 245 s Peak RAM: 3726 MiB	Wall time: 210 s Peak RAM: 4530 MiB
Local zarr store, no dask	Wall time: 303 s Peak RAM: 22952 MiB	Wall time: 297 s Peak RAM: 22966 MiB

Numbers are all running on a c3d-highmem-4 VM (4 vCPUs, 32 GB RAM). Downloads/chunks are averages over all accessed chunks.

When using dask, the updated data access pattern cuts ingress and wall time by something like a factor of two, but comes with some memory overhead (a bit less than 1 GiB).

I've included benchmarks without dask (disabled by calling xr.open_zarr with chunks=None) to check that I'm not causing any major performance regressions for that case, but disabling dask increases memory consumption significantly, so I don't think those benchmarks are particularly relevant for the API preprocessor. (I will admit that I don't understand why dask-less benchmarks are slower with a local zarr store than a remote zarr store.... Not sure that's worth digging into, though.)

@nickmasson, @mlshapiro, open to feedback on whether this seems like a useful improvement. If the consensus is yes, I'll update unit tests (failing since the latest xarray release) and add a test or two to check for expected behavior in the updated _maybe_downselect_met_rad.

Internals

Modifies _maybe_downselect_met_rad method of CocipGrid to reuse in-memory data when new forecast steps are selected.

Tests

QC test passes locally (make test)
CI tests pass

Reviewer

Select @ reviewer

nickmasson · 2024-04-02T22:14:19Z

@thabbott , this is solid 🦾

I'm of the opinion that it is worth adding these changes and cutting a release.

The additional RAM is a bit surprising, and it isn't a clear cut win for the case where we localize the zarr store. But, given that our nominal case is using a remote zarr store, this seems like a good improvement. Furthermore, if we are operating in a network-contrained environment, the ~50% decrease in data over the wire could lead to an even more substantive gain that what you've profiled here.

Unless there are other concerns, I think full steam ahead 🚀

Once there is a new release of pycontrails, I'll pull it into the api-preprocessor (along with other WIP), and get some numbers based on it running in k8s vs. a VM.

…hanges

…ipy versions

thabbott · 2024-04-05T19:48:20Z

A few other notes after spending some time cleaning up this PR:

Some unit tests broke because changes to netCDF decoding introduced in https://github.com/pydata/xarray/releases/tag/v2024.03.0 (correctly encode/decode _FillValues/missing_values/dtypes for packed data pydata/xarray#8713) lead to slight changes in some fixtures that rely on netCDF files with variables serialized as int16. This isn't something that you can fix by coercing from float64 to float32 after decoding (the value you get still differs from the old decoding straight from int16 to float32). I've updated unit tests to pass for v2024.03.0 (currently used in CI), but this means that unit tests will fail with older versions of xarray. Ideally we would update fixtures to read from netCDF files with values serialized as floats (no decoding necessary), but I don't think we have all the scripts used to generate existing fixtures.
scipy 1.13.0 (https://github.com/scipy/scipy/releases/tag/v1.13.0) introduced some changes to spline interpolation (ENH: use NdBSpline in RegularGridInterpolator to speed up evaluations scipy/scipy#19633) that are incompatible with the current pycontrails RegularGridInterpolator. I'm dealing with this by explicitly falling back to legacy interpolation methods when using scipy 1.13.0 and newer, but we should consider whether we want to support the new spline interpolation methods.

nickmasson

I can't speak to the implementation adjustments following the package version updates. But the implementation logic vis a vis what we discussed looks appropriate, and the summary numbers in the table position this change as a valuable enhancement.

noting:
I haven't been following the recent releases for pycontrails, but it appears that the recent resample_and_fill updates will also go out with this release. I'll make adjustments, if warranted, to the flights-pipeline/spire-ingest-resample-worker, as that was the instigator for those changes.

mlshapiro

This all looks good, thank you for fixing the scipy problem.

If you want me to check into the nc file fixtures for the tests, i'm happy to go back through that

pycontrails/core/interpolation.py

tests/unit/test_cocip_grid.py

tests/unit/conftest.py

draft updated logic for step selection

966648f

thabbott added 4 commits April 4, 2024 20:56

move tests without BADA dependency before tests with BADA dependency

5838f2a

clean up and test step selection logic

cd4e442

fix dtype mismatch by manually converting to float32 where needed

6ba2e80

update values in unit tests to pass post xarray v2024.03.0 decoding c…

825eb39

…hanges

thabbott force-pushed the feature/cocip-grid-access-patterns branch from 75b9af6 to 825eb39 Compare April 5, 2024 17:59

thabbott added 4 commits April 5, 2024 18:38

update values in bada-dependent unit tests

2c2e3f4

update interpolation to fall back on legacy spline methods for new sc…

4f21b1c

…ipy versions

update doctests with post xarray v2024.03.0 outputs

f6deba9

update changelog

442f887

thabbott requested review from mlshapiro and nickmasson April 5, 2024 19:48

thabbott marked this pull request as ready for review April 5, 2024 19:49

thabbott changed the title ~~WIP: update CoCiP-grid data access patterns to reduce duplicate chunk downloads~~ Update CoCiP-grid data access patterns to reduce duplicate chunk downloads Apr 5, 2024

nickmasson approved these changes Apr 5, 2024

View reviewed changes

mlshapiro approved these changes Apr 5, 2024

View reviewed changes

pycontrails/core/interpolation.py Show resolved Hide resolved

tests/unit/test_cocip_grid.py Show resolved Hide resolved

tests/unit/conftest.py Show resolved Hide resolved

thabbott merged commit ff3425d into main Apr 8, 2024
11 checks passed

thabbott deleted the feature/cocip-grid-access-patterns branch April 8, 2024 12:41

thabbott mentioned this pull request Apr 18, 2024

Update doctest fixtures for consistency with xarray v2024.3.0 #178

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update CoCiP-grid data access patterns to reduce duplicate chunk downloads #171

Update CoCiP-grid data access patterns to reduce duplicate chunk downloads #171

thabbott commented Apr 2, 2024 •

edited

Loading

nickmasson commented Apr 2, 2024

thabbott commented Apr 5, 2024 •

edited

Loading

nickmasson left a comment

mlshapiro left a comment

Update CoCiP-grid data access patterns to reduce duplicate chunk downloads #171

Update CoCiP-grid data access patterns to reduce duplicate chunk downloads #171

Conversation

thabbott commented Apr 2, 2024 • edited Loading

Changes

Internals

Tests

Reviewer

nickmasson commented Apr 2, 2024

Unless there are other concerns, I think full steam ahead 🚀

thabbott commented Apr 5, 2024 • edited Loading

nickmasson left a comment

Choose a reason for hiding this comment

mlshapiro left a comment

Choose a reason for hiding this comment

thabbott commented Apr 2, 2024 •

edited

Loading

thabbott commented Apr 5, 2024 •

edited

Loading