Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable file locks in MetDataSource.open_dataset #213

Merged
merged 11 commits into from
Jul 5, 2024
Merged

Conversation

thabbott
Copy link
Contributor

@thabbott thabbott commented Jul 1, 2024

Changes

These changes prevent segmentation faults when reading and writing multi-file datasets with recent netCDF4 versions (see #204) and allow us to remove upper limits on netCDF4 and numpy versions.

These changes revert a change (#68) made to address the issue described in pydata/xarray#4406. That issue remains open, so this change may resurface problems with dask threadlocks.

Breaking changes

  • Remove lock=False as a default keyword argument to xr.open_mfdataset in the MetDataSource.open_dataset method. This reverts a change from v0.44.1 and prevents segmentation faults when using recent versions of netCDF4 (1.7.0 and above).

Internals

  • Remove upper limits on netCDF4 and numpy versions.

Tests

  • QC test passes locally (make test)
  • CI tests pass

Reviewer

@zebengberg

@thabbott thabbott marked this pull request as ready for review July 2, 2024 19:59
Copy link
Contributor

@zebengberg zebengberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Tristan!

CHANGELOG.md Outdated

- Remove upper limits on netCDF4 and numpy versions.
- Remove h5netcdf dependency.
- Update doctests with numpy 2 scalar representation (see [NEP 51](https://numpy.org/neps/nep-0051-scalar-representation.html)).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So will the doctests now fail with numpy<2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I added a check in the Makefile (d39d70e) to make this requirement explicit.

Trying to keep doctests compatible with numpy 1 and 2 seems tricky.... You can get the same output for scalars by using print, but that doesn't work for scalars wrapped in some other object:

>>> np.float64(1.0)
1.0              # numpy 1
np.float64(1.0)  # numpy 2
>>> print(np.float64(1.0)
1.0  # numpy 1
1.0  # numpy 2
>>> print((np.float64(1.0), np.float64(2.0))
(1.0, 2.0)                          # numpy 1
(np.float64(1.0), np.float64(2.0))  # numpy 2

Happy to discuss this more if you don't think this seems like the right solution!

pycontrails/datalib/goes.py Outdated Show resolved Hide resolved
@thabbott thabbott merged commit 95df7f2 into main Jul 5, 2024
11 checks passed
@thabbott thabbott deleted the fix/dask-locks branch July 5, 2024 21:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants