Optimised lazy indexing - h5netcdf backend - Active storage reductions #805

Merged
merged 158 commits into from
Oct 30, 2024
Changes from 156 commits
8f95bb2
dev
davidhassell Nov 16, 2022
815d933
dev
davidhassell Nov 16, 2022
1d8e39b
actify methods with @active_stage decorator
davidhassell Nov 17, 2022
24ec636
tidy
davidhassell Nov 17, 2022
d9c9c7f
refactor
davidhassell Nov 18, 2022
54bef6b
refactor
davidhassell Nov 18, 2022
eddd377
dev
davidhassell Nov 21, 2022
0825c56
dev
davidhassell Nov 22, 2022
033fed9
Merge branch 'lama-to-dask' of github.com:NCAS-CMS/cf-python into das…
davidhassell Dec 8, 2022
5d60f10
Merge branch 'dask-active-storage' of github.com:davidhassell/cf-pyth…
davidhassell Dec 8, 2022
044ccc9
dev
davidhassell Feb 8, 2023
f5d2834
dev
davidhassell Feb 9, 2023
02ce7b7
dev
davidhassell Feb 9, 2023
0d4276f
merge conflicts
davidhassell Feb 10, 2023
7dff9a0
linting
davidhassell Feb 10, 2023
ede2946
dev
davidhassell Feb 10, 2023
a32ced6
dev
davidhassell Feb 10, 2023
b24d521
dev
davidhassell Feb 11, 2023
669f3cd
dev
davidhassell Mar 2, 2023
248b67c
Merge branch 'main' of github.com:NCAS-CMS/cf-python into dask-active…
davidhassell Mar 2, 2023
b2b0c7e
dev
davidhassell Mar 2, 2023
e95a624
dev
davidhassell Mar 2, 2023
f74cf7a
dev
davidhassell Mar 3, 2023
d464a04
dev
davidhassell Mar 3, 2023
8bc3a92
dev
davidhassell Mar 3, 2023
68fb18a
dev
davidhassell Mar 17, 2023
3a5c3a2
dev
davidhassell Mar 17, 2023
bd1a1be
Merge branch 'main' of github.com:NCAS-CMS/cf-python into dask-active…
davidhassell Mar 17, 2023
9fcc737
dev
davidhassell Mar 17, 2023
a4c1267
Merge pull request #616 from davidhassell/dask-active-storage
davidhassell Mar 17, 2023
c2e7eca
move netcdf lock
davidhassell Mar 18, 2023
5ee6886
upstream merge
davidhassell Sep 25, 2023
064de91
dev
davidhassell Sep 25, 2023
1383a1d
Merge branch 'main' of github.com:davidhassell/cf-python into active-…
davidhassell Jan 18, 2024
bfbdb33
Merge branch 'main' of github.com:NCAS-CMS/cf-python into active-stor…
davidhassell Jan 22, 2024
78b7269
dev
davidhassell Jan 22, 2024
39a5a64
dev
davidhassell Jan 25, 2024
37f8b7f
dev
davidhassell Jan 25, 2024
fe429b7
dev
davidhassell Jan 26, 2024
417a297
dev
davidhassell Jan 26, 2024
5ef961c
dev
davidhassell Jan 26, 2024
2abc8c4
dev
davidhassell Jan 28, 2024
589bd16
dev
davidhassell Jan 28, 2024
92fc8e2
dev
davidhassell Jan 30, 2024
d54fc40
dev
davidhassell Feb 1, 2024
80ac2e6
dev
davidhassell Feb 2, 2024
62edeb8
dependency versions
davidhassell Feb 2, 2024
ebb94cc
dev
davidhassell Feb 2, 2024
af7c20a
dev
davidhassell Feb 4, 2024
31b2b64
dev
davidhassell Feb 5, 2024
7b6cabe
dev
davidhassell Feb 6, 2024
a038030
dev
davidhassell Feb 7, 2024
c6e94e7
dev
davidhassell Feb 8, 2024
3b8ae98
dev
davidhassell Feb 9, 2024
1f90a48
dev
davidhassell Feb 12, 2024
866ccca
dev
davidhassell Feb 13, 2024
baee889
dev
davidhassell Feb 13, 2024
8108dd6
dev
davidhassell Feb 13, 2024
28fdf10
dev
davidhassell Feb 14, 2024
4fcb960
dev
davidhassell Feb 14, 2024
96cdc8f
dev
davidhassell Feb 16, 2024
16131f8
dev
davidhassell Mar 4, 2024
1023ad0
dev
davidhassell Mar 4, 2024
4334cff
upstream merge
davidhassell Mar 5, 2024
6eef10a
dev
davidhassell Mar 5, 2024
e829e58
dev
davidhassell Mar 5, 2024
e2c892c
dev
davidhassell Mar 11, 2024
aa8d505
dev
davidhassell Mar 12, 2024
14a4de7
dev
davidhassell Mar 15, 2024
4825684
dev
davidhassell Mar 15, 2024
36f1ecc
dev
davidhassell Mar 15, 2024
df2f23b
dev
davidhassell Mar 15, 2024
d01d427
dev
davidhassell Mar 17, 2024
297f33b
dev
davidhassell Mar 17, 2024
c7a9cb9
dev
davidhassell Mar 17, 2024
d48a7cf
dev
davidhassell Mar 18, 2024
1c73b89
dev
davidhassell Mar 18, 2024
80d533d
dev
davidhassell Mar 18, 2024
4bfa673
dev
davidhassell Mar 19, 2024
82079fd
dev
davidhassell Mar 19, 2024
9e6d4a2
dev
davidhassell Mar 19, 2024
be63ec7
dev
davidhassell Mar 20, 2024
2a16242
dev
davidhassell Mar 20, 2024
b3907b2
dev
davidhassell Mar 20, 2024
b8b52a7
dev
davidhassell Mar 20, 2024
81f3794
dev
davidhassell Mar 21, 2024
8c39e35
dev
davidhassell Mar 21, 2024
146b4ef
dev
davidhassell Mar 21, 2024
7e633e6
dev
davidhassell Mar 22, 2024
2ac6cbd
dev
davidhassell Mar 22, 2024
9b373ae
dev
davidhassell Mar 22, 2024
128e7ef
main merge conflicts
davidhassell Mar 26, 2024
2aca4a1
Merge branch 'active-storage' of github.com:davidhassell/cf-python in…
davidhassell Mar 26, 2024
080f227
dev
davidhassell Mar 26, 2024
b127508
dev
davidhassell Apr 3, 2024
3a2ad82
dev
davidhassell Apr 3, 2024
157eeea
dev
davidhassell Apr 4, 2024
ab45235
Merge branch 'main' of github.com:NCAS-CMS/cf-python into active-storage
davidhassell Apr 4, 2024
930812b
dev
davidhassell Apr 4, 2024
0ff02be
Merge branch 'main' of github.com:NCAS-CMS/cf-python into active-storage
davidhassell Apr 5, 2024
a3f805c
dev
davidhassell Apr 5, 2024
75e4897
dev
davidhassell Apr 5, 2024
222a18b
dev
davidhassell Apr 5, 2024
bdbbd6c
dev
davidhassell Apr 6, 2024
d4ec974
dev
davidhassell Apr 8, 2024
20dc358
dev
davidhassell Apr 8, 2024
b3dc1bd
dev
davidhassell Apr 20, 2024
7987bde
dev
davidhassell Apr 21, 2024
18b3e09
dev
davidhassell Apr 22, 2024
87e249e
2-d np index
davidhassell Apr 23, 2024
6973177
dask vn
davidhassell Apr 23, 2024
bac1cc8
fragment get_array
davidhassell Apr 24, 2024
bd625f5
dev
davidhassell Apr 25, 2024
a279f21
Merge branch 'active-storage' of github.com:davidhassell/cf-python in…
davidhassell Apr 26, 2024
aa6d04c
dev
davidhassell May 1, 2024
d05c50b
dev
davidhassell May 2, 2024
9b56aae
engine -> backend
davidhassell Jul 10, 2024
68dce62
merge conflicts
davidhassell Jul 12, 2024
88cdbe6
dev
davidhassell Jul 15, 2024
a1dc78f
new non-dask code start
davidhassell Jul 19, 2024
dc4ce6f
dev
davidhassell Jul 19, 2024
eff61c1
dev
davidhassell Jul 19, 2024
03eeb8c
dev
davidhassell Jul 19, 2024
4c6adad
dev
davidhassell Jul 19, 2024
8125510
dev
davidhassell Jul 22, 2024
57561a0
dev
davidhassell Aug 6, 2024
581648d
dev
davidhassell Aug 6, 2024
baf9898
Fix typos
davidhassell Oct 21, 2024
8697288
Remove dead code
davidhassell Oct 21, 2024
03067a2
Remove dead code
davidhassell Oct 21, 2024
20fe071
When a note isn't a note
davidhassell Oct 21, 2024
ef8d9ae
trap no fragment files
davidhassell Oct 21, 2024
8b0086e
Typo
davidhassell Oct 21, 2024
bd45bda
Update cf.environment docs
davidhassell Oct 21, 2024
e2bdf64
Clarify is_log_level_info docs
davidhassell Oct 21, 2024
af54bd1
dev
davidhassell Oct 21, 2024
cc5aca1
Merge branch 'active-storage-new' of github.com:davidhassell/cf-pytho…
davidhassell Oct 21, 2024
df7a672
Fix typos
davidhassell Oct 22, 2024
9b4f721
Fix typos
davidhassell Oct 22, 2024
7dd8ff5
Typo
davidhassell Oct 22, 2024
96eb691
activestorage installation instructions
davidhassell Oct 22, 2024
2f9a47f
Typo
davidhassell Oct 22, 2024
9b0e8a6
dask_task_graph.png -> dask_task_graph.svg
davidhassell Oct 22, 2024
6af723c
remove redundant active_storage test
davidhassell Oct 22, 2024
dc173a3
fix active doc string
davidhassell Oct 22, 2024
0509135
trap: No module named 'activestorage'
davidhassell Oct 22, 2024
b72c17b
correct hdf5 chunks after data operations
davidhassell Oct 22, 2024
dd92b55
set default mtol=1 everywhere, and update docstrings
davidhassell Oct 22, 2024
66b84ae
warning note about current and futiue Active class APIs
davidhassell Oct 22, 2024
edc51cd
warning note about current and future Active class APIs
davidhassell Oct 22, 2024
9e0b446
linting
davidhassell Oct 22, 2024
885a67d
Fix missing methods in on-line API docs
davidhassell Oct 23, 2024
5d03edf
Remove dead code
davidhassell Oct 23, 2024
e90430a
\emptyset
davidhassell Oct 23, 2024
4db0276
asanyarray0
davidhassell Oct 23, 2024
d75dcd1
asanyarray changes
davidhassell Oct 25, 2024
9d8f8bb
dev
davidhassell Oct 28, 2024
93fa1f0
Active storage placeholder
davidhassell Oct 28, 2024
25 changes: 23 additions & 2 deletions Changelog.rst
@@ -1,3 +1,22 @@
version NEXTVERSION + 1
-----------------------

**2024-??-??**

* Allow access to netCDF-4 files in S3 object stores
(https://github.com/NCAS-CMS/cf-python/issues/712)
* New class `cf.H5netcdfArray`
* New class `cf.NetCDF4Array`
* New class `cf.CFAH5netcdfArray`
* New class `cf.CFANetCDF4Array`
* New dependency: ``h5netcdf>=1.3.0``
* New dependency: ``h5py>=3.10.0``
* New dependency: ``s3fs>=2024.2.0``
* Changed dependency: ``1.11.2.0<=cfdm<1.11.3.0``
* Changed dependency: ``cfunits>=3.3.7``

----

version NEXTVERSION
-------------------

@@ -130,6 +149,8 @@ version 3.16.0
* Changed dependency: ``1.11.0.0<=cfdm<1.11.1.0``
* New dependency: ``scipy>=1.10.0``

----

version 3.15.4
--------------

@@ -268,7 +289,7 @@ version 3.14.1

----

version 3.14.0 (*first Dask release*)
version 3.14.0 (*first Dask version*)
-------------------------------------

**2023-01-31**
@@ -303,7 +324,7 @@ version 3.14.0 (*first Dask release*)

----

version 3.13.1 (*last LAMA release*)
version 3.13.1 (*last LAMA version*)
------------------------------------

**2022-10-17**
27 changes: 16 additions & 11 deletions cf/__init__.py
@@ -13,10 +13,17 @@

* read field constructs from netCDF, CDL, PP and UM datasets,

* read field constructs and domain constructs from netCDF, CDL, PP and
UM datasets with a choice of netCDF backends,

* read files from OPeNDAP servers and S3 object stores,

* create new field constructs in memory,

* write and append field constructs to netCDF datasets on disk,

* read, write, and manipulate UGRID mesh topologies,

* read, write, and create coordinates defined by geometry cells,

* read netCDF and CDL datasets containing hierarchical groups,
@@ -74,8 +81,8 @@
"""

__Conventions__ = "CF-1.11"
__date__ = "2024-04-26"
__version__ = "3.16.2"
__date__ = "2024-??-??"
__version__ = "3.17.0"

_requires = (
"numpy",
@@ -199,8 +206,8 @@
)

# Check the version of cfdm
_minimum_vn = "1.11.1.0"
_maximum_vn = "1.11.2.0"
_minimum_vn = "1.11.2.0"
_maximum_vn = "1.11.3.0"
_cfdm_version = Version(cfdm.__version__)
if not Version(_minimum_vn) <= _cfdm_version < Version(_maximum_vn):
raise RuntimeError(
@@ -209,12 +216,6 @@
)

# Check the version of dask
_minimum_vn = "2022.12.1"
if Version(dask.__version__) < Version(_minimum_vn):
raise RuntimeError(
f"Bad dask version: cf requires dask>={_minimum_vn}. "
f"Got {dask.__version__} at {dask.__file__}"
)

# Check the version of Python
_minimum_vn = "3.8.0"
@@ -274,15 +275,19 @@
from .data.array import (
BoundsFromNodesArray,
CellConnectivityArray,
CFANetCDFArray,
CFAH5netcdfArray,
CFANetCDF4Array,
FullArray,
GatheredArray,
H5netcdfArray,
NetCDFArray,
NetCDF4Array,
PointTopologyArray,
RaggedContiguousArray,
RaggedIndexedArray,
RaggedIndexedContiguousArray,
SubsampledArray,
UMArray,
)

from .data.fragment import (
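The cfdm check in the `cf/__init__.py` diff above accepts only versions in the half-open range `1.11.2.0 <= cfdm < 1.11.3.0`. The following is a minimal sketch of that kind of range test, not the code from the pull request: the diff uses a `Version` class, whereas this sketch assumes purely numeric dotted version strings and compares them as integer tuples. The names `parse_version` and `check_version_range` are hypothetical.

```python
def parse_version(text):
    """Convert a dotted numeric version string into a tuple of ints.

    Simplifying assumption: every component is a plain integer, so
    pre-release or local version suffixes are not supported.
    """
    return tuple(int(part) for part in text.split("."))


def check_version_range(name, installed, minimum, maximum):
    """Return True if minimum <= installed < maximum, else raise RuntimeError."""
    in_range = (
        parse_version(minimum)
        <= parse_version(installed)
        < parse_version(maximum)
    )
    if not in_range:
        raise RuntimeError(
            f"Bad {name} version: requires {minimum}<={name}<{maximum}. "
            f"Got {installed}"
        )

    return True
```

The lower bound is inclusive and the upper bound exclusive, so `1.11.2.0` itself passes while `1.11.3.0` is rejected.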
2 changes: 1 addition & 1 deletion cf/cellmethod.py
@@ -56,7 +56,7 @@ class CellMethod(cfdm.CellMethod):
def __new__(cls, *args, **kwargs):
"""This must be overridden in subclasses.

.. versionadded:: (cfdm) 3.7.0
.. versionadded:: 3.7.0

"""
instance = super().__new__(cls)
82 changes: 33 additions & 49 deletions cf/cfimplementation.py
@@ -26,12 +26,16 @@
TiePointIndex,
)
from .data import Data

# REVIEW: h5: `cfimplementation.py`: import `CFAH5netcdfArray`, `CFANetCDF4Array`, `H5netcdfArray`,`NetCDF4Array`
from .data.array import (
BoundsFromNodesArray,
CellConnectivityArray,
CFANetCDFArray,
CFAH5netcdfArray,
CFANetCDF4Array,
GatheredArray,
NetCDFArray,
H5netcdfArray,
NetCDF4Array,
PointTopologyArray,
RaggedContiguousArray,
RaggedIndexedArray,
@@ -112,65 +116,41 @@ def set_construct(self, parent, construct, axes=None, copy=True, **kwargs):
parent, construct, axes=axes, copy=copy, **kwargs
)

def initialise_CFANetCDFArray(
self,
filename=None,
address=None,
dtype=None,
mask=True,
units=False,
calendar=False,
instructions=None,
substitutions=None,
term=None,
x=None,
**kwargs,
):
"""Return a `CFANetCDFArray` instance.
# REVIEW: h5: `initialise_CFANetCDF4Array`: new method to initialise `CFANetCDF4Array`
def initialise_CFANetCDF4Array(self, **kwargs):
"""Return a `CFANetCDF4Array` instance.

:Parameters:

filename: `str`

address: (sequence of) `str` or `int`

dytpe: `numpy.dtype`

mask: `bool`, optional
kwargs: optional
Initialisation parameters to pass to the new instance.

units: `str` or `None`, optional
:Returns:

calendar: `str` or `None`, optional
`CFANetCDF4Array`

instructions: `str`, optional
"""
cls = self.get_class("CFANetCDF4Array")
return cls(**kwargs)

substitutions: `dict`, optional
# REVIEW: h5: `initialise_CFAH5netcdfArray`: new method to initialise `CFAH5netcdfArray`
def initialise_CFAH5netcdfArray(self, **kwargs):
"""Return a `CFAH5netcdfArray` instance.

term: `str`, optional
.. versionadded:: NEXTVERSION

x: `dict`, optional
:Parameters:

kwargs: optional
Ignored.
Initialisation parameters to pass to the new instance.

:Returns:

`CFANetCDFArray`
`CFAH5netcdfArray`

"""
cls = self.get_class("CFANetCDFArray")
return cls(
filename=filename,
address=address,
dtype=dtype,
mask=mask,
units=units,
calendar=calendar,
instructions=instructions,
substitutions=substitutions,
term=term,
x=x,
)
cls = self.get_class("CFAH5netcdfArray")
return cls(**kwargs)


_implementation = CFImplementation(
@@ -179,7 +159,8 @@ def initialise_CFANetCDFArray(
CellConnectivity=CellConnectivity,
CellMeasure=CellMeasure,
CellMethod=CellMethod,
CFANetCDFArray=CFANetCDFArray,
CFAH5netcdfArray=CFAH5netcdfArray,
CFANetCDF4Array=CFANetCDF4Array,
CoordinateReference=CoordinateReference,
DimensionCoordinate=DimensionCoordinate,
Domain=Domain,
@@ -202,7 +183,8 @@ def initialise_CFANetCDFArray(
BoundsFromNodesArray=BoundsFromNodesArray,
CellConnectivityArray=CellConnectivityArray,
GatheredArray=GatheredArray,
NetCDFArray=NetCDFArray,
H5netcdfArray=H5netcdfArray,
NetCDF4Array=NetCDF4Array,
PointTopologyArray=PointTopologyArray,
RaggedContiguousArray=RaggedContiguousArray,
RaggedIndexedArray=RaggedIndexedArray,
@@ -236,7 +218,8 @@ def implementation():
'CellConnectivityArray': cf.data.array.cellconnectivityarray.CellConnectivityArray,
'CellMeasure': cf.cellmeasure.CellMeasure,
'CellMethod': cf.cellmethod.CellMethod,
'CFANetCDFArray': cf.data.array.cfanetcdfarray.CFANetCDFArray,
'CFAH5netcdfArray': cf.data.array.cfah5netcdfarray.CFAH5netcdfArray,
'CFANetCDF4Array': cf.data.array.cfanetcdf4array.CFANetCDF4Array,
'CoordinateReference': cf.coordinatereference.CoordinateReference,
'DimensionCoordinate': cf.dimensioncoordinate.DimensionCoordinate,
'Domain': cf.domain.Domain,
@@ -257,7 +240,8 @@ def implementation():
'PartNodeCountProperties': cf.partnodecountproperties.PartNodeCountProperties,
'Data': cf.data.data.Data,
'GatheredArray': cf.data.array.gatheredarray.GatheredArray,
'NetCDFArray': cf.data.array.netcdfarray.NetCDFArray,
'H5netcdfArray': cf.data.array.h5netcdfarray.H5netcdfArray,
'NetCDF4Array': cf.data.array.netcdf4array.NetCDF4Array,
'PointTopologyArray': <class 'cf.data.array.pointtopologyarray.PointTopologyArray'>,
'RaggedContiguousArray': cf.data.array.raggedcontiguousarray.RaggedContiguousArray,
'RaggedIndexedArray': cf.data.array.raggedindexedarray.RaggedIndexedArray,
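The `cf/cfimplementation.py` diff above replaces a long explicit parameter list with `initialise_*` methods that look up a class by name with `get_class` and forward `**kwargs` to it unchanged. The sketch below illustrates that pattern in isolation. It is not the cf or cfdm implementation: `Implementation`, `initialise` and `DemoArray` are hypothetical stand-ins introduced only for this example.

```python
class Implementation:
    """Hypothetical minimal class registry, keyed by class name."""

    def __init__(self, **classes):
        self._classes = dict(classes)

    def get_class(self, name):
        """Return the registered class for the given name."""
        return self._classes[name]

    def initialise(self, name, **kwargs):
        """Create an instance, passing all keyword arguments through."""
        cls = self.get_class(name)
        return cls(**kwargs)


class DemoArray:
    """Hypothetical array class that only records its parameters."""

    def __init__(self, filename=None, address=None, **kwargs):
        self.filename = filename
        self.address = address
        self.extra = kwargs


impl = Implementation(DemoArray=DemoArray)
array = impl.initialise(
    "DemoArray", filename="file.nc", address="tas", mask=True
)
```

Forwarding `**kwargs` means the registry method does not need to change when the array class gains or loses an initialisation parameter, at the cost of no longer documenting each parameter at the call site.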
4 changes: 4 additions & 0 deletions cf/constants.py
@@ -63,6 +63,10 @@
"LOG_LEVEL": logging.getLevelName(logging.getLogger().level),
"BOUNDS_COMBINATION_MODE": "AND",
"CHUNKSIZE": parse_bytes(_CHUNKSIZE),
# REVIEW: active: `CONSTANTS`: new constants 'active_storage', 'active_storage_url', 'active_storage_max_requests'
"active_storage": False,
"active_storage_url": None,
"active_storage_max_requests": 100,
}

masked = np.ma.masked
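The `cf/constants.py` diff above adds three defaults: `active_storage` (`False`), `active_storage_url` (`None`) and `active_storage_max_requests` (`100`). As a rough illustration of how such settings might be validated before being stored, here is a sketch under stated assumptions: the helper names `_is_valid` and `set_constant` are hypothetical, and the type rules (boolean flag, optional string URL, positive integer limit) are inferred from the default values rather than taken from the pull request.

```python
# Defaults copied from the diff; everything below them is illustrative.
DEFAULTS = {
    "active_storage": False,
    "active_storage_url": None,
    "active_storage_max_requests": 100,
}


def _is_valid(name, value):
    """Check a value against the assumed type rule for its constant."""
    if name == "active_storage":
        return isinstance(value, bool)

    if name == "active_storage_url":
        return value is None or isinstance(value, str)

    if name == "active_storage_max_requests":
        # bool is a subclass of int, so exclude it explicitly
        return (
            isinstance(value, int)
            and not isinstance(value, bool)
            and value > 0
        )

    return False  # unknown constant name


def set_constant(constants, name, value):
    """Store a validated value and return the previous one."""
    if not _is_valid(name, value):
        raise ValueError(f"Invalid value for {name!r}: {value!r}")

    old = constants[name]
    constants[name] = value
    return old
```

Returning the previous value lets a caller restore the old setting after a temporary change.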
13 changes: 12 additions & 1 deletion cf/data/array/__init__.py
@@ -1,9 +1,20 @@
from .boundsfromnodesarray import BoundsFromNodesArray
from .cellconnectivityarray import CellConnectivityArray
from .cfanetcdfarray import CFANetCDFArray

# REVIEW: h5: `__init__.py`: import `CFAH5netcdfArray`
from .cfah5netcdfarray import CFAH5netcdfArray

# REVIEW: h5: `__init__.py`: import `CFANetCDF4Array`
from .cfanetcdf4array import CFANetCDF4Array
from .fullarray import FullArray
from .gatheredarray import GatheredArray

# REVIEW: h5: `__init__.py`: import `H5netcdfArray`
from .h5netcdfarray import H5netcdfArray
from .netcdfarray import NetCDFArray

# REVIEW: h5: `__init__.py`: import `NetCDF4Array`
from .netcdf4array import NetCDF4Array
from .pointtopologyarray import PointTopologyArray
from .raggedcontiguousarray import RaggedContiguousArray
from .raggedindexedarray import RaggedIndexedArray
11 changes: 11 additions & 0 deletions cf/data/array/cfah5netcdfarray.py
@@ -0,0 +1,11 @@
# REVIEW: h5: `CFAH5netcdfArray`: New class for accessing CFA with `h5netcdf`
from .h5netcdfarray import H5netcdfArray
from .mixin import CFAMixin


class CFAH5netcdfArray(CFAMixin, H5netcdfArray):
"""A CFA-netCDF array accessed with `h5netcdf`

.. versionadded:: NEXTVERSION

"""
11 changes: 11 additions & 0 deletions cf/data/array/cfanetcdf4array.py
@@ -0,0 +1,11 @@
# REVIEW: h5: `CFANetCDF4Array`: New class for accessing CFA with `netCDF4`
from .mixin import CFAMixin
from .netcdf4array import NetCDF4Array


class CFANetCDF4Array(CFAMixin, NetCDF4Array):
"""A CFA-netCDF array accessed with `netCDF4`.

.. versionadded:: NEXTVERSION

"""
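Both new CFA classes above are built the same way: `CFAMixin` is listed first, followed by the backend-specific array class. The sketch below shows why the mixin comes first in a layout like this. It uses hypothetical stand-in classes (`DemoBaseArray`, `DemoH5Array`, `DemoAggregationMixin`, `DemoAggregatedH5Array`) and makes no claim about what `CFAMixin` itself overrides.

```python
class DemoBaseArray:
    """Hypothetical base class for file-backed arrays."""

    def __init__(self, filename=None):
        self.filename = filename

    def describe(self):
        return f"array in {self.filename}"


class DemoH5Array(DemoBaseArray):
    """Hypothetical backend-specific array class."""

    def backend(self):
        return "h5netcdf"


class DemoAggregationMixin:
    """Hypothetical mixin that extends behaviour of any array class."""

    def describe(self):
        # super() resolves to the next class in the MRO, here the
        # backend-specific array class
        return "aggregated " + super().describe()


class DemoAggregatedH5Array(DemoAggregationMixin, DemoH5Array):
    """Mixin first, so its methods take precedence over the backend's."""


array = DemoAggregatedH5Array("f.nc")
mro_names = [cls.__name__ for cls in DemoAggregatedH5Array.__mro__]
```

Because the mixin precedes the backend class in the method resolution order, the same mixin can be combined with a second backend class without duplicating any code, which is the shape of the two eleven-line modules in the diff.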