Handle scalar dataset variables #205
virtualizarr/tests/test_xarray.py
@@ -419,6 +419,32 @@ def test_open_virtual_dataset_passes_expected_args(
    }
    mock_read_kerchunk.assert_called_once_with(**args)


@patch("virtualizarr.kerchunk.parse_array_refs")
Wait, what exactly is this mocking doing?
I couldn't reproduce the issue just by creating an xarray dataset with a 0-dim variable, so I debugged the problem on a real netCDF file and mocked out the return value of `parse_array_refs` to match what I was seeing. Here's a repro if you'd like to give it a shot:
from urllib.parse import urlparse

import xarray as xr
import planetary_computer
import virtualizarr


def _href_to_fsspec(href: str) -> str:
    url = urlparse(href)
    return f"az://{url.path.lstrip('/')}"


def open_planetary_computer_netcdf(href):
    account_name = href.split("/")[2].split(".")[0]
    container_name = href.split("/")[3]
    fs = planetary_computer.get_adlfs_filesystem(account_name, container_name)
    return fs.open(_href_to_fsspec(href))


def _get_storage_options(href: str, use_sas_as_credential: bool = False) -> dict:
    url = urlparse(href)
    if use_sas_as_credential:
        return {
            "account_name": url.hostname.split(".")[0],
            "account_key": url.query,
            "anon": False,
        }
    else:
        return {
            "account_name": url.hostname.split(".")[0],
            "anon": False,
        }


href = "https://landcoverdata.blob.core.windows.net/esa-cci-lc/netcdf/C3S-LC-L4-LCCS-Map-300m-P1Y-2016-v2.1.1.nc"

# Next line works; you can inspect the `crs` variable
ds = xr.open_dataset(open_planetary_computer_netcdf(href))

# Next line will raise a `ValueError` unpacking a list of no elements when handling the `crs` kerchunk attrs
vds = virtualizarr.open_virtual_dataset(
    _href_to_fsspec(href),
    reader_options={
        "storage_options": _get_storage_options(
            planetary_computer.sign(href), use_sas_as_credential=True
        )
    },
)
@ghidalgo3 I just met some folks from the HDF Group at a conference last week, so I can ask them for the proper incantation to create an HDF fixture with a scalar 👍
Thank you for the explanation! I'm happy to merge this as-is, or to wait if you want to test it some other way.
@ghidalgo3 Here is an HDF5 fixture you could use for direct testing of this behavior without mocking.
Note that I used "empty" terminology here rather than "scalar".
Which still has a …
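The fixture from that comment is not preserved here. As a hedged sketch (not the fixture that was posted), the following h5py function writes one dataset of each kind. In HDF5 these differ: a scalar dataspace holds exactly one value with shape `()`, while an empty (null) dataspace has a dtype but no shape and no data, which h5py reports as shape `None`. The function name, dataset names, and stored value are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np


def write_scalar_and_empty_hdf5(path):
    """Hypothetical test fixture: one scalar and one empty HDF5 dataset."""
    with h5py.File(path, "w") as f:
        # Scalar dataspace: shape () holding exactly one value.
        f.create_dataset("scalar", data=np.float64(0.1))
        # Null ("empty") dataspace: a dtype, but no shape and no data.
        f.create_dataset("empty", data=h5py.Empty("f"))
    return path


with tempfile.TemporaryDirectory() as tmp:
    path = write_scalar_and_empty_hdf5(os.path.join(tmp, "scalar_empty.h5"))
    with h5py.File(path, "r") as f:
        print(f["scalar"].shape, f["scalar"][()])  # () 0.1
        print(f["empty"].shape)  # None
```

Reading the scalar back with `[()]` is also a direct way to check that its data value survives a round trip.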
Thanks @sharkinsspatial for the fixtures! I added test cases for both scenarios, without that ugly mock. @TomNicholas, if it looks good to you, it's ready to merge. UPDATE: if you see this in the next 30 minutes, hold off while I check that the data value is preserved for the scalar case.
I'm running into the problem described in #62, but I'd rather not solve it in this PR. You can open files with scalar variables and empty variables, but if you call … Given that, this PR is good to merge.
@@ -1,3 +1,4 @@
import h5py
I think `h5py` should be added to the test requirements?
Yes, good catch. Is there a way to construct the CI conda environment starting from the `test` extra plus additional conda packages?
Probably, but let's do that in a follow-up PR.
Some real-world NetCDF files have scalar variables that kerchunk parses to a `ChunkDict` of `{}`. Previously, these variables had to be explicitly loaded or dropped to make an `open_virtual_dataset` call work. With this change, such a variable is loaded into a 0-dimensional `xr.Variable`, and its attributes are passed up to the Xarray dataset.
… `drop_variables` declaration #194
`docs/releases.rst`
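The behavior described above can be sketched as follows, assuming a kerchunk-style reference dict per variable (`.zarray`/`.zattrs` metadata keys plus chunk keys). `chunk_entries` and `scalar_or_virtual` are hypothetical helpers, not VirtualiZarr functions; the only real API used is `xr.Variable`. The last lines show the kind of unpacking error the repro mentions, assuming code that expects exactly one chunk.

```python
import xarray as xr


def chunk_entries(var_refs):
    """Hypothetical helper: keep kerchunk chunk keys, drop .zarray/.zattrs."""
    return {key: ref for key, ref in var_refs.items() if not key.startswith(".")}


def scalar_or_virtual(var_refs, attrs, load_value):
    """Sketch: an empty chunk dict becomes an in-memory 0-d Variable."""
    chunks = chunk_entries(var_refs)
    if not chunks:
        # Nothing on disk to reference, so read the value eagerly and keep attrs.
        return xr.Variable(dims=(), data=load_value(), attrs=attrs)
    # The chunked (virtual) path is out of scope for this sketch.
    raise NotImplementedError("only the empty-chunk-dict case is sketched")


# Made-up example values, not taken from the ESA CCI file.
refs = {".zarray": "{}", ".zattrs": "{}"}
attrs = {"long_name": "coordinate reference system"}
var = scalar_or_virtual(refs, attrs, load_value=lambda: 0)
print(var.ndim, int(var.values), var.attrs)

# Code that assumes exactly one chunk fails on the same input:
try:
    (only_chunk,) = chunk_entries(refs).values()
except ValueError as err:
    print(err)  # not enough values to unpack (expected 1, got 0)
```

Loading the value eagerly is reasonable here because a scalar is a single element, so there is little to gain from referencing it virtually.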