Spec v1 for Ike #18

Merged
merged 1 commit into from
May 14, 2021

Conversation

martindurant
Member

For information: templating for the one Ike file reduces the JSON size by 50%.
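To illustrate why templating can shrink the JSON, here is a minimal sketch, assuming a version 1 reference layout with a `templates` mapping and `refs` entries of the form `[url, offset, length]`. The URL, key names, and byte ranges are hypothetical, and the plain string substitution in `expand` stands in for whatever template rendering the real reader performs.

```python
import json

# Hypothetical URL and byte ranges, chosen only for illustration.
URL = "s3://example-bucket/some/long/prefix/ike_example.nc"
chunks = {f"zeta/{i}.0": (4096 + i * 1000, 1000) for i in range(100)}

# Without templates: the full URL is repeated in every reference.
plain = {
    "version": 1,
    "refs": {key: [URL, offset, length] for key, (offset, length) in chunks.items()},
}

# With templates: the URL is stored once and referred to as "{{u}}".
templated = {
    "version": 1,
    "templates": {"u": URL},
    "refs": {key: ["{{u}}", offset, length] for key, (offset, length) in chunks.items()},
}


def expand(spec):
    """Substitute template variables back into each reference URL."""
    templates = spec.get("templates", {})
    out = {}
    for key, (url, offset, length) in spec["refs"].items():
        for name, value in templates.items():
            url = url.replace("{{" + name + "}}", value)
        out[key] = [url, offset, length]
    return out


plain_size = len(json.dumps(plain))
templated_size = len(json.dumps(templated))
print(plain_size, templated_size)
```

The saving depends on how long the URL is relative to the rest of each entry, so the ratio printed here will not necessarily match the figure quoted for the Ike file.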

@martindurant
Member Author

This requires latest fsspec, of course.

I have renamed the class to SingleHdf5ToZarr and made all the methods except translate private, because translate is the only one that should get called. I intend to implement multi-HDF too, but I don't yet know how. It may require xarray to handle the combine logic.

Question: if I have an xarray dataset from open_mfdataset, can I tell which ranges of coordinates come from which input file? @rabernat (please ping whoever is most likely to know if you don't).

@martindurant
Member Author

cc @rsignell-usgs

@pbranson

> Question: if I have an xarray dataset from open_mfdataset, can I tell which ranges of coordinates come from which input file?

Not sure if this helps, but the impression from watching various data processing on the dask dashboard is that the file path is embedded within the delayed _open_dataset function signatures that back the dask arrays composed by open_mfdataset.

There is a ds[var].encoding['source'], but I believe that lists only the first file for open_mfdataset; see pydata/xarray#2550

So I don't think there is an explicit, accessible mapping from coordinates to files. To save the overhead of concatenating many files, I have pickled dask-backed xarray objects, which is fragile but works, so the file paths are definitely in there somewhere!

@martindurant
Member Author

Thanks @pbranson - I think I was coming to the same conclusion, that I would have to look at the chunks of the dask arrays. I am not sure about where to put the derived coordinates aggregated from the many files, typically "time".
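A rough sketch of the chunk-based idea, under the assumption that each input file contributes exactly one dask chunk along the concatenated dimension and that chunks appear in the same order as the file list. That need not hold, for example when an explicit chunks argument splits files further. The file names and chunk sizes below are hypothetical; in practice the sizes would come from something like the chunks attribute of a variable along "time".

```python
from itertools import accumulate


def file_index_ranges(paths, chunk_sizes):
    """Map each path to its half-open [start, stop) index range on the concat axis."""
    if len(paths) != len(chunk_sizes):
        raise ValueError("expected exactly one chunk per input file")
    # Running totals of the chunk sizes give each file's stop index.
    stops = list(accumulate(chunk_sizes))
    # Each file starts where the previous one stopped.
    starts = [0] + stops[:-1]
    return {path: (start, stop) for path, start, stop in zip(paths, starts, stops)}


# Hypothetical inputs: three files with 24, 24 and 12 time steps.
paths = ["ike_part0.nc", "ike_part1.nc", "ike_part2.nc"]
time_chunks = (24, 24, 12)

ranges = file_index_ranges(paths, time_chunks)
print(ranges)
```

Index ranges like these could then be used to slice the aggregated coordinate and recover the coordinate values that belong to each file.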

@martindurant martindurant merged commit 1d4f27e into fsspec:main May 14, 2021
@martindurant martindurant deleted the spec1 branch July 11, 2023 15:33