
Add encode_cf, decode_cf #69

Merged: 21 commits merged on Jul 17, 2024

@dcherian (Contributor) commented Jul 1, 2024

xref #26, #48

This is really a proof of concept, but it works for that demo notebook:

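import xarray as xr
import xvec  # noqa: F401, registers the .xvec accessor

encoded = cube.xvec.encode_cf()  # cube: the demo notebook's geometry-indexed Dataset
encoded.to_zarr("cube.zarr")
roundtripped = xr.open_zarr("cube.zarr").xvec.decode_cf()
roundtripped.identical(cube)  # True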
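For context on what an encoded dataset looks like: the CF conventions (1.8 and later) store geometries as plain node-coordinate arrays plus a "geometry container" variable whose attributes (geometry_type, node_coordinates, ...) describe how to reassemble them, and that is the encoding cf_xarray's geometry helpers target. Below is a dependency-free sketch of the idea for single-node point geometries only. The helpers encode_points and decode_points are hypothetical and are not xvec's or cf_xarray's code.

```python
# Hypothetical sketch of CF-style point geometry encoding; not xvec/cf_xarray code.
# Only single-node point geometries are handled, so no node_count is needed.


def encode_points(points):
    """Split (x, y) tuples into node-coordinate lists plus a geometry container."""
    return {
        "x": [p[0] for p in points],
        "y": [p[1] for p in points],
        # Attributes a CF geometry container variable would carry for points.
        "geometry_container": {"geometry_type": "point", "node_coordinates": "x y"},
    }


def decode_points(encoded):
    """Rebuild (x, y) tuples by following the container's node_coordinates."""
    container = encoded["geometry_container"]
    if container["geometry_type"] != "point":
        raise ValueError("this sketch only handles point geometries")
    x_name, y_name = container["node_coordinates"].split()
    return list(zip(encoded[x_name], encoded[y_name]))


points = [(0.0, 1.0), (2.5, -3.0)]
encoded = encode_points(points)
roundtripped = decode_points(encoded)
print(roundtripped == points)  # True
```

Lines, polygons, and multipart geometries additionally need node_count (and part_node_count) variables, which this sketch deliberately leaves out.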

@martinfleis (Member)

This is cool! I am fine with the initial limit of one geometry coordinate, as that would already give us parity with R's {stars} implementation (I think; my information may not be entirely up to date).

Is there any blocker preventing this from moving beyond a POC?

@dcherian (Contributor, Author) commented Jul 1, 2024

> Is there any blocker preventing this from moving beyond a POC?

Tests and docstrings. Do you want cf_xarray as a required or optional dependency?

@martinfleis (Member)

> Do you want cf_xarray as a required or optional dependency?

Given that cf_xarray itself depends only on xarray, I think it is fine to depend on it directly.

@dcherian (Contributor, Author) commented Jul 1, 2024

Boom! With xarray-contrib/cf-xarray#526, xvec supports encoding/decoding multiple geometries.

The roundtrip tests fail because of an extra crs attribute added in decode_cf. Can you help me fix that? Should we be deleting that attribute?

It could use quite a bit of testing :)

Comment on lines +1319 to +1321
decoded = decoded.xvec.set_geom_indexes(
dim, crs=crs.get(decoded[dim].attrs.get("grid_mapping", None))
)
@dcherian (Contributor, Author), Jul 1, 2024

This is the key buggy line: it always sets the index, because we do not record which geometry dims were indexed at encode time. What should we do here?

As an aside, it'd be nice for set_geom_indexes to understand the grid_mapping convention. WDYT?

One approach: decode_cf does NOT set the new index; the user does so manually. In exchange, set_geom_indexes learns how to interpret the grid_mapping convention so that the CRS is set properly by default.

Member

> it always sets the index, we do not record which geometry dims were indexed at encode-time. What should we do here?

Is that an issue if we just index all geom dims encoded in the file?

> As an aside it'd be nice for set_geom_indexes to understand the grid_mapping convention. WDYT?

Not against it, but I don't really know what it would mean implementation-wise. Maybe just a simple call to pyproj.CRS.from_cf?

> set_geom_indexes learns how to interpret the grid_mapping convention so CRS is set properly by default.

That would be preferable. I'm not a fan of asking users to set indexes manually after reading data that was already indexed before writing.
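A minimal sketch of the lookup set_geom_indexes could perform under the grid_mapping convention discussed here: read the variable's grid_mapping attribute, find the grid-mapping variable it names, and turn that variable's attributes into a CRS. The function name resolve_grid_mapping is hypothetical, and the CRS constructor is passed in as to_crs; in practice it would presumably be pyproj.CRS.from_cf, as suggested above, but that is an assumption. Only the simple form of the attribute (a single variable name) is handled, not CF's extended "name: coord1 coord2" form.

```python
from typing import Any, Callable, Mapping, Optional

# Hypothetical helper: resolve a CRS through the CF ``grid_mapping`` attribute.
# ``to_crs`` is injected; with pyproj installed it could be pyproj.CRS.from_cf.


def resolve_grid_mapping(
    var_attrs: Mapping[str, Any],
    grid_mappings: Mapping[str, Mapping[str, Any]],
    to_crs: Callable[[Mapping[str, Any]], Any],
) -> Optional[Any]:
    """Return a CRS for a variable, or None if it has no grid_mapping attribute."""
    gm_name = var_attrs.get("grid_mapping")
    if gm_name is None:
        return None  # no CRS information recorded for this variable
    if gm_name not in grid_mappings:
        raise KeyError(f"grid_mapping variable {gm_name!r} not found")
    return to_crs(grid_mappings[gm_name])


grid_mappings = {"spatial_ref": {"grid_mapping_name": "latitude_longitude"}}
crs = resolve_grid_mapping({"grid_mapping": "spatial_ref"}, grid_mappings, to_crs=dict)
print(crs)  # {'grid_mapping_name': 'latitude_longitude'}
```

Passing to_crs=dict just echoes the attributes so the sketch runs without pyproj. Returning None when the attribute is absent mirrors the crs.get(..., None) fallback in the line under review, while a dangling grid_mapping name is treated as an error rather than silently ignored.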

Comment on lines +83 to +84
ds["geom"].attrs["crs"] = ds.xindexes["geom"].crs
ds["geom_z"].attrs["crs"] = ds.xindexes["geom_z"].crs
Contributor, Author

Is there a reason you can't set these in GeometryIndex.create_variables?

Member

Apart from "no one thought about that until now", I am not aware of any.

@martinfleis (Member)

> The roundtrip tests fail because of an extra crs attribute added in decode_cf. Can you help me fix that? Should we be deleting that attribute?

Given that the CRS information is now stored in the index itself, I guess we can just drop it? I'm not sure about the consequences.

@dcherian (Contributor, Author) commented Jul 2, 2024

Yes, I don't see why you would duplicate it.
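Why an extra attribute breaks the roundtrip test: xarray's Dataset.identical also compares the attributes on every variable and coordinate, while Dataset.equals ignores attributes. A minimal sketch with plain xarray, where placeholder strings stand in for geometries and "EPSG:4326" is an arbitrary example value, not necessarily what decode_cf writes:

```python
import xarray as xr

# Placeholder strings stand in for shapely geometries in this sketch.
original = xr.Dataset(coords={"geom": ["POINT (0 0)", "POINT (1 1)"]})

# Simulate a decoder that stores the CRS as an attribute as well as in the index.
decoded = original.copy(deep=True)
decoded["geom"].attrs["crs"] = "EPSG:4326"

print(decoded.equals(original))     # True: values and coordinates match
print(decoded.identical(original))  # False: the extra "crs" attribute differs

# Dropping the redundant attribute restores an identical roundtrip.
decoded["geom"].attrs.pop("crs")
print(decoded.identical(original))  # True
```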

@martinfleis (Member)

What needs to happen here apart from merging that PR in cf-xarray? Tests pass locally, and apart from that commented-out ValueError (we should probably raise when needed), I don't see anything obviously missing.

@dcherian (Contributor, Author)

Not much. I'd like to copy some tests over to cf-xarray.

I'm on vacation right now, so I won't get to this for another week and a half.

@dcherian dcherian marked this pull request as ready for review July 17, 2024 10:40

codecov bot commented Jul 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.06%. Comparing base (c2260b8) to head (c88ea9c).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #69      +/-   ##
==========================================
- Coverage   99.29%   99.06%   -0.24%     
==========================================
  Files           4        4              
  Lines         427      534     +107     
==========================================
+ Hits          424      529     +105     
- Misses          3        5       +2     
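The percentages in the coverage diff are consistent with coverage = hits / lines, assuming Codecov's default of two-decimal precision with rounding down (an assumption about this repository's Codecov configuration, which is not shown here). A quick check with exact fractions:

```python
import math
from fractions import Fraction


def pct_round_down(ratio: Fraction) -> float:
    """Express a ratio as a percentage floored to two decimals."""
    return math.floor(ratio * 10000) / 100


base = Fraction(424, 427)  # hits / lines on main
head = Fraction(529, 534)  # hits / lines on this PR

print(pct_round_down(base))         # 99.29
print(pct_round_down(head))         # 99.06
print(pct_round_down(head - base))  # -0.24
```

Subtracting the two displayed percentages would give -0.23; the reported -0.24 matches flooring the unrounded difference instead.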


@dcherian (Contributor, Author)

OK, this should be good to go for now. We could always keep tweaking, but this should be great for experimenting and making sure it works with real-world datasets.

@dcherian dcherian mentioned this pull request Jul 17, 2024
@martinfleis (Member) left a comment

Thanks @dcherian!

@martinfleis martinfleis merged commit 97118ff into xarray-contrib:main Jul 17, 2024
9 checks passed
@dcherian dcherian deleted the encode_cf branch July 18, 2024 01:03

2 participants