Add printable representation for MDIOReader and MDIOWriter #33
Comments
Good idea. Another excellent example from Xarray. We could also consider using Xarray as a backend to reduce duplicate effort. |
Xarray dev here! I just discovered this very cool project via twitter. Would love to help you integrate Xarray into the library. I had a quick tour through the docs, and indeed it seems like Xarray could help reduce some boilerplate while bringing lots of features that would help your package. (It layers great on top of Zarr and Dask.) For an example of a domain-specific package built on top of these packages, check out sgkit. Also tagging @TomNicholas, another Xarray dev with a big interest in energy. Let us know how we can help! |
Hi @rabernat 👋 Great to see you here! Big fan of your work. We should collaborate! We are planning to have a steering committee pretty soon and would love to have you and more Xarray developers on board. Our main intention is to have an energy domain-specific library with some features similar to Xarray's and some extra domain-specific features. We have done a lot of heavy lifting incorporating exploration seismology data, and have some more implementations for wind resource data (to be integrated later). |
That sounds like a great vision! Our goal in Xarray is to be a generic container for multi-dimensional labeled arrays with metadata. We'd love it if you could rely on Xarray as a base data container. We have a section in our docs on extending Xarray which explains how a third-party package can add custom functionality to Xarray objects. We also have entry points for implementing your own backend for custom file formats. (Note that Xarray already has very strong support for Zarr I/O.) If you feel like there are features missing from Xarray that are holding you back from adopting it internally, we would love to hear about it on our issue tracker. |
@rabernat I am starting to look into the Xarray backend integration. In our exploration seismology data case, we have groups of rich information (arrays, metadata, etc.) related to the other groups. Still, we want to keep them separate for various reasons:
One of the first challenges for using Xarray as a backend is that Xarray can only work with groups with data duplication. I know we can write datasets into different groups of a Zarr / NetCDF; if I remember correctly, each group will have a copy of the coordinates, dimensions, etc., associated with the group data. Is there a workaround to this? Would you be able to suggest a better alternative to our thought process? |
You may want to look into Xarray Datatree - https://xarray-datatree.readthedocs.io/ - a new package created by @TomNicholas. Soon this will become part of Xarray proper (see pydata/xarray#7418). |
@tasansal I would love to help you get going with xarray! It sounds like datatree could fit some of your needs too.
Datatree empowers you to work with many groups at once. However at the moment you might still need to duplicate things across groups. One long-term solution to this might be to implement symbolic nodes in datatree, but I expect that using xarray and datatree would streamline your code a lot even without that feature. Xarray and datatree work well with zarr already, so that should work nicely for you. |
Hey @rabernat and @TomNicholas I am taking a stab at making our backend Xarray. I took a look at extending Xarray and sgkit libraries. My understanding is:
Here are some of the things we want to have on top of regular Xarray functionality:
Given the above assumptions and requirements, what approach would work best for us? I am leaning towards option 1, unfortunately, since it gives the ultimate flexibility. But if these can all be done with option 2, that would be better! I also noticed that API documentation using option 2 is a bit hacky, since it relies on a special Sphinx extension. We were planning to move to MkDocs, which may cause a problem. Any thoughts on what to do here? To give you an idea about our roadmap, we are adding these features to MDIO.
Thanks! |
Hi @tasansal! This all sounds very ambitious and exciting!
Yep pretty much! Also check out this new page on interoperability in xarray I wrote. (It's from a PR but should be released as part of the main docs soon). Going through your list of features one-by-one:
I'm actually not sure what the best way to fully overwrite the repr without subclassing would be. Monkey-patching xarray classes on import seems a bit hacky but might be enough...
Adding additional but hidden information would be a major change to the xarray data model - just storing the information normally but hiding it via the repr would be a lot easier if (a) is solved.
This seems fairly decoupleable from the other ideas, but for inspiration you should look at cf-xarray (which interprets CF conventions) and xarray-dataclasses.
Anything like this would likely be of interest to the wider xarray / Zarr community, so could be implemented as improvements to xarray's Zarr backend, for example.
This is in-progress for xarray, with most of the effort currently focused on making zarr-python support v3.
This is easy using a custom accessor.
I'm not sure I totally understand this one, but (1) if the mask is data-dependent, I feel it should still be explicitly listed instead of hidden, and (2) operations which require the mask could be re-implemented on an accessor. Still, this might be a decent reason to subclass.
Again take a look at xarray-dataclasses.
I think it sounds plausible to solve all of this without subclassing (but I don't fully understand all the requirements, so please don't take that as gospel!). If you do decide you want to subclass then it would be amazing if you could help us out with a few upstream contributions to make the subclassing easier. See pydata/xarray#3980
We also have plans to move our documentation to Markdown (using MyST), so we would also be interested in any solution to this. Does that help? |
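For readers following along, the custom-accessor route suggested above could look roughly like the sketch below. It uses `xarray.register_dataset_accessor`, which is part of Xarray's documented extension mechanism. The accessor name `seismic`, the variable name `trace_mask`, and both methods are hypothetical and chosen only for illustration; they are not part of MDIO or Xarray.

```python
import numpy as np
import xarray as xr


# "seismic" is a hypothetical accessor name, not an existing MDIO or Xarray namespace.
@xr.register_dataset_accessor("seismic")
class SeismicAccessor:
    """Domain-specific helpers attached to any Dataset, without subclassing."""

    def __init__(self, xarray_obj):
        # Xarray passes the Dataset the accessor is attached to.
        self._obj = xarray_obj

    def live_fraction(self, mask_name="trace_mask"):
        """Fraction of traces flagged as live in a boolean mask variable.

        The mask is stored as an ordinary, visible variable (assumed name).
        """
        return float(self._obj[mask_name].mean())

    def summary(self):
        """One-line description built from the dataset's dimension sizes."""
        sizes = ", ".join(f"{name}={size}" for name, size in self._obj.sizes.items())
        return f"<seismic dataset: {sizes}>"


# Small synthetic dataset; shapes and coordinate values are made up.
ds = xr.Dataset(
    {
        "amplitude": (("inline", "crossline", "sample"), np.zeros((2, 3, 4))),
        "trace_mask": (
            ("inline", "crossline"),
            np.array([[True, True, False], [True, False, False]]),
        ),
    },
    coords={"inline": [10, 11], "crossline": [100, 101, 102]},
)

print(ds.seismic.summary())
print(ds.seismic.live_fraction())
```

The accessor is created lazily the first time `ds.seismic` is used on a given dataset, so domain logic stays in a separate namespace and the Dataset class itself is left untouched.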
Currently, MDIOReader and MDIOWriter print the default Python object representation. It would be useful to have a nice printable representation. from here is a good model to follow.
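As a rough illustration of what is being requested, a printable representation can be added by defining `__repr__`. The sketch below uses a hypothetical stand-in class, `ReaderSketch`; its constructor arguments and attributes (`path`, `dims`, `shape`) are assumptions made for the example and do not reflect the actual MDIOReader interface.

```python
class ReaderSketch:
    """Hypothetical stand-in for a reader class; not the real MDIOReader."""

    def __init__(self, path, dims, shape):
        if len(dims) != len(shape):
            raise ValueError("dims and shape must have the same length")
        self.path = path
        self.dims = tuple(dims)
        self.shape = tuple(shape)

    def __repr__(self):
        # Multi-line summary loosely modeled on the style of Xarray's Dataset repr.
        lines = [f"<{type(self).__name__}>", f"Path: {self.path}", "Dimensions:"]
        lines += [f"    {name}: {size}" for name, size in zip(self.dims, self.shape)]
        return "\n".join(lines)


# Made-up path and sizes, for illustration only.
reader = ReaderSketch(
    "data/example.mdio",
    dims=("inline", "crossline", "sample"),
    shape=(345, 188, 1501),
)
print(reader)
```

Because `print` falls back to `__repr__` when `__str__` is not defined, this single method covers both the interactive prompt and printed output. A `_repr_html_` method could be added later for richer display in notebooks.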