DataTree should support Hashable names. #8836
Comments
The other option worth considering is to deprecate non-string names on Dataset and DataArray. I'm sure this has come up for discussion before... |
👍 to removing non-string names - I think it's been more trouble than it's worth... |
What's the case for removing non-string names? My memory was we had had issues defining what exactly could be a key, but that these were mostly fixed, would be a lot of work to undo, and that many of the issues were around consistency rather than per-se non-string names... |
Speaking strictly for DataTree, the basic model of This correspondence assumes that the names of children (groups) must be something that can be concatenated into path-like strings, using Okay so why don't we allow names of variables to be

    class DataTree:
        def __getitem__(self, key: str) -> DataArray | DataTree:
            ...

we would have

    class DataTree:
        @overload
        def __getitem__(self, key: Hashable) -> DataArray:
            ...

        @overload
        def __getitem__(self, key: str) -> DataTree:
            ...

        def __getitem__(self, key: str | Hashable) -> DataArray | DataTree:
            ...

Finally these non-str names can rarely be serialized. I think the "paths as concatenated names" is the actual problem, the rest are just things to work around. |
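One reason the overloaded signatures above are awkward is that every `str` is itself `Hashable`, so the two key types overlap. The following is a minimal sketch of that overlap, not xarray code; `classify_key` is a hypothetical helper introduced only for illustration.

```python
from collections.abc import Hashable

# Every str is also Hashable, so a str key matches both overloads.
assert isinstance("group", Hashable)


def classify_key(key: Hashable) -> str:
    """Hypothetical helper: report what a key could refer to at runtime."""
    if isinstance(key, str):
        # A string could name either a variable or a child node.
        return "variable or child node"
    # A non-string hashable could only name a variable under this scheme.
    return "variable only"
```

Under this assumed scheme, `classify_key("temperature")` cannot be narrowed any further, which is the ambiguity a static type checker would also face.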
I was predominantly asking about the case for forcing |
Thinking about this a bit more, in principle I don't see any reason why we can't switch from str -> Hashable in DataTree. It just means that the internal DataTree APIs relied on for operations will need to switch from using a string path to a tuple of path segments. |
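As a rough illustration of the "tuple of path segments" idea, here is a minimal sketch of a tree whose nodes are addressed by a tuple of hashable names rather than by a `/`-joined string. The `Node` class and its methods are hypothetical and much simpler than the real DataTree internals.

```python
from __future__ import annotations

from collections.abc import Hashable
from enum import Enum


class Node:
    """Hypothetical minimal tree node; not the real DataTree class."""

    def __init__(self, name: Hashable | None = None) -> None:
        self.name = name
        self.children: dict[Hashable, Node] = {}

    def add_child(self, name: Hashable) -> Node:
        child = Node(name)
        self.children[name] = child
        return child

    def get(self, segments: tuple[Hashable, ...]) -> Node:
        # Walk one segment at a time; no string concatenation is needed,
        # so any hashable object can serve as a name.
        node = self
        for segment in segments:
            node = node.children[segment]
        return node


class Color(Enum):
    RED = 1


root = Node()
leaf = root.add_child("path").add_child(Color.RED).add_child(("a", 0))
found = root.get(("path", Color.RED, ("a", 0)))
```

Here `found` is the same object as `leaf`, even though two of the three names could not be joined into a path string.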
(I wrote this out before @shoyer's comment so I'm going to paste it anyway)
I mean to me all of these issues seem like a lot of extra complexity in our code for like 1% of users... I also still don't really understand what analyses you can do with names of variables / dims as Also if these types can't be serialized to netCDF / Zarr then that's an argument against allowing it to exist in-memory IMO. I had forgotten about this proposal to use |
I think this is the main argument. Making Hashable work properly adds a lot of complexity for very niche use-cases. |
Having said all that, going back to what to do in
Maybe? Internally we could rewrite

    dt['/path/to/<str-variable-or-child-name>']

can be done without forcing the user to pass it all as a tuple:

    dt[('/', 'path', 'to', <weird-non-str-variable-or-child-name>)]

There is a very interesting suggestion buried in a comment from 2018 #2292 (comment):
This covers both of dt['/path/to/Enum('Red')'] or whatever. Possibly with some added restriction around not including the |
Historically, it doesn't seem like the discussion in #2292 was ever properly resolved. Adding in |
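To make the two access styles discussed in this comment concrete, here is a minimal sketch of how a key might be normalized into path segments, assuming string keys are split on `/` and tuple keys are taken as already-split segments. `to_segments` is a hypothetical helper, not an existing xarray function.

```python
from __future__ import annotations

from collections.abc import Hashable


def to_segments(key: Hashable) -> tuple[Hashable, ...]:
    """Hypothetical normalizer: turn a user-supplied key into path segments."""
    if isinstance(key, str):
        # Convenience form: split a path-like string on "/".
        return tuple(part for part in key.split("/") if part)
    if isinstance(key, tuple):
        # Explicit form: drop a leading "/" root marker if present.
        if key and key[0] == "/":
            return key[1:]
        return key
    # Any other hashable is treated as a single name.
    return (key,)
```

One caveat of this sketch is that a tuple is itself a valid hashable name, so treating every tuple as a path makes it impossible to look up a single variable whose name is a tuple; a real design would need a rule for that case.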
I (very respectfully :)) think there's a significant risk that you guys are annoyed by the finickiness of typing, and assigning all that blame to Taking each of these in turn:
I would strongly think we shouldn't change Is that reasonable? Trying not to be defensive etc, tell me if I'm not sounding well-balanced here. |
Thanks for the gentle pushback @max-sixty ! (Do you want to make a reappearance in tomorrow's meeting? Would be great to see you there :) )
Your responses are very reasonable, but I think there are still valid concerns in #2292 (comment) that haven't been addressed.
Even if it was to change Generally I'm only about 10% anti- If we can make |
Lol, quite possibly true!
I think it would be fine not to support this syntax for non-string names. Syntax like We might need a few more convenience APIs for the internal DataTree implementation (because |
What is your issue?
In porting xarray-contrib/datatree into pydata/xarray, we discovered some type mismatches.
The general feeling was that we should support Hashable in order to improve DataTree interactions with Dataset and DataArrays.
The quick solution of changing the name type to Hashable in NamedNode fails quickly because of its PathPurePath inheritance.
This issue just tracks that we want to come back to this.
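The kind of failure described here can be reproduced with the standard library alone. The sketch below assumes the path class involved behaves like `pathlib.PurePosixPath`, which only accepts string or path-like segments; `joins_as_path` is a hypothetical helper written for this illustration.

```python
from pathlib import PurePosixPath


def joins_as_path(*segments: object) -> bool:
    """Hypothetical check: can these segments be combined into a pure path?"""
    try:
        PurePosixPath(*segments)
    except TypeError:
        # pathlib rejects segments that are not str or os.PathLike.
        return False
    return True
```

With this helper, `joins_as_path("/", "group", "var")` succeeds, while `joins_as_path("/", 1)` and `joins_as_path("/", ("a", 0))` do not, which is why a name type wider than `str` clashes with a path-based node class.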