Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider indicating inherited coordinates & dimensions in DataTree repr #9463

Closed
shoyer opened this issue Sep 9, 2024 · 8 comments · Fixed by #9470
Closed

Consider indicating inherited coordinates & dimensions in DataTree repr #9463

shoyer opened this issue Sep 9, 2024 · 8 comments · Fixed by #9470
Labels
topic-DataTree Related to the implementation of a DataTree class

Comments

@shoyer
Copy link
Member

shoyer commented Sep 9, 2024

What is your issue?

Showing these in the repr makes it harder to understand the DataTree at a glance. In particular, it makes it impossible to immediately see if coordinates are duplicated or not, e.g., consider these two cases, which have the same repr but would serialize to Zarr differently:

In [6]: DataTree.from_dict({'/': Dataset(coords={'x': [1]}), '/sub': Dataset({'foo': 1})})
Out[6]:
<xarray.DataTree>
Group: /
│   Dimensions:  (x: 1)
│   Coordinates:
│     * x        (x) int64 8B 1
│   Data variables:
│       *empty*
└── Group: /sub
        Dimensions:  (x: 1)
        Coordinates:
          * x        (x) int64 8B 1
        Data variables:
            foo      int64 8B 1

In [8]: DataTree.from_dict({'/': Dataset(coords={'x': [1]}), '/sub': Dataset({'foo': 1}, coords={'x': [1]})})
Out[8]:
<xarray.DataTree>
Group: /
│   Dimensions:  (x: 1)
│   Coordinates:
│     * x        (x) int64 8B 1
│   Data variables:
│       *empty*
└── Group: /sub
        Dimensions:  (x: 1)
        Coordinates:
          * x        (x) int64 8B 1
        Data variables:
            foo      int64 8B 1

The simplest way to indicate inherited coordinates/dimensions would be to not display them at all. But maybe there is a different way we could indicate such dimensions/variables (less prominently).

@shoyer shoyer added needs triage Issue that has not been reviewed by xarray team member topic-DataTree Related to the implementation of a DataTree class and removed needs triage Issue that has not been reviewed by xarray team member labels Sep 9, 2024
@shoyer
Copy link
Member Author

shoyer commented Sep 9, 2024

Consider the following DataTree with inherited coordinates:

tree = DataTree.from_dict({
    '/': Dataset(coords={'x': [1]}),
    '/first_child': None,
    '/second_child': Dataset({'foo': ('x', [0])}),
})

Here is the current repr of the root and child nodes:


In [19]: tree
Out[19]:
<xarray.DataTree>
Group: /
│   Dimensions:  (x: 1)
│   Coordinates:
│     * x        (x) int64 8B 1
│   Data variables:
│       *empty*
├── Group: /first_child
└── Group: /second_child
        Dimensions:  (x: 1)
        Coordinates:
          * x        (x) int64 8B 1
        Data variables:
            foo      (x) int64 8B 0

In [20]: tree['first_child']
Out[20]:
<xarray.DataTree 'first_child'>
Group: /first_child

In [21]: tree['second_child']
Out[21]:
<xarray.DataTree 'second_child'>
Group: /second_child
    Dimensions:  (x: 1)
    Coordinates:
      * x        (x) int64 8B 1
    Data variables:
        foo      (x) int64 8B 0

I think these could be improved by removing inherited coordinate/dimensions, but only if they already already showed on their parent in the same repr. In cases where we are printing a child node only, we should also indicate inherited coordinates/dimensions (because they determine what is valid on the child node). These could look something like:

In [19]: tree
Out[19]:
<xarray.DataTree>
Group: /
│   Dimensions:  (x: 1)
│   Coordinates:
│     * x        (x) int64 8B 1
├── Group: /first_child
└── Group: /second_child
        Dimensions:  (x: 1)
        Data variables:
            foo      (x) int64 8B 0

In [20]: tree['first_child']
Out[20]:
<xarray.DataTree 'first_child'>
Group: /first_child
    Inherited dimensions:  (x: 1)
    Inherited coordinates:
      * x                  (x) int64 8B 1

In [21]: tree['second_child']
Out[21]:
<xarray.DataTree 'second_child'>
Group: /second_child
    Dimensions:  (x: 1)
    Inherited coordinates:
      * x        (x) int64 8B 1
    Data variables:
        foo      (x) int64 8B 0

@TomNicholas
Copy link
Member

TomNicholas commented Sep 9, 2024

I think I generally like this suggestion, but the "Inherited dimensions" gets messy if you have a child node with some dimensions inherited and some not. Since dimensions are not in fact inherited (coordinate variables are inherited, which may have new dimensions), I think we should display "Dimensions: (x: 1)" regardless of whether or not the dimension came from an inherited coordinate or a local coordinate.

@shoyer
Copy link
Member Author

shoyer commented Sep 9, 2024

Since dimensions are not in fact inherited (coordinate variables are inherited, which may have new dimensions)

Dimensions do get inherited. Consider:

In [28]: tree = DataTree(Dataset({'foo': ('x', [1, 2])}))

In [29]: tree = DataTree(Dataset({'foo': ('x', [1, 2])}), children={'child': DataTree()})

In [30]: tree
Out[30]:
<xarray.DataTree>
Group: /
│   Dimensions:  (x: 2)
│   Dimensions without coordinates: x
│   Data variables:
│       foo      (x) int64 16B 1 2
└── Group: /child

In [31]: tree['child'].ds = Dataset({'bar': ('x', [3])})
ValueError: group '/child' is not aligned with its parents:
Group:
    Dimensions:  (x: 1)
    Dimensions without coordinates: x
    Data variables:
        bar      (x) int64 8B 3
From parents:
    Dimensions:  (x: 2)
    Dimensions without coordinates: x

@TomNicholas
Copy link
Member

My understanding from #9457 (comment) was that inheritance of dimensions during parent-child alignment is an internal implementation detail, and your example above is literally the only possible time that dimension will ever be displayed to the user. For all other intents and purposes I would have said that dimensions are not inherited - they are built from inheritable variables.

@shoyer
Copy link
Member Author

shoyer commented Sep 9, 2024

My understanding from #9457 (comment) was that inheritance of dimensions during parent-child alignment is an internal implementation detail, and your example above is literally the only possible time that dimension will ever be displayed to the user. For all other intents and purposes I would have said that dimensions are not inherited - they are built from inheritable variables.

I would consider the invalid Dataset object (to make the error message) an internal implementation detail, but the inherited dimension should be considered part of the data model of the child DataTree.

@TomNicholas
Copy link
Member

TomNicholas commented Sep 9, 2024

but the inherited dimension should be considered part of the data model of the child DataTree.

Maybe for developers, but I'm still not convinced that users should have to care about this distinction. After all, we do not allow users any public way to create a Dataset that has a dimension with no corresponding variable, so from an API perspective I would argue therefore dimensions are not part of the public data model.

To be concrete, if we have a child with both "inherited" and non-inherited dimensions like this

tree = DataTree.from_dict({
    '/': Dataset(coords={'x': [1]}),
    '/child': Dataset({'foo': ('y', [0])}),
})

your suggestion would display the dimensions by separating them into inherited and non-inherited (I've chosen to display them adjacent to the lists of coordinate variables):

In [9]: tree['child']
Out[9]: 
<xarray.DataTree 'child'>
Group: /child
    Inherited Dimensions:  (x: 1)
    Inherited Coordinates:
      * x        (x) int64 8B 1
    Dimensions:  (y: 1)
    Dimensions without coordinates: y
    Data variables:
        foo      (y) int64 8B 0

whereas mine would simply list them all under a common Dimensions heading, with no indication of which came from inherited coordinate variables:

In [9]: tree['child']
Out[9]: 
<xarray.DataTree 'child'>
Group: /child
    Dimensions:  (x: 1, y: 1)
    Inherited Coordinates:
      * x        (x) int64 8B 1
    Dimensions without coordinates: y
    Data variables:
        foo      (y) int64 8B 0

I think one way the latter is nice is that now you can still see all the accessible dimensions on the entire child dataset on one line.

EDIT: If I had 100 inherited coordinate variables, the former layout means I have to scroll down a long way to find out the dimensions of the node's data variables. But if I put everything at the top like this (where the node could have additional non-inherited coordinates in general),

In [9]: tree['child']
Out[9]: 
<xarray.DataTree 'child'>
Group: /child
    Inherited Dimensions:  (x: 1)
    Dimensions:  (y: 1)
    Inherited Coordinates:
      * x        (x) int64 8B 1
      ...
    Coordinates:
      ...
    Dimensions without coordinates: y
    Data variables:
        foo      (y) int64 8B 0
        ...

now Dimensions could potentially be quite far away from the Data variables/Coordinates which depend on those non-inherited Dimensions.

@shoyer
Copy link
Member Author

shoyer commented Sep 9, 2024

I agree, users don't need to userstand the distinction between dimensions & inherited dimensions. From a user perspective, neither can be edited directly.

In that case, do we always display inherited dimensions as part of Dimensions? Or only if they are not already shown on a parent?

@TomNicholas
Copy link
Member

In that case, do we always display inherited dimensions as part of Dimensions? Or only if they are not already shown on a parent?

Good question. I think that if our general approach above is motivated by a principle of "only show dimensions immediately adjacent to any variables that actually depend on those dimensions", then for consistency we should only show inherited dimensions if not already shown on a parent.

shoyer added a commit to shoyer/xarray that referenced this issue Sep 9, 2024
@shoyer shoyer closed this as completed in a6bacfe Sep 10, 2024
hollymandel pushed a commit to hollymandel/xarray that referenced this issue Sep 23, 2024
* Update DataTree repr to indicate inheritance

Fixes pydata#9463

* fix whitespace

* add more repr tests, fix failure

* fix failure on windows

* fix repr for inherited dimensions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-DataTree Related to the implementation of a DataTree class
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants