New MultiIndex dimension behaviours #6990

Closed
ConnorSwales opened this issue Sep 5, 2022 · 2 comments
Labels
plan to close May be closeable, needs more eyeballs

Comments

@ConnorSwales

What is your issue?

As of the latest release (2022.6.0), the behaviour of MultiIndex coordinates and their constituents appears to have changed.

Creating a simple multi-indexed Dataset:

import xarray as xr

ds1 = xr.Dataset({'foo': (('x',), [1, 2, 3])}, {'x': [1, 2, 3], 'y': 'a'})
ds2 = xr.Dataset({'foo': (('x',), [4, 5, 6])}, {'x': [1, 2, 3], 'y': 'b'})

mult = xr.concat([ds1, ds2], dim='y').stack(yx=['y', 'x'])

Printing the Dataset and the coordinate indexes on version 2022.3.0:

mult
<xarray.Dataset>
Dimensions:  (yx: 6)
Coordinates:
  * yx       (yx) MultiIndex
  - y        (yx) object 'a' 'a' 'a' 'b' 'b' 'b'
  - x        (yx) int64 1 2 3 1 2 3
Data variables:
    foo      (yx) int64 1 2 3 4 5 6
mult.coords.indexes
yx: MultiIndex([('a', 1),
                ('a', 2),
                ('a', 3),
                ('b', 1),
                ('b', 2),
                ('b', 3)],
               names=['y', 'x'])

Printing the Dataset and the coordinate indexes on version 2022.6.0:

mult
<xarray.Dataset>
Dimensions:  (yx: 6)
Coordinates:
  * yx       (yx) object MultiIndex
  * y        (yx) <U1 'a' 'a' 'a' 'b' 'b' 'b'
  * x        (yx) int64 1 2 3 1 2 3
Data variables:
    foo      (yx) int64 1 2 3 4 5 6
mult.coords.indexes
Indexes:
yx: MultiIndex([('a', 1),
                ('a', 2),
                ('a', 3),
                ('b', 1),
                ('b', 2),
                ('b', 3)],
               name='yx')
y: MultiIndex([('a', 1),
               ('a', 2),
               ('a', 3),
               ('b', 1),
               ('b', 2),
               ('b', 3)],
              name='yx')
x: MultiIndex([('a', 1),
               ('a', 2),
               ('a', 3),
               ('b', 1),
               ('b', 2),
               ('b', 3)],
              name='yx')

On the latest version (2022.6.0), the constituent coordinates (x, y) of the multi-indexed coordinate (yx) have asterisks next to them, implying that they are dimension coordinates, even though they are not named after their sole dimension, as the docs' coordinates section requires:

dimension coordinates are one dimensional coordinates with a name equal to their sole dimension (marked by * when printing a dataset or data array)
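
To make that definition concrete, here is a minimal sketch in plain Python (no xarray needed) that applies it to the three coordinates shown in the 2022.6.0 repr above. The helper `is_dimension_coordinate` and the `coords` mapping are hypothetical names introduced only for illustration; neither is part of xarray's API.

```python
def is_dimension_coordinate(name, dims):
    """Apply the documented rule: one-dimensional and named after its sole dimension."""
    return len(dims) == 1 and dims[0] == name


# Coordinate name -> dimensions, copied by hand from the repr in this issue.
coords = {"yx": ("yx",), "y": ("yx",), "x": ("yx",)}

# Evaluate the rule for every coordinate of the stacked Dataset.
result = {name: is_dimension_coordinate(name, dims) for name, dims in coords.items()}
print(result)
```

By this reading of the docs only yx qualifies, while the 2022.6.0 repr marks yx, y and x alike with an asterisk, which is the mismatch being asked about here.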

Further, looking into the .coords.indexes shows that each constituent coordinate is also indexed by an instance of the same MultiIndex.

I was wondering whether this change was made by design and the documentation just needs updating, or whether I have misunderstood something along the way.

@ConnorSwales added the "needs triage" label Sep 5, 2022
@benbovy
Member

benbovy commented Sep 6, 2022

Hi @ConnorSwales,

Yes, this is an intentional change that is part of the ongoing explicit indexes refactor. There's still some work to do on the documentation. For more details, see:

@dcherian added the "plan to close" label and removed the "needs triage" label Sep 9, 2022
@headtr1ck
Collaborator

Closing for now.
Feel free to reopen or, better, to comment in the linked issues above.
