Document the new __repr__ #1199
Agreed, we should definitely document this and highlight the change in the release notes. I'm still not super happy with using |
I wonder if instead this should be taken as an indication that we need to make the repr more self-explanatory, e.g., by writing something like Current version:
Alternative A: write
Alternative B: write
I think I like Alternative B best. @MaximilianR @crusaderky @benbovy any opinions? |
[edited to remove a less relevant comment, now moved to a separate issue] After having a look at the doc about coordinates (http://xarray.pydata.org/en/latest/data-structures.html#coordinates), here are a few thoughts:
I'll go on with a simple example that I just tried out, to show you how I dealt with this new behavior of xarray:
In [1]: import numpy as np
   ...: import xarray as xr
In [2]: a = np.array([[1.1, 2.2, 3.3], [4.4, 5.5, 6.6]])
In [3]: da = xr.DataArray(a, dims=['y', 'x'],
   ...:                   coords={'x': [0.1, 1.1, 2.2], 'xy': (['y', 'x'], a)})
In [4]: da
Out[4]:
<xarray.DataArray (y: 2, x: 3)>
array([[ 1.1, 2.2, 3.3],
       [ 4.4, 5.5, 6.6]])
Coordinates:
  * x        (x) float64 0.1 1.1 2.2
    xy       (y, x) float64 1.1 2.2 3.3 4.4 5.5 6.6
  o y        (y) -
"OK, I have three types of coordinates here. This makes sense; they are each defined in a different way. y is a dimension of my dataset, so at least I should be able to select along that dimension:"
In [6]: da.isel(y=1)
Out[6]:
<xarray.DataArray (x: 3)>
array([ 4.4, 5.5, 6.6])
Coordinates:
  * x        (x) float64 0.1 1.1 2.2
    xy       (x) float64 4.4 5.5 6.6
"So far so good! x is a coordinate, so here I should be able to do more complex selection based on values:"
In [7]: da.sel(x=2.2)
Out[7]:
<xarray.DataArray (y: 2)>
array([ 3.3, 6.6])
Coordinates:
    x        float64 2.2
    xy       (y) float64 3.3 6.6
  o y        (y) -
"That makes sense. However, this is probably not possible with y, which is not defined:"
In [8]: da.sel(y=0)
Out[8]:
<xarray.DataArray (x: 3)>
array([ 1.1, 2.2, 3.3])
Coordinates:
  * x        (x) float64 0.1 1.1 2.2
    xy       (x) float64 1.1 2.2 3.3
"Wait... So I can select by label on an undefined coordinate? Maybe y is a coordinate after all?"
In [9]: 'y' in da.coords
Out[9]: False
"Ah, no."
I actually think that
Maybe I'm completely missing the point here: don't hesitate to correct me! |
Yes, this does not seem like the best alternative.
+1. That was the initial implementation in #1017 I think, which is the most consistent. But @crusaderky raised an issue of readability (see #1017 (comment)). IMO this issue of readability is more a bad habit of using the If this remains an issue, another option than the ones above would be to list dimensions with no index in a separate, dedicated section of the repr, e.g. with the examples above,
To avoid adding too much verbosity, multiple dimensions without coordinates could be displayed inline:
|
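To make the "dedicated section" idea concrete, here is a minimal sketch of rendering all dimensions without a coordinate on one inline line. It assumes a section title of "Dimensions without coordinates" (one of the labels discussed in this thread) and comma-separated names; `format_dims_section` is a hypothetical helper, not the code xarray uses to build its repr.

```python
def format_dims_section(dims, coords, title="Dimensions without coordinates"):
    """Render dimensions lacking a same-named coordinate as one inline repr line.

    Returns None when every dimension has a coordinate, so the whole section
    can be omitted from the repr. Hypothetical sketch, not xarray's repr code.
    """
    missing = [dim for dim in dims if dim not in coords]
    if not missing:
        return None
    # Joining on one line keeps the section short even with several such dims.
    return f"{title}: {', '.join(missing)}"


print(format_dims_section(("time", "y", "x"), {"time": None}))
# Dimensions without coordinates: y, x
print(format_dims_section(("x",), {"x": None}))
# None
```

Omitting the section entirely when it would be empty is a design choice here: it keeps the repr unchanged for the common case where every dimension has a coordinate.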
See #1017 (comment) and discussion below for comments on this. This does make |
Sorry, I'm not sure I understand: do you mean you'd rather keep the current behavior, as you said in #1017 (comment)? |
I like the currently implemented behavior, which is the second bullet in #1017 (comment). We could potentially switch to the behavior of the first bullet (issue |
OK, I'm going to put together a PR to change this behavior to add the This feels a little overly verbose (I still feel like there may be some sort of solution with symbols here), but it's explicit and fully self-explanatory, which is a big advantage over what we currently have. |
See #1221. |
We may not have gotten this right yet. See StackOverflow: What are “unindexed dimensions” and why are coordinates empty? |
OK, let's go back to the drawing board. Let Current repr (v0.9.0):
Some alternatives:
|
I think "Dimensions without coordinates" is clearer than "Unindexed dimensions", and only marginally more verbose (30 characters instead of 20). Any dimension can be indexed; the lookup is just by position rather than by coordinate/label. I don't think marking the dimension/coordinate matches makes it any clearer, as this matching is by name anyway, and my confusion was due to none of the dimensions having coordinates. I would support simply changing the label. |
I agree. As any dimension can be indexed (at least by position), the name "coordinate" may indeed be more appropriate, but we also need to make the distinction between coordinates which are Any use case for dimensions which don't have an index (i.e., an
|
So the original issue was about highlighting in the repr which dimensions have an "Dimensions without index variable" would be unambiguous in all cases, but it doesn't look nice. Mirroring |
With any kind of marking (such as with *), the problem is that the user might not know what the marking is for, and syntax is hard to google. When I see |
I suppose we could do something even more explicit, e.g., "Dimensions without a corresponding coordinate". That feels too long to me, though. I don't like "Dimensions without index variable" because it emphasizes that they don't have an index rather than that they don't have a variable. For now I think "Dimensions without coordinates" is my favorite. |
The problem for the user is that the I'd add another suggestion to the list that @shoyer proposed, which is simply to do nothing with the unindexed dimensions:
This has the advantage of being unambiguous and close to the data model of the file at hand (with dimensions but no coordinates). Failing that, my preference goes to "Dimensions without coordinates" too. In the SO post, the OP also wondered about the "empty" coordinates. Any plan to change this too? Maybe a |
Perhaps more broadly, documentation-wise, it might be good to add a terminology list. For example, that could clarify the difference and relation between dimensions, labels, indices, coordinates, etc. There are dimensions without coordinates; dimensions that are labelled or unlabelled; coordinates that are indices and coordinates that are not. I'm still figuring out how all of those relate to each other and how I use them. |
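As a rough illustration of how such a terminology list could be organised, here is a minimal sketch that sorts names into the three categories from this thread and attaches the prefix the v0.9.0 repr used in the session earlier in this issue ("*", no marker, "o"). The rule "a coordinate whose only dimension is its own name is a dimension coordinate" is an assumption matching that session (for example, the scalar `x` left by `da.sel(x=2.2)` is shown without "*"); `classify` is a hypothetical helper, not xarray API.

```python
def classify(dims, coords):
    """Sort names into the categories discussed in this thread.

    `dims`: iterable of dimension names.
    `coords`: mapping of coordinate name -> tuple of the dimensions it spans.
    Returns {name: (term, marker)}, where marker is the prefix the v0.9.0
    repr showed: "*" dimension coordinate, " " other coordinate, "o" no coordinate.
    Hypothetical sketch, not xarray's implementation.
    """
    result = {}
    for name, var_dims in coords.items():
        if var_dims == (name,):
            # 1-D coordinate along the dimension of the same name
            result[name] = ("dimension coordinate", "*")
        else:
            # Multi-dimensional, scalar, or differently named coordinate
            result[name] = ("non-dimension coordinate", " ")
    for dim in dims:
        if dim not in coords:
            # A dimension with no variable of the same name
            result[dim] = ("dimension without coordinate", "o")
    return result


# The DataArray from the session: x is a dimension coordinate, xy is not, y has none.
for name, (term, marker) in classify(("y", "x"), {"x": ("x",), "xy": ("y", "x")}).items():
    print(f"{marker} {name:<3} {term}")
```

Writing the categories as explicit branches makes the point raised above visible: a dimension without a coordinate never appears in `coords` at all, so listing it under "Coordinates:" is what made the v0.9.0 repr confusing.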
I think I want to remove the appearance of
for It gives a subtle signal that the user is doing something wrong, but that isn't necessarily the case.
Yes, this is what I did originally, and I think it's a very elegant solution. Unfortunately, for large I don't see it working well to make |
Agreed, this is definitely worth doing. I also made a diagram recently to summarize the xarray data model that I'd like to put in the docs: (feedback on the diagram is very welcome if anyone has suggestions!) |
Nice diagram! Do you think it's worth also adding Finally, I would also be +1 to do nothing with the unindexed dimensions in the repr, even though complaints for large Although this wouldn't fully solve the problem, maybe an HTML repr for the notebook would help here? |
Yes, but probably only for a separate "internal/advanced API" diagram. I want this to focus on the user facing and public API.
Indeed, I think this is quite possible -- we could squeeze a lot more information into an HTML repr. For example, with a little bit of JavaScript (or maybe CSS these days?) we could highlight all appearances of a dimension name when you hover over it. Maybe we can find someone with design talent/interest to work on this? |
See #1236 for a proposed fix. After merging it, I will release v0.9.1 and issue the delayed release announcement for xarray v0.9. |
Sorry I missed that one when it was decided upon in #1017, but I think the changes in repr should be documented somewhere (at the minimum in the "Breaking Changes" section of what's new). I just updated Salem so that it works well with xarray 0.9.0. The changes I had to make were quite small (that's a good thing), but it took me a bit of time to understand what was going on.
What I found confusing is the following:
dim_0 is listed as a coordinate, but 'dim_0' in ds.coords is False. I think it should remain like this, but maybe we should document somewhere what the "o" and "*" mean (possibly here)?