Updated text for indexing page #2410
Comments
Copying @horta's doc:
Xarray definition
A data array Each data array dimension has a unique A data array can have zero or more coordinates, represented by a dict-like A coordinate can have zero or more dimensions associated with it. A dimension data array is a unidimensional coordinate data array associated with one, and only one, dimension having the same name as the coordinate data array itself. A dimension data array always has one, and only one, coordinate. That coordinate in turn has a dimension data array associated with it. As an example, let
Now let's consider a practical example:
>>> import numpy as np
>>> import xarray as xr
>>>
>>> a = xr.DataArray(np.arange(6).reshape((3, 2)), dims=["x", "y"], coords={"x": list("abc")})
>>> a
<xarray.DataArray (x: 3, y: 2)>
array([[0, 1],
[2, 3],
[4, 5]])
Coordinates:
* x (x) <U1 'a' 'b' 'c'
Dimensions without coordinates: y Data array >>> a.coords["x"]
<xarray.DataArray 'x' (x: 3)>
array(['a', 'b', 'c'], dtype='<U1')
Coordinates:
* x (x) <U1 'a' 'b' 'c'
>>> a.coords["x"].coords["x"]
<xarray.DataArray 'x' (x: 3)>
array(['a', 'b', 'c'], dtype='<U1')
Coordinates:
* x (x) <U1 'a' 'b' 'c'
Indexing
Indexing is an operation that, when applied to a data array, produces a new data array according to the rules explained here. The most general form of indexing is the one that involves data arrays
The following code snippet shows an indexing example of a data array >>> a = xr.DataArray(np.arange(6).reshape((3, 2)), dims=["x", "y"], coords={"x": list("abc"), "y": [0, 1]})
>>> i0 = xr.DataArray(['a', 'c'], dims=["x"])
>>> i0
<xarray.DataArray (x: 2)>
array(['a', 'c'], dtype='<U1')
Dimensions without coordinates: x
>>> i1 = xr.DataArray([0, 1], dims=["y"])
>>> i1
<xarray.DataArray (y: 2)>
array([0, 1])
Dimensions without coordinates: y
>>> a.sel(x=i0, y=i1)
<xarray.DataArray (x: 2, y: 2)>
array([[0, 1],
[4, 5]])
Coordinates:
* x (x) <U1 'a' 'c'
* y (y) int64 0 1
As hinted above, xarray allows the use of multidimensional data array indexers for greater flexibility. Let's look at another example:
>>> a = xr.DataArray([0, 1, 2], dims=["x"], coords={"x": list("abc")})
>>> a
<xarray.DataArray (x: 3)>
array([0, 1, 2])
Coordinates:
* x (x) <U1 'a' 'b' 'c'
>>> i0 = xr.DataArray([['a', 'c']], dims=["x0", "x1"])
>>> i0
<xarray.DataArray (x0: 1, x1: 2)>
array([['a', 'c']], dtype='<U1')
Dimensions without coordinates: x0, x1
>>> a.sel(x=i0)
<xarray.DataArray (x0: 1, x1: 2)>
array([[0, 2]])
Coordinates:
x (x0, x1) object 'a' 'c'
Dimensions without coordinates: x0, x1
The resulting data array has the same dimensions as the indexer, not those of the original data array. Notice also that, unlike in the previous example, the resulting data array has no dimension data array. Instead, it has a two-dimensional coordinate data array:
>>> a.sel(x=i0).coords["x"]
<xarray.DataArray 'x' (x0: 1, x1: 2)>
array([['a', 'c']], dtype=object)
Coordinates:
x (x0, x1) object 'a' 'c'
Dimensions without coordinates: x0, x1 |
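The multidimensional-indexer behaviour described in the comment above can be checked programmatically. The following is a minimal sketch that reuses the same `a` and `i0`; the variable names `selected`, `result_dims`, `x_coord_dims` and `has_x_index` are introduced here for illustration only and do not appear in the original doc.

```python
import xarray as xr

# Same data array and 2-D indexer as in the example above.
a = xr.DataArray([0, 1, 2], dims=["x"], coords={"x": list("abc")})
i0 = xr.DataArray([["a", "c"]], dims=["x0", "x1"])

selected = a.sel(x=i0)

# The result takes its dimensions from the indexer, not from `a`.
result_dims = selected.dims

# "x" survives as a 2-D coordinate over (x0, x1) ...
x_coord_dims = selected.coords["x"].dims

# ... but it is no longer a dimension coordinate, so it has no index.
has_x_index = "x" in selected.indexes

print(result_dims, x_coord_dims, has_x_index)
```

Inspecting `dims` and `indexes` rather than the printed repr avoids relying on the asterisk marker, whose rendering may vary between xarray versions.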
Thanks guys! Just to make sure, this is a work in progress. I realise that I made some wrong assumptions, and there is more to add to it. |
OK, maybe I was premature to copy it here :). But yes, I'm excited about clarifying these definitions in our docs! |
I have mainly updated the Indexing and selecting data section. I'm proposing an indexing notation using
Xarray definition
A data array
>>> import xarray as xr
>>>
>>> a = xr.DataArray([[0, 1], [2, 3], [4, 5]])
>>> a.name = "My name"
>>> a
<xarray.DataArray 'My name' (dim_0: 3, dim_1: 2)>
array([[0, 1],
[2, 3],
[4, 5]])
Dimensions without coordinates: dim_0, dim_1
Each data array dimension has a unique
>>> a.dims[0]
'dim_0'
A data array can have zero or more coordinates, represented by a dict-like A coordinate can have zero or more dimensions associated with it. A dimension data array is a unidimensional coordinate data array associated with one, and only one, dimension having the same name as the coordinate data array itself. A dimension data array always has one, and only one, coordinate. That coordinate in turn has a dimension data array associated with it:
>>> import numpy as np
>>>
>>> a = xr.DataArray(np.arange(6).reshape((3, 2)), dims=["x", "y"], coords={"x": list("abc")})
>>> a
<xarray.DataArray (x: 3, y: 2)>
array([[0, 1],
[2, 3],
[4, 5]])
Coordinates:
* x (x) <U1 'a' 'b' 'c'
Dimensions without coordinates: y The above data array >>> a.coords["x"]
<xarray.DataArray 'x' (x: 3)>
array(['a', 'b', 'c'], dtype='<U1')
Coordinates:
* x (x) <U1 'a' 'b' 'c'
>>> a.coords["x"].coords["x"]
<xarray.DataArray 'x' (x: 3)>
array(['a', 'b', 'c'], dtype='<U1')
Coordinates:
* x (x) <U1 'a' 'b' 'c' Coordinate data arrays are meant to provide labels to array positions, allowing for convenient access to array elements: >>> a.loc["b", :]
<xarray.DataArray (y: 2)>
array([2, 3])
Coordinates:
x <U1 'b'
Dimensions without coordinates: y
Note that there is no asterisk symbol for coordinate
Indexing and selecting data
There are four different but equally powerful ways of selecting data from a data array. They differ only in the type of dimension and index lookups: position-based lookup and label-based lookup:
A dimension position-based lookup is determined by the position used in the index operator: A dimension label-based lookup is determined by the provided dimension name: [1] An index label is any NumPy data type object. Consider the following data array:
>>> a = xr.DataArray(np.arange(6).reshape((3, 2)), dims=["year", "country"], coords={"year": [1990, 1994, 1998], "country": ["UK", "US"]})
>>> a
<xarray.DataArray (year: 3, country: 2)>
array([[0, 1],
[2, 3],
[4, 5]])
Coordinates:
* year (year) int64 1990 1994 1998
* country (country) <U2 'UK' 'US' The expressions >>> a.loc[:, "UK"]
<xarray.DataArray (year: 3)>
array([0, 2, 4])
Coordinates:
* year (year) int64 1990 1994 1998
country <U2 'UK'
Formal indexing definition
Let
for which Let
be the tuple representing the indices of For each
Consider >>> i_0
<xarray.DataArray (apples: 2, oranges: 1)>
array([['a'],
['c']], dtype='<U1')
Dimensions without coordinates: apples, oranges
>>> r_0
<xarray.DataArray (dim_0: 3)>
array(['a', 'b', 'c'], dtype='<U1')
Coordinates:
* dim_0 (dim_0) <U1 'a' 'b' 'c' Performing operations 1 and 2 will result in the list The positions b[reshape(t_0[0][1], i_0.shape), ..., reshape(t_A[0][1], i_A.shape)] = a[t_0[1][1], ..., t_A[1][1]] |
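The four lookup combinations named in the "Indexing and selecting data" part of the comment above (dimension by position or by name, index by position or by label) can be compared side by side. This is a sketch using the same `year`/`country` array; the variable names are hypothetical, and the mapping of each expression to a lookup type follows my reading of the comment rather than its elided table.

```python
import numpy as np
import xarray as xr

a = xr.DataArray(
    np.arange(6).reshape((3, 2)),
    dims=["year", "country"],
    coords={"year": [1990, 1994, 1998], "country": ["UK", "US"]},
)

by_pos_pos = a[:, 0]                 # dimension by position, index by position
by_pos_label = a.loc[:, "UK"]        # dimension by position, index by label
by_name_pos = a.isel(country=0)      # dimension by name, index by position
by_name_label = a.sel(country="UK")  # dimension by name, index by label

# All four expressions should select the same "UK" column.
results = [by_pos_pos, by_pos_label, by_name_pos, by_name_label]
all_equal = all(r.equals(by_pos_pos) for r in results)

print(all_equal, by_name_label.values)
```

The name-based forms (`isel`, `sel`) do not depend on the order of the dimensions, which is usually the reason to prefer them in code that outlives a single session.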
Thanks, @horta. Your description looks much more exact than what we already have in our docs. I am wondering how we could balance exactness and simplicity in the description. I think it would be nice if we could have both
Any idea about what our docs should look like? |
I will first try to have both together. I'm well aware that people learn by example (that is true for me at least, and apparently for most people: tldr library), so at first I will try to combine everything in one page:
I prefer starting with 2 and 3 for me to actually understand xarray... |
@horta, what's the status on this? @max-sixty and I were discussing a glossary over in #3176. I think this is an important contribution and am happy to help if needed. |
In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity. If this issue remains relevant, please comment here or remove the |
We have a version here now: https://xarray.pydata.org/en/stable/user-guide/terminology.html |
Actually we should update the |
It appears there is an incorrect statement in the 'indexing' section of the terminology page.
But this simple code does not reproduce this result
Because the dimension y does not have any coordinates associated with it. Maybe I am missing something, but it seems a clarification should be added that this is only true for dimensions with coordinates assigned to them. |
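The distinction raised in the comment above, between dimensions that have coordinates and dimensions that do not, can be inspected directly. The snippet below is a sketch and not the commenter's original code (which is not preserved here); it assumes an array with a labelled `x` dimension and an unlabelled `y` dimension, and all variable names are illustrative. To my understanding, `sel` on a dimension without a coordinate falls back to positional lookup.

```python
import numpy as np
import xarray as xr

# "x" has a coordinate; "y" deliberately has none.
a = xr.DataArray(
    np.arange(6).reshape((3, 2)),
    dims=["x", "y"],
    coords={"x": list("abc")},
)

dims_with_coords = [d for d in a.dims if d in a.coords]
dims_without_coords = [d for d in a.dims if d not in a.coords]

# Label-based lookup uses the coordinate values of "x".
label_lookup = a.sel(x="b")

# With no coordinate on "y", sel is treated like positional isel.
fallback = a.sel(y=1)
same_as_positional = fallback.equals(a.isel(y=1))

print(dims_with_coords, dims_without_coords, same_as_positional)
```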
We have a bunch of terms to describe the xarray structure, such as dimension, coordinate, dimension coordinate, etc.
Although this was discussed in #1295 and we have tried to use consistent terminology in our docs, it still does not look easy for users to understand our functionality.
In #2399, @horta wrote a list of definitions (https://drive.google.com/file/d/1uJ_U6nedkNe916SMViuVKlkGwPX-mGK7/view?usp=sharing).
I think it would be nice to have something like this in our docs.
Any thoughts?