Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Setting datarrays with non-dimension coordinates errors #8028

Open
Illviljan opened this issue Jul 28, 2023 · 6 comments
Open

Setting datarrays with non-dimension coordinates errors #8028

Illviljan opened this issue Jul 28, 2023 · 6 comments

Comments

@Illviljan
Copy link
Contributor

Illviljan commented Jul 28, 2023

What happened?

I'm not sure if this is a bug or a feature but I was expecting this example to work since the new coord is just a slight rewrite of the original dimension coordinate:

import xarray as xr

ds = xr.tutorial.open_dataset("air_temperature")

# Change the first time value:
ds["air_new"] = ds.air.copy()
air_new_changed = ds.air_new[{"time": 0}] * 3
ds.air_new.loc[air_new_changed.coords] = air_new_changed  # Works! :)

# Add a another coord along time axis and change 
# the first time value:
ds["air_new"] = ds.air.copy().assign_coords(
    {"time_float": ds.time.astype(float)}
)
air_new_changed = ds.air_new[{"time": 0}] * 4
ds.air_new.loc[air_new_changed.coords] = air_new_changed  # Error! :(

Traceback (most recent call last):

  Cell In[25], line 5
    ds.air_new.loc[air_new_changed.coords] = air_new_changed

  File ~\AppData\Local\mambaforge\envs\jw\lib\site-packages\xarray\core\dataarray.py:222 in __setitem__
    dim_indexers = map_index_queries(self.data_array, key).dim_indexers

  File ~\AppData\Local\mambaforge\envs\jw\lib\site-packages\xarray\core\indexing.py:182 in map_index_queries
    grouped_indexers = group_indexers_by_index(obj, indexers, options)

  File ~\AppData\Local\mambaforge\envs\jw\lib\site-packages\xarray\core\indexing.py:144 in group_indexers_by_index
    raise KeyError(f"no index found for coordinate {key!r}")

KeyError: "no index found for coordinate 'time_float'"
@Illviljan Illviljan added bug needs triage Issue that has not been reviewed by xarray team member labels Jul 28, 2023
@benbovy
Copy link
Member

benbovy commented Aug 2, 2023

I'd say it is rather a feature. Despite being very similar to the original one, the new coord is not a dimension coordinate, which is thus not indexed by default so loc or other label-based indexing won't work with it.

@benbovy
Copy link
Member

benbovy commented Aug 2, 2023

That said, we could probably make the error message a bit more user-friendly and verbose than just saying that no index is found, e.g.,

"No index found for coordinate 'x', which therefore cannot be used with .loc or .sel (label-based indexing). Set an index first for that coordinate using either set_index or set_xindex. Note that you can still use .iloc and .isel (integer-based indexing) when no index is set."

@benbovy benbovy added topic-indexing topic-error reporting and removed needs triage Issue that has not been reviewed by xarray team member labels Aug 2, 2023
@Illviljan
Copy link
Contributor Author

Would you mind showing the intended way for my example? Because I don't find this intuitive at all.

ds["air_new"] = ds.air.copy().assign_coords(
    {"time_float": ds.time.astype(float)}
)
# ds["air_new"] = ds["air_new"].set_xindex("time_float") # Doesn't work
a = ds["air_new"].set_xindex("time_float")
air_new_changed = a[{"time": 0}] * 4
a.loc[air_new_changed.coords] = air_new_changed

ValueError: Xarray does not support label-based selection with more than one index over the following dimension(s):
'time': 2 indexes involved
Suggestion: use a multi-index for each of those dimension(s).

Other tests:

ds.air_new.iloc[air_new_changed.coords] = air_new_changed  # Error! :(
AttributeError: 'DataArray' object has no attribute 'iloc'

ds.air_new.sel[air_new_changed.coords] = air_new_changed  # Error! :(
TypeError: 'method' object does not support item assignment

@benbovy
Copy link
Member

benbovy commented Aug 3, 2023

Haha yes agreed that is confusing. So improving the error messages will require more coordination across the codebase and thinking more about the possible cases. That is not an easy task.

Regarding your example, just setting a single index for "time_float" does not work: currently you cannot call loc or sel and pass labels at once for two or more independently indexed coordinates that share the same dimension (the error message mostly makes sense except the "dimension" term that is not accurate here). This because it is easier to raise now when such case occurs and maybe figure out later how we can merge results from different index queries along the same dimension.

(My bad, Xarray has no iloc indeed... I'm mostly using isel/sel).

I'm not sure what exactly you're trying to achieve in your example, actually.

  • item assignment on label selection using both the "time" and "time_float" coordinates? Then set a multi-index from both of them and call loc. Or maybe chain two calls to loc each with one of these coordinates but I'm not sure at all that it will work.

  • item assignment on label selection using only the "time" coordinate? Then be sure to not pass any mapping to loc that contains a key referring to an unindexed coordinate (i.e., do not include "time_float" there).

@benbovy benbovy removed the bug label Aug 4, 2023
@Illviljan
Copy link
Contributor Author

I simply want to modify a few values in an array. :) A similar example in numpy-land:

a = np.array([1, 1, 1, 1])
b = a[0] * 3
a[0] = b

Now in xarray land this ds["air_new"] happens to have a few helper coordinates. This is what I came up with and it took longer than I'd like to admit:

# Remove unneccessary coordinates without indexes:
ds["air_new"] = ds.air.copy().assign_coords({"time_float": ds.time.astype(float)})
air_new_changed = ds.air_new[{"time": 0}] * 3
ds.air_new.loc[
    {
        k: air_new_changed.coords[k].values
        for k in air_new_changed.coords.keys() & air_new_changed.indexes.keys()
    }
] = air_new_changed

Is this the most elegant way? How would you have done it?

Using masks doesn't have this issue for some reason:

# Use a mask instead...
ds["air_new"] = ds.air.copy().assign_coords({"time_float": ds.time.astype(float)})
mask = xr.ones_like(ds["air_new"], dtype=bool)
mask.loc[{"time": ds.time.isel(time=0)}] = False
ds["air_new"] = ds["air_new"].where(mask, 3 * ds["air_new"])

Why would I want to pass around helper coordinates?

  • Normalizing coordinates, for example between 0 and 1.
  • Converting absolute time to relative time: ds["relative_time"] = ds.time - ds.time[0]
  • Datetime can be a nightmare to code with so easier to just convert to floats: ds["time_float"] = ds.time.astype(float)

@benbovy
Copy link
Member

benbovy commented Aug 10, 2023

Yes helper coordinates are useful. However I don't really understand why do you want to pass labels to .loc for all of those coordinates (or all the indexed ones) at once?

Since your helper coordinates are all derived from the same original coordinate, couldn’t you just select data using the one that is the most convenient to you ? E.g.,

ds.air_new.loc[{"time": air_new_changed.time.values}] = air_new_changed

This should yield the same results (and it is much more elegant) than

ds.air_new.loc[
    {
        k: air_new_changed.coords[k].values
        for k in air_new_changed.xindexes.keys()
    }
] = air_new_changed

Both the "time" and "time_float" coordinates share the same dimension and should be subset / propagated the same way.

I doubt that you want to update values on a selection using arbitrary (possibly non-overlapping) labels for each of those coordinates, do you? E.g.,

ds.air_new.loc[{"time"="2013-01-01", "time_float"=1.42e18}] =

We don’t support that (yet) because it is more complicated. How should we join the integer array indices selected on the "time" dimension from the given "time" and "time_float" labels? By union, intersection or difference? Or do we want an exact match and raise otherwise? It might get even more complicated when multiple indexes, coordinates and/or dimensions are involved. In theory, we could support it but it would represent quite a lot of work and the implementation would quickly get much more complicated.

I can imagine how just passing all coordinates like

ds.air_new.loc[air_new_changed.coords] = air_new_changed 

would be convenient as (with a carefully chosen default behavior) it removes some manual steps or thinking, similarly to Xarray auto-alignment. However, after we start considering more use cases we will quickly see that there is no easy generic solution, like for alignment (#7045). Note that we also don't support alignment when multiple (independent) indexes are found along the same dimension (example here) for the same reasons.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants