xr.combine_nested() fails when passed nested DataSets #3315
Comments
This honestly makes no sense to me.
These are dataarrays with two different names. Why is this the expected result?
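That error arises because it's trying to concatenate data_vars a and b, but there are datasets that don't have a. If you set those DataArrays to have the same name, this will work.
ping @TomNicholas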
Sorry, when you say "expected result", are you referring to a particular unit test?
On Wed, 18 Sep 2019, 18:07 Deepak Cherian wrote:
This honestly makes no sense to me.
da1 = xr.DataArray(name="a", data=[[0]], dims=["x", "y"])
da2 = xr.DataArray(name="b", data=[[1]], dims=["x", "y"])
da3 = xr.DataArray(name="a", data=[[2]], dims=["x", "y"])
da4 = xr.DataArray(name="b", data=[[3]], dims=["x", "y"])
xr.combine_nested([[da1, da2], [da3, da4]], concat_dim=["x", "y"])
These are dataarrays with two different names. Why is this the expected
result?
<xarray.DataArray 'a' (x: 2, y: 2)>
array([[0, 1],
       [2, 3]])
Dimensions without coordinates: x, y
That error arises because it's trying to concatenate data_vars a and b
but there are datasets that don't have a. If you set those DataArrays to
have the same name, this will work.
da1 = xr.DataArray(name="a", data=[[0]], dims=["x", "y"])
da2 = xr.DataArray(name="a", data=[[1]], dims=["x", "y"])
da3 = xr.DataArray(name="a", data=[[2]], dims=["x", "y"])
da4 = xr.DataArray(name="a", data=[[3]], dims=["x", "y"])
ds1 = da1.to_dataset()
ds2 = da2.to_dataset()
ds3 = da3.to_dataset()
ds4 = da4.to_dataset()
xr.combine_nested([[ds1, ds2], [ds3, ds4]], concat_dim=["x", "y"])
<xarray.Dataset>
Dimensions:  (x: 2, y: 2)
Dimensions without coordinates: x, y
Data variables:
    a        (x, y) int64 0 1 2 3
ping @TomNicholas <https://github.com/TomNicholas>
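The failure described above comes down to a membership check: every variable being concatenated has to exist in every dataset. Below is a minimal sketch of that check, using plain dicts as stand-ins for datasets. The helper name `missing_variables` is hypothetical and this is not xarray's actual implementation.

```python
def missing_variables(datasets):
    """Map each variable name to the indices of the datasets that lack it.

    Each "dataset" here is just a dict of {variable_name: values}; it is a
    toy stand-in for illustration, not xarray's Dataset.
    """
    all_names = set()
    for ds in datasets:
        all_names.update(ds)

    missing = {}
    for name in sorted(all_names):
        absent = [i for i, ds in enumerate(datasets) if name not in ds]
        if absent:
            missing[name] = absent
    return missing


# Differently named variables: neither "a" nor "b" is present everywhere.
mixed = [{"a": [[0]]}, {"b": [[1]]}, {"a": [[2]]}, {"b": [[3]]}]
print(missing_variables(mixed))

# Same name everywhere: nothing is missing, so concatenation could proceed.
same = [{"a": [[0]]}, {"a": [[1]]}, {"a": [[2]]}, {"a": [[3]]}]
print(missing_variables(same))
```

An empty result corresponds to the case where all four arrays are named "a"; a non-empty one corresponds to the mixed a/b case that raises the error.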
Yes: xarray/xarray/tests/test_combine.py, lines 467 to 478 in fddced0
|
Hmm I can look at this properly at the weekend but in the meantime the
logic was motivated by discussion in #2777. If the test doesn't make sense
in that context then it's not right.
On Wed, 18 Sep 2019, 18:16 Deepak Cherian wrote:
Yes/
https://github.com/pydata/xarray/blob/fddced063b7ecbea6254dc1008bb4db15a5d9304/xarray/tests/test_combine.py#L467-L478
Okay something has definitely gone wrong here. My intention with that test was to check that the order of operations doesn't matter, but you're right that the test as written makes no sense. It would probably be a good idea to remove this test and check that property correctly by adding a second assert to the (poorly-named)
# Prove it works symmetrically
datasets = [[ds(0), ds(3)],
            [ds(1), ds(4)],
            [ds(2), ds(5)]]
result = combine_nested(datasets, concat_dim=["dim2", "dim1"])
assert_equal(result, expected)
(This passes fine)
However, that still leaves the question of why is this nonsensical test passing? I think it's because
da1 = DataArray(name="a", data=[[0]], dims=["x", "y"])
da2 = DataArray(name="b", data=[[1]], dims=["x", "y"])
result = concat([da1, da2], dim="x")
However it doesn't fail, instead it gives this!:
Where has
|
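The order-independence property mentioned in the comment above can be illustrated without xarray. This is a rough sketch with nested Python lists standing in for 2D arrays; all function names are hypothetical and nothing here reflects how combine_nested is implemented internally. It combines a grid of blocks in both orders and checks that the results agree.

```python
def concat_rows(blocks):
    """Stack 2D blocks (lists of rows) on top of each other."""
    return [row for block in blocks for row in block]


def concat_cols(blocks):
    """Join 2D blocks side by side; all blocks must have the same row count."""
    n_rows = len(blocks[0])
    joined = []
    for i in range(n_rows):
        row = []
        for block in blocks:
            row.extend(block[i])
        joined.append(row)
    return joined


def combine_inner_first(grid):
    """Join each inner list side by side, then stack the results."""
    rows = [concat_cols(row_of_blocks) for row_of_blocks in grid]
    return concat_rows(rows)


def combine_outer_first(grid):
    """Stack each column of the grid first, then join the columns."""
    n_cols = len(grid[0])
    columns = [concat_rows([row[j] for row in grid]) for j in range(n_cols)]
    return concat_cols(columns)


# Four 1x1 blocks arranged like [[da1, da2], [da3, da4]] in the thread.
grid = [[[[0]], [[1]]],
        [[[2]], [[3]]]]
assert combine_inner_first(grid) == combine_outer_first(grid) == [[0, 1], [2, 3]]
```

A test of the real function would presumably compare two combine_nested calls in the same spirit, rather than asserting on arrays with mismatched names.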
Really? Okay, so that means that currently we don't treat a named DataArray and a single-variable Dataset as if they are the same. For example I would have expected these two operations to give the same result:
objs = [DataArray([0], dims='x', name='a'),
        DataArray([0], dims='x', name='b')]
concat(objs, dim='x')
objs = [Dataset({'a': ('x', [0])}),
        Dataset({'b': ('x', [0])})]
concat(objs, dim='x')
Is this what we want to do? Surely the first one should also fail, else this is counter-intuitive. I think of a named DataArray and a single-variable Dataset as being the same thing, just a single physical variable? @shoyer am I misunderstanding xarray's data model here?
Few observations after looking at the default flags for
xr.concat(
    objs,
    dim,
    data_vars='all',
    coords='different',
    compat='equals',
    positions=None,
    fill_value=<NA>,
    join='outer',
)
The description of
Another option is
objs = [xr.DataArray([0],
                     dims='x',
                     name='a'),
        xr.DataArray([1],
                     dims='x',
                     name='b')]
xr.concat(objs, dim='x', compat='identical')
ValueError: array names not identical
... and is the case for
ValueError: 'a' is not present in all datasets.
However,
objs = [xr.DataArray([0],
                     dims='x',
                     name='a',
                     attrs={'foo':1}),
        xr.DataArray([1],
                     dims='x',
                     name='a',
                     attrs={'bar':2})]
xr.concat(objs, dim='x', compat='identical')
succeeds with
<xarray.DataArray 'a' (x: 2)>
array([0, 1])
Dimensions without coordinates: x
Attributes:
    foo:      1
but again fails on Datasets, as one would expect from the description.
ds1 = xr.Dataset({'a': ('x', [0])})
ds1.attrs['foo'] = 'example attribute'
ds2 = xr.Dataset({'a': ('x', [1])})
ds2.attrs['bar'] = 'example attribute'
objs = [ds1, ds2]
xr.concat(objs, dim='x', compat='identical')
ValueError: Dataset global attributes not equal.
Also had a look at
Potential resolutions:
Final thought: perhaps promoting to Dataset when all requirements are met for a DataArray to be considered as such, might simplify keeping operations and checks consistent?
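To make the "promote to Dataset" idea concrete, here is a toy sketch in which named arrays are first wrapped as single-variable datasets and then run through one shared strict check. Everything in it is hypothetical: `NamedArray`, `promote`, and `concat_identical` are illustrative stand-ins, and attrs are simplified to a single level, which is an assumption and not how xarray stores them.

```python
from dataclasses import dataclass, field


@dataclass
class NamedArray:
    """Toy stand-in for a named 1D DataArray (not xarray's class)."""
    name: str
    data: list
    attrs: dict = field(default_factory=dict)


def promote(arr):
    """Wrap a named array as a single-variable 'dataset' (a plain dict)."""
    return {"data_vars": {arr.name: list(arr.data)}, "attrs": dict(arr.attrs)}


def concat_identical(datasets):
    """Concatenate toy datasets, requiring matching variables and attrs."""
    first = datasets[0]
    for ds in datasets[1:]:
        if set(ds["data_vars"]) != set(first["data_vars"]):
            raise ValueError("variables are not present in all datasets")
        if ds["attrs"] != first["attrs"]:
            raise ValueError("attributes are not equal")
    merged = {
        name: [value for ds in datasets for value in ds["data_vars"][name]]
        for name in first["data_vars"]
    }
    return {"data_vars": merged, "attrs": dict(first["attrs"])}


def concat_named_arrays(arrays):
    """Promote every array, then reuse the dataset code path."""
    return concat_identical([promote(a) for a in arrays])


# Same name and attrs: concatenation succeeds.
print(concat_named_arrays([NamedArray("a", [0]), NamedArray("a", [1])]))
```

Because both kinds of input go through `concat_identical`, mismatched names or mismatched attrs fail the same way whether the caller started from arrays or datasets, which is the consistency the comment is asking about.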
xr.__version__ '0.13.0'
xr.combine_nested() works when passed a nested list of DataArray objects.
returns
but fails if passed a nested list of Dataset objects.
returns