Coordinate dtype changing to object after xr.concat #4543

Closed
JS-Parent opened this issue Oct 27, 2020 · 1 comment · Fixed by #4759

JS-Parent commented Oct 27, 2020

What happened: The dtype of DataArray coordinates changes after concatenation using xr.concat.

What you expected to happen: The dtype of the DataArray coordinates stays the same.

Minimal Complete Verifiable Example:

Below I create two examples. The first shows the issue on the coords associated with the concatenated dimension. In the second I use a different dtype (bytes) for x1, and the problem appears on both dimensions.

Example 1:

import numpy as np
import xarray as xr

da1 = xr.DataArray(data=np.arange(4).reshape([2, 2]),
                   dims=["x1", "x2"],
                   coords={"x1": np.array([0, 1]),
                           "x2": np.array(['a', 'b'])})
da2 = xr.DataArray(data=np.arange(4).reshape([2, 2]),
                   dims=["x1", "x2"],
                   coords={"x1": np.array([1, 2]),
                           "x2": np.array(['c', 'd'])})
da_joined = xr.concat([da1, da2], dim="x2")

print("coord x1 dtype:")
print("in da1:", da1.coords["x1"].data.dtype)
print("in da2:", da2.coords["x1"].data.dtype)
print("after concat:", da_joined.coords["x1"].data.dtype)
# this is in line with expectations:
# coord x1 dtype:
# in da1: int64
# in da2: int64
# after concat: int64

print("coord x2 dtype")
print("in da1:", da1.coords["x2"].data.dtype)
print("in da2:", da2.coords["x2"].data.dtype)
print("after concat:", da_joined.coords["x2"].data.dtype)
# coord x2 dtype
# in da1: <U1
# in da2: <U1
# after concat: object           # This is the problem: it should still be <U1

Example 2:

da1 = xr.DataArray(data=np.arange(4).reshape([2, 2]),
                   dims=["x1", "x2"],
                   coords={"x1": np.array([b'\x00', b'\x01']),
                           "x2": np.array(['a', 'b'])})

da2 = xr.DataArray(data=np.arange(4).reshape([2, 2]),
                   dims=["x1", "x2"],
                   coords={"x1": np.array([b'\x01', b'\x02']),
                           "x2": np.array(['c', 'd'])})

da_joined = xr.concat([da1, da2], dim="x2")

# output of the same print statements as in Example 1:
# coord x1 dtype:
# in da1: |S1
# in da2: |S1
# after concat: object              # This is the problem: it should still be |S1
# coord x2 dtype
# in da1: <U1
# in da2: <U1
# after concat: object              # This is the problem: it should still be <U1

Anything else we need to know:

This seems related to #1266

Environment: Ubuntu 18.04, python 3.7.9, xarray 0.16.1

Output of xr.show_versions()

xr.show_versions()
INSTALLED VERSIONS

commit: None
python: 3.7.9 (default, Aug 31 2020, 12:42:55)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-51-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: None
libnetcdf: None
xarray: 0.16.1
pandas: 0.25.3
numpy: 1.19.1
scipy: 1.5.3
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 50.3.0
pip: 20.2.4
conda: None
pytest: None
IPython: 7.18.1
sphinx: None

@mathause (Collaborator)

I think the problem is in align and that pd.Index(["a"]) has dtype=object:

import pandas as pd
pd.Index(["a", "b"])
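To make the dtype change visible without xarray, here is a minimal sketch that wraps the same kind of fixed-width string array used in the report in a pd.Index and converts it back. The variable names are only for illustration. With the pandas version listed in the report (0.25.3) the index dtype is object; newer pandas releases may report a dedicated string dtype instead, so the printed values are deliberately not hard-coded here.

```python
import numpy as np
import pandas as pd

# Fixed-width unicode array, like the "x2" coordinate in the report.
labels = np.array(["a", "b"])

# Wrapping the array in a pandas Index does not keep the fixed-width dtype.
index = pd.Index(labels)

# Converting the index back to numpy does not restore the fixed-width dtype either.
roundtrip = np.asarray(index)

print("numpy dtype:   ", labels.dtype)
print("index dtype:   ", index.dtype)
print("back to numpy: ", roundtrip.dtype)
```

Either way the original fixed-width dtype is lost once the values have passed through an index.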

concat calls align here:

datasets = align(

and align basically does the following:

index = da1.indexes["x2"] | da2.indexes["x2"]
da1.reindex({"x2": index})

Thus we replace the coords with an index.
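The same effect can be sketched at the pandas level, leaving xarray out entirely. This is only an illustration of the mechanism described above, not the actual align code: it builds the union of two indexes and looks at the values that come back. It uses Index.union rather than the | operator, because newer pandas versions no longer treat | on an Index as a set union. The final cast back to the input dtype is an assumption about one possible way to recover the dtype, not necessarily what the linked fix does.

```python
import numpy as np
import pandas as pd

# The two "x2" coordinates from Example 1.
x2_a = np.array(["a", "b"])
x2_b = np.array(["c", "d"])

# Roughly what the alignment step does: take the union of both indexes.
joined = pd.Index(x2_a).union(pd.Index(x2_b))

# The values pulled back out of the joined index are no longer fixed-width strings.
values = np.asarray(joined)
print("input dtype: ", x2_a.dtype)
print("joined dtype:", values.dtype)

# Hypothetical workaround: cast back to the dtype of the inputs.
restored = values.astype(x2_a.dtype)
print("restored dtype:", restored.dtype)
```

The cast only works here because every label fits the original width of one character; longer labels would be truncated by astype.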
