Dataset.to_netcdf with mixed int and str in data #2620

Closed
ChengweiWang0 opened this issue Dec 19, 2018 · 3 comments · Fixed by #4700
@ChengweiWang0
ChengweiWang0 commented Dec 19, 2018

Hi,

The code below gives me a TypeError when I use engine='netcdf4' or engine='scipy' in to_netcdf(), but it works with engine='h5netcdf', which automatically converts the int values to str.

Is there a good way to solve this, so that the data are saved with their original dtype (without converting int to str)?

Code:
import pandas as pd
import xarray as xr

df = pd.DataFrame({'a': ['x', 'y'], 'b': [1, 2]})
da = xr.DataArray(df)  # mixed columns collapse into one object-dtype array
ds = xr.Dataset({'test': da})

ds.to_netcdf('test.nc', engine='netcdf4')  # TypeError: expected bytes, int found
ds.to_netcdf('test.nc', engine='scipy')  # TypeError: expected bytes, int found
ds.to_netcdf('test.nc', engine='h5netcdf')  # works, but automatically converts int to str, which is not good
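
A possible workaround for keeping each column's original dtype, offered as a sketch rather than a confirmed fix: build the Dataset with one variable per DataFrame column instead of wrapping the whole frame in a single object-dtype DataArray. `xr.Dataset.from_dataframe` does this; the variable name `ds_by_column` is only illustrative and does not appear in the report.

```python
import pandas as pd
import xarray as xr

df = pd.DataFrame({'a': ['x', 'y'], 'b': [1, 2]})

# One variable per column: 'a' holds the strings and 'b' stays an integer
# array, so no single array mixes int and str.
ds_by_column = xr.Dataset.from_dataframe(df)

# Inspect the dtype that would be handed to the netCDF backend for 'b'.
print(ds_by_column['b'].dtype)
```

Calling `ds_by_column.to_netcdf(...)` would then serialize each variable separately. This assumes the columns are individually homogeneous, and it has not been checked against every engine mentioned above.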

shoyer added the bug label Dec 19, 2018

@shoyer
Member

shoyer commented Dec 19, 2018

Thanks for the report! This is definitely a bug: we should be raising an informative error message here.

It looks like these lines in _infer_dtype should be updated to look at the full array rather than only the first element:

def _infer_dtype(array, name=None):
    """Given an object array with no missing values, infer its dtype from its
    first element
    """
    if array.dtype.kind != 'O':
        raise TypeError('infer_type must be called on a dtype=object array')
    if array.size == 0:
        return np.dtype(float)
    element = array[(0,) * array.ndim]
    if isinstance(element, (bytes_type, unicode_type)):
        return strings.create_vlen_dtype(type(element))
    dtype = np.array(element).dtype
    if dtype.kind != 'O':
        return dtype
    raise ValueError('unable to infer dtype on variable {!r}; xarray '
                     'cannot serialize arbitrary Python objects'
                     .format(name))
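
The suggestion above, checking the full array rather than only the first element, could look roughly like the sketch below. It is an assumption-laden illustration, not the change that was actually merged: `infer_object_dtype` is a hypothetical name, it depends only on NumPy, and it returns a plain object dtype for strings where xarray would call `strings.create_vlen_dtype`.

```python
import numpy as np


def infer_object_dtype(array, name=None):
    """Hypothetical variant of _infer_dtype that inspects every element."""
    if array.dtype.kind != 'O':
        raise TypeError('infer_object_dtype must be called on a dtype=object array')
    if array.size == 0:
        return np.dtype(float)

    # Collect the Python type of every element, not just the first one.
    element_types = {type(item) for item in array.ravel()}
    if len(element_types) > 1:
        type_names = ', '.join(sorted(t.__name__ for t in element_types))
        raise ValueError(
            'unable to infer dtype on variable {!r}; object array contains '
            'mixed native types: {}'.format(name, type_names)
        )

    (element_type,) = element_types
    if element_type in (bytes, str):
        # Stand-in for xarray's variable-length string dtype (assumption).
        return np.dtype(object)

    dtype = np.array(array.ravel()[0]).dtype
    if dtype.kind != 'O':
        return dtype
    raise ValueError(
        'unable to infer dtype on variable {!r}; cannot serialize '
        'arbitrary Python objects'.format(name)
    )
```

With the array from the report, which holds both str and int values, this sketch raises a ValueError that names the mixed types instead of failing later inside the backend with "expected bytes, int found". Scanning every element costs more than looking at one, which may matter for large object arrays.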

@ChengweiWang0
Author

Thank you very much shoyer, hope we can fix this bug in the next version.

@stale

stale bot commented Nov 21, 2020

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.
