Dataset.to_netcdf with mixed int and str in data #2620

ChengweiWang0 · 2018-12-19T00:03:46Z

Hi,

The code below raises a TypeError when I use engine='netcdf4' or engine='scipy' in to_netcdf(), but it works with engine='h5netcdf', which automatically converts the int values to str.

Is there a good way to solve this? I would like to save the data with their original dtypes (not convert int to str).

Code:
import pandas as pd
import xarray as xr

df = pd.DataFrame({'a': ['x', 'y'], 'b': [1, 2]})
da = xr.DataArray(df)
ds = xr.Dataset({'test': da})

ds.to_netcdf('test.nc', engine='netcdf4')   # TypeError: expected bytes, int found
ds.to_netcdf('test.nc', engine='scipy')     # TypeError: expected bytes, int found
ds.to_netcdf('test.nc', engine='h5netcdf')  # works, but automatically converts int to str, which is not good...
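
One possible workaround, sketched here and not part of the original report: the mixed dtype appears because xr.DataArray(df) packs both columns into a single 2-D object array. Converting the DataFrame with DataFrame.to_xarray() instead gives each column its own variable, so 'b' stays an integer. The variable names below simply mirror the report, and the to_netcdf call is left commented out because it needs a netCDF backend installed.

```python
import pandas as pd

df = pd.DataFrame({'a': ['x', 'y'], 'b': [1, 2]})

# Each column becomes a separate data variable along the "index" dimension,
# so the str column and the int column are never merged into one object array.
# (to_xarray() requires xarray to be installed.)
ds = df.to_xarray()

for var_name in ds.data_vars:
    print(var_name, ds[var_name].dtype)

# ds.to_netcdf('test.nc')  # assumption: a netCDF backend such as netCDF4 is available
```

The trade-off is that the data are stored as two 1-D variables rather than one 2-D array, which may or may not suit the downstream code.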

shoyer · 2018-12-19T05:21:02Z

Thanks for the report! This is definitely a bug: we should be raising an informative error message here.

It looks like these lines in _infer_dtype should be updated to look at the full array rather than only the first element:

xarray/xarray/conventions.py, lines 119 to 139 in 778ffc4:

    def _infer_dtype(array, name=None):
        """Given an object array with no missing values, infer its dtype from its
        first element
        """
        if array.dtype.kind != 'O':
            raise TypeError('infer_type must be called on a dtype=object array')
        if array.size == 0:
            return np.dtype(float)
        element = array[(0,) * array.ndim]
        if isinstance(element, (bytes_type, unicode_type)):
            return strings.create_vlen_dtype(type(element))
        dtype = np.array(element).dtype
        if dtype.kind != 'O':
            return dtype
        raise ValueError('unable to infer dtype on variable {!r}; xarray '
                         'cannot serialize arbitrary Python objects'
                         .format(name))
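
To illustrate the suggestion of looking at the full array, here is a minimal sketch of how such a check might look. It is not the fix that was eventually merged. The function name infer_dtype_full_scan is hypothetical, and np.dtype(type(element)) stands in for xarray's internal strings.create_vlen_dtype so that the example depends only on NumPy.

```python
import numpy as np


def infer_dtype_full_scan(array, name=None):
    """Hypothetical variant of _infer_dtype that inspects every element."""
    if array.dtype.kind != "O":
        raise TypeError("infer_dtype_full_scan must be called on a dtype=object array")
    if array.size == 0:
        return np.dtype(float)

    # Collect the exact Python type of every element, not just the first one.
    element_types = {type(element) for element in array.ravel()}
    if len(element_types) > 1:
        type_names = sorted(t.__name__ for t in element_types)
        raise ValueError(
            "unable to infer dtype on variable {!r}; found mixed types {}".format(
                name, type_names
            )
        )

    element = array.ravel()[0]
    if isinstance(element, (bytes, str)):
        # Stand-in for strings.create_vlen_dtype(type(element)) in xarray.
        return np.dtype(type(element))
    dtype = np.array(element).dtype
    if dtype.kind != "O":
        return dtype
    raise ValueError(
        "unable to infer dtype on variable {!r}; xarray "
        "cannot serialize arbitrary Python objects".format(name)
    )


# The array from the report mixes str and int, so the scan raises early.
mixed = np.array([["x", 1], ["y", 2]], dtype=object)
try:
    infer_dtype_full_scan(mixed, name="test")
except ValueError as err:
    print(err)
```

Comparing exact types is deliberately strict: an object array holding both Python int and float values would also be rejected, so a real implementation might prefer a more lenient rule. Scanning every element also costs time proportional to the array size.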

ChengweiWang0 · 2018-12-19T11:03:39Z

Thank you very much, shoyer. I hope this bug can be fixed in the next version.

stale · 2020-11-21T14:06:35Z

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

shoyer added the bug label Dec 19, 2018

stale bot added the stale label Nov 21, 2020

andersy005 mentioned this issue Dec 16, 2020 in "Raise an informative error message when object array has mixed types" #4700 (merged)

andersy005 removed the stale label Dec 16, 2020

andersy005 closed this as completed in #4700 Nov 28, 2023
