Having data available in NetCDF4 format is incredibly useful, both for cross-compatibility with much of climate/weather analysis and for the tools built around the format. These days, however, a lot of this analysis is more easily (and more commonly) done in xarray (docs), which builds on netCDF4 + numpy + pandas, rather than with the netCDF4 package itself + numpy.
I could see a benefit to the user for the default output of api.get_data(...use_opendap=True) to return an xarray.Dataset rather than a netCDF4 object.
Without changing anything else, the easiest thing to do would be to use xarray's built-in converter:
But the real benefit would be something that transfers some of the metadata from the stations info into the xarray object (and therefore, to any netcdf file saved from it). Something like this:
import re
import xarray as xr

# Assumes `api` and `stdmet_df` (the station data as a DataFrame) already exist
ds = stdmet_df.reset_index().set_index(['station_id','timestamp']).to_xarray()

metas = []
for sid in ds.station_id.values:
    # Get station metadata
    smeta = api.station(sid)

    # Create empty Dataset with dimension of station_id
    ds_meta = xr.Dataset(coords = {'station_id':[sid]})

    for attr in smeta:
        if attr == 'Location':
            for dim,dirc in zip(['lat','lon'],['NS','EW']):
                # Find lat / lon in string
                dim_set = re.search(r'[0-9]*\.[0-9]*\ ['+dirc+']',smeta['Location']).group(0)
                # Turn into float (multiplied by +1 / -1 depending on whether it's N/E or S/W)
                dim_set = float(re.search(r'[0-9]*\.[0-9]*',dim_set).group(0))*(1 if dirc[0] in dim_set else -1)
                # Add to dataset
                ds_meta[dim] = (['station_id'],[dim_set])
        elif re.search(r'^[0-9]+\.{0,1}[0-9]*',smeta[attr]) is not None:
            # Assume everything of the form '##.## lorem ipsum' is split
            # into a value and units
            # Get units as the bit after the number
            units = re.split(r'^[0-9]+\.{0,1}[0-9]*\ ',smeta[attr])[-1]
            # Get the value
            value = float(re.search(r'^[0-9]+\.{0,1}[0-9]*',smeta[attr]).group(0))
            # Store under a lower-case, underscore-separated variable name
            ds_meta[re.sub(r'\ ','_',attr).lower()] = (['station_id'],[value])
            ds_meta[re.sub(r'\ ','_',attr).lower()].attrs['units'] = units
        else:
            # If doesn't fit either paradigm, just copy in as string
            ds_meta[attr] = (['station_id'],[smeta[attr]])
    metas.append(ds_meta)

# Concatenate across station ids
metas = xr.concat(metas,dim='station_id')

# Merge with original dataset, make the metadata variables coordinates
ds = xr.merge([ds,metas]).set_coords(list(metas.data_vars))
print(ds)
Then it can either be saved as a netcdf using ds.to_netcdf(fn) or further analyzed using xarray's tools.
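The latitude/longitude parsing is the most fragile part of this idea, so it may be worth checking on its own. Here is a small standalone sketch, assuming the station's Location string has the form '##.### N ##.### W'. The function name parse_location and the sample string are hypothetical, and the regex is slightly more permissive than the one in the snippet:

```python
import re


def parse_location(location):
    """Return (lat, lon) in signed decimal degrees from e.g. '41.840 N 60.650 W'."""
    coords = {}
    for name, hemispheres in (("lat", "NS"), ("lon", "EW")):
        # A number followed by one of the two hemisphere letters for this axis
        match = re.search(r"([0-9]+(?:\.[0-9]*)?)\s*([" + hemispheres + r"])", location)
        if match is None:
            raise ValueError(f"could not find {name} in {location!r}")
        value = float(match.group(1))
        # N and E are positive; S and W are negative
        coords[name] = value if match.group(2) == hemispheres[0] else -value
    return coords["lat"], coords["lon"]


# Made-up sample string, not taken from a real station record
lat, lon = parse_location("41.840 N 60.650 W")
print(lat, lon)
```

Raising on a missing match, rather than letting .group(0) fail on None, makes it easier to spot stations whose Location string uses a different layout.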
This is totally a suggestion and I clearly got a little sidetracked with it, so feel free to take it or leave it (but I think this will make a big difference in terms of usability in workflows using array data).
Thank you for the suggestion, @ks905383; this looks excellent. At a minimum, the switch over to xarray as a wrapper for the netCDF4 datasets is worth implementing. I initially attempted this directly through xarray, but was unable to load the data from a URL without the extra step of reading it to a local temp file. In terms of including the station metadata, the snippet you posted speaks for itself. I'll work on implementing this, possibly with an include_metadata flag or similar. I'll also try to make sure the behavior maps well to xarray.concat.
The latter might take some time to implement, but the first will be included in the next release.
This is related to JOSS review openjournals/joss-reviews#7406.