I use `engine="h5netcdf"` because I can use `{"compression": True}` in encoding. I do this because some compression is needed for `BitRound` to be effective on disk storage. (I have only very little knowledge of compression and netcdf4 basics.) But probably there is a better way. What I do not want is for the compression to impact the read performance too much, i.e. reading the compressed data should not take much more time (subjective, I know) than before (or is this the tradeoff anyways?).
```python
# https://github.com/zarr-developers/numcodecs/pull/299/files
import numpy as np
from numcodecs.abc import Codec
from numcodecs.compat import ensure_ndarray, ndarray_copy


class BitRound(Codec):
    codec_id = "bitround"

    def __init__(self, keepbits: int):
        if (keepbits < 0) or (keepbits > 23):
            raise ValueError("keepbits must be between 0 and 23")
        self.keepbits = keepbits

    def encode(self, buf):
        if self.keepbits == 23:
            return buf
        # TODO: figure out if we need to make a copy
        # Currently this appears to be overwriting the input buffer
        # Is that the right behavior?
        a = ensure_ndarray(buf).view()
        assert a.dtype == np.float32
        b = a.view(dtype=np.int32)
        maskbits = 23 - self.keepbits
        mask = (0xFFFFFFFF >> maskbits) << maskbits
        half_quantum1 = (1 << (maskbits - 1)) - 1
        b += ((b >> maskbits) & 1) + half_quantum1
        b &= mask
        return b

    def decode(self, buf, out=None):
        data = ensure_ndarray(buf).view(np.float32)
        out = ndarray_copy(data, out)
        return out


def bitround(data, keepbits):
    codec = BitRound(keepbits=keepbits)
    data = data.copy()  # otherwise overwrites the input
    encoded = codec.encode(data)
    return codec.decode(encoded)
```
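To sanity-check what the codec does, here is a self-contained numpy-only sketch of the same mantissa rounding (the function name `bitround_np` is mine, not from numcodecs; it uses a `uint32` view to sidestep signed-overflow issues rather than the codec's `int32` view):

```python
import numpy as np


def bitround_np(data: np.ndarray, keepbits: int) -> np.ndarray:
    """Keep `keepbits` float32 mantissa bits, rounding to nearest (numpy-only sketch)."""
    assert data.dtype == np.float32 and 0 <= keepbits <= 23
    if keepbits == 23:
        return data.copy()  # nothing to discard
    out = data.copy()
    b = out.view(np.uint32)  # reinterpret the float bits as unsigned ints
    maskbits = 23 - keepbits  # number of mantissa bits to discard
    mask = np.uint32(((1 << 32) - 1) >> maskbits << maskbits)
    half_quantum1 = np.uint32((1 << (maskbits - 1)) - 1)
    # round to nearest, ties to even, then zero the discarded bits
    b += ((b >> np.uint32(maskbits)) & np.uint32(1)) + half_quantum1
    b &= mask
    return out


x = np.float32([0.1, 1.0, -3.14159])
print(bitround_np(x, 7))
```

After rounding with `keepbits=7`, the lowest 16 mantissa bits are zero and the relative error stays below 2^-8.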
```python
# my code below
import json
import sys

import xarray as xr

print('Require three args: python BitRound_file.py ifile.nc keepbits.json ofile')
assert len(sys.argv) == 3 + 1
_, ifile, keepbits_json, ofile = sys.argv
print("keepbits_json", keepbits_json)
keepbits_lookup = json.load(open(keepbits_json))

ds = xr.open_dataset(ifile)
ds_round = ds.copy()
for v in ds.data_vars:
    if v in keepbits_lookup.keys():
        keepbits = keepbits_lookup[v]
        # here corrections could be applied
        # some hard fixes maybe
        # if keepbits < 2:
        #     keepbits = 5
        ds_round[v].values = bitround(ds[v].values, keepbits)
ds_round = ds_round.assign_coords(ds.coords)
encoding = {v: {'compression': True} for v in ds.data_vars if v in keepbits_lookup.keys()}
print('BitRound_file.py writes ', ofile)
ds_round.to_netcdf(ofile, mode='w', engine='h5netcdf', encoding=encoding)
```
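The script expects the keepbits JSON file to map variable names to integer keepbits. A minimal sketch of producing such a file (the variable names and values here are purely illustrative; in practice they would come from the BitInformation.jl analysis):

```python
import json

# hypothetical keepbits file: variable names and values are illustrative
keepbits = {"tas": 7, "pr": 12, "psl": 11}
with open("keepbits.json", "w") as f:
    json.dump(keepbits, f)

# the script above would then be invoked as:
#   python BitRound_file.py input.nc keepbits.json output_rounded.nc
print(json.load(open("keepbits.json")))  # → {'tas': 7, 'pr': 12, 'psl': 11}
```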
> What I do not want is that the compression impacts the read performance too much, i.e. reading the compressed data should not take much more time (subjective, I know) than before (or is this the tradeoff anyways).
Many lossless compressors have a decompression speed that's approximately independent of the compression level. So read performance is rarely something to worry about with this method. E.g.:

[figure from https://www.nature.com/articles/s43588-021-00156-2]
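This can be spot-checked with Python's stdlib `zlib` (the DEFLATE codec behind netCDF-4's default compression). The data and levels below are illustrative, and absolute timings will vary by machine, but the decompression time is roughly the same regardless of the compression level used to write:

```python
import time
import zlib

import numpy as np

rng = np.random.default_rng(0)
data = rng.random(500_000, dtype=np.float32).tobytes()

for level in (1, 9):
    t0 = time.perf_counter()
    comp = zlib.compress(data, level)
    t_comp = time.perf_counter() - t0
    t0 = time.perf_counter()
    out = zlib.decompress(comp)
    t_dec = time.perf_counter() - t0
    assert out == data  # lossless round trip
    print(f"level {level}: ratio {len(data) / len(comp):.2f}, "
          f"compress {t_comp * 1e3:.1f} ms, decompress {t_dec * 1e3:.1f} ms")
```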
I want to use `BitInformation` and `BitRound` on NetCDF output. Anyone with experience with `xarray`? What I do: see also #25 (comment)

My routine:

- `plot_bitinformation.jl` and save keepbits between 0 and 23 to a dict with `json`
- `BitRound` for each variable
- `ds_rounded.to_netcdf(path, encoding, engine='h5netcdf')`
I use `engine="h5netcdf"` because I can use `{"compression": True}` in encoding (https://xarray.pydata.org/en/latest/generated/xarray.Dataset.to_netcdf.html); some compression is needed for `BitRound` to be effective on disk storage, as explained above.
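To see why some lossless codec is needed on top of `BitRound`: rounding only zeroes trailing mantissa bits, so the array stays the same size in memory; it is the compressor that turns those zeros into a smaller file. A numpy + zlib sketch (the plain masking below is a simplified stand-in for the codec, skipping its rounding step):

```python
import zlib

import numpy as np

rng = np.random.default_rng(42)
x = rng.random(100_000, dtype=np.float32)

x_round = x.copy()
x_round.view(np.uint32)[:] &= np.uint32(0xFFFF0000)  # keep 7 mantissa bits, zero the rest

raw = len(zlib.compress(x.tobytes(), 1))
rounded = len(zlib.compress(x_round.tobytes(), 1))
print(f"compressed bytes: full precision {raw}, bit-rounded {rounded}")
assert rounded < raw  # the zeroed trailing bits are what the compressor exploits
```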
EDIT:
Result: