You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
importxarrayasxrimportnumpyasnp# Creating arrayds=xr.Dataset()
ds["longitude"] =np.arange(0.1,2000.1,1)
ds["latitude"] =range(3000)
ds["step"] =range(50)
ds["field"] = (("step","latitude","longitude"),np.random.randn(50,3000,2000))
ds.to_netcdf("big_array.nc")
# Create another array ds=xr.Dataset()
ds["longitude"] =np.arange(500.1,600.1,1) # Coordinate are a continuous subset of the first arrayds["latitude"] =np.arange(510,660)
ds["id"] =range(50)
ds["field"] = (("longitude","latitude","id"),np.random.randn(100,150,50))
ds.to_netcdf("slicing.nc")
# Create another array with "discontinuity" in longitude dimension ds=xr.Dataset()
ds["longitude"] =list(np.arange(500.1,598.1,1)) +[622.1,640.1]
ds["latitude"] =range(510,660)
ds["id"] =range(10)
ds["mask"] = (("longitude","latitude","id"),np.random.randn(100,150,10))
ds.to_netcdf("no_slicing.nc")
# Load the Three arrays da=xr.open_dataset("big_array.nc")
db=xr.open_dataset("slicing.nc").isel(id=0)
dc=xr.open_dataarray("no_slicing.nc").isel(id=0)
%timeitda*db# 32.3 ms ± 5.02 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)%timeitda*dc#2.13 s ± 15.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)defslicing_operation(dc,da):
""" Slicing when knowing that dc is a subpart of da """importoperatormin_lat=np.max([dc.latitude.values.min(),da.latitude.values.min()])
max_lat=np.min([dc.latitude.values.max(),da.latitude.values.max()])
index_lat_field=operator.and_(da.latitude>=min_lat,da.latitude<=max_lat)
min_lon=np.max([dc.longitude.values.min(),da.longitude.values.min()])
max_lon=np.min([dc.longitude.values.max(),da.longitude.values.max()])
index_lon_field=operator.and_(da.longitude>=min_lon,da.longitude<=max_lon)
# Extending dc such that it covers all longitude of da # Creating an empty array. dempty=xr.Dataset()
dempty["longitude"] =da.longitude.isel(longitude=index_lon_field).valuesdempty["latitude"] =da.latitude.isel(latitude=index_lat_field).valuesdc_ext=dc.broadcast_like(dempty)
# Performing multiplication and align res,_=xr.align((dc_ext*da),dc)
returnres%timeitslicing_operation(dc,da)
#43.9 ms ± 1.94 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)#Checking if we get the same resultsres_0=dc*dares=slicing_operation(dc,da)
print(abs(res_0-res).max())
Problem Description
A performance problem occured when performing operation between da and dc.
Computing da *db takes ~32 ms
While db and dc have the same shape, opeartion dc * da takes ~2.13s (so x 60 in cpu times).
This seems linked to the fact tha dc includes discontinuous indexes of da.
Indeed, when extending dc (such that its longitude are continuous when compared at da) the computation time take ~44 ms.
For me it seems that when doing da * dc da dataset is fully loaded in memory first, while only the part involved in computation is loaded when doing da *db.
Indeed when doing :
%timeda.load()
we get the following result:
CPU times: user 0 ns, sys: 1.88 s, total: 1.88 s
Wall time: 1.89 s
On top of that, da *db scales with the number of id selected while it is not the case for da *dc .
MCVE Code Sample
Problem Description
A performance problem occured when performing operation between da and dc.
Computing
da *db
takes ~32 msWhile db and dc have the same shape, opeartion
dc * da
takes ~2.13s (so x 60 in cpu times).This seems linked to the fact tha dc includes discontinuous indexes of da.
Indeed, when extending dc (such that its longitude are continuous when compared at da) the computation time take ~44 ms.
For me it seems that when doing
da * dc
da dataset is fully loaded in memory first, while only the part involved in computation is loaded when doingda *db
.Indeed when doing :
we get the following result:
On top of that,
da *db
scales with the number ofid
selected while it is not the case forda *dc
.Output of
xr.show_versions()
python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:33:48)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-957.27.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.3
xarray: 0.14.1
pandas: 0.25.3
numpy: 1.17.4
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.7.6
iris: None
bottleneck: 1.3.1
dask: 2.9.1
distributed: 2.9.1
matplotlib: 3.1.2
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 44.0.0.post20200106
pip: 19.3.1
conda: None
pytest: 5.3.2
IPython: 7.11.1
sphinx: 2.3.1
The text was updated successfully, but these errors were encountered: