Parallelised calculations using dask. #30
Comments
The "culprit" is two-fold: …
To play with dask, you can use …
I see, thanks for pointing this semi-hidden feature out! Setting … Shall we leave the issue open as a reminder (for documentation & porting the non-dask conversion functions)?
Yes, leave it open.
Following up on the performance question, the two … for comparison: Using …
It would probably make sense to retry the evaluation of wind generation with the new commit 1294373 on v0.2, which parallelizes the interpolation with dask.
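For re-running such a comparison, a minimal sketch might look like the following. All data and names here are synthetic stand-ins (the variable `wnd` and the array sizes are made up), not the actual wind-generation benchmark:

```python
# Minimal sketch: time the same reduction on a numpy-backed and a
# dask-chunked Dataset. Data and names are synthetic stand-ins.
import time

import numpy as np
import xarray as xr

ds = xr.Dataset({"wnd": (("time", "y", "x"), np.random.rand(500, 60, 60))})

def timed_mean(dataset):
    """Return wall-clock seconds for a mean over the time dimension."""
    start = time.perf_counter()
    dataset["wnd"].mean("time").compute()  # no-op for numpy, triggers dask graph
    return time.perf_counter() - start

print(f"numpy-backed: {timed_mean(ds):.4f}s")
print(f"dask-chunked: {timed_mean(ds.chunk({'time': 100})):.4f}s")
```

For arrays this small the dask variant can well be slower, since the scheduling overhead dominates; the gap only closes for larger workloads.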
Keeping this here as a reminder. Not a pressing issue, but something that has been bothering me for some time.
I have not yet observed that `atlite` is using more than one processing core for calculations (e.g. wind speed to wind power conversion). I would assume that this should be easy to implement (as in the backend we are using `dask` and `xarray`), but I haven't been able to identify the culprit.
Checking for parallelised activity
Initially I was only monitoring the CPU utilisation using `htop`: there, only one core seems to be active.
A better way is to use a `dask` dashboard and a local client: I also do not see any dask activity during calculations there.
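For reference, a minimal sketch of such a setup (assuming the `dask.distributed` package is installed; the cutout conversion mentioned in the comment is only a placeholder, not a confirmed API call):

```python
# Minimal sketch: start a local dask cluster and print its dashboard URL.
# Assumes dask.distributed is installed.
from dask.distributed import Client

client = Client()             # spawns a LocalCluster, one worker per core by default
print(client.dashboard_link)  # open this URL in a browser to watch task activity

# Any dask-backed computation executed now shows up on the dashboard,
# e.g. a (hypothetical) cutout conversion.

client.close()
```

If a computation produces no activity on the dashboard, it is running on plain numpy arrays rather than through dask.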
Some background info
Checking `cutout.data` returns an `xarray` Dataset backed by numpy arrays. When loading the dataset (`xarray.open_dataset(...)`) with the `chunks` option, or rechunking the dataset after the cutout was loaded using `cutout.data.chunk(...)`, there is some dask activity during calculations (but it is slow; I guess there is a lot of overhead from spawning and orchestrating the different workers and threads).
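A minimal sketch of that difference, with synthetic data (the variable name `wnd100m` is made up, not necessarily what a cutout contains):

```python
# Minimal sketch: chunking a numpy-backed Dataset turns its variables into
# dask arrays, so subsequent computations are split into parallel tasks.
# The data and the variable name "wnd100m" are synthetic.
import numpy as np
import xarray as xr

ds = xr.Dataset({"wnd100m": (("time", "x"), np.random.rand(240, 50))})

# Equivalent to passing chunks={...} to xarray.open_dataset(...):
chunked = ds.chunk({"time": 48})

print(type(ds["wnd100m"].data).__name__)       # ndarray (numpy, eager)
print(type(chunked["wnd100m"].data).__name__)  # Array (dask, lazy)

# Dask operations stay lazy until explicitly computed:
mean = chunked["wnd100m"].mean("time")
result = mean.compute()
```

Whether the parallelism pays off depends on the chunk size: very small chunks mostly add scheduling overhead, which matches the slowness observed above.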
Low priority
With the new version #20, calculations have become significantly faster and caching of datasets has become obsolete (it can now be done implicitly by xarray and dask).
So this is only interesting when converting large datasets or doing repeated conversions.
Current versions