Use dask for PV time-series #48
Conversation
I tested variants of this version with …

Limitations / Restrictions

What is still missing is an appropriate place to introduce (automatic) chunking of the cutout; otherwise dask does not feel responsible. For now I introduced the chunking manually after loading the cutout.
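Roughly the kind of rechunking I mean (a minimal sketch on a plain xarray dataset; the file name and chunk sizes are placeholders, and where exactly this should hook into the cutout object is the open question):

```python
import xarray as xr

# Opening lazily with explicit chunks makes dask "feel responsible";
# file name and chunk size are placeholders, not values from this branch.
ds = xr.open_dataset("cutout.nc", chunks={"time": 100})

# Or rechunk a dataset that is already loaded:
ds = ds.chunk({"time": 100})
```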
Also, the code in lines 51 to 55 in 88f2e2b is stricter about when the dask mode is used.

Workers exceeding allowed memory pool

I can confirm the problem with workers exceeding their allowed memory pool.

Obsolete code pieces?

If we figure this out, am I correct to assume that we can remove the whole …? Also, we can probably make the aggregation here (line 29 in 88f2e2b) simpler, as we do not need any distinction between backends?

For further reference

Plotting with …
|
Fixup
Did you test …? In principle the decorator could go then. The code simplification is relatively local, though; only parts of … would change. But I would want to be sure we are not introducing major performance degradations by relying on dask only. |
Yes, I checked with …; it behaves as it should. Performance-wise I feel like there is a bit of overhead (not much, I'll report numbers later). The simplifications gained (in terms of understanding the code when reading it, e.g. while bug-tracking) would, I think, be more valuable than minor performance losses. I actually prefer the … |
Addendum - timings

Timing using …

Cutout dimensions (ERA5): …

For the codebase, GitHub refers to the …
I do not see any significant changes. |
Side note: atlite/pv/irradiation.py, line 113 in 8efc4c8, should be …
|
Update on memory usage

I think this is related to …. I was able to trim it down (~50% for my case, from ca. 60 GB to 32 GB) using 5-6 ….

Task graphs

(External links because of large svg files - sorry)

Caveat: …

Do you think that's a reasonable solution? I'd like to finish this up; I've already spent literally a week on this without much success. It's a pain in the seat muscles to pinpoint and solve. |
Update - Alternative

I didn't really like the idea of adding lines 414 to 421 in b227ab2 to ….

This yields similar gains in memory reduction (down to 20-25 GB from ~60 GB) in my case, but is noticeably slower than the … |
Update - Better alternative

I was just trying out another alternative, which is now my favourite one:
Results:
I am now down to 11 GB of memory usage (a reduction by a factor of 5 to 6). Could you confirm that this works for you? |
@euronion is this relevant or necessary for …? |
I think so, yes. Need to check it during review. |
Nope, out of scope |
Closing this in favour of code in #20 |
convert_pv is the last hurdle to a full dask mode in atlite. Refer also to #30. This PR replaces all the in-place transformations with clip and where so that they work, in principle, on dask.
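Roughly the pattern is the following (a minimal sketch, not the actual diff): an in-place assignment on the underlying numpy array forces eager computation, while clip and where return new, lazily evaluated objects on dask-backed arrays.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.random.rand(4, 3) - 0.5, dims=("time", "loc"))

# in-place style, breaks laziness on dask-backed arrays:
# da.values[da.values < 0] = 0.0

# dask-friendly replacements, both stay lazy:
clipped = da.clip(min=0.0)
masked = da.where(da > 0.0, 0.0)
```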
Unfortunately, dask fails to properly optimize the task graph (into something which should be linear in time) and instead thrashes memory until it dies, even for rather small cutouts.
I wasn't able today to find any working solution (it does not even work sensibly in synchronous mode).
For future reference, it is very helpful to visualize and profile dask:
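The exact snippet is not reproduced here, but it was along the lines of dask's built-in diagnostics; the `result` below is a stand-in for whichever lazy object the conversion returns:

```python
import dask
import dask.array as da
from dask.diagnostics import Profiler, ResourceProfiler, visualize as visualize_profile

# stand-in for the lazy result of convert_pv; any dask-backed collection works
result = da.random.random((1000, 1000), chunks=(100, 100)).clip(0).sum(axis=0)

# dump the task graph to a file (needs graphviz installed)
dask.visualize(result, filename="task_graph.svg")

# profile task execution and resource usage while computing
with Profiler() as prof, ResourceProfiler(dt=0.25) as rprof:
    result.compute()

visualize_profile([prof, rprof])  # opens a bokeh plot of both profiles
```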
You might also want to look at intermediate values, starting from:
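Again only a sketch (the concrete starting point is not reproduced): pick one of the lazily built arrays inside the conversion and compute a small slice of it eagerly.

```python
import dask.array as da
import xarray as xr

# stand-in for one of the lazily built arrays inside convert_pv
# (e.g. the tilted irradiation before aggregation; name purely illustrative)
intermediate = xr.DataArray(
    da.random.random((8760, 100), chunks=(1000, 100)), dims=("time", "spatial")
)

print(intermediate)  # dims, dtype and chunk layout, nothing is computed yet

small = intermediate.isel(time=slice(0, 10)).compute()  # force a small slice eagerly
print(small.values)
```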