Plasma optimization #399
@unoebauer this is now a factor of 3 faster. One function is now in C and takes up a significant fraction of the time. The other slow function is the random blackbody nu function; reimplementing it would probably speed it up significantly (it currently accounts for 1/3 of the runtime of the non-Monte Carlo functions). @mreinecke I'll mark a function that is in Cython (so very close to C) but that can probably be sped up with OpenMP (array operations).
                p_transition[j, k] /= norm_factor[k]

def calculate_transition_probabilities(
@mreinecke this is the beast. You could just write it in C and maybe with openmp it would be faster. I'm wondering if memory bandwidth is an issue here.
On 08/25/15 18:25, Wolfgang Kerzendorf wrote:
>     cdef end_id = 0
>     for i in range(len(reference_levels) - 1):
>         norm_factor[:] = 0.0
>         for j in range(reference_levels[i], reference_levels[i + 1]):
>             for k in range(p_transition.shape[1]):
>                 norm_factor[k] += p_transition[j, k]
>         for j in range(reference_levels[i], reference_levels[i + 1]):
>             for k in range(0, p_transition.shape[1]):
>                 if norm_factor[k] == 0.0:
>                     continue
>                 p_transition[j, k] /= norm_factor[k]
> +def calculate_transition_probabilities(
>
> @mreinecke this is the beast. You could just write it in C and maybe with openmp it would be faster. I'm wondering if memory bandwidth is an issue here.
It may be possible to optimize this a bit, but there are a few issues to
consider first:
- How large are the individual loop iteration counts (roughly; order of
  magnitude is fine)? This determines the best arrangement of the nested
  loops.
- Is it known how the array p_transition is laid out in memory on the
  Python side? We won't gain any performance by tweaking the routine as
  long as the glue code between Python and C has to do things like array
  copying and maybe even transposition. This is because this function is
  absolutely dominated by memory accesses; the cost of arithmetic is
  negligible in comparison.
- Related to the point above: does "p_transition[j, k]" in Cython mean
  the same as "p_transition[j][k]" in C? More specifically, will
  p_transition[j][k] and p_transition[j][k+1] be neighbours in memory?
Overall it would be beneficial if "p_transition" could be stored in a
way that elements with j and j+1 are neighbouring in memory, but I'm not
sure if the memory layout is constrained somehow by other considerations.
Cheers,
Martin
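The layout questions above can be checked from the Python side. The following is a minimal sketch, assuming p_transition is an ordinary NumPy float64 array created with default settings; the 3 x 4 shape and the names p_f and p_back are placeholders for illustration, not taken from the PR. NumPy allocates in C (row-major) order by default, so [j, k] and [j, k+1] are neighbours in memory, and Cython's indexing of a typed array follows the same strides.

```python
import numpy as np

# Hypothetical stand-in for p_transition: 3 rows (index j), 4 columns (index k).
p_transition = np.zeros((3, 4), dtype=np.float64)

# Default allocation is C order: stepping k by one moves 8 bytes,
# stepping j by one moves a whole row (4 * 8 = 32 bytes).
print(p_transition.flags["C_CONTIGUOUS"])
print(p_transition.strides)

# A Fortran-ordered copy makes [j, k] and [j + 1, k] neighbours instead.
# This is a real copy, which is the kind of glue-code cost to avoid
# doing on every call.
p_f = np.asfortranarray(p_transition)
print(p_f.strides)
print(np.shares_memory(p_f, p_transition))

# Converting back to C order copies again.
p_back = np.ascontiguousarray(p_f)
print(np.shares_memory(p_back, p_f))
```

If the array really should have j and j+1 adjacent, it is probably cheaper to allocate it in that order once (for example with order="F" at creation) than to convert at the Python/C boundary.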
Martin - do you want to meet? Maybe this is easier discussed in person.
On 08/26/15 10:44, Wolfgang Kerzendorf wrote:
> Martin - do you want to meet? maybe this is easier discussed in person.
Sure, but I have a few quick things to finish now. I'll let you know!
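For reference, the normalization loop quoted in this thread can be written as a self-contained pure-Python sketch. This is not the Cython code from the PR: the function name normalize_blocks, the list-of-lists input, and the tiny example data are assumptions made for illustration. It keeps the same structure as the quoted fragment: for each block of rows delimited by reference_levels, sum every column over the block, then divide each entry by its column sum, skipping columns whose sum is zero.

```python
def normalize_blocks(p_transition, reference_levels):
    """Normalize each block of rows so that every column sums to 1 per block.

    p_transition: list of rows (lists of floats), modified in place.
    reference_levels: block boundaries; block i covers rows
        reference_levels[i] up to reference_levels[i + 1] - 1.
    """
    n_cols = len(p_transition[0]) if p_transition else 0
    for i in range(len(reference_levels) - 1):
        start, stop = reference_levels[i], reference_levels[i + 1]
        # First pass: column sums over the rows of this block.
        norm_factor = [0.0] * n_cols
        for j in range(start, stop):
            row = p_transition[j]
            for k in range(n_cols):
                norm_factor[k] += row[k]
        # Second pass: divide by the column sum, leaving all-zero columns alone.
        for j in range(start, stop):
            row = p_transition[j]
            for k in range(n_cols):
                if norm_factor[k] != 0.0:
                    row[k] /= norm_factor[k]
    return p_transition


# Two blocks: rows 0-1 and row 2 (made-up numbers).
p = [[1.0, 0.0], [3.0, 0.0], [2.0, 5.0]]
normalize_blocks(p, [0, 2, 3])
print(p)
```

In both passes the inner loop runs over k, that is along a row, which is the contiguous direction if the array is stored in C order. The loop does two reads and one write per element and almost no arithmetic, which is consistent with the concern that it is limited by memory access rather than computation.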
@wkerzendorf Travis fails because of a NameError related to jit. Maybe because you removed numba?
@wkerzendorf
@unoebauer - it works for me when running the
@wkerzendorf - still doesn't work for me. It still fails with the same shape mismatch error as before...
@unoebauer tardis_example?
@unoebauer sorry - I just see that it is. hmm.
@wkerzendorf Good news: the last commit fixed the convergence issue in the plasma part.
Ok - the shape mismatch error may be connected to pandas. It occurred when using pandas 0.14.1 (Debian jessie). After upgrading to pandas 0.16.2, tardis_example runs without problems.
@unoebauer it works (well the coverage still sucks 😉 but I'll update this one a bit more)
When I run this I get a bunch of these errors in the initial iteration:
@aoifeboyle I wonder if this is the problem that comes from not having it merged with the newest master. Can you locate it?
@wkerzendorf Could you merge this PR soonish?
@aoifeboyle are we ready to merge this?
@wkerzendorf Yes, sure.
Rewrite of some functions to make them faster. There are still some functions to go; in particular, one function is already in C/Cython and would need to be sped up there.