Performance regression on the threading branch compared to the threads branch #10527
I have this one by the tail. It comes down to how the array assignment gets inlined. In the fast version:
and in the slow version:
Just extra (though unnecessary) temp vars. Surprisingly, this causes a cascade of horrible effects in codegen.
It looks like inlining may be concerned about the evaluation order. I've seen that be an issue with some intrinsics which want to swap their argument order. Since u3 is not local, its usage is not effect-free. (I don't think that's related here though.)
But u3 is never assigned, so it should be effect-free.
True. Plus it wouldn't explain why the other vars are being extracted. More codegen passes now generate GenSym objects. Which still generate a symbol?
Somewhat related: #6713
Inlining still uses symbols instead of GenSyms, based on the
I can fix this by making
yes. if the symbol is local, that means the argument is also assigned locally. it's only valid to use a GenSym when the variable is assigned exactly once.
can you give an example of what you mean? i thought a local variable would already always return true
insanity case?

    function f()
        x = 1
        g() = (x+=1)
        h(a,b,c,d) = (a,b,c,d)
        return h(x, g(), g(), x)
    end

but the inliner would bail on that just due to the presence of the captured variable (lines 2374 to 2377 in b201312)
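To make the evaluation-order concern concrete, here is a minimal sketch, assuming current Julia closure semantics (arguments evaluated left to right, `x` shared with the inner function). `f_naive` is a hypothetical hand-written version that reads `x` late instead of copying it to a temporary; it is not the inliner's actual output.

```julia
# The case from the comment above: x is captured and reassigned by g.
function f()
    x = 1
    g() = (x += 1)
    h(a, b, c, d) = (a, b, c, d)
    return h(x, g(), g(), x)    # arguments are evaluated left to right
end

# Hypothetical "inlined by hand" version that skips the temporary copy
# of the first argument and reads x only after both g() calls have run.
function f_naive()
    x = 1
    g() = (x += 1)
    b = g()
    c = g()
    return (x, b, c, x)         # first slot now sees the mutated x
end

println(f())        # (1, 2, 3, 3)
println(f_naive())  # (3, 2, 3, 3)
```

The two results differ only in the first slot, which is why a temporary copy is needed when a local might be assigned by an inner function, and is unnecessary when it is not.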
Yeah, that's why I said "non-captured".
In other words, it only returns true for a local when
Ah, right. That's not falling through correctly and is being wayyyy too conservative
this avoids inserting temporary variable copies of local variables unless they might be assigned by an inner function.
Ok, I fixed this on master, and it will fix this issue when picked onto the threading branch.
this avoids inserting temporary variable copies of local variables unless they might be assigned by an inner function. Conflicts: base/inference.jl
The reason the old `threads` branch is still around and hasn't been replaced by the newer `threading` branch is that there is a performance regression. There are three forms for the `@threads` macro:
@threads all foo()
@threads all begin ... end
@threads all for ... end
They all use the same threading call--the latter two generate an inner function to pass in--and should have the same performance. In the `threads` branch they do; in the `threading` branch, the latter two are ~3.5X slower than the first form.

Any of the three forms can be used in `test/perf/threads/laplace3d.jl`; choose the one to try at line 95.

@JeffBezanson: I think you had fixed this or something very similar to this in the `threads` branch back in November--something to do with type checks in inner functions?