misc. benchmark regressions since 0.4 #16128
There is also a significant regression. In 0.4:
In 0.5:
I tried changing some of the containers to use a different type; slightly better, but still slower than 0.4 and still 2.5x slower than Python.
Some of the planned changes may improve this – in particular, the new
We should probably try to track down what led to the slowdown, though; then any internal representation changes would be speedups, not just countering regressions elsewhere.
A retrospective daily (?) history of the perf tests might help; that way we could see whether a single commit or just a few commits were to blame. cc @jrevels
Looks like #16236 improved things a bit, but not enough.
Spell test after #16236:
It probably makes sense to run benchmarks on the version right before the String change, to figure out how much of this is due to various other changes and how much is due to that change.
See here. Hopefully I got the commit range correct – I meant to compare the most recent commit related to #16058 vs. the commit right before #16058 in the history.
Excellent point. I'll retry without threading.
I expect threading to have a larger performance impact on the C runtime (e.g. #15541) than on generated code (especially runtime-JITed code), since the optimization we use for TLS access doesn't really work for C code.
After #16439 is merged, it would be helpful to see an updated table with JULIA_THREADS=0. I think most of these have been addressed or are due to Box (such as the json test), which is being tracked separately.
Measured again, still with threads enabled. Not many changes, but rand_mat_stat is much worse. k_nucleotide is so much faster that it's suspicious. Needs investigating.
Tried with
Does this change much with different LLVM versions?
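For reference, a quick way to see which LLVM version a given build is using (these are standard Base calls, nothing specific to this issue):

```julia
# Print build info, including the LLVM version, to correlate with the numbers above.
versioninfo()
# Or read the LLVM version string by itself:
Base.libllvm_version
```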
In the following, "mine" refers to commit e280a27.
One commit before mine:
mine:
Almost latest master (from yesterday, I think):
It does seem that there's a doubling (ouch) from my commit. However, since then it seems to have gotten even worse (especially rand_mat_mul). This is just one run of
Splatting penalty, probably? Some manually written-out cases for small numbers of inputs may help.
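For context, here is a minimal sketch of the kind of manually written-out small-arity methods being suggested; `combine` is a hypothetical name, not one of the benchmark functions:

```julia
# Explicit methods for small argument counts avoid the varargs path;
# only calls that fall through to the last method pay the splatting penalty.
combine(a) = a
combine(a, b) = a + b
combine(a, b, c) = a + b + c
combine(args...) = sum(args)   # generic fallback
```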
Another way to look at it is that
If you added the addendum after the merger of #16260, note that
Progress! We're starting to do pretty well; several of the regressions are fixed, and most of the ones that remain are minor, around 10%. Regressions marked
Shall we move this to 0.5.x then? (And make sure that this time we actually work through the .x issues.)
Has anyone done this yet, and does it work? @nanosoldier `runbenchmarks(ALL, vs=":release-0.4")`
Missed a backtick, try again: @nanosoldier |
Will that work on an issue? Try on the latest commit to master maybe? |
It doesn't work in issues, since there's no explicit commit that can be assumed by context (though master would be a reasonable default). You can comment on commits though, as Tony mentioned. I triggered the job here. |
I did a comparison between the two reports, and this is a filtered (not completely exhaustive) summary of what is consistently slower. The number to the right is the slowdown.

String
Sparse
Sort
Simd
Shootout
Scalar
Basically all scalar arithmetic on BigInt and BigFloat is slower (a rough timing sketch follows this list).
Problem
Parallel
LinAlg
Basically everything with tridiagonal and co.; some examples:
Array
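As a rough way to reproduce the BigInt/BigFloat scalar slowdown noted under Scalar above, a minimal timing sketch (the operand values and repetition count are arbitrary assumptions, and `bigfloat_loop` is a made-up name):

```julia
# Time a tight loop of BigFloat scalar arithmetic; compare the result on 0.4 vs 0.5.
function bigfloat_loop(x, y, n)
    s = x
    for _ in 1:n
        s = s * y + x
    end
    return s
end

x = big(1.2345)
y = big(6.789)
bigfloat_loop(x, y, 10)                # warm-up / compilation
@elapsed bigfloat_loop(x, y, 100_000)
```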
It's good to know that's now there, but I checked my source tree before posting to confirm this was a known inliner bug, as it preceded that PR. @JeffBezanson Are there concrete actions still required to move this issue to v0.5.x, or what's left that seems unexplained / uncorrected?
The regressions in splitline and gk are a bit worrying, but hopefully we can work on them in the RC period.
I'm more worried about the 27x regression on the simd benchmarks. Is it that the function

function perf_local_arrays(V)
# SIMD loop on local arrays declared without type annotations
T, n = eltype(V), length(V)
X = rand(T, n)
Y = rand(T, n)
Z = rand(T, n)
@simd for i in eachindex(X)
@inbounds X[i] = Y[i] * Z[i]
end
return X
end

doesn't vectorize? (Input:
It vectorizes on LLVM 3.8.
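For anyone who wants to check this on their own build, a minimal sketch that inspects the generated IR for the loop above, assuming `perf_local_arrays` from the previous comment is defined; the input's element type and length are illustrative assumptions:

```julia
# A vectorized loop body shows <4 x float> (or wider) operations instead of scalar ones.
V = rand(Float32, 1000)
@code_llvm perf_local_arrays(V)
```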
this makes for faster dynamic dispatch and apply-type construction ref #16128
For those who may not have seen it, here's 0.5.0-rc2 vs 0.4.6: https://github.com/JuliaCI/BaseBenchmarkReports/blob/6d82a5518a25740eef4abde8359ea3cdbc630375/0350e57_vs_2e358ce/report.md
With the exception of the horrifying regressions on "simd", fairly satisfying overall. I'm especially pleased with the improvements in so many array benchmarks. Worried about the linalg regressions, but an optimist can always hope that my barrage of PRs this morning may help.
It feels like the array arithmetic is just some inlining problem or something like that.
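If the suspicion is an inlining problem, here is a minimal sketch of how one might check whether a small helper gets inlined into a hot loop; `muladd_demo` and `sum_demo` are hypothetical names, not the actual benchmark kernels:

```julia
# If inlining works, no call to muladd_demo should remain in the typed code printed below.
muladd_demo(a, b, c) = a * b + c

function sum_demo(x)
    s = 0.0
    for i in eachindex(x)
        @inbounds s = muladd_demo(2.0, x[i], s)
    end
    return s
end

@code_typed sum_demo(rand(100))
```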
Closing this, since at this point the experiment will need to be repeated for 0.6. |
Just ran `make test-perf` and observed this. Most results are the same, and we've improved on about 6 tests, but unfortunately there are some significant regressions: