-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: regression around now inlining code in 1.17 #48067
Comments
Reduced
|
I don't see anything obvious. The register allocation looks a bit worse, but not enough to be 400+% worse. There are more reg-reg moves, but still no spills or anything like that. Where does the cpu profiler say the time is going? |
Ran only the worse benchmark with
|
It would help to compare benchmarks using The one thing that stands out to me from the pprof output is that in the 1.16 case, a lot of time went to line 15, probably the |
Sorry @randall77 , I didn't know I can see the assembly annotated. And then copied
|
It looks to me like in 1.16 the divide is outside the loop, but in 1.17 the divide is inside the loop. |
I think I understand why this is happening. In 1.16, the expressions In 1.17, the expression You can "fix" this by using the result of So I think this issue is just a dup of #15808. |
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
yes
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
I ran our project benchmarks and there was quite a bit of regression in some calculating code(among others).
original issue grafana/k6#2116
repo to reproduce it with a lot less of the code: https://github.com/MStoykov/reproduceScaleRegression
As mentioned in the repo the commit that makes the particular function inline is adb467f , but even without it, it's possible to make the code inline which results in the same regression even with go1.16.7.
Given that
What did you expect to see?
I would've wanted all the benchmarks to get faster :P.
In this particular example, I would expect that inlining will help not hinder, and some nice people in slack were theorizing that it's likely unaligned jumps or cache line misses and also (and most likely) the branch predictor getting hash collisions as proposed in #18977 (comment) , but I don't know enough to confirm or deny any of this
Given that there isn't such a regression in arm64 benchmarks results, provided by @vearutop 🙇
What did you see instead?
I saw a decrease in performance after a change that should've done the opposite.
The text was updated successfully, but these errors were encountered: