
perf: replace std::pow(x, 0.25) with std::sqrt(std::sqrt(x)) #1150

Merged
merged 9 commits into from
Feb 21, 2022


paulgessinger
Member

powf64 is relatively slow. This change improves the performance of ActsBenchmarkEigenStepper by about 10%. The performance impact on the more realistic propagation with navigation (as in the propagation example with the generic detector) seems negligible.

Overall I think it's probably still worth adding.

@paulgessinger paulgessinger added this to the next milestone Feb 8, 2022
@codecov

codecov bot commented Feb 8, 2022

Codecov Report

Merging #1150 (c716eee) into main (f9dbc02) will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##             main    #1150   +/-   ##
=======================================
  Coverage   47.89%   47.89%           
=======================================
  Files         359      359           
  Lines       18502    18504    +2     
  Branches     8730     8730           
=======================================
+ Hits         8861     8863    +2     
  Misses       3605     3605           
  Partials     6036     6036           
| Impacted Files | Coverage Δ | |
|---|---|---|
| Core/include/Acts/Propagator/EigenStepper.ipp | 50.00% <100.00%> (+0.75%) | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f9dbc02...c716eee.

Contributor

@HadrienG2 HadrienG2 left a comment


It's a bit puzzling to me that we have two almost but not quite identical versions of the step size adaptation formula. But I'm happy to see this optimization land.

@paulgessinger
Member Author

Copy paste error?

Also, I'm a bit surprised that the root file output changes. I'd have thought that a slight numerical change in the step size scaling would have a negligible effect on the output...

@asalzburger
Contributor

> It's a bit puzzling to me that we have two almost but not quite identical versions of the step size adaptation formula. But I'm happy to see this optimization land.

This doesn't surprise me. I think this really is a numerical difference caused by the slight change in step size.
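To make the "slight numerical change" concrete, here is a minimal sketch that counts how often `std::pow(x, 0.25)` and `std::sqrt(std::sqrt(x))` disagree in their bit patterns. The helper names (`RootComparison`, `bitsOf`, `compareFourthRoots`) and the sampled input range are illustrative assumptions, not part of ACTS. The two results should agree to within a few units in the last place, but any bitwise mismatch can nudge the step size and so, plausibly, the output hashes.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Summary of comparing the two fourth-root formulations (hypothetical helper).
struct RootComparison {
  std::size_t samples = 0;        // number of inputs tested
  std::size_t bitMismatches = 0;  // inputs where the results differ bitwise
  double maxRelDiff = 0.0;        // largest relative difference observed
};

// Reinterpret a double as its raw 64-bit pattern.
inline std::uint64_t bitsOf(double v) {
  std::uint64_t b;
  std::memcpy(&b, &v, sizeof b);
  return b;
}

// Compare std::pow(x, 0.25) with std::sqrt(std::sqrt(x)) on n inputs spread
// geometrically over [lo, hi]. Requires n >= 2 and 0 < lo < hi.
inline RootComparison compareFourthRoots(double lo, double hi, std::size_t n) {
  RootComparison out;
  const double ratio = std::pow(hi / lo, 1.0 / static_cast<double>(n - 1));
  double x = lo;
  for (std::size_t i = 0; i < n; ++i, x *= ratio) {
    const double viaPow = std::pow(x, 0.25);
    const double viaSqrt = std::sqrt(std::sqrt(x));
    if (bitsOf(viaPow) != bitsOf(viaSqrt)) {
      ++out.bitMismatches;
    }
    const double rel = std::abs(viaPow - viaSqrt) / viaPow;
    if (rel > out.maxRelDiff) {
      out.maxRelDiff = rel;
    }
    ++out.samples;
  }
  return out;
}
```

Calling `compareFourthRoots(1e-6, 1e6, 10000)` and printing `bitMismatches` and `maxRelDiff` shows how many inputs are affected on a given platform. The exact count depends on the math library.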

@benjaminhuth
Member

Out of interest I've played around a bit with different versions of this, and I found that

std::sqrt(std::sqrt(static_cast<float>(x)));

is about twice as fast as the double version. As I understand it, we do not require high precision here, so that could speed this up even further?

(I measured these values by simply computing many sqrt(sqrt(...)) results in a row, without vectorization and with -O2.)
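A measurement along these lines could be sketched as follows. This is not the benchmark that was actually run. The function names, the input range and the timing harness are all assumptions made for illustration, and a `pow` baseline is included for comparison with the original code.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// The three fourth-root formulations discussed in this thread.
inline double rootPow(double x) { return std::pow(x, 0.25); }
inline double rootSqrtDouble(double x) { return std::sqrt(std::sqrt(x)); }
inline double rootSqrtFloat(double x) {
  // Both std::sqrt calls run in single precision; the result widens on return.
  return std::sqrt(std::sqrt(static_cast<float>(x)));
}

// Apply f to every input `repeats` times and return the elapsed seconds.
// The running sum is written to `sink` so the compiler cannot drop the work.
template <typename F>
double timeVariant(F f, const std::vector<double>& inputs, int repeats,
                   double& sink) {
  const auto start = std::chrono::steady_clock::now();
  double acc = 0.0;
  for (int r = 0; r < repeats; ++r) {
    for (double x : inputs) {
      acc += f(x);
    }
  }
  const auto stop = std::chrono::steady_clock::now();
  sink = acc;
  return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical input set: n values spread evenly over (0, 256].
inline std::vector<double> makeInputs(std::size_t n) {
  std::vector<double> v(n);
  for (std::size_t i = 0; i < n; ++i) {
    v[i] = 256.0 * static_cast<double>(i + 1) / static_cast<double>(n);
  }
  return v;
}
```

Timing each variant with the same `makeInputs(...)` vector and comparing the returned seconds gives a rough throughput ratio. Results will vary with compiler, flags and CPU, and passing lambdas instead of function names may make inlining more likely.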

@HadrienG2
Contributor

@benjaminhuth This throughput difference is indeed expected: https://www.agner.org/optimize/instruction_tables.pdf states that on modern CPUs the underlying SQRTSS and SQRTSD hardware instructions exhibit a 2x throughput difference in the worst-case scenario. I also agree that some computations here could likely be safely moved to single precision.

@paulgessinger
Member Author

I'd probably say using this static cast to float is a good idea. I don't think we care about numerical precision in these cases.
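For a sense of how much precision the cast gives up, the sketch below measures the largest relative error of the single-precision fourth root against the double-precision one. The function name and the sampled range are assumptions for illustration. Since `float` carries about 7 significant decimal digits, the error should come out around 1e-7, which is small next to a scaling factor that is clamped to [0.25, 4].

```cpp
#include <cmath>
#include <cstddef>

// Largest relative error of the float fourth root relative to the double one,
// sampled on n points spread geometrically over [lo, hi].
// Requires n >= 2 and 0 < lo < hi, with both bounds inside float's range.
inline double maxFloatRootError(double lo, double hi, std::size_t n) {
  const double ratio = std::pow(hi / lo, 1.0 / static_cast<double>(n - 1));
  double worst = 0.0;
  double x = lo;
  for (std::size_t i = 0; i < n; ++i, x *= ratio) {
    const double ref = std::sqrt(std::sqrt(x));
    // The cast makes both square roots run in single precision.
    const double approx = std::sqrt(std::sqrt(static_cast<float>(x)));
    const double rel = std::abs(approx - ref) / ref;
    if (rel > worst) {
      worst = rel;
    }
  }
  return worst;
}
```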

@paulgessinger
Member Author

Switched to sqrt(sqrt(static_cast<float>(x))). The CI should still fail because of the output hashes; I'll fix those from the CI values.

@stephenswat
Member

Depending on the frequency at which the clamping condition between 0.25 and 4.0 occurs, it might also be beneficial to rewrite the clamp explicitly:

float r = state.options.tolerance / std::abs(2. * error_estimate);

if (r <= 0.00390625f) { // 0.25^4
    stepSizeScaling = 0.25;
} else if (r >= 256.f) { // 4.0^4
    stepSizeScaling = 4.0;
} else {
    stepSizeScaling = std::sqrt(std::sqrt(r));
}

Hard to say if you'll benefit without knowing how often this fires, though. 🤷‍♂️
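As a sanity check on the thresholds, here is a small standalone sketch comparing the explicit branches with the clamp-after-root form. The function names are hypothetical and the code is detached from the actual stepper state. Because 0.25^4 = 0.00390625 and 4^4 = 256 are exactly representable and `std::sqrt` is monotonic, the two forms should agree for every non-NaN, non-negative `r`. NaN inputs are not covered by this argument.

```cpp
#include <algorithm>
#include <cmath>

// Reference form: take the fourth root first, then clamp to [0.25, 4].
inline float scalingClampAfter(float r) {
  return std::min(std::max(0.25f, std::sqrt(std::sqrt(r))), 4.0f);
}

// Branching form: compare r against the fourth powers of the clamp bounds
// and only compute the square roots when the result lies inside the bounds.
inline float scalingClampBefore(float r) {
  if (r <= 0.00390625f) {  // 0.25^4
    return 0.25f;
  }
  if (r >= 256.0f) {  // 4.0^4
    return 4.0f;
  }
  return std::sqrt(std::sqrt(r));
}
```

Whether the branching form is faster depends on how predictable the branches are for real step-size ratios, which is the open question raised in the comment above.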

@paulgessinger
Member Author

@stephenswat that could be an improvement, but might indeed be overkill.

@asalzburger
Contributor

This has conflicts now, because of the updated file, I suppose ...

@paulgessinger
Member Author

Updated the hashes again. Let's see.

@paulgessinger
Member Author

This is green now. Do we merge (/ can you approve)? @benjaminhuth @HadrienG2 @stephenswat ?

@paulgessinger paulgessinger added automerge Improvement Changes to an existing feature labels Feb 21, 2022
@kodiakhq kodiakhq bot merged commit de6af39 into acts-project:main Feb 21, 2022
@paulgessinger paulgessinger deleted the perf/stepsize-scaling branch February 22, 2022 08:13
@paulgessinger paulgessinger modified the milestones: next, v17.1.0 Mar 1, 2022