-
Notifications
You must be signed in to change notification settings - Fork 168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: replace std::pow(x, 0.25)
with std::sqrt(std::sqrt(x))
#1150
perf: replace std::pow(x, 0.25)
with std::sqrt(std::sqrt(x))
#1150
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1150 +/- ##
=======================================
Coverage 47.89% 47.89%
=======================================
Files 359 359
Lines 18502 18504 +2
Branches 8730 8730
=======================================
+ Hits 8861 8863 +2
Misses 3605 3605
Partials 6036 6036
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit puzzling to me that we have two almost but not quite identical versions of the step size adaptation formula. But I'm happy to see this optimization land.
Copy paste error? Also, I'm a bit surprised that the root file output changes. I'd have thought that a slight numerical change in the step size scaling would have a negligible effect on the output... |
This doesn't surprise me, I think this is really a numerical difference from slight change of step size. |
Out of interest I've played a bit around with different versions of this, and I found that std::sqrt(std::sqrt(static_cast<float>(x))); is about twice as fast as using the (The values I measured by just computing the result out of many |
@benjaminhuth This throughput difference indeed expected, https://www.agner.org/optimize/instruction_tables.pdf states that on modern CPUs the underlying |
I'd probably say using this static cast to |
Switched to |
Depending on the frequency at which the clamping condition between 0.25 and 4.0 occurs, it might also be beneficial to rewrite the clamp explicitly: float r = state.options.tolerance / std::abs(2. * error_estimate);
if (r <= 0.00390625f) { // 0.25^4
stepSizeScaling = 0.25;
else if (r >= 256.f) { // 4.0^4
stepSizeScaling = 4.0;
} else {
stepSizeScaling = std::sqrt(std::sqrt(r));
} Hard to say if you'll benefit without knowing how often this fires, though. 🤷♂️ |
@stephenswat that could be an improvement, but might indeed be overkill. |
This has conflicts now because of the updated file I suppose ... |
Updated the hashes again. Let's see. |
This is green now. Do we merge (/ can you approve)? @benjaminhuth @HadrienG2 @stephenswat ? |
acts-project#1150)" This reverts commit de6af39.
powf64
is relatively slow. This change improves the performance ofActsBenchmarkEigenStepper
by about 10%, the performance impact on the more real-world propagation with navigation (as in the propagation example with the generic detector), seems negligible.Overall I think it's probably still worth adding.