Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize the implementation not to use any external function. #3

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

sjrd
Copy link

@sjrd sjrd commented Feb 11, 2022

Here is the implementation of the fround polyfill used by Scala.js 1.9.0+ when targeting ECMAScript 5.1, adapted for variable names to match the previous variable names in this codebase. The inspiration for using Veltkamp's splitting for the normal form case form came from the polyfill in core-js, which seems to also be the inspiration for the previous algorithm in this codebase. I have however reconstructed a proof with references that it is indeed correct. I designed the algorithm (if it can be called that) for the subnormal form case.

I intentionally break two eslint rules for the sake of performance:

  • using +x instead of $Number(x)
  • hard-coding the constants right inside the algorithm.

If that is not welcome in this repository, or misguided, I am happy to revert to using $Number and/or declaring the magic constants in vars as before.


The special code path for zeros and NaNs is removed. They fall through as the general case.

We avoid $Number by using the asm.js idiom v = +x. We avoid abs by using s * v. This causes -0 to fall through as is, but it turns out that it has the desired effect in the end.

We optimize the subnormal path to use only one multiplication and one division.

We dispatch the overflow case from the normal and subnormal cases ahead of time, which allows to more easily justify the correctness of the algorithm.

We hard-code the magical constants to avoid having to look up external identifiers, ensuring that immediate values are used in the generated code.

Finally, and perhaps most importantly, we add significant documentation about the algorithms for the normal and subnormal paths, with---if not proof---at least an argument and references of why they are correct.

@codecov
Copy link

codecov bot commented Feb 11, 2022

Codecov Report

Merging #3 (34070bd) into main (be55c4c) will decrease coverage by 75.00%.
The diff coverage is 11.11%.

Impacted file tree graph

@@             Coverage Diff              @@
##              main       #3       +/-   ##
============================================
- Coverage   100.00%   25.00%   -75.00%     
============================================
  Files            5        5               
  Lines           42       28       -14     
  Branches         5        4        -1     
============================================
- Hits            42        7       -35     
- Misses           0       21       +21     
Impacted Files Coverage Δ
implementation.js 10.00% <11.11%> (-90.00%) ⬇️
auto.js 0.00% <0.00%> (-100.00%) ⬇️
index.js 0.00% <0.00%> (-100.00%) ⬇️
shim.js 50.00% <0.00%> (-50.00%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update be55c4c...34070bd. Read the comment docs.

@sjrd
Copy link
Author

sjrd commented Feb 11, 2022

I'm confused by the CI errors on (IIUC) Node.js 1-4. They don't seem related to my changes, but the main branch builds successfully. Did I do something wrong when removing the dependency on is-nan?

Copy link
Member

@nicolo-ribaudo nicolo-ribaudo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, leaving to @ljharb the questions about code style and CI.

* causes an overflow, resulting in Infinity.
* This code path is also used when the input is already an Infinity.
*/
if (mod >= 3.4028235677973366E38) { // overflow-threshold
Copy link
Member

@nicolo-ribaudo nicolo-ribaudo Feb 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old implementation used 3.4028234663852886e+38, which is lower than this number. I verified that this number is correct, but could you add a test for it?

We already have one for "big numbers":

t.test('rounds properly with the max float 32', function (st) {
var maxFloat32 = 3.4028234663852886e+38;

It would probably be good to test both the new magic number and 3.4028235677973362e+38, which is the float that comes right before it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old implementation used 3.4028234663852886e+38, which is lower than this number.

The old implementation used the max-binary32 value. The new implementation refers to the midpoint between max-binary32 and Infinity in binary32.

It would probably be good to test both the new magic number and 3.4028235677973362e+38, which is the float that comes right before it.

The new magic number was already tested in "returns infinity for large numbers". I have added a test for the float just below it. I have also added tests around the subnormal-normal boundary.

@nicolo-ribaudo
Copy link
Member

nicolo-ribaudo commented Feb 11, 2022

Do you have a microbenchmark that shows the perf diff?

The special code path for zeros and NaNs is removed. They fall through
as the general case.

We avoid `$Number` by using the asm.js idiom `v = +x`.

We avoid `abs` by using `s * v`. This causes `-0` to fall through as
is, but it turns out that it has the desired effect in the end.

We optimize the subnormal path to use only one multiplication and one
division.

We dispatch the overflow case from the normal and subnormal cases ahead
of time, which allows to more easily justify the correctness of the
algorithm.

We hard-code the magical constants to avoid having to look up external
identifiers, ensuring that immediate values are used in the generated
code.

Finally, and perhaps most importantly, we add significant documentation
about the algorithms for the normal and subnormal paths, with---if not
proof---at least an argument and references of why they are correct.
@sjrd sjrd force-pushed the optimize-implementation branch from 01e9356 to 34070bd Compare February 11, 2022 18:45
@ljharb
Copy link
Member

ljharb commented Feb 11, 2022

How does hardcoding the constants in the algorithm improve performance?

Any performance-motivated change should definitely include benchmarks (in the PR is fine). In particular, I'd like comparisons of:

  • the current approach
  • the PR's approach
  • the PR's approach, but caching constants at module level and using Number() over +

Additionally, performance is only a sensible concept to discuss in the context of a specific implementation. This package supports basically every version of Firefox, Safari, IE, Edge, Chrome, node, and many others - so we'd want to compare performance in all of these.

@sjrd
Copy link
Author

sjrd commented Feb 11, 2022

I don't have micro-benchmarks, but I do have a benchmark of a raytracer where every single floating point value is a Float, and therefore every floating point operation goes through fround. Here are the results:

Which variant Mean running time (lower is better) Standard error the Mean
Current approach 4828.66 5.86
PR approach with caching 3802.77 3.74
PR as is 3525.61 3.67

How does hardcoding the constants in the algorithm improve performance?

I guess it's easier for the engine's JIT to compile them to immediate values, rather than loading them from memory or having to protect the immediates with deoptimization guards.

Additionally, performance is only a sensible concept to discuss in the context of a specific implementation. This package supports basically every version of Firefox, Safari, IE, Edge, Chrome, node, and many others - so we'd want to compare performance in all of these.

True. Here the performance measurements are done on a Linux Ubuntu machine with Node.js 14.14.0. This is probably not representative of all the possible versions. But I would say that an implementation that uses 0 external function call has a higher chance of being compiled in a consistent way across as many engines as possible.

@sjrd
Copy link
Author

sjrd commented Feb 24, 2022

Since the "optimizations" are controversial, I resubmitted a variant of this PR as #4, which preserves the use of $Number and var constants for magic numbers. Hopefully that PR will be less controversial.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants