Optimize the implementation not to use any external function. #3

sjrd · 2022-02-11T17:47:55Z

Here is the implementation of the fround polyfill used by Scala.js 1.9.0+ when targeting ECMAScript 5.1, adapted for variable names to match the previous variable names in this codebase. The inspiration for using Veltkamp's splitting for the normal form case form came from the polyfill in core-js, which seems to also be the inspiration for the previous algorithm in this codebase. I have however reconstructed a proof with references that it is indeed correct. I designed the algorithm (if it can be called that) for the subnormal form case.

I intentionally break two eslint rules for the sake of performance:

using +x instead of $Number(x)
hard-coding the constants right inside the algorithm.

If that is not welcome in this repository, or misguided, I am happy to revert to using $Number and/or declaring the magic constants in vars as before.

The special code path for zeros and NaNs is removed. They fall through as the general case.

We avoid $Number by using the asm.js idiom v = +x. We avoid abs by using s * v. This causes -0 to fall through as is, but it turns out that it has the desired effect in the end.

We optimize the subnormal path to use only one multiplication and one division.

We dispatch the overflow case from the normal and subnormal cases ahead of time, which allows to more easily justify the correctness of the algorithm.

We hard-code the magical constants to avoid having to look up external identifiers, ensuring that immediate values are used in the generated code.

Finally, and perhaps most importantly, we add significant documentation about the algorithms for the normal and subnormal paths, with---if not proof---at least an argument and references of why they are correct.

codecov · 2022-02-11T17:48:58Z

Codecov Report

Merging #3 (34070bd) into main (be55c4c) will decrease coverage by 75.00%.
The diff coverage is 11.11%.

@@             Coverage Diff              @@
##              main       #3       +/-   ##
============================================
- Coverage   100.00%   25.00%   -75.00%     
============================================
  Files            5        5               
  Lines           42       28       -14     
  Branches         5        4        -1     
============================================
- Hits            42        7       -35     
- Misses           0       21       +21

Impacted Files	Coverage Δ
implementation.js	`10.00% <11.11%> (-90.00%)`	⬇️
auto.js	`0.00% <0.00%> (-100.00%)`	⬇️
index.js	`0.00% <0.00%> (-100.00%)`	⬇️
shim.js	`50.00% <0.00%> (-50.00%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update be55c4c...34070bd. Read the comment docs.

sjrd · 2022-02-11T17:55:14Z

I'm confused by the CI errors on (IIUC) Node.js 1-4. They don't seem related to my changes, but the main branch builds successfully. Did I do something wrong when removing the dependency on is-nan?

nicolo-ribaudo

LGTM, leaving to @ljharb the questions about code style and CI.

nicolo-ribaudo · 2022-02-11T18:09:59Z

implementation.js

+	 * causes an overflow, resulting in Infinity.
+	 * This code path is also used when the input is already an Infinity.
+	 */
+	if (mod >= 3.4028235677973366E38) { // overflow-threshold


The old implementation used 3.4028234663852886e+38, which is lower than this number. I verified that this number is correct, but could you add a test for it?

We already have one for "big numbers":

Math.fround/test/tests.js

Lines 48 to 49 in be55c4c

t.test('rounds properly with the max float 32', function (st) {

var maxFloat32 = 3.4028234663852886e+38;

It would probably be good to test both the new magic number and 3.4028235677973362e+38, which is the float that comes right before it.

The old implementation used 3.4028234663852886e+38, which is lower than this number.

The old implementation used the max-binary32 value. The new implementation refers to the midpoint between max-binary32 and Infinity in binary32.

It would probably be good to test both the new magic number and 3.4028235677973362e+38, which is the float that comes right before it.

The new magic number was already tested in "returns infinity for large numbers". I have added a test for the float just below it. I have also added tests around the subnormal-normal boundary.

nicolo-ribaudo · 2022-02-11T18:16:00Z

Do you have a microbenchmark that shows the perf diff?

The special code path for zeros and NaNs is removed. They fall through as the general case. We avoid `$Number` by using the asm.js idiom `v = +x`. We avoid `abs` by using `s * v`. This causes `-0` to fall through as is, but it turns out that it has the desired effect in the end. We optimize the subnormal path to use only one multiplication and one division. We dispatch the overflow case from the normal and subnormal cases ahead of time, which allows to more easily justify the correctness of the algorithm. We hard-code the magical constants to avoid having to look up external identifiers, ensuring that immediate values are used in the generated code. Finally, and perhaps most importantly, we add significant documentation about the algorithms for the normal and subnormal paths, with---if not proof---at least an argument and references of why they are correct.

ljharb · 2022-02-11T18:45:30Z

How does hardcoding the constants in the algorithm improve performance?

Any performance-motivated change should definitely include benchmarks (in the PR is fine). In particular, I'd like comparisons of:

the current approach
the PR's approach
the PR's approach, but caching constants at module level and using Number() over +

Additionally, performance is only a sensible concept to discuss in the context of a specific implementation. This package supports basically every version of Firefox, Safari, IE, Edge, Chrome, node, and many others - so we'd want to compare performance in all of these.

sjrd · 2022-02-11T19:10:29Z

I don't have micro-benchmarks, but I do have a benchmark of a raytracer where every single floating point value is a Float, and therefore every floating point operation goes through fround. Here are the results:

Which variant	Mean running time (lower is better)	Standard error the Mean
Current approach	4828.66	5.86
PR approach with caching	3802.77	3.74
PR as is	3525.61	3.67

How does hardcoding the constants in the algorithm improve performance?

I guess it's easier for the engine's JIT to compile them to immediate values, rather than loading them from memory or having to protect the immediates with deoptimization guards.

Additionally, performance is only a sensible concept to discuss in the context of a specific implementation. This package supports basically every version of Firefox, Safari, IE, Edge, Chrome, node, and many others - so we'd want to compare performance in all of these.

True. Here the performance measurements are done on a Linux Ubuntu machine with Node.js 14.14.0. This is probably not representative of all the possible versions. But I would say that an implementation that uses 0 external function call has a higher chance of being compiled in a consistent way across as many engines as possible.

sjrd · 2022-02-24T14:57:17Z

Since the "optimizations" are controversial, I resubmitted a variant of this PR as #4, which preserves the use of $Number and var constants for magic numbers. Hopefully that PR will be less controversial.

nicolo-ribaudo approved these changes Feb 11, 2022

View reviewed changes

sjrd force-pushed the optimize-implementation branch from 01e9356 to 34070bd Compare February 11, 2022 18:45

sjrd mentioned this pull request Feb 24, 2022

Rearrange the implementation and extensively document it. #4

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize the implementation not to use any external function. #3

Optimize the implementation not to use any external function. #3

sjrd commented Feb 11, 2022

codecov bot commented Feb 11, 2022 •

edited

Loading

sjrd commented Feb 11, 2022

nicolo-ribaudo left a comment

nicolo-ribaudo Feb 11, 2022 •

edited

Loading

sjrd Feb 11, 2022

nicolo-ribaudo commented Feb 11, 2022 •

edited

Loading

ljharb commented Feb 11, 2022

sjrd commented Feb 11, 2022

sjrd commented Feb 24, 2022

	t.test('rounds properly with the max float 32', function (st) {
	var maxFloat32 = 3.4028234663852886e+38;

Optimize the implementation not to use any external function. #3

Are you sure you want to change the base?

Optimize the implementation not to use any external function. #3

Conversation

sjrd commented Feb 11, 2022

codecov bot commented Feb 11, 2022 • edited Loading

Codecov Report

sjrd commented Feb 11, 2022

nicolo-ribaudo left a comment

Choose a reason for hiding this comment

nicolo-ribaudo Feb 11, 2022 • edited Loading

Choose a reason for hiding this comment

sjrd Feb 11, 2022

Choose a reason for hiding this comment

nicolo-ribaudo commented Feb 11, 2022 • edited Loading

ljharb commented Feb 11, 2022

sjrd commented Feb 11, 2022

sjrd commented Feb 24, 2022

codecov bot commented Feb 11, 2022 •

edited

Loading

nicolo-ribaudo Feb 11, 2022 •

edited

Loading

nicolo-ribaudo commented Feb 11, 2022 •

edited

Loading