Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MROO has float rounding issues #260

Closed
Dentosal opened this issue Nov 1, 2022 · 4 comments · Fixed by #262
Closed

MROO has float rounding issues #260

Dentosal opened this issue Nov 1, 2022 · 4 comments · Fixed by #262
Assignees
Labels
bug Something isn't working fuel-vm Related to the `fuel-vm` crate.

Comments

@Dentosal
Copy link
Member

Dentosal commented Nov 1, 2022

Similarly to the MLOG issues discussed in #150, MROO also has floating point rounding issues. This is the only location of the vm still using floats.

For instance, calling MROO with values mroo(647214625, 3), i.e. cube root of 647214625, gives 864.9999999999997 in floating point, while 865**3 == 647214625.

I'll investigate the performance impact of handling this with integer-only operations.

@Dentosal Dentosal added the bug Something isn't working label Nov 1, 2022
@Dentosal Dentosal self-assigned this Nov 1, 2022
@Dentosal
Copy link
Member Author

Dentosal commented Nov 1, 2022

I benchmarked the generally fast num-integer::Roots :: nth-root and my error correcting floating point solution seen below.

While the num-integer is a bit faster on the average case compared to the current floating point solution, it has some really bad edge cases. Benchmarks found over 70x runtime cases and I suspect even 100x would be possible. See mroo(4294967296, 21) for an example of this.

The custom implementation seen below performs acceptably. For non-interesting inputs it's four times faster. Even in the worst case it only takes around three times longer than the current version.

Custom error correction

fn custom(target: u64, nth_root: u32) -> u64 {
    if nth_root == 0 {
        panic!();
    }

    if nth_root == 1  || target <= 1 {
        return target;
    }

    if nth_root > 64 {
        // for any n>1, n**64 can never fit into u64
        return 1;
    }

    if nth_root as u64  >= target {
        return 1;
    }

    // Use floating point operation to get a guess for the starting point.
    // This is at most off by one in either direction.
    let guess = (target as f64).powf((nth_root as f64).recip()) as u64;

    let Some(g1) = guess.checked_pow(nth_root) else {
        // guess**nth_root >= 2**64 and target < 2**64
        return guess - 1;
    };

    if target < g1 {
        return guess - 1;
    }

    let Some(g2) = (guess + 1).checked_pow(nth_root) else {
        // (guess + 1)**nth_root >= 2**64 and target < 2**64
        return guess;
    };

    if target < g2 {
        return guess;
    }

    guess + 1
}

Results

cargo run --release

Exponent Base iroot timing custom-impl timing float timing iroot/float timing custom/float timing
2 2 65.411ms 12.447ms 43.351ms 1.5089 0.2871
2 3 50.987ms 9.603ms 40.642ms 1.2545 0.2363
2 4 31.782ms 9.625ms 40.596ms 0.7829 0.2371
2 5 31.819ms 9.543ms 41.127ms 0.7737 0.2320
2 6 32.152ms 9.514ms 40.799ms 0.7881 0.2332
2 7 31.998ms 9.581ms 40.749ms 0.7852 0.2351
2 10 32.050ms 9.550ms 40.917ms 0.7833 0.2334
2 16 32.021ms 9.549ms 41.060ms 0.7799 0.2326
2 17 31.868ms 9.539ms 40.580ms 0.7853 0.2350
2 21 32.166ms 9.596ms 40.457ms 0.7951 0.2372
2 31 31.893ms 9.568ms 40.753ms 0.7826 0.2348
2 37 31.986ms 9.571ms 40.599ms 0.7878 0.2357
2 100 32.059ms 9.650ms 40.677ms 0.7881 0.2372
2 1337 31.894ms 9.535ms 40.591ms 0.7857 0.2349
2 2147483647 32.003ms 9.560ms 40.572ms 0.7888 0.2356
3 2 32.490ms 111.668ms 94.916ms 0.3423 1.1765
3 3 51.720ms 9.396ms 94.583ms 0.5468 0.0993
3 4 32.271ms 9.529ms 94.853ms 0.3402 0.1005
3 5 32.049ms 9.637ms 94.749ms 0.3382 0.1017
3 6 32.070ms 9.555ms 95.109ms 0.3372 0.1005
3 7 32.072ms 9.550ms 94.994ms 0.3376 0.1005
3 10 32.077ms 9.605ms 94.939ms 0.3379 0.1012
3 16 32.221ms 9.612ms 95.079ms 0.3389 0.1011
3 17 32.120ms 9.551ms 95.294ms 0.3371 0.1002
3 21 32.081ms 9.621ms 95.250ms 0.3368 0.1010
3 31 32.079ms 9.549ms 94.128ms 0.3408 0.1014
3 37 31.550ms 9.364ms 93.098ms 0.3389 0.1006
3 100 31.516ms 9.386ms 93.070ms 0.3386 0.1008
3 1337 31.479ms 9.368ms 93.109ms 0.3381 0.1006
3 2147483647 31.482ms 9.392ms 94.783ms 0.3321 0.0991
4 2 32.536ms 69.378ms 39.784ms 0.8178 1.7439
4 3 51.738ms 76.600ms 40.140ms 1.2889 1.9083
4 4 32.051ms 9.660ms 40.634ms 0.7888 0.2377
4 5 31.993ms 9.395ms 40.075ms 0.7983 0.2344
4 6 31.509ms 9.368ms 40.199ms 0.7838 0.2330
4 7 31.544ms 9.411ms 40.366ms 0.7814 0.2331
4 10 31.454ms 9.383ms 40.494ms 0.7768 0.2317
4 16 31.456ms 9.411ms 40.220ms 0.7821 0.2340
4 17 31.481ms 9.478ms 40.247ms 0.7822 0.2355
4 21 31.612ms 9.430ms 41.064ms 0.7698 0.2296
4 31 32.071ms 9.584ms 40.760ms 0.7868 0.2351
4 37 32.048ms 9.581ms 40.578ms 0.7898 0.2361
4 100 32.046ms 9.592ms 40.716ms 0.7871 0.2356
4 1337 32.101ms 9.595ms 40.634ms 0.7900 0.2361
4 2147483647 32.521ms 9.547ms 40.682ms 0.7994 0.2346
5 2 32.687ms 111.170ms 94.904ms 0.3444 1.1714
5 3 51.874ms 115.921ms 94.890ms 0.5467 1.2216
5 4 32.072ms 119.596ms 94.834ms 0.3382 1.2611
5 5 32.054ms 9.547ms 95.273ms 0.3364 0.1002
5 6 32.042ms 9.611ms 95.277ms 0.3363 0.1009
5 7 31.993ms 9.416ms 93.188ms 0.3433 0.1010
5 10 31.468ms 9.441ms 93.076ms 0.3381 0.1014
5 16 31.460ms 9.387ms 93.112ms 0.3379 0.1008
5 17 31.484ms 9.371ms 93.371ms 0.3372 0.1004
5 21 31.521ms 9.431ms 93.633ms 0.3366 0.1007
5 31 31.483ms 9.371ms 93.054ms 0.3383 0.1007
5 37 31.522ms 9.371ms 93.816ms 0.3360 0.0999
5 100 31.450ms 9.454ms 93.549ms 0.3362 0.1010
5 1337 31.508ms 9.388ms 94.626ms 0.3330 0.0992
5 2147483647 31.450ms 9.413ms 93.090ms 0.3378 0.1011
6 2 31.904ms 109.369ms 94.308ms 0.3383 1.1597
6 3 51.444ms 115.932ms 94.855ms 0.5423 1.2222
6 4 32.056ms 120.159ms 95.019ms 0.3374 1.2646
6 5 32.064ms 127.206ms 94.856ms 0.3380 1.3410
6 6 32.085ms 9.583ms 94.879ms 0.3382 0.1010
6 7 32.118ms 9.486ms 94.873ms 0.3385 0.1000
6 10 31.509ms 9.368ms 94.286ms 0.3342 0.0994
6 16 31.459ms 9.371ms 94.691ms 0.3322 0.0990
6 17 31.474ms 9.368ms 95.080ms 0.3310 0.0985
6 21 31.722ms 9.623ms 94.918ms 0.3342 0.1014
6 31 32.072ms 9.594ms 94.836ms 0.3382 0.1012
6 37 32.052ms 9.552ms 94.848ms 0.3379 0.1007
6 100 32.093ms 9.627ms 94.630ms 0.3391 0.1017
6 1337 31.974ms 9.545ms 94.868ms 0.3370 0.1006
6 2147483647 32.086ms 9.587ms 94.860ms 0.3382 0.1011
7 2 32.551ms 111.668ms 94.867ms 0.3431 1.1771
7 3 50.970ms 114.537ms 94.854ms 0.5374 1.2075
7 4 31.466ms 118.968ms 94.651ms 0.3324 1.2569
7 5 32.108ms 127.190ms 94.907ms 0.3383 1.3402
7 6 32.045ms 127.552ms 95.462ms 0.3357 1.3361
7 7 32.057ms 9.555ms 94.834ms 0.3380 0.1007
7 10 32.034ms 9.635ms 94.781ms 0.3380 0.1017
7 16 32.040ms 9.580ms 94.777ms 0.3381 0.1011
7 17 32.031ms 9.371ms 93.129ms 0.3439 0.1006
7 21 31.458ms 9.368ms 93.419ms 0.3367 0.1003
7 31 31.473ms 9.433ms 93.994ms 0.3348 0.1004
7 37 31.455ms 9.419ms 94.042ms 0.3345 0.1002
7 100 32.106ms 9.553ms 94.924ms 0.3382 0.1006
7 1337 32.053ms 9.562ms 95.115ms 0.3370 0.1005
7 2147483647 32.046ms 9.555ms 94.897ms 0.3377 0.1007
16 2 32.533ms 69.067ms 39.752ms 0.8184 1.7374
16 3 81.768ms 76.986ms 39.734ms 2.0579 1.9375
16 4 38.195ms 78.748ms 39.744ms 0.9610 1.9814
16 5 32.039ms 83.569ms 39.576ms 0.8096 2.1117
16 6 31.519ms 82.286ms 39.734ms 0.7933 2.0709
16 7 31.529ms 93.712ms 39.616ms 0.7959 2.3656
16 10 31.503ms 97.828ms 39.631ms 0.7949 2.4684
16 16 31.565ms 9.425ms 40.532ms 0.7788 0.2325
16 17 31.506ms 9.449ms 40.710ms 0.7739 0.2321
16 21 31.506ms 9.366ms 40.442ms 0.7790 0.2316
16 31 31.476ms 9.368ms 40.578ms 0.7757 0.2308
16 37 31.532ms 9.368ms 40.557ms 0.7775 0.2310
16 100 32.085ms 9.462ms 40.546ms 0.7913 0.2333
16 1337 31.565ms 9.378ms 40.700ms 0.7755 0.2304
16 2147483647 31.542ms 9.432ms 40.520ms 0.7784 0.2328
100 2 31.944ms 111.340ms 94.742ms 0.3372 1.1752
100 3 79.944ms 113.787ms 93.835ms 0.8520 1.2126
100 4 59.791ms 117.456ms 94.512ms 0.6326 1.2428
100 5 83.966ms 126.364ms 95.357ms 0.8805 1.3252
100 6 47.711ms 128.195ms 94.676ms 0.5039 1.3540
100 7 31.655ms 133.375ms 94.280ms 0.3358 1.4147
100 10 31.479ms 138.762ms 95.404ms 0.3299 1.4545
100 16 32.205ms 147.394ms 95.396ms 0.3376 1.5451
100 17 32.519ms 159.157ms 94.638ms 0.3436 1.6818
100 21 31.600ms 165.064ms 94.711ms 0.3336 1.7428
100 31 32.103ms 217.815ms 95.446ms 0.3363 2.2821
100 37 32.174ms 206.750ms 95.241ms 0.3378 2.1708
100 100 32.272ms 9.603ms 95.411ms 0.3382 0.1006
100 1337 32.248ms 9.716ms 95.322ms 0.3383 0.1019
100 2147483647 32.180ms 9.613ms 95.482ms 0.3370 0.1007
1000 2 32.179ms 111.959ms 95.075ms 0.3385 1.1776
1000 3 85.151ms 130.544ms 94.905ms 0.8972 1.3755
1000 4 92.154ms 120.636ms 95.236ms 0.9676 1.2667
1000 5 55.675ms 127.301ms 94.939ms 0.5864 1.3409
1000 6 89.543ms 128.609ms 96.705ms 0.9259 1.3299
1000 7 111.508ms 135.902ms 95.390ms 1.1690 1.4247
1000 10 32.350ms 140.741ms 95.267ms 0.3396 1.4773
1000 16 32.215ms 146.372ms 97.310ms 0.3310 1.5042
1000 17 35.073ms 164.926ms 95.021ms 0.3691 1.7357
1000 21 31.754ms 168.394ms 95.829ms 0.3314 1.7572
1000 31 31.871ms 216.394ms 95.389ms 0.3341 2.2686
1000 37 32.698ms 207.868ms 95.953ms 0.3408 2.1663
1000 100 31.790ms 9.552ms 95.280ms 0.3336 0.1003
1000 1337 31.765ms 9.499ms 95.198ms 0.3337 0.0998
1000 2147483647 32.352ms 9.665ms 99.091ms 0.3265 0.0975
1337 2 33.769ms 116.467ms 100.303ms 0.3367 1.1611
1337 3 92.556ms 120.228ms 98.361ms 0.9410 1.2223
1337 4 60.878ms 120.081ms 95.196ms 0.6395 1.2614
1337 5 37.818ms 130.880ms 94.853ms 0.3987 1.3798
1337 6 89.164ms 127.111ms 95.115ms 0.9374 1.3364
1337 7 112.887ms 135.471ms 95.034ms 1.1879 1.4255
1337 10 62.967ms 139.590ms 95.238ms 0.6612 1.4657
1337 16 31.627ms 145.243ms 95.028ms 0.3328 1.5284
1337 17 31.559ms 160.692ms 95.809ms 0.3294 1.6772
1337 21 32.512ms 165.445ms 94.918ms 0.3425 1.7430
1337 31 32.202ms 227.296ms 95.301ms 0.3379 2.3850
1337 37 31.480ms 204.461ms 95.780ms 0.3287 2.1347
1337 100 31.555ms 9.431ms 94.898ms 0.3325 0.0994
1337 1337 31.882ms 9.497ms 95.331ms 0.3344 0.0996
1337 2147483647 31.658ms 9.537ms 102.058ms 0.3102 0.0934
647214625 2 33.205ms 111.544ms 98.062ms 0.3386 1.1375
647214625 3 95.494ms 115.745ms 95.637ms 0.9985 1.2102
647214625 4 170.408ms 120.192ms 95.333ms 1.7875 1.2607
647214625 5 89.517ms 129.382ms 95.393ms 0.9384 1.3563
647214625 6 89.786ms 128.036ms 95.184ms 0.9433 1.3451
647214625 7 220.023ms 136.131ms 95.122ms 2.3131 1.4311
647214625 10 120.601ms 140.422ms 95.706ms 1.2601 1.4672
647214625 16 104.728ms 147.315ms 95.106ms 1.1012 1.5490
647214625 17 79.959ms 159.419ms 95.406ms 0.8381 1.6710
647214625 21 153.919ms 165.598ms 95.005ms 1.6201 1.7430
647214625 31 31.620ms 218.622ms 94.989ms 0.3329 2.3016
647214625 37 32.194ms 204.035ms 112.118ms 0.2871 1.8198
647214625 100 33.473ms 9.446ms 94.736ms 0.3533 0.0997
647214625 1337 31.968ms 9.580ms 95.165ms 0.3359 0.1007
647214625 2147483647 31.533ms 9.398ms 94.672ms 0.3331 0.0993
2147483647 2 31.976ms 111.691ms 94.817ms 0.3372 1.1780
2147483647 3 88.660ms 115.958ms 94.841ms 0.9348 1.2227
2147483647 4 96.857ms 123.800ms 94.905ms 1.0206 1.3045
2147483647 5 85.732ms 138.737ms 104.552ms 0.8200 1.3270
2147483647 6 118.432ms 129.164ms 95.823ms 1.2359 1.3480
2147483647 7 156.424ms 136.056ms 95.798ms 1.6329 1.4202
2147483647 10 63.551ms 140.853ms 95.527ms 0.6653 1.4745
2147483647 16 127.349ms 147.782ms 95.943ms 1.3274 1.5403
2147483647 17 80.504ms 158.583ms 95.454ms 0.8434 1.6613
2147483647 21 155.300ms 169.390ms 97.667ms 1.5901 1.7344
2147483647 31 33.115ms 240.467ms 98.207ms 0.3372 2.4486
2147483647 37 32.180ms 206.674ms 95.793ms 0.3359 2.1575
2147483647 100 31.740ms 9.455ms 96.297ms 0.3296 0.0982
2147483647 1337 32.500ms 9.481ms 95.445ms 0.3405 0.0993
2147483647 2147483647 32.043ms 9.628ms 95.644ms 0.3350 0.1007
15455138536657945190 2 32.700ms 106.654ms 95.626ms 0.3420 1.1153
15455138536657945190 3 71.428ms 117.061ms 95.258ms 0.7498 1.2289
15455138536657945190 4 108.186ms 120.716ms 95.556ms 1.1322 1.2633
15455138536657945190 5 113.673ms 128.073ms 97.705ms 1.1634 1.3108
15455138536657945190 6 125.940ms 129.480ms 95.741ms 1.3154 1.3524
15455138536657945190 7 118.654ms 135.132ms 95.083ms 1.2479 1.4212
15455138536657945190 10 262.309ms 139.990ms 95.068ms 2.7592 1.4725
15455138536657945190 16 310.300ms 138.092ms 95.787ms 3.2395 1.4417
15455138536657945190 17 122.096ms 156.655ms 94.688ms 1.2895 1.6545
15455138536657945190 21 125.180ms 159.813ms 94.778ms 1.3208 1.6862
15455138536657945190 31 141.707ms 217.722ms 94.989ms 1.4918 2.2921
15455138536657945190 37 331.158ms 189.907ms 94.741ms 3.4954 2.0045
15455138536657945190 100 31.642ms 9.514ms 94.872ms 0.3335 0.1003
15455138536657945190 1337 31.628ms 9.584ms 94.736ms 0.3339 0.1012
15455138536657945190 2147483647 31.958ms 9.551ms 95.911ms 0.3332 0.0996
4294967296 2 32.759ms 69.874ms 40.569ms 0.8075 1.7224
4294967296 3 71.506ms 77.417ms 40.091ms 1.7836 1.9311
4294967296 4 139.298ms 79.647ms 39.898ms 3.4914 1.9962
4294967296 5 112.949ms 84.565ms 39.899ms 2.8309 2.1195
4294967296 6 116.830ms 84.042ms 39.741ms 2.9398 2.1147
4294967296 7 117.905ms 94.779ms 39.769ms 2.9648 2.3833
4294967296 10 122.653ms 99.448ms 40.068ms 3.0611 2.4820
4294967296 16 131.148ms 100.225ms 39.839ms 3.2920 2.5158
4294967296 17 549.720ms 109.678ms 39.911ms 13.7736 2.7480
4294967296 21 2.803s 119.426ms 39.772ms 70.4666 3.0028
4294967296 31 142.771ms 162.972ms 40.097ms 3.5606 4.0644
4294967296 37 32.008ms 150.594ms 40.491ms 0.7905 3.7193
4294967296 100 31.930ms 9.501ms 40.650ms 0.7855 0.2337
4294967296 1337 31.682ms 9.529ms 40.691ms 0.7786 0.2342
4294967296 2147483647 31.619ms 9.461ms 41.016ms 0.7709 0.2307
18446744073709551615 2 33.018ms 50.458ms 41.371ms 0.7981 1.2196
18446744073709551615 3 71.564ms 76.037ms 40.018ms 1.7883 1.9000
18446744073709551615 4 191.143ms 55.139ms 41.128ms 4.6475 1.3406
18446744073709551615 5 114.623ms 82.947ms 40.838ms 2.8068 2.0311
18446744073709551615 6 117.379ms 82.176ms 39.851ms 2.9455 2.0621
18446744073709551615 7 117.773ms 92.182ms 39.787ms 2.9602 2.3169
18446744073709551615 10 122.511ms 94.915ms 39.749ms 3.0821 2.3879
18446744073709551615 16 310.646ms 69.490ms 41.485ms 7.4881 1.6751
18446744073709551615 17 122.712ms 105.749ms 39.763ms 3.0862 2.6595
18446744073709551615 21 125.661ms 112.856ms 39.677ms 3.1672 2.8444
18446744073709551615 31 141.998ms 159.805ms 40.196ms 3.5326 3.9756
18446744073709551615 37 392.561ms 125.350ms 39.695ms 9.8897 3.1579
18446744073709551615 100 31.561ms 9.454ms 40.648ms 0.7765 0.2326
18446744073709551615 1337 31.580ms 9.468ms 40.631ms 0.7772 0.2330
18446744073709551615 2147483647 31.602ms 9.471ms 40.643ms 0.7776 0.2330

@freesig
Copy link
Contributor

freesig commented Nov 2, 2022

Hey @Dentosal any idea what the main usage of time is on your benchmarks? Would be interesting to get some flamegraphs to compare the two.
You can easily create them with cargo-flamegraph

@Dentosal
Copy link
Member Author

Dentosal commented Nov 2, 2022

I can look into it, but flamegraphs are not going to help here, since the work is completely CPU-bound and inlined into a single function, <u64 as num_integer::roots::Roots>::nth_root::go. That in turn uses Newton's method, which might take a long time to converge on some inputs. The problematic code is this closure, which is iterated way too many times. This is due to Newton's method converging quite slowly here. My code doesn't suffer from the same issue, as it only uses more precise (albeit a bit slower) floating point guess, and then simply looks into the three adjacent integer values to determinate the correct one.

@Dentosal
Copy link
Member Author

Dentosal commented Nov 2, 2022

My older implementation had a bug that it would produce incorrect values for some weird cases with big numbers like nroot(15455138536657945190, 2) and nroot(11875230360893570326, 27). These have now been fixed and updated in the earlier comment along with the benchmark.

Unfortunately I'm unable to prove that the error is always small enough that my code is correct. However, I'm rather confident that it is. I have ran following tests for it, verifying it against the num_integer crate:

  • For all cases where 2 <= target < 232 and 2 <= root < 64
  • For all cases where 264-100_000_000 <= target < 264 and 2 <= root < 64
  • For 234 randomly selected values 2 <= target < 264 and 2 <= root < 64
  • For 235 randomly selected values 2 <= target < 264 and root == 2
  • For 235 randomly selected values 2 <= target < 264 and root == 3

Also ran this through with my mathematician friend who assured me that the error in the initial guess will never exceed one to either direction. I verified the central points about that myself as well. Most importantly, 1024 is the largest absolute error when representing integers below 264 with f64, and in those cases the error will be about 0.00005, which is several magnitudes on the safe area.

@mitchmindtree mitchmindtree added the fuel-vm Related to the `fuel-vm` crate. label Dec 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working fuel-vm Related to the `fuel-vm` crate.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants