MROO has float rounding issues #260

Dentosal · 2022-11-01T18:39:01Z

Similarly to the MLOG issues discussed in #150, MROO also has floating point rounding issues. This is the only location of the vm still using floats.

For instance, calling MROO with values mroo(647214625, 3), i.e. cube root of 647214625, gives 864.9999999999997 in floating point, while 865**3 == 647214625.

I'll investigate the performance impact of handling this with integer-only operations.

The text was updated successfully, but these errors were encountered:

Dentosal · 2022-11-01T22:33:45Z

I benchmarked the generally fast num-integer::Roots :: nth-root and my error correcting floating point solution seen below.

While the num-integer is a bit faster on the average case compared to the current floating point solution, it has some really bad edge cases. Benchmarks found over 70x runtime cases and I suspect even 100x would be possible. See mroo(4294967296, 21) for an example of this.

The custom implementation seen below performs acceptably. For non-interesting inputs it's four times faster. Even in the worst case it only takes around three times longer than the current version.

Custom error correction

fn custom(target: u64, nth_root: u32) -> u64 {
    if nth_root == 0 {
        panic!();
    }

    if nth_root == 1  || target <= 1 {
        return target;
    }

    if nth_root > 64 {
        // for any n>1, n**64 can never fit into u64
        return 1;
    }

    if nth_root as u64  >= target {
        return 1;
    }

    // Use floating point operation to get a guess for the starting point.
    // This is at most off by one in either direction.
    let guess = (target as f64).powf((nth_root as f64).recip()) as u64;

    let Some(g1) = guess.checked_pow(nth_root) else {
        // guess**nth_root >= 2**64 and target < 2**64
        return guess - 1;
    };

    if target < g1 {
        return guess - 1;
    }

    let Some(g2) = (guess + 1).checked_pow(nth_root) else {
        // (guess + 1)**nth_root >= 2**64 and target < 2**64
        return guess;
    };

    if target < g2 {
        return guess;
    }

    guess + 1
}

Results

cargo run --release

Exponent	Base	iroot timing	custom-impl timing	float timing	iroot/float timing	custom/float timing
2	2	65.411ms	12.447ms	43.351ms	1.5089	0.2871
2	3	50.987ms	9.603ms	40.642ms	1.2545	0.2363
2	4	31.782ms	9.625ms	40.596ms	0.7829	0.2371
2	5	31.819ms	9.543ms	41.127ms	0.7737	0.2320
2	6	32.152ms	9.514ms	40.799ms	0.7881	0.2332
2	7	31.998ms	9.581ms	40.749ms	0.7852	0.2351
2	10	32.050ms	9.550ms	40.917ms	0.7833	0.2334
2	16	32.021ms	9.549ms	41.060ms	0.7799	0.2326
2	17	31.868ms	9.539ms	40.580ms	0.7853	0.2350
2	21	32.166ms	9.596ms	40.457ms	0.7951	0.2372
2	31	31.893ms	9.568ms	40.753ms	0.7826	0.2348
2	37	31.986ms	9.571ms	40.599ms	0.7878	0.2357
2	100	32.059ms	9.650ms	40.677ms	0.7881	0.2372
2	1337	31.894ms	9.535ms	40.591ms	0.7857	0.2349
2	2147483647	32.003ms	9.560ms	40.572ms	0.7888	0.2356
3	2	32.490ms	111.668ms	94.916ms	0.3423	1.1765
3	3	51.720ms	9.396ms	94.583ms	0.5468	0.0993
3	4	32.271ms	9.529ms	94.853ms	0.3402	0.1005
3	5	32.049ms	9.637ms	94.749ms	0.3382	0.1017
3	6	32.070ms	9.555ms	95.109ms	0.3372	0.1005
3	7	32.072ms	9.550ms	94.994ms	0.3376	0.1005
3	10	32.077ms	9.605ms	94.939ms	0.3379	0.1012
3	16	32.221ms	9.612ms	95.079ms	0.3389	0.1011
3	17	32.120ms	9.551ms	95.294ms	0.3371	0.1002
3	21	32.081ms	9.621ms	95.250ms	0.3368	0.1010
3	31	32.079ms	9.549ms	94.128ms	0.3408	0.1014
3	37	31.550ms	9.364ms	93.098ms	0.3389	0.1006
3	100	31.516ms	9.386ms	93.070ms	0.3386	0.1008
3	1337	31.479ms	9.368ms	93.109ms	0.3381	0.1006
3	2147483647	31.482ms	9.392ms	94.783ms	0.3321	0.0991
4	2	32.536ms	69.378ms	39.784ms	0.8178	1.7439
4	3	51.738ms	76.600ms	40.140ms	1.2889	1.9083
4	4	32.051ms	9.660ms	40.634ms	0.7888	0.2377
4	5	31.993ms	9.395ms	40.075ms	0.7983	0.2344
4	6	31.509ms	9.368ms	40.199ms	0.7838	0.2330
4	7	31.544ms	9.411ms	40.366ms	0.7814	0.2331
4	10	31.454ms	9.383ms	40.494ms	0.7768	0.2317
4	16	31.456ms	9.411ms	40.220ms	0.7821	0.2340
4	17	31.481ms	9.478ms	40.247ms	0.7822	0.2355
4	21	31.612ms	9.430ms	41.064ms	0.7698	0.2296
4	31	32.071ms	9.584ms	40.760ms	0.7868	0.2351
4	37	32.048ms	9.581ms	40.578ms	0.7898	0.2361
4	100	32.046ms	9.592ms	40.716ms	0.7871	0.2356
4	1337	32.101ms	9.595ms	40.634ms	0.7900	0.2361
4	2147483647	32.521ms	9.547ms	40.682ms	0.7994	0.2346
5	2	32.687ms	111.170ms	94.904ms	0.3444	1.1714
5	3	51.874ms	115.921ms	94.890ms	0.5467	1.2216
5	4	32.072ms	119.596ms	94.834ms	0.3382	1.2611
5	5	32.054ms	9.547ms	95.273ms	0.3364	0.1002
5	6	32.042ms	9.611ms	95.277ms	0.3363	0.1009
5	7	31.993ms	9.416ms	93.188ms	0.3433	0.1010
5	10	31.468ms	9.441ms	93.076ms	0.3381	0.1014
5	16	31.460ms	9.387ms	93.112ms	0.3379	0.1008
5	17	31.484ms	9.371ms	93.371ms	0.3372	0.1004
5	21	31.521ms	9.431ms	93.633ms	0.3366	0.1007
5	31	31.483ms	9.371ms	93.054ms	0.3383	0.1007
5	37	31.522ms	9.371ms	93.816ms	0.3360	0.0999
5	100	31.450ms	9.454ms	93.549ms	0.3362	0.1010
5	1337	31.508ms	9.388ms	94.626ms	0.3330	0.0992
5	2147483647	31.450ms	9.413ms	93.090ms	0.3378	0.1011
6	2	31.904ms	109.369ms	94.308ms	0.3383	1.1597
6	3	51.444ms	115.932ms	94.855ms	0.5423	1.2222
6	4	32.056ms	120.159ms	95.019ms	0.3374	1.2646
6	5	32.064ms	127.206ms	94.856ms	0.3380	1.3410
6	6	32.085ms	9.583ms	94.879ms	0.3382	0.1010
6	7	32.118ms	9.486ms	94.873ms	0.3385	0.1000
6	10	31.509ms	9.368ms	94.286ms	0.3342	0.0994
6	16	31.459ms	9.371ms	94.691ms	0.3322	0.0990
6	17	31.474ms	9.368ms	95.080ms	0.3310	0.0985
6	21	31.722ms	9.623ms	94.918ms	0.3342	0.1014
6	31	32.072ms	9.594ms	94.836ms	0.3382	0.1012
6	37	32.052ms	9.552ms	94.848ms	0.3379	0.1007
6	100	32.093ms	9.627ms	94.630ms	0.3391	0.1017
6	1337	31.974ms	9.545ms	94.868ms	0.3370	0.1006
6	2147483647	32.086ms	9.587ms	94.860ms	0.3382	0.1011
7	2	32.551ms	111.668ms	94.867ms	0.3431	1.1771
7	3	50.970ms	114.537ms	94.854ms	0.5374	1.2075
7	4	31.466ms	118.968ms	94.651ms	0.3324	1.2569
7	5	32.108ms	127.190ms	94.907ms	0.3383	1.3402
7	6	32.045ms	127.552ms	95.462ms	0.3357	1.3361
7	7	32.057ms	9.555ms	94.834ms	0.3380	0.1007
7	10	32.034ms	9.635ms	94.781ms	0.3380	0.1017
7	16	32.040ms	9.580ms	94.777ms	0.3381	0.1011
7	17	32.031ms	9.371ms	93.129ms	0.3439	0.1006
7	21	31.458ms	9.368ms	93.419ms	0.3367	0.1003
7	31	31.473ms	9.433ms	93.994ms	0.3348	0.1004
7	37	31.455ms	9.419ms	94.042ms	0.3345	0.1002
7	100	32.106ms	9.553ms	94.924ms	0.3382	0.1006
7	1337	32.053ms	9.562ms	95.115ms	0.3370	0.1005
7	2147483647	32.046ms	9.555ms	94.897ms	0.3377	0.1007
16	2	32.533ms	69.067ms	39.752ms	0.8184	1.7374
16	3	81.768ms	76.986ms	39.734ms	2.0579	1.9375
16	4	38.195ms	78.748ms	39.744ms	0.9610	1.9814
16	5	32.039ms	83.569ms	39.576ms	0.8096	2.1117
16	6	31.519ms	82.286ms	39.734ms	0.7933	2.0709
16	7	31.529ms	93.712ms	39.616ms	0.7959	2.3656
16	10	31.503ms	97.828ms	39.631ms	0.7949	2.4684
16	16	31.565ms	9.425ms	40.532ms	0.7788	0.2325
16	17	31.506ms	9.449ms	40.710ms	0.7739	0.2321
16	21	31.506ms	9.366ms	40.442ms	0.7790	0.2316
16	31	31.476ms	9.368ms	40.578ms	0.7757	0.2308
16	37	31.532ms	9.368ms	40.557ms	0.7775	0.2310
16	100	32.085ms	9.462ms	40.546ms	0.7913	0.2333
16	1337	31.565ms	9.378ms	40.700ms	0.7755	0.2304
16	2147483647	31.542ms	9.432ms	40.520ms	0.7784	0.2328
100	2	31.944ms	111.340ms	94.742ms	0.3372	1.1752
100	3	79.944ms	113.787ms	93.835ms	0.8520	1.2126
100	4	59.791ms	117.456ms	94.512ms	0.6326	1.2428
100	5	83.966ms	126.364ms	95.357ms	0.8805	1.3252
100	6	47.711ms	128.195ms	94.676ms	0.5039	1.3540
100	7	31.655ms	133.375ms	94.280ms	0.3358	1.4147
100	10	31.479ms	138.762ms	95.404ms	0.3299	1.4545
100	16	32.205ms	147.394ms	95.396ms	0.3376	1.5451
100	17	32.519ms	159.157ms	94.638ms	0.3436	1.6818
100	21	31.600ms	165.064ms	94.711ms	0.3336	1.7428
100	31	32.103ms	217.815ms	95.446ms	0.3363	2.2821
100	37	32.174ms	206.750ms	95.241ms	0.3378	2.1708
100	100	32.272ms	9.603ms	95.411ms	0.3382	0.1006
100	1337	32.248ms	9.716ms	95.322ms	0.3383	0.1019
100	2147483647	32.180ms	9.613ms	95.482ms	0.3370	0.1007
1000	2	32.179ms	111.959ms	95.075ms	0.3385	1.1776
1000	3	85.151ms	130.544ms	94.905ms	0.8972	1.3755
1000	4	92.154ms	120.636ms	95.236ms	0.9676	1.2667
1000	5	55.675ms	127.301ms	94.939ms	0.5864	1.3409
1000	6	89.543ms	128.609ms	96.705ms	0.9259	1.3299
1000	7	111.508ms	135.902ms	95.390ms	1.1690	1.4247
1000	10	32.350ms	140.741ms	95.267ms	0.3396	1.4773
1000	16	32.215ms	146.372ms	97.310ms	0.3310	1.5042
1000	17	35.073ms	164.926ms	95.021ms	0.3691	1.7357
1000	21	31.754ms	168.394ms	95.829ms	0.3314	1.7572
1000	31	31.871ms	216.394ms	95.389ms	0.3341	2.2686
1000	37	32.698ms	207.868ms	95.953ms	0.3408	2.1663
1000	100	31.790ms	9.552ms	95.280ms	0.3336	0.1003
1000	1337	31.765ms	9.499ms	95.198ms	0.3337	0.0998
1000	2147483647	32.352ms	9.665ms	99.091ms	0.3265	0.0975
1337	2	33.769ms	116.467ms	100.303ms	0.3367	1.1611
1337	3	92.556ms	120.228ms	98.361ms	0.9410	1.2223
1337	4	60.878ms	120.081ms	95.196ms	0.6395	1.2614
1337	5	37.818ms	130.880ms	94.853ms	0.3987	1.3798
1337	6	89.164ms	127.111ms	95.115ms	0.9374	1.3364
1337	7	112.887ms	135.471ms	95.034ms	1.1879	1.4255
1337	10	62.967ms	139.590ms	95.238ms	0.6612	1.4657
1337	16	31.627ms	145.243ms	95.028ms	0.3328	1.5284
1337	17	31.559ms	160.692ms	95.809ms	0.3294	1.6772
1337	21	32.512ms	165.445ms	94.918ms	0.3425	1.7430
1337	31	32.202ms	227.296ms	95.301ms	0.3379	2.3850
1337	37	31.480ms	204.461ms	95.780ms	0.3287	2.1347
1337	100	31.555ms	9.431ms	94.898ms	0.3325	0.0994
1337	1337	31.882ms	9.497ms	95.331ms	0.3344	0.0996
1337	2147483647	31.658ms	9.537ms	102.058ms	0.3102	0.0934
647214625	2	33.205ms	111.544ms	98.062ms	0.3386	1.1375
647214625	3	95.494ms	115.745ms	95.637ms	0.9985	1.2102
647214625	4	170.408ms	120.192ms	95.333ms	1.7875	1.2607
647214625	5	89.517ms	129.382ms	95.393ms	0.9384	1.3563
647214625	6	89.786ms	128.036ms	95.184ms	0.9433	1.3451
647214625	7	220.023ms	136.131ms	95.122ms	2.3131	1.4311
647214625	10	120.601ms	140.422ms	95.706ms	1.2601	1.4672
647214625	16	104.728ms	147.315ms	95.106ms	1.1012	1.5490
647214625	17	79.959ms	159.419ms	95.406ms	0.8381	1.6710
647214625	21	153.919ms	165.598ms	95.005ms	1.6201	1.7430
647214625	31	31.620ms	218.622ms	94.989ms	0.3329	2.3016
647214625	37	32.194ms	204.035ms	112.118ms	0.2871	1.8198
647214625	100	33.473ms	9.446ms	94.736ms	0.3533	0.0997
647214625	1337	31.968ms	9.580ms	95.165ms	0.3359	0.1007
647214625	2147483647	31.533ms	9.398ms	94.672ms	0.3331	0.0993
2147483647	2	31.976ms	111.691ms	94.817ms	0.3372	1.1780
2147483647	3	88.660ms	115.958ms	94.841ms	0.9348	1.2227
2147483647	4	96.857ms	123.800ms	94.905ms	1.0206	1.3045
2147483647	5	85.732ms	138.737ms	104.552ms	0.8200	1.3270
2147483647	6	118.432ms	129.164ms	95.823ms	1.2359	1.3480
2147483647	7	156.424ms	136.056ms	95.798ms	1.6329	1.4202
2147483647	10	63.551ms	140.853ms	95.527ms	0.6653	1.4745
2147483647	16	127.349ms	147.782ms	95.943ms	1.3274	1.5403
2147483647	17	80.504ms	158.583ms	95.454ms	0.8434	1.6613
2147483647	21	155.300ms	169.390ms	97.667ms	1.5901	1.7344
2147483647	31	33.115ms	240.467ms	98.207ms	0.3372	2.4486
2147483647	37	32.180ms	206.674ms	95.793ms	0.3359	2.1575
2147483647	100	31.740ms	9.455ms	96.297ms	0.3296	0.0982
2147483647	1337	32.500ms	9.481ms	95.445ms	0.3405	0.0993
2147483647	2147483647	32.043ms	9.628ms	95.644ms	0.3350	0.1007
15455138536657945190	2	32.700ms	106.654ms	95.626ms	0.3420	1.1153
15455138536657945190	3	71.428ms	117.061ms	95.258ms	0.7498	1.2289
15455138536657945190	4	108.186ms	120.716ms	95.556ms	1.1322	1.2633
15455138536657945190	5	113.673ms	128.073ms	97.705ms	1.1634	1.3108
15455138536657945190	6	125.940ms	129.480ms	95.741ms	1.3154	1.3524
15455138536657945190	7	118.654ms	135.132ms	95.083ms	1.2479	1.4212
15455138536657945190	10	262.309ms	139.990ms	95.068ms	2.7592	1.4725
15455138536657945190	16	310.300ms	138.092ms	95.787ms	3.2395	1.4417
15455138536657945190	17	122.096ms	156.655ms	94.688ms	1.2895	1.6545
15455138536657945190	21	125.180ms	159.813ms	94.778ms	1.3208	1.6862
15455138536657945190	31	141.707ms	217.722ms	94.989ms	1.4918	2.2921
15455138536657945190	37	331.158ms	189.907ms	94.741ms	3.4954	2.0045
15455138536657945190	100	31.642ms	9.514ms	94.872ms	0.3335	0.1003
15455138536657945190	1337	31.628ms	9.584ms	94.736ms	0.3339	0.1012
15455138536657945190	2147483647	31.958ms	9.551ms	95.911ms	0.3332	0.0996
4294967296	2	32.759ms	69.874ms	40.569ms	0.8075	1.7224
4294967296	3	71.506ms	77.417ms	40.091ms	1.7836	1.9311
4294967296	4	139.298ms	79.647ms	39.898ms	3.4914	1.9962
4294967296	5	112.949ms	84.565ms	39.899ms	2.8309	2.1195
4294967296	6	116.830ms	84.042ms	39.741ms	2.9398	2.1147
4294967296	7	117.905ms	94.779ms	39.769ms	2.9648	2.3833
4294967296	10	122.653ms	99.448ms	40.068ms	3.0611	2.4820
4294967296	16	131.148ms	100.225ms	39.839ms	3.2920	2.5158
4294967296	17	549.720ms	109.678ms	39.911ms	13.7736	2.7480
4294967296	21	2.803s	119.426ms	39.772ms	70.4666	3.0028
4294967296	31	142.771ms	162.972ms	40.097ms	3.5606	4.0644
4294967296	37	32.008ms	150.594ms	40.491ms	0.7905	3.7193
4294967296	100	31.930ms	9.501ms	40.650ms	0.7855	0.2337
4294967296	1337	31.682ms	9.529ms	40.691ms	0.7786	0.2342
4294967296	2147483647	31.619ms	9.461ms	41.016ms	0.7709	0.2307
18446744073709551615	2	33.018ms	50.458ms	41.371ms	0.7981	1.2196
18446744073709551615	3	71.564ms	76.037ms	40.018ms	1.7883	1.9000
18446744073709551615	4	191.143ms	55.139ms	41.128ms	4.6475	1.3406
18446744073709551615	5	114.623ms	82.947ms	40.838ms	2.8068	2.0311
18446744073709551615	6	117.379ms	82.176ms	39.851ms	2.9455	2.0621
18446744073709551615	7	117.773ms	92.182ms	39.787ms	2.9602	2.3169
18446744073709551615	10	122.511ms	94.915ms	39.749ms	3.0821	2.3879
18446744073709551615	16	310.646ms	69.490ms	41.485ms	7.4881	1.6751
18446744073709551615	17	122.712ms	105.749ms	39.763ms	3.0862	2.6595
18446744073709551615	21	125.661ms	112.856ms	39.677ms	3.1672	2.8444
18446744073709551615	31	141.998ms	159.805ms	40.196ms	3.5326	3.9756
18446744073709551615	37	392.561ms	125.350ms	39.695ms	9.8897	3.1579
18446744073709551615	100	31.561ms	9.454ms	40.648ms	0.7765	0.2326
18446744073709551615	1337	31.580ms	9.468ms	40.631ms	0.7772	0.2330
18446744073709551615	2147483647	31.602ms	9.471ms	40.643ms	0.7776	0.2330

freesig · 2022-11-02T07:10:22Z

Hey @Dentosal any idea what the main usage of time is on your benchmarks? Would be interesting to get some flamegraphs to compare the two.
You can easily create them with cargo-flamegraph

Dentosal · 2022-11-02T18:11:33Z

I can look into it, but flamegraphs are not going to help here, since the work is completely CPU-bound and inlined into a single function, <u64 as num_integer::roots::Roots>::nth_root::go. That in turn uses Newton's method, which might take a long time to converge on some inputs. The problematic code is this closure, which is iterated way too many times. This is due to Newton's method converging quite slowly here. My code doesn't suffer from the same issue, as it only uses more precise (albeit a bit slower) floating point guess, and then simply looks into the three adjacent integer values to determinate the correct one.

Dentosal · 2022-11-02T23:40:23Z

My older implementation had a bug that it would produce incorrect values for some weird cases with big numbers like nroot(15455138536657945190, 2) and nroot(11875230360893570326, 27). These have now been fixed and updated in the earlier comment along with the benchmark.

Unfortunately I'm unable to prove that the error is always small enough that my code is correct. However, I'm rather confident that it is. I have ran following tests for it, verifying it against the num_integer crate:

For all cases where 2 <= target < 2³² and 2 <= root < 64
For all cases where 2⁶⁴-100_000_000 <= target < 2⁶⁴ and 2 <= root < 64
For 2³⁴ randomly selected values 2 <= target < 2⁶⁴ and 2 <= root < 64
For 2³⁵ randomly selected values 2 <= target < 2⁶⁴ and root == 2
For 2³⁵ randomly selected values 2 <= target < 2⁶⁴ and root == 3

Also ran this through with my mathematician friend who assured me that the error in the initial guess will never exceed one to either direction. I verified the central points about that myself as well. Most importantly, 1024 is the largest absolute error when representing integers below 2⁶⁴ with f64, and in those cases the error will be about 0.00005, which is several magnitudes on the safe area.

Dentosal added the bug Something isn't working label Nov 1, 2022

Dentosal self-assigned this Nov 1, 2022

Dentosal mentioned this issue Nov 3, 2022

Fix MROO rounding issues #262

Merged

Dentosal closed this as completed in #262 Nov 4, 2022

mitchmindtree added the fuel-vm Related to the `fuel-vm` crate. label Dec 9, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MROO has float rounding issues #260

MROO has float rounding issues #260

Dentosal commented Nov 1, 2022

Dentosal commented Nov 1, 2022 •

edited

Loading

freesig commented Nov 2, 2022

Dentosal commented Nov 2, 2022 •

edited

Loading

Dentosal commented Nov 2, 2022 •

edited

Loading

MROO has float rounding issues #260

MROO has float rounding issues #260

Comments

Dentosal commented Nov 1, 2022

Dentosal commented Nov 1, 2022 • edited Loading

Custom error correction

Results

freesig commented Nov 2, 2022

Dentosal commented Nov 2, 2022 • edited Loading

Dentosal commented Nov 2, 2022 • edited Loading

Dentosal commented Nov 1, 2022 •

edited

Loading

Dentosal commented Nov 2, 2022 •

edited

Loading

Dentosal commented Nov 2, 2022 •

edited

Loading