-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider integration in core and std ? #15
Comments
@Urgau I remember this has been already discussed somewhere in rustc GH issues and as I was searching for it, I noticed a related question has been raised again just yesterday - see rust-lang/rust#85198 for further details. One of concerns to keep in mind: binary size; both dec2flt and the algo used here use lookup tables, but dec2flt will be a bit leaner, I believe. |
Yeah I saw this issue, it's the primary reason why I search crate that do really fast float parsing and ask you that question.
I know that for certain target, like embedded object, binary size is a big deal but for vast majority of users who use Rust on computers, size is not really a big deal. I think that most people will accept +10/20 kio (just a guess not a actual number) increase for blazing performance improvement in float parsing and for size constraint environment a If binary size is the only major concern I think you should consider making a pull request to the rust repository. |
I think that what @Urgau wrote makes sense. The binary size can be important, but that's something one can tweak depending on one's needs. |
@aldanor I'm working on gradual integrations of faster float-parsing algorithms into dec2flt, although obviously, faster algorithms would be preferable if applicable (the benchmarks for this look really promising). As detailed in this internals post, I'm looking to fix in the following order:
My goal is gradual changes, since obviously, it's much easier to justify these changes with minimal differences to binary size but major overall performance improvements. Especially since @hanna-kruppe, the author of the float parsing algorithms, has retired from Rust development, meaning people reviewing these changes are less familiar with the codebase or float-parsing algorithms in general. Any feedback however would be greatly appreciated. EDIT In conversations a few years ago with @hanna-kruppe, it was very clear that both binary size was important, and there was at the time a tepid response to moving away from |
I should add, I've got a working repository for implementing these changes into Rust's core library here, and the different branches provide improvements to different components of dec2flt.
I'll be working on slow-path algorithms later, and my goal is a quick but comprehensive fix of |
One point to consider is that the current library can effectively act as a fast path. That is, there is no need to throw away anything but maybe an existing fast path. So it can be non-invasive (and also, easily made optional). In the Go runtime, the equivalent of the code in this repo. is running. All they did was to remove an existing fast path and replace it with a new one. It is also relatively thin. Here is the gist of the code... pub fn compute_float<F: Float>(q: i64, mut w: u64) -> AdjustedMantissa {
let am_zero = AdjustedMantissa::zero_pow2(0);
let am_inf = AdjustedMantissa::zero_pow2(F::INFINITE_POWER);
let am_error = AdjustedMantissa::zero_pow2(-1);
if w == 0 || q < F::SMALLEST_POWER_OF_TEN as i64 {
return am_zero;
} else if q > F::LARGEST_POWER_OF_TEN as i64 {
return am_inf;
}
let lz = w.leading_zeros();
w <<= lz;
let (lo, hi) = compute_product_approx(q, w, F::MANTISSA_EXPLICIT_BITS + 3);
if lo == 0xFFFF_FFFF_FFFF_FFFF {
let inside_safe_exponent = (q >= -27) && (q <= 55);
if !inside_safe_exponent {
return am_error;
}
}
let upperbit = (hi >> 63) as i32;
let mut mantissa = hi >> (upperbit + 64 - F::MANTISSA_EXPLICIT_BITS as i32 - 3);
let mut power2 = power(q as i32) + upperbit - lz as i32 - F::MINIMUM_EXPONENT;
if power2 <= 0 {
if -power2 + 1 >= 64 {
return am_zero;
}
mantissa >>= -power2 + 1;
mantissa += mantissa & 1;
mantissa >>= 1;
power2 = (mantissa >= (1_u64 << F::MANTISSA_EXPLICIT_BITS)) as i32;
return AdjustedMantissa { mantissa, power2 };
}
if lo <= 1
&& q >= F::MIN_EXPONENT_ROUND_TO_EVEN as i64
&& q <= F::MAX_EXPONENT_ROUND_TO_EVEN as i64
&& mantissa & 3 == 1
&& (mantissa << (upperbit + 64 - F::MANTISSA_EXPLICIT_BITS as i32 - 3)) == hi
{
mantissa &= !1_u64;
}
mantissa += mantissa & 1;
mantissa >>= 1;
if mantissa >= (2_u64 << F::MANTISSA_EXPLICIT_BITS) {
mantissa = 1_u64 << F::MANTISSA_EXPLICIT_BITS;
power2 += 1;
}
mantissa &= !(1_u64 << F::MANTISSA_EXPLICIT_BITS);
if power2 >= F::INFINITE_POWER {
return am_inf;
}
AdjustedMantissa { mantissa, power2 }
} |
@lemire I've mostly been using the term "moderate" path for that, to disambiguate from the term "fast path" used in dec2flt. Since the above uses an extended-precision representation of the significant digits of the float, it would replace the Bellerophon algorithm, which is used if |
Oh I should probably add: Rust's dec2flt doesn't parse the first 19-20 significant digits of the float except in the fast path, which is a major performance issue and trivial to fix (and then your algorithm could then be implemented into Rust's core library). The fast path just sees if there is less than 16-digits in the float's integral and fractional components, and then checks if these significant digits can be exactly represented as a native float. If not, it defaults to creating a big integer from the significant digits, and then calls it's unoptimized Bellerophon algorithm, which can lead to extreme performance hits for no good reason. In addition, the subsequent algorithm_r ideally prefers a value of |
Right. You have Clinger's fast path which is very cheap and handles things like (small) integers and so forth. Otherwise, what you want to do is to grab, say, the first 19 digits. Most of the time, numbers won't have more than 17 digits. If you have no more than 19 digits, then the code above is good enough 99.99999% of the time, and even if you have more than 19 digits, the truncation is good enough almost all of the time. See section 11... https://arxiv.org/pdf/2101.11408.pdf So it should be uncommon that you ever need to materialize a big integer. |
Yep. My modifications keep Clinger's algorithm (since I don't believe anyone currently on the Rust core team has worked on dec2flt before), but moves significant digit parsing outside of the fast path, and adds error calculation for truncated values, and only after that do we use their big-integer algorithms (which have major issues in nearly every facet, but that's a different story). Ideally, we'd remove it, and replace it with your code, but I'm trying minor, incremental improvements at first. Ideally, my goals right now are:
|
From an outside point of view: Wouldn't be it simpler and faster to simply replace the actual rust code with the code of fast-float ? As I see things, there is actual many faults in the rust std implementation, and you are proposing to do a 5 steps improvement (but not a total fix ?). I think that instead of trying to patch the std rust code to ideally be perfect we should take the fastest road and completely replace the actual suboptimal and broken (apparently) by a working one. In this case fast-float or another crate if it fits better the need of the std library. |
@Urgau Sure, if the maintainers are up for that. There's a few issues mentioned though, and that is:
If you can justify the changes, that would be ideal. I'm going to test the binary sizes, but this crate should be pretty lean (likely leaner than core), actually. Also, sure it's not a total fix, but it's an improvement of like 10,000x for some floats, which is not insignificant. |
Ok I've run the numbers and the binary sizes are practically identical, albeit slightly smaller. fast-float
core
If you can get the core team to approve of it, it would be a great addition to the core library. I'm still going to try to get incremental improvements implemented, for the reasons above (IMO, it's easier to justify), but ideally, this should be the end goal. You would likely, however, have to use the [bignum[(https://github.com/rust-lang/rust/blob/master/library/core/src/num/bignum.rs) implementation for the Decimal struct. As an FYI: I would recommend forking my dec2flt fork of Rust's |
Posted an question on the zulip stream for T-Libs.
That great news.
Thanks for the tips but I'm not sure I'm qualified to try doing the integration in std. In fact I barely understand how float are store internally. I will read the academic paper but I'm not sure it will be sufficient. |
@Alexhuszagh I have receive an response from Josh Triplett (who is co-lead of the Rust Libs Team). He said here:
So if you want try to do the pull request replacing the dec2flt with the fast-float code, go for it. There is however an legal concern also raised by Josh.
@aldanor @lemire Would you mind clarifying for Josh Triplett the situation here or directly on zulip concerning the licenses. Also I have being acting like your were okay for your implementation to be part of the rust standard library without never asking you if you were okay for that. So do you agree for your implementation to be part of the rust standard library ? |
I have zero ownership of anything. I am just an interested fellow. |
If anyone wants to chat about the algorithm, I am available (via Zoom). I did not write a single code of Rust for this project though. I am just around to help. |
I was more so responding to the author of the crate in this section, sorry if that wasn't clear. |
I'm also "just an interested fellow", in a way - as Daniel is the true owner of the algorithm, and I was porting the C++ implementation to Rust (with some minor implementational changes and updates here and there) while also having the paper by my side as a reference. If there are any license concerns - we can sure resolve them; I'm pretty sure all of us are interested first and foremost in this to succeed. We can either make the original library Apache2/MIT (which would require all contributors to sign off on that if I understand it correctly), or make this crate Apache2-only. Not a big deal. If Josh and the rest of the core team are happy to go ahead with this - should we give it a go? I'd be happy to have a separate zoom/zulip/whatever discussion if needed. Never had experience of contributing anything major to rustc myself, but as long as there's some guidance re: what needs to go where, I think I can try. |
@aldanor Rust is dual-licensed under both MIT/Apache-2.0, so I believe keeping the original license intact. I'm more than happy to have a Zoom call, if needed, or chat on Zulip. I've had a failed attempt (personal reasons forced me to take a step back) previously at integrating changes into the core library, but I'm tangentially familiar with the process. I would be more than happy to contribute any help I can. |
|
These doesn't seems to be allowed by the IEEE 754 standard either.
This was a bug on the implementation, that have been fix on nightly some weeks ago. See here and also here. |
Ah cool, then only 1). and 2). are necessary. |
Hmm.. fn main() {
for s in &[
"e10", "1.23", "1.23e", "1.", ".1", "1.23e5", ".", ".e1", "0e", "0e1",
] {
println!("{}", s);
println!(" std: {:?}", s.parse::<f64>());
println!(" ff: {:?}", fast_float::parse::<f64, _>(s));
}
} yields (as of the current version)
|
Nevermind, I missed this: |
@aldanor I have discuss with Josh and the only way forward with license is to have fast_float be dual license under MIT/APACHE. I have open a pull request in the fast_float repository to change the license.
Yep, you should give it a try. You should also watch the rustc-dev-guide, there are some very useful information about compiling rustc (shouldn't be necessary) and the standard library. The files that are relevant for you are under the library/core/src/num/dev2flt directories. From what I can see all of the files except |
+1 |
Looking at the commits for fast-float, only @biojppm and @kitaisreal made commits prior to this port being made, and so would likely be the only ones who would need to agree to re-license their code so an (older) version of fast-float could be dual licensed, and therefore this crate dual-licensed. We could ask permission from both of them. kitaisreal's contributions are trivial, so we could likely remove those changes if licensing becomes a major issue. EDIT: Their name is "kitaisreal". |
I have emailed @biojppm |
I've got on my own fork all the data tests from Rust's core library, and I'll be running them tonight on a little-endian (32-bit) and big-endian (32-bit) architecture, since those are the only cases that haven't yet been comprehensively tested IIRC. None of the logic should affect these cases, but it's nice to be extra sure. I'll be running this branch with all the unittests within for the following two architectures:
This is going to take a while, since these tests basically trigger the slow algorithm every time, but it shouldn't be too bad. |
@Alexhuszagh Very nice. |
A PR has been submitted. |
@Alexhuszagh @lemire I saw on zulip this PR rust-lang/rust#31407 about valid float being treated as invalid. Will the new algorithm fix this ? |
@Urgau Nice point. |
@Urgau That is actually addressed in the PR, in the introduction. Not the issue specifically, but the general idea:
// Issue in Rust's dec2fly.
"2.47032822920623272088284396434110686182e-324".parse::<f64>(); // Err(ParseFloatError { kind: Invalid }) So that's absolutely part of the motivation. |
Major updates on the PR:
The only major bottleneck right now seems to be they're waiting for a computer scientist who is familiar with float-parsing to comment on the PR. I think I know an expert.... |
@Alexhuszagh This is great. Thank you for creating the PR and taking it to the end. I just have a small suggestion, it might be nice to add/update the benchmarks comparing speed and code size between the current implementation and the PR one. PS: I just noticed that this PR rust-lang/rust#86048 is going to have conflicts with yours, you will certainly have to be rebased. |
I'll update that after most of the changes have been made: there's one more involving Miri that I'm finishing. I doubt the benchmarks have changed much, because I've ensured every change to ensure the safe routines don't change the resulting ASM. See this repository for all the changes. In fact, everything besides
Yeah, once there's conflicting changes, I'll be more than happy to rebase. However, as it's not yet merged, I'll wait. I'll likely do a rebase once the Miri issues are fixed as well. |
@Alexhuszagh This code looks great, let me know if there is anything specific that needs to be examined. |
Implementation is based off fast-float-rust, with a few notable changes. - Some unsafe methods have been removed. - Safe methods with inherently unsafe functionality have been removed. - All unsafe functionality is documented and provably safe. - Extensive documentation has been added for simpler maintenance. - Inline annotations on internal routines has been removed. - Fixed Python errors in src/etc/test-float-parse/runtests.py. - Updated test-float-parse to be a library, to avoid missing rand dependency. - Added regression tests for rust-lang#31109 and rust-lang#31407 in core tests. - Added regression tests for rust-lang#31109 and rust-lang#31407 in ui tests. - Use the existing slice primitive to simplify shared dec2flt methods - Remove Miri ignores from dec2flt, due to faster parsing times. - resolves rust-lang#85198 - resolves rust-lang#85214 - resolves rust-lang#85234 - fixes rust-lang#31407 - fixes rust-lang#31109 - fixes rust-lang#53015 - resolves rust-lang#68396 - closes aldanor/fast-float-rust#15
Update Rust Float-Parsing Algorithms to use the Eisel-Lemire algorithm. # Summary Rust, although it implements a correct float parser, has major performance issues in float parsing. Even for common floats, the performance can be 3-10x [slower](https://arxiv.org/pdf/2101.11408.pdf) than external libraries such as [lexical](https://github.com/Alexhuszagh/rust-lexical) and [fast-float-rust](https://github.com/aldanor/fast-float-rust). Recently, major advances in float-parsing algorithms have been developed by Daniel Lemire, along with others, and implement a fast, performant, and correct float parser, with speeds up to 1200 MiB/s on Apple's M1 architecture for the [canada](https://github.com/lemire/simple_fastfloat_benchmark/blob/0e2b5d163d4074cc0bde2acdaae78546d6e5c5f1/data/canada.txt) dataset, 10x faster than Rust's 130 MiB/s. In addition, [edge-cases](rust-lang#85234) in Rust's [dec2flt](https://github.com/rust-lang/rust/tree/868c702d0c9a471a28fb55f0148eb1e3e8b1dcc5/library/core/src/num/dec2flt) algorithm can lead to over a 1600x slowdown relative to efficient algorithms. This is due to the use of Clinger's correct, but slow [AlgorithmM and Bellepheron](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.45.4152&rep=rep1&type=pdf), which have been improved by faster big-integer algorithms and the Eisel-Lemire algorithm, respectively. Finally, this algorithm provides substantial improvements in the number of floats the Rust core library can parse. Denormal floats with a large number of digits cannot be parsed, due to use of the `Big32x40`, which simply does not have enough digits to round a float correctly. Using a custom decimal class, with much simpler logic, we can parse all valid decimal strings of any digit count. ```rust // Issue in Rust's dec2fly. "2.47032822920623272088284396434110686182e-324".parse::<f64>(); // Err(ParseFloatError { kind: Invalid }) ``` # Solution This pull request implements the Eisel-Lemire algorithm, modified from [fast-float-rust](https://github.com/aldanor/fast-float-rust) (which is licensed under Apache 2.0/MIT), along with numerous modifications to make it more amenable to inclusion in the Rust core library. The following describes both features in fast-float-rust and improvements in fast-float-rust for inclusion in core. **Documentation** Extensive documentation has been added to ensure the code base may be maintained by others, which explains the algorithms as well as various associated constants and routines. For example, two seemingly magical constants include documentation to describe how they were derived as follows: ```rust // Round-to-even only happens for negative values of q // when q ≥ −4 in the 64-bit case and when q ≥ −17 in // the 32-bitcase. // // When q ≥ 0,we have that 5^q ≤ 2m+1. In the 64-bit case,we // have 5^q ≤ 2m+1 ≤ 2^54 or q ≤ 23. In the 32-bit case,we have // 5^q ≤ 2m+1 ≤ 2^25 or q ≤ 10. // // When q < 0, we have w ≥ (2m+1)×5^−q. We must have that w < 2^64 // so (2m+1)×5^−q < 2^64. We have that 2m+1 > 2^53 (64-bit case) // or 2m+1 > 2^24 (32-bit case). Hence,we must have 2^53×5^−q < 2^64 // (64-bit) and 2^24×5^−q < 2^64 (32-bit). Hence we have 5^−q < 2^11 // or q ≥ −4 (64-bit case) and 5^−q < 2^40 or q ≥ −17 (32-bitcase). // // Thus we have that we only need to round ties to even when // we have that q ∈ [−4,23](in the 64-bit case) or q∈[−17,10] // (in the 32-bit case). In both cases,the power of five(5^|q|) // fits in a 64-bit word. const MIN_EXPONENT_ROUND_TO_EVEN: i32; const MAX_EXPONENT_ROUND_TO_EVEN: i32; ``` This ensures maintainability of the code base. **Improvements for Disguised Fast-Path Cases** The fast path in float parsing algorithms attempts to use native, machine floats to represent both the significant digits and the exponent, which is only possible if both can be exactly represented without rounding. In practice, this means that the significant digits must be 53-bits or less and the then exponent must be in the range `[-22, 22]` (for an f64). This is similar to the existing dec2flt implementation. However, disguised fast-path cases exist, where there are few significant digits and an exponent above the valid range, such as `1.23e25`. In this case, powers-of-10 may be shifted from the exponent to the significant digits, discussed at length in rust-lang#85198. **Digit Parsing Improvements** Typically, integers are parsed from string 1-at-a-time, requiring unnecessary multiplications which can slow down parsing. An approach to parse 8 digits at a time using only 3 multiplications is described in length [here](https://johnnylee-sde.github.io/Fast-numeric-string-to-int/). This leads to significant performance improvements, and is implemented for both big and little-endian systems. **Unsafe Changes** Relative to fast-float-rust, this library makes less use of unsafe functionality and clearly documents it. This includes the refactoring and documentation of numerous unsafe methods undesirably marked as safe. The original code would look something like this, which is deceptively marked as safe for unsafe functionality. ```rust impl AsciiStr { #[inline] pub fn step_by(&mut self, n: usize) -> &mut Self { unsafe { self.ptr = self.ptr.add(n) }; self } } ... #[inline] fn parse_scientific(s: &mut AsciiStr<'_>) -> i64 { // the first character is 'e'/'E' and scientific mode is enabled let start = *s; s.step(); ... } ``` The new code clearly documents safety concerns, and does not mark unsafe functionality as safe, leading to better safety guarantees. ```rust impl AsciiStr { /// Advance the view by n, advancing it in-place to (n..). pub unsafe fn step_by(&mut self, n: usize) -> &mut Self { // SAFETY: same as step_by, safe as long n is less than the buffer length self.ptr = unsafe { self.ptr.add(n) }; self } } ... /// Parse the scientific notation component of a float. fn parse_scientific(s: &mut AsciiStr<'_>) -> i64 { let start = *s; // SAFETY: the first character is 'e'/'E' and scientific mode is enabled unsafe { s.step(); } ... } ``` This allows us to trivially demonstrate the new implementation of dec2flt is safe. **Inline Annotations Have Been Removed** In the previous implementation of dec2flt, inline annotations exist practically nowhere in the entire module. Therefore, these annotations have been removed, which mostly does not impact [performance](aldanor/fast-float-rust#15 (comment)). **Fixed Correctness Tests** Numerous compile errors in `src/etc/test-float-parse` were present, due to deprecation of `time.clock()`, as well as the crate dependencies with `rand`. The tests have therefore been reworked as a [crate](https://github.com/Alexhuszagh/rust/tree/master/src/etc/test-float-parse), and any errors in `runtests.py` have been patched. **Undefined Behavior** An implementation of `check_len` which relied on undefined behavior (in fast-float-rust) has been refactored, to ensure that the behavior is well-defined. The original code is as follows: ```rust #[inline] pub fn check_len(&self, n: usize) -> bool { unsafe { self.ptr.add(n) <= self.end } } ``` And the new implementation is as follows: ```rust /// Check if the slice at least `n` length. fn check_len(&self, n: usize) -> bool { n <= self.as_ref().len() } ``` Note that this has since been fixed in [fast-float-rust](aldanor/fast-float-rust#29). **Inferring Binary Exponents** Rather than explicitly store binary exponents, this new implementation infers them from the decimal exponent, reducing the amount of static storage required. This removes the requirement to store [611 i16s](https://github.com/rust-lang/rust/blob/868c702d0c9a471a28fb55f0148eb1e3e8b1dcc5/library/core/src/num/dec2flt/table.rs#L8). # Code Size The code size, for all optimizations, does not considerably change relative to before for stripped builds, however it is **significantly** smaller prior to stripping the resulting binaries. These binary sizes were calculated on x86_64-unknown-linux-gnu. **new** Using rustc version 1.55.0-dev. opt-level|size|size(stripped) |:-:|:-:|:-:| 0|400k|300K 1|396k|292K 2|392k|292K 3|392k|296K s|396k|292K z|396k|292K **old** Using rustc version 1.53.0-nightly. opt-level|size|size(stripped) |:-:|:-:|:-:| 0|3.2M|304K 1|3.2M|292K 2|3.1M|284K 3|3.1M|284K s|3.1M|284K z|3.1M|284K # Correctness The dec2flt implementation passes all of Rust's unittests and comprehensive float parsing tests, along with numerous other tests such as Nigel Toa's comprehensive float [tests](https://github.com/nigeltao/parse-number-fxx-test-data) and Hrvoje Abraham [strtod_tests](https://github.com/ahrvoje/numerics/blob/master/strtod/strtod_tests.toml). Therefore, it is unlikely that this algorithm will incorrectly round parsed floats. # Issues Addressed This will fix and close the following issues: - resolves rust-lang#85198 - resolves rust-lang#85214 - resolves rust-lang#85234 - fixes rust-lang#31407 - fixes rust-lang#31109 - fixes rust-lang#53015 - resolves rust-lang#68396 - closes aldanor/fast-float-rust#15
The pull-request for integrating the algorthim in the rust standard library has been merged 🎉. Thank you @Alexhuszagh for having created and brought this pull-request to the end. |
@Urgau @Alexhuszagh Great work. One fun challenge would be to re-use the same tables for serialization. It is on my todo to investigate this option. |
@lemire Thanks for all the hard work, and helpful feedback. As of right now... 🎉🎉🎉 :crab rave: |
@Alexhuszagh Do not hesitate to get in touch if you'd like to continue this work and you feel I can help. |
Have you consider replacing the
dec2flt
algorithm in the standard library by your crate ?I remember have seen several members of the Rust Libs Team encouraging people how have better algorithm (fast, efficient, ...) to integrate their crate in the std library.
Source of dec2flt
The text was updated successfully, but these errors were encountered: