[FEATURE] CORE write() formatting control #34
This would be tricky, since we currently defer float formatting to one of three different, user-configurable backends. The unoptimized Grisu algorithm is the only one I have any control over, so it would need to be implemented there, and Grisu is considerably slower than Ryu (~2x) and somewhat slower than dtoa. However, this would certainly be doable, at least in the internal Grisu algorithm, since the generation of digits and their emission to the output buffer are done in subsequent steps. It may be faster to contact the maintainers of ryu and dtoa to ask for a similar feature, since this may be a feature required upstream for optimal performance. There are, however, a few assumptions I would likely need to modify in that case (these are all fairly trivial). I may consider forking ryu and seeing if implementing this is possible without negatively impacting performance, since this might be the ideal approach.
Is it possible to estimate the size reasonably before conversion? If so, the return type could be a Result, returning an error if the buffer is insufficient, and dealing with that would be left to the user.
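A minimal sketch of that idea (hypothetical names, and in Python rather than Rust): a write routine that reports how many bytes it wrote, and errors instead of overflowing a caller-provided buffer, the analogue of returning a `Result::Err`:

```python
def write_num(value, buf, precision=6):
    # Hypothetical sketch (not the lexical API): format a float into a
    # caller-provided buffer, failing if the result does not fit.
    digits = f"{value:.{precision}f}".encode()
    if len(digits) > len(buf):
        raise BufferError("output buffer too small")
    buf[:len(digits)] = digits
    return len(digits)  # number of bytes written

buf = bytearray(16)
n = write_num(3.5, buf)
print(buf[:n].decode())  # 3.500000
```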
@Atmelfan Not really, because of radix changes, although with the parameters it would be easy to estimate the upper bound on the number of digits. If we were serializing a binary number to hex, it would be easy to estimate the number of digits, because all values in binary are representable in hex. However, quickly estimating the number of digits a binary number might require in decimal isn't so simple, and is complicated by more modern serialization algorithms (Dragon4, Ryu, Grisu3, etc.), which all aim to efficiently calculate the minimum number of digits required to accurately represent a number in decimal.

I'm only going to discuss Grisu here, because I haven't implemented Dragon4 or Ryu, and therefore don't know their internal logic (and might get something wrong). Grisu first normalizes the floating-point number (in an extended representation, for stability, where a 64-bit integer is used for the significant digits, or mantissa, and 16 bits are used for the binary exponent), and then calculates the upper and lower boundaries for the value. You can safely approximate this with the following Python code (requires NumPy), using a well-known boundary condition (0.1):
```python
import numpy as np

ONE = np.uint64(1)
SIGN_MASK = np.uint64(0x8000000000000000)
INFINITY_BITS = np.uint64(0x7FF0000000000000)
NEGATIVE_INFINITY_BITS = INFINITY_BITS | SIGN_MASK
ZERO = np.float64(0.0)

# Maximum number of digits in decimal that can
# contribute to a number.
# This is calculated via:
#   `−emin + p2 + ⌊(emin + 1) log(2, b) − log(1 − 2^(−p2), b)⌋`
# Where:
#   b = radix
#   emin = -1022
#   p2 = 53
MAX_DIGITS = 768

def to_bits(double):
    '''Get float bits as an unsigned, 64-bit integer.'''
    return np.frombuffer(double.tobytes(), dtype=np.uint64)[0]

def from_bits(bits):
    '''Create double from bit representation as an integer.'''
    return np.frombuffer(bits.tobytes(), dtype=np.float64)[0]

def is_sign_negative(double):
    '''Determine if the sign bit is negative.'''
    return to_bits(double) & SIGN_MASK != 0

def is_sign_positive(double):
    '''Determine if the sign bit is positive.'''
    return not is_sign_negative(double)

def next_double(double):
    '''Get the next greater float.'''
    bits = to_bits(double)
    if is_sign_negative(double) and double == ZERO:
        # -0.0
        return ZERO
    elif bits == INFINITY_BITS:
        return from_bits(INFINITY_BITS)
    elif is_sign_negative(double):
        return from_bits(bits - ONE)
    else:
        return from_bits(bits + ONE)

def previous_double(double):
    '''Get the previous smaller float.'''
    bits = to_bits(double)
    if is_sign_positive(double) and double == ZERO:
        # +0.0
        return -ZERO
    elif bits == NEGATIVE_INFINITY_BITS:
        return from_bits(NEGATIVE_INFINITY_BITS)
    elif is_sign_negative(double):
        return from_bits(bits + ONE)
    else:
        return from_bits(bits - ONE)

def format_double(double):
    '''Format double with all possibly contributing significant digits.'''
    return '{{:0.{}f}}'.format(MAX_DIGITS).format(double)

halfway = np.float64(0.1)
next = next_double(halfway)
previous = previous_double(halfway)
print('0.1={}'.format(format_double(halfway)))
print('next(0.1)={}'.format(format_double(next)))
print('previous(0.1)={}'.format(format_double(previous)))
```
As you can see, the boundaries show that we cannot differentiate between 0.1 and the actual stored value, so the Grisu algorithm emits 0.1. In order to generate these digits for the upper and lower bound, it must effectively divide by powers of 10 (although computationally it uses tricks that are much faster). In short, I don't believe there's any shortcut possible, at least not easily. However, it should be doable if the calculation of the number of significant digits is integrated into the actual formatting routine, although it would have to be tightly coupled to it. Does that make more sense, @Atmelfan? I understand this is a lot more complicated than it may initially seem.
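To illustrate what "minimum number of digits" means here without reproducing Grisu itself, here is a brute-force Python sketch (a hypothetical helper, not part of lexical) that searches for the fewest significant digits whose decimal rendering round-trips back to the same double:

```python
def shortest_sig_digits(x):
    # Brute force, for illustration only (Grisu computes this far faster):
    # find the fewest significant digits that round-trip back to exactly x.
    for digits in range(1, 18):
        candidate = f"{x:.{digits - 1}e}"  # digits-1 after the point => digits total
        if float(candidate) == x:
            return digits
    return 17  # 17 significant digits always suffice to round-trip a double

print(shortest_sig_digits(0.1))    # 1
print(shortest_sig_digits(1 / 3))  # 16
```

This matches the boundary argument above: since no other double lies between the rounding boundaries of 0.1, a single significant digit already round-trips.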
That said, it's very easy to estimate the upper bound on the number of digits written before conversion:
1. Number of significant digits.
This would then create a trivial max bound on the number of digits very easily.
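As a rough illustration of such a bound (the field widths below are assumptions, not lexical's actual logic), a worst-case scientific-notation size for an f64 could be estimated from the maximum significant digits plus the fixed overhead characters:

```python
def max_formatted_size(max_sig_digits, max_decimal_exp=308):
    # Hypothetical worst case for an f64 in scientific notation:
    # sign + digits + decimal point + 'e' + exponent sign + exponent digits.
    return 1 + max_sig_digits + 1 + 1 + 1 + len(str(max_decimal_exp))

print(max_formatted_size(17))  # 24
```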
Hi @Alexhuszagh! I'm writing a Rust front-end framework, and I've tried many float-to-string libraries; only … (optimized Wasm file of an example app with one …)

However, I can't use it now because of missing features in formatting (`format!("{:.2}", num)`). I've found a newer API, `lexical::to_string_with_options(num, &options)`, and it looks like you are just working on it. So, is there a chance that support for these types of formatting corresponds with your roadmap? Thanks!

P.S. `lexical = { git = "...rust-lexical", rev = "...", features = ["std"], default_features = false }` and the biggest with …
@MartinKavik This is low priority for the immediate short term, but after this week I'm planning on working on this. Controlling the number of digits should be very doable, especially since we control write formatting here:

rust-lexical/lexical-core/src/ftoa/grisu2.rs, lines 641 to 646 in 4b136c1

This would, of course, prevent round-trips for truncated values, but if you're using shorter formatting control, I doubt this is of any relevance.
Just an FYI that I will be slightly busy for the next week on other projects, but this should be prioritized after that. Thanks for the excellent suggestion. I will likely implement it, along with a few other features (custom decimal points and exponent characters), as part of the options API, and only available for …
@Alexhuszagh Thanks a lot and don't hurry with the implementation :)
I was playing a bit with the crate pretty_dtoa. It uses the crate ryu_floating_decimal, a forked ryu.
@MartinKavik quick question: can you think of any other format options that would be appealing, other than: …

I'm currently drafting initial versions of this functionality, so I'd be interested in feedback, particularly around your use case.
I'm not @MartinKavik, but forced scientific format and a forced sign character are common in test equipment.
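For a concrete picture of those two options, Python's format mini-language (used here only as an analogy for the requested behavior) can force both the sign and scientific notation:

```python
reading = 12.5

# '+' forces the sign even for positive values; 'e' forces scientific notation.
print(f"{reading:+.3e}")  # +1.250e+01
print(f"{-0.002:+.3e}")   # -2.000e-03
```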
Excellent, thank you. Adding both to the list.
Thanks, that's given me a few suggestions. There are also a few anti-features I won't be including, such as: …

But the …
A tentative implementation has been done here, which supports all the aforementioned features, including a few others. It's scheduled for the 0.8 release, which focuses on better format control, optimized algorithms, faster compile times, and fallbacks for very compact implementations. A few modifications should significantly reduce the size of the resulting binaries here:
The static tables have therefore been reduced from 1.4KB to 700 bytes, since we no longer need to store the exponents (and the padding), reducing the size of each element from 16 bytes to 8 bytes (a u64).
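A quick sanity check of that arithmetic, assuming roughly 87 cached powers-of-ten entries (a typical count for Grisu-style implementations; the exact entry count here is an assumption, not taken from the thread):

```python
entries = 87       # assumed number of cached powers of ten
padded_entry = 16  # u64 significand + i16 exponent, padded to 16 bytes
packed_entry = 8   # u64 significand only

print(entries * padded_entry)  # 1392 bytes, i.e. ~1.4KB
print(entries * packed_entry)  # 696 bytes, i.e. ~700B
```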
OK, I've done the actual size calculations prior to release, and the compact implementation seems to be ~9x smaller on x86_64 Linux using … The measurement may be inexact, but it is relative to an empty file, with various tricks to ensure code is not optimized out, and then … Anyway, I'm planning for a release early next week.
Implemented as of lexical v6.0.0 and lexical-core v0.8.0. Please use the new options API.
@Alexhuszagh Do you mind elaborating on this? Suppose I have a LONG list of floats of unknown magnitude and I would like them all formatted with exactly 3 decimal places. How can I use the options API to do that?
Problem

I'm trying to implement a protocol which does not accept the scientific format in all places. It would be useful to control whether the decimal output is written in normal or scientific format. The number of significant digits would also be nice to have some degree of control over. Rounding the number to the desired precision beforehand doesn't help if the rounded value isn't representable (for example, `1.2f32` -> `1.2000000476837158`).

Solution

An extra write function which can take formatting hints, possibly `write_format(n, format, significant_digits, bytes)`, where:

- `n` - Value to be written
- `format` - An enum for the desired format
- `significant_digits` - A usize for the maximum number of significant digits; 0 could mean "don't care"
- `bytes` - Output buffer

Prerequisites
If applicable to the feature request, here are a few things you should provide to help me understand the issue:

- rustc 1.39.0-nightly (4295eea90 2019-08-30)
- 0.6.2
- lexical-core: features=["radix"], default-features=false
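For reference, the `1.2f32` widening mentioned in the Problem section can be reproduced with NumPy (round `1.2` to the nearest f32, then widen it back to f64):

```python
import numpy as np

# The f32 nearest to 1.2, widened to f64, is not exactly 1.2:
widened = float(np.float32(1.2))
print(widened)  # 1.2000000476837158
```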
Alternatives