Non-zero padded exponent in float string representation after Ryu implementation #14682

Closed
franciscoadasme opened this issue Jun 10, 2024 · 6 comments · Fixed by #14695
Labels
kind:regression (Something that used to correctly work but no longer works), topic:stdlib:text

Comments

@franciscoadasme

As discussed in the forum post of the same name, there was an unintended change in the scientific notation for single-digit exponents: they are no longer zero-padded after PR #14084, which was included in v1.11:

# before
printf("%E", 123.45) # => 1.234500E+02
printf("%E", 123.45e15) # => 1.234500E+17
# after
printf("%E", 123.45) # => 1.234500E+2 (note the missing leading zero in the exponent)
printf("%E", 123.45e15) # => 1.234500E+17

This no longer follows the C99 standard, which most languages adhere to. Furthermore, the official and other Ryu implementations (used in #14084) also print zero-padded exponents. Zero-padding allows nicely aligned numbers, which is useful when writing files with hundreds or thousands of lines of floating-point numbers, such as those used in computational chemistry, my field of work.
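Until the format is settled, the exponent can be re-padded after formatting. The following is only a sketch, written in Python because its `%E` formatting follows the C convention. `pad_exponent` is a hypothetical helper, not part of Crystal or any library, and it assumes the exponent is the last thing in the string.

```python
import re

# Hypothetical helper (an assumption for illustration, not a library function):
# pad the exponent of a scientific-notation string to at least `width` digits,
# as C99's %e/%E conversion does.
_EXPONENT = re.compile(r"([eE][+-]?)(\d+)$")


def pad_exponent(text: str, width: int = 2) -> str:
    # Keep the "E+"/"e-" prefix and left-pad only the exponent digits with zeros.
    return _EXPONENT.sub(lambda m: m.group(1) + m.group(2).zfill(width), text)


for raw in ["1.234500E+2", "1.234500E+17", "9.870000E-5"]:
    print(pad_exponent(raw))
# Prints:
# 1.234500E+02
# 1.234500E+17
# 9.870000E-05
```

Strings that already carry two or more exponent digits pass through unchanged, so the helper can be applied to mixed output.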

I ask that the format be reverted to include the leading zero.

franciscoadasme added the kind:bug label on Jun 10, 2024.
Blacksmoke16 added the topic:stdlib:text and kind:regression labels and removed the kind:bug label on Jun 10, 2024.
@beta-ziliani
Member

I mildly agree with reverting this change in formatting:

As mentioned in the forum, the original reason to drop the leading zero was to make it consistent with normal printing (e.g., 1e-6.to_s returns "1.0e-6").

So reverting it means breaking again the internal consistency, in favor of some external consistency and backward consistency.

I would like to point out, though, that the argument made was that this ensures numbers will be parsed correctly. This sounds a bit sketchy, because while Python, Ruby, OCaml, and, of course, C seem to agree on this formatting, .NET and Haskell don't. .NET pads the exponent to three digits. The C99 standard says (bold is mine):

> The exponent always contains at least two digits, and only as many more digits as necessary to represent the exponent.
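The quoted rule can be observed in any printf that follows the C convention. Here is a small sketch using Python's `%` formatting, one of the languages listed above as agreeing with C. The sample values are arbitrary.

```python
# Exponents are printed with at least two digits, and more only when needed.
for value in (123.45, 1e-6, 1e100):
    print("%E" % value)
# Prints:
# 1.234500E+02
# 1.000000E-06
# 1.000000E+100
```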

Technically speaking, if a parser fails on a number whose exponent is not in the "%2d" format, it might just as well fail on .NET's "%3d".

Haskell does what Crystal 1.11 does.

ghci> Text.Printf.printf "%e\n" 1e-6
1.0e-6
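On the parsing point above, a tolerant parser accepts all three exponent widths. This is a minimal sketch that uses Python's `float()` purely as a stand-in for a typical parser. Other parsers may be stricter, and the sample strings are illustrative only.

```python
# One-digit (Crystal 1.11, Haskell), two-digit (C99) and three-digit
# (.NET-style) exponents all parse to the same value with Python's float().
samples = ["1.234500E+2", "1.234500E+02", "1.234500E+002"]
values = [float(s) for s in samples]
print(values)  # [123.45, 123.45, 123.45]
```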

@Sija
Contributor

Sija commented Jun 11, 2024

> So reverting it means breaking again the internal consistency, in favor of some external consistency and backward consistency.

@beta-ziliani I wouldn't call the C99 spec "some external consistency". It is a spec, after all.

And so, according to the part of the spec you've quoted, both .NET and Haskell are simply wrong (and a minority among the languages). Following the spec means, in this case, reverting to the previous behaviour.

@ysbaddaden
Contributor

@Sija they're not wrong, they're free to not follow an external spec perfectly. Lang X printf doesn't have to be an exact C printf implementation.

@ysbaddaden
Contributor

That being said, the "at least two digits" rule likely didn't come from nowhere, and it may have some practical use, maybe just to improve readability, and maybe improve interoperability with other languages (the format is consistent).

@beta-ziliani
Member

Or maybe it's just historical?

@franciscoadasme
Author

franciscoadasme commented Jun 13, 2024

Hey everyone, thank you for your input on this minor issue. I think the main problem is that #14084 introduced the change without documenting it, as @straight-shoota suggested in the forum post. From your discussion, there is no agreement on which format should be used, so I think it is better to revert to the previous behavior for now. If it's later decided to drop the leading zero, that should be clearly documented in the PR and release notes.

IMHO, being consistent with the C99 standard and most other languages is important to avoid surprises in a heterogeneous environment (multiple languages), especially in scientific software (I believe Crystal is an excellent language for this use case). I think there should be a very strong reason to go against a widely used spec. The issue of internal consistency may be resolved by following this standard instead when using scientific notation in the normal printing of floats.
