Reconsider digit separators #1485
Comments
Besides international variations, there are also microformats. For example:
let mac_address: i64 = 0xa1_b2_c3_d4_e5_f6;
let uuid: i128 = 0x123e4567_e89b_12d3_a456_426614174000;
As a Chinese developer, I can say
So maybe adding this kind of variation is worthwhile.
I'm not saying you should do this, just throwing out a related idea... In Carbon, In Wuffs, both are valid (from the compiler's point of view) but the formatter (the equivalent of FWIW, Wuffs' formatter's canonicalization of numeric literals also inserts underscores at every 6 digits for decimal and at every 4 digits for hexadecimal: it's
I think this is a pretty separate question, so if you'd like to pursue it I would move it. FWIW, we can have a near perfect recovery here in the frontend and suggest edits, so I think the difference isn't huge, but it is a difference.
Given the semantically meaningful different groupings mentioned here, I think this question should include not canonicalizing in the formatter. FWIW, I'm sufficiently convinced by things like credit card numbers, UUIDs, and MAC addresses that we should have this flexibility even outside of any ideas around regional differences or different bases.
FWIW, being hexadecimal, UUIDs and MAC addresses aren't unusably bad if you enforce underscores every 4 digits. The natural microformat boundaries are already multiples of two bytes. Even if the natural UUID grouping involves the last 12 hex digits, that's still easy to see here:
In the MAC address case, "what's the 3rd byte" is still much easier to eyeball with "underscore every 4" than with no underscores at all. As for credit card numbers, do people actually process them as numbers (as opposed to strings)?
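To make the "underscore every 4" idea concrete, here is a minimal sketch in Python (not Carbon, and not Wuffs' actual formatter) that regroups the two hexadecimal literals from the first comment. The helper name `regroup` is hypothetical, and the sketch assumes the literal carries a `0x` prefix.

```python
def regroup(literal: str, group: int = 4) -> str:
    """Re-insert underscores every `group` digits, counting from the right.

    Assumption: `literal` is a hex literal with a "0x" prefix.
    """
    if not literal.startswith("0x"):
        raise ValueError("expected a 0x-prefixed literal")
    digits = literal[2:].replace("_", "")  # drop any existing separators
    chunks = []
    while digits:
        chunks.append(digits[-group:])  # peel off the least significant group
        digits = digits[:-group]
    return "0x" + "_".join(reversed(chunks))


mac = regroup("0xa1_b2_c3_d4_e5_f6")
uuid = regroup("0x123e4567_e89b_12d3_a456_426614174000")
print(mac)   # 0xa1b2_c3d4_e5f6
print(uuid)  # 0x123e_4567_e89b_12d3_a456_4266_1417_4000
```

With fixed groups of 4, each MAC byte still starts on an even offset within a group, but the UUID's 8-4-4-4-12 field boundaries are no longer visible in the literal itself.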
I still find the versions above significantly more readable than these. I agree that no digit separators would be even worse, but I don't think that's really the question. I think the readability gain of format-specific grouping is worthwhile based on the examples here.
We seem to have good evidence here that we should reconsider this decision, and a good level of consensus for making a change. The next step would be for someone to write a proposal presenting these arguments.
Maybe I misinterpreted this (in docs/design/lexical_conventions/numeric_literals.md)
I don't understand the restriction of having digit separators only to the left of the decimal point for real numbers and I could not find any rationale behind it in the docs. Consider:
let nanosecond: f64 = 0.000000001;
vs
let nanosecond: f64 = 0.000_000_001;
I think that improves readability as much as digit separators in the integer part.
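For comparison, Python already accepts separators on both sides of the decimal point (PEP 515). The sketch below, in Python rather than Carbon, groups fractional digits in threes starting from the decimal point and checks that the value is unchanged. `group_fraction` is a hypothetical helper, and grouping in threes is an assumption chosen to mirror the milli/micro/nano steps.

```python
def group_fraction(text: str, group: int = 3) -> str:
    """Insert underscores into the fractional part, counting rightward from the point."""
    if "." not in text:
        return text  # nothing to group
    whole, _, frac = text.partition(".")
    chunks = [frac[i:i + group] for i in range(0, len(frac), group)]
    return whole + "." + "_".join(chunks)


grouped = group_fraction("0.000000001")
print(grouped)  # 0.000_000_001

# The separators are purely visual: both spellings parse to the same value.
same_value = float(grouped) == float("0.000000001")
```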
Created a proposal in #1983 -- let me know if I've misunderstood the leads' direction there; I can always flip the alternatives around if the leads want a different choice.
AFAICT your interpretation is correct, although the proposal has some conflicting examples in ties. Anyway, I think #1983 should produce a clear rationale either way.
Thanks @jonmeow for your reply. My concern was not about ties, but strictly readability. I think scientific notation is symmetric around the decimal point. Being able to group decimal digits in the integer part, so that you can easily eyeball which parts are grams, kilograms etc., helps avoid mistakes when defining constants. I just think the same argument holds for milligrams, micrograms etc. I could not find any rationale that I could understand in the referred links, but it seems you have already considered this. I was just naïvely thinking that this was something that was overlooked. I am truly amazed by your work, it's quite a challenge you have taken on!
(removing good-first-issue label as this is now in progress)
[Proposal #143: Numeric literals](https://github.com/carbon-language/carbon-lang/blob/trunk/proposals/p0143.md) added digit separators with strict rules for placement. It missed some use-cases. In order to address this, remove placement rules for numeric literals. Related issue: #1485 Co-authored-by: Chandler Carruth <[email protected]> Co-authored-by: Richard Smith <[email protected]>
I believe this is resolved by #1983, though I still need to update the design (but I think we can call the leads question closed).
At present Carbon restricts integer digit separators to every 3 digits, going back to https://github.com/carbon-language/carbon-lang/blob/trunk/proposals/p0143.md.
A contrary mention had been made about the Indian convention. However, it looks like CJK cultures were overlooked, maybe due to conflicting information in https://en.wikipedia.org/wiki/Decimal_separator#Digit_grouping (which says eastern countries have switched to 3 digit groups). In contrast, https://www.statisticalconsultants.co.nz/blog/how-the-world-separates-its-digits.html says that China groups every 4 digits.
Given the larger number of differing conventions, it may be worth supporting more variations (e.g., support 3 different conventions for digit groupings), or otherwise loosening the restrictions. While that could end up with ambiguous placement for some numbers, larger numbers would be less ambiguous because the groupings would repeat.
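The ambiguity point can be illustrated with a small sketch in Python (not Carbon's lexer). It reports which of three grouping conventions a decimal literal is consistent with. The labels `every-3`, `every-4`, and `indian` and the function name `matching_conventions` are hypothetical; the Indian rule is assumed to be a final group of three preceded by groups of two.

```python
def matching_conventions(literal: str) -> set[str]:
    """Return the grouping conventions a separated decimal literal is consistent with."""
    sizes = [len(part) for part in literal.split("_")]  # most significant group first
    if len(sizes) < 2 or 0 in sizes:
        return set()  # no separators, or a stray/doubled underscore
    lead, rest = sizes[0], sizes[1:]
    found = set()
    if lead <= 3 and all(s == 3 for s in rest):
        found.add("every-3")
    if lead <= 4 and all(s == 4 for s in rest):
        found.add("every-4")
    if lead <= 2 and rest[-1] == 3 and all(s == 2 for s in rest[:-1]):
        found.add("indian")
    return found


for text in ["1_000", "10_00_000", "1_000_000", "1_0000_0000"]:
    print(text, sorted(matching_conventions(text)))
```

A short literal such as `1_000` fits both the 3-digit and the Indian rule, while `10_00_000` and `1_000_000` each fit exactly one, which matches the observation that longer literals disambiguate themselves as the grouping repeats.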
Note, I think this arose from this tweet