
Should multiple underscores be allowed between digits? #517

Closed
rmunn opened this issue Feb 3, 2018 · 5 comments

Comments

@rmunn

rmunn commented Feb 3, 2018

The TOML spec currently allows underscores between digits, but states:

Each underscore must be surrounded by at least one digit.

This is slightly different from the semantics allowed in Java, C# (couldn't find a link, but tested), and F#. All three of these languages allow multiple underscores between digits, so that 3_____4 is a valid way to write the number 34 (thirty-four).

On the other hand, Python 3.6 just added underscores between digits, and it does NOT allow them to be doubled up: 3_4 is valid, but 3__4 (two underscores) is not. So while there does seem to be some consensus across much of the programming-language community that underscores between digits are a good idea, there is not yet a consensus about whether multiple underscores should be allowed.
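Python's behavior is easy to check for yourself. Since 3.6 (PEP 515), the built-in `int()` accepts the same underscore grouping as numeric literals, so a small sketch like this one shows single underscores being accepted and doubled ones rejected. The helper name `accepts` is only for illustration.

```python
# Requires Python 3.6+ (PEP 515 digit-grouping underscores).

def accepts(text: str) -> bool:
    """Return True if Python's int() parses `text` as a base-10 integer."""
    try:
        int(text)
        return True
    except ValueError:
        return False

for sample in ["34", "3_4", "3__4", "3_____4", "_34", "34_"]:
    # 3_4 parses as 34; doubled, leading, or trailing underscores raise ValueError.
    print(f"{sample!r:12} -> {accepts(sample)}")
```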

Since TOML has "obvious" right in the name, the most important question is "What is obvious here?". And that's a question best answered by a survey, because what is obvious to me may not be obvious to you. There are two different things that could be obvious:

  1. Only allow a single underscore between digits (the current spec), because while it's pretty obvious what 2_147_483_647 means, the meaning of 2___147___483___647 is not as immediately obvious: it's harder to read with the digits so separated. (And anyone who seriously writes the number 34 as 3_____4 should be forced to write nothing but PHP for a year, as penance.)

  2. Change the spec to better match what C#, Java, and F# allow. Here, we're saying that "obvious" means "a syntax very similar to what most major languages allow". Even though I don't find 3_____4 to be obvious, Java, C#, and F# all allow it, so it might surprise people to have that syntax forbidden.
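The two alternatives can be stated precisely as patterns. The sketch below is a hypothetical validator, not taken from any TOML implementation: it only looks at unsigned decimal digit strings and deliberately ignores signs, leading-zero rules, and every other detail of TOML integers. The only difference between the options is `_` versus `_+` in the repeated group.

```python
import re

# Hypothetical validators for unsigned decimal digit strings only; signs,
# leading-zero rules and other TOML integer details are deliberately ignored.

# Option 1 (current TOML wording): each underscore must sit between two
# digits, so underscores can never be adjacent.
SINGLE_UNDERSCORE = re.compile(r"[0-9]+(?:_[0-9]+)*")

# Option 2 (Java/C#/F#-style): a run of one or more underscores may sit
# between digits, but still not at the start or the end.
MULTI_UNDERSCORE = re.compile(r"[0-9]+(?:_+[0-9]+)*")

def valid_option1(text: str) -> bool:
    return SINGLE_UNDERSCORE.fullmatch(text) is not None

def valid_option2(text: str) -> bool:
    return MULTI_UNDERSCORE.fullmatch(text) is not None

for sample in ["2_147_483_647", "2___147___483___647", "3_4", "3_____4", "_34", "34_"]:
    print(f"{sample:22} option1={valid_option1(sample)} option2={valid_option2(sample)}")
```

Both options agree on 2_147_483_647 and 3_4 and both reject leading or trailing underscores; they differ only on runs like 3_____4.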

So I think it's important to have this discussion now, before the current TOML spec is formalized as 1.0 and it becomes harder to make changes. I'd like to get other people's opinions. Should 3____4 be allowed, or forbidden?

@rmunn
Author

rmunn commented Feb 3, 2018

This is a survey with two choices. Please vote 👍 or 👎 on this comment (not the OP) for the following two alternatives:

👍 — the current spec is fine, and multiple underscores like 3_____4 should be forbidden. 3_4 would be valid, but 3_____4 would be forbidden.
👎 — the current spec is NOT fine, and should be changed to allow 3_____4.

If you would pick "c) Other", then please leave a comment below explaining in detail. :-)

@mojombo
Member

mojombo commented Feb 4, 2018

I can't think of any valid use cases for multiple consecutive underscores in numbers, and I don't think that the mere allowance of that behavior in a few other languages is enough to warrant allowing something non-useful in TOML. By disallowing them, it makes it more likely that TOML documents will be more obvious to readers, which is an important thing to encourage.

@rmunn
Author

rmunn commented Feb 4, 2018

I have thought of one possible use case: putting two underscores between groups of hex digits, as in 0x1234_5678__90ab_cdef. Here the 64-bit number has been visually divided into two 32-bit halves, each of which is further divided into two 16-bit groups. That could actually be slightly useful. But 0x1234_5678_90ab_cdef does a good enough job of visually dividing the number; it's easy to pick out the two 32-bit halves without the extra underscore. And IMHO, this one minor use case doesn't outweigh the many scenarios where multiple underscores would hurt readability, all of which are ruled out by sticking to the "just one underscore allowed" rule. So I agree with you about which way is more obvious. I'm glad the survey results so far indicate that this will be obvious to most people too, as that makes it an easy decision.
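As a rough illustration of the hex point, here is a sketch in Python rather than TOML (assuming Python 3.6+, whose `int()` applies the same single-underscore rule to base-16 strings). The single-underscore form already makes the two 32-bit halves easy to see, and the doubled-underscore form is rejected just as 3__4 is in decimal.

```python
# Python 3.6+: int() with base 16 accepts single underscores between hex
# digits (PEP 515), matching the "one underscore at a time" rule.
value = int("1234_5678_90ab_cdef", 16)

# Split the 64-bit value into its high and low 32-bit halves.
high, low = value >> 32, value & 0xFFFF_FFFF
print(hex(high), hex(low))  # 0x12345678 0x90abcdef

# The doubled-underscore grouping is rejected.
try:
    int("1234_5678__90ab_cdef", 16)
except ValueError:
    print("double underscore rejected")
```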

@ahrvoje

ahrvoje commented Feb 4, 2018

Just studied a lot of code, and changed my vote to single '_' only.

@mojombo
Member

mojombo commented Feb 4, 2018

Cool, sounds like this is settled then. Thanks for the discussion!
