Digit separators in number literals. #2

lrhn · 2018-06-29T06:14:03Z

This is currently under implementation: implementation issue, feature specification.

Solution to #1.

To make long number literals more readable, allow authors to inject digit group separators inside numbers.
Examples with different possible separators:

100 000 000 000 000 000 000  // space 
100,000,000,000,000,000,000  // comma
100.000.000.000.000.000.000  // period
100'000'000'000'000'000'000  // apostrophe (C++)
100_000_000_000_000_000_000  // underscore (many programming languages).

The syntax must work even with just a single separator, so it can't be anything that can already validly seperate two expressions (excludes all infix operators and comma) and should already be part of a number literal (excludes decimal point).
So, the comma and decimal point are probably never going to work, even if they are already the standard "thousands separator" in text in different parts of the world.

Space separation is dangerous because it's hard to see whether it's just space, or it's an accidental tab character. If we allow spacing, should we allow arbitrary whitespace, including line terminators? If so, then this suddenly become quite dangerous. Forget a comma at the end of a line in a multiline list, and two adjacent integers are automatically combined (we already have that problem with strings). So, probably not a good choice, even if it is the preferred formatting for print text.

The apostrope is also the string single-quote character. We don't currently allow adjacent numbers and strings, but if we ever do, then this syntax becomes ambiguous. It's still possible (we disambiguate by assuming it's a digit separator). It is currently used by C++ 14 as a digit group separator, so it is definitely possible.

That leaves underscore, which could be the start of an identifier. Currently 100_000 would be tokenized as "integer literal 100" followed by "identifier _000". However, users would never write an identifier adjacent to another token that contains identifier-valid characters (unlike strings, which have clear delimiters that do not occur anywher else), so this is unlikely to happen in practice. Underscore is already used by a large number of programming languages including Java, Swift, and Python.

We also want to allow multiple separators for higher-level grouping, e.g.,:

100__000_000_000__000_000_000

For this purpose, the underscore extends gracefully. So does space, but has the disadvantage that it collapses when inserted into HTML, whereas '' looks odd.

For ease of reading and ease of parsing, we should only allow a digit separator that actually separates digits - it must occur between two digits of the number, not at the end or beginning, and if used in double literals, not adjacent to the . or e{+,-,} characters, or next to an x in a hexadecimal literal.

Examples

100__000_000__000_000__000_000  // one hundred million million millions!
0x4000_0000_0000_0000
0.000_000_000_01
0x00_14_22_01_23_45  // MAC address
555_123_4567  // US Phone number

Invalid literals:

100_
0x_00_14_22_01_23_45 
0._000_000_000_1
100_.1
1.2e_3

An identifier like _100 is a valid identifier, and _100._100 is a valid member access. If users learn the "separator only between digits" rule quickly, this will likely not be an issue.

Implementation issues

Should be trivial to implement at the parsing level. The only issue is that a parser might need to copy the digits (without the separators) before calling a parse function, where currently it might get away with pointing a native parse function directly at its input bytes.
This should have no effect after the parsing.

Style guides might introduce a preference for digit grouping (say, numbers with more than six digits should use separators) so a formatter or linter may want access to the actual source as well as the numerical value. The front end should make this available for source processing tools.

Library issues

Should int.parse/double.parse accept inputs with underscores. I think it's fine to not accept such input. It is not generated by int.toString(), and if a user has a string containing such an input, they can remove underscores manually before calling int.parse. That is not an option for source code literals.
I'd prefer to keep int.parse as efficient as possible, which means not adding a special case in the inner loop.
In JavaScript, parsing uses the built-in parseInt or Number functions, which do not accept underscores, so it would add (another) overhead for JavaScript compiled code.

Related work

Java digit separators.

The text was updated successfully, but these errors were encountered:

eernstg · 2018-06-29T06:52:18Z

+1!

tejainece · 2018-06-29T08:40:10Z

_ seems to be least confusing and non-intrusive syntax.

munificent · 2018-07-24T17:12:10Z

My feeling has always been that if you need separators in your number literal, you have likely already done something wrong. Instead of separators, create a const expression that shows where that large number is coming from.

Instead of:

const largeThing = 100000000000000000000;
const bigHex = 0x4000000000000000;

Consider, say:

const msPerSecond = 1000;
const nsPerMs = 1000000;
const largeThing = 100000000 * nsPerMs * msPerSecond;

const bigHex = 1 << 62;

This has the advantage of being easier to read and showing why these constants have these values. You do sometimes run into big arbitrary literals coming from empirical measurements or other things, but those tend to be fairly rare.

Given that number separators add confusion around how things like int.parse() behave, and there are "workarounds" that actually lead to clearer code, I've never felt they carried their weight.

tejainece · 2018-07-24T18:21:05Z

@munificent How many digits are there in 100000000000000000000 and 0x4000000000000000? You gotta get a cursor and count. Instead if you put an _ before every 4 digits, you any say x parts * 4 (for hex. 3 for currency, etc).

It is not always possible to decompose a number into its composite parts.

pschiffmann · 2018-08-02T19:32:58Z

FWIW, I'd like to suggest using quotes, like we already have for string literals: 100__000_000__000_000__000_000 might instead be expresssed as `100 000,000 000,000 000,000`. If you'd rather reserve the ` character for some future use, one could consider n'100 000,000 000,000 000,000', modeled after raw strings. Either way, there are more choices for the separator characters inside quotes.

If this repo is not the right place for unsolicited opinions from non-Dart team members, sorry to bother you.

munificent · 2018-08-02T22:25:36Z

It is not always possible to decompose a number into its composite parts.

I think one of these is usually true:

The number can be decomposed into smaller meaningful parts.
The number is some arbitrary empirical constant in which case a human will rarely need to scrutinize the individual digits.

So, in either case, I don't think it's a high priority to be able to easily read very large number literals.

kasperpeulen · 2018-08-15T09:32:05Z

I agree with munificent. But I think Dart needs the exponentiation operator ** for some cases:

const largeThing = 10**14;

Dart should then allow exponentiation of constant numbers to be a constant value with is not possible with pow(10,14) at the moment.

Update feature-specification.md

lrhn · 2019-05-27T09:12:09Z

While an exponentation operator can solve some issues, it won't make me get 0x7FFFFFFFFFFFFFFF right. That is a valid 64-bit integer literal (at least if I counted the F's correctly).

(There is also the option of exponential notation for integers: 1p22 or 0x1p62 as short for 1 * 10**22 and 0x1 * 16 ** 62, like we have for doubles using e).

AdamskiMarcin · 2019-10-07T20:17:13Z

Maybe it's not high priority, but would definitely be a useful thing. See an example:

log.d("Built in ${stopwatch.elapsedMicroseconds / 1000000} s");

and compare with the snippet below:

log.d("Built in ${stopwatch.elapsedMicroseconds / 1_000_000} s");

Just a simple and readable one-liner. No need for adding multiplication or extra variables for readibility, like:

log.d("Built in ${stopwatch.elapsedMicroseconds / (1000 * 1000)} s");

var multiplier = 1000 * 1000; log.d("Built in ${stopwatch.elapsedMicroseconds / multiplier} s");

Levi-Lesches · 2020-06-01T00:09:51Z

Any updates on this?

mpfaff · 2020-06-03T22:33:39Z

I'd love to see this. Particularly with colours in Flutter, where I usually have something like const Color(0xff3F4E90). I would much prefer const Color(0xff_3F4E90), because the 0xff makes it more difficult to read the actual color at a glance.

munificent · 2020-06-09T00:33:07Z

No updates, sorry. We are hard at work on null safety, which I hope everyone agrees is higher impact than digit separators. :)

MarkOSullivan94 · 2021-03-11T19:31:23Z

No updates, sorry. We are hard at work on null safety, which I hope everyone agrees is higher impact that digit separators. :)

Now that null safety has been released, I'm wondering what's the priority of digit separators?

lrhn · 2021-03-12T15:24:53Z

Honestly: Priority is low.

It's not blocking anything. The "small" features which will be part of Dart 2.13 are things you simply couldn't do before, so adding them now enables code that simply couldn't be written before. The sooner the better.

The lack of digit separators is not blocking any code from being written, you can write functional code that does exactly the same thing (it's just harder to read). So, features which remove actual blocks will likely have a higher priority when competing for the finite developer resource.

Personally, I want it yesterday. Yesteryear. Yesterdecade!

Levi-Lesches · 2021-03-12T17:04:36Z

I completely agree with you, but I always found it funny that "making numbers easier" is literally issues #1, #2, #3, and #4 of this repo 😂

pstricks-fans · 2021-04-18T19:50:06Z

Improving int.parse to allow parsing the following might be a good idea.

var reference = int.parse("1111_0000_0001_1110_1111_0000_0001_1110", radix: 2);

srawlins · 2024-05-01T16:52:42Z

I have two CLs that implement this feature:

Uploaded CLs

https://dart-review.googlesource.com/c/sdk/+/365201 adds int.parseWithSeparators() and int.tryParseWithSeparators().
https://dart-review.googlesource.com/c/sdk/+/365181 makes use of int.tryParseWithSeparators() in the Dart parser.

To make use of these before either land, you have to do a fun bootstrapping step. In the Dart SDK, there is a bootstrapped dart VM, which is used when, for example, executing CFE code to parse a script. So:

First build dart with the first CL (I used ./tools/build.py --mode release --arch x64 create_sdk).
Copy that dart (or more likely, the whole compiled dart-sdk) to a separate output directory. Something like
```
$ mkdir $HOME/dart-with-separators
$ cp xcodebuild/ReleaseX64/dart-sdk $HOME/dart-with-separators
```

Switch to the second CL, and overwrite the bootstrapped SDK. Something like:

$ mv tools/sdks/dart-sdk{,BKUP}
$ ln -s $HOME/dart-with-separators/dart-sdk tools/sdks/dart-sdk

Build Dart with the second CL, same way as the first time.

Design

I made some design decisions, treating these changes like a prototype:

The parser, which is written in Dart, efficiently parses numbers by using int.tryParse today, the function made available in dart:core. This code, as implemented in the VM, makes use of some external variables, set by the platform, to determine if an integer can be parsed as a Smi, etc. These external variables are not made public (by dart:core or any other dart: library), and I chose not to duplicate them.

From the parser code (pkg/front_end/lib/src/fasta/kernel/body_builder.dart handleLiteralInt), I could take a first pass over the String and remove underscores. But I think this would negatively affect performance, including for numbers with no separators (the vast majority of numbers). Instead we can skip over separators in the deeper code: the fast code for parsing Smis (_tryParseSmi) simply walks the characters, building an integer. This is the code where we would now skip separators, so I have to conditionally walk over separators, which means we need new public API for the parser to use.
Ergo, the first step to implementing this feature is adding new API to dart:core. I only added int.parseWithSeparators and int.tryParseWithSeparators, we need new API for num, double, and for consistency, BigInt.
This new API does not need to be new sibling functions. We have named parameters. We could instead bump the signature of int.parse from:
- int parse(String source, {int? radix}) to
- int parse(String source, {int? radix, bool withSeparators = false})
I don't think these two choices have any affect on backend implementation or performance. The path split happens deeper than this public API. I think I prefer the named parameters route. It is also not a breaking change.
I implemented the feature as proposed by @lrhn above, including things like:
- The character is underscore. For the dart:core API, we could support other characters (would be nice to support commas?). Or not.
- Underscores cannot be at the beginning, end, or next to x, e, or ..
- Underscores can be positioned otherwise arbitrarily.
- ints, doubles, hexadecimal, and exponential formats are supported.

Further work

A bit more work than the above two CLs would be required before we can call the thing done:

Put the thing behind an experiment flag (I think just the parsing bit, not the dart:core changes).
Add checks for where separators are not allowed, compile-time errors.
Support for num, double, and BigInt.
DAS quick fixes for that/those compile-time errors.
More language tests.
DDC impl, WASM impl.
A pass over lints, and DAS refactorings, to determine where code needs to be updated.

Wdestroier · 2024-05-02T01:34:06Z

Amazing @srawlins! My only suggestion would be to not make parseWithSeparators and tryParseWithSeparators part of the public SDK. Assuming tryParseWithSeparators is a more optimized equivalent of int.tryParse(text.replaceAll('_', '')).

lrhn · 2024-05-02T12:50:45Z

Yes, very impressive!

And agree on not making those methods public API. The goal of {int,double,num}.parse is not to parse Dart source code, but to parse the output of {int,double,num}.toString(), plus a few common cases (which really is just "0xHEX").

If we wouldn't put int.parseWithSeparator into the API if we weren't doing number separators, we just shouldn't do it.

If we then need a way to give efficient functions to our tools, that cannot be written in plain Dart, we can probably hack a way around that.

srawlins · 2024-05-03T06:38:26Z

CL 365181 should be a functional prototype that does not require new Dart SDK library API.

srawlins · 2024-05-09T05:24:22Z

Is this something worth pursuing, this spec as it is, and this CL as it is heading? Should I mail it for review?

lrhn · 2024-05-09T08:31:12Z

I'd love this, and the spec (one or more _s allowed between any two digits) is what I think we'd want, but I don't get to decide unanimously. Sadly.

The parser people need to approve of the approach, and the full language team needs to approve that we're really (and finally) doing this.
@jensjoha @johnniwinther @dart-lang/language-team

natebosch · 2024-05-09T18:40:55Z

Do we expect folks to ask us for a lint ensuring that separators always surround exactly 3 (or some other number) digits?

It looks weird to allow 0_x, but I could be convinced that the simpler spec which ignores every _ in a number literal is easier for users to reason about.

srawlins · 2024-05-09T18:46:35Z

Do we expect folks to ask us for a lint ensuring that separators always surround exactly 3 (or some other number) digits?

Absolutely they will. As for whether we would write it, I don't think it would be high priority. @lrhn gives the example of a US phone number, 555_123_4567.

IIUC, this spec does not allow 0_x, as in 0_x123, because 0x is not part of the "digits" of a number.

lrhn · 2024-05-09T18:51:17Z

Yes, 0_x1 is not allowed, separators are only allowed between digits, and x is not a digit. (But the hex digits after the x are.)

I'm not sure whether the leading 0 should count as a digit or not. (Probably not.) Luckily it doesn't matter.

natebosch · 2024-05-09T18:54:43Z

IIUC, this spec does not allow 0_x, as in 0_x123, because 0x is not part of the "digits" of a number.

Ah cool - I misread one of the tests in the linked CL.

LGTM

lrhn · 2024-05-21T14:29:05Z

@srawlins The language team is OK with implementing the currently proposed specification:
Allow one or more _s between any two otherwise adjacent digits of a NUMBER or HEX_NUMBER token. The following are not digits: The leading 0x or 0X in HEX_NUMBER, and any ., e, E, + or - in NUMBER.

That means only allowing _s between two 0-9 digits in NUMBER and between two 0-9,a-f,A-F digits in HEX_NUMBER.

The grammar would be (changing <DIGIT>+ to <DIGITS> which is then <DIGIT>s with optional _s between, and same for hex digits:

<NUMBER> ::= <DIGITS> (`.' <DIGITS>)? <EXPONENT>?
  \alt `.' <DIGITS> <EXPONENT>?

<EXPONENT> ::= (`e' | `E') (`+' | `-')? <DIGITS>

<DIGITS> ::= <DIGIT> (`_'* <DIGIT>)*

<HEX\_NUMBER> ::= `0x' <HEX\_DIGITS>
  \alt `0X' <HEX\_DIGITS>

<HEX\_DIGIT> ::= `a' .. `f'
  \alt `A' .. `F'
  \alt <DIGIT>

<HEX\_DIGITS> ::= <HEX\_DIGIT> (`_'* <HEX\_DIGIT>)*

(Handing this over to implementation. It's the parser people you have to make happy now 😁).

srawlins · 2024-05-21T17:12:14Z

Thanks much @lrhn and team!

lrhn · 2024-05-22T06:38:19Z

May want to give it an experiment flag (I suggest digit_separators) so the feature will be language versioned.
(Even if it's non-breaking to allow it for all language versions, it may lead someone use it in code which only require, fx, Dart 2.12, and which will then fail if actually run on an older SDK.)

srawlins · 2024-05-22T14:56:08Z

Yep, every language feature we've had since nnbd (except inference-updates) has been non-breaking, except in the case you indicate. Many files included in the CR are just there for the new experiment flag.

Work towards dart-lang/language#2 The feature is well-specified at the issue, but I will also follow up with a specification to check into the language repo. This change implements the feature more-or-less from front to back (because the back is very close to the front in this case :P; no "backend" work in the VM, etc). Digit separators are made available via a new experiment, `digit-separators`. Care is taken to report a single error when an underscore appears in an unexpected position (see new `separators_error_test.dart`). Three test files are added: * `separators_test.dart` is run with the experiment enabled, and has no compile-time errors. * `separators_error_test.dart` is run with the experiment enabled, and has many compile-time errors. * `separators_error_no_experiment_test.dart` is run with the experiment _disabled_. Change-Id: I7f1b1305d28b708b5ddf83f26188cd6e9ce3dd58 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/365181 Commit-Queue: Samuel Rawlins <[email protected]> Reviewed-by: Lasse Nielsen <[email protected]> Reviewed-by: Kevin Moore <[email protected]> Reviewed-by: Jens Johansen <[email protected]>

gmpassos · 2024-08-13T07:04:55Z

Is this affecting int.parse() or double.parse()?

Until Dart 3.5, an int or double declared in the code could be parsed using int.parse and double.parse.

srawlins · 2024-08-13T12:10:25Z

It is not affecting int.parse or double.parse; you are correct that ints and doubles can no longer necessarily be parsed with int.parse and double.parse without some pre-processing to remove separators.

lrhn · 2024-08-13T15:20:03Z

Agree. Not affecting int.parse, double.parse or num.parse, and also not affecting int.toString or double.toString, which the parse functions are intended to parse the output of.

The contents of runtime strings are the same as Dart source number literals. There are many ways to format numbers for readability, and the parse functions do not support any of them. They do support parsing the result of toString, which is the most important part.

gmpassos · 2024-08-13T17:33:56Z

I don't know if it needs to be documented that *.parser won't be affected or changed. However, it should at least be considered due to the broad use of the language. Additionally, highlight that the use of _ can be incompatible with *.parse/tryParse.

srawlins · 2024-08-13T18:14:54Z

Yeah I'm happy to put that into the changelog or feature spec; I encountered a few tools that assume number literals can be passed to int.parse etc.

mit-mit · 2024-09-27T11:44:06Z

This is currently under implementation, please see dart-lang/sdk#56188

srawlins · 2024-09-30T19:59:53Z

Done. Will be available in Dart SDK 3.6.0 with a minimum Dart language version of 3.6.

sanba-anass · 2024-10-08T23:15:36Z

dang it i did run into this issue now i tought it was an already existing feature lol

lrhn added the feature Proposed language feature that solves one or more problems label Jun 29, 2018

lrhn self-assigned this Jun 29, 2018

lrhn mentioned this issue Jun 29, 2018

Long number literals are unreadable #1

Open

mit-mit added the state-backlog label Oct 31, 2018

Cat-sushi mentioned this issue Feb 5, 2019

How should default parameter values work with NNBD? #156

Closed

mit-mit pushed a commit that referenced this issue May 12, 2019

Merge pull request #2 from 0xflotus/patch-2

36e61fd

Update feature-specification.md

kevmoo mentioned this issue May 23, 2019

Allow n suffix for BigInt to match JavaScript #372

Open

srawlins mentioned this issue Oct 21, 2019

Number literal syntax - allow underscores or other way to make number more readable dart-lang/sdk#26845

Closed

mdakin mentioned this issue Apr 3, 2020

Let decimal literals with exponents greater or equal to the number of decimal places have static type int #909

Closed

lrhn mentioned this issue Jul 8, 2020

Meta: Small and useful features collection #1077

Open

15 tasks

lrhn added the small-feature A small feature which is relatively cheap to implement. label Jul 8, 2020

tatumizer mentioned this issue Feb 24, 2021

Protected extension types or proper "Value types" #1467

Open

JohnGalt1717 mentioned this issue Mar 24, 2021

Null handling deficiencies #1543

Closed

Hixie mentioned this issue Apr 1, 2021

StatefulWidget syntax in Flutter requires 2 classes #329

Open

supposedlysam-bb mentioned this issue Nov 27, 2023

Optional Entries in Map Patterns #2496

Open

srawlins mentioned this issue May 29, 2024

Add digit-separators specification v1.0 #3846

Merged

1 task

mit-mit assigned kallentu and unassigned lrhn Sep 27, 2024

kallentu assigned srawlins and unassigned kallentu Sep 27, 2024

srawlins closed this as completed Sep 30, 2024

Digit separators in number literals. #2

Digit separators in number literals. #2

Comments

lrhn commented Jun 29, 2018 • edited by mit-mit Loading

Examples

Implementation issues

Library issues

Related work

eernstg commented Jun 29, 2018

tejainece commented Jun 29, 2018 • edited Loading

munificent commented Jul 24, 2018 • edited Loading

tejainece commented Jul 24, 2018 • edited Loading

pschiffmann commented Aug 2, 2018

munificent commented Aug 2, 2018

kasperpeulen commented Aug 15, 2018 • edited Loading

lrhn commented May 27, 2019

AdamskiMarcin commented Oct 7, 2019

Levi-Lesches commented Jun 1, 2020

mpfaff commented Jun 3, 2020

munificent commented Jun 9, 2020 • edited Loading

MarkOSullivan94 commented Mar 11, 2021

lrhn commented Mar 12, 2021

Levi-Lesches commented Mar 12, 2021

pstricks-fans commented Apr 18, 2021 • edited Loading

srawlins commented May 1, 2024

Uploaded CLs

Design

Further work

Wdestroier commented May 2, 2024

lrhn commented May 2, 2024

srawlins commented May 3, 2024

srawlins commented May 9, 2024

lrhn commented May 9, 2024

natebosch commented May 9, 2024

srawlins commented May 9, 2024

lrhn commented May 9, 2024 • edited Loading

natebosch commented May 9, 2024

lrhn commented May 21, 2024

srawlins commented May 21, 2024

lrhn commented May 22, 2024

srawlins commented May 22, 2024

gmpassos commented Aug 13, 2024

srawlins commented Aug 13, 2024

lrhn commented Aug 13, 2024

gmpassos commented Aug 13, 2024

srawlins commented Aug 13, 2024

mit-mit commented Sep 27, 2024

srawlins commented Sep 30, 2024

sanba-anass commented Oct 8, 2024

lrhn commented Jun 29, 2018 •

edited by mit-mit

Loading

tejainece commented Jun 29, 2018 •

edited

Loading

munificent commented Jul 24, 2018 •

edited

Loading

tejainece commented Jul 24, 2018 •

edited

Loading

kasperpeulen commented Aug 15, 2018 •

edited

Loading

munificent commented Jun 9, 2020 •

edited

Loading

pstricks-fans commented Apr 18, 2021 •

edited

Loading

lrhn commented May 9, 2024 •

edited

Loading