Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normative: Support BigInt in NumberFormat and toLocaleString #236

Merged
merged 1 commit into from
Jun 13, 2019

Conversation

littledan
Copy link
Member

This patch brings Intl.NumberFormat support to BigInt, and
adds a BigInt.prototype.toLocaleString method based on it.

The design here is to include overloading between BigInt and Number
as arguments for the format and formatToParts methods based on
ToNumeric. This means that, for example, string arguments are
cast to Number, rather than BigInt. This design preserves
compatibility and consistency with operators like unary -

This definition permits options in the NumberFormat to force
decimal places, e.g., 1n formatting as 1.00000 if the minimum
fractional digits is 5. Alternative semantics would be to
throw an exception in this case.

For the algorithm text itself: the specification algorithms
ToRawPrecision and ToRawFixed are now used for both Numbers
and BigInts. Given the ECMAScript specification's use of implicit
coercisions between Numbers and mathematical values, I believe
that this is valid without any special changes; the phrasing
may change in the future [1].

ICU4C-based implementations of ECMAScript can use
LocalizedNumberFormatter::formatDecimal [2] or
unum_formatDecimal [3] to implement the algorithms in this patch.

[1] tc39/ecma262#1135
[2] http://icu-project.org/apiref/icu4c/classicu_1_1number_1_1LocalizedNumberFormatter.html#a29cd3d107b784496e19175ce0115f26f
[3] http://icu-project.org/apiref/icu4c/unum_8h.html#a59870a322f012dc1b9d99cf8a7b708f1

Closes #218

This patch brings Intl.NumberFormat support to BigInt, and
adds a BigInt.prototype.toLocaleString method based on it.

The design here is to include overloading between BigInt and Number
as arguments for the format and formatToParts methods based on
ToNumeric. This means that, for example, string arguments are
cast to Number, rather than BigInt. This design preserves
compatibility and consistency with operators like unary -

This definition permits options in the NumberFormat to force
decimal places, e.g., 1n formatting as 1.00000 if the minimum
fractional digits is 5. Alternative semantics would be to
throw an exception in this case.

For the algorithm text itself: the specification algorithms
ToRawPrecision and ToRawFixed are now used for both Numbers
and BigInts. Given the ECMAScript specification's use of implicit
coercisions between Numbers and mathematical values, I believe
that this is valid without any special changes; the phrasing
may change in the future [1].

ICU4C-based implementations of ECMAScript can use
LocalizedNumberFormatter::formatDecimal [2] or
unum_formatDecimal [3] to implement the algorithms in this patch.

[1] tc39/ecma262#1135
[2] http://icu-project.org/apiref/icu4c/classicu_1_1number_1_1LocalizedNumberFormatter.html#a29cd3d107b784496e19175ce0115f26f
[3] http://icu-project.org/apiref/icu4c/unum_8h.html#a59870a322f012dc1b9d99cf8a7b708f1

Closes tc39#218
@littledan
Copy link
Member Author

Copy link
Contributor

@caiolima caiolima left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM.

@littledan
Copy link
Member Author

This patch was presented at the May 2018 TC39 meeting. No concerns were expressed, so I take it that it has consensus. Let's consider this change to be "at Stage 3" together with the rest of the BigInt proposal, ready for implementation and test262 tests. cc @jakobkummerow @caiolima @cxielarko

@FrankYFTang
Copy link
Contributor

@sffc @gsathya - I just noticed this. For V8, we need to consider this in our priority and also the proposal Shane is working on.

Copy link

@jakobkummerow jakobkummerow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have concerns about this, see comments.

As a side note, this is potentially going to be tricky to implement, because the BigInt object format is going to be specific to each implementation, so it'll be hard to share much if any of the functionality via ICU.

@@ -376,7 +376,7 @@ <h1>FormatNumberToParts( _numberFormat_, _x_ )</h1>
<h1>ToRawPrecision( _x_, _minPrecision_, _maxPrecision_ )</h1>

<p>
When the ToRawPrecision abstract operation is called with arguments _x_ (which must be a finite non-negative number), _minPrecision_, and _maxPrecision_ (both must be integers between 1 and 21), the following steps are taken:
When the ToRawPrecision abstract operation is called with arguments _x_ (which must be a finite non-negative Number or BigInt), _minPrecision_, and _maxPrecision_ (both must be integers between 1 and 21), the following steps are taken:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minPrecision, and maxPrecision (both must be integers between 1 and 21)

doesn't really make sense for BigInts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minPrecision/maxPrecision make sense in some cases, like when using scientific or compact notation. For example, if you have a BigInt 88776655443322, you might want to display only 4 significant digits in compact notation, producing "88.78 Trillion".

@@ -410,7 +410,7 @@ <h1>ToRawPrecision( _x_, _minPrecision_, _maxPrecision_ )</h1>
<h1>ToRawFixed( _x_, _minInteger_, _minFraction_, _maxFraction_ )</h1>

<p>
When the ToRawFixed abstract operation is called with arguments _x_ (which must be a finite non-negative number), _minInteger_ (which must be an integer between 1 and 21), _minFraction_, and _maxFraction_ (which must be integers between 0 and 20), the following steps are taken:
When the ToRawFixed abstract operation is called with arguments _x_ (which must be a finite non-negative Number or BigInt), _minInteger_ (which must be an integer between 1 and 21), _minFraction_, and _maxFraction_ (which must be integers between 0 and 20), the following steps are taken:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minInteger (which must be an integer between 1 and 21), minFraction, and maxFraction (which must be integers between 0 and 20)

doesn't really make sense for BigInts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about this being relevant when compact or scientific notation is used.

Another use case: maybe a library dealing with integer numbers from the back-end (such as protobuf int64s, etc.) makes all numbers, even small ones, represented by BigInt. So, it would not be unreasonable to have a case where you have a small BigInt, and you want to show decimals on it, like for currencies.


<p>
The FormatNumberToString abstract operation is called with arguments _intlObject_ (which must be an object with [[MinimumSignificantDigits]], [[MaximumSignificantDigits]], [[MinimumIntegerDigits]], [[MinimumFractionDigits]], and [[MaximumFractionDigits]] internal slots), and _x_ (which must be a Number value), and returns _x_ as a string value with digits formatted according to the five formatting parameters.
The FormatNumericToString abstract operation is called with arguments _intlObject_ (which must be an object with [[MinimumSignificantDigits]], [[MaximumSignificantDigits]], [[MinimumIntegerDigits]], [[MinimumFractionDigits]], and [[MaximumFractionDigits]] internal slots), and _x_ (which must be a Number or BigInt value), and returns _x_ as a string value with digits formatted according to the five formatting parameters.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(which must be an object with [[MinimumSignificantDigits]], [[MaximumSignificantDigits]], [[MinimumIntegerDigits]], [[MinimumFractionDigits]], and [[MaximumFractionDigits]] internal slots)

doesn't really make sense for BigInts, and neither calling ToRawPrecision nor ToRawFixed below is appropriate for them. I think this method needs more significant changes than just an s/Number/Numeric/ renaming.

@sffc
Copy link
Contributor

sffc commented Jan 16, 2019

As a side note, this is potentially going to be tricky to implement, because the BigInt object format is going to be specific to each implementation, so it'll be hard to share much if any of the functionality via ICU.

ICU accepts arbitrary-precision numbers via the decNumber APIs. As long as an implementation can get the Latin digits for the BigInt into a string it can run ICU with it. If BigInt has a toString() method, then that output might be sufficient in itself to run ICU.

@FrankYFTang
Copy link
Contributor

As a side note, this is potentially going to be tricky to implement, because the BigInt object format is going to be specific to each implementation, so it'll be hard to share much if any of the functionality via ICU.

ICU accepts arbitrary-precision numbers via the decNumber APIs. As long as an implementation can get the Latin digits for the BigInt into a string it can run ICU with it. If BigInt has a toString() method, then that output might be sufficient in itself to run ICU.

Does this mean the implementation will first turn the BigInt into a string form and then pass that string to

icu::number::LocalizedNumberFormatter::formatDecimal() ?
If so, how should I convert it to the string form that ICU accept, calling the BigInt.prototype.ToString() ?

--

@sffc
Copy link
Contributor

sffc commented Jan 17, 2019

As a side note, this is potentially going to be tricky to implement, because the BigInt object format is going to be specific to each implementation, so it'll be hard to share much if any of the functionality via ICU.

ICU accepts arbitrary-precision numbers via the decNumber APIs. As long as an implementation can get the Latin digits for the BigInt into a string it can run ICU with it. If BigInt has a toString() method, then that output might be sufficient in itself to run ICU.

Does this mean the implementation will first turn the BigInt into a string form and then pass that string to

icu::number::LocalizedNumberFormatter::formatDecimal() ?

Correct

If so, how should I convert it to the string form that ICU accept, calling the BigInt.prototype.ToString() ?

--

That should work. If there's a cheap way to convert the BigInt to scientific notation, that should also work.

@littledan
Copy link
Member Author

@jakobkummerow I can understand these concerns about sloppy writing and parts of this text being irrelevant for BigInt, but I don't understand this part:

As a side note, this is potentially going to be tricky to implement, because the BigInt object format is going to be specific to each implementation, so it'll be hard to share much if any of the functionality via ICU.

I think you can handle this by converting the BigInt to a base-10 string and then calling ICU's functionality to format a string representing a number in a locale-specific way.

Separately: We are discussing over in WebIDL-land whether it's appropriate to overload functions between BigInt and Number. @annevk has advocated for not permitting this overloading, which we could follow if we make a separate formatBigInt method or something like that. I think we should try to make a consistent decision across JavaScript and WebIDL if possible.

@FrankYFTang
Copy link
Contributor

@jakobkummerow I can understand these concerns about sloppy writing and parts of this text being irrelevant for BigInt, but I don't understand this part:

As a side note, this is potentially going to be tricky to implement, because the BigInt object format is going to be specific to each implementation, so it'll be hard to share much if any of the functionality via ICU.

I think you can handle this by converting the BigInt to a base-10 string and then calling ICU's functionality to format a string representing a number in a locale-specific way.

Separately: We are discussing over in WebIDL-land whether it's appropriate to overload functions between BigInt and Number. @annevk has advocated for not permitting this overloading, which we could follow if we make a separate formatBigInt method or something like that. I think we should try to make a consistent decision across JavaScript and WebIDL if possible.

@littledan - so after you see the above comment from @jakobkummerow do you think you still need to revise this PR with some changes before you merge it in? or you think it should be landed AS IS? Just try to understand the implication from @jakobkummerow 's comments.

Also, how likely do you think A) the BigInt and B) this ECMA changes to BigInt.prototype.toLocaleString will make the cut into ECMA262 2019 edition and ECMA402 2019 edition? Just also try to get a sense of priority in term of WHEN should I schedule time to implement it in v8.

@FrankYFTang
Copy link
Contributor

v8 prototype cl https://chromium-review.googlesource.com/c/v8/v8/+/1424021
Will come back next week to add more test code and tight up before asking v8 folks to review. Just fyi

@FrankYFTang
Copy link
Contributor

v8 prototype cl https://chromium-review.googlesource.com/c/v8/v8/+/1424021
Will come back next week to add more test code and tight up before asking v8 folks to review. Just fyi

@littledan - When do you think you can merge this PR into ECMA402? IF you can merge it within the next 10 calendar, I think I can

  1. submit some tests to test262 to test it within the next 10 days AND
  2. get my v8 implementation code review before the feature freeze day of chrome m74 (Feb 22). Actually, most likely within 3-5 days this PR got merged.

What I got from my v8 prototype this morning.
ftang@ftang4:~/v8/v8$ out/x64.release/d8
V8 version 7.3.0 (candidate)
d8> BigInt(12345678901234567890).toString()
"12345678901234567168"
d8> BigInt(12345678901234567890).toLocaleString("de")
"12.345.678.901.234.567.168"
d8> BigInt(12345678901234567890).toLocaleString("en")
"12,345,678,901,234,567,168"
d8> BigInt(12345678901234567890).toLocaleString("hi")
"1,23,45,67,89,01,23,45,67,168"
d8> BigInt(12345678901234567890).toLocaleString("fa")
"۱۲٬۳۴۵٬۶۷۸٬۹۰۱٬۲۳۴٬۵۶۷٬۱۶۸"
d8> BigInt(12345678901234567890).toLocaleString("fr")
"12 345 678 901 234 567 168"
d8> BigInt(12345678901234567890).toLocaleString("th-Thai")
"12,345,678,901,234,567,168"
d8> BigInt(12345678901234567890).toLocaleString("th-u-nu-Thai")
"๑๒,๓๔๕,๖๗๘,๙๐๑,๒๓๔,๕๖๗,๑๖๘"

@FrankYFTang
Copy link
Contributor

ICU4C-based implementations of ECMAScript can use
LocalizedNumberFormatter::formatDecimal [2] or

Just want to add a note
ICU4C-based implementations of ECMAScript has a third option
icu::NumberFormat::format

virtual UnicodeString& icu::NumberFormat::format ( StringPiece number, UnicodeString & appendTo,
FieldPositionIterator * posIter, UErrorCode &status) const
[4] http://icu-project.org/apiref/icu4c/classicu_1_1NumberFormat.html#ab039d331fa562087a01110aade731aff

@littledan
Copy link
Member Author

BigInt will not make it into ECMAScript 2019, but I don't think that should delay your implementation work on toLocaleString any more than BigInt itself. I have already presented this PR to TC39 with no strong concerns raised from the committee.

The remaining open question is, should we overload format or make a separate formatBigInt? This should be a pretty superficial change, but I want to have a conversation involving both the TC39 and WebIDL communities to come to a common conclusion before the overloaded format method ships in browsers.

I don't think we have any remaining questions about the toLocaleString method's semantics, though.

@FrankYFTang
Copy link
Contributor

BigInt will not make it into ECMAScript 2019, but I don't think that should delay your implementation work on toLocaleString any more than BigInt itself. I have already presented this PR to TC39 with no strong concerns raised from the committee.

The remaining open question is, should we overload format or make a separate formatBigInt? This should be a pretty superficial change, but I want to have a conversation involving both the TC39 and WebIDL communities to come to a common conclusion before the overloaded format method ships in browsers.

I don't think we have any remaining questions about the toLocaleString method's semantics, though.

ok. Help me out to understand the dependency below. I am still rather new here so maybe some of my assumptions are wrong.

  1. Could you merge this PR now?
  2. Before this PR got merged, would any PR in test262 adding new tests to test this PR merge into test262? I am willing to write some of the tests, but I do not like to see I put a PR together and that got hang there indefinitely blocked by this PR for a long time.
  3. I think I will have the cl in v8 ready for review next week. (Code of my implementation is ready but I need to add more tests) But unless this PR merged I am not sure v8 reviewers will let me land my implementation into v8.

@littledan
Copy link
Member Author

I would not merge this PR or tests yet because of the unresolved issue. However, IMO it wouldn't be too soon to land a flagged implementation, as the unresolved issue is superficial. Happy to discuss this with more V8 folks as well.

@annevk
Copy link
Member

annevk commented Jan 21, 2019

It seems better not to merge this until there's a consensus on the pattern to be used. I think the main reason I had against overloading was because normally only one type of number makes sense and accepting both would be a convenience that's better done explicitly (otherwise any number of things accepting Number today would also have to accept BigInt).

This method seems to be somewhat exceptional in that it actually makes sense for both types. That seems reasonable, but it seems hard to convey that nuance and I'm afraid there'll be a number of specifications that would do inappropriate overloading if IDL allowed for it (as e.g. argued for some Wasm API and nearly accepted).

@littledan
Copy link
Member Author

OK, how should we get everyone together to discuss this question. Email thread? Call? TC39 meeting topic? I would be happy to stick with Anne's resolution, to take a strong stance to avoid likely confusion.

@annevk
Copy link
Member

annevk commented Jan 21, 2019

I'd be happy to let TC39 deliberate over these points and come up with something for JavaScript/IDL. (To be clear, I will accept the outcome.)

@FrankYFTang
Copy link
Contributor

I would not merge this PR or tests yet because of the unresolved issue. However, IMO it wouldn't be too soon to land a flagged implementation, as the unresolved issue is superficial. Happy to discuss this with more V8 folks as well.

@jakobkummerow & @gsathya - with this, would you mind I add a flag to guard on my BigInt.prototype.toLocaleString implementation? I know Jakob earlier said this should not need a flag but I think Daniel's point probably make it more sense for us to have one. IS that ok?

@jakobkummerow
Copy link

@FrankYFTang Sure, when the spec is still in flux then a (default-off) flag makes sense.

@gsathya
Copy link
Member

gsathya commented Jan 22, 2019

I haven't been following this too closely so apologies if this is obvious,

The remaining open question is, should we overload format or make a separate formatBigInt? This should be a pretty superficial change, but I want to have a conversation involving both the TC39 and WebIDL communities to come to a common conclusion before the overloaded format method ships in browsers.

Is this an observable change? If not, why is this blocking shipping in browsers? If yes, can someone give me a summary of the observable part of this?

@littledan
Copy link
Member Author

@gsathya If you want to ship BigInt.prototype.toLocaleString by itself, there's no observable change. If you want to ship this PR as a whole, there's also a change to Intl.NumberFormat.prototype.format, which overloads between Number and BigInt--that second part, I'd like to discuss in the upcoming TC39 meeting.

@FrankYFTang
Copy link
Contributor

@littledan should BigInt(-0).toLocaleString() format to "0" or "-0" ? why?

@FrankYFTang
Copy link
Contributor

Thanks for the code review from @jakobkummerow the v8 implementation sync with the current spec ( 040f809 ) is landed into v8 tree behind the flag --harmony-intl-bigint . You can also see the test I wrote here
https://chromium-review.googlesource.com/c/v8/v8/+/1424021/8/test/intl/bigint/tolocalestring.js
Notice in this implementation BigInt(-0).toLocaleString() will output "0" instead of "-0". @jakobkummerow said that is the expected behavior. In the other hand (-0).toLocaleString() will return "-0".

@FrankYFTang
Copy link
Contributor

Based on decision on 2019 March TC39 meeting, ECMA402 will go with this PR and abandon the idea in #318.
Per comments with @littledan This PR is a stage 3 PR and will be merged into ECMA402 spec once it reach Stage 4.

@FrankYFTang
Copy link
Contributor

see tc39/proposal-intl-relative-time#106 also

@FrankYFTang
Copy link
Contributor

FYI- Chrome/v8 will ship this BigInt support in m76
https://chromium-review.googlesource.com/c/v8/v8/+/1575019

@leobalter leobalter merged commit c682e90 into tc39:master Jun 13, 2019
@Ms2ger Ms2ger deleted the bigint-to-locale-string branch January 3, 2020 14:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: numbers Component: numbers, currency, units has consensus Has consensus from TC39-TG2 needs tests stage 3
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BigInt.prototype.toLocaleString
9 participants