[Discussion] bidi control characters when formatting dates #28
Comments
/cc @bterlson |
STT is still in the proposal status. |
CLDR TR35 5.3.2 says:
|
|
@srl295 yes, that's part of the date patterns from CLDR. I can confirm that we are getting the same results when using the Intl.js polyfill; here is the data with a bunch of The question is:
I wonder what Chakra/Edge is doing differently since it doesn't use CLDR. @bterlson can you clarify? |
Not sure I follow - you said:
Edge currently includes bidi control characters when formatting dates
How is this != what CLDR/Node/V8/ICU does?
Microsoft is now part of CLDR, though (afaik) still deploying it.
Your points about consistency are exactly why we started CLDR...
|
We don't use CLDR for Intl in Edge, fwiw. Here is one difference, though I'm not sure how to characterize it as I'm no expert :)
So you're right that Chrome does add bidi control characters, which I didn't notice before, but they are in different locations. Also Chrome does not include them when formatting en dates as Edge does. Example:
|
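For anyone who wants to check which code points a given engine emits, here is a minimal sketch (not taken from this thread). The helper name `findBidiControls` and the lookup table are hypothetical; the only platform API used is `Intl.DateTimeFormat`, and the actual output will differ between engines and locales.

```typescript
// Bidi control characters relevant to this discussion (all in the BMP).
const BIDI_CONTROLS: { [code: number]: string } = {
  0x061c: "ALM",
  0x200e: "LRM",
  0x200f: "RLM",
  0x202a: "LRE",
  0x202b: "RLE",
  0x202c: "PDF",
  0x202d: "LRO",
  0x202e: "RLO",
  0x2066: "LRI",
  0x2067: "RLI",
  0x2068: "FSI",
  0x2069: "PDI",
};

interface BidiHit {
  index: number;     // UTF-16 index inside the string
  name: string;      // short name, e.g. "LRM"
  codePoint: string; // e.g. "U+200E"
}

// Hypothetical helper: list every bidi control character found in `text`.
function findBidiControls(text: string): BidiHit[] {
  const hits: BidiHit[] = [];
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    const name = BIDI_CONTROLS[code];
    if (name !== undefined) {
      const hex = ("0000" + code.toString(16).toUpperCase()).slice(-4);
      hits.push({ index: i, name: name, codePoint: "U+" + hex });
    }
  }
  return hits;
}

// Usage: the result depends on the engine and its locale data.
const formattedDate = new Intl.DateTimeFormat("en-US").format(
  new Date(Date.UTC(2016, 0, 15))
);
console.log(JSON.stringify(formattedDate), findBidiControls(formattedDate));
```

Running this in each browser and comparing the reported indexes should show whether the marks are present and where they sit.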
I don't think the codes are "added"; they are just part of the CLDR data. Do you happen to be in touch with the Microsoft CLDR people? Thanks for putting the code points here, I will take a look in a little bit. |
Yeah I guess "added" wasn't the verb I wanted. "Included" more like. I believe you that they're part of the CLDR data. I am in touch with the CLDR folks here. I can ask any questions we might have if they don't chime in themselves. Let me know! |
Ok @bterlson let's try to gather all the info for next week, so we can discuss it in person, and try to get to a resolution. I will update the description of this issue now that we have more information. |
So they are different dates in your example. As to formatting codes, it may be excessive but not harmful. I'm not sure why this is an ecma402 discussion actually. I'd rather leave LRM/RLM out of the ecma402 discussion. If it's just a matter of content consistency, as I mentioned that's the whole point of CLDR, it seems akin to discussing whether "modifier letter turned comma" or "apostrophe" or curly quote should be used in certain languages. |
I agree that it feels slightly out of scope for ecma402. In our code, we wrap all variables in strings in FSI/PDI, but that's more of a mixed-content problem. |
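As an illustration of the FSI/PDI wrapping mentioned above, here is a minimal sketch. It is an assumption about how such wrapping could look, not the commenter's actual code; `isolate` and `describeDue` are hypothetical names. U+2068 (FIRST STRONG ISOLATE) and U+2069 (POP DIRECTIONAL ISOLATE) are the real Unicode code points.

```typescript
const FSI = "\u2068"; // FIRST STRONG ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

// Hypothetical helper: isolate an interpolated value so its direction
// does not affect, and is not affected by, the surrounding text.
function isolate(value: string): string {
  return FSI + value + PDI;
}

// Hypothetical message builder: only the variable part is wrapped,
// the template text itself is left untouched.
function describeDue(dateText: string): string {
  return "Due on " + isolate(dateText) + ".";
}

console.log(JSON.stringify(describeDue("15/01/2016")));
```

The wrapping happens at the point where a value is placed into mixed content, which is why it is a separate concern from what the date formatter itself returns.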
In theory, but in practice I have gotten numerous bug reports on Edge's behavior as people expect to be able to parse some localized date in Chrome and have that same code work in Edge. This isn't too much of a stretch for people to make because Intl lets them specify exactly what components they want in the date. Why wouldn't it be safe to parse? I'm not saying this has to be fixed/unified. If it isn't then there should be a statement in the spec I can point to that explicitly says that not treating formatted dates as opaque is a very bad idea and not guaranteed to work. |
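For code that already compares or post-processes formatted dates, one possible mitigation is to drop the bidi control characters first. This is a sketch under that assumption only; `stripBidiControls` is a hypothetical helper, and removing the marks does not make formatted output a stable format to parse, since separators, digits, and field order can still differ between engines.

```typescript
// LRM, RLM, ALM, the embedding/override controls and the isolate controls.
const BIDI_CONTROL_PATTERN = /[\u061C\u200E\u200F\u202A-\u202E\u2066-\u2069]/g;

// Hypothetical helper: remove bidi control characters from a string.
function stripBidiControls(text: string): string {
  return text.replace(BIDI_CONTROL_PATTERN, "");
}

// Usage: normalise before comparing output from different engines.
const withMarks = "\u200E1\u200E/\u200E15\u200E/\u200E2016";
console.log(stripBidiControls(withMarks) === "1/15/2016");
```

The stripped string is only suitable for comparison or logging; for display, the original string with its marks should be kept.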
Because these bidi marks are not needed for a date string requested for |
I'll check, but it should state something to the effect that results depend on other data, user prefs, etc. Probably many users doing parsing really want some other issue in this repo fixed (filed or unfiled).
I'd be surprised if LRMs/RLMs were implicated in most of such bug reports you see. Although maybe I shouldn't be surprised if even numeric dates are different due to LRMs.
Parsing is a whole other issue itself. I wouldn't expect users to type RLMs into an input field around date items, any more than I would expect them to type a THIN SPACE before a percent sign or NBSP in the locales that expect such on format.
|
@bterlson has a theory that we should validate, here is what we discussed: what happens when you have system preferences in |
Both Chrome and FF are implementing UBA. You can check the behavior in the bidi demo tool. Note that the date string remains as one single run of One might argue that European numbers are directionally weak and might end up being resolved according to directional context ( I would be more than happy to discuss any edge cases folks have encountered before, but even if the aforementioned theory is validated for some edge case, adding invisible control marks to strings which are not requested for a bidi language is not a solution, as it introduces control characters where they are not supposed to appear. Libraries with more peculiar requirements to tailor the directional behaviour of strings in diverse directional contexts can implement means to pass the context if need be and adapt the generated strings accordingly, but I strongly agree with others who voiced their concern about whether this topic actually falls within the scope of the spec. |
Joining the party late. Let us be clear on the reason why UCC are injected into date / time patterns (e.g. 05 August 1934) in CLDR. They are injected to ensure a certain display (i.e. we don't want to see 05 1934 August). This means the assumption is that the rendering engine using those date / time patterns is UBA (Unicode Bidi Algorithm) compliant and fully supports UCC (Unicode control characters such as LRE, RLE, PDF, LRM etc.). This is unfortunately not true in all cases. The proposal to CLDR mentioned at the top of this thread was not meant to resolve a display problem. It was not about injection of UCC during formatting. It was about defining the rules for display of text with inherent structure (a date / time stamp is just one of many cases). For example, we want breadcrumbs to flow from left to right (1 >> 2 >> 3) for English / French / Russian ... UI, while for Arabic / Hebrew / Urdu ... UI we want them to flow from right to left (3 <<< 2 <<< 1). Before we approach the solution of the problem, we would like to have a clear understanding of the expected display. The expected display for the same pattern may be different for different cultures (think of mathematical formulas). It is only because standard UBA is not capable of automatically identifying structure and enforcing it for display purposes that such a proposal was created. Maybe in the distant future (or maybe not so distant) Siri, Cortana, Watson and similar technology will be able to cope with it. But at the moment it needs to be done manually. What I hope to achieve is some level of automation by:
|
Update: |
@bterlson we resolve the user's locale on the server via a combination of HTTP content negotiation and the user's settings. With their resolved locale we render on the React app on the server and client using this resolved locale value. |
@ericf alright makes sense, thanks for the clarification. |
It looks like this discussion is resolved. Please reopen if necessary. |
Notes:
Problems:
Proposals:
Links: