Split date time data into smaller data keys? #257
Based on @sffc's comments in that PR, it also seems like he'd suggest we split the dates data into multiple keys:
I assume that for things like time zone names, interval patterns, and relative display names he's suggesting we also create separate keys. |
First, please leave skeletons (including
Responses to the several sub-questions:
DataProvider Flexibility
Right. The intended goal of the data provider keys is not to mimic CLDR. It's to return data that is easy and efficient to consume at runtime. Mapping from CLDR format to ICU4X format happens in CldrJsonDataProvider. I want to make sure we are aligned on that goal.
Number of Data Keys
Regarding the breakdown of the data keys: I feel strongly that there should be a minimum of three distinct data keys for DateTimeFormat, which Zibi listed in #257 (comment). Here are my reasons:
Format Widths for Display Names
I've been thinking about the format widths (long/short/narrow) for display names and whether they belong in the data request (key/entry) or not. I'm thinking that no, they don't belong in the data request; long/short/narrow should be together in the same data key and entry, and they should be a leaf of the struct. My reasoning:
Format Widths for Patterns
The situation is a bit different for pattern widths (dateStyle/timeStyle). None of the three aforementioned conditions apply here: the width patterns are not strongly correlated; they do not fall back; and we should be able to slice the data very early in the call stack. Therefore, I think it makes sense to put format widths into either the data key or the data entry. I would like to make this judgement after we have the code written and we can sit down and look at the concrete implications for the data bundles.
Calendar Systems
I'm arriving at the conclusion that calendar systems should be in the data key. Reasons:
We may at some point want to add an all-in-one calendar data key, but this is not relevant right now. We should cross that bridge later when we add calendar math to ICU4X. |
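As a rough illustration of the "widths as a leaf of the struct" idea for display names, here is a minimal sketch. The names `Width`, `WidthNames`, and `get` are hypothetical and are not ICU4X APIs, and the narrow -> short -> long fallback order is an assumption made only for the example.

```rust
/// Hypothetical width selector. It is NOT part of the data request (key/entry);
/// the caller picks a width only after the data has been loaded.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Width {
    Long,
    Short,
    Narrow,
}

/// Hypothetical leaf of a display-names struct: all widths for one name
/// travel together in the same data key and entry.
#[derive(Clone, Debug, PartialEq)]
pub struct WidthNames {
    pub long: String,
    pub short: Option<String>,
    pub narrow: Option<String>,
}

impl WidthNames {
    /// Select a width late in the call stack.
    /// Assumed fallback order for this sketch: narrow -> short -> long.
    pub fn get(&self, width: Width) -> &str {
        match width {
            Width::Long => self.long.as_str(),
            Width::Short => self.short.as_deref().unwrap_or(self.long.as_str()),
            Width::Narrow => self
                .narrow
                .as_deref()
                .or(self.short.as_deref())
                .unwrap_or(self.long.as_str()),
        }
    }
}
```

Because the three widths sit side by side, a missing narrow form can fall back without a second data request, which is the property that does not carry over to pattern widths.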
Thank you! This is so helpful! |
Because of reasons I discussed in the new doc datetime-input.md, I now believe that we should not have different data keys for different calendar systems. We should pool the essential symbols for calendar systems into the same data key. We could consider filtering the data in the data entry or the offline build tool. However, I do still feel that we should have different data keys for patterns, date symbols, and time symbols. We should probably go further and split the date symbols down into eras, months, day periods, and time zone names. |
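To make the finer-grained split concrete, the sketch below shows how a formatter might decide which keys to load from the fields in a pattern. `HypotheticalKey` and `keys_for_pattern` are invented for illustration and are not ICU4X APIs; the field letters follow the usual CLDR pattern symbols, and quoted literals in patterns are ignored for brevity.

```rust
use std::collections::BTreeSet;

/// Hypothetical data keys mirroring the split proposed above.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum HypotheticalKey {
    Patterns,
    Eras,
    Months,
    DayPeriods,
    TimeZoneNames,
}

/// Return the keys a formatter would need for `pattern`.
/// Simplification: quoted literal text ('...') is not handled.
pub fn keys_for_pattern(pattern: &str) -> BTreeSet<HypotheticalKey> {
    let mut keys = BTreeSet::new();
    // The pattern itself always comes from the patterns key.
    keys.insert(HypotheticalKey::Patterns);

    let chars: Vec<char> = pattern.chars().collect();
    let mut i = 0;
    while i < chars.len() {
        let symbol = chars[i];
        // Measure the length of the run of identical field letters.
        let mut run = 1;
        while i + run < chars.len() && chars[i + run] == symbol {
            run += 1;
        }
        let needed = match symbol {
            'G' => Some(HypotheticalKey::Eras),
            // Month fields need names only at width 3 or more (MMM, MMMM, ...).
            'M' | 'L' if run >= 3 => Some(HypotheticalKey::Months),
            'a' | 'b' | 'B' => Some(HypotheticalKey::DayPeriods),
            'z' => Some(HypotheticalKey::TimeZoneNames),
            _ => None,
        };
        if let Some(key) = needed {
            keys.insert(key);
        }
        i += run;
    }
    keys
}
```

Under this sketch, a purely numeric pattern such as "y-MM-dd" needs only the patterns key, so no symbol data has to be loaded at all.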
Concretely, I envision DateTimeFormat using the following separate, orthogonal keys, which cover all formatting except for time zone (which I want to leave to a separate discussion):
Display Names
Format Patterns
Why more keys instead of fewer keys? I listed reasons in #257 (comment), but to reiterate:
|
I'm convinced. This looks like a great design. One additional benefit of it is that version changes will be less common and more isolated in a more chunked model. |
There are still some open questions in my mind about how exactly to provision data across calendar systems, but that question is being tracked in #355. |
Shane to implement this along with #355. |
This should be one of the last things to do in 1.0 after DTF has stabilized. We should do this before 1.0 because it impacts data file stability. |
I think this has two parts, one part is the data keys for the ECMA-402 compatible components bag, and the other is for the ideal components bag. Blocking for 1.0 will be ensuring we have the best split for the ECMA-402 compatible components bag. |
Make sure to look at the data representation of the glue pattern and make changes if necessary for future-proofing. See #1131 |
We have split symbols from patterns and date from time; this is sufficient for the first release. I would still like to explore even more-granular splitting, but there's no time in 1.0 and we should coordinate this with the Ideal Components Bag work. |
The neo date time format stuff does this |
Posting this here because it seems like as good a place as any: I'm looking into the minimal set of patterns required for year formatting. A year can take three forms, which we could make dynamically selectable at runtime based on the value of the year:
I looked into whether the patterns used for case 1 and 2 differ other than the length of the year field. I found that, at least according to the CLDR algorithm and data, the patterns for one are mostly identical to the patterns for the other with the year width swapped out. There are a few exceptions:
Legend: pattern on the left is the possibly-reduced-precision year with a full-precision year substituted. Pattern on the right is the pattern resulting from a full-precision year used during skeleton selection.
DataLocale{ug}/"chinese": "r-M-d" != "r-MM-dd"
Spot-checking, in most of these cases the pattern is inherited from the root locale, which makes me question the quality of the data.
If I were to store the patterns separately, I could use another flag in the data struct. We currently have a single byte reserved for flags in the packed data structure, and 4 bits are used, so this would mean using one more bit. However, we could keep the bit unset if the locale results in equivalent data as observed above.
https://github.com/unicode-org/icu4x/blob/main/components/datetime/src/provider/neo.rs#L560
This could be done as a follow-up so long as the bit remains available. |
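A minimal sketch of the flag idea described above: compare the two patterns modulo the width of the year field, and set one extra bit only when they genuinely differ. The constant name, the bit position (bit 4, assuming the four bits already in use are the low four), and both helper functions are hypothetical; none of this is taken from the actual packed data structure in neo.rs. Only the `y` year field is handled, and quoted literals are ignored.

```rust
/// Hypothetical flag bit. ASSUMPTION: the four bits already in use are the
/// low four, so bit 4 is the next free one.
pub const HYPOTHETICAL_SEPARATE_YEAR_PATTERNS: u8 = 1 << 4;

/// Collapse every run of `y` to a single `y`, so that two patterns can be
/// compared while ignoring the width of the year field.
/// Simplification: quoted literal text is not handled.
pub fn collapse_year_width(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len());
    let mut prev: Option<char> = None;
    for c in pattern.chars() {
        if c == 'y' && prev == Some('y') {
            // Skip repeated year letters; `prev` stays 'y'.
            continue;
        }
        out.push(c);
        prev = Some(c);
    }
    out
}

/// Return the flags byte with the hypothetical bit set only when the
/// reduced-precision and full-precision patterns differ in more than
/// the year width; otherwise the bit is left unset.
pub fn year_flags(existing: u8, reduced: &str, full: &str) -> u8 {
    if collapse_year_width(reduced) == collapse_year_width(full) {
        existing & !HYPOTHETICAL_SEPARATE_YEAR_PATTERNS
    } else {
        existing | HYPOTHETICAL_SEPARATE_YEAR_PATTERNS
    }
}
```

With the pair from the exception listed above ("r-M-d" versus "r-MM-dd"), the bit would be set; for a pair that differs only in year width, such as the illustrative "M/d/yy" versus "M/d/y", it would stay unset and no second pattern would need to be stored.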
I believe this issue is fully resolved. |
As I'm implementing Dates in DataProvider and testing them using DateTimeFormat, I have some questions about how we should structure that.
Generally, the data in question looks like this: https://github.com/unicode-cldr/cldr-dates-modern/tree/master/main/en
It has (per locale):
For now, we need:
Display names come in different:
but they also can be for different calendar systems (I see at least "generic" and "gregorian").
As far as I understand DataProvider, we have some flexibility in what we request and what we get in response.
We could, for example, put "months/format/narrow" as a variant in DataEntry and gregory in DataKey and get just a list of month names in "format" and "narrow" for the "gregory" calendar.
Or, we can just ask for "gregory", set no variant in DataEntry, and get all display names for all contexts and all widths.
@sffc - what are your thoughts on that? What should a request/response look like?
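To make the two request shapes concrete, here is a toy sketch. `HypotheticalRequest`, `HypotheticalResponse`, and `ToyProvider` are invented for illustration and do not reflect the real DataProvider, DataKey, or DataEntry types; the string paths are only stand-ins for whatever the real key and variant would be.

```rust
use std::collections::BTreeMap;

/// Hypothetical request: `key` stands in for the data key (e.g. "gregory"),
/// `variant` for an optional entry variant (e.g. "months/format/narrow").
pub struct HypotheticalRequest<'a> {
    pub key: &'a str,
    pub variant: Option<&'a str>,
}

/// Hypothetical response: either one pre-sliced list or the whole bundle.
#[derive(Debug, PartialEq)]
pub enum HypotheticalResponse<'a> {
    Slice(&'a [&'static str]),
    Everything(&'a BTreeMap<&'static str, Vec<&'static str>>),
}

/// Toy in-memory provider: calendar -> "field/context/width" -> names.
pub struct ToyProvider {
    calendars: BTreeMap<&'static str, BTreeMap<&'static str, Vec<&'static str>>>,
}

impl ToyProvider {
    pub fn new() -> Self {
        let mut gregory = BTreeMap::new();
        gregory.insert(
            "months/format/narrow",
            vec!["J", "F", "M", "A", "M", "J", "J", "A", "S", "O", "N", "D"],
        );
        gregory.insert(
            "months/format/abbreviated",
            vec![
                "Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
            ],
        );
        let mut calendars = BTreeMap::new();
        calendars.insert("gregory", gregory);
        Self { calendars }
    }

    /// With a variant, return just that list; without one, return everything
    /// stored for the calendar. Unknown keys or variants yield `None`.
    pub fn load<'a>(&'a self, req: &HypotheticalRequest<'_>) -> Option<HypotheticalResponse<'a>> {
        let calendar = self.calendars.get(req.key)?;
        match req.variant {
            Some(path) => calendar
                .get(path)
                .map(|names| HypotheticalResponse::Slice(names.as_slice())),
            None => Some(HypotheticalResponse::Everything(calendar)),
        }
    }
}
```

The first shape slices early and keeps the response small; the second leaves slicing to the caller, which matters when widths or contexts need to fall back to one another.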