-
Notifications
You must be signed in to change notification settings - Fork 180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Export key registry from icu_datagen #1512
Comments
Here's the design I'm thinking of. The design hinges on trait KeyRegistry {
// Calls `ConcreteDataCallback::callback::<M>();` for the appropriate marker type M
fn apply_for_concrete_data<R>(&self, key: ResourceKey, callback: &mut impl ConcreteDataCallback) -> Result<R, DataError>;
}
// This is a trait since Rust doesn't support closures with generics, but think of it as a closure.
trait ConcreteDataCallback {
fn callback<M: DataMarker, R>(&mut self) -> Result<R, DataError> where
for<'a> &'a <M::Yokeable as Yokeable<'a>>::Output: serde::Serialize,
for<'de> YokeTraitHack<<M::Yokeable as Yokeable<'de>>::Output>: Deserialize<'de>,
} So, for example, to handle #1661, one would probably use something like struct MyCallback(DataPayload<SerializeMarker>);
impl ConcreteDataCallback for MyCallback {
fn callback<M: ...>() {
let mut serializer = postcard::Serializer {output: vec![]};
let serialized = self.0.serialize(&serializer);
measure_allocations(|| {
let d: DataMarker<M> = postcard::deserialize(&serializer.output)?;
})
}
}
fn check_resource(provider: &impl IterableDynProvider<SerializeMarker>, key: ResourceKey, req: &DataRequest) {
let erased: DataPayload<SerializeMarker> = provider.load_payload(key, req)?
REGISTRY.apply_for_concrete_data(MyCallback(erased));
} The support types for this would live in provider/core behind the serde feature (maybe behind an additional feature?), and the registries would live in provider/cldr and provider/uprops, with provider/datagen using a "combined" registry. |
It would be nicer to be able to use ResourceMarker here, but it's by no means necessary, the macro that constructs the registry will just need to be a little bit more involved. This does decouple this work from #1560 |
Also, to give an idea of what the macro would look like, it would take something like create_registry!(CldrRegistry,
JapaneseErasV1Marker::KEY => JapaneseErasV1Marker,
DatePatternsV1Marker::KEY => DatePatternsV1Marker,
....
); and turn it into struct CldrRegistry;
impl KeyRegistry for CldrRegistry {
fn apply_for_concrete_data<R>(&self, key: ResourceKey, callback: &mut impl ConcreteDataCallback) -> Result<R, DataError> {
match k {
JapaneseErasV1Marker::KEY => callback.callback::<JapaneseErasV1Marker>(),
DatePatternsV1Marker::KEY => callback.callback::<DatePatternsV1Marker>(),
....
_ => Err(MissingResourceKey)
}
}
} And we might have a convenient KeyRegistry impl for tuples that chains multiple registries together. If we could use |
I have a draft here #1664 |
There are two traits that are interesting as part of the data exporter:
The global registry should enable us to "traitify" an AnyProvider or BufferProvider into either of those two exportable traits. So, essentially, my mental model of the global registry is something more along the lines of trait Dynify<M: DataMarker> {
fn dynify_buffer(key: ResourceKey, payload: BufferPayload) -> DataPayload<M>;
fn dynify_any(key: ResourceKey, payload: AnyPayload) -> DataPayload<M>;
} And then the source providers like CLDR and UProps would have impl Dynify<SerializeMarker> for CldrJsonDataProvider
impl Dynify<BakeableMarker> for CldrJsonDataProvider
// ... except that these impls would be generated as part of the pre-existing The way to "chain" Dynifiers together is to use a ForkByKeyProvider. In fact, maybe we should even consider putting these two new methods directly into the To support the #1661 use case, we could have another marker trait like measure_allocations(|| {
let _: DataPayload<EmptyMarker> = everything_provider.dynify_buffer(key, buffer)?;
}); All that said, we should explore the idea of a callback function with a generic parameter. It's a different way of achieving this goal. |
I don't think this part is possible, fwiw, Rust doesn't let you piecewise build up a function based on possible generic intermediates You have to have a separate registry that has a list of keys and types; it cannot be done by individual impl_dyn_provider macros. But this can be made a part of your model by implementing (It's easier to just have separate registries so those registries are reusable)
The reason I was a bit wary of this was that this means that non-test code needs to know about |
Got my attempt at #1669 |
PR: #1670 |
I was expecting something like @Manishearth's design. The registry itself should be kept simple and just be a way to iterate over keys with the correct marker types. |
I assume the closing was not on purpose |
Ah, this built the registry but didn't export it, I see. |
We should consider making icu_datagen invokable as a library, instead of only as a command-line tool. icu_datagen is unique in the sense that it is the only place in the code base that has access to all struct definitions for all data keys.
Benefits of doing this:
IterableProvider
forFsDataProvider
andBlobDataProvider
#1506)FYI, the probability of a hash collision reaches 50% at about 75000 keys, assuming a perfect hash function. So, fairly distant, but definitely high enough for us to worry about it.
CC @robertbastian @Manishearth
The text was updated successfully, but these errors were encountered: