Add sections on zero-copy and exotic types to style guide #699

Merged 2 commits on Jun 10, 2021
30 changes: 30 additions & 0 deletions docs/process/style_guide.md
@@ -602,6 +602,22 @@ fn main() {

## Data Types

### Zero-copy in DataProvider structs :: required

All data structs that can be passed through the DataProvider pipeline must support *zero-copy deserialization*: in practice, deserializing from Bincode-like formats should require no heap allocations. If the type contains variable-length data such as strings, vectors, or maps, it must represent that data with a zero-copy type backed by a byte buffer.

Data structs with zero-copy data should have a `'s` lifetime parameter.
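
As a minimal sketch of what the `'s` lifetime buys, the struct below borrows its string straight out of the input buffer. `GreetingV1`, its `from_bytes` function, and the one-byte length-prefix layout are hypothetical and exist only for this illustration; a real data struct would go through the project's Serde-based deserialization instead.

```rust
use std::borrow::Cow;

/// Hypothetical data struct, for illustration only.
/// The `'s` lifetime ties the struct to the buffer it was read from.
#[derive(Debug, PartialEq)]
pub struct GreetingV1<'s> {
    pub message: Cow<'s, str>,
}

impl<'s> GreetingV1<'s> {
    /// Reads a one-byte length prefix followed by that many UTF-8 bytes.
    /// (Assumed toy layout, not a real serialization format.)
    /// The result borrows from `buffer`; no string data is copied to the heap.
    pub fn from_bytes(buffer: &'s [u8]) -> Option<Self> {
        let (&len, rest) = buffer.split_first()?;
        let bytes = rest.get(..len as usize)?;
        let message = std::str::from_utf8(bytes).ok()?;
        Some(GreetingV1 {
            message: Cow::Borrowed(message),
        })
    }
}
```

Because `message` is a `Cow::Borrowed`, the struct cannot outlive the buffer, which is exactly the constraint the `'s` parameter expresses.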

Examples of types that can be used in zero-copy data structs:

- Strings: `Cow<'s, str>`, except as noted below
- Vectors of fixed-width types: `ZeroVec<'s, T>`
  - Examples: `ZeroVec<'s, u32>`, `ZeroVec<'s, TinyStr8>`
- Vectors of variable-width types: `VarZeroVec<'s, T>`
  - Example: `VarZeroVec<'s, String>`
- Maps: `ZeroMap<'s, K, V>`
  - Example: `ZeroMap<'s, TinyStr4, String>`
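
To make "backed by a byte buffer" concrete, here is a rough sketch of the idea behind a fixed-width zero-copy vector. `U32View` is a hypothetical stand-in written against the standard library only; it is not the `ZeroVec` API, and the little-endian layout is an assumption chosen for the example.

```rust
use std::convert::TryFrom;

/// Hypothetical view over a buffer of little-endian `u32` values.
/// Illustrates the concept only; this is not the `ZeroVec` API.
#[derive(Debug, Clone, Copy)]
pub struct U32View<'s> {
    bytes: &'s [u8],
}

impl<'s> U32View<'s> {
    /// Wraps the buffer without copying it.
    /// Fails if the length is not a multiple of 4.
    pub fn try_from_bytes(bytes: &'s [u8]) -> Option<Self> {
        if bytes.len() % 4 == 0 {
            Some(U32View { bytes })
        } else {
            None
        }
    }

    /// Number of `u32` elements in the view.
    pub fn len(&self) -> usize {
        self.bytes.len() / 4
    }

    /// Decodes one element on access, so the buffer needs no particular alignment.
    pub fn get(&self, index: usize) -> Option<u32> {
        let start = index.checked_mul(4)?;
        let end = start.checked_add(4)?;
        let chunk = self.bytes.get(start..end)?;
        Some(u32::from_le_bytes(<[u8; 4]>::try_from(chunk).ok()?))
    }
}
```

Construction only checks the length, and elements are decoded lazily, so building the view costs no allocation regardless of how many elements the buffer holds.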

### Conventions for strings in structs :: suggested

Main issue: [#113](https://github.com/unicode-org/icu4x/issues/113)
@@ -654,6 +670,20 @@ struct MyStructOptions {
}
```

### Pre-parsed fields (exotic types) :: suggested

Main issue: [#523](https://github.com/unicode-org/icu4x/issues/523)

Data in memory should be fully parsed and ready to use. For example, if a data struct contains a datetime pattern, that pattern should be represented as a `Pattern`, not as a string. We call such pre-parsed types *exotic types*.

Keep the following in mind when using exotic types:

1. **Stability:** Since exotic types become part of the serialization format of the data struct, their serialized form must remain stable, according to the data struct versioning requirements discussed in [data_pipeline.md](../design/data_pipeline.md).
2. **Zero-Copy:** If the exotic type involves variable-length data (like a string or a vector), it must also support zero-copy deserialization, as described above. This means that such an exotic type must have a lifetime parameter and internal `Cow`s or `ZeroVec`s for data storage.
3. **Data Integrity:** In most cases, it is insufficient to auto-derive `serde::Deserialize` on an exotic type. Deserialization must validate the data so that the internal invariants of the exotic type are preserved.

If an exotic type cannot satisfy these requirements, use a standard type instead, but make sure that it requires minimal parsing and post-processing.
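
The zero-copy and data-integrity points can be sketched together as follows. `FieldPattern`, `PatternError`, and the set of allowed field symbols are hypothetical and far simpler than a real datetime `Pattern`; the point is only that construction from untrusted input goes through a validating path rather than a derived one. In a Serde-based struct, the equivalent check would presumably live in a hand-written `Deserialize` impl or behind a `try_from` conversion.

```rust
use std::borrow::Cow;
use std::convert::TryFrom;

/// Hypothetical exotic type, for illustration only.
/// Invariant: non-empty, and every ASCII letter is a known field symbol.
#[derive(Debug, PartialEq)]
pub struct FieldPattern<'s>(Cow<'s, str>);

/// Hypothetical error type describing which invariant was violated.
#[derive(Debug, PartialEq)]
pub enum PatternError {
    Empty,
    UnknownField(char),
}

impl<'s> TryFrom<&'s str> for FieldPattern<'s> {
    type Error = PatternError;

    /// Validates the raw string, then borrows it instead of copying.
    fn try_from(raw: &'s str) -> Result<Self, Self::Error> {
        if raw.is_empty() {
            return Err(PatternError::Empty);
        }
        // Assumed symbol set for this sketch: y, M, d, H, m, s.
        let unknown = raw
            .chars()
            .find(|c| c.is_ascii_alphabetic() && !"yMdHms".contains(*c));
        if let Some(bad) = unknown {
            return Err(PatternError::UnknownField(bad));
        }
        Ok(FieldPattern(Cow::Borrowed(raw)))
    }
}

impl<'s> FieldPattern<'s> {
    /// Returns the validated pattern text.
    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```

Since the only public constructor validates its input, code that receives a `FieldPattern` can rely on the invariant without re-checking it.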

# Error Handling

See also the [Error Handling](https://doc.rust-lang.org/book/ch09-00-error-handling.html) chapter in the Rust Book.