Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cache,model)!: parse image hashes #1405

Merged
merged 18 commits into from
Jan 1, 2022
Merged

feat(cache,model)!: parse image hashes #1405

merged 18 commits into from
Jan 1, 2022

Conversation

zeylahellyer
Copy link
Member

Parse image hashes as 128 bit integers. Image hashes from Discord are base 16, 32 character length strings, and so we can reliably parse and fit them into a 128 bit integer, with an additional bit (padded to 8 bits) to store whether the image hash is animated, totalling 17 bytes to store an image hash. This is an alternative over storing image hashes as heap allocated Strings, which are stored as a pointer (usize), a length (usize), and capacity (usize), totalling 24 bytes on most systems, in addition to the value on the heap. This will give us a win in terms of memory usage.

In terms of CPU usage, we do lose out a little by parsing image hashes compared to simply allocating the content:

imagehash string        time:   [191.64 ns 191.94 ns 192.26 ns]
imagehash parse         time:   [271.75 ns 271.84 ns 271.96 ns]

This can likely be improved in a later PR by improving optimizations in the primary parsing loop.

In total, this removes 45 Strings from the cache and model crates, effectively removing about 1/7th of Strings in the model crate alone.

The API of this looks like:

/// Efficient storage for Discord image hashes.
 #[derive(Clone, Copy, Debug, Eq, Hash, PartialEq)]
struct ImageHash {
    /// Instantiate a new hash from its raw parts.
    const fn new(bytes: [u8; 16], animated: bool) -> Self;

    /// Parse an image hash into an efficient integer-based storage.
    const fn parse(value: &[u8]) -> Result<Self, ImageHashParseError>;

    /// Efficient packed bytes of the hash.
    const fn bytes(self) -> [u8; 16];

    /// Whether the hash is for an animated image.
    const fn is_animated(self) -> bool;

    /// Create an iterator over the nibbles of the hexadecimal image hash.
    const fn nibbles(self) -> Nibbles;
}

impl<'de> Deserialize<'de> for ImageHash;

impl Display for ImageHash;

impl FromStr for ImageHash;

impl Serialize for ImageHash;

impl TryFrom<&'_ [u8]> for ImageHash;

impl TryFrom<&'_ str> for ImageHash;

/// Iterator over the nibbles of a hexadecimal image hash.
#[derive(Debug)]
struct Nibbles;

impl DoubleEndedIterator for Nibbles;

impl ExactSizeIterator for Nibbles;

impl Iterator for Nibbles;

Parse image hashes as 128 bit integers. Image hashes from Discord are
base 16, 32 character length strings, and so we can reliably parse and
fit them into a 128 bit integer, with an additional bit (padded to 8
bits) to store whether the image hash is animated, totalling 17 bytes to
store an image hash. This is an alternative over storing image hashes as
heap allocated Strings, which are stored as a pointer (usize), a length
(usize), and capacity (usize), totalling 24 bytes on most systems, in
addition to the value on the heap. This will give us a win in terms of
memory usage.

In terms of CPU usage, we do lose out a little by parsing image hashes
compared to simply allocating the content:

```
imagehash string        time:   [191.64 ns 191.94 ns 192.26 ns]
imagehash parse         time:   [271.75 ns 271.84 ns 271.96 ns]
```

This can likely be improved in a later PR by improving optimizations in
the primary parsing loop.

In total, this removes 45 Strings from the cache and model crates,
effectively removing about 1/7th of Strings in the model crate alone.

The API of this looks like:

```rust
/// Efficient storage for Discord image hashes.
 #[derive(Clone, Copy, Debug, Eq, Hash, PartialEq)]
struct ImageHash {
    /// Instantiate a new hash from its raw parts.
    const fn new(bytes: [u8; 16], animated: bool) -> Self;

    /// Parse an image hash into an efficient integer-based storage.
    const fn parse(value: &[u8]) -> Result<Self, ImageHashParseError>;

    /// Efficient packed bytes of the hash.
    const fn bytes(self) -> [u8; 16];

    /// Whether the hash is for an animated image.
    const fn is_animated(self) -> bool;

    /// Create an iterator over the nibbles of the hexadecimal image hash.
    const fn nibbles(self) -> Nibbles;
}

impl<'de> Deserialize<'de> for ImageHash;

impl Display for ImageHash;

impl FromStr for ImageHash;

impl Serialize for ImageHash;

impl TryFrom<&'_ [u8]> for ImageHash;

impl TryFrom<&'_ str> for ImageHash;

/// Iterator over the nibbles of a hexadecimal image hash.
 #[derive(Debug)]
struct Nibbles;

impl DoubleEndedIterator for Nibbles;

impl ExactSizeIterator for Nibbles;

impl Iterator for Nibbles;
```

BREAKING CHANGES: cache and model fields and methods that are or return
image hashes are now `twilight_model::util::ImageHash`es instead of
Strings.

Signed-off-by: Zeyla Hellyer <[email protected]>
Signed-off-by: Zeyla Hellyer <[email protected]>
Signed-off-by: Zeyla Hellyer <[email protected]>
model/src/util/image_hash.rs Outdated Show resolved Hide resolved
model/src/util/image_hash.rs Outdated Show resolved Hide resolved
model/src/util/image_hash.rs Outdated Show resolved Hide resolved
Signed-off-by: Zeyla Hellyer <[email protected]>
Signed-off-by: Zeyla Hellyer <[email protected]>
Signed-off-by: Zeyla Hellyer <[email protected]>
@zeylahellyer zeylahellyer added c-cache Affects the cache crate c-model Affects the model crate m-breaking change Breaks the public API. t-feature Addition of a new feature labels Dec 31, 2021
Signed-off-by: Zeyla Hellyer <[email protected]>
Signed-off-by: Zeyla Hellyer <[email protected]>
Erk-
Erk- previously requested changes Dec 31, 2021
model/src/util/image_hash.rs Outdated Show resolved Hide resolved
model/src/util/image_hash.rs Show resolved Hide resolved
Signed-off-by: Zeyla Hellyer <[email protected]>
Signed-off-by: Zeyla Hellyer <[email protected]>
Signed-off-by: Zeyla Hellyer <[email protected]>
Signed-off-by: Zeyla Hellyer <[email protected]>
Signed-off-by: Zeyla Hellyer <[email protected]>
@github-actions github-actions bot added c-all Affects all crates or the project as a whole c-command-parser c-gateway Affects the gateway crate c-http Affects the http crate c-lavalink Affects the lavalink crate c-standby Affects the standby crate c-util Affects the util crate c-validate Affects the validate crate t-ci Anything to do with CI. labels Dec 31, 2021
@zeylahellyer zeylahellyer linked an issue Dec 31, 2021 that may be closed by this pull request
@github-actions github-actions bot removed c-http Affects the http crate c-command-parser c-util Affects the util crate c-lavalink Affects the lavalink crate c-standby Affects the standby crate c-gateway Affects the gateway crate c-validate Affects the validate crate labels Dec 31, 2021
@zeylahellyer zeylahellyer merged commit f4bb1d9 into twilight-rs:next Jan 1, 2022
@zeylahellyer zeylahellyer deleted the feat-image-hash branch January 1, 2022 19:20
7596ff added a commit that referenced this pull request Jan 23, 2022
Changes

All types and method signatures have been updated to use the new `Id<T>` syntax
([#1260] - [@zeylahellyer]).

The MSRV has been updated to 1.57 ([#1402] - [@zeylahellyer]).

Methods that return an image hash, such as `CachedGuild::banner`, now return an
`ImageHash` instead of a string ([#1405] - [@zeylahellyer]).

The Rust edition has been updated to 2021 ([#1412] - [@vilgotf]).

`UpdateCache` trait is now sealed ([#1431] - [@vilgotf]).

[#1260]: #1260
[#1402]: #1402
[#1405]: #1405
[#1412]: #1412
[#1431]: #1431

[@vilgotf]: https://github.com/vilgotf
[@zeylahellyer]: https://github.com/zeylahellyer
7596ff added a commit that referenced this pull request Jan 23, 2022
`Id<T>`

IDs are now a unified type (`Id`) with marker generics
(`ApplicationMarker`, ...) ([#1260] - [@zeylahellyer]). The new type
implements all of what each type used to implement, as well as
`FromStr`, `TryFrom<u64>`, and `TryFrom<i64>`, and others. `Id::cast`
aids in converting between IDs without copying. See the PR and the
documentation for more details.

Additions

Support scheduled events ([#1347] - [@7596ff]). Adds the following
types: `EntityMetadata`, `EntityType`, `GuildScheduledEventUser`,
`GuildScheduledEvent`, `PrivacyLevel`, and `Status`.

Changes

All types and method signatures have been updated to use the new `Id<T>`
syntax ([#1260] - [@zeylahellyer]).

The MSRV has been updated to 1.57 ([#1402] - [@zeylahellyer]).

Image hashes are now parsed and stored in a more efficient struct,
rather than deserializing as a `String` ([#1405] - [@zeylahellyer]).

The Rust edition has been updated to 2021 ([#1412] - [@vilgotf]).

`StageInstance::discoverable_disabled` and `PrivacyLevel::Public` have
been removed, as public stage instances are no longer supported ([#1479]
- [@itohatweb]).

`GuildWidget::channel_id` is now optional ([#1480] - [@itohatweb]).

[#1260]: #1260
[#1402]: #1402
[#1405]: #1405
[#1412]: #1412
[#1479]: #1479
[#1480]: #1480

[@7596ff]: https://github.com/7596ff
[@itohatweb]: https://github.com/itohatweb
[@vilgotf]: https://github.com/vilgotf
[@zeylahellyer]: https://github.com/zeylahellyer
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c-all Affects all crates or the project as a whole c-cache Affects the cache crate c-model Affects the model crate m-breaking change Breaks the public API. t-ci Anything to do with CI. t-feature Addition of a new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parse image hashes for storage efficiency
5 participants