Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider supporting multiple hash functions #5

Closed
michaelwoerister opened this issue Jun 21, 2024 · 3 comments · Fixed by #8
Closed

Consider supporting multiple hash functions #5

michaelwoerister opened this issue Jun 21, 2024 · 3 comments · Fixed by #8

Comments

@michaelwoerister
Copy link
Member

The current StableHasher implementation unconditionally uses SipHash13, which is the algorithm of choice for incremental compilation fingerprinting in the Rust compiler. However, for other use cases different hash functions with stronger guarantees, such as BLAKE3, are often preferable.

StableHasher could therefore be generic of the hash algorithm used, e.g.:

/// Trait for retrieving the result of the stable hashing operation.
pub trait StableHasherResult<const W: usize>: Sized {
    fn finish(hash: [u64; W]) -> Self;
}

/// The hash algorithm used
pub trait HashImpl<const W: usize>: Hasher + Debug + Default {
    fn short_write<const LEN: usize>(&mut self, bytes: [u8; LEN]);
    fn finish(self) -> [u64; W];
}

/// When hashing something that ends up affecting properties like symbol names,
/// we want these symbol names to be calculated independently of other factors
/// like what architecture you're compiling *from*.
///
/// To that end we always convert integers to little-endian format before
/// hashing and the architecture dependent `isize` and `usize` types are
/// extended to 64 bits if needed.
#[derive(Debug)]
pub struct StableHasher<const W: usize, H: HashImpl<W>> {
    state: H,
}

See here for a quick and dirty POC impl: https://github.com/michaelwoerister/rustc-stable-hash/tree/generic_hash
I've been using hash: [u64; W] just for simplicity. It might make more sense to make that hash: [u8; W] to get rid of any endianess ambiguities.

Now that StableHasher is available outside the compiler, it makes sense to provide different hash algorithms and to make it clear that SipHash13 is not a generically good choice.

@Urgau
Copy link
Member

Urgau commented Jun 30, 2024

I really like the idea. StableHasher is already a somewhat generic.

@michaelwoerister What's the reason for using a const generic const W: usize and not a associated type?

I'm thinking of something like this, which is less verbose and more flexible.

/// Trait for retrieving the result of the stable hashing operation.
pub trait StableHasherResult<H: HashImpl>: Sized {
    fn finish(hash: H::Output) -> Self;
}

/// The hash algorithm used
pub trait HashImpl: Hasher + Debug + Default {
    type Output;

    fn short_write<const LEN: usize>(&mut self, bytes: [u8; LEN]);
    fn finish(&self) -> Self::Output;
}


#[derive(Debug)]
pub struct StableHasher<H: HashImpl> {
    state: H,
}

@michaelwoerister
Copy link
Member Author

Yes, an associated type makes more sense than a generic parameter.

That StableHasherResult would refer to a specific HashImpl seems a bit unfortunate though.

@Urgau
Copy link
Member

Urgau commented Jul 1, 2024

And it doesn't even need to be the case. StableHasherResult could just directly take the hash type as a generic or associated type. Which seems even cleaner.

/// Trait for retrieving the result of the stable hashing operation.
pub trait StableHasherResult<H>: Sized {
    fn finish(hash: H) -> Self;
}

/// The hash algorithm used
pub trait HashImpl: Hasher + Debug + Default {
    type Output;

    fn short_write<const LEN: usize>(&mut self, bytes: [u8; LEN]);
    fn finish(self) -> Self::Output;
}


#[derive(Debug)]
pub struct StableHasher<H: HashImpl> {
    state: H,
}

impl<H: HashImpl> StableHasher<H> {
    pub fn finish<W: StableHasherResult<H::Output>>(self) -> W {
        W::finish(self.state.finish())
    }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants