Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Secure finishing for Hasher #1768

Open
Ericson2314 opened this issue Oct 10, 2016 · 14 comments
Open

Secure finishing for Hasher #1768

Ericson2314 opened this issue Oct 10, 2016 · 14 comments
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.

Comments

@Ericson2314
Copy link
Contributor

Ericson2314 commented Oct 10, 2016

With sufficiently expressive associated constants (including rust-lang/rust#34344), it would be nice to provide scaffolding for secure hashing without allocation by extending Hasher like:

pub trait Hasher {
    const DigestSize: usize = 8;

    /// Completes a round of hashing, producing the untruncated output hash generated.
    fn untruncated_finish(&self) -> [u8; Self::DigestSize] {
        self.finish()
    }

    ...
}

[I think specialization might also be required to write that default method; I forget what syntax is needed.]

@burdges
Copy link

burdges commented Oct 12, 2016

Just some thoughts :

I think finishing semantics can vary among hash functions. If I understand untruncated_finish, then it might often require coping the state, not such a big deal, but finish should not usually call untruncated_finish. And hash functions might've different behavior with respect to whether more input can be accepted after doing some output. Also, some hash functions have variable length output, like Shake for SHA3.

If you want a hash trait usable for protocols, not just local hash maps, then endianness matters. Also, it's unclear how/if a hash trait, or many writer-like traits, can really consume a usize/isize, well maybe you could map them to u64/i64, assuming they'll never exceed that value. That's okay for hashing, but a general writer traits cannot do that due to the corresponding reader trait needing to go the other way.

I'd propose using separate traits for hash function input and output, so the input trait can be very general, live in core, maybe handle issues like endianness or usize/isize consistently, etc., while the output traits could diversify slightly according to the hash function, and mostly live in crates besides core or std. At least that's what we seemingly wanted in DaGenix/rust-crypto#333

@Ericson2314
Copy link
Contributor Author

@burdges Good points.

For split input I wish Read/Write had an associated error type, so Write<Error=!> could be used. Similarly, https://github.com/BurntSushi/byteorder/blob/master/src/lib.rs takes care of endianness for any Write and could be used for hashing too. The lack of a valid usize/isize is fine with me because anything of those types won't have cross-platform meaning anyways and shouldn't be used in a protocol.

I didn't know there was so much diversity in finalization.

@burdges
Copy link

burdges commented Oct 13, 2016

I suppose SHA3's Shake output mode could be handled by returning an associated reader-like type or something, so maybe one does only need the associated type. We should therefore not need to think about output traits now, except that you need an associated type, not just an associated constant.

It's true hash functions normally cannot fail, so the error type is irksome. I know a Result<T,()> is equivalent to an Option<T> but I donno if a Result<T,!> is equivalent to just a wrapped T, which one wants for that case. It's not crazy to have a reader-like trait that cannot fail though.

Argon2 would often be configured to allocate many megabytes, or even gigabytes, of memory though, so presumably one wants an Argon2 builder function that does the allocation, and can fail, but returns an object that consumes input without the possibility of failure.

I suspect there are cleaner solutions to endianness than the ByteOrder create with slices and unwrap everywhere. I hacked up the endian-types crate here to do conversions to/from fixed length array types [u8; n]. I'll redo it more cleanly though.

I understand why reader traits do not have endianness as a type parameter, well usually they're outputting to something like JSON that does not care. I suspect hashing might benefit from making endianness a type parameter of the hash input trait.

@Ericson2314
Copy link
Contributor Author

Result<T,!> is indeed equivalent to !. That's one of the reasons ! is so useful!

I'll look at your links in more detail tomorow, but see jethrogb/rust-core_io#3 for me trying to get maintainable better Read/Write.

@nrc nrc added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Oct 17, 2016
@burdges
Copy link

burdges commented Oct 18, 2016

At least there is an explanation for my worries around fixed size arrays in #1772

@burdges
Copy link

burdges commented Oct 23, 2016

I suppose the byteorder crate's tendency to specify endianness with each write or read is actually incorrect. Instead, endianness should be embedded in the type of the hasher and its builder, maybe via some wrapper hasher and/or builder that sets the endianness.

@Ericson2314
Copy link
Contributor Author

That's easily done too. With Hash <=> Write<Error=!>, it would be hard to prevent the use of edniannesss overrides, but certainly a generic wrapper type that sets the default would be easy.

@burdges
Copy link

burdges commented Oct 23, 2016

I'd think hashers should take native endianness by default, given that they're frequently used for internal data structures like HashMap. If you are hashing for a protocol, then you should use a wrapper that sets the endianness by calling .to_be/.to_le everywhere.

@burdges
Copy link

burdges commented Nov 14, 2016

Is there anything wrong with just adding an associated type?

pub trait Hasher {
    type Finish = u64; // Default usable by HashMap

    fn finish(&self) -> Finish;
    fn finish_mut(&mut self) -> Finish {
        finish(self)
    }
    ...
}

It might break some code around Hasher implementations, but not necessarily if the associated type can be defaulted to u64. I'd think hash functions that need multiple return types can simply implement create different types for each return type. In most cases, you must know the return type when you start hashing, so that's no biggie.

There is another issue that some hash functions cannot, or should not, copy their internal state, so you likely want this finish_mut(&mut self) -> Finish that mutably borrows the hash and wrecks it as part of finishing. We need this because modern password hashing algorithms like Argon2 hog ram. We could even add a type FinishMut = Finish; to make Finish = ! possible, but afaik anything like Argon2 that uses enough ram to want that could be configured to use little enough ram that copying remains possible. And finish could always just panic if you really should never call it.

In principle, one could maybe avoid the associated type as follows, but doing so seems worse.

pub trait HasherFinish<H> {
    fn finish(&H) -> Self;
    fn finish_mut(&mut H) -> Self;
}
pub trait Hasher {
    fn  finish<T>(&self) -> T where T: HasherFinish<Self> {
        HasherFinish::finish(self)
    }
    fn  finish_mut<T>(&mut self) -> T where T: HasherFinish<Self>{
        HasherFinish::finish_mut(self)
    }
    ...
}

Initially I was thinking about finish(&self) -> impl HasherFinish<Self> even but that's forbidden.

@burdges
Copy link

burdges commented Nov 30, 2016

Just posted a little wrapper that addresses the endianness issue. Interestingly it would not work with the input_hashable approached suggested here, so maybe that's another argument for an associated type.

@burdges
Copy link

burdges commented Dec 7, 2016

Appears an associated Output type was removed in #823 but only coincidentally as part of simplifying the Hash trait. I donno if making Output default to u64 resolves the reservations expressed there.

If the associated type is actually painful, even with the default, then an approach like

pub trait HasherIn {
    fn write(&mut self, bytes: &[u8]);

    fn write_u8(&mut self, i: u8) { ... }
    fn write_u16(&mut self, i: u16) { ... }
    fn write_u32(&mut self, i: u32) { ... }
    fn write_u64(&mut self, i: u64) { ... }
    fn write_usize(&mut self, i: usize) { ... }
    fn write_i8(&mut self, i: i8) { ... }
    fn write_i16(&mut self, i: i16) { ... }
    fn write_i32(&mut self, i: i32) { ... }
    fn write_i64(&mut self, i: i64) { ... }
    fn write_isize(&mut self, i: isize) { ... }
}

pub trait Hasher : HasherIn {
    fn finish(&self) {.. }
}

likely restricts the breakage to the Hasher ecosystem, meaning Hashers and code like <T as Hasher>::write(..).

If nothing like that works then crypto code can always import some hashish crate that does roughly :

pub trait Digester {
    fn write(&mut self, bytes: &[u8]);

    fn write_u8(&mut self, i: u8) { ... }
    fn write_u16(&mut self, i: u16) { ... }
    fn write_u32(&mut self, i: u32) { ... }
    fn write_u64(&mut self, i: u64) { ... }
    fn write_usize(&mut self, i: usize) { ... }
    fn write_i8(&mut self, i: i8) { ... }
    fn write_i16(&mut self, i: i16) { ... }
    fn write_i32(&mut self, i: i32) { ... }
    fn write_i64(&mut self, i: i64) { ... }
    fn write_isize(&mut self, i: isize) { ... }
}

impl<T> Hasher for T where T: Digester {
    fn finish(&self) { panic!("Hasher::finish()"); }
    fn write(&mut self, bytes: &[u8])  { <T as Digester>::write(self,bytes); }

    fn write_u8(&mut self, i: u8) { <T as Digester>::write_u8(self,i); }
    fn write_u16(&mut self, i: u16) { <T as Digester>::write_u16(self,i); }
    fn write_u32(&mut self, i: u32) { <T as Digester>::write_u32(self,i); }
    fn write_u64(&mut self, i: u64) { <T as Digester>::write_u64(self,i); }
    fn write_usize(&mut self, i: usize) { <T as Digester>::write_usize(self,i); }
    fn write_i8(&mut self, i: i8) { <T as Digester>::write_i8(self,i); }
    fn write_i16(&mut self, i: i16) { <T as Digester>::write_i16(self,i); }
    fn write_i32(&mut self, i: i32) { <T as Digester>::write_i32(self,i); }
    fn write_i64(&mut self, i: i64) { <T as Digester>::write_i64(self,i); }
    fn write_isize(&mut self, i: isize) { <T as Digester>::write_isize(self,i); }
}

@eddiew
Copy link

eddiew commented Jan 23, 2019

+1. Restricting Hasher to u64 output makes it unusable for anything cryptography-related

@burdges
Copy link

burdges commented Jan 24, 2019

Just as an aside, we're unsure exactly what a secure hasher actually looks like, much less what a generic one looks like.

Almost all cryptographic hashing interfaces like digest really suck, due to being modeled on NIST's obsolete hash function competition requirements. In particular, these interfaces encourage wasting enormous amounts of time micro-optimizing the domain separation.

At the lower level, we should abandon all existing hash function and stream cipher based interfaces in favor of mapping permutations into STROBE-like interfaces, but actually doing a zero cost interface for STROBE looks painful, certainly without const generics.

Above this, merlin currently provides the nicest interface for discrete log based signatures and zero knowledge proofs. In essence, you write traits that map your types data into hashing like calls that support good domain separation. Isolating the domain separation into tiers like this prevents micro-optimizations that waste protocol designer time.

Also STROBE-like constructions could likely "double" their rate by feeding the domain separators into hash state separately, so we should eventually outperform the legacy interfaces.

Above this, we want domain separation inferred so the interface should look more like serde, etc.

@Ericson2314
Copy link
Contributor Author

Well regardless of the algorithm, we need to be able to get more bits of information out of it.

It's been a while since I thought of this. Are enough language features stable for this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

No branches or pull requests

4 participants