Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shake output should be variable length #14

Closed
burdges opened this issue Feb 13, 2017 · 18 comments
Closed

Shake output should be variable length #14

burdges opened this issue Feb 13, 2017 · 18 comments

Comments

@burdges
Copy link

burdges commented Feb 13, 2017

Sha3's shake mode should provides variable length output determined at runtime, but it's currently a user defined type level numeric. I'd think the OutputSize type of the Digest trait should be replaced by a Output type to fix this, but I have no looked into doing it.

@newpavlov
Copy link
Member

newpavlov commented Feb 13, 2017

You are right to bring this issue. I've known about it and planned to fix it later, but I am not completely sure what approach to use.

One way is to create three traits, which I imagined would look somewhat like this:

pub trait Digest {
    type BlockSize: ArrayLength<u8>;
    fn input(&mut self, input: &[u8]);
}

pub trait FixedOutput : Digest {
    type OutputSize: ArrayLength<u8>;
    fn result(self) -> GenericArray<u8, Self::OutputSize>;
}

pub trait VariableOutput : Digest {
    fn result(self, &mut [u8]);
}

Here length of the passed slice will be length of the output. Disadvantage here is need to import two traits instead of one for all users.

Alternative approach would be to create VariableDigest and FixedDigest traits:

pub trait FixedDigest : Digest {
    type OutputSize: ArrayLength<u8>;

    fn input(&mut self, input: &[u8]) {
       Digest::input(self, input);
    }

    fn result(self) -> GenericArray<u8, Self::OutputSize>;
}

pub trait VariableDigest : Digest {
    fn input(&mut self, input: &[u8]) {
       Digest::input(self, input);
    }

    fn result(self, &mut [u8]);
}

It would be a bit more ergonomic to use for most of the cases, but solution is a less elegant. Also it can cause problems and be unergonomic in case if VariableDigest and FixedDigest will be needed at the same time. It could be solved by renaming methods to variable_input, variable_result, but I like this solution even less.

Currently I am leaning towards the second solution without method renaming.

I wanted to experiment with it after fixing #5 based on blake2-rfc. But of course pull requests are also welcomed. ;)

@burdges
Copy link
Author

burdges commented Feb 13, 2017

Is there some restriction of the generic_array crate causing this? I forget if you can write Self::Output == GenericArray<u8,N> in a where clause. If not, you could make a trait reflect GenericArray somehow :

pub trait Digest {
    type Output;
    type BlockSize: ArrayLength<u8>;

    /// Digest input data. This method can be called repeatedly
    /// for use with streaming messages.
    fn input(&mut self, input: &[u8]);

    /// Retrieve the digest result. This method consumes digest instance.
    fn result(self) -> Self::Output;

    /// Get the block size in bytes.
    fn block_bytes(&self) -> usize { Self::BlockSize::to_usize() }

    /// Get the block size in bits.
    fn block_bits(&self) -> usize { 8 * Self::BlockSize::to_usize() }

    /// Get the output size in bytes.
    default fn output_bytes(&self) -> usize;

    /// Get the output size in bits.
    default fn output_bits(&self) -> usize { 8 * self.output_bytes() }
}

pub trait IsAGenericArray { 
    type Element;
    type Length;
}
impl<T,N> IsAGenericArray for GenericArray<T,N> {
    type Element = T;
    type Length = N;
}

impl<T> Digest for T where T: Digest, Self::Output: IsAGenericArray {
    default fn output_bytes(&self) -> usize {
        Self::Output::Length::to_usize() * ::std::mem::size_of::<Self::Output::Element>() 
    }
}

Also, these four methods for the block or output size in bytes and bits only return a runtime representation anyways, so maybe you want them to be type level numbers in a separate DigetFixed trait and drop them from the Digest trait entirely. At present, rust-crypto uses them as runtime parameters in hmac, hkdf, and pbkdf2.rs, and hmac is itself used by hkdf. I could do the same in lioness probably, but would anyone make say an hmac construction that needs runtime output size though?

@newpavlov
Copy link
Member

newpavlov commented Feb 13, 2017

AFAIK when using generic_array (and typenum in general) all related expressions must be evaluated at compile time, so size of output (GenericArray<u8, Self::OutputSize>) can not be changed at runtime. With generic-array compiler creates monomorphized functions for each used size in case of generic parameter, as usual with types. So when using current Digest you get stack allocated array, size of which must be known at compile time. (at least until we'll get alloca support) Thus I don't think your code can be compiled. (if I understood it correctly)

As for dropping type level numbers from Digest, in my opinion the default mode for hashes is a fixed one, variable length functions more an exception than rule. (especially variable at runtime)

Regarding methods block_* and output_* they are excessive and I thought about removing them altogether. But BlockSize used later in HMAC for creating internal GenericArray buffer, this is why it must be in the form of type parameter. (I think the same will be done for hkdf and pbkdf2 when I get to them) As for runtime variable output size for HMAC, I honestly don't know if it will be usefull for someone, but if we can easily do it generically, why impose unnecessary restrictions?

@burdges
Copy link
Author

burdges commented Feb 13, 2017

There was a type-o in my comment, oops. I'd meant that the Output type should not be restricted in the trait.

I'd also ignored that result() -> Output though so you cannot take Output to be [u8] anyways. You could make it work with result(&mut Output) like in rust-crypto, hence your multi-trait ideas.

Another approach might be :

pub trait Digest {
    type Output;
    type BlockSize: ArrayLength<u8>;
    fn input(&mut self, input: &[u8]);
    fn finalize(self) -> Output;
}

pub trait DigestOutput {
    fn result(&mut self, &mut [u8]);
}

Any conventional mode would simply consume the hasher with finalize and return the fixed length Output. Any mode offering variable length output would transform itself like

pub struct Shake<..> { _sha3: Sha3<..> }
pub struct ShakeOutput<..>{ _sha3: Sha3<..> }

impl Digest for Shake<..> {
    type Output = ShakeOutput<..>;
    ...
    fn finalize(self) { ShakeOutput { _sha3: self._sha3 } }
}

In this, there is no allowance for modes that intermix input and output operations however, not sure if that's allowed by any common modes. It's possible one should always drop down to the underlying permutation function for that sort of thing.

@newpavlov
Copy link
Member

newpavlov commented Feb 13, 2017

With DigestOutput you force user to copy data from result, instead of immediately using stack allocated array as result in a fixed case scenario. So you'll have to first create an array, get result, copy data from result to this preallocated array, that gives three operations instead of one, and this is not mentioning potential runtime errors which before that were enforced at compile time by the type system. Also it adds a bit of boilerplate to each crate. In my opinion this approach introduces unnecessary complexities and has important disadvantages.

I will try to play around this problem a little more. Also it possible that someone else will come up with a good solution to it later.

@burdges
Copy link
Author

burdges commented Feb 13, 2017

Just to be clear, any normal mode would only satisfy Digest<Ouput=[u8; n]>, while only variable length modes would offer an output that satisfies DigestOutput. Anyways it's true there are various RFCs being iterated upon that'd simplify all this.

@burdges
Copy link
Author

burdges commented Feb 15, 2017

Are there any hash functions with modes that allow intermixed input after output?

@newpavlov
Copy link
Member

newpavlov commented Feb 15, 2017

Are there any hash functions with modes that allow intermixed input after output?

If I understood your question correctly, this use-case is not covered explicitly and as far as I remember all hashes in this repository have some kind of finalization, but you can take advantage of Copy/Clone trait by copying state before calling result().

@burdges
Copy link
Author

burdges commented Feb 15, 2017

Right. I realized OS PRNGs do this without cloning, but they pull randomness from various outside sources as they cannot know how much they'll be expected to produce at any given time. You could avoid that with a better permutation, but afaik cryptographers do not do so currently, if only because their constructions mirror the underlying mathematics. Anyways if anybody wants to do this in the future then they can make a special API. No need to worry about it here.

@burdges
Copy link
Author

burdges commented Feb 22, 2017

Just fyi, a degree of support for defining arrays whose length is given by an associated constant just landed rust-lang/rust#40008 It currently depends upon not using parameters rust-lang/rfcs#1915 (comment) but that's probably fine.

@newpavlov
Copy link
Member

But in this PR it's clearly written that:

don't get too excited, array lengths still can't depend on type parameters

Or am I missing something?

Also note that for example in case of groestl we use conditions on constants, so unfortunately until something like this will land we'll have to use typenum.

@burdges
Copy link
Author

burdges commented Feb 22, 2017

I've no idea how long before array lengths can be controlled by type parameters, but it's seemingly on the way, while const dependent types sound distant future if ever.

@newpavlov
Copy link
Member

I've updated crates to a new version of digest crate, which now includes traits Input, VariableOutput, FixedOutput and convenience wrapper Digest = Input + FixedOutput. You can see results in the digest-rework branch.

It works mostly fine, but does not fit for Groestl, which requires output size to be passed at hasher initialization. I think I'll just create new(n: usize) method for it and will implement VariableOutput which will accept only slices with length n.

@newpavlov
Copy link
Member

I've merged digest-rework into master and published new versions of the crates.

@burdges
Copy link
Author

burdges commented Apr 9, 2017

As an aside, I'm nervous about HMAC using a key the same size as BlockSize. It's clear that HMAC must know how much input older hashes need to fully replace their internal state, but what that means depends upon the hash, and may not have much to do with the actual BlockSize.

A prioir, I think BlockSize should be removed from the digest traits and a new trait HMACable should specify how much input the hash needs, maybe by just implementing the HMAC code directly. These newer hash functions like SHA3 should never need HMAC operation, so anyone who really wants it can do wrappers themselves. At the extreme, you'd never feel like Argon2 needed a gigabytes of input. ;)

@newpavlov
Copy link
Member

Hmac implementation follows the standard definition of the HMAC (e.g. RFC 2104) introduced for digest functions which operate on the blocks of data, and size of the block is used to define algorithm. So in my opinion generic implementation for digest function which have fixed block size and output size is a right approach.

Though you are right that it's not always sensible to use HMAC construct (e.g. for BLAKE2), in those cases MAC trait will be implemented directly on the digest struct. It will be done for BLAKE2 and probably for SHA-3 (using simple prepending of the key). I should not forget to add to documentation a discouragement of using HMAC in case if digest already implements MAC. But I don't think we need to artificially restrict ability to use HMAC on such functions.

A more philosophical note: as I see it, users of "low-level" RustCrypto crates should have a bare minimum understanding of that they are doing. It's a bit risky, but it's an intentional decision to open a path for a more flexible ecosystem. As a result, for example, it allows inclusion of compromised or experimental algorithms.

@burdges
Copy link
Author

burdges commented Apr 9, 2017

Yes, it makes sense to implement RFC 2104 where you have older hash functions, but one could still place BlockSize elsewhere besides in the Digest traits. Anyone who wants an HMAC-Blake2b can always make a wrapper and pick their own BlockSize.

I'm not suggesting this influences usability, safety, etc. It's just pure aesthetics for hash functions without any sensible BlockSize. :)

@newpavlov
Copy link
Member

Lets move this discussion into #20.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants