Replace hash_to_base PRG with HKDF-Expand. #141

Merged
merged 14 commits into from
Jul 6, 2019

Conversation

chris-wood
Collaborator

Addresses #137.

@chris-wood chris-wood requested review from kwantam, armfazh and samscott89 and removed request for kwantam and armfazh July 4, 2019 15:49
Collaborator

@kwantam kwantam left a comment

This is great.

It wasn't clear to me that it would be quite this easy to invoke HKDF, because I was thinking that the size of the PRK argument to HKDF was restricted to the output length of H, rather than the block size of H. But as far as I can tell, this mostly works for the hash functions we'd expect people to use (but see below).

We will have to stage this and #139 because the definition of m' changes in that PR. (EDIT: #139 is now merged, so we'll have to do some conflict resolution here.)

Assuming we go with the string "HASH-TO-CURVE" as in #139, the length of m' is 14 + H_output_len bytes. Just for sanity, let's see whether this works with the hash functions we might care to use:

  • 224-bit hashes: len(m') = 42 bytes
    • SHA2-224 block size = 64 bytes ✔️
    • SHA3-224 block size = 144 bytes ✔️
    • BLAKE2s224 block size = 64 bytes ✔️
  • 256-bit hashes: len(m') = 46 bytes
    • SHA2-256 block size = 64 bytes ✔️
    • SHA3-256 block size = 136 bytes ✔️
    • BLAKE2s256 block size = 64 bytes ✔️
    • BLAKE2b256 block size = 128 bytes ✔️
  • 384-bit hashes: len(m') = 62 bytes
    • SHA2-384 block size = 128 bytes ✔️
    • SHA3-384 block size = 104 bytes ✔️
    • BLAKE2b384 block size = 128 bytes ✔️
  • 512-bit hashes: len(m') = 78 bytes
    • SHA2-512 block size = 128 bytes ✔️
    • SHA3-512 block size = 72 bytes ❌ hmmmm
    • BLAKE2b512 block size = 128 bytes ✔️

So the only one that's a little weird is SHA3-512. Maybe we should consider replacing "HASH-TO-CURVE" with "H2CURVE", which is 6 bytes shorter and thus works with SHA3-512, or even just "H2C" as @samscott89 has suggested elsewhere.

(Note that len(m') > block size isn't fatal---it just requires, per RFC2104, hashing m' again to give an HMAC key that is shorter than the block length. We should probably avoid this.)
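
The block-size check above can be reproduced with Python's standard `hashlib`, which exposes `digest_size` and `block_size` for each of these functions. This is only a minimal sketch: the 14-byte prefix is taken from the comment above, and the names `PREFIX_LEN`, `HASHES`, and `key_fits_in_block` are illustrative, not part of the draft.

```python
import hashlib

# Taken from the comment above: len(m') = 14 + H_output_len bytes.
PREFIX_LEN = 14

# Constructors for the hash functions listed above (names are illustrative).
HASHES = {
    "SHA2-224": hashlib.sha224,
    "SHA3-224": hashlib.sha3_224,
    "BLAKE2s224": lambda: hashlib.blake2s(digest_size=28),
    "SHA2-256": hashlib.sha256,
    "SHA3-256": hashlib.sha3_256,
    "BLAKE2s256": lambda: hashlib.blake2s(digest_size=32),
    "BLAKE2b256": lambda: hashlib.blake2b(digest_size=32),
    "SHA2-384": hashlib.sha384,
    "SHA3-384": hashlib.sha3_384,
    "BLAKE2b384": lambda: hashlib.blake2b(digest_size=48),
    "SHA2-512": hashlib.sha512,
    "SHA3-512": hashlib.sha3_512,
    "BLAKE2b512": lambda: hashlib.blake2b(digest_size=64),
}


def key_fits_in_block(h):
    """True if m' (prefix plus one digest) is no longer than one block of h."""
    return PREFIX_LEN + h.digest_size <= h.block_size


results = {name: key_fits_in_block(make()) for name, make in HASHES.items()}

for name, ok in results.items():
    h = HASHES[name]()
    print(f"{name}: len(m') = {PREFIX_LEN + h.digest_size}, "
          f"block = {h.block_size}, fits = {ok}")
```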

@kwantam
Collaborator

kwantam commented Jul 4, 2019

An alternative suggestion:

What if we did

m' = H(msg)
for i in (1, ..., m):
  info = "H2C" || I2OSP(ctr, 1) || I2OSP(i, 1)
  t = HKDF-Expand-H(m', info, L)
  e_i = OS2IP(t) mod p
return u = (e_1, ..., e_m)

This guarantees that the HMAC key (m') is shorter than the block length. Moreover, the info argument to HKDF-Expand is 5 bytes, which guarantees the minimum possible number of H invocations in HKDF-Expand (4 per iteration, because of HMAC) for the major hash functions I'm aware of (see below).

Also, aesthetically this is slightly nicer, since it moves all of the hash-to-curve--specific domain separation pieces ("H2C", ctr, i) into one place.

Finally, this change sort of anticipates the suggestion in my next comment.
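
For concreteness, the pseudocode above might look roughly like this in Python. This is a sketch under stated assumptions, not the draft's reference code: `hkdf_expand` is a hand-written HKDF-Expand following RFC 5869, the default hash (SHA-256) is an arbitrary choice, and `L` is computed from the formula quoted later in this thread, assuming `p` is an odd prime so that `ceil(log2(p))` equals `p.bit_length()`.

```python
import hashlib
import hmac


def hkdf_expand(prk, info, length, hash_name="sha256"):
    """HKDF-Expand as described in RFC 5869, built on the stdlib hmac module."""
    hash_len = hashlib.new(hash_name).digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output is too long for HKDF-Expand")
    okm, t, counter = b"", b"", 1
    while len(okm) < length:
        # T(i) = HMAC(PRK, T(i-1) || info || i)
        t = hmac.new(prk, t + info + bytes([counter]), hash_name).digest()
        okm += t
        counter += 1
    return okm[:length]


def hash_to_base(msg, ctr, p, m=1, k=128, hash_name="sha256"):
    """Sketch of the proposal above; parameter names are illustrative."""
    # L = ceil((ceil(log2(p)) + k) / 8), assuming p is an odd prime.
    L = (p.bit_length() + k + 7) // 8
    # m' = H(msg): always hash_len bytes, so shorter than one block.
    m_prime = hashlib.new(hash_name, msg).digest()
    u = []
    for i in range(1, m + 1):
        # info = "H2C" || I2OSP(ctr, 1) || I2OSP(i, 1), 5 bytes in total
        info = b"H2C" + ctr.to_bytes(1, "big") + i.to_bytes(1, "big")
        t = hkdf_expand(m_prime, info, L, hash_name)
        u.append(int.from_bytes(t, "big") % p)  # e_i = OS2IP(t) mod p
    return tuple(u)
```

Calling `hash_to_base(b"some message", 0, p=2**255 - 19)` returns a one-element tuple for a prime field; passing `m=2` returns two coordinates for a quadratic extension.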


Let's check to make sure that HMAC uses the minimum number of H invocations in all cases.

Recall that HMAC(k, msg) = H( (k XOR OPAD) || H( (k XOR IPAD) || msg ) ). For simplicity, I'm assuming that k is one block long (in reality, it's always padded or hashed-and-padded to that length, so this is a reasonable simplification).
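
As a sanity check on that definition, here is a small hand-rolled HMAC compared against Python's standard `hmac` module. It is only an illustration: `hmac_manual` is a hypothetical helper name, and it also shows the RFC 2104 rule mentioned earlier, where a key longer than one block is hashed first.

```python
import hashlib
import hmac


def hmac_manual(key, msg, hash_name="sha256"):
    """HMAC(k, msg) = H((k XOR OPAD) || H((k XOR IPAD) || msg))."""
    block_len = hashlib.new(hash_name).block_size
    if len(key) > block_len:
        # Per RFC 2104, over-long keys are hashed down first.
        key = hashlib.new(hash_name, key).digest()
    key = key.ljust(block_len, b"\x00")  # pad the key to one full block
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.new(hash_name, ipad + msg).digest()
    return hashlib.new(hash_name, opad + inner).digest()


short_key = b"k" * 32   # shorter than the 64-byte SHA-256 block
long_key = b"k" * 100   # longer than one block, so it gets hashed
same_short = hmac_manual(short_key, b"msg") == hmac.new(short_key, b"msg", "sha256").digest()
same_long = hmac_manual(long_key, b"msg") == hmac.new(long_key, b"msg", "sha256").digest()
```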

HKDF-Expand(k, info, L) in the worst case invokes

HMAC(k, H(something) || info || b)

where b is 1 byte long. So what we need to check is that H(something) || info || b is short enough for each hash function of interest.

  • SHA-2 adds at least 9 bytes (rounding up) of padding to its argument, so when H is a SHA-2 function, the argument to the inner invocation of H in the HMAC invocation in HKDF-Expand is block_len + hash_len + len(info) + 1 + 9 bytes long, and we want this value to be at most 2 * block_len. Worst case is SHA2-256, which has block_len = 64, hash_len = 32. In this case, len(info) must be at most 64 - 32 - 10 = 22 bytes. ✔️

  • SHA-3 adds at least 1 byte (rounding up) of padding to its argument, so when H is a SHA-3 function, the argument to the inner H invocation is block_len + hash_len + len(info) + 1 + 1 bytes long. Worst case is SHA3-512, which has block_len = 72, hash_len = 64. In this case, len(info) must be at most 72 - 64 - 2 = 6 bytes ✔️

  • BLAKE2 doesn't force padding, so when H is a BLAKE2 function, the argument to the inner H invocation is block_len + hash_len + len(info) + 1 bytes long. Worst case is BLAKE2s256, which has block_len = 64, hash_len = 32. In this case, len(info) must be at most 64 - 32 - 1 = 31 bytes ✔️

So it looks like "H2C" is preferred if we want to avoid another compression function invocation in the absolute worst case, which is SHA3-512.
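
The bounds above reduce to a single subtraction, which the following sketch restates. The block lengths, hash lengths, and minimum padding sizes are the ones given in the bullets; `max_info_len` and `info_len` are illustrative names.

```python
def max_info_len(block_len, hash_len, min_padding):
    """Largest len(info) that keeps the inner HMAC hash input within 2 blocks.

    Inner input: block_len (k XOR IPAD) + hash_len + len(info) + 1 + padding,
    which must be at most 2 * block_len.
    """
    return block_len - hash_len - 1 - min_padding


# Worst cases named in the bullets above.
WORST_CASES = {
    "SHA2-256": max_info_len(64, 32, 9),
    "SHA3-512": max_info_len(72, 64, 1),
    "BLAKE2s256": max_info_len(64, 32, 0),
}


def info_len(label):
    """len(label || I2OSP(ctr, 1) || I2OSP(i, 1))."""
    return len(label) + 2


for label in (b"H2C", b"H2CURVE"):
    fits = all(info_len(label) <= bound for bound in WORST_CASES.values())
    print(label.decode(), info_len(label), "fits all" if fits else "too long for some")
```

With these numbers, "H2C" gives a 5-byte info string, which fits under the SHA3-512 bound of 6, while "H2CURVE" gives 9 bytes and does not.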

@kwantam
Collaborator

kwantam commented Jul 4, 2019

Another question to consider: should we use HKDF-Extract to compute m'?

(Just spitballing here, not sure whether I like it or not. Also, I'm going to assume for concreteness that we're going with the suggested change in my prior comment. This could work either way, though.)

HKDF-Extract takes two arguments, salt and msg. In the spirit of @hoeteck's suggestion in #124 (and a suggestion from Dan out-of-band), we might require higher-level protocols to set the value for salt based on their domain separation string.

hash_to_base(msg, ctr)

Parameters:
- DSS, a domain separation string chosen according to the
  guidelines given in {{domain-separation}}.
- H, a cryptographic hash function.
- F, a finite field of characteristic p and order q = p^m.
- L = ceil((ceil(log2(p)) + k) / 8), where k is the security parameter
  of the cryptosystem (e.g., k = 128).
- HKDF-Extract-H is the HKDF-Extract function of RFC5869
  instantiated with hash function H.
- HKDF-Expand-H is the HKDF-Expand function of RFC5869
  instantiated with hash function H.

Inputs:
- msg is the message to hash.
- ctr is 0, 1, or 2.
  This is used to efficiently create independent
  instances of hash_to_base (see discussion above).

Output:
- u, an element in F.

Steps:
1. m' = HKDF-Extract-H(H(DSS), msg)
2. for i in (1, ..., m):
3.   info = "H2CURVE" || I2OSP(ctr, 1) || I2OSP(i, 1)
4.   t = HKDF-Expand-H(m', info, L)
5.   e_i = OS2IP(t) mod p
6. return u = (e_1, ..., e_m)

If DSS is fixed, H(DSS) can be precomputed to save one invocation of H. Also, this lets people use domain separation strings of arbitrary length with effectively no performance penalty.
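
A rough Python rendering of the steps above, to make the Extract-then-Expand flow concrete. This is a sketch, not the draft's reference implementation: `hkdf_extract` and `hkdf_expand` are hand-written from RFC 5869, SHA-256 is an arbitrary default, the parameter names are illustrative, and `p` is assumed to be an odd prime so that `ceil(log2(p))` equals `p.bit_length()`.

```python
import hashlib
import hmac


def hkdf_extract(salt, ikm, hash_name="sha256"):
    """HKDF-Extract from RFC 5869: PRK = HMAC-Hash(salt, IKM)."""
    return hmac.new(salt, ikm, hash_name).digest()


def hkdf_expand(prk, info, length, hash_name="sha256"):
    """HKDF-Expand from RFC 5869."""
    hash_len = hashlib.new(hash_name).digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output is too long for HKDF-Expand")
    okm, t, counter = b"", b"", 1
    while len(okm) < length:
        # T(i) = HMAC(PRK, T(i-1) || info || i)
        t = hmac.new(prk, t + info + bytes([counter]), hash_name).digest()
        okm += t
        counter += 1
    return okm[:length]


def hash_to_base(msg, ctr, dss, p, m=1, k=128, hash_name="sha256"):
    """Sketch of the steps above; dss is the domain separation string."""
    L = (p.bit_length() + k + 7) // 8
    # Step 1: m' = HKDF-Extract(H(DSS), msg). H(DSS) can be cached if DSS is fixed.
    salt = hashlib.new(hash_name, dss).digest()
    m_prime = hkdf_extract(salt, msg, hash_name)
    u = []
    for i in range(1, m + 1):
        # Step 3: info = "H2CURVE" || I2OSP(ctr, 1) || I2OSP(i, 1)
        info = b"H2CURVE" + ctr.to_bytes(1, "big") + i.to_bytes(1, "big")
        t = hkdf_expand(m_prime, info, L, hash_name)   # Step 4
        u.append(int.from_bytes(t, "big") % p)         # Step 5
    return tuple(u)
```

Two protocols that pass different `dss` values get unrelated outputs for the same `msg` and `ctr`, which is the domain separation this proposal is after.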

@burdges

burdges commented Jul 4, 2019

As an aside, STROBE would handle this role fairly cleanly too.

@kwantam
Collaborator

kwantam commented Jul 4, 2019

As an aside, STROBE would handle this role fairly cleanly too.

Great! Since this is a very general framework, is there a specific STROBE-related hash function that you have in mind here?

(My guess is that our initial ciphersuite specs will all use hashes in the SHA2 family, but I'm certain that other people will eventually want to use, e.g., BLAKE. So probably the action item with respect to STROBE is just to make sure that we're not accidentally specifying something that's incredibly inefficient.)

@chris-wood
Collaborator Author

STROBE would handle this role fairly cleanly too.

That is good to know, though I don't think we could adopt it so easily at the moment.

@chris-wood
Collaborator Author

Another question to consider: should we use HKDF-Extract to compute m'?

(Just spitballing here, not sure whether I like it or not. Also, I'm going to assume for concreteness that we're going with the suggested change in my prior comment. This could work either way, though.)

HKDF-Extract takes two arguments, salt and msg. In the spirit of @hoeteck's suggestion in #124 (and a suggestion from Dan out-of-band), we might require higher-level protocols to set the value for salt based on their domain separation string.

I'm fine with this change, though I think I'd remove the initial hash computation of DSS. My reasoning is that HKDF will compute this hash anyway if |DSS| > H's output size. Thanks for the suggestion!

hash_to_base(msg, ctr)

Parameters:
- DSS, a domain separation string chosen according to the
  guidelines given in {{domain-separation}}.
- H, a cryptographic hash function.
- F, a finite field of characteristic p and order q = p^m.
- L = ceil((ceil(log2(p)) + k) / 8), where k is the security parameter
  of the cryptosystem (e.g., k = 128).
- HKDF-Extract-H is the HKDF-Extract function of RFC5869
  instantiated with hash function H.
- HKDF-Expand-H is the HKDF-Expand function of RFC5869
  instantiated with hash function H.

Inputs:
- msg is the message to hash.
- ctr is 0, 1, or 2.
  This is used to efficiently create independent
  instances of hash_to_base (see discussion above).

Output:
- u, an element in F.

Steps:
1. m' = HKDF-Extract-H(H(DSS), msg)
2. for i in (1, ..., m):
3.   info = "H2CURVE" || I2OSP(ctr, 1) || I2OSP(i, 1)
4.   t = HKDF-Expand-H(m', info, L)
5.   e_i = OS2IP(t) mod p
6. return u = (e_1, ..., e_m)

If DSS is fixed, H(DSS) can be precomputed to save one invocation of H. Also, this lets people use domain separation strings of arbitrary length with effectively no performance penalty.

@chris-wood
Collaborator Author

It wasn't clear to me that it would be quite this easy to invoke HKDF, because I was thinking that the size of the PRK argument to HKDF was restricted to the output length of H, rather than the block size of H. But as far as I can tell, this mostly works for the hash functions we'd expect people to use (but see below).

This seems to resolve itself by just using Extract() before Expand(). :-)

Collaborator

@kwantam kwantam left a comment

Great!

These are pretty small nits, even though they look like a bunch of comments...

One other small thing: should we add a forward ref from {{domain-separation}} to here?

Maybe a standalone paragraph before the one that starts "Care is required..." that says something like

{{hashtobase}} specifies how to apply a domain separation tag.

@burdges

burdges commented Jul 5, 2019

I think only the keccak-f(1600) based STROBE variant has any implementations right now.

@chris-wood Are you saying the hash-to-field functions call extract in a tree-like way? It's true STROBE does not add so much for trees where you'd clone the state all the time. You'd need to impose an ordering on the extractions to exploit STROBE optimally. And doing so encourages constraints on the order in which developers extract field elements.

It's actually common to clone STROBE states, which may still save some stack space over HKDF, but not much, and maybe worse with hand optimizations. I suppose the most efficient scheme for extracting a tree is to simply use ChaCha20, assigning nonces in a tree-like way using "heap addressing".

All this is moot because BLS is really for consensus protocols, not "accounts", so nobody will ever run BLS on ridiculously constrained devices anyways, like say a Ledger device.

@chris-wood
Collaborator Author

@chris-wood Are you saying the hash-to-field functions call extract in a tree-like way? It's true STROBE does not add so much for trees where you'd clone the state all the time. You'd need to impose an ordering on the extractions to exploit STROBE optimally. And doing so encourages constraints on the order in which developers extract field elements.

No, sadly, my comment was more reflective about IETF than it was about anything technical. (We'd need to fully specify STROBE here or elsewhere prior to adopting it.)

@burdges

burdges commented Jul 5, 2019

Right, I'm actually not convinced STROBE is optimal anyways. I'd think the meta_ad used for domain separation could probably safely xor into another part of the state in parallel to the main data xor, thus reducing Keccak invocations. Also you'd want some fast input command analogous to KangarooTwelve. And some ChaCha-based variant. Anyways, sorry for the derail.

@kwantam
Collaborator

kwantam commented Jul 5, 2019

Awesome!

I just realized there's one more bit of inconsistency that this PR should fix: the description in {{hashtobase-perf}}. I opened #143 against the PR branch because suggested edits can't yet do multiline as far as I can tell.

parameter of the cryptosystem (e.g., k = 128).
- HKDF-Extract-H is the HKDF-Extract function of RFC5869
Collaborator

I find this syntax a little confusing. Is the idea that the H in HKDF-Extract-H should be expanded to SHA2, etc, in each case?

I suggest we stick to the same notation used in the original draft and in others like the TLS 1.3 draft. So, just use "HKDF-Extract" and specify below it that the hash function used is given by the ciphersuite?

Collaborator

@kwantam kwantam Jul 5, 2019

What if we just called it "Extract"?

- Extract is the HKDF-Extract function of RFC5869 instantiated with
  hash function H.

Is that clearer?

Collaborator

Maybe:

 - HKDF-Expand and HKDF-Extract are as defined in {{rfc5869}}, instantiated with the hash function H

Again, keeping it closer to notation used elsewhere.

Collaborator Author

I preferred the -H notation since it made clear that H determined Extract, though the expansion issue is a valid concern. I'm fine with the proposal!

Collaborator

@kwantam kwantam left a comment

Nice! Just one tiny annoyance, sorry...

@chris-wood chris-wood merged commit 95e8aed into master Jul 6, 2019
@chris-wood
Collaborator Author

Nice! Just one tiny annoyance, sorry...

Nits are always appreciated! No need to apologize.
