domain separation #139

Merged
merged 9 commits into from
Jul 4, 2019
76 changes: 72 additions & 4 deletions draft-irtf-cfrg-hash-to-curve.md
@@ -952,6 +952,30 @@ In contrast, this document is concerned with encodings from arbitrary bit strings
to elliptic curve points.
This document does not cover serialization or deserialization.

### Domain separation {#term-domain-separation}

In most cases, cryptographic protocols that use random oracles are analyzed
under the assumption that the random oracle answers only queries generated
by that protocol.
In practice, this assumption may not hold: commonly, two or more protocols
may model the same hash function as a random oracle, which violates the above
assumption if both protocols compute the hash of the same value.
Collaborator

Perhaps add:

That is, let R() be a random oracle used by protocols P1 and P2. If P1 and P2 ever query R with the same value x, the assumption above is violated.

And maybe then show how P1 and P2 would use R1 and R2 (as defined below) to address this?

Collaborator Author

Yeah, I think the first paragraph is unclear in a couple ways. I'll try to address and incorporate the above.

Collaborator Author

Done, I think.


A common approach to addressing this issue is called domain separation,
which allows a single random oracle to simulate multiple, independent oracles.
This is effected by ensuring that each simulated oracle sees queries that are
distinct from those seen by all other simulated oracles.
For example, to simulate two oracles R1 and R2 given a single oracle R,
one might define

R1(x) := R("R1" || x)
R2(x) := R("R2" || x)

In this example, "R1" and "R2" are called domain separation tags.
Because of these domain separation tags, R1 and R2 cannot query R on
overlapping values.
Thus, it is safe to treat them as independent oracles.
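
The R1/R2 construction above can be sketched in Python. This is only an illustration: SHA-256 stands in for the oracle R, which is an assumption made for the example and not something this section specifies. The function names mirror the names used in the text.

```python
import hashlib


def R(data: bytes) -> bytes:
    """Stand-in for the single oracle R (SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()


def R1(x: bytes) -> bytes:
    # Every query R1 sends to R begins with the tag b"R1".
    return R(b"R1" + x)


def R2(x: bytes) -> bytes:
    # Every query R2 sends to R begins with the tag b"R2".
    return R(b"R2" + x)


x = b"same input"
# The same x produces two different queries to R, one per tag.
print(R1(x) != R2(x))
```

Because both tags have the same fixed length, no query made through R1 can equal a query made through R2, whatever x is.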

# Roadmap {#roadmap}

This section presents a general framework for encoding bit strings to points
@@ -984,7 +1008,7 @@ Input: alpha, an arbitrary-length bit string.
Output: P, a point in G.

Steps:
- 1. u = hash_to_base(alpha, 0)
+ 1. u = hash_to_base(alpha, 2)
2. Q = map_to_curve(u)
3. P = clear_cofactor(Q)
4. return P
@@ -1015,6 +1039,47 @@ Instances of these functions are given in {{suites}}, which defines a list of
suites that specify a full set of parameters matching elliptic curves and
algorithms.

## Domain separation requirements {#domain-separation}

When invoking hash\_to\_curve, implementors MUST use domain separation
({{term-domain-separation}}) to avoid interfering with other protocols
that also use the hash\_to\_curve functionality.
In addition, any protocol that uses two or more hash\_to\_curve functions
targeting different elliptic curves MUST enforce domain separation between
Collaborator

Is the expectation that if one had two hash-to-curve functions H1 and H2 targeting the same curve, then H1 = H2?

Collaborator Author

Yes, unless they're explicitly separated. I suppose I should clarify this point.

Collaborator Author

I pushed an edit to address this.

Collaborator

It'd be good to clarify why this requirement is needed. (That is, to avoid overlapping hash2base output IIUC.)

Collaborator Author

I pushed an edit to address this.

the two functions if those functions are modeled in the protocol as
independent random oracles.
Finally, protocols that use encode\_to\_curve SHOULD use domain separation
if possible, but it is not required in this case.

Care is required when choosing a domain separation tag.
Implementors SHOULD observe the following guidelines:

1. Tags should be prepended to the value being hashed, as in the example
in {{term-domain-separation}}.

2. Tags should have fixed length, or should be encoded in a way that makes
the length of a given tag unambiguous.
If a variable-length tag is used, it should be prefixed with a
fixed-length field that encodes the length of the tag.

3. Tags should begin with a fixed protocol identification string.
Ideally, this identification string should be unique to the protocol.

4. Tags should include a protocol version number.

5. For protocols that support multiple ciphersuites, tags should include
a ciphersuite identifier.

As an example, consider a fictional key exchange protocol named Quux.
A reasonable choice of tag might be "QUUX-V\<xx\>-CS\<yy\>", where \<xx\> and \<yy\>
are two-digit numbers indicating the version and ciphersuite, respectively.
Collaborator

Would it help to give an example of a protocol that needs domain separation internally, too?

Collaborator Author

I pushed an edit to address this.


Alternatively, if a variable-length ciphersuite string must be used,
a reasonable choice of tag might be "QUUX-V\<xx\>-L\<zz\>-\<csid\>", where
\<csid\> is the ciphersuite string, and \<xx\> and \<zz\> are
two-digit numbers indicating the version and the length of the ciphersuite
string, respectively.
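
The two Quux tag layouts above can be sketched as follows. The helper names `fixed_tag` and `variable_tag` and the sample ciphersuite string are hypothetical, chosen only for illustration; ASCII encoding of the tag is likewise an assumption.

```python
def fixed_tag(version: int, ciphersuite: int) -> bytes:
    """Build a fixed-length tag of the form QUUX-V<xx>-CS<yy>."""
    if not (0 <= version <= 99 and 0 <= ciphersuite <= 99):
        raise ValueError("version and ciphersuite must fit in two digits")
    return f"QUUX-V{version:02d}-CS{ciphersuite:02d}".encode("ascii")


def variable_tag(version: int, csid: str) -> bytes:
    """Build a tag of the form QUUX-V<xx>-L<zz>-<csid>."""
    csid_bytes = csid.encode("ascii")
    if not 0 <= version <= 99:
        raise ValueError("version must fit in two digits")
    if len(csid_bytes) > 99:
        raise ValueError("ciphersuite string length must fit in two digits")
    # The two-digit length field makes the end of the tag unambiguous.
    prefix = f"QUUX-V{version:02d}-L{len(csid_bytes):02d}-".encode("ascii")
    return prefix + csid_bytes


print(fixed_tag(1, 2))                    # b'QUUX-V01-CS02'
print(variable_tag(1, "P256-SHA256"))     # b'QUUX-V01-L11-P256-SHA256'
```

In both layouts a parser can tell where the tag ends without looking at the message that follows, which is the point of guideline 2.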

# Utility Functions {#utility}

Algorithms in this document make use of utility functions described below.
@@ -1217,14 +1282,14 @@ Parameters:

Inputs:
- msg is the message to hash.
- - ctr is either 0 or 1.
+ - ctr is either 0, 1, or 2.
This is used to efficiently create independent
instances of hash_to_base (see discussion above).

Output: u, an element in F.

Steps:
- 1. m' = H(msg) || I2OSP(ctr, 1)
+ 1. m' = "HASH-TO-CURVE" || H(msg) || I2OSP(ctr, 1)
Collaborator

Would we consider making HASH-TO-CURVE shorter (e.g., "H2C" or "HTC"), and potentially truncating H(msg) so that m' fits into a single block in the loop below?

E.g. the input to the hash in the loop becomes "H2C" || H(msg)[..len(H) - 6] || I2OSP(ctr, 1) || I2OSP(i, 1) || I2OSP(j, 1)?

Collaborator

Perhaps in a separate issue?

Collaborator Author

We'd have to truncate a fair amount to avoid going past one block, because SHA-256 adds at least 65 bits of padding to the end of the message. Specifically, the padding is

  • append a single '1' bit
  • append k '0' bits where 0 <= k < 512 and bitlen(msg) + 1 + k + 64 is divisible by 512
  • append a 64-bit representation of bitlen(msg)

so it would have to be something like H(msg)[..len(H) - 15], which is pretty extreme.

Also, we might want to resolve #137 first, since that would moot this discussion.
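
The padding arithmetic above can be checked with a short Python sketch. It models only the SHA-256 padding rule as stated in the comment; the function names are hypothetical, and the 3-byte "H2C" tag with three 1-byte counters is taken from the proposal earlier in this thread.

```python
SHA256_BLOCK_BITS = 512


def sha256_padding_bits(msg_len_bytes: int) -> int:
    """Total number of padding bits appended to a message of this length."""
    bitlen = 8 * msg_len_bytes
    # One '1' bit, then k '0' bits, then a 64-bit length field, where k is
    # the smallest value making the padded length a multiple of 512.
    k = -(bitlen + 1 + 64) % SHA256_BLOCK_BITS
    return 1 + k + 64


def sha256_block_count(msg_len_bytes: int) -> int:
    """Number of 512-bit blocks processed for a message of this length."""
    padded_bits = 8 * msg_len_bytes + sha256_padding_bits(msg_len_bytes)
    return padded_bits // SHA256_BLOCK_BITS


def max_single_block_message_bytes() -> int:
    """Largest message length (in bytes) that still fits in one block."""
    n = 0
    while sha256_block_count(n + 1) == 1:
        n += 1
    return n


limit = max_single_block_message_bytes()
# Overhead from the proposal: "H2C" plus three one-byte counters.
overhead = len(b"H2C") + 3
print(limit, limit - overhead)
```

Under these assumptions the limit is 55 bytes, which leaves 49 bytes for the truncated H(msg). If H(msg) were 64 bytes long (an assumption, since the hash is not named here), that corresponds to dropping 15 bytes.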

2. for i in (1, ..., m):
3. t = "" // initialize t to the empty string
4. for j in (1, ..., W):
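
Step 1 of the revised hash\_to\_base can be sketched in Python as below. This covers only the construction of m'; the remaining steps are not shown in this hunk and are not reproduced. SHA-256 as H is an assumption, and `build_m_prime` is a hypothetical helper name. I2OSP follows the usual big-endian integer-to-octet-string conversion.

```python
import hashlib

DOMAIN_TAG = b"HASH-TO-CURVE"


def I2OSP(x: int, length: int) -> bytes:
    """Encode a non-negative integer as a big-endian string of `length` bytes."""
    if x < 0 or x >= 256 ** length:
        raise ValueError("integer out of range for the requested length")
    return x.to_bytes(length, "big")


def build_m_prime(msg: bytes, ctr: int) -> bytes:
    """Compute m' = "HASH-TO-CURVE" || H(msg) || I2OSP(ctr, 1)."""
    if ctr not in (0, 1, 2):
        raise ValueError("ctr must be 0, 1, or 2")
    # H is assumed to be SHA-256 for this sketch.
    return DOMAIN_TAG + hashlib.sha256(msg).digest() + I2OSP(ctr, 1)


m0 = build_m_prime(b"example message", 0)
m2 = build_m_prime(b"example message", 2)
# The two values differ only in the final counter byte.
print(len(m0), m0[:-1] == m2[:-1], m0[-1], m2[-1])
```

With a 32-byte digest, m' is 13 + 32 + 1 = 46 bytes long, and distinct ctr values give distinct inputs for the same msg.
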
@@ -2028,13 +2093,16 @@ This document has no IANA actions.

# Security Considerations

- Each encoding function variant accepts arbitrary input and maps it to a pseudorandom
+ Each encoding function accepts arbitrary input and maps it to a pseudorandom
point on the curve.
Directly evaluating the mappings of {{mappings}} produces an output that is
distinguishable from random.
{{roadmap}} shows how to use these mappings to construct a function approximating a
random oracle.

{{domain-separation}} describes considerations related to domain separation
for random oracle encodings.

{{hashtobase}} describes considerations for uniformly hashing to field elements.

# Acknowledgements