ciphersuites and domain separation #124
If we assume a fixed-length value for the cipher suite, e.g., a 2- or 4-byte value that maps to a ciphersuite and is maintained by a registry, I think I'm more comfortable with option 2. While it may be redundant, it seems as though it would help prevent misuse, or failure to add domain separation, by the calling code or protocol. @armfazh @grittygrease @samscott89, please chime in!
I would opt for not including domain separation inside the definition of the suite. Hash-to-curve functions must be as generic as possible; note that the interface receives a string of opaque bytes. I also think the document must provide recommendations about the usage of the suites, giving advice on the cases where domain separation is necessary. I might not be considering all the scenarios, so feedback is appreciated.
As a point of comparison, HPKE [1] bakes the cipher suite into the context string used inside the Seal and Open functions. I'm not sure doing something similar here, with a fixed-length suite, would negatively impact generality, especially since the ciphersuite already uniquely determines the set of algorithms used to construct

[1] https://tools.ietf.org/html/draft-barnes-cfrg-hpke-00#section-5.1
I thought about this a bit more over the past few days, and I tend to think that domain separation in hash-to-curve will not do much for security in practice; I explain below. But as @chris-wood says, if it doesn't hurt performance or generality, I'd tend to err on the side of caution, with two caveats. First, if we include domain separation, we should also do as @armfazh suggests and add either language or a citation that recommends domain separation strings in upper-level protocols; otherwise, the worry is that we'd give users the incorrect impression that domain separation at the protocol level was unnecessary. Second, we need to make sure we have an interop story for curves that are not in the "official" ciphersuite table; see my comment immediately below.

(Come to think of it: are there any CFRG guidelines or informational documents that talk about domain separation? Should there be? It might be nice if there were a uniform way of doing domain separation, and if every CFRG protocol just did it that way.)

OK, why should we think it won't do much? Recall that one reason to use domain separation is to make sure that random oracle security proofs still hold when protocols are composed---in particular, that composition doesn't break the freshness of random oracle queries. Concretely, if protocol A's security proof relies on making a fresh random oracle query on input X, and an attacker can force composed protocol B to make a query on input X first, the security of the composition may be broken. Now, let's think about a scenario where we're composing two protocols that both use hash-to-curve, and in which neither protocol uses domain separation strings. In this case, whether or not domain separation at the hash-to-curve level helps depends on whether or not the two protocols use the same hash-to-curve suite. Specifically, if the two protocols use different curves, h2c domain separation would help them. But if they use the same curve and h2c suite, it wouldn't.
So: if we think that composed protocols would end up using the same suite in almost all cases, then we should probably conclude that h2c domain separation won't do much. My inclination is to believe that implementors will prefer to use a single hash-to-curve codebase, i.e., that they'll actively try to use a common suite when composing protocols---which defeats the h2c domain separation. But like I said, if adding separation is really free, then maybe there's no reason not to do it...
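A toy model of the argument above, as a sketch only: the random oracle is modeled as SHA-256 over tag || input, and the tags `SUITE-1`, `SUITE-2`, `PROTO-A`, and `PROTO-B` are hypothetical placeholders. A suite-level tag separates two composed protocols only when their suites differ; protocol-level tags separate them even when the suite is shared.

```python
import hashlib


def ro_query(suite_tag: bytes, msg: bytes) -> bytes:
    """Model the random-oracle input hash-to-curve sees: a (hypothetical)
    suite tag followed by whatever the calling protocol passes in."""
    return hashlib.sha256(suite_tag + msg).digest()


x = b"some input X"

# Same h2c suite, no protocol-level tags: protocols A and B make the same query.
same_suite_a = ro_query(b"SUITE-1", x)
same_suite_b = ro_query(b"SUITE-1", x)

# Different suites (e.g. different curves): the h2c-level tag does separate them.
diff_suite_a = ro_query(b"SUITE-1", x)
diff_suite_b = ro_query(b"SUITE-2", x)

# Same suite, but each protocol tags its own messages: the queries are separated.
tagged_a = ro_query(b"SUITE-1", b"PROTO-A" + x)
tagged_b = ro_query(b"SUITE-1", b"PROTO-B" + x)

print(same_suite_a == same_suite_b)  # True: the suite tag gives no separation
print(diff_suite_a == diff_suite_b)  # False
print(tagged_a == tagged_b)          # False
```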
On the question of whether h2c is really free, I have a concern: if domain separation includes an opaque ciphersuite ID from a table, there is the possibility that interop will be broken for curves/suites that aren't in the table. I think this qualifies as a real downside, and it's something we have to think carefully about if we include suite IDs in hash2base. Along these lines, if we're going to have a ciphersuite table in the RFC, how and when does that table get updated with new suites? Would it make sense to run an informal registry of not-yet-standardized suites, e.g., on GitHub? (I see this ties into #126)
Interesting idea! At the very least, it seems like something worth discussing in Montreal. Shall we put down something in writing?
Oh, good idea! Let me think a bit about this. (But: is it reasonable to assume that the in-writing bit is lower priority than getting this draft updated?)
Absolutely!
After thinking about this for a few more days, I still don't see a clear path to a good interop story for curves that aren't enumerated in the document. Because of that, I'm worried that adding a ciphersuite tag will hurt the interop story for hash-to-curve without a compensating improvement in security (for reasons given in my prior comments). So: I think my preference is tending towards adding language that encourages upper-level protocols to add domain separation, but not to add it in hash-to-curve. I've asked the BLS and VRF standards authors for their thoughts; I'm hopeful that they'll come and add comments to this issue.
I tend to agree with @kwantam. The security benefits of domain separation at the hash-to-curve level are unclear to me, but the logistical drawbacks are clear. If you add ciphersuites to this draft and someone wants to use hash-to-curve with a new hash function or a new curve, they would have to figure out the IETF process for adding a new ciphersuite to this standard, in addition to whatever other standardization and implementation efforts they are already undertaking. Increasing logistical barriers means some people just won't bother and will choose their own options, ignoring this process. I think the value of the hash-to-curve draft is that it covers a broad range of use cases, hash functions, base fields, and curves. Adding ciphersuites will reduce this value. Also, adding ciphersuites will make existing implementations already deployed in the wild (such as implementations of the VRF draft) incompatible.
I also tend to agree with both @kwantam and @reyzin, namely that ciphersuite and domain separation should be provided/enforced by the high-level application. Here's a slightly different take on the issue: do we expect a ciphersuite string in hash-to-curve to contain any additional information that's not already in the ciphersuite string for the high-level application (e.g. BLS signatures)? To me, that shouldn't be the case, because the latter should completely determine which hash-to-curve algorithm will be used. If so, I don't see any advantage to having the same information appear twice; if anything, there are only disadvantages, namely aesthetics and, as @reyzin pointed out, logistical drawbacks to maintaining consistency. Side note / clarification: I think it'd still be useful for hash-to-curve to specify a table of ciphersuite strings (which will be referred to by the higher-level applications), but the ciphersuite string should not be part of the input to hash2base.
The arguments put forth here seem reasonable. Trying not to overthink it, but there are 4 scenarios in which the application author and the library author are different people, each of whom can either use domain separation or not.
The point being, the difference between 3 and 4 doesn't seem particularly meaningful, but the gap between 1 and 2 does, unless there's a convincing reason why 4 is a bad situation. The reason I think this is an important distinction is that there should be far fewer implementations of h2c than implementations of applications.
IMO this is the crucial bit. I don’t think we can assume applications will add domain separation, even if we say they MUST do so. That seems to imply that we’re left considering the pros and cons of 1+4 (library does it) versus 2+3 (library doesn’t do it). What’s more troubling is that while I agree with all of the downsides, from logistical nightmares to additional complexity and less re-use, the upsides are not well understood. That is, we have some concerns about possible cross-protocol attacks if h2c doesn’t perform domain separation. (Perhaps these are silly and not well founded; I’m not an expert here.) I think some more rigor would help make the decision easier. And it’s probably time to take this issue to the list for wider discussion. :-)
@chris-wood you are right, we should consider potential upsides. Here are the upsides as I understand them. The main value of domain separation that I know of arises when the same secret key is used in multiple different schemes. In that case, domain separation may (but won't always!) help a security proof for the joint security of these schemes to go through, because at least random oracle queries of the two schemes will not overlap, so whatever arguments required fresh randomness / programmability are more likely to still go through. I have not seen a convincing case for domain separation besides the above scenario. But in general using the same SK for multiple purposes requires a thorough analysis, and simply putting in domain separation is insufficient. Moreover, domain separation only at the level of hash-to-curve is even less likely to be sufficient, because it will not necessarily ensure domain separation in the upstream apps -- esp. if the apps are using the same curve. Basically, app-level domain separation is where you would get the upsides. The chances of the upsides coming from domain separation in hash-to-curve are low, because if people are using the same SK for different schemes, they are likely using the same curve, too.
Another issue that came up in discussing #132: if hash-to-curve injects a ciphersuite string, and assuming that (say) curve25519 and edwards25519 have different ciphersuites, then these two hash functions won't be compatible. It would probably be nicer if hash-to-curve25519 and hash-to-edwards25519 gave points that are equivalent via the birational map specified in RFC7748.
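For reference, RFC 7748 (Section 4.1) gives the birational map between the two curves; on the y-coordinate it is u = (1 + y)/(1 - y), with inverse y = (u - 1)/(u + 1). The minimal sketch below checks only this coordinate map (the x/v part, which needs sqrt(-486664), is omitted): the edwards25519 base point, y = 4/5 mod p, lands on the curve25519 base point u = 9. The helper names are illustrative, not from any library.

```python
P = 2**255 - 19  # prime field shared by curve25519 and edwards25519


def inv(a: int) -> int:
    """Modular inverse via Fermat's little theorem (P is prime, a != 0 mod P)."""
    return pow(a, P - 2, P)


def edwards_y_to_montgomery_u(y: int) -> int:
    """u = (1 + y) / (1 - y) mod P, the RFC 7748 map on this coordinate."""
    return (1 + y) * inv((1 - y) % P) % P


def montgomery_u_to_edwards_y(u: int) -> int:
    """y = (u - 1) / (u + 1) mod P, the inverse direction."""
    return (u - 1) * inv((u + 1) % P) % P


# The edwards25519 base point has y = 4/5 mod P; curve25519's has u = 9.
y_base = 4 * inv(5) % P
u_base = edwards_y_to_montgomery_u(y_base)
print(u_base)  # 9
```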
@reyzin I concur :-) and I don't have a convincing case to deliver. I was simply advocating for weighing the options. It seems most folks (at least here on GitHub) are in favor of pushing the burden of domain separation to hash-to-curve callers, which is probably fine. Minimally, we should add some text describing why we find this tradeoff acceptable, if we go down that route.
I was chatting with @henrycg about this last night, and he pointed out something that's perhaps a second-order concern, but certainly worth writing down: a case where h2c domain separation would potentially be useful is when a single protocol makes queries to two distinct hash-to-curve oracles, and relies on those queries being uncorrelated. (Note that this is a case where we expect that a protocol's ciphersuite string is not sufficient to give domain separation, which answers @hoeteck's question above in a perhaps unexpected way.) Our discussion was in the context of a contrived example of a protocol that hashes to both Curve25519 and P-256, but I think there's a much more natural one. Consider a protocol (vaguely reminiscent of the one by Muller) that uses points on both an elliptic curve and on its quadratic twist. Suppose that this hypothetical protocol relies on hashing to both the curve and its twist, and models these two hash functions as independent random oracles. In this case, simply following the hash-to-curve document would not yield independent random oracles. Since the curve and its twist by definition reside in the same base field, the hash_to_base function for both curves will make exactly the same calls to H (say, SHA256), and will return exactly the same value. Now it's not at all obvious that the two oracles are uncorrelated! To be clear, I don't know of any protocol like the above. But you could certainly imagine that a reader of the current hash-to-curve draft who is trying to implement such a protocol might incorrectly assume that hashes to two different curves can be treated as orthogonal random oracles! There are a couple possible remedies here:
Summarizing the practical concerns with adding a domain separation tag to hash-to-curve, the primary concerns are
Handling (1), at a high level, requires us to specify some deterministic algorithm to compute a ciphersuite tag given the parameters of an elliptic curve. Handling (2) is slightly trickier, but it appears to be possible. Here's how: It is a theorem (due to Tate) that any two curves over a field F having the same number of points are isogenous. (In fact, the implication goes both ways; the other direction is obvious by the definition of an isogeny.) Thus, the algorithm that is used to derive a ciphersuite tag will give equivalent outputs for isogenous curves in the case that the input to the algorithm is F, the field, and n, the order of the elliptic curve group. Here is a candidate such algorithm:
This 4-byte tag would be used as described in option 2 in the 1st message of this thread:
Note that for a given curve, csid is fixed for all time, so it can just be a hard-coded constant. There's no need to evaluate the ciphersuite_id function at runtime. To be clear: I'm not sure yet whether I'm in favor of this or not. But at least this clears away some of the deployment / pragmatic concerns with using ciphersuites and lets us focus on whether or not this is desirable strictly from a security perspective.
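The candidate algorithm from that comment is not reproduced here. As a purely hypothetical stand-in, the sketch below derives a 4-byte tag by hashing only the field prime p and the group order n, which is enough to show the property being argued for: curves with the same (p, n), such as curve25519 and edwards25519, get the same tag, while the quadratic twist (whose order is 2p + 2 - n) gets a different one. The length-prefixed encoding, the use of SHA-256, and the truncation to 4 bytes are all assumptions.

```python
import hashlib


def i2osp(x: int, length: int) -> bytes:
    """Big-endian integer-to-octet-string conversion."""
    return x.to_bytes(length, "big")


def ciphersuite_id(p: int, n: int) -> bytes:
    """Hypothetical 4-byte tag that depends only on the field prime p and the
    curve order n, so curves with equal (p, n) always get equal tags."""
    plen = (p.bit_length() + 7) // 8
    nlen = (n.bit_length() + 7) // 8
    # Length-prefix each integer so the encoding of (p, n) is unambiguous.
    data = i2osp(plen, 2) + i2osp(p, plen) + i2osp(nlen, 2) + i2osp(n, nlen)
    return hashlib.sha256(data).digest()[:4]


P = 2**255 - 19
L = 2**252 + 27742317777372353535851937790883648493  # prime subgroup order
N_CURVE = 8 * L                # order of curve25519 (and of edwards25519)
N_TWIST = 2 * P + 2 - N_CURVE  # quadratic twist: #E + #E' = 2p + 2

csid_curve25519 = ciphersuite_id(P, N_CURVE)
csid_edwards25519 = ciphersuite_id(P, N_CURVE)  # birationally equivalent, same (p, n)
csid_twist = ciphersuite_id(P, N_TWIST)

print(csid_curve25519 == csid_edwards25519)  # True
print(csid_curve25519 == csid_twist)         # False
```

Because the tag depends only on (p, n), it can be computed once per curve and hard-coded, as the comment suggests.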
The above proposal, while perhaps an improvement, is by no means perfect. Here are some concerns that I can think of, off the top of my head:
One more possibility related to domain separation that came up when chatting with @henrycg that I forgot to mention earlier today. Let's assume that we're not doing any kind of ciphersuite-specific domain separation. In that case, it might still make sense to inject a hash-to-curve--specific (but not ciphersuite-specific) string into the hash_to_base function, with the aim of orthogonalizing the invocations of Like in the case a few comments above, protocol designers might reasonably expect that these random oracle invocations are orthogonal---and in all likelihood they are, considering the highly stylized H() invocations in hash_to_base. But adding an extra layer of protection is essentially free and might give a tiny bit of peace of mind. Concretely, I'm thinking something like this
inside hash_to_base. In other words, there's a third option that sits in between "no domain separation" and "per-ciphersuite domain separation," namely, adding a fixed string that's the same for all ciphersuites in hash_to_base. This ensures that invocations of H in hash-to-curve are orthogonal to other invocations of H. I think this is another instance in which an application-level ciphersuite string isn't enough. And it seems like not enforcing separation between calls to H() inside hash_to_base and calls to H() elsewhere in an upper-level protocol may really be inviting badness.
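To see why a fixed string helps, here is a sketch that assumes (hypothetically) each H() call inside hash_to_base hashes H(msg) || ctr || i || j with one-byte counters, matching the three extra bytes mentioned later in this thread. Without a prefix, an upper-level protocol that happens to hash a string of the same shape makes exactly the same query; with the fixed prefix "HASH-TO-CURVE", it does not. `inner_input` is an illustrative helper, not the draft's pseudocode.

```python
import hashlib


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def inner_input(msg: bytes, ctr: int, i: int, j: int, prefix: bytes = b"") -> bytes:
    """Hypothetical layout of one H() input inside hash_to_base:
    prefix || H(msg) || ctr || i || j, each counter encoded as a single byte."""
    return prefix + H(msg) + bytes([ctr, i, j])


msg = b"hello"

# An upper-level protocol that, for its own reasons, hashes H(msg) || 0x00 0x01 0x01.
upper_level_query = H(H(msg) + bytes([0, 1, 1]))

unprefixed = H(inner_input(msg, 0, 1, 1))
prefixed = H(inner_input(msg, 0, 1, 1, prefix=b"HASH-TO-CURVE"))

print(unprefixed == upper_level_query)  # True: the two "oracles" coincide
print(prefixed == upper_level_query)    # False
```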
If I am not wrong, the curve and its twist have different curve equations, so how is it possible to get the same point, given that the elliptic curve coefficients play a role in the mappings?
You're right that the mapping will return a different point. I was pointing out that hash_to_base will return the same value for both curves when invoked on the same string, because hash_to_base only depends on F (which is definitely the same) and W and H (which are almost certainly the same). If that happens, then the same
Folks, I'd like to submit a PR on domain separation by EoD tomorrow, so I'd appreciate any last-minute thoughts on the above. Here's my concrete proposal:
This leaves us with the question of per-curve domain separation. Status quo appears to be that we will clearly state that this document does not guarantee domain separation between encodings to different curves. If a protocol invokes two different encodings and requires the results to be orthogonal, the protocol MUST inject its own domain separation tags. I have to admit, I don't love this solution. Adding a 4-byte, deterministically-generated domain separation tag via something like
This is mostly a note-to-self: to avoid adding another compression function invocation in hash_to_base, we can have at most 23 bytes beyond H(msg) in m'. Right now we add 3, namely, ctr, i, and j. "HASH-TO-CURVE" is 13 bytes, so we could in principle have up to a 7-byte csid without spilling into another compression invocation.
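The 23-byte figure comes from SHA-256 padding: a single 64-byte block must hold the input plus a mandatory 0x80 byte and an 8-byte length field, leaving 55 bytes, of which H(msg) takes 32. The arithmetic, assuming H is SHA-256 (the order of fields inside m' does not affect the count):

```python
SHA256_BLOCK = 64  # bytes per SHA-256 compression-function block
PAD_MIN = 1 + 8    # mandatory 0x80 byte plus 8-byte message-length field
DIGEST = 32        # length of H(msg) for SHA-256


def compression_calls(input_len: int) -> int:
    """Number of SHA-256 compression calls for an input of input_len bytes."""
    return (input_len + PAD_MIN + SHA256_BLOCK - 1) // SHA256_BLOCK


budget = SHA256_BLOCK - PAD_MIN - DIGEST  # bytes available beyond H(msg)
counters = 3                              # ctr, i, j
fixed = len(b"HASH-TO-CURVE")             # 13
csid_max = budget - counters - fixed

print(budget, csid_max)  # 23 7
print(compression_calls(DIGEST + counters + fixed + csid_max))      # 1
print(compression_calls(DIGEST + counters + fixed + csid_max + 1))  # 2
```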
If I understand correctly, the latest issue with the curve and its twist goes away if we include the curve in the ciphersuite_id, and the high-level application calls hash_to_curve with the ciphersuite_id. Is that right? Going back to the higher-level discussion, I don't think we should be trying to protect protocol designers from deviating from the recommendations of this draft; that incurs too much overhead and takes us down a deep rabbit hole. More generally, we need to distinguish between implementation errors and design errors. I agree with the general principle of resilience to implementation errors (e.g. resilience to weak randomness and side channel attacks). On the other hand, I'm a lot less sympathetic to paying a price for resilience to design errors.
@hoeteck more or less -- I assumed that the implementation of the specific cipher suite would just include ciphersuite_id, i.e., the caller would not pass anything beyond the message to hash.
How about the following as a compromise? hash-to-curve does incorporate a ciphersuite_id with an extra 8 (or 16) bits, which are zeroes by default but can be changed by higher-level applications. In particular, we can use option 2 (fixed string + 4-byte ciphersuite in hash2base), but with the "HASH-TO-CURVE" string removed and with the first 2 bytes of the ciphersuite always 0 by default (we can think of 0x00 as encoding the string "HASH-TO-CURVE"). Moreover, the hash-to-curve spec should explicitly allow higher-level applications to modify those 2 bytes.
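One possible byte layout for this compromise, as a sketch of the 16-bit variant: 2 application-reserved bytes (zero by default) followed by a 2-byte suite identifier. `make_csid` is hypothetical, and treating "H2C-0008" (the BLS12-381 table entry mentioned elsewhere in this thread) as the integer 0x0008 is an assumption.

```python
def make_csid(suite_id: int, app_bits: int = 0) -> bytes:
    """Hypothetical 4-byte ciphersuite ID: 2 bytes reserved for the calling
    application (zero by default) followed by a 2-byte hash-to-curve suite ID."""
    if not (0 <= suite_id < 1 << 16 and 0 <= app_bits < 1 << 16):
        raise ValueError("each field must fit in 16 bits")
    return app_bits.to_bytes(2, "big") + suite_id.to_bytes(2, "big")


# Suite 0x0008 stands in for the "H2C-0008" (BLS12-381) entry.
default_csid = make_csid(0x0008)
bls_pop_csid = make_csid(0x0008, app_bits=0x0001)  # e.g. a BLS proof-of-possession variant

print(default_csid.hex())  # 00000008
print(bls_pop_csid.hex())  # 00010008
```

The design keeps the hash-to-curve suite in a fixed position, so an application can only change its own 2 bytes and cannot accidentally alter which suite is named.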
Maybe---it depends what you mean by the ciphersuite id. The issue is that a single, protocol-level ciphersuite ID isn't sufficient in this case---the protocol has to use a separate ID tag for each curve it hashes to, in order to ensure that those hashes are orthogonal. Concretely, imagine that a protocol implements two functions, hash_to_curve and hash_to_twist, and wants to be sure that they are orthogonal. Then the following is OK:
but this is not OK:
To me, that's a reasonably subtle distinction, and it seems like "hash_to_curve" and "hash_to_twist" should be doing that work, not their callers. I completely agree that it's impossible to prevent people from misunderstanding or ignoring the recommendations. On the other hand, to me it makes sense to try to anticipate insidious misunderstandings and to make those misunderstandings implementation errors that can be caught in one place (with test vectors), rather than subtle bugs at individual call sites that may very well go unnoticed. I know I sound like a broken record, but the case of library users really worries me. Library users should not need to understand detailed security recommendations from the hash-to-curve document in order to safely invoke a hash-to-curve function that someone else wrote. Or, maybe more accurately: library users just will not read this document. To whatever extent is reasonable, complying implementations should protect them anyway.
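A minimal sketch of what it would mean for the encodings themselves, rather than their callers, to do that work: each function injects its own fixed tag, so a caller that passes only the message still gets orthogonal inputs. `hash_to_curve`, `hash_to_twist`, the `_hash_to_base` stand-in, and the tags `H2C-CURVE` / `H2C-TWIST` are hypothetical; the map-to-point step is omitted and only the hash_to_base output is returned.

```python
import hashlib


def _hash_to_base(tag: bytes, msg: bytes) -> bytes:
    """Stand-in for hash_to_base; the per-encoding tag is injected here,
    so callers never see or supply it."""
    return hashlib.sha256(tag + hashlib.sha256(msg).digest()).digest()


def hash_to_curve(msg: bytes) -> bytes:
    # Hypothetical fixed tag for the curve encoding (map-to-point step omitted).
    return _hash_to_base(b"H2C-CURVE", msg)


def hash_to_twist(msg: bytes) -> bytes:
    # Hypothetical fixed tag for the twist encoding (map-to-point step omitted).
    return _hash_to_base(b"H2C-TWIST", msg)


# The caller passes only the message; the two encodings still get distinct inputs.
print(hash_to_curve(b"m") == hash_to_twist(b"m"))  # False
```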
I think I'm not quite understanding your proposal:
I realize that you don't mean a price in the literal sense, but to be clear: there is no computational overhead when adding a per-curve ciphersuite. hash_to_base invokes exactly the same number of rounds of SHA2 in either case.
There are a couple of issues being discussed, but here I'm focusing on the issue of option 1 vs option 2 at the beginning of this thread, in the context of BLS signatures. In BLS signatures, we want to support additional ciphersuite information beyond what's in hash-to-curve; let's suppose we only need a single byte (concretely, this byte would indicate different mechanisms for preventing rogue-key attacks, e.g. 0x01 for proof of possession and 0x02 for message augmentation). To answer,
Let's suppose we go with option 2 with pre-hashed for free, namely:
Let's suppose we want to sign the message "Hello" using BLS signatures with option 0x01 over the BLS12-381 curve. Looking up the current table, I'd use "H2C-0008". Now, what would m' be?
I see two advantages in option H2:
More generally, we can have H2C-xxyyzz with 3 bytes, with xx defaulting to 00 and reserved for high-level applications.
This is mostly aesthetic, but if we include a string "HASH-TO-CURVE" in m', then we should also include "BLS-SIGN" in m', and I don't know a clean way to do that in option 2. But let's put this aside for now. Hope that clarifies things somewhat! :)
I agree this is nice from a performance perspective, but in my mind it doesn't provide meaningful domain separation. This is related to pairingwg/bls_standard#17. The issue is that one byte is not sufficient to ensure that different protocols make distinct calls to the random oracle. Concretely, if protocols A and B both use a one-byte ciphersuite tag, there's a really good chance that both of them will compute exactly the same value for This is exactly the situation we're trying to avoid with domain separation: the protocols need to somehow include a globally unique (we hope) string inside I'm also not in favor of weakening the abstraction / complicating the interface between hash-to-curve and upper-level protocols. In my mind, the signature of the hash-to-curve functions should be
And they should behave in a way that, to the greatest extent possible, aligns with intuition. For the purposes of this thread, from my perspective "intuitive" means that different hash functions are fully distinct from all other random oracles in a protocol. This is the best possible guarantee that hash-to-curve can give. That doesn't mean that higher-level protocols don't need to do domain separation among themselves! but it does mean that it's safe to treat conforming hash-to-curve implementations as a black box. In other words, of course hash-to-curve functions can be misused, but the simplest and most obvious way to use them is probably the right way, modulo responsibilities that only the upper-level protocol is in a position to discharge. As far as performance goes, in my mind the calling protocol should only be passing tagged messages to any random oracle. This means that there really is no extra cost for computing
because the upper-level protocol should never call

EDIT: this is sort of separate from the above, but I think it's pretty clear from the discussion upthread that fixing a table of ciphersuite IDs is a non-starter. So let's assume that we'd compute the ciphersuite ID using a deterministic algorithm. I've edited the first post in the thread to that effect.
To add another perspective: I spoke with Dan (Boneh) about this today, and his take was that this is very application dependent, so it might be best to leave it to the applications to decide whether they want per-curve domain separation rather than arbitrarily decree that there shall be separation just between isogeny classes. So that's another vote against per-curve separation. Dan is in favor of adding some fixed string to the H() calls inside hash_to_base, roughly as discussed here. But he suggested that it is probably better to use HKDF than to "roll our own" PRG. Let's leave that to an orthogonal discussion---I've created #137. Since it seems like there's little enthusiasm to go all-in on per-curve domain separation, I'm fine adding text that explicitly delegates this task to the upper-level protocols, at least for now. We can revisit this decision in the future if necessary, but I'd rather get some text about domain separation before the deadline, since that's probably the most effective way to solicit feedback from the broader community.
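#137 is where the HKDF question is tracked; purely as an illustration, this is roughly what an HKDF-based hash_to_base could look like, using HKDF-SHA256 as defined in RFC 5869 and implemented with Python's standard `hmac` module. The fixed domain string "HASH-TO-CURVE" in the info field, the counter encoding, the example prime, and the 48-byte output length (extra bytes to reduce bias from the final reduction mod p) are assumptions, not choices made in #137 or the draft.

```python
import hashlib
import hmac

HASH_LEN = 32  # SHA-256 output length in bytes


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract (RFC 5869): PRK = HMAC-Hash(salt, IKM)."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand (RFC 5869): T(i) = HMAC-Hash(PRK, T(i-1) || info || i)."""
    if length > 255 * HASH_LEN:
        raise ValueError("length too large for HKDF-SHA256")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


P = 2**255 - 19  # example field prime


def hash_to_base_hkdf(msg: bytes, ctr: int) -> int:
    """Hypothetical hash_to_base: a fixed string in info separates these calls
    from other uses of SHA-256; ctr distinguishes the calls for one message."""
    prk = hkdf_extract(b"", msg)  # empty salt, treated by HMAC as all-zero key
    okm = hkdf_expand(prk, b"HASH-TO-CURVE" + bytes([ctr]), 48)
    return int.from_bytes(okm, "big") % P


u0 = hash_to_base_hkdf(b"hello", 0)
u1 = hash_to_base_hkdf(b"hello", 1)
```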
I created #139 capturing more or less what's here, tabling the question of per-curve domain separation for now since there seems to be little enthusiasm for it. Comments appreciated.
We reached a consensus on domain separation, so I'm closing this issue.
Chris and I discussed the issue of ciphersuites and domain separation in hash-to-curve. Put concisely, the question is, should hash-to-curve provide domain separation among the suites, or should domain separation be left to the upper-level protocol? We decided that the best way to proceed is to get feedback from the CFRG list by coming up with and sending two specific proposals along with a discussion of pros and cons.
From chatting with the BLS and VRF folks, my impression is that they prefer to keep ciphersuites out of hash-to-curve. Their argument is (paraphrasing), the upper-level protocol needs to ensure domain separation, so what's the point of doing it redundantly in hash-to-curve? There was also serious concern about variable-length ciphersuite strings (currently used in the poc impls but not specified in the document), because that is a potential source of confusion and bugs. Right now, the BLS sigs standard is proceeding with no ciphersuite in hash-to-curve, only in the BLS signature itself.
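As one concrete example of how variable-length ciphersuite strings can cause bugs (not necessarily the specific concern the BLS and VRF authors raised): if a variable-length suite string is concatenated directly with the message, two different (suite, message) pairs can produce the same hash input. The suite strings below are hypothetical; a fixed-length tag or a length prefix removes the ambiguity.

```python
def naive_input(suite: bytes, msg: bytes) -> bytes:
    # Variable-length suite string concatenated directly with the message.
    return suite + msg


def length_prefixed_input(suite: bytes, msg: bytes) -> bytes:
    # A one-byte length prefix makes the suite/message boundary unambiguous.
    return len(suite).to_bytes(1, "big") + suite + msg


# Two different (suite, message) pairs with hypothetical suite strings.
pair_a = (b"P256-SHA256", b"-SSWU:hello")
pair_b = (b"P256-SHA256-SSWU", b":hello")

print(naive_input(*pair_a) == naive_input(*pair_b))                      # True
print(length_prefixed_input(*pair_a) == length_prefixed_input(*pair_b))  # False
```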
But there may be room to meet in the middle. For example, it would be essentially free to add a short, fixed-length ciphersuite (say, <20 bytes) in hash2base. It might be worth running this by the BLS and VRF folks before we go to the list, to hear out the likely objectors first and see if we can get buy-in.
Concretely, I propose that we consider the following two options.
option 1: no ciphersuite
This preserves the current version of hash2base as specified in the document. The important line is the one in which msg gets hashed. In the current version, that's
option 2: fixed string + 4-byte ciphersuite in hash2base
EDIT: or, maybe slightly preferable so that "prehash-for-free" still works:
EDIT 2: I think it's safe to assume that "ciphersuite" here would be generated following something like the procedure in this comment, below. This avoids issues with ciphersuites for curves not discussed in the spec document.
Note that the string "HASH-TO-CURVE" ensures domain separation even from other protocols that use a 4-byte ciphersuite tag. It might be nice for protocols to adopt this approach more generally.
(Actually, I kind of like the idea of using the RFC number (e.g., "RFC1234" for RFC 1234's domain separation string), but that has the downside that the protocol's test vectors can't be finalized until the RFC number is assigned.)
Thoughts on the above two proposals? Any other issues we should consider?