upgrade or replace IPEX to support async exchange of evidence #899

Open
dhh1128 opened this issue Dec 5, 2024 · 4 comments

@dhh1128
Contributor

dhh1128 commented Dec 5, 2024

Feature request description/rationale

IPEX sends signed messages. These messages are not anchored to the KEL. They have to be processed in a timely fashion, where "timely" means "soon enough that they don't time out in escrow."

This model works fine in the sort of issuance behaviors that the vLEI requires, where there is a ceremony, and all stakeholders are guaranteed to be present and able to interact with their wallets in realtime. However, it does not work in a very large proportion of real-world scenarios where evidence exchange is likely to be interesting.

To give one concrete example: I am currently working on a telco use where telephone number allocation credentials need to be issued by a regulator to an enterprise: Acme has the right to use phone numbers X-Z, which it pays for. In this scenario, either or both (usually both) sides may be multisig. We cannot require these parties to schedule ceremonies and make all stakeholders show up, just to make credential issuance work. We need a model where one person at the regulator starts an issuance process, and a system uses emails or pagers or SMS messages or some other mechanism to get the attention and approval of all the other parties at the regulator who must approve in order to reach their multisig threshold. Then, on the receiving end, we need another email-driven, asynchronous process to collect enough signatures to admit the messages.

A simple workaround is to increase the escrow timeout for these messages. As a temporary stop gap, this might alleviate some pain, but it's really not safe. I believe this would require that messages be anchored to the KEL (turning the messages into ACDCs), rather than directly signed; otherwise, the validity of the digital signature on the message cannot be evaluated with respect to the key state at the time the message was created -- only at the time the message is verified. Such a gap is a security risk that we must close. I discussed this with @SmithSamuelM and I think I'm being faithful to the analysis he offered.

It's possible that not all of the messages in the protocol need to be anchored to the KEL; I haven't looked at that closely. It's also possible that a participant in the protocol should have the option of choosing which proof strategy (direct signing versus KEL anchoring) is used, on a message-by-message basis. This would allow parties who get to step 2 in the protocol to say, "right now, for this particular exchange, I'm going to require the next step to occur within 5 min or it will time out" but at step 3 to say "I'm willing to wait weeks for this".

I'm curious to know whether smart people in the community agree with the need, and if so, whether they feel that we can simply upgrade the definition of IPEX, or whether we need a new protocol that doesn't make IPEX's assumptions.

@daidoji
Contributor

daidoji commented Dec 5, 2024

Does IPEX make any assumptions? It seems like an async process already. The implementation of IPEX in keripy does have implicit constraints that could certainly be changed but is that a protocol level issue?

Long-lived protocols that have requirements on temporal issues (escrow time-to-live and credential expirations) do seem like they should be another protocol though, imo, as the constraints get a lot more complicated.

@SmithSamuelM
Collaborator

SmithSamuelM commented Dec 5, 2024

@dhh1128 maybe a point of clarification. There is a difference between:

  1. a transaction consisting of a set of EXN messages timing out
  2. a multisig EXN message in escrow timing out waiting for threshold satisfying set of signatures
  3. The KRAM (KERI Request Authentication Mechanism) replay attack protection timeout window

IPEX is agnostic about 1. and 3., and 2. is currently the same for all EXNs. But time delays in collecting multisigs can trigger 3.

At a higher level than IPEX it might make sense to define 1. for a given transaction type which transaction type might be conveyed by a set of EXNs.

One way to address 2 and 3 is the combination of a pre-protocol for collecting signatures and using full KRAM which is a timed cache not simply a time window. Currently only a time window is implemented.

The pre-protocol means any end of a transaction that employs multisig on EXN messages manages the coordination of multisig (i.e., collects all signatures before sending over the wire). The simplest way to do this is to designate a multisig group leader. Using a leader method, only one EXN is ever generated by the leader to be signed by all members of the group. Thus all signers sign the same EXN with the same datetime.
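A minimal sketch of the leader idea described above, not keripy code. Everything named here is an assumption made for illustration: the `LeaderCollector` class, the `sign` helper, the member names, and the EXN field values are hypothetical, and HMAC stands in for real digital signatures so the example runs with only the standard library. The point it shows is that the leader fixes one serialized EXN (with one datetime), every member signs those same bytes, and nothing goes over the wire until the threshold is met.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass, field


def sign(secret: bytes, payload: bytes) -> str:
    # HMAC is only a stand-in for a real digital signature in this sketch.
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


@dataclass
class LeaderCollector:
    """Hypothetical leader that gathers member signatures over one fixed EXN."""

    exn: dict
    threshold: int
    member_secrets: dict  # member name -> secret (stand-in for verification keys)
    sigs: dict = field(default_factory=dict)

    def payload(self) -> bytes:
        # Canonical serialization so every member signs identical bytes.
        return json.dumps(self.exn, sort_keys=True).encode()

    def add_signature(self, member: str, sig: str) -> bool:
        """Store the signature if it verifies; return True once the threshold is met."""
        secret = self.member_secrets.get(member)
        if secret is not None and hmac.compare_digest(sign(secret, self.payload()), sig):
            self.sigs[member] = sig
        return self.ready()

    def ready(self) -> bool:
        # Only a fully signed EXN should be sent to the other party.
        return len(self.sigs) >= self.threshold


# Illustrative usage: a 2-of-3 group signing one EXN with one embedded datetime.
secrets = {"alice": b"a-secret", "bob": b"b-secret", "carol": b"c-secret"}
leader = LeaderCollector(
    exn={"r": "/ipex/grant", "dt": "2024-12-05T12:00:00+00:00"},
    threshold=2,
    member_secrets=secrets,
)
body = leader.payload()
first = leader.add_signature("alice", sign(secrets["alice"], body))
second = leader.add_signature("bob", sign(secrets["bob"], body))
```

Members can respond hours or days apart (by email, SMS, or any other channel); the leader simply holds the EXN until `ready()` is true.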

This may still run afoul of KRAM because the time to collect the signatures may make the embedded datetime fall outside a simple KRAM window, so the message gets dropped. However, with full KRAM, the timeout windows can be of much longer duration, basically limited only by cache memory across all cached source AIDs for each transaction (type). With full KRAM, windows can be of any reasonable length (days or weeks), where there is a timed cache for each source AID involved in a transaction of when the last message from the source for that transaction was received. In this case KRAM is perfectly happy with long timeouts while providing replay attack protection.

This combination of a pre-protocol for collecting signatures and timed cache KRAM solves 2. and 3. simultaneously. Messages make it past the KRAM replay filter and are immediately verified because they come with all signatures attached. All that remains is 1., and that is at a higher layer than IPEX.

This means that in general 2. and 3. no longer affect transactions, whether the time spacing between EXN exchanges is long or short. Both 2. and 3. only become relevant when partially signed EXNs either fall outside the simple KRAM time window and get dropped, or get trapped in escrow so long that they exceed the escrow timeout while waiting for the collection of multiple signatures.

Given a multi-sig coordination pre-protocol and full KRAM, the question of whether the transaction should include messages that are anchored becomes largely independent of IPEX or EXNs themselves and operates at the higher level of the transaction type. There are good applications for transaction types that benefit from having anchored messages. So this does not obviate that broader discussion. But I think it would be more beneficial to first:

  1. Implement a leadered pre-protocol for multisig group coordination.
  2. Implement full timed cache KRAM

@dhh1128
Contributor Author

dhh1128 commented Dec 11, 2024

I feel like I understand task 1 pretty well. I will open a separate issue for it, and propose a sequence diagram as a starting point.

I don't understand "full KRAM". Perhaps we could discuss at the next opportunity for a live conversation?

@SmithSamuelM
Collaborator

SmithSamuelM commented Dec 11, 2024

@dhh1128 https://github.com/SmithSamuelM/Papers/blob/master/whitepapers/kram.md

explained in more detail above.

But simply

  1. Simple KRAM uses a short time window relative to the host (i.e., the host clock is determinative). All signed client requests must include a timestamp field that lies within the window, else they are dropped. This does not require any host cache. A replay attack is only possible inside the time window synchronized to the host clock. A short time window minimizes the opportunity for replay attack. But simple KRAM may be problematic for multi-sig where the time to collect signatures is longer than the time window. This is what is currently implemented in keripy.

  2. Full KRAM is a monotonic timed cache. The last request from any client is cached inside a sliding time window. A new request must both have a later datetime than the cached request and be inside the host time window. Requests that move outside the sliding time window may be safely deleted, thereby pruning the cache. The timed cache can be made more granular by caching a request per message type for a given client, or even more granular by caching messages with a specific time window for a given transaction itself, not merely the message type or the transaction type. Essentially, this creates an expiration datetime, equal to the request datetime plus the host-configured time window, for all instances of a given message at a given stage in a transaction. The monotonicity of the cached requests protects against replay attacks. The time window bounds the size of the cache. This has not been implemented yet in keripy. The time window needs to be controlled by the host relative to the host's clock; otherwise the client can attack it.
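To make the difference between the two variants concrete, here is a rough sketch under stated assumptions. It is not the keripy implementation and not the algorithm from the whitepaper verbatim: the names `simple_kram_accept` and `TimedCacheKram` are hypothetical, the window is treated as symmetric around the host clock (an assumption), and the cache is keyed by client AID only.

```python
from datetime import datetime, timedelta, timezone


def simple_kram_accept(request_dt, host_now, window):
    """Simple KRAM: no cache, accept only timestamps inside the host window."""
    return abs(host_now - request_dt) <= window


class TimedCacheKram:
    """Sketch of a monotonic timed cache keyed by client AID."""

    def __init__(self, window):
        self.window = window  # set by the host, never by the client
        self.last_seen = {}   # client AID -> datetime of last accepted request

    def prune(self, host_now):
        # An entry older than the window can be dropped: a replay of it would
        # already fail the window check below.
        expired = [aid for aid, dt in self.last_seen.items()
                   if host_now - dt > self.window]
        for aid in expired:
            del self.last_seen[aid]

    def accept(self, aid, request_dt, host_now):
        self.prune(host_now)
        if abs(host_now - request_dt) > self.window:
            return False  # outside the host-controlled window
        last = self.last_seen.get(aid)
        if last is not None and request_dt <= last:
            return False  # not strictly later than the cached request: replay
        self.last_seen[aid] = request_dt
        return True


# Illustrative usage: a one-week window tolerates slow signature collection.
now = datetime(2024, 12, 11, tzinfo=timezone.utc)
kram = TimedCacheKram(timedelta(days=7))
three_days_old = now - timedelta(days=3)
accepted = kram.accept("aid-1", three_days_old, now)
replayed = kram.accept("aid-1", three_days_old, now)
```

Here a request whose embedded datetime is three days old is accepted once and rejected on replay, whereas a simple five-minute window would have dropped it outright.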

With a host-controlled sliding window timed monotonic cache, because the monotonicity of the cache protects against a replay attack and the time window merely bounds the memory requirements, one can tune the window per message type (and per AID) and/or per transaction type or per transaction itself, so that messages associated with certain types of transactions or certain transactions themselves can have relatively long time windows (days or weeks), which windows are only bounded by memory and the traffic load for those messages from given clients.
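One way the per-type tuning might look on the host side, as a sketch only. The configuration tables, the route strings, the transaction identifier, and the helper names `window_for`, `cache_key`, and `expiration` are all assumptions made for illustration; none of them come from keripy or the KRAM whitepaper.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical host-side configuration; names and durations are illustrative.
DEFAULT_WINDOW = timedelta(minutes=5)
WINDOW_BY_MESSAGE_TYPE = {"/ipex/grant": timedelta(days=3)}
WINDOW_BY_TRANSACTION = {"txn-42": timedelta(weeks=2)}


def window_for(message_type, transaction_id=None):
    """Pick the most specific host-configured window that applies."""
    if transaction_id in WINDOW_BY_TRANSACTION:
        return WINDOW_BY_TRANSACTION[transaction_id]
    return WINDOW_BY_MESSAGE_TYPE.get(message_type, DEFAULT_WINDOW)


def cache_key(aid, message_type, transaction_id=None):
    """One cache slot per client AID and message type, optionally per transaction."""
    return (aid, message_type, transaction_id)


def expiration(request_dt, message_type, transaction_id=None):
    """Datetime after which the cached entry for this request may be pruned."""
    return request_dt + window_for(message_type, transaction_id)


# Illustrative usage: the same message type gets a longer window inside txn-42.
received = datetime(2024, 12, 11, tzinfo=timezone.utc)
default_expiry = expiration(received, "/ipex/grant")
txn_expiry = expiration(received, "/ipex/grant", "txn-42")
```

Because every window is looked up from host configuration and applied against the host clock, a client cannot widen its own window by choosing what to put in a message.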

3 participants