Prevent duplicate event when anchoring reg or cred in multisigs #271
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #271 +/- ##
==========================================
+ Coverage 83.82% 83.84% +0.02%
==========================================
Files 48 48
Lines 4229 4260 +31
Branches 1034 1062 +28
==========================================
+ Hits 3545 3572 +27
- Misses 656 684 +28
+ Partials 28 4 -24
In my opinion, we should try to reduce the number of API calls we make in each method, so I am wondering if there is a way to resolve this issue without adding an additional call. What is the actual scenario here?
I would expect KERIA to respond with HTTP 400 Bad Request if member 3 tries to create an inconsistent state. Maybe member 3 can detect that the issuance is already anchored and that it only needs to "import" the credential?
What is currently happening (as implemented in KERIA and Signify), for example when creating a Registry:
For KERIA, the event that member 3 is trying to create is valid. It would be tricky to make it aware of those edge cases automatically, but not impossible. I added the check on the client side for simplicity and also because it gives more control over the decision, but I know it's not the perfect solution. Every time we automate decisions, we'll find edge cases. Regarding the additional API call, I also agree that we should reduce those, but in this case the client needs to get the latest status of the KEL to proceed correctly. I think it would also face an extra call even if KERIA responded with 400.
// check if last event already has the anchor in it
// and avoid creating a new event if it does
const lastEvent = events[events.length - 1];
I'm not too familiar with registry events, but I'm also curious whether there's a way to generally prevent duplicate anchoring events in a KEL directly in KERIA, at least with the same signing keys.
The trouble here is that if many things are happening concurrently, the duplicate event may not actually be the lastEvent but the event before that (events.length - 2, and so on).
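To make the events.length - 2 concern concrete, here is a minimal sketch of a check that scans the whole KEL instead of only the last event. The Seal and KelEvent shapes and the findAnchorIndex helper are hypothetical names used for illustration, reduced to the fields the check needs; they are not taken from signify-ts or KERIA.

```typescript
// Hypothetical, simplified shapes: only the fields this sketch needs.
interface Seal {
  i: string; // identifier of the anchored registry or credential event
  s: string; // sequence number of the anchored event (hex string)
  d: string; // digest (SAID) of the anchored event
}

interface KelEvent {
  t: string; // event type, e.g. "icp", "ixn", "rot"
  s: string; // sequence number (hex string)
  a?: Seal[]; // anchored seals, if any
}

function sameSeal(x: Seal, y: Seal): boolean {
  return x.i === y.i && x.s === y.s && x.d === y.d;
}

// Returns the index of the most recent event that already anchors `seal`,
// or -1 when no event in the KEL contains it.
function findAnchorIndex(events: KelEvent[], seal: Seal): number {
  return events
    .map((event) => (event.a ?? []).some((a) => sameSeal(a, seal)))
    .lastIndexOf(true);
}
```

A scan like this covers duplicates that are no longer the last event, but it still runs against a snapshot of the KEL, so it does not by itself address events that complete after the snapshot was fetched.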
I also think this should be handled server side and in a way that avoids any race conditions.
It might actually just be something we can add to keripy directly, maybe Sam or Phil have some ideas
I agree that something can also be done on the keripy/KERIA side. Those types of race conditions are always tricky, and recovering from them is not trivial. There's some code in keripy that assigns one of the members as the lead, and I remember Phil mentioning that they cover some race conditions, but probably not all cases.
Anyway, the goal of this PR is to solve a specific use case: one member of a multisig initiates the creation of a new event and the others join, but one of them joins late, after the event was already completed (thresholds satisfied) with the signatures of the other members. We want this tardy member to create the correct event, not a new one. We can catch the error on the client side because we know that the same anchor is already on the KEL. There are no race conditions in this use case: one member starts the event, and the rest of the members join. This requires an extra call, but I think it is worth it since it prevents other problems.
I think it would still be cleaner if the client side did the redundant signing while late joining and KERIA just accepted it without adding it to the KEL (as it's already a duplicate).
Doing it on the client side here doesn't cover the case I mentioned for events.length - 2, and it also doesn't cover the case of the final threshold signature appearing while the client is signing but after it did this check (which is a race condition).
That said, we could perhaps have it as a stopgap solution.
Though, then again, that would leave an inconsistent KEL locally since ultimately the controller of the KEL is on the client side, hmm
You are right on both points: this PR only covers the use case of events.length - 1. For the race condition, we need something in keripy, which I was trying to avoid or postpone for later, for the sake of simplicity and because of the urgent need.
We also need a way to recover from the duplication in case it happens.
I haven't reviewed the nitty gritty details of it but if it's urgent, perhaps it's OK so long as we open the appropriate issues now and tackle it soon (and not let it fall into the pile of issues :P)
In light of this discussion, I'm interested to see what @rodolfomiranda and @iFergal think about @lenkan's PR solution, #286.
@lenkan's PR makes sense to me and avoids my race condition concerns here
@rodolfomiranda Sorry for the delay here. We are hitting this too. I think we could still end up in the same situation if we are unlucky with the timing. What do you think? Is there a way to move this check closer to the DB transaction to avoid race conditions? I am not as familiar as you are with the internals of KERIA.
I have had some more time to investigate and create other reproductions of similar issues. At the moment, I think it would be better if these methods accepted the sequence number as an argument instead of trying to calculate it. That way you can pass it in from the exn message that you received from the other group participants. Does that make sense? We had a similar discussion about this here: #222 (comment)
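As a rough illustration of the suggestion above, the sketch below takes the sequence number as an explicit argument rather than deriving it from the current key state. The names buildAnchorEvent, snFromProposedEvent and AnchorArgs are hypothetical, and the event is reduced to a few fields (a real interaction event carries more, such as the version string and its own digest), so this is not the signify-ts API.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface Seal {
  i: string;
  s: string;
  d: string;
}

interface InteractionEvent {
  t: "ixn";
  i: string; // identifier prefix of the multisig group
  s: string; // sequence number (hex string)
  p: string; // digest of the prior event
  a: Seal[]; // anchored seals
}

interface AnchorArgs {
  pre: string;
  priorDigest: string;
  seal: Seal;
  sn: number; // supplied by the caller, not computed from the local KEL
}

// Builds the anchoring event at exactly the sequence number the caller asks for.
function buildAnchorEvent(args: AnchorArgs): InteractionEvent {
  if (!Number.isInteger(args.sn) || args.sn < 1) {
    throw new RangeError("sn must be a positive integer");
  }
  return {
    t: "ixn",
    i: args.pre,
    s: args.sn.toString(16),
    p: args.priorDigest,
    a: [args.seal],
  };
}

// A late joiner can reuse the sequence number of the event that the
// initiating member proposed, assuming that event is available to it.
function snFromProposedEvent(proposed: { s: string }): number {
  return parseInt(proposed.s, 16);
}
```

With this shape, a member that joins late builds the same event the others signed, because the sequence number comes from the proposal and not from a KEL that may already have advanced.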
This PR fixes an inconsistency that occurs when a member of a multisig joins an anchoring event (for a registry creation or a credential issuance/revocation) when its KEL already contains that event, which results in a new event with the same anchor.
That situation can be easily replicated if the threshold of the multisig is less than the total number of members and the last member joins after a delay long enough for the event to propagate.
To fix that problem, the code now checks whether the last event already contains the anchor and, if it does, avoids creating a new event in the KEL.
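The check described here can be sketched roughly as follows. This is a simplified illustration under assumed data shapes, not the code from the PR: Seal, KelEvent and lastEventHasAnchor are hypothetical names, and seals are compared on their i, s and d fields.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface Seal {
  i: string;
  s: string;
  d: string;
}

interface KelEvent {
  t: string;
  s: string;
  a?: Seal[];
}

// True when the most recent event in the KEL already anchors `seal`,
// in which case a late-joining member should not create a new event.
function lastEventHasAnchor(events: KelEvent[], seal: Seal): boolean {
  const last: KelEvent | undefined = events[events.length - 1];
  return (last?.a ?? []).some(
    (a) => a.i === seal.i && a.s === seal.s && a.d === seal.d
  );
}
```

A caller would fetch the group's events, run this check, and only build a new interaction event when it returns false.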