Allow clearing of dead ops to avoid negatively impacting resolution time, storage/bandwidth, and new node boot time #266
A long-time goal of the protocol has been to devise a means for large-scale pruning of historical data to reduce the total amount of data a full node stores. The following proposal aims to achieve that goal via a set of changes to existing data structures and protocol rules. The changes make it possible to eliminate all past operation data across the network.

The first step in this effort is retooling the Anchor/Batch file approach to branch files out by operation type and to add another level: a Map File that captures everything necessary to securely and accurately derive the lineage of an ID far into the future, without retaining verbose op data from batch files.

New three-tiered file structure: the protocol will be modified to separate ops by type within the Anchor file, linking each type to its own distinct Map and Batch files. The number of files will be increased from the current two (Anchor and Batch files) to the following six (a rough sketch of how they link together follows the list):
1. Create Operations
   - Anchor: single values; per-DID values
   - Map
   - Batch: per-DID values
2. Update Operations
   - Anchor: single values
   - Map: per-DID values
   - Batch
3. Recovery/Checkpoint Operations
   - Anchor: single values; per-DID values
   - Map: single values; per-DID values
   - Batch
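For illustration, here is a rough TypeScript sketch of how the six files might link together. All field names are hypothetical assumptions, not from the proposal text; the point is the linkage: each op-type section of the Anchor file points to a Map file, which points down to a prunable Batch file while retaining signatures itself.

```ts
// Illustrative sketch only: field names are hypothetical, not from the proposal.

interface AnchorFile {
  // Each op type gets its own section, linking to a distinct Map file.
  createOperations: { mapFileHash: string; didSuffixes: string[] };
  updateOperations: { mapFileHash: string; didSuffixes: string[] };
  recoveryCheckpointOperations: { mapFileHash: string; didSuffixes: string[] };
}

interface MapFile {
  // Link down to the verbose Batch file, which becomes prunable after a checkpoint.
  batchFileHash: string;
  // Everything needed to derive an ID's lineage lives here, per DID,
  // so batch data can be discarded without losing verifiability.
  operations: Array<{ didSuffix: string; signature: string }>;
}

interface BatchFile {
  // Verbose operation payloads; safe to prune once a checkpoint supersedes them.
  operationData: string[];
}
```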
Other related changes to files/data:
Checkpointing: in order to purge past data, there must exist a mechanism to trigger a checkpoint, wherein all DID owners generate an op that refreshes their ID state to the latest full state. This proposal advocates an automatic, network-wide checkpointing mechanism, triggered by a deterministic calculation over known values, such as the passage of N blocks of chain-time, some number N of updates relative to the number of IDs in the system, etc. (a sketch of such a trigger follows below). With this set of changes, all of the following can be pruned after a checkpoint:
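A minimal sketch of what that deterministic trigger could look like; the constants and thresholds below are illustrative assumptions, not values from the proposal:

```ts
// Every node evaluates the same pure function over known chain values,
// so no coordination message is needed to agree on checkpoint windows.

const CHECKPOINT_INTERVAL_IN_BLOCKS = 50_000; // illustrative value

// Variant 1: trigger purely on the passage of N blocks of chain-time.
function isCheckpointBlock(blockHeight: number): boolean {
  return blockHeight % CHECKPOINT_INTERVAL_IN_BLOCKS === 0;
}

// Variant 2: trigger once updates since the last checkpoint exceed some
// ratio of the total number of IDs in the system (threshold illustrative).
function shouldCheckpoint(updatesSinceLastCheckpoint: number, totalDidCount: number): boolean {
  const UPDATES_PER_ID_THRESHOLD = 10;
  return updatesSinceLastCheckpoint >= totalDidCount * UPDATES_PER_ID_THRESHOLD;
}
```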
^ @thehenrytsai @OR13 @rado0x54 have a look at this rough proposal and let me know what you think.
Here's roughly what each file's structure would look like (a rough sketch of the Anchor File's shape appears after the list):

- Anchor File: an Anchor File with all types of ops represented
- Create Batch File
- Update Map File
- Update Batch File
- Recovery/Checkpoint Map File
- Recovery/Checkpoint Batch File
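A guessed-at sketch of the Anchor File's shape, with placeholder hashes and suffixes; the field names are assumptions, and only the overall linkage comes from the proposal above:

```ts
// Guessed-at shape with placeholder values; not the author's original sample.
const exampleAnchorFile = {
  createOperations: {
    mapFileHash: "Qm...createMapPlaceholder",
    didSuffixes: ["EiA...", "EiB..."],
  },
  updateOperations: {
    mapFileHash: "Qm...updateMapPlaceholder",
    didSuffixes: ["EiC..."],
  },
  recoveryCheckpointOperations: {
    mapFileHash: "Qm...recoveryMapPlaceholder",
    didSuffixes: ["EiD..."],
  },
};
```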
Note on proposal: signatures in this scheme move from the Batch file to the Map file, representing a weight shift toward lighter nodes. We may also be able to implement a signature aggregation scheme (if one fits our heterogeneous message constraints) that would significantly reduce the sig load in batches over a certain low op-count floor size.
@csuwildcat I think we need to provide better sample data in order to make these changes more accessible, and also clearer names. I will update this comment based on anything I have gotten wrong.
There is only one schema for

I don't understand why create operations are not signed; that seems bad. Since the recovery pub key is included with the create op, a signature from it should be required. There should be an unbroken chain of proof of control of keys starting from create. We may even want to make the create operation signature include an expensive hash function, so that there is a computational cost for new DIDs and its payment is authenticated.

A related, and particularly spicy detail of linking the

It seems like we are trying to shorten property names. I suggest we not do that in JSON, and instead add support for compact binary representations of these files and schemas somewhere else. This ticket should be about providing maximum clarity regarding the identifiers, file schemas, and relationships, and more verbose names will help with that. We can shorten them once it's all clear.

Technically this ticket combines both the commit/reveal anti-DOS protocol update and a pruning protocol update (neither is fully spec'ed or implemented AFAIK). I'm very much in favor of these changes, especially since treating IPFS as permanent unlimited storage is a very bad idea, and in order to protect against a number of sybil attacks, there should be a cost to keeping DIDs fresh over long periods of time.

In order to see the true value of these changes, I think it's worth outlining the worst-case protocol attack and seeing how these changes are required to mitigate it. Consider the case where wealthy, dedicated attackers pursue DOS or sybil attacks, where a single user or group of users controls huge percentages of DIDs which are then used in second-order reputation-based attacks, or DOS attacks.

Consider DOS: DOS is achieved by causing the traffic volume or traffic size to become too burdensome for clients, which will ultimately cause DIDs to become unresolvable. While the proposed commit/reveal strategy can help clients avoid downloading data related to a DID that was not created by the DID's controller, an attacker can always craft ledger transactions and anchorFiles, and clients will always need to process at least these two events; there is a minimum amount of data which must always be kept available, and it is the frequency of creation of this data, and its size, that the attacker will exploit. Under this new schema, an attacker will only have the ability to attack ledger transactions and anchorFiles. The checkpoint window will protect the other files from exploitation: nodes will be able to prune data that failed to checkpoint at the start of each new window, and the attacker's bloating efforts will be reset. (A sybil vulnerability remains if the attacker is willing to checkpoint, but DOS is partially mitigated.) Stale DIDs will have their data pruned, so the issue seen with GPG key servers is also partially mitigated here.

Consider sybil: a sybil attack is achieved when an attacker can control a significant percentage of a network. In a free-to-join p2p system with no central authority to check IDs, this is exceedingly difficult to mitigate, and there are plenty of research papers covering the topic. The main defense against sybil is increasing the cost of creating and maintaining nodes in the network, or disrupting their ability to coordinate. The sybil attack will start with a
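For concreteness, a sketch of the check being argued for here, using Node's built-in crypto module (the operation shape is hypothetical): a create op must verify against the very recovery key it embeds, keeping the chain of key control unbroken from the first operation.

```ts
import { createVerify } from "crypto";

// Hypothetical create-op shape; field names are illustrative.
interface CreateOperation {
  recoveryPublicKeyPem: string; // the recovery key embedded in the create op
  payload: string;              // serialized operation data
  signature: string;            // base64 signature over the payload
}

// The create op is only valid if its signature verifies against the
// recovery key it carries, so key control is proven from the start.
function isCreateOperationSigned(op: CreateOperation): boolean {
  const verifier = createVerify("SHA256");
  verifier.update(op.payload);
  return verifier.verify(op.recoveryPublicKeyPem, op.signature, "base64");
}
```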
A few replies to @OR13: I agree 100% with all your naming nits; we can certainly call these props/values whatever makes things clearest to everyone.

"I don't understand why create operations are not signed, that seems bad, since the recovery pub key is included with the create op, a signature from it should be required."

One reason for this is that we'd be transitioning from a DID Suffix of:

(Because you brought up spam in general, here's an aside about another future means of hobbling spammers, beyond the checkpointing mechanism: #271)
This ticket gives me anxiety ;) The main thing that it needs is additional comments from other implementers. @thehenrytsai IMO, we should version the protocol and create a clean v2 spec, so we can think about it not as a set of breaking changes but as a new implementation of a new spec.
@OR13, I agree with the 'anxiety' comment. I have been dreading and delaying implementing this for as long as I could, and the time has come. I have gone as far as proposing an entirely different, purist approach by introducing the concept of protocol-enforced DID expiration and DID rolling; however, it appears to have unrealistic requirements: 1. requiring relying parties to perform a resolution of the DID in concern once every few years, and 2. forcing claims to have the same expiry constraints as the subject and issuing DID. I'd love to brainstorm further on my alternative approach, but it is tabled for the time being.

I believe I now understand these requirements well 'enough' and will start implementation next week; the intention is for this to be the primary change for the v0.5.0 release. I will update the protocol spec and implementation documentation accordingly. DIDs created with the current protocol version (v0.4.0) should continue to resolve in testnet as per #269, but going forward the scheme described in this issue will be the official protocol, with the expectation that only the new scheme will be supported on the bitcoin mainnet.
Let's not feel too much anxiety - until we hit mainnet beta, there was always the expectation that we'd have to resolve these critical challenges in ways that would significantly modify the existing spec as written. I'm not inclined to carry any legacy work/maintenance or technical debt against the current doc or code as it was structured. If we had done this a year from now, well into mainnet, sure, I'd treat it differently.

As for the alternate proposal: the most serious non-starter is that it would create a condition where you essentially lose your ID every so often, have to connect with all RPs to redo proofing, and can never securely prove to any new entity you engage with that you ever owned the IDs the system forcibly deleted. That's functionally degrading the system's primary utility to an unacceptable point, so I don't want to spend more time on it.
@csuwildcat, I am fine tabling my alternate proposal, but disagree with the following statements:

"have to connect with all RPs to redo proofing"

No, you don't. At the end of the day, it is an RP's decision to keep your account in its system or not; if a system that uses ION chooses to delete your account due to your inactivity, there is nothing a user can do about it. The alternate proposal simply promotes the decision of keeping someone in your service/system to an explicit decision, which can be argued as a selling point, because we have now given relying parties a definitive way to prune stale/spam accounts. Any serious system, especially a commercial consumer-facing service, would probably have every economic incentive to do so.

"and can never securely prove to any new entity you engage with that you ever owned the IDs the system forcibly deleted"

If you are talking about authentication, you would not go to a new entity using an old DID that is rolled over/tombstoned; you would simply use the new DID. If you are talking about presenting claims/credentials issued to your old DID, then yes, the alternate proposal essentially forces claims/credentials to have an expiry that's tied to the lifetime of the DID (including the tombstone period), which is admittedly a limitation as stated in my comments above. But I will say that: 1. the notion that all claims have an expiry isn't inherently bad; and 2. this gives the signer of the claims the opportunity to renew claims, which isn't inherently horrible for the ecosystem, because if we accept that signers must have mechanisms by which they revoke/invalidate claims they issue, this approach just gave them a built-in, time-based way to do so, and renewal can even be automated if we tie in the Hub.

Finally, I agree that it is counter-intuitive that a DID's lifetime is not forever, but we get a new passport number every time we renew our passport, and we don't seem to be concerned about services/systems that may have our old passport numbers. I also wonder if there are situations where I actually want to roll my DID from one to another and have all the services I use seamlessly move from the old DID to the new one; the alternative proposal obviously gives you that ability. This is analogous to a Spotify user changing their login email from one address to another. The alternate proposal is not without limitations, but neither is the current proposal. However, I am not going to explore the alternate proposal any further unless it gets some thumbs up.
There are two big things I see wrong here that we should talk about. Let's definitely get you, me, and perhaps Orie on a call so I can explain the two major issues with this that make it a nonstarter.
I don't think we necessarily need separate batch files for the operation types, so it may just be the addition of the update map file and the recovery/checkpoint map file. This could save a decent amount of implementation time, so let's discuss it soon.
This is the kind of thing it might help to have a higher-bandwidth call on, potentially with slides/pictures of the proposed layout.
I agree - with an actual working group and the rigor of developing against a more formal spec, it will be easier to drive focus on specific issues and facilitate discussion/collaboration, including calls we request all members of the group to attend ;)

I'll set up a call for this issue in the meantime.
After discussing some ideas with @thehenrytsai, we are proposing a modification to the structures above that will make the structural changes even simpler, while allowing for a number of performance-enhancing features that can be delivered as protocol updates. The structures would be modified as follows (a rough sketch follows the list):

- Anchor File: all types of ops represented
- Map File
- Batch Chunks (example of one chunk)
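A loose sketch of this revised layout under assumed names: one Anchor file covering all op types, one Map file indexing batch chunks, and independently fetchable chunk files. Everything below is illustrative, not the authors' actual schema.

```ts
// Loose sketch; all names are assumptions, not the authors' actual schema.

interface AnchorFileV2 {
  mapFileHash: string;
  operations: {
    create: Array<{ didSuffix: string }>;
    update: Array<{ didSuffix: string }>;
    recover: Array<{ didSuffix: string }>;
  };
}

interface MapFileV2 {
  // The batch is split into chunks that nodes can fetch (or skip) independently,
  // which is where the performance-enhancing protocol updates could hook in.
  chunks: Array<{ chunkFileHash: string }>;
}

interface BatchChunk {
  deltas: string[]; // verbose op payloads for one slice of the batch
}
```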
With this updated proposal, we do a few things:
The format of PUB_KEY and *_SIGNATURE needs to be defined... I would love it if they were JWKs and JWS... then we would not have to worry about the complexity of non-standard signing or key representations...
This seems like it will be a big improvement... but since it's such a big change, I think we should also address the over-reliance on secp256k1 in Sidetree... we should support NIST curves, RSA, Ed25519, and secp256k1, so we have good coverage for non-extractable key systems / secure enclaves / Azure Key Vault, etc... secp256k1 is the worst curve with respect to extractability...
We should also address canonicalization attacks... before a hash is computed over a serialized object, the object must be canonicalized.
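For example, a sketch using the `canonicalize` npm package (an implementation of the RFC 8785 JSON Canonicalization Scheme); hashing the canonical form means two serializations of the same object cannot yield different hashes:

```ts
import canonicalize from "canonicalize";
import { createHash } from "crypto";

// Canonicalize (deterministic key ordering, number formatting) before hashing.
function hashObject(obj: object): string {
  const canonical = canonicalize(obj);
  if (canonical === undefined) {
    throw new Error("Object cannot be canonicalized");
  }
  return createHash("sha256").update(canonical).digest("hex");
}

// { b: 1, a: 2 } and { a: 2, b: 1 } now produce the same hash.
```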
|
I am not sure how important this is, given that the guts of Sidetree don't (and needn't) really care much about presentational key formatting, and we can output whatever we want, but perhaps the case could be made. I would want to understand the size impact of actually storing keys in a verbose presentational format vs. a truncated form that is simply output to whatever format we choose.
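A back-of-envelope take on that size question (all values below are illustrative placeholders): a compressed secp256k1 public key is 33 bytes, so 66 characters hex-encoded, while an equivalent JWK spells out `kty`/`crv` plus base64url `x` and `y` coordinates:

```ts
// Illustrative sizing only; these are placeholder values, not real keys.
const hexCompressedKey = "02" + "ab".repeat(32); // 66 chars for 33 bytes

const jwkKey = {
  kty: "EC",
  crv: "secp256k1",
  x: "A".repeat(43), // base64url of a 32-byte coordinate is ~43 chars
  y: "B".repeat(43),
};

console.log(hexCompressedKey.length);       // 66
console.log(JSON.stringify(jwkKey).length); // roughly double, ~130 chars
```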
secp256k1 has no trusted hardware support... it doesn't even work with Azure Key Vault.

RSA is supported by the Android secure enclave, and is widely used on networks that are not even allowed to use secp256k1... I do agree that size is a problem for it... but I don't agree that it's wise to ban it, when other DID methods will support it and work better with existing trusted-hardware systems because of it.
But you are picking a key format here... if it's hex encoding... you should really reconsider... it should be JWK or multibase, and multibase is not really ready yet.
Also, since JWK is now a valid way to express every key that you can in a DID Document, we should all be leveraging that to make things easy and interoperable.
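For reference, a sketch of what a secp256k1 public key as a JWK could look like in a DID Document entry; the coordinate values are placeholders and the suite name is an assumption:

```ts
// Placeholder coordinates; a real JWK would carry actual base64url values.
const publicKeyEntry = {
  id: "#key-1",
  type: "JsonWebKey2020", // assumed suite name for JWK-formatted keys
  publicKeyJwk: {
    kty: "EC",
    crv: "secp256k1",
    x: "<base64url-x-coordinate>",
    y: "<base64url-y-coordinate>",
  },
};
```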
Point taken on the JWK front - we can always codify new, more compact JWK representations in the registry.
1. Added Map File to file structure.
2. Increased code coverage of `TransactionProcessor` to 100%.
3. Some code refactoring.
* Replaced `DidResolutionModel` with `DocumentState` so the state keeping is no longer DID-document specific.
* Removed various classes such as `AnchoredOperation` and `ApplyResult`, plus various renames for consistency.