
Allow clearing of dead ops to avoid negatively impacting resolution time, storage/bandwidth, and new node boot time #266

Closed
csuwildcat opened this issue Jul 17, 2019 · 25 comments
Labels: code refactoring, protocol (Sidetree protocol change proposal)

Comments

@csuwildcat
Member

No description provided.

@csuwildcat
Member Author

csuwildcat commented Jul 20, 2019

A long-time goal of the protocol has been to devise a means for large scale pruning of historical data to reduce the total amount of data a full node stores. The following proposal aims to achieve that goal via a set of changes to existing data structures and protocol rules. The changes make it possible to eliminate all past operation data across the network.

The first step in this effort is retooling the Anchor/Batch file approach to split files by operation type and add another level: a Map File that captures everything necessary to securely and accurately derive the lineage of an ID far into the future, without retaining the verbose op data from Batch files.

New three-tiered file structure:

The protocol will be modified to separate ops by type within the Anchor file, linking to their own, distinct Map and Batch files. The number of files will be increased from the current two (Anchor and Batch files) to the following six:

  1. Anchor File
  2. Update op Map File
  3. Recovery/Checkpoint op Map File
  4. Create op Batch file
  5. Update op Batch file
  6. Recovery/Checkpoint op Batch file

Create Operations

Anchor

Single values:

  • Hash of Create op Batch File: 32 bytes

Per DID:

  • Initial DID Doc hash: 32 bytes
  • Recovery commit hash: 32 bytes
  • Recovery key: 33 bytes

Map
NO MAP FILE

Batch

Per DID:

  • 1 Kilobyte of op data - must include first Update commit hash

Update Operations

Anchor

Single values:

  • Map File hash: 32 bytes

Map

Single values:

  • Hash of Update op Batch File: 32 bytes

Per DID:

  • Operation hash: 32 bytes
  • Update reveal value: 32 bytes
  • Signature over operation hash: 71 bytes

Batch

  • 1 Kilobyte of op data - must include next Update commit hash

Recovery/Checkpoint Operations

Anchor

Single values:

  • Hash of Recovery/Checkpoint op Map file: 32 bytes

Per DID:

  • DID Unique Suffix: 32 bytes
  • Reveal Value: 32 bytes

Map

Single values:

  • Hash of Recovery/Checkpoint op Batch file: 32 bytes

Per DID:

  • Operation hash: 32 bytes
  • Commit hash: 32 bytes
  • Recovery key (optional): 33 bytes
  • Signature over reveal value (from Anchor), DDO hash, commit hash, and optional recovery key: 71 bytes

Batch

  • 1 Kilobyte of op data
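
For a rough sense of the per-DID footprint these figures imply, here is a back-of-the-envelope tally (serialization overhead excluded; this is only a summation of the numbers above):

// Rough per-DID retained-data totals implied by the figures above.
const createAnchorBytes   = 32 + 32 + 33;      // initial doc hash + recovery commit + recovery key = 97
const updateMapBytes      = 32 + 32 + 71;      // op hash + update reveal + signature = 135
const recoveryAnchorBytes = 32 + 32;           // DID suffix + reveal value = 64
const recoveryMapBytes    = 32 + 32 + 33 + 71; // op hash + commit hash + optional key + signature = 168

console.log({ createAnchorBytes, updateMapBytes, recoveryAnchorBytes, recoveryMapBytes });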

Other related changes to files/data:
  • Create ops will no longer be signed. To do so would require us to either keep the Create op data for all time, or force the user to sign the Anchor File entry with the specified recovery key, which are not trade-offs we should make.

Checkpointing

In order to purge past data, there must exist a mechanism to trigger a checkpoint, wherein all DID owners generate an op that refreshes their ID state to the latest full state. This proposal advocates an automatic, network-wide checkpointing mechanism, triggered by a deterministic calculation over known values, such as: the passage of N blocks of chain-time, some number N of updates relative to the number of IDs in the system, etc.
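
To make the "deterministic calculation" concrete, here is a minimal sketch of what a chain-time-only trigger could look like (the constants and function names are illustrative, not part of the proposal):

// Sketch of a purely chain-time-based checkpoint trigger. GENESIS_BLOCK and
// CHECKPOINT_INTERVAL_BLOCKS are example values; every node evaluates the same
// deterministic rule and derives the same window boundaries.
const GENESIS_BLOCK = 500_000;
const CHECKPOINT_INTERVAL_BLOCKS = 52_560; // e.g. ~1 year of 10-minute blocks

function checkpointWindow(blockHeight: number): number {
  return Math.floor((blockHeight - GENESIS_BLOCK) / CHECKPOINT_INTERVAL_BLOCKS);
}

function isCheckpointBoundary(blockHeight: number): boolean {
  return (blockHeight - GENESIS_BLOCK) % CHECKPOINT_INTERVAL_BLOCKS === 0;
}

// A DID that has not anchored a checkpoint op within the current window would be
// considered stale, and its historical op data becomes prunable once the window closes.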

With this set of changes, all of the following can be pruned after a checkpoint:

  • All Create op Batch files
  • All Update op Map and Batch files
  • All previous Recovery op Batch files
  • All previous Checkpoint op Batch files

@csuwildcat
Member Author

^ @thehenrytsai @OR13 @rado0x54 have a look at this rough proposal and let me know what you think.

@csuwildcat csuwildcat added the code refactoring, high priority, and protocol (Sidetree protocol change proposal) labels on Jul 20, 2019
@csuwildcat
Member Author

csuwildcat commented Jul 21, 2019

Here's roughly what each file's structure would look like:

Anchor File

This would be an Anchor File with all types of ops represented:

{
    create: {
        batch: BATCH_FILE_HASH,
        ops: [
            {
              initial_state: INITIAL_DOC_HASH,
              recovery_key: PUB_KEY,
              recovery_commitment: SECRET_VALUE_HASH
            },
            {...}
        ],
    },
    update: {
        map: MAP_FILE_HASH,
        batch: BATCH_FILE_HASH
    },
    recover: {
        map: MAP_FILE_HASH,
        batch: BATCH_FILE_HASH,
        ops: [
            {
              did: DID_SUFFIX,
              reveal_value: REVEALED_SECRET_VALUE
            },
            { ... }
        ]
    },
    checkpoint: {
        map: MAP_FILE_HASH,
        batch: BATCH_FILE_HASH,
        ops: [
            {
              did: DID_SUFFIX,
              reveal_value: REVEALED_SECRET_VALUE
            },
            { ... }
        ]
    }
}

Create Batch File

{
    ops: [
        { 1K_OP_DATA, UPDATE_COMMITMENT_HASH },
        {...}
    ]
}

Update Map File

{
    batch: 32_BYTE_HASH,
    ops: [
        {
          didUniqueSuffix,
          delta: DDO_DELTA_HASH,
          reveal_value: REVEALED_COMMITMENT_VALUE,
          sig: SIGNATURE_OVER_DELTA
        },
        {...}
    ]
}

Update Batch File

{
    ops: [
        { 1K_OP_DATA, UPDATE_COMMITMENT_HASH },
        {...}
    ]
}

Recovery/Checkpoint Map File

{
    batch: 32_BYTE_HASH,
    ops: [
        {
          did_document_hash: DID_DOC_HASH,
          commitment: SECRET_VALUE_HASH,
          recovery_key: PUB_KEY,  // optional
          sig: RECOVERY_KEY_SIGNATURE
          // signed over the op hash, revealed value, 
          // commitment hash, and optional recovery pub key
        },
        {...}
    ]
}

Recovery/Checkpoint Batch File

{
    ops: [
        { 1K_OP_DATA },
        {...}
    ]
}

@csuwildcat
Member Author

Note on the proposal: signatures in this scheme move from the Batch file to the Map file to shift weight toward lighter nodes, but we may be able to implement a signature aggregation scheme (if one fits our heterogeneous message constraints) that would significantly reduce the sig load in batches above a certain low op-count floor.

@OR13
Contributor

OR13 commented Jul 23, 2019

@csuwildcat I think we need to provide better sample data in order to make these changes more accessible, and also clearer names.

I will update this comment based on anything I have gotten wrong.

  • batchFileHash is a pointer to a batchFile, which contains base64url encoded operations.

    • no change.
  • mapFileHash is a pointer to a mapFile, which contains a batchFileHash and a property called opSignatures: JSON objects with required properties delta, commitment, and sig, and an optional property recovery_key. The sig covers the delta, commitment, and reveal, but note that the reveal is actually stored in the anchorFile, not the map file.

  • there are actually 2 kinds of map files; for updates the mapFile is disposable, but that is not the case for recovery and checkpoint operations.

There is only one schema for batchFile, but a "recovery/checkpoint" batchFile contains only operations that are recovery or checkpoint.

I don't understand why create operations are not signed; that seems bad. Since the recovery pub key is included with the create op, a signature from it should be required. There should be an unbroken chain of proof of control of keys starting from create. We may even want to make the create operation signature include an expensive hash function, so that there is a computational cost for new DIDs and its payment is authenticated.

A related, and particularly spicy, detail of linking the recovery_key to the anchorFile is that the key format must also be provided, and similarly the signature formats must be specified. We would not want to limit the protocol to only secp256k1 keys; that would not support the "blockchain agnostic vision".

It seems like we are trying to shorten property names. I suggest we not do that in JSON, and instead add support for compact binary representations of these files and schemas somewhere else. This ticket should be about providing maximum clarity regarding the identifiers, file schemas, and relationships, and more verbose names will help with that. We can shorten them once it's clear.

Technically this ticket combines the commit/reveal anti-DoS protocol update with a pruning protocol update (neither is fully spec'ed or implemented, AFAIK).

I'm very much in favor of these changes, especially since treating IPFS as permanent unlimited storage is a very bad idea, and in order to protect against a number of sybil attacks, there should be a cost to keeping DIDs fresh over long periods of time.

In order to see the true value of these changes, I think it's worth outlining the worst-case protocol attacks and seeing how these changes are required to mitigate them.

Consider the case of wealthy, dedicated attackers seeking DoS or sybil attacks, where a single user or group of users controls a huge percentage of DIDs, which are then used in second-order reputation-based attacks or DoS attacks.

Consider DoS:

DoS is achieved by causing the traffic volume or traffic size to become too burdensome for clients, which will ultimately cause DIDs to become unresolvable. While the proposed commit/reveal strategy can help clients avoid downloading data related to a DID that was not created by that DID's controller, an attacker can always craft ledger transactions and anchorFiles, and clients will always need to process at least these two events. There is a minimum amount of data which must always be kept available, and it is the frequency of creation of this data, and its size, that the attacker will exploit.

Under this new schema, an attacker will only have the ability to attack via ledger transactions and anchorFiles. The checkpoint window will protect the other files from exploitation: nodes will be able to prune data that failed to checkpoint at the start of each new window, and the attacker's bloating efforts will be reset (a sybil vulnerability remains if the attacker is willing to checkpoint, but DoS is partially mitigated). Stale DIDs will have their data pruned, so the issue seen with GPG key servers is also partially mitigated here.

Consider Sybil:

A sybil attack is achieved when an attacker can control a significant percentage of a network. In a free-to-join p2p system with no central authority to check IDs, this is exceedingly difficult to mitigate, and there are plenty of research papers covering the topic. The main defense against sybil attacks is increasing the cost of creating and maintaining nodes in the network, or disrupting their ability to coordinate. The sybil attack will start with a recovery_key generation, followed by a create. In the old system, since there was no pruning, the attacker only had to be patient. Now the attacker needs to manage a checkpoint operation for each DID. If the cost of create and checkpoint is high enough to be burdensome at scale, but not too burdensome for a single low-energy device, a decent compromise is achieved. The current solution would require the attacker to participate in each checkpoint window for each DID it manages; you can see that the length of the checkpoint window and the cost of the checkpoint/create operations bound how many DIDs an attacker can create and maintain at a given computational strength.
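
The bound described at the end of that paragraph can be written down directly (illustrative only; the "cost" unit is whatever fee or proof-of-work ends up being attached to create/checkpoint ops):

// Upper bound on the number of DIDs an attacker can keep alive: every DID it
// controls must pay the checkpoint cost in every checkpoint window.
function maxSustainableDids(
  attackerBudgetPerWindow: number, // total cost units the attacker can spend per window
  checkpointCostPerDid: number     // cost units required to checkpoint a single DID
): number {
  return Math.floor(attackerBudgetPerWindow / checkpointCostPerDid);
}

// e.g. a budget of 1,000,000 units and a per-DID checkpoint cost of 100 units
// caps the attacker at 10,000 maintained DIDs per window.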

@csuwildcat
Member Author

A few replies to @OR13:

I agree 100% with all your naming nits; we can certainly call these props/values whatever makes things clearest to everyone.


"I don't understand why create operations are not signed, that seems bad, since the recovery pub key is included with the create op, a signature from it should be required."

One reason for this is that we'd be transitioning from a DID Suffix of: SHA256(initial_ddo), to the following: SHA256(initial_ddo, commitment, recovery_pub_key). This means that any change to the bytes of those three values would literally be a different DID.
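
A minimal sketch of that new suffix derivation, assuming SHA-256 over a straight concatenation of the three encoded values (the exact encodings and ordering are not pinned down in this thread):

import { createHash } from 'crypto';

// Old: suffix = SHA256(initial_ddo)
// New: suffix = SHA256(initial_ddo || recovery_commitment || recovery_pub_key)
// Any change to the bytes of these inputs yields a different DID, which is why a
// signature on the Create op adds nothing the suffix itself doesn't already bind.
function didUniqueSuffix(
  initialDocHash: Buffer,      // hash of the initial DID Document
  recoveryCommitment: Buffer,  // hash of the recovery secret value
  recoveryPubKey: Buffer       // 33-byte compressed recovery public key
): string {
  return createHash('sha256')
    .update(Buffer.concat([initialDocHash, recoveryCommitment, recoveryPubKey]))
    .digest('hex'); // the identifier's output encoding is likewise unspecified here
}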


(Because you brought up spam in general, here's an aside about another future means of hobbling spammers, beyond the checkpointing mechanism: #271)

@OR13
Contributor

OR13 commented Aug 7, 2019

This ticket gives me anxiety ;)

The main thing that it needs is additional comments from other implementers. @thehenrytsai

IMO, we should version the protocol and create a clean v2 spec, so we can think about it not as a set of breaking changes but as a new implementation of a new spec.

@thehenrytsai
Collaborator

@OR13, I agree with the 'anxiety' comment. I have been dreading and delaying implementing this for as long as I could, and the time has come. I have gone as far as proposing an entirely different, purist approach by introducing the concept of protocol-enforced DID expiration and DID rolling; however, it appears to have unrealistic requirements: 1. requiring relying parties to perform a resolution of the DID in question once every few years, and 2. forcing claims to have the same expiry constraints as the subject and issuing DID. I'd love to brainstorm further on my alternative approach, but it is tabled for the time being.

I believe I now understand these requirements well 'enough' and will start implementation next week; the intention is for this to be the primary change for the v0.5.0 release. I will update the protocol spec and implementation documentation accordingly. DIDs created with the current protocol version (v0.4.0) should continue to resolve in testnet as per #269, but going forward the scheme described in this issue will be the official protocol, with the expectation that only the new scheme will be supported on the Bitcoin mainnet.

@csuwildcat
Member Author

csuwildcat commented Aug 23, 2019 via email

@OR13
Contributor

OR13 commented Aug 23, 2019

This will be a good opportunity for us to get better TypeScript support in Element. It's been a while since I thought through these changes; I'd love to help with spec revisions. Feel free to assign @OR13 and @gjgd to PRs.

@thehenrytsai
Collaborator

thehenrytsai commented Aug 23, 2019

@csuwildcat, I am fine tabling my alternate proposal, but disagree with the following statements:

have to connect with all RPs to redo proofing

No, you don't. At the end of the day, it is an RP's decision whether to keep your account in its system; if a system that uses ION chooses to delete your account due to your inactivity, there is nothing a user can do about it. The alternate proposal simply makes keeping someone in your service/system an explicit decision, which can be argued as a selling point because it gives relying parties a definitive way to prune stale/spam accounts. Any serious system, especially commercial consumer-facing services, would probably have every economic incentive to do so.

and can never securely prove to any new entity you engage with that you ever owned the IDs the system forcibly deleted.

If you are talking about authentication, you would not go to a new entity using an old DID that is rolled over/tombstoned; you would simply use the new DID. If you are talking about presenting claims/credentials issued to your old DID, then yes, the alternate proposal essentially forces claims/credentials to have an expiry that's tied to the lifetime of the DID (including the tombstone period), which is admittedly a limitation as stated in my comments above. But I will say that: 1. the notion that all claims have an expiry isn't inherently bad; and 2. this gives the signer of the claims the opportunity to renew them, which isn't inherently horrible for the ecosystem, because if we accept that signers must have mechanisms to revoke/invalidate the claims they issue, this approach just gave them a built-in, time-based way to do so, and renewal can even be automated if we tie in the Hub.

Finally, I agree that it is counter-intuitive that the DID lifetime is not forever, but we get a new passport number every time we renew our passport, and we don't seem to be concerned about services/systems that may have our old passport numbers.

I also wonder if there are situations where I actually want to roll my DID from one to another and have all the services I use seamlessly move from the old DID to the new one; the alternate proposal obviously gives you that ability. This is analogous to a Spotify user changing their login email from one address to another.

The alternate proposal is not without limitations, but neither is the current proposal. However, I am not going to explore the alternate proposal any further unless it gets some thumbs up.

@csuwildcat
Member Author

csuwildcat commented Aug 24, 2019 via email

@csuwildcat csuwildcat changed the title from "Restructure protocol to enable network-wide pruning of operation data" to "Restructure protocol and source file for efficient op processing and future data pruning" on Oct 9, 2019
@csuwildcat csuwildcat changed the title from "Restructure protocol and source file for efficient op processing and future data pruning" to "Adjust file structure to avoid degrading resolution time, storage/bandwidth constraints, and new node boot time" on Dec 18, 2019
@csuwildcat csuwildcat changed the title from "Adjust file structure to avoid degrading resolution time, storage/bandwidth constraints, and new node boot time" to "Allow clearing of useless ops to avoid degrading resolution time, storage/bandwidth constraints, and new node boot time" on Dec 18, 2019
@csuwildcat csuwildcat changed the title from "Allow clearing of useless ops to avoid degrading resolution time, storage/bandwidth constraints, and new node boot time" to "Allow clearing of dead ops to avoid harming resolution time, storage/bandwidth, and new node boot time" on Dec 18, 2019
@csuwildcat csuwildcat changed the title from "Allow clearing of dead ops to avoid harming resolution time, storage/bandwidth, and new node boot time" to "Allow clearing of dead ops to avoid negatively impacting resolution time, storage/bandwidth, and new node boot time" on Dec 18, 2019
@csuwildcat
Member Author

I don't think we necessarily need separate batch files for the operation types, so it may just be the addition of the update map file and the recovery/checkpoint map file. This could save a decent amount of implementation time, so let's discuss it soon.

@OR13
Contributor

OR13 commented Jan 2, 2020

This is the kind of thing it might help to have a higher-bandwidth call on, potentially with slides/pictures of the proposed layout.

@csuwildcat
Member Author

csuwildcat commented Jan 2, 2020 via email

@csuwildcat
Member Author

After discussing some ideas with @thehenrytsai, we are proposing a modification to the structures above that will make the structural changes even simpler, while allowing for a number of performance-enhancing features that can be delivered as protocol updates. The structures would be modified as follows:

Anchor File

All types of ops represented:

{
    map_file: MAP_FILE_HASH,
    create: {
        ops: [
            {
              initial_state: INITIAL_STATE_HASH,
              initial_recovery_key: PUB_KEY,
              initial_recovery_commitment: SECRET_VALUE_HASH
            },
            {...}
        ],
    },
    recovery: {  // checkpoints just use recovery ops
        ops: [
            {
              did: DID_SUFFIX,
              recovery_reveal: REVEALED_SECRET_VALUE,
              new_recovery_commitment: SECRET_VALUE_HASH,
              new_state: RECOVERY_STATE_HASH,
              new_recovery_key: PUB_KEY,  // (optional, for rolling)
              sig: RECOVERY_KEY_SIGNATURE
            },
            { ... }
        ]
    }
}

Map File

{
   chunks: [
      { chunk: CHUNK_HASH }
   ],
   updates: {
     ops: [
        {
          did: DID_UNIQUE_SUFFIX,
          update_reveal: REVEALED_COMMITMENT_VALUE,
          update_patch: STATE_PATCH_HASH,
          sig: UPDATE_KEY_SIGNATURE
        },
        {...}
      ]
    }
}

Batch Chunks (example of one chunk)

{
   ops: [
        { data: 1K_OP_DATA },  // Recovery op
        {  // Create & Update ops
          data: 1K_OP_DATA,
          update_commitment: UPDATE_COMMITMENT_HASH
        }
   ]
}

With this updated proposal, we do a few things (a rough resolver walk over these files is sketched after the list):

  1. Move retained recovery/checkpoint proving data into the Anchor File.
  2. Remove the concept of operation-specific Map Files, and consolidate all map-related functions into a single Map File, which can be discarded after checkpoints.
  3. Introduce the foundation for deterministic batch chunking.
  4. Remove the concept of operation-specific Batch Files, and instead move to a generic Batch Chunk scheme where the ops contained in a chunk are of commingled types.
  5. Because all proving data now resides in the Anchor File, none of the other files, regardless of the op types they may contain, are required to be retained after checkpoints.
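
For illustration, resolution against this layout might walk the files roughly as sketched below. This is only a sketch: fetchJson stands in for whatever CAS/IPFS retrieval a node performs, and the field names simply follow the structures above.

type FetchJson = (hash: string) => Promise<any>;

// Hypothetical walk of the restructured files while resolving one DID.
async function gatherOps(didSuffix: string, anchorFileHash: string, fetchJson: FetchJson) {
  const anchor = await fetchJson(anchorFileHash);

  // Recovery/checkpoint proving data lives directly in the Anchor File,
  // so it survives even after everything else is pruned.
  const recoveryOps = anchor.recovery.ops.filter((op: any) => op.did === didSuffix);

  // Update proving data lives in the (prunable) Map File...
  const map = await fetchJson(anchor.map_file);
  const updateOps = map.updates.ops.filter((op: any) => op.did === didSuffix);

  // ...and the verbose op data lives in the (prunable) Batch Chunks it points to.
  const chunks = await Promise.all(map.chunks.map((c: any) => fetchJson(c.chunk)));

  return { recoveryOps, updateOps, chunks };
}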

@OR13
Contributor

OR13 commented Jan 25, 2020

The format of PUB_KEY and *_SIGNATURE needs to be defined... I would love it if they were JWKs and JWS... we would not have to worry about the complexity of non-standard signing or key representations...

@OR13
Contributor

OR13 commented Jan 25, 2020

This seems like it will be a big improvement... but since it's such a big change, I think we should also address the over-reliance on secp256k1 in Sidetree... we should support NIST curves, RSA, Ed25519, and secp256k1, so we have good coverage for non-extractable key systems / secure enclaves / Azure Key Vault, etc... secp256k1 is the worst curve wrt extractability...

@OR13
Contributor

OR13 commented Jan 25, 2020

we should also address canonicalization attacks... before a hash is computed over a serialized object... the object must be canonicalized.
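
For example, running every object through RFC 8785 (JCS) before hashing removes key-order and whitespace ambiguity. A sketch using the canonicalize npm package, which is one possible choice rather than anything this thread has settled on:

import { createHash } from 'crypto';
import canonicalize from 'canonicalize'; // RFC 8785 JSON Canonicalization Scheme

// Hash a JSON object only after canonicalizing it, so that two serializations of
// the same logical object (e.g. differing key order) can never produce different hashes.
function hashCanonicalJson(obj: object): string {
  const canonical = canonicalize(obj);
  if (canonical === undefined) {
    throw new Error('object could not be canonicalized');
  }
  return createHash('sha256').update(canonical).digest('hex');
}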

@csuwildcat
Member Author

This seems like it will be a big improvement... but since it's such a big change, I think we should also address the over-reliance on secp256k1 in Sidetree... we should support NIST curves, RSA, Ed25519, and secp256k1, so we have good coverage for non-extractable key systems / secure enclaves / Azure Key Vault, etc... secp256k1 is the worst curve wrt extractability...

  1. I would like to stick with just secp256k1 for at least the near/mid-term
  2. RSA size is an issue, and I have no intention of supporting it in ION, but if we revisited the protocol language around different key types, another implementation could take on the work of all the necessary abstraction/modifications and support it in their network.

@csuwildcat
Member Author

The format of PUB_KEY and *_SIGNATURE needs to be defined... I would love it if they were JWKs and JWS... we would not have to worry about the complexity of non-standard signing or key representations...

I am not sure about how important this is, given the guts of Sidetree don't (and needn't) really care much about presentational key formatting, and we can output whatever we want, but perhaps the case could be made. I would want to understand the size impact of actually storing it in a verbose presentational format vs. a truncated form that is simply output to whatever format we choose.

@OR13
Contributor

OR13 commented Jan 25, 2020

  1. I would like to stick with just secp256k1 for at least the near/mid-term

secp256k1 has no trusted hardware support... it doesn't even work with Azure Key Vault.

2. RSA size is an issue,

RSA is supported by the Android secure enclave, and is widely used on networks that are not even allowed to use secp256k1... I do agree that size is a problem for it... but I don't agree that it's wise to ban it, when other DID methods will support it and work better with existing trusted hardware systems because of it.

@OR13
Contributor

OR13 commented Jan 25, 2020

I am not sure about how important this is, given the guts of Sidetree don't (and needn't) really care much about presentational key formatting,

but you are picking a key format here... if it's hex encoding... you should really reconsider... it should be JWK or multibase, and multibase is not really ready yet.

@OR13
Contributor

OR13 commented Jan 25, 2020

Also, since JWK is now a valid way to express every key type that you can use in a DID Document, we should all be leveraging it to make things easy and interoperable.
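
For concreteness, a secp256k1 public key expressed as a JWK looks like the sketch below (placeholder coordinate values). Serialized it runs on the order of 130 bytes, versus 33 bytes for a compressed raw key, which is the size trade-off raised earlier:

// Illustrative secp256k1 public key as a JWK. The x and y values are placeholders
// standing in for the base64url-encoded 32-byte curve coordinates (43 characters each).
const examplePublicKeyJwk = {
  kty: 'EC',
  crv: 'secp256k1',
  x: 'BASE64URL_ENCODED_X_COORDINATE',
  y: 'BASE64URL_ENCODED_Y_COORDINATE'
};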

@csuwildcat
Member Author

csuwildcat commented Jan 25, 2020 via email

thehenrytsai added a commit that referenced this issue Feb 3, 2020
1. Added Map File to file structure.
1. Increased code coverage of `TransactionProcessor` to 100%.
1. Some code refactoring.
thehenrytsai added a commit that referenced this issue Feb 13, 2020
thehenrytsai added a commit that referenced this issue Mar 11, 2020
* Replaced `DidResolutionModel` with `DocumentState` so the state keeping is no longer DID document specific.
* Removed various classes such as `AnchoredOperation` and `ApplyResult` + various renames for consistency.