diff --git a/CAR.md b/CAR.md deleted file mode 100644 index 62bee684..00000000 --- a/CAR.md +++ /dev/null @@ -1,108 +0,0 @@ -# ![](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square) Certified ARchive - -CARs (Certified ARchives) are archives for IPLD DAGs. - -## Summary - -CARs are archives of IPLD DAGs. They are: - -1. Certified -2. Seekable -3. Compact -4. Reproducible -5. Simple and Stable - -The actual format is just a (mostly) recursively-defined topological sort of the -desired DAG with some metadata for fast traversal. - -``` -CID -len(ROOT) -ROOT -[ - CHILD-1-OFFSET - ... -] -[ - len(CHILD-1) - CHILD-1 - [ - CHILD-1/1-OFFSET - ... - ] - ... -] -``` - -Offsets are relative, offsets for missing children use a sentinel value. - -We only bother including the root CID because all the other CIDs are embedded in -the objects themselves. This saves space and *forces* parsers to actually -traverse the DAG (hopefully validating it). - -## Motivation - -Use cases: - -1. Reliably export/import a DAG to/from an external hard drive (backup, sneakernet). -2. Traverse a large DAG on an external hard drive without parsing the *entire* DAG. -3. Traverse a large DAG streamed over HTTP without downloading the entire thing. - -The simple method is to copy the entire repo. However, for performance, we need -to be able to upgrade the repo format so this isn't really a stable format. -Additionally, repos need to support insertions, deletions, and random lookups. -Supporting these efficiently necessarily complicates the formats. We'd like -something simple and portable (backup). - -The slightly more complex way is to download every object into a separate file -and then import each file. However, this isn't very convenient and *does not* -scale to large directories well (use case 2). - -One could improve this multi-file approach by splitting up the DAG into multiple -directories and providing a set of tools to manage the files. 
However, we'd
-rather not rely on the filesystem for anything, really. Filesystems:
-
-* Don't always deal with names well (e.g., FAT16).
-* Don't always handle many small files well.
-* Aren't usually as space-efficient as possible (to support updates).
-* Are complex (easy to corrupt metadata/structure).
-
-Additionally, it's hard to download a directory structure over HTTP (motivated by
-use-case 3). One can just TAR it up but that layers another (complex) file
-format into the mix.
-
-So, we'd like a new single-file format that, if necessary, we can just `dd` to a
-drive in place of a filesystem.
-
-TODO: Expand.
-
-# Questions
-
-However, there are a few open questions.
-
-## Uint64/Varint
-
-The advantage of using uint64s over varints is that we can leave the "jump
-tables" blank and then fill them in on a second pass after we've written
-everything. However, if we topologically sort the DAG, we may be able to compute
-the jump tables up-front.
-
-The advantages of varints over uint64 are space and flexibility (DAGs larger
-than 16 Exbibytes).
-
-Currently, I'm leaning toward varints as this will make storing lots of small
-blocks significantly more efficient.
-
-## Inline Blocks
-
-So, we can technically have inline blocks using the identity multihash. How do
-we deal with them?
-
-1. We *don't* want to duplicate the data.
-2. We need to support inline blocks with children.
-
-## Topological sort
-
-So, a topological sort makes it really easy to traverse the CAR, even when
-streaming. However producing a topologically sorted DAG is a bit trickier. Note:
-whatever we choose, it won't have any effect on the asymptotic runtime (memory or time).
diff --git a/CID.md b/CID.md
new file mode 100644
index 00000000..01e957c8
--- /dev/null
+++ b/CID.md
@@ -0,0 +1,68 @@
+# CIDv1
+
+# Content IDs
+
+This document will use the words Content IDs or CIDs.
+
+Prior base58 multihash links to protobuf data will be called CID Version 0.
+ +## CIDs Version 1 + +Putting together the IPLD Link update statements above, we can term the new handle for IPLD data CID Version 1, with a multibase prefix, a version, a packed multicodec, and a multihash. + +``` + +``` + +Where: +- `` is a multibase prefix describing the base that encodes this CID. If binary, this is omitted. +- `` is the version number of the cid. +- `` is a multicodec-packed identifier, from the CID multicodec table +- `` is a cryptographic multihash, including: `` + +Note that all CIDs v1 and on should always begin with ``, this evolving nicely. + +### Multicodec Packed Representation + +It is useful to have a compact version of multicodec, for use in small identifiers. This compact identifier will just be a single varint, looked up in a table. Different applications can use different tables. We should probably have one common table for well-known formats. + +We will establish a table for common authenticated data structure formats, for example: IPFS v0 Merkledag, CBOR IPLD, Git, Bitcoin, and more. The table is a simple varint lookup. + +### Distinguishing v0 and v1 CIDs (old and new) + +It is a HARD CONSTRAINT that all IPFS links continue to work. This means we need to continue to support v0 CIDs. This means IPFS APIs must accept both v0 and v1 CIDs. This section defines how to distinguish v0 from v1 CIDs. + +Old v0 CIDs are strictly sha2-256 multihashes encoded in base58 -- this is because IPFS tooling only shipped with support for sha2-256. This means the binary versions are 34 bytes long (sha2-256 256 bit multihash), and that the string versions are 46 characters long (base58 encoded). This means we can recognize a v0 CID by ensuring it is a sha256 bit multihash, of length 256 bits, and base58 encoded (when a string). Basically: + +- `` is implicitly base58. +- `` is implicitly 0. +- `` is implicitly protobuf (for backwards compat with v0). +- `` is a cryptographic multihash, explicit. 
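The recognition rule described above (a 46-character base58 string that decodes to a 34-byte sha2-256 multihash) can be sketched in a few lines. This is only an illustration under stated assumptions, not the project's actual check: the helper names `b58decode`, `b58encode`, `make_cid_v0`, and `is_cid_v0` are hypothetical, and the sketch relies on the multihash prefix bytes `0x12` (sha2-256) and `0x20` (32-byte digest).

```python
import hashlib

# Bitcoin-style base58 alphabet, the one used for v0 links.
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(text: str) -> bytes:
    """Decode base58 text; raises ValueError on characters outside the alphabet."""
    num = 0
    for ch in text:
        num = num * 58 + B58_ALPHABET.index(ch)
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    # Each leading '1' stands for one leading zero byte.
    pad = len(text) - len(text.lstrip("1"))
    return b"\x00" * pad + body


def b58encode(data: bytes) -> str:
    """Encode bytes as base58 text."""
    num = int.from_bytes(data, "big")
    out = ""
    while num:
        num, rem = divmod(num, 58)
        out = B58_ALPHABET[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def make_cid_v0(block: bytes) -> str:
    """Hypothetical helper: wrap sha2-256(block) in a multihash and base58 it.

    Real v0 links hash a protobuf-encoded node; raw bytes are hashed here
    only to obtain a well-formed example string.
    """
    digest = hashlib.sha256(block).digest()
    return b58encode(b"\x12\x20" + digest)


def is_cid_v0(text: str) -> bool:
    """True if text has the shape of a v0 CID (46 chars, sha2-256 multihash)."""
    if len(text) != 46:
        return False
    try:
        raw = b58decode(text)
    except ValueError:
        return False
    return len(raw) == 34 and raw[0] == 0x12 and raw[1] == 0x20
```

Anything that fails this check would then be parsed as a v1 CID, starting with its multibase prefix. Decoding the string, rather than only checking its length, avoids misclassifying other 46-character strings.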
+
+We can re-write old v0 CIDs into v1 CIDs, by making the elements explicit. This should be done henceforth to avoid creating more v0 CIDs. But note that many references exist in the wild, and thus we must continue supporting v0 links. In the distant future, we may remove this support after sha2 breaks.
+
+Note we can cleanly distinguish the values, which makes it easy to support both. The code for this check is here: https://gist.github.com/jbenet/bf402718a7955bf636fb47d214bcef8a
+
+### IPLD supports non-CID hash links as implicit CIDv1s
+
+Note that raw hash links _stored in various data structures_ (e.g. Protobuf, Git, Bitcoin, Ethereum, etc.) already exist. These links -- when loaded directly as one of these data structures -- can be seen as "linking within a network" whereas proper CIDv1 IPLD links can be seen as linking "across networks" (internet of data! internet of data structures!). Supporting these existing (or even new) raw hash links as a CIDv1 can be done by noting that when one data structure links with just a raw binary link, the rest of the CIDv1 fields are implicit:
+
+- `` is implicitly binary or whatever the format encodes.
+- `` is implicitly 1.
+- `` is implicitly the same as the data structure.
+- `` can be determined from the raw hash.
+
+Basically, we construct the corresponding CIDv1 out of the raw hash link because all the other information is _in the context_ of the data structure. This is very useful because it allows:
+- more compact encoding of a CIDv1 when linking from one data struct to another
+- linking from CBOR IPLD to other CBOR IPLD objects exactly as has been spec-ed out so far, so any IPLD adopters continue working.
+- (most important) opens the door for native support of other data structures
+
+### IPLD addresses raw data
+
+Given the above addressing changes, it is now possible to address raw data directly, as an IPLD node. This node is of course taken to be just a byte buffer, and devoid of links (i.e. a leaf node).
+
+The utility of this is the ability to directly address any object via hashing external to IPLD datastructures.
+
+### Support for multiple binary packed formats
+
+Contrary to prior Merkle objects (e.g. IPFS protobuf legacy, git, bitcoin, dat and others), new IPLD objects are authenticated AND self-described data blobs: each IPLD object is serialized and prefixed by a multicodec identifying its format.
diff --git a/Codecs/DAG-CBOR.md b/Codecs/DAG-CBOR.md
new file mode 100644
index 00000000..a2b16c9b
--- /dev/null
+++ b/Codecs/DAG-CBOR.md
@@ -0,0 +1,37 @@
+# [WIP] DagCBOR Spec
+
+DAG-CBOR supports the full ["IPLD Data Model v1."](../IPLD-Data-Model-v1.md)
+
+CBOR already natively supports all ["IPLD Data Model v1: Simple Types."](../IPLD-Data-Model-v1.md#simple-types)
+
+## Format
+
+The CBOR IPLD format is called DagCBOR to disambiguate it from regular CBOR.
+Most CBOR objects are valid DagCBOR. The only hard restriction is that any field
+with the tag 42 must be a valid CID.
+
+## Link Format
+
+As with all IPLD formats, DagCBOR must be able to encode merkle-links. In
+DagCBOR, links are encoded using the raw-binary (identity, NUL) multibase in a
+field with a byte-string type (major type 2), with the tag 42.
+
+(the inclusion of the multibase exists for historical reasons)
+
+## Map Key Restriction
+
+In DagCBOR, map keys must be strings (TODO: drop this? We already have
+unpathable map keys). Furthermore, map keys should avoid using `/` as this is
+unpathable (TODO: drop this? IMO, we should support path escaping out of the
+box).
+
+## Canonical DagCBOR
+
+Canonical DagCBOR should:
+
+1. Use no tags other than the CID tag (42). Other tags may be lost in
+   conversion.
+2. Use the canonical CBOR encoding and field ordering. Other orderings
+   will yield different CIDs.
+3. Only use string map keys. Some implementations may not be able to
+   handle non-string keys.
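To make the DagCBOR link format concrete, here is a minimal hand-rolled sketch of how a link could be serialized: tag 42, then a byte string (major type 2) whose payload is the NUL multibase prefix followed by the binary CID. The function names `cbor_head` and `encode_link` are hypothetical, the sketch only handles lengths below 2**16, and a real implementation would use a full CBOR library instead.

```python
def cbor_head(major: int, value: int) -> bytes:
    """Encode a CBOR initial byte plus its argument (values below 2**16 only)."""
    if value < 24:
        # Small arguments are packed into the initial byte itself.
        return bytes([(major << 5) | value])
    if value < 0x100:
        # Additional info 24: one argument byte follows.
        return bytes([(major << 5) | 24, value])
    if value < 0x10000:
        # Additional info 25: two big-endian argument bytes follow.
        return bytes([(major << 5) | 25]) + value.to_bytes(2, "big")
    raise ValueError("value too large for this sketch")


def encode_link(cid_bytes: bytes) -> bytes:
    """Serialize a binary CID as a DagCBOR link (assumed layout, see lead-in)."""
    payload = b"\x00" + cid_bytes  # identity (NUL) multibase prefix
    tag = cbor_head(6, 42)  # major type 6 = tag, tag number 42
    byte_string = cbor_head(2, len(payload))  # major type 2 = byte string
    return tag + byte_string + payload
```

For a 36-byte binary CID the output starts with the bytes `d8 2a 58 25 00`: the tag header, the byte-string header for a 37-byte payload, and the multibase prefix.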
diff --git a/Codecs/DAG-JSON.md b/Codecs/DAG-JSON.md
new file mode 100644
index 00000000..488f8fbe
--- /dev/null
+++ b/Codecs/DAG-JSON.md
@@ -0,0 +1,25 @@
+# [WIP] DAG-JSON v1
+
+DAG-JSON supports the full ["IPLD Data Model v1."](../IPLD-Data-Model-v1.md)
+
+## Format
+
+### Simple Types
+
+All simple types except binary are supported natively by JSON.
+
+Contrary to popular belief, JSON as a format supports Big Integers. It's only
+JavaScript itself that has trouble with them. This means JS implementations
+of `dag-json` can't use the native JSON parser and serializer.
+
+#### Binary Type
+
+```javascript
+{"/": { "base64": String }}
+```
+
+### Link Type
+
+```javascript
+{"/": String /* base encoded CID */}
+```
\ No newline at end of file
diff --git a/Data-Structures/HAMT.md b/Data-Structures/HAMT.md
new file mode 100644
index 00000000..4f155f8c
--- /dev/null
+++ b/Data-Structures/HAMT.md
@@ -0,0 +1,5 @@
+# [WIP] Hash-Array Mapped Trie
+
+This specifies a standardized hash-array mapped trie on IPLD Data Model v1.
+
+TODO: write this spec.
\ No newline at end of file
diff --git a/IPLD-Data-Model-v1.md b/IPLD-Data-Model-v1.md
new file mode 100644
index 00000000..5b6256e9
--- /dev/null
+++ b/IPLD-Data-Model-v1.md
@@ -0,0 +1,17 @@
+# [WIP] IPLD Data Model
+
+## Simple Types
+
+* Boolean
+* Null
+* String
+* Integer
+* Float
+* Array
+* Object (Hash Map)
+* Binary
+
+## Link Type
+
+This type represents a link to another IPLD Block. The link reference
+is a [`CID`](./CID.md).
diff --git a/IPLD-Path.md b/IPLD-Path.md
new file mode 100644
index 00000000..ab7aae7f
--- /dev/null
+++ b/IPLD-Path.md
@@ -0,0 +1,21 @@
+# [WIP] IPLD Path v1
+
+An IPLD Path is a string identifier used for deep references into IPLD
+graphs.
+
+IPLD Paths are constructed following the same constraints as [URI Paths](https://tools.ietf.org/html/rfc3986#section-3.3).
+
+Similarly, the string `?` is reserved for future use as a query separator.
+
+# Path Resolution
+
+Path resolution is broken into two parts: full path resolution and block level resolution.
+
+Block level path resolution is defined by individual codecs.
+
+Full path resolution should use block level resolution through each block.
+When a block level resolver returns an `IPLD Link`, full path resolution
+should retrieve that block, load its codec, and continue on with additional
+block level resolution until the full path is resolved. Finally, path resolution
+should return a [**representation**](./IPLD-Path.md#representation)
+of the value for the given path.
\ No newline at end of file
diff --git a/IPLD.md b/IPLD.md
deleted file mode 100644
index 68cd8535..00000000
--- a/IPLD.md
+++ /dev/null
@@ -1,553 +0,0 @@
-# ![](https://img.shields.io/badge/status-draft-green.svg?style=flat-square) IPLD `OUT OF DATE`
-
-> The "thin-waist" merkle dag format
-
-There are a variety of systems that use merkle-tree and hash-chain inspired datastructures (e.g. git, bittorrent, ipfs, tahoe-lafs, sfsro). IPLD (Inter Planetary Linked Data) defines:
-
-- **_merkle-links_**: the core unit of a merkle-graph
-- **_merkle-dag_**: any graph whose edges are _merkle-links_. `dag` stands for "directed acyclic graph"
-- **_merkle-paths_**: unix-style paths for traversing _merkle-dags_ with _named merkle-links_
-- **IPLD Formats**: a set of formats in which IPLD objects can be represented, for example JSON, CBOR, CSON, YAML, Protobuf, XML, RDF, etc.
-- **IPLD Canonical Format**: a deterministic description on a serialized format that ensures the same _logical_ object is always serialized to _the exact same sequence of bits_. This is critical for merkle-linking, and all cryptographic applications.
-
-## Intro
-
-### What is a _merkle-link_?
-
-A _merkle-link_ is a link between two objects which is content-addressed with the _cryptographic hash_ of the target object, and embedded in the source object.
Content addressing with merkle-links allows: - -- **Cryptographic Integrity Checking**: resolving a link's value can be tested by hashing. In turn, this allows wide, secure, trustless exchanges of data (e.g. git or bittorrent), as others cannot give you any data that does not hash to the link's value. -- **Immutable Datastructures**: data structures with merkle links cannot mutate, which is a nice property for distributed systems. This is useful for versioning, for representing distributed mutable state (eg CRDTs), and for long term archival. - -A _merkle-link_ is represented in the IPLD object model by a map containing a single key `/` mapped to a "link value". For example: - - -**A link, represented in json as a "link object"** - -```js -{ "/" : "/ipfs/QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k" } -// "/" is the link key -// "/ipfs/QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k" is the link value -``` - -**Object with a link at `foo/baz`** - -```js -{ - "foo": { - "bar": "/ipfs/QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k", // not a link - "baz": {"/": "/ipfs/QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k"} // link - } -} -``` - -**Object with pseudo "link object" at `files/cat.jpg` and actual link at `files/cat.jpg/link`** - -```js -{ - "files": { - "cat.jpg": { // give links properties wrapping them in another object - "link": {"/": "/ipfs/QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k"}, // the link - "mode": 0755, - "owner": "jbenet" - } - } -} -``` - -When dereferencing the link, the map itself is to be replaced by the object it points to unless the link path is invalid. - -The link can either be a multihash, in which case it is assumed that it is a link in the `/ipfs` hierarchy, or directly the absolute path to the object. Currently, only the `/ipfs` hierarchy is allowed. 
- -If an application wants to use objects with a single `/` key for other purposes, the application itself is responsible to escape the `/` key in the IPLD object so that the application keys do not conflict with IPLD's special `/` key. - -### What is a _merkle-graph_ or a _merkle-dag_? - -Objects with merkle-links form a Graph (merkle-graph), which necessarily is both Directed, and which can be counted on to be Acyclic, iff the properties of the cryptographic hash function hold. I.e. a _merkle-dag_. Hence all graphs which use _merkle-linking_ (_merkle-graph_) are necessarily also Directed Acyclic Graphs (DAGs, hence _merkle-dag_). - -### What is a _merkle-path_? - -A merkle-path is a unix-style path (e.g. `/a/b/c/d`) which initially dereferences through a _merkle-link_ and allows access of elements of the referenced node and other nodes transitively. - -General purpose filesystems are encouraged to design an object model on top of IPLD that would be specialized for file manipulation and have specific path algorithms to query this model. - -### How do _merkle-paths_ work? - -A _merkle-path_ is a unix-style path which initially dereferences through a _merkle-link_ and then follows _named merkle-links_ in the intermediate objects. Following a name means looking into the object, finding the _name_ and resolving the associated _merkle-link_. - -For example, suppose we have this _merkle-path_: - -``` -/ipfs/QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k/a/b/c/d -``` - -Where: -- `ipfs` is a protocol namespace (to allow the computer to discern what to do) -- `QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k` is a cryptographic hash. -- `a/b/c/d` is a path _traversal_, as in unix. - -Path traversals, denoted with `/`, happen over two kinds of links: - -- **in-object traversals** traverse data within the same object. -- **cross-object traversals** traverse from one object to another, resolving through a merkle-link. 
- -#### Examples - -Using the following dataset: - - > ipfs object cat --fmt=yaml QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k - --- - a: - b: - link: - /: QmV76pUdAAukxEHt9Wp2xwyTpiCmzJCvjnMxyQBreaUeKT - c: "d" - foo: - /: QmQmkZPNPoRkPd7wj2xUJe5v5DsY6MX33MFaGhZKB2pRSE - - > ipfs object cat --fmt=yaml QmV76pUdAAukxEHt9Wp2xwyTpiCmzJCvjnMxyQBreaUeKT - --- - c: "e" - d: - e: "f" - foo: - name: "second foo" - - > ipfs object cat --fmt=yaml QmQmkZPNPoRkPd7wj2xUJe5v5DsY6MX33MFaGhZKB2pRSE - --- - name: "third foo" - -An example of the paths: - -- `/ipfs/QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k/a/b/c` will only traverse the first object and lead to string `d`. -- `/ipfs/QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k/a/b/link/c` will traverse two objects and lead to the string `e` -- `/ipfs/QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k/a/b/link/d/e` traverse two objects and leads to the string `f` -- `/ipfs/QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k/a/b/link/foo/name` traverse the first and second object and lead to string `second foo` -- `/ipfs/QmUmg7BZC1YP1ca66rRtWKxpXp77WgVHrnv263JtDuvs2k/a/b/foo/name` traverse the first and last object and lead to string `third foo` - - -## What is the IPLD Data Model? - -The IPLD Data Model defines a simple JSON-based _structure_ for all merkle-dags, and identifies a set of formats to encode the structure into. - -### Constraints and Desires - -Some Constraints: -- IPLD paths MUST be unambiguous. A given path string MUST always deterministically traverse to the same object. (e.g. avoid duplicating link names) -- IPLD paths MUST be universal and avoid oppressing non-english societies (e.g. use UTF-8, not ASCII). -- IPLD paths MUST layer cleanly over UNIX and The Web (use `/`, have deterministic transforms for ASCII systems). -- Given the wide success of JSON, a huge number of systems present JSON interfaces. IPLD MUST be able to import and export to JSON trivially. 
-- The JSON data model is also very simple and easy to use. IPLD MUST be just as easy to use. -- Definining new datastructures MUST be trivially easy. It should not be cumbersome -- or require much knowledge -- to experiment with new definitions on top of IPLD. -- Since IPLD is based on the JSON data model, it is fully compatible with RDF and Linked Data standards through JSON-LD. -- IPLD Serialized Formats (on disk and on the wire) MUST be fast and space efficient. (should not use JSON as the storage format, and instead use CBOR or similar formats) -- IPLD cryptographic hashes MUST be upgradeable (use [multihash](https://github.com/multiformats/multihash)) - -Some nice-to-haves: -- IPLD SHOULD NOT carry over mistakes, e.g. the lack of integers in JSON. -- IPLD SHOULD be upgradable, e.g. if a better on-disk format emerges, systems should be able to migrate to it and minimize costs of doing so. -- IPLD objects SHOULD be able to resolve properties too as paths, not just merkle links. -- IPLD Canonical Format SHOULD be easy to write a parser for. -- IPLD Canonical Format SHOULD enable seeking without parsing full objects. (CBOR and Protobuf allow this). - - -### Format Definition - -(**NOTE:** Here we will use both JSON and YML to show what formats look like. We explicitly use both to show equivalence of the object across two different formats.) - -At its core, IPLD Data Model "is just JSON" in that it (a) is also tree based documents with a few primitive types, (b) maps 1:1 to json, (c) users can use it through JSON itself. It "is not JSON" in that (a) it improves on some mistakes, (b) has an efficient serialized representation, and (c) does not actually specify a single on-wire format, as the world is known to improve. - -#### Basic Node - -Here is an example IPLD object in JSON: - -```json -{ - "name": "Vannevar Bush" -} -``` - -Suppose it hashes to the multihash value `QmAAA...AAA`. Note that it has no links at all, just a string name value. 
But we are still be able to "resolve" the key `name` under it: - -```sh -> ipld cat --json QmAAA...AAA -{ - "name": "Vannevar Bush" -} - -> ipld cat --json QmAAA...AAA/name -"Vannevar Bush" -``` - -And -- of course -- we are able to view it in other formats - -```sh -> ipld cat --yml QmAAA...AAA ---- -name: Vannevar Bush - -> ipld cat --xml QmAAA...AAA - - - Vannevar Bush - -``` - -#### Linking Between Nodes - -Merkle-Linking between nodes is the reason for IPLD to exist. A Link in IPLD is just an embedded node with a special format: - -```js -{ - "title": "As We May Think", - "author": { - "/": "QmAAA...AAA" // links to the node above. - } -} -``` - -Suppose this hashes to the multihash value `QmBBB...BBB`. This node links the _subpath `author` to `QmAAA...AAA`, the node in the section above. So we can now do: - -```sh -> ipld cat --json QmBBB...BBB -{ - "title": "As We May Think", - "author": { - "/": "QmAAA...AAA" // links to the node above. - } -} - -> ipld cat --json QmBBB...BBB/author -{ - "name": "Vannevar Bush" -} - -> ipld cat --yml QmBBB...BBB/author ---- -name: "Vannevar Bush" - -> ipld cat --json QmBBB...BBB/author/name -"Vannevar Bush" -``` - -#### Link Properties Convention - -IPLD allows users to construct complex datastructures, with other properties associated with links. This is useful to encode other information along with a link, such as the kind of relationship, or ancilliary data required in the link. This is _different from_ the "Link Objects Convention", discussed below, which are very useful in their own right. But sometimes, you just want to add a bit of data on the link and not have to make another object. IPLD doesn't get in your way. You can simply do it by nesting the actual IPLD link within another object, with the additional properties. - -> IMPORTANT NOTE: the link properties are not allowed directly in the link object because of travesal ambiguities. Read the spec history for a discussion on the difficulties. 
- -For example, supposed you have a file system, and want to assign metadata like permissions, or owners in the link between objects. Suppose you have a `directory` object with hash `QmCCC...CCC` like this: - -```js -{ - "foo": { // link wrapper with more properties - "link": {"/": "QmCCC...111"} // the link - "mode": "0755", - "owner": "jbenet" - }, - "cat.jpg": { - "link": {"/": "QmCCC...222"}, - "mode": "0644", - "owner": "jbenet" - }, - "doge.jpg": { - "link": {"/": "QmCCC...333"}, - "mode": "0644", - "owner": "jbenet" - } -} -``` - -or in YML - -```yml ---- -foo: - link: - /: QmCCC...111 - mode: 0755 - owner: jbenet -cat.jpg: - link: - /: QmCCC...222 - mode: 0644 - owner: jbenet -doge.jpg: - link: - /: QmCCC...333 - mode: 0644 - owner: jbenet -``` - -Though we have new properties in the links that are _specific to this datastructure_, we can still resolve links just fine: - -```js -> ipld cat --json QmCCC...CCC/cat.jpg -{ - "data": "\u0008\u0002\u0012��\u0008����\u0000\u0010JFIF\u0000\u0001\u0001\u0001\u0000H\u0000H..." -} - -> ipld cat --json QmCCC...CCC/doge.jpg -{ - "subfiles": [ - { - "/": "QmPHPs1P3JaWi53q5qqiNauPhiTqa3S1mbszcVPHKGNWRh" - }, - { - "/": "QmPCuqUTNb21VDqtp5b8VsNzKEMtUsZCCVsEUBrjhERRSR" - }, - { - "/": "QmS7zrNSHEt5GpcaKrwdbnv1nckBreUxWnLaV4qivjaNr3" - } - ] -} - -> ipld cat --yml QmCCC...CCC/doge.jpg ---- -subfiles: - - /: QmPHPs1P3JaWi53q5qqiNauPhiTqa3S1mbszcVPHKGNWRh - - /: QmPCuqUTNb21VDqtp5b8VsNzKEMtUsZCCVsEUBrjhERRSR - - /: QmS7zrNSHEt5GpcaKrwdbnv1nckBreUxWnLaV4qivjaNr3 - -> ipld cat --json QmCCC...CCC/doge.jpg/subfiles/1/ -{ - "data": "\u0008\u0002\u0012��\u0008����\u0000\u0010JFIF\u0000\u0001\u0001\u0001\u0000H\u0000H..." -} -``` - -But we can't extract the link as nicely as other properties, as links are meant to _resolve through_. 
- -#### Duplicate property keys - -Note that having two properties with _the same_ name IS NOT ALLOWED, but actually impossible to prevent (someone will do it and feed it to parsers), so to be safe, we define the value of the path traversal to be _the first_ entry in the serialized representation. For example, suppose we have the object: - -```json -{ - "name": "J.C.R. Licklider", - "name": "Hans Moravec" -} -``` - -Suppose _this_ was the _exact order_ in the _Canonical Format_ (not json, but cbor), and it hashes to `QmDDD...DDD`. We would _ALWAYS_ get: - -```sh -> ipld cat --json QmDDD...DDD -{ - "name": "J.C.R. Licklider", - "name": "Hans Moravec" -} -> ipld cat --json QmDDD...DDD/name -"J.C.R. Licklider" -``` - - -#### Path Restrictions - -There are some important problems that come about with path descriptions in Unix and the web. For a discussion see [this discussion](https://github.com/ipfs/go-ipfs/issues/1710). In order to be compatible with the models and expectations of unix and the web, IPLD explicitly disallows paths with certain path components. **Note that the data itself _may_ still contain these properties (someone will do it, and there are legitimate uses for it). So it is only _Path Resolvers_ that MUST NOT resolve through those paths.** The restrictions are the same as typical unix and UTF-8 path systems: - - -TODO: -- [ ] list path resolving restrictions -- [ ] show examples - -#### Integers in JSON - -IPLD is _directly compatible_ with JSON, to take advantage of JSON's successes, but it need not be _held back_ by JSON's mistakes. This is where we can afford to follow format idiomatic choices, though care MUST be given to ensure there is always a well-defined 1:1 mapping. - -On the subject of integers, there exist a variety of formats which represent integers as strings in JSON, for example, [EJSON](https://docs.meteor.com/api/ejson.html). 
These can be used and conversion to and from other formats should happen naturally-- that is, when converting JSON to CBOR, an EJSON integer should be transformed naturally to a proper CBOR integer, instead of representing it as a map with string values. - - -## Serialized Data Formats - -IPLD supports a variety of serialized data formats through [multicodec](https://github.com/multiformats/multicodec). These can be used however is idiomatic to the format, for example in `CBOR`, we can use `CBOR` type tags to represent the merkle-link, and avoid writing out the full string key `@link`. Users are encouraged to use the formats to their fullest, and to store and transmit IPLD data in whatever format makes the most sense. The only requirement **is that there MUST be a well-defined one-to-one mapping with the IPLD Canonical format.** This is so that data can be transformed from one format to another, and back, without changing its meaning nor its cryptographic hashes. - -### Serialized CBOR with tags - -IPLD links can be represented in CBOR using tags which are defined in [RFC 7049 section 2.4](http://tools.ietf.org/html/rfc7049#section-2.4). - -A tag `` is defined. This tag can be followed by a text string (major type 3) or byte string (major type 2) corresponding to the link target. - -When encoding an IPLD "link object" to CBOR, use this algorithm: - -- The *link value* is extracted. -- If the *link value* is a valid [multiaddress](https://github.com/multiformats/multiaddr) and converting that link text to the multiaddress binary string and back to text is guaranteed to result to the exact same text, the link is converted to a binary multiaddress stored in CBOR as a byte string (major type 2). 
-- Else, the *link value* is stored as text (major type 3) -- The resulting encoding is the `` followed by the CBOR representation of the *link value* - -When decoding CBOR and converting it to IPLD, each occurences of `` is transformed by the following algorithm: - -- The following value must be the *link value*, which is extracted. -- If the link is a binary string, it is interpreted as a multiaddress and converted to a textual format. Else, the text string is used directly. -- A map is created with a single key value pair. The key is the standard IPLD *link key* `/`, the value is the textual string containing the *link value*. - -When an IPLD object contains these tags in the way explained here, the multicodec header used to represent the object codec must be `/cbor/ipld-tagsv1` instead of just `/cbor`. Readers should be able to use an optimized reading process to detect links using these tags. - -### Canonical Format - -In order to preserve merkle-linking's power, we must ensure that there is a single **_canonical_** serialized representation of an IPLD document. This ensures that applications arrive at the same cryptographic hashes. It should be noted --though-- that this is a system-wide parameter. Future systems might change it to evolve representations. However we estimate this would need to be done no more than once per decade. - -**The IPLD Canonical format is _canonicalized CBOR with tags_.** - -The canonical CBOR format must follow rules defines in [RFC 7049 section 3.9](http://tools.ietf.org/html/rfc7049#section-3.9) in addition to the rules defined here. - -Users of this format should not expect any specific ordering of the keys, as the keys might be ordered differently in non canonical formats. - -The legacy canonical format is protocol buffers. - -This canonical format is used to decide which format to use when creating the object for the first time and computing its hash. 
Once the format is decided for an IPLD object, it must be used in all communications so senders and receivers can check the data against the hash. - -For example, when sending a legacy object encoded in protocol buffers over the wire, the sender must not send the CBOR version as the receiver will not be able to check the file validity. - -In the same way, when the receiver is storing the object, it must make sure that the canonical format for this object is store along with the object so it will be able to share the object with other peers. - -A simple way to store such objects with their format is to store them with their multicodec header. - - -## Datastructure Examples - -It is important that IPLD be a simple, nimble, and flexible format that does not get in the way of users defining new or importing old datastractures. For this purpose, below I will show a few example data structures. - - -### Unix Filesystem - - -#### A small File - -```js -{ - "data": "hello world", - "size": "11" -} -``` - -#### A Chunked File - -Split into multiple independent sub-Files. 
- -```js -{ - "size": "1424119", - "subfiles": [ - { - "link": {"/": "QmAAA..."}, - "size": "100324" - }, - { - "link": {"/": "QmAA1..."}, - "size": "120345", - "repeat": "10" - }, - { - "link": {"/": "QmAA1..."}, - "size": "120345" - }, - ] -} -``` - -#### A Directory - -```js -{ - "foo": { - "link": {"/": "QmCCC...111"}, - "mode": "0755", - "owner": "jbenet" - }, - "cat.jpg": { - "link": {"/": "QmCCC...222"}, - "mode": "0644", - "owner": "jbenet" - }, - "doge.jpg": { - "link": {"/": "QmCCC...333"}, - "mode": "0644", - "owner": "jbenet" - } -} -``` - -### git - -#### git blob - -```js -{ - "data": "hello world" -} -``` - -#### git tree - -```js -{ - "foo": { - "link": {"/": "QmCCC...111"}, - "mode": "0755" - }, - "cat.jpg": { - "link": {"/": "QmCCC...222"}, - "mode": "0644" - }, - "doge.jpg": { - "link": {"/": "QmCCC...333"}, - "mode": "0644" - } -} -``` - -#### git commit - -```js -{ - "tree": {"/": "e4647147e940e2fab134e7f3d8a40c2022cb36f3"}, - "parents": [ - {"/": "b7d3ead1d80086940409206f5bd1a7a858ab6c95"}, - {"/": "ba8fbf7bc07818fa2892bd1a302081214b452afb"} - ], - "author": { - "name": "Juan Batiz-Benet", - "email": "juan@benet.ai", - "time": "1435398707 -0700" - }, - "committer": { - "name": "Juan Batiz-Benet", - "email": "juan@benet.ai", - "time": "1435398707 -0700" - }, - "message": "Merge pull request #7 from ipfs/iprs\n\n(WIP) records + merkledag specs" -} -``` - -### Bitcoin - -#### Bitcoin Block - -```js -{ - "parent": {"/": "Qm000000002CPGAzmfdYPghgrFtYFB6pf1BqMvqfiPDam8"}, - "transactions": {"/": "QmTgzctfxxE8ZwBNGn744rL5R826EtZWzKvv2TF2dAcd9n"}, - "nonce": "UJPTFZnR2CPGAzmfdYPghgrFtYFB6pf1BqMvqfiPDam8" -} -``` - -#### Bitcoin Transaction - -This time, in YML. 
TODO: make this a real txn - -```yml ---- -inputs: - - input: {/: Qmes5e1x9YEku2Y4kDgT6pjf91TPGsE2nJAaAKgwnUqR82} - amount: 100 -outputs: - - output: {/: Qmes5e1x9YEku2Y4kDgT6pjf91TPGsE2nJAaAKgwnUqR82} - amount: 50 - - output: {/: QmbcfRVZqMNVRcarRN3JjEJCHhQBcUeqzZfa3zoWMaSrTW} - amount: 30 - - output: {/: QmV9PkR2gXcmUgNH7s7zMg9dsk7Hy7bLS18S9SHK96m7zV} - amount: 15 - - output: {/: QmP8r8fLUnEywGnRRUrHB28nnBKwmshMLiYeg8udzYg7TK} - amount: 5 -script: OP_VERIFY -``` diff --git a/IPLD_FUTURE_FROM_PAST.md b/IPLD_FUTURE_FROM_PAST.md deleted file mode 100644 index 37af6284..00000000 --- a/IPLD_FUTURE_FROM_PAST.md +++ /dev/null @@ -1,181 +0,0 @@ -**This document contains a draft from a proposed spec for an older version of IPLD. It remains here until we have published the new spec** - -# IPLD Spec v1 - -Editor: Nicola Greco, MIT - -> This specification defines a data model and a naming scheme for linking data with cryptographic hashes. -> -> InterPlanetary Linked Data (IPLD) is an information space of inter-linked data, where content addresses and links are expressed using the content's cryptographic hash. IPLD is designed to universally address and link to any data so that the data does not require a central naming authority, the integrity of the data can be verified, and untrusted parties can help distribute the data. This specification describes a data model for structured data and a naming scheme to point to data and subsets of the data. These design goals make it different from earlier data models such as JSON or RDF, and naming schemes such as NI [[RFC6920]](https://tools.ietf.org/html/rfc6920), or Magnet URI. 
- - -## Table of contents - -- [Introduction](#introduction) -- [Basic Concepts](#basic-concepts) -- [IPLD](#ipld) - - [Data Model](#data-model) - - [Naming Scheme](#naming-scheme) -- [Serialization](#serialization) -- [Security Considerations](#security-considerations) -- [Examples](#examples) -- [Acknowledgements](#acknowledgements) -- [References](#references) - ---- - -## Introduction -Naming things with hashes solves three fundamental problems for the decentralized web: - -1. **Data integrity**: URLs give no guarantees that the data we receive hasn't been compromised. The IPLD naming system ensures that no one can lie about the data they are serving. -2. **Distributed naming**: Only the owner of a domain name can serve you the data behind a URL; in IPLD, any computer - trusted and untrusted - that has the data can participate in distributing it. -3. **Immutable Content**: The content behind URLs can change or disappear, breaking links or pointing to unexpected content. IPLD links cannot mutate. - -Using cryptographic hashes as pointers for data objects is not a new concept. Successful applications (e.g. Bitcoin, Git, Certificate Transparency) and existing specs ([[RFC6920]](https://tools.ietf.org/html/rfc6920)) used this strategy to authenticate their datasets, generate global identifiers and to provide end-to-end integrity to their systems. However, existing applications have each implemented a different data model and pointer format, and these do not interoperate, making it difficult to re-use the same data across applications. Furthermore, vertical implementations are application specific (e.g. forcing a particular data model) and can hardly be used elsewhere. - -IPLD is a forest of hash-linked directed acyclic graphs, also referred to as Merkle DAGs (or generically, tree-based authenticated data structures). -IPLD aims to be the way to address any authenticated data structure under the IPLD namespace `/ipld/`, followed by the hash of the data.
Conceptually, any Bittorrent, Git, or Blockchain data also resides in this namespace, thus solving the described interoperability problem. - -This specification defines: -- **IPLD Data Model**: a data model to describe unstructured and structured data and to represent Merkle DAGs. -- **IPLD Naming Scheme**: a UNIX-like naming scheme that is self-authenticating. It can be used to point to data or subsets of it. - -The IPLD Data Model and Naming Scheme defined below follow specific design goals that are not currently met by other existing standards. The underlying data model is an extension of the JSON [[RFC4627]](https://www.ietf.org/rfc/rfc4627.txt) and CBOR [[RFC7049]](https://tools.ietf.org/html/rfc7049) data models. The Naming Scheme is built upon JSON Pointer [[RFC6901]](https://tools.ietf.org/html/rfc6901). It is important to note that this is not a proposal of a data format, but an abstract data model that can be serialized in multiple formats. - -Related specs: CID - - - -## Basic Concepts - -In this section we cover some basic concepts on which IPLD builds. - -**Cryptographic integrity**. Cryptographic hash functions are one-way functions that can be used to map any binary data to a specific string, called a digest or a hash. A cryptographic hash function gives strong probabilistic guarantees that different pieces of content don't *collide* on the same hash, meaning that in practice no two different pieces of content have the same hash. By naming content with hashes, we can verify that the data has not been altered during storage or transmission, since when obtaining a file, the receiver can themselves regenerate the hash of the content received. - -**Range verifiability**. A cryptographic hash provides integrity guarantees not only for the content it directly dereferences, but also for the entire graph of content linked from it. - -**Merkle DAGs**. We refer to directed acyclic graphs linked via cryptographic hashes as Merkle DAGs.
Systems such as Git, IPFS, Bittorrent, and Bitcoin use different types of hash-based directed acyclic graphs. - -## Objectives - -Objectives of the IPLD Data Model: - -1. Data must be able to be decoded without a schema description. -2. The Data model must support all the JSON data types for conversion from and to JSON. -3. The representation must be able to unambiguously encode most common data formats, as well as existing data structures used in Internet and Web standards. - -Objectives of the IPLD Naming Scheme: - -1. Names must be self-descriptive on how they are encoded, what type of content they contain and the hash functions used. -2. The Naming Scheme must be extensible: new hash functions and new encodings must be able to be introduced without losing backward compatibility. -3. The Naming Scheme must respect conventions used in the Unix file system and on the World Wide Web. - -## Terminology - -| Name | Description | -| :---- | :---- | -| Resource | Any piece of data, structured or unstructured, that can be addressed via cryptographic hash. | -| IPLD Objects | Semi-structured data (similar to JSON) consisting of attribute-value pairs that conform to the IPLD Objects Data Model. | -| IPLD Link Object | The value of an attribute in an IPLD Object can be a Link Object, a special object that describes a link to another resource. | -| CID | The cryptographic hash of a resource prefixed by bits that describe the type of data, the cryptographic hash function used and the encoding of the hash. | -| IPLD Address | A name combining the CID and an optional path scheme that points to a resource or an attribute in an IPLD Object. | -| IPLD Formats | The process of serialization/deserialization of an IPLD Object into/from a data format (e.g. CBOR, JSON) | -| IPLD Types | The process of serialization/deserialization of an IPLD object into/from a special data structure (e.g.
Ethereum block) | - -## IPLD Data Model - -### IPLD Objects - -IPLD objects consist of attribute–value pairs (similar to JSON). - -An object has a set of attributes, each of which has a corresponding value. -A value can be of four types: -- a terminal, which can be a string, an integer, a real number, or a boolean -- an IPLD Object (recursive definition) -- an IPLD Link Object -- an ordered array of the previous - -### Link Object -``` -TODO: describe the link object -- the `/` keyword and accepted values -- pointers can be of these forms: - - relative (?) - - pointers: (for further understanding of pointers, see below) - - only hash - - hash + path -``` - - -## IPLD Naming Scheme - -``` -TODO: define the different components of an IRI -- A Pointer is "Protocol(optional?) + CID + Path" -- CID (multicodec, multihash, versioning, etc) -- Path (optional) - - must respect the shape of the object or will result in an error -``` - -``` -TODO: format -- restricted char -``` - -## Representations -``` -TODO: specifying the canonical format in the CID -``` - -``` -TODO: serializing and de-serializing -``` - -``` -TODO: different formats -- json -- yaml -- cbor - -TODO: define the possibility of converting -``` - -## Error Handling -``` -TODO: describe possible errors: -- CID has bad syntax -- hash function not known -- pointer referencing a nonexistent value -``` - -## Security considerations - -``` -TODO: -- no secret information required to generate or verify a name, names are secure and self-evident - - corollary: causal links -- disclosure of names does not disclose the referenced object but may enable the attacker to determine the contents referenced -- note about hash collision and probabilistic guarantees -- hash functions can break -``` - -## Examples - -### Hello World -### File system example -### Social network example - -## Acknowledgements - -``` -TODO: list all contributors -``` - -## References diff --git a/README.md b/README.md index 307171ba..e63025e1 100644
--- a/README.md +++ b/README.md @@ -1,45 +1,98 @@ IPLD Specifications =================== -[![](https://img.shields.io/badge/made%20by-Protocol%20Labs-blue.svg?style=flat-square)](http://ipn.io) -[![](https://img.shields.io/badge/project-IPLD-blue.svg?style=flat-square)](http://github.com/ipld/ipld) -[![](https://img.shields.io/badge/freenode-%23ipfs-blue.svg?style=flat-square)](http://webchat.freenode.net/?channels=%23ipfs) +IPLD is not a single specification, it is a set of specifications. -> This repository contains the specs for InterPlanetary Linked Data (IPLD). +``` + The IPLD Stack -**Specs are not finished yet. We use the following tag system to identify their state:** + +-----------------------------+ + +-------------+ | | + | | | End-User Application Stacks | + | MFS in IPFS | | | + | | +-----------------------------+ + +-------------+ | | + | | | Structured Data w/ indexes | + | unixfs v2 | | VR, Geo, SQL, etc. | +----------+ + | | | | | | + +-------------+ +-----------------------------+ | MFS in | + | | | | | IPFS | + | HAMT | | Sorted Index (sharded) | | | + | | | | +----------+ + +-------------+-+-----------------------------+ | | + | | | unixfs | + | Complex Data Structures | | v1 | + | | | | ++-----------------------------------------------------------------------------------------+ +| | | | | +| | dag-json dag-cbor | ipld-git | | +| | | | | +| Codecs +---------------------------------------------+ ipld-btc | dag-pb | +| | | | | +| | IPLD Data Model | ipld-zcash | | +| | | | | ++-------------------------------------------------------------+----------------+----------+ + | | + | CID Path | + | | + +-------------------------------------------------------------------------+ +``` -- ![](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square) - this spec is a work-in-progress, it is likely not even complete. 
-- ![](https://img.shields.io/badge/status-draft-yellow.svg?style=flat-square) - this spec is a rough draft and will likely change substantially. -- ![](https://img.shields.io/badge/status-reliable-green.svg?style=flat-square) - this spec is believed to be close to final. Minor changes only. -- ![](https://img.shields.io/badge/status-stable-brightgreen.svg?style=flat-square) - this spec is likely to improve, but not change fundamentally. -- ![](https://img.shields.io/badge/status-permanent-blue.svg?style=flat-square) - this spec will not change. +The goal of this stack is to enable decentralized data-structures +which in turn will enable more decentralized applications. -Nothing in this spec repository is `permanent` yet. As in many IPLD repositories, most of the work is happening in [the issues](https://github.com/ipld/specs/issues/) or in [active pull requests](https://github.com/ipld/specs/pulls/). Go take a look! +Many of the specifications in this stack are inter-dependent. -## Documents +``` + IPLD Dependency Graph -- [**Roadmap**](/ROADMAP.md) -- **Specifications:** - - ![](https://img.shields.io/badge/status-draft-yellow.svg?style=flat-square) [`IPLD`](/IPLD.md) - spec about the data model, pointers and link formats - - ![](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square) `IPLD Selectors` - spec about simple language to select multiple unknown nodes in a graph - - ![](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square) `IPLD Transformations` - spec about the language to trasform an IPLD graph into another - - ![](https://img.shields.io/badge/status-reliable-green.svg?style=flat-square) [`CID (Content IDentifier)`](https://github.com/ipld/cid) - - ![](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square) [`IPLD Formats`](https://github.com/ipld/interface-ipld-format) - interface definition for adding support to different formats - - 
![](https://img.shields.io/badge/status-draft-yellow.svg?style=flat-square) [`CAR`](/CAR.md) - Content Addressable Archives ++---+ +----------+ +|CID+-----------+-------------->Raw Blocks| ++---+ | +--+-------+ + +------v-------------+ | ++----+ |Links (Conceptually)| | +|Path| +------+-------------+ | +-----------+ ++-+--+ | +------------->Replication| + | Codecs | | +-----------+ ++-v-------------v-----------------+---+ +| | +| +---+ +-----------------------+ | Complex Data-Structures +| |Git| | Data Model v1 | | +--------------v-------+ +| +---+ | | | | | +| | +--------+ +--------+ +----> +----+ +-----------+ | +| +------+ | |dag|json< |dag|cbor< | | | |HAMT| |Sorted Tree| | +| |dag|pb| | +--------+ +--------+ | | | +--+-+ +----+------+ | +| +------+ | | | | | | | +| +-----------------------+ | +----------------------+ +| | | | ++-------------------------------------+ | | + | | + +----------------------+ | | + | File System (unixfs) <-----+ | + +----------------------+ | + +--------------------+ | + | | | +Structured Data | VR, Geo, SQL, etc. <----------------+ + w/ indexes | | + +--------------------+ +``` -## Discussion +## Specification Repo Layout -Join the discussion for: +* [/IPLD-Data-Model-v1.md](/IPLD-Data-Model-v1.md) +* [/IPLD-Path.md](/IPLD-Path.md) +* [/CID.md](/CID.md) +* [/Codecs](/Codecs) + * [/Codecs/DAG-JSON.md](/Codecs/DAG-JSON.md) + * [/Codecs/DAG-CBOR.md](/Codecs/DAG-CBOR.md) +* [/Data-Structures](/Data-Structures) + * [/Data-Structures/HAMT.md](/Data-Structures/HAMT.md) -- Specs - https://github.com/ipld/specs/issues -- General IPLD - https://github.com/ipld/ipld/issues -- JavaScript Implementation - https://github.com/ipld/js-ipld/issues -- Golang Implementation - https://github.com/ipfs/go-ipld-format +## Discussion -## Weekly Hangout +Discussion of specific specifications happens in [this repository's issues](https://github.com/ipld/specs/issues). 
-TBA soon™ +Discussion of IPLD more generally happens in the [IPLD repository](https://github.com/ipld/ipld/issues). ## Contribute @@ -54,3 +107,155 @@ Small note: If editing the README, please conform to the [standard-readme](https ## License This repository is only for documents. All of these are licensed under the [CC-BY-SA 3.0](https://ipfs.io/ipfs/QmVreNvKsQmQZ83T86cWSjPu2vR3yZHGPm5jnxFuunEB9u) license, © 2016 Protocol Labs Inc. + +# Terminology + +## Description of IPLD + +IPLD is a set of standards and implementations for creating decentralized data-structures that are +universally addressable and linkable. These structures allow us to do for data what URLs and +links did for HTML web pages. + +## Generic Terms + +### Content Addressability + +"Content addressability" refers to the ability to refer to content by a trustless identifier. + +Rather than referring to content by a string identifier or URL, content addressable systems refer to content +by a cryptographic hash. This allows complete decentralization of the content as the identifier +does not specify the retrieval method and provides a secure way to verify the content. + +## IPLD Terms + +### Multihash + +Multihash is a hash format that is not specific to a single hashing algorithm. + +A multihash describes the algorithm used for the hash as well as the hash value. + +``` ++-----------+----------------------------+ +| Hash Type | Hash Value | ++-----------+----------------------------+ +``` + +SHA-256 example. + +``` ++---------+------------------------------------------------------------------+ +| SHA-256 | 2413fb3709b05939f04cf2e92f7d0897fc2596f9ad0b8a9ea855c7bfebaae892 | ++---------+------------------------------------------------------------------+ +``` + +Note: these examples are simplifications of the concepts. For a complete description visit the [project and its specs](https://github.com/multiformats/multihash). + +### CID (Content Identifier) + +Hash-based content identifier.
Includes the `codec` and `multihash`. + +CID's + +``` ++-------+------------------------------+ +| Codec | Multihash | ++-------+------------------------------+ +``` + +The long version +``` ++------------------------------+ +|Codec | ++------------------------------+ +|Multihash | +| +----------+---------------+ | +| |Hash Type | Hash Value | | +| +----------+---------------+ | +| | ++------------------------------+ +``` + +Note: these examples are simplifications of the concepts. For a complete description visit the [spec](./CID.md). + +### Block + +A CID and the binary data value for that CID. + +The short version. +``` ++-----+--------------------------------+ +| CID | Data | ++-----+--------------------------------+ +``` + +The long version. +``` ++-----------------------------------+------------------+ +| CID | Binary Data | +| +------------------------------+ | | +| |Codec | | | +| +------------------------------+ | | +| |Multihash | | | +| | +----------+---------------+ | | | +| | |Hash Type | Hash Value | | | | +| | +----------+---------------+ | | | +| | | | | +| +------------------------------+ | | +| | | ++-----------------------------------+------------------+ +``` + +### IPLD Path + +A string identifier used for deep references into IPLD +graphs. Follows similar escape and segmentation rules as URI Paths. + +[Read the full specification for more details.](./IPLD-Path.md) + +### IPLD Data Model + +The IPLD Data Model describes a set of base types. Codecs that support these base types +can be used by any of the data-structures built on top of the IPLD Data Model. + +Codecs that support the IPLD Data Model: + +* [DAG-CBOR](/Codecs/DAG-CBOR.md) +* WIP: [DAG-JSON](/Codecs/DAG-JSON.md) + +### Codec + +A codec exposes serialization and deserialization for IPLD blocks. If it also supports +content addressable links then the codec exposes those links as `CID`'s. A codec +also supports atomic IPLD Path lookups on the block. 
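
The Multihash, CID, and Block layouts described above can be sketched in a few lines of Python. This is only an illustrative sketch, not an implementation from this repository: the helper names (`multihash_sha256`, `make_cid`, `make_block`, `verify_block`) are hypothetical, it handles only SHA-256 with the `raw` codec, and it assumes every prefix value is below `0x80` so that each varint fits in one byte. The leading version byte comes from the CIDv1 binary layout and is omitted from the simplified diagrams above.

```python
import hashlib
from dataclasses import dataclass

SHA2_256 = 0x12  # multihash code for sha2-256
RAW = 0x55       # multicodec code for raw binary data
CID_V1 = 0x01    # CID version prefix


def multihash_sha256(data: bytes) -> bytes:
    """Return <hash type><digest length><hash value> for SHA-256."""
    digest = hashlib.sha256(data).digest()
    return bytes([SHA2_256, len(digest)]) + digest


def make_cid(codec: int, multihash: bytes) -> bytes:
    """Return a binary CIDv1: <version><codec><multihash>."""
    return bytes([CID_V1, codec]) + multihash


@dataclass(frozen=True)
class Block:
    cid: bytes   # identifier derived from the data
    data: bytes  # the binary data value for that CID


def make_block(data: bytes, codec: int = RAW) -> Block:
    """Pair some binary data with the CID computed from it."""
    return Block(make_cid(codec, multihash_sha256(data)), data)


def verify_block(block: Block) -> bool:
    """Re-hash the data and compare it with the hash value inside the CID."""
    if len(block.cid) != 36:  # 4 prefix bytes + 32-byte SHA-256 digest
        return False
    version, _codec, hash_type, hash_len = block.cid[:4]
    if version != CID_V1 or hash_type != SHA2_256 or hash_len != 32:
        return False
    return block.cid[4:] == hashlib.sha256(block.data).digest()
```

Because the identifier is derived from the data, a receiver can call `verify_block` on anything it is handed, regardless of which peer supplied it; a block whose data was altered in transit no longer matches its CID.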
+ +#### Serializer, Deserializer and Format + +A logical separation exists in any given IPLD codec between the **format** and the **serializer/deserializer**. + +``` ++--------------------+ +--------------------+ +| | | | +| Serializer | | Deserializer | +| | | | ++---------+----------+ +---------+----------+ + | ^ + | Sent to another peer | + v | ++---------+----------+ +----------+---------+ +| | | | +| Format +-------------> Format | +| | | | ++--------------------+ +--------------------+ +``` + +A **format** may represent object types and tree structures any +way it wishes. This includes existing representations (JSON, BSON, CBOR, +Protobuf, msgpack, etc.) or even new custom serializations. + +Therefore, a **format** is the standardized representation of IPLD Links and Paths. It describes how to translate between structured data and binary. + +It is worth noting that **serializers** and **deserializers** differ by programming language while the **format** does not and MUST remain consistent across all codec implementations. + +#### Representation + +The in-memory representation of a de-serialized IPLD value. diff --git a/ROADMAP.md b/ROADMAP.md deleted file mode 100644 index 1807a759..00000000 --- a/ROADMAP.md +++ /dev/null @@ -1,3 +0,0 @@ -# IPLD ROADMAP - -[Soon™](https://github.com/ipld/specs/issues/41)