add basic spec for hamt #109
# IPLD HAMT Spec

This specifies a standardized hash-array mapped trie on IPLD Data Model v1.
Good reading:

- https://blog.acolyer.org/2015/11/27/hamt/
- https://michael.steindorfer.name/publications/oopsla15.pdf

The HAMT is a key-value store implemented with an `N`-ary hash-keyed trie. `N`
is currently chosen to be 256, as it provides a reasonable maximum size for the
nodes, and also makes getting the next index very simple. The basic structure
is a `Node`, which is as follows:

```go
type Node struct {
	Bitfield Bitfield
	Pointers []*Pointer
}
```

The `Node` is serialized as a cbor object (major type 5), with the bitfield
serialized as a cbor major type 2 byte array. The Bitfield field uses `bf` as
its object key, and the Pointers array uses `p` as its object key.

```go
type Pointer struct {
	KVs  []*KV
	Link Cid
}
```

The `Pointer` is also serialized as a cbor object (major type 5), with the KVs
field serialized as a major type 4 array of the `KV` objects, and the Link
field serialized as an [ipld dag-cbor Cid](https://github.com/ipld/specs/blob/master/Codecs/DAG-CBOR.md#link-format).

```go
type KV struct {
	Key   string
	Value Anything
}
```

The `KV` is serialized as a cbor array (major type 4) with the 'key' field
serialized as a cbor byte array (major type 2) and placed in the zeroth
position of the array, and the value serialized { in some way } and placed in
array position 1.

> **Review:** A cbor map would be even more compact (although, I guess, IPLD
> doesn't currently support binary keys...).

> **Reply:** hrm.. yeah. that might get complicated
## Lookup

To look up a value in the HAMT, first hash the key using a 128-bit murmur3
hash. Then, for each layer, take the next `W` bits of the hash and use that to
compute the index for your key, as follows:

> **Review:** ...if it's fixed to an arity of 256, but I'm wondering why 256 is
> chosen here; is it mainly for the ease of accessing the hash in 8-bit chunks?

> **Reply:** I think we cap the number of links in protobuf nodes at ~170, so I
> think the conservative approach was taken and the number of links was kept
> roughly similar.

> **Reply:** Yeah, wanted to make it a number that would result in acceptable
> maximum node sizes, but 256 specifically was chosen simply because it makes
> reading the next index off the hash easy.

> **Review:** So, there is a pathological case here: large keys. Should we
> impose a size limit (256 bytes?) as most filesystems do?

> **Reply:** I think a keysize limit is worthwhile, though that should probably
> be left up to the application.

### Index Calculation

To compute the index at a given layer, take the first `i` bits of the bitfield
(where `i` is the number represented by the next `W` bits of the hashed key)
and count the number of set bits. This count gives the correct index into the
`Pointers` array to search for the given key.
### Recursing

If no Pointer exists at the specified index, the value does not exist.
Otherwise, if the Pointer contains a non-empty KVs array, search for a KV pair
matching the desired key in that array, returning the value if found and 'not
found' otherwise. If the Pointer instead has a 'Link' Cid set, load that object
as a `Node` and recurse.
## Set Value

> **Review:** Weren't we considering some kind of hashing-with-replacement
> system to completely fill up each layer? Or was that too expensive?

> **Reply:** I didn't end up investigating it too much; it feels like it might
> get pretty expensive.

> **Reply:** The buckets system is probably good enough for most cases. What
> test data did you use when testing depth?

> **Reply:** Just random keys and values; did some number of inserts, measured
> average densities, depths, etc. I don't think I committed the code, but it
> was just the tests in go-hamt-ipld, with some stats collection. Could rig it
> up again pretty quickly.

To set a value, perform the same operations as the lookup to find where the
value should go.

If the lookup terminated on a node with an unset bit in the bitfield where our
search path was supposed to go, create a new Pointer and put the key and value
in its KVs array.

If the lookup terminated on a non-nil Pointer with existing KVs:

1. If the KVs array has fewer than `M` items in it, insert the new key-value
   pair into the KVs array in bytewise key order.
2. If the KVs array has `M` items in it (more than `M` would break an
   invariant), take all `M+1` items, delete the KVs array, create a new Node,
   and insert those `M+1` items into that node starting from the current depth
   (meaning, if the current tree depth is 3, skip the first `3 * W` bits of the
   key hash before starting index calculation). Then, put the cid of the
   resulting node in the `Link` field of the current node where the removed
   KVs array was.

> Note: We currently set `M` to be 3.
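
Rule 1 and the overflow check of rule 2 can be sketched as follows. The helper
name `insertKV` is hypothetical, replacement of an existing equal key is
omitted, and on overflow the caller is expected to build the child Node.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

const M = 3 // maximum KV pairs per bucket, per the note above

type KV struct {
	Key   []byte
	Value []byte
}

// insertKV places kv into a bucket kept in bytewise key order. It
// returns ok = false when the bucket already holds M entries, in which
// case the caller must create a child Node and re-insert all M+1 pairs
// there, starting index calculation at the current depth.
func insertKV(kvs []KV, kv KV) ([]KV, bool) {
	if len(kvs) >= M {
		return kvs, false
	}
	// Find the first position whose key sorts >= the new key.
	i := sort.Search(len(kvs), func(i int) bool {
		return bytes.Compare(kvs[i].Key, kv.Key) >= 0
	})
	kvs = append(kvs, KV{})
	copy(kvs[i+1:], kvs[i:]) // shift the tail right by one
	kvs[i] = kv
	return kvs, true
}

func main() {
	var bucket []KV
	for _, k := range []string{"b", "a", "c"} {
		bucket, _ = insertKV(bucket, KV{Key: []byte(k)})
	}
	for _, kv := range bucket {
		fmt.Println(string(kv.Key)) // prints a, b, c: bytewise key order
	}
	_, ok := insertKV(bucket, KV{Key: []byte("d")})
	fmt.Println(ok) // prints false: a full bucket must be split into a Node
}
```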
## Delete Value

To delete a value, perform the same operations as the lookup to find the value
to be deleted.

If the value does not exist, return 'not found'.

If the value is found (it will be in a KVs array), remove it from the array.

Now, count the total number of KV pairs across all Pointers in the current
Node. If that number is less than four, and we are not at the root node of the
tree, gather the remaining KV pairs, delete the node, and re-insert them. If
the node they are re-inserted into also then has fewer than four elements in it
(the newly reinserted elements are the only ones in the node), then recurse.

> **Review:** less than M+1

This process ensures that the tree always has the same exact structure as
another tree with the same items inserted.
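
The collapse condition after a delete can be sketched as follows, assuming (as
the text implies) that a node being folded back into its parent holds only KV
buckets and no child links, and that the threshold is `M+1 = 4`. The helper
name is hypothetical.

```go
package main

import "fmt"

type KV struct {
	Key, Value string
}

type Pointer struct {
	KVs     []*KV
	HasLink bool // stands in for a non-empty Link Cid
}

type Node struct {
	Pointers []*Pointer
}

// shouldCollapse reports whether a non-root node should be deleted and
// its KV pairs re-inserted into the parent: fewer than four (M+1) pairs
// in total, and no child links to descend into.
func shouldCollapse(n *Node, isRoot bool) bool {
	if isRoot {
		return false // the root is never collapsed
	}
	total := 0
	for _, p := range n.Pointers {
		if p.HasLink {
			return false // a child node hangs below; cannot gather pairs
		}
		total += len(p.KVs)
	}
	return total < 4
}

func main() {
	n := &Node{Pointers: []*Pointer{
		{KVs: []*KV{{Key: "a"}, {Key: "b"}}},
		{KVs: []*KV{{Key: "c"}}},
	}}
	fmt.Println(shouldCollapse(n, false)) // prints true: only 3 pairs remain
	fmt.Println(shouldCollapse(n, true))  // prints false: root stays put
}
```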
> **Review:** Could we add an optional seed (defaults to 0)? That way we have
> room to fix the hashmap DoS attack if necessary.

> **Reply:** I've been messing with hash algorithm pluggability, thinking that
> (a) different algorithms (and different key lengths) might be optimal in
> different scenarios, and (b) having the ability to switch it out provides
> some future-proofing in case of fundamental flaws being discovered in a
> chosen algorithm. Having space for a seed would also open up space for some
> keyed algorithms too.
>
> What I can't see (yet) is what kinds of use-cases of IPLD there are where
> attacks against the hash would matter. What's the threat model where this is
> a concern, or is it just a matter of being safe for some as yet unforeseen
> scenario?

> **Reply:** The hashmap DoS attack works by the attacker inserting keys that
> hash to the same bucket in a hash map. Something very similar can be done
> with a HAMT by selecting keys that lie in the same branch of the trie.

> **Reply:** A seed/nonce works around it by making it so the attacker can't
> simply predict in which branch of the tree a given key will lie.

> **Reply:** I'm not actually sure how adding a seed helps here. It needs to be
> deterministic, and if it's deterministic, then the attacker can know it too
> and it doesn't make their lives any harder.
>
> I guess forcing a rehash at each layer makes the attack linearly more
> expensive, but doesn't necessarily prevent attacks.

> **Reply:** So, we actually do need to support, e.g., sha256 if we want both
> security and determinism on systems like Filecoin. Currently, given an N-byte
> insecure hash function, an attacker could create a tree N deep at the target
> hash, filling the last layer. This could be used to prevent anyone from using
> a specific key.

> **Reply:** I have a hard time imagining wanting to use non-deterministic
> balancing in a distributed system. It seems that would produce either massive
> flapping if actually used in a frequently updated dataset, and/or introduce a
> need for coordination where there previously wasn't one (which is pretty much
> universally a Bad Thing in a distributed system). Is there a concrete
> situation where we can imagine using such a thing, and using it well?
>
> "Use a SHA (or other cryptographic function) when it matters" sounds like a
> much better approach. We're already tossing around enough cryptographic
> functions that it doesn't sound likely to be much of a cost center to
> introduce another.

> **Reply:** So, the simple use-case is a blockchain where the blockchain
> determines the seed. Once every N blocks, the hamt would be reseeded
> automatically.
>
> In general, this will also work for single-writer, multi-reader setups.
> That's usually the most common case.
>
> I agree, although this still won't be optimal. That is, an attacker could
> pretty easily create a very deep hamt.
>
> Basically, I'd prefer to leave room for future improvements now instead of
> having to introduce them later. However, custom hash functions are probably
> enough.

> **Reply:** @Stebalien what you're suggesting would mean rewriting the entire
> HAMT every epoch. I think that's far worse than a (worst case) 32-deep
> lookup.

> **Reply:** I'm saying it could be rebalanced as necessary. However, I agree
> the better solution is to just use SHA256.
>
> Note: The worst case without using sha256 isn't a 32-deep lookup, it's a full
> kv list at the max depth preventing further modifications.