diff --git a/data-structures.md b/data-structures.md
index 9e7027db6..1a124aafd 100644
--- a/data-structures.md
+++ b/data-structures.md
@@ -24,7 +24,7 @@ To learn more, take a look at the [Address Spec](https://github.com/filecoin-pro
 
 ## CID
 
-For most objects referenced by Filecoin, a Content Identifier (CID for short) is used. This is effectively a hash value, prefixed with its hash function (multihash) prepended with a few extra labels to inform applications about how to deserialize the given data. To learn more, take a look at the [CID Spec](https://github.com/ipld/cid). 
+For most objects referenced by Filecoin, a Content Identifier (CID for short) is used. This is effectively a hash value, prefixed with its hash function (multihash) prepended with a few extra labels to inform applications about how to deserialize the given data. To learn more, take a look at the [CID Spec](https://github.com/ipld/cid).
 
 CIDs are serialized by applying binary multibase encoding, then encoding that as a CBOR byte array with a tag of 42.
 
@@ -68,7 +68,7 @@ type Block struct {
 	// MessageReceipts is a set of receipts matching to the sending of the `Messages`.
 	// TODO: should be the same type of merkletree-list thing that the messages are
 	MessageReceipts []MessageReceipt
-    
+
     // The block Timestamp is used to enforce a form of block delay by honest miners.
     // Unix time UTC timestamp stored as an unsigned integer
     Timestamp Timestamp
@@ -363,3 +363,59 @@ StateRoot:       Cid("zDPWYqFD5abn4FyknPm1PibXdJ2kwRNVPDabKyzfdXVJGjnDuq4B")
 Messages:        []SignedMessage{}
 MessageReceipts: []MessageReceipt{}
 ```
+
+## RLE+ Bitset Encoding
+
+RLE+ is a lossless compression format based on [RLE](https://en.wikipedia.org/wiki/Run-length_encoding).
+Its primary goal is to reduce the encoded size in the case of many individual bits, where RLE breaks down quickly,
+while keeping the same level of compression for large sets of contiguous bits.
+
+In tests it has been shown to be more compact than RLE itself, as well as [Concise](https://arxiv.org/pdf/1004.0403.pdf) and [Roaring](https://roaringbitmap.org/).
+
+### Format
+
+The format consists of a header, followed by a series of blocks, of which there are three different types.
+
+The format can be expressed as the following [BNF](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form) grammar.
+
+```
+    <encoding> ::= <header> <blocks>
+      <header> ::= <version> <bit>
+     <version> ::= "00"
+      <blocks> ::= "" | <block_single> <blocks> | <block_short> <blocks> | <block_long> <blocks>
+<block_single> ::= "1"
+ <block_short> ::= "01" <bit> <bit> <bit> <bit>
+  <block_long> ::= "00" <unsigned_varint>
+         <bit> ::= "0" | "1"
+```
+
+An `<unsigned_varint>` is encoded as specified in the [unsigned-varint spec](https://github.com/multiformats/unsigned-varint).
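+
+As a rough illustration of the varint layout, the sketch below uses Go's standard `encoding/binary` package, which writes unsigned integers with 7 payload bits per byte and a continuation bit on every byte except the last. The helper name `encodeUvarint` is an assumption made for this example and is not part of the spec. How the resulting bytes are packed into the RLE+ bit stream is not covered here.
+
+```go
+package main
+
+import (
+	"encoding/binary"
+	"fmt"
+)
+
+// encodeUvarint is a hypothetical helper: it returns the unsigned varint
+// bytes for n, least significant 7-bit group first, with the high bit set
+// on every byte except the last.
+func encodeUvarint(n uint64) []byte {
+	buf := make([]byte, binary.MaxVarintLen64)
+	written := binary.PutUvarint(buf, n)
+	return buf[:written]
+}
+
+func main() {
+	// 300 needs two bytes; prints: 10101100 00000010
+	for _, b := range encodeUvarint(300) {
+		fmt.Printf("%08b ", b)
+	}
+	fmt.Println()
+}
+```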
+
+#### Header
+
+The header consists of the two-bit version followed by the very first bit of the bit vector to encode. This means the first bit is always
+the same for the encoded and non-encoded form.
+
+#### Blocks
+
+Each block represents how many consecutive bits of the current bit type there are. As runs of `0`s and `1`s alternate in a bit vector,
+the initial bit, which is stored in the header, is enough to determine whether a length is currently referencing
+a run of `0`s or a run of `1`s.
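+
+The idea of alternating runs can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `runLengths` and the use of `[]bool` for the bit vector are assumptions made for the example.
+
+```go
+package main
+
+import "fmt"
+
+// runLengths returns the first bit of the vector and the lengths of the
+// alternating runs that follow. The first bit alone determines which value
+// each run refers to, because consecutive runs always flip the bit.
+func runLengths(bits []bool) (first bool, runs []uint64) {
+	if len(bits) == 0 {
+		return false, nil
+	}
+	first = bits[0]
+	current := first
+	var n uint64
+	for _, b := range bits {
+		if b == current {
+			n++
+			continue
+		}
+		// The bit flipped: close the current run and start a new one.
+		runs = append(runs, n)
+		current = b
+		n = 1
+	}
+	runs = append(runs, n)
+	return first, runs
+}
+
+func main() {
+	// Bit vector 000110: three 0s, two 1s, one 0.
+	bits := []bool{false, false, false, true, true, false}
+	first, runs := runLengths(bits)
+	fmt.Println(first, runs) // false [3 2 1]
+}
+```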
+
+##### Block Single
+
+If the run length of the current bit is exactly `1`, it is encoded as a single set bit (`1`).
+
+##### Block Short
+
+If the run length is greater than `1` and less than `16`, it fits into four bits and is encoded as a short block.
+The length is encoded into 4 bits and prefixed with `01` to indicate a short block.
+
+##### Block Long
+
+If the run length is `16` or larger, it is encoded as an unsigned varint and then prefixed with `00` to indicate
+a long block.
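+
+The choice between the three block types can be sketched as a small selection function. The names `blockKind` and `chooseBlock` are illustrative assumptions. The sketch only decides which block applies and deliberately leaves out how the bits are laid out within bytes, since that is not specified in this section.
+
+```go
+package main
+
+import "fmt"
+
+// blockKind names the three block types from the grammar.
+type blockKind string
+
+const (
+	blockSingle blockKind = "single" // "1"
+	blockShort  blockKind = "short"  // "01" followed by a 4-bit length
+	blockLong   blockKind = "long"   // "00" followed by an unsigned varint
+)
+
+// chooseBlock picks the block type for one run length. Run lengths are
+// assumed to be at least 1, since an empty run cannot occur.
+func chooseBlock(runLength uint64) blockKind {
+	switch {
+	case runLength == 1:
+		return blockSingle
+	case runLength < 16:
+		return blockShort
+	default:
+		return blockLong
+	}
+}
+
+func main() {
+	for _, n := range []uint64{1, 2, 15, 16, 1000} {
+		fmt.Println(n, chooseBlock(n))
+	}
+}
+```
+
+Because each run length maps to exactly one block type, two encoders that follow these rules pick the same blocks for the same input.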
+
+
+> **Note:** The encoding is unique, so no matter which algorithm for encoding is used, it should produce
+> the same encoding, given the same input.