RFC|BB|L2-03A: Use of compression and adjustable block size

Abstract

This RFC proposes the exploration of using compression in block transmission. The techniques range from:

  • Block-by-block standard compression (e.g. gzip).
  • Whole-transfer compression (e.g. when responding to a Graphsync query, send all the blocks compressed).
  • Custom coding tables for byte sequences that appear often (e.g. generate a Huffman table for all the protobuf headings so that these are compressed by default, like HPACK does for HTTP).

Additionally, to optimize the use of these schemes, a system of adjustable block sizes and coding strategies in transmission could be devised (e.g. dynamic Huffman tables).

Shortcomings

Blocks in IPFS are exchanged without the use of compression. This is a huge missed opportunity to minimize the bandwidth footprint and latency of transferring a file. For context, even minimal web assets are transmitted compressed over HTTP to improve website loading performance, and most of them are below 256KiB, which is IPFS's default block size. We expect to see significant gains in transmission times.

Description

Current implementations of file-sharing protocols may benefit from the use of on-the-fly compression to optimize the use of bandwidth and the transmission of content. Moreover, when using the Graphsync approach to content discovery, where we ask peers how fully they can satisfy an IPLD selector, we can request that all the blocks matching the selector be compressed into a single package and forwarded to the requestor.

Some of the compression approaches to be explored in this RFC are:

  • Block-by-block standard compression (e.g. gzip): every block (and optionally every single Bitswap message) is compressed, drawing inspiration from web compression.
  • Whole-transfer compression: all the blocks requested by a peer in a wantlist or a Graphsync IPLD selector are compressed into the same package.
  • Custom coding tables for byte sequences that appear often (e.g. generate a Huffman table for all the protobuf headings so that these are compressed by default, like HPACK does for HTTP).
  • Use of "compressed caches", so that when a specific piece of content has been identified as "regularly exchanged", it can be retrieved from the cache instead of having to be compressed again. This scheme may not be trivial.
  • Use of different compression algorithms.
  • Use of different block sizes before compression.

Implementation plan

  • Perform a simple test to evaluate the benefits of "on-the-fly" compression on blocks (to determine whether IPFS could benefit from directly exchanging compressed messages and blocks). Evaluate different compression algorithms used on the web.

    • Evaluate the compression of full Bitswap messages (bs.compressionStrategy = "full"): to achieve this, we add a compression flag to Bitswap messages so that compressed messages can be identified. If compression is enabled, the message is compressed before it is sent. Compressed messages are identified in newMessageFromProto by receiving peers, where they are decompressed and then processed seamlessly by Bitswap. To open the door to the use of different compression algorithms and different full-message compression strategies, a compressionType field has been added to message.proto.

    • Evaluate the compression of blocks only (bs.compressionStrategy = "blocks"): we compress each block before adding it to the protobuf message in the ToProtoV1 function of message.go, and decompress it in newMessageFromProto. When compressing blocks, only the RawData is changed for transmission; the CID is kept unchanged so the block remains conveniently identified.

    • Use GZip as the default compression algorithm (engine.compressor = "Gzip").

    • Instead of compressing fields of the protobuf message, evaluate the compression of the full stream in the bitswap network.

      • We may choose to use a multicodec to signal that a stream is compressed. Furthermore, instead of prefixing messages with their size, we could leverage streams by combining the multicodec with KeepReading and EndOfStream signals in protocol streams, so there is no need to know the size of the compressed message beforehand.
    • Evaluate other compression algorithms (Brotli and gzip are the strongest candidates, but other algorithms may also be worth testing).

    • Compare the computational footprint of the above implementations.

  • Design and evaluate a simple scheme to "gather" WANT requests and create fully compiled/network-coded responses for the requestor. This involves several tasks:

    • A way of matching several blocks with a request (graphsync, wantlist).

    • Perform the compression operation. This may be computationally expensive and take some time; depending on the data, by the time the prepared response is ready to send, the requestor may already have received all the blocks that make up the desired content.

  • Evaluate how the use of different block sizes and compression may benefit performance.

  • Include a "compression parameter" in exchange requests to signal peers that you want blocks to be transmitted using a specific compression technique.

  • From the preliminary tests performed after the above implementations, evaluate if the protocol could benefit from the use of Huffman tables and compressed caches.

Implementation details

  • Block compression: files within Bitswap are exchanged in the form of blocks. Files are composed of several blocks organized in a DAG structure (with each block having a size limit of 256KiB). In this compression approach, we compress blocks before including them in a message and transmitting them to the network.

  • Full message compression: in this compression strategy, instead of only compressing blocks, we compress every single message before sending it. It is the equivalent of compressing header+body in HTTP.

  • Stream compression: It uses compression at a stream level, so every byte that enters a stream from the node to other peers is compressed (i.e. using a compressed writer).

  • To drive the compression idea even further, we prototyped a compression transport in libp2p (between the muxer and the security layer) so that every stream running over a libp2p node can potentially benefit from the use of compression. This is a non-breaking change, as the transport-upgrader has also been updated to enable compression negotiation (so eventually anyone can bring their own compression algorithm and embed it into libp2p seamlessly). Some repositories are available to get started with compression in libp2p.

See a discussion on the results here.

Impact

A reduction in latency and bandwidth thanks to compressed transmissions, at the cost of a potential increase in computational overhead.

Evaluation Plan

  • The IPFS File Transfer benchmarks.

  • See the computational footprint of different compression strategies and algorithms.

  • Compare the data sent and received using compression and with baseline Bitswap.

Prior Work

This RFC takes inspiration from:

Results

The results for the implementation of this RFC were reported here: https://research.protocol.ai/blog/2020/honey-i-shrunk-our-libp2p-streams/

Future Work

  • If the use of exchange requests and a negotiation phase for content transmission (RFC | BB | L1/2-01) is implemented, then once a specific peer (or a group of peers) has been identified as storing a large share of the desired blocks, it makes sense to request more advanced compression and network coding techniques for the transmission.

  • Detect the type of data being exchanged in blocks and apply the most suitable compression for the data type, such as image-specific compression if images are being exchanged (for this approach, a node will need to have all the blocks for the data).
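Such data-type detection could be sketched with Go's standard content sniffing, choosing whether compression is worthwhile. The strategy mapping below is a hypothetical example; already-compressed media formats gain little from a second general-purpose pass.

```go
package main

import (
	"fmt"
	"net/http"
)

// chooseCompression sniffs the content type of assembled block data and
// picks a strategy: pre-compressed media is sent as-is, while everything
// else is gzipped. The mapping is illustrative, not an exhaustive policy.
func chooseCompression(data []byte) string {
	ctype := http.DetectContentType(data)
	switch ctype {
	case "image/jpeg", "image/png", "video/mp4":
		return "none" // these formats are typically already compressed
	default:
		return "gzip"
	}
}

func main() {
	// Magic bytes of a JPEG file header.
	jpegHeader := []byte{0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10, 'J', 'F', 'I', 'F'}
	fmt.Println(chooseCompression(jpegHeader))           // → none
	fmt.Println(chooseCompression([]byte("plain text"))) // → gzip
}
```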