Blob Splitting Protocol

Introduction

Due to the generic nature of our build system, we do not impose any restrictions on the type of artifacts that are generated by actions or that are inputs to actions. Thus, artifacts might be large executables, libraries, a compiled set of slides in PDF format, or even whole file system images. Such artifacts result in large blobs, which need to be fetched from a remote CAS if they are to be inspected or used locally. Depending on the network connection, this can imply a significant waiting time until the complete artifact has been downloaded and it causes a lot of network traffic. This situation is especially painful in the case of short modification-inspection cycles: for each small modification of the sources, the complete artifact needs to be downloaded, even though possibly only a small fraction of the compiled binary artifact has changed.

Thus, we introduced a blob splitting API as a conservative extension of the original remote execution API, which allows splitting the binary data of a blob into chunks and then transmitting only the modified chunks instead of the whole blob data. This reduces traffic between the remote server and the local host and avoids unnecessary waiting times for users frequently working on large artifacts. The downside of this API is an increased storage consumption, since the binary data of a split artifact is stored twice.

Remote execution API extension

We extended the existing remote execution API by introducing a new remote procedure call (rpc) to the ContentAddressableStorage service to split a blob into chunks, along with the respective request and response messages, and by adding a flag to the CacheCapabilities message to indicate whether blob splitting is supported by the server instance. The following code snippet depicts the required extensions to the remote execution protocol:

service ContentAddressableStorage {
  ...
  rpc SplitBlob(SplitBlobRequest) returns (SplitBlobResponse) {}
}

message SplitBlobRequest {
  string instance_name = 1;
  Digest blob_digest = 2;
}

message SplitBlobResponse {
  repeated Digest chunk_digests = 1;
}

message CacheCapabilities {
  ...
  bool blob_split_support = <free field number>;
}

Contract between client and server

The contract between the client and the server for this protocol extension is the following: if a client requests to split a blob at the remote execution server, the server answers with a list of chunk digests and promises that, first, all referenced blobs are available in its CAS to be downloaded and, second, the concatenation of the returned blobs in the given order results in the original blob the client has requested.
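
As an illustration of this promise (not part of the protocol itself), a client could verify a split result with a check along the following lines; the Python sketch assumes SHA-256 as the configured digest function and uses hypothetical names:

import hashlib

def split_result_is_valid(blob_digest_hash: str, chunks: list[bytes]) -> bool:
    # Server promise: concatenating the returned chunks in the given order
    # must reproduce the blob identified by blob_digest_hash.
    return hashlib.sha256(b"".join(chunks)).hexdigest() == blob_digest_hash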

The client does not give any promise to the server; it is free not to use the blob splitting rpc. However, in order to benefit from the protocol extension (saving traffic), a meaningful behavior of the client is to request the chunk digests of a blob from the server, to fetch only those chunks that are missing in its local CAS, and to store the fetched chunks as well as the reconstructed original blob in its local CAS. If the client requests to split a blob, but the server does not support blob splitting or an error occurred during the request, the client falls back to fetching the entire blob.

Blob split procedure

Server side

When the server receives a request to split a blob via the SplitBlob rpc, it first checks whether the blob given by the blob_digest from the SplitBlobRequest message exists in its CAS. If not, the status code google.rpc.Code.NOT_FOUND is returned. Otherwise, it loads the blob data and splits it into chunks according to the implemented data chunking algorithm. As explained later, there are different implementation options for this algorithm, but they all must ensure one property: the concatenation of the chunks results in the original blob. After the blob is split, the server computes a digest for each chunk according to the configured digest function and stores each chunk that is not yet present in its CAS. If an error occurs while storing the chunks in the CAS due to storage shortage, the status code google.rpc.Code.RESOURCE_EXHAUSTED is returned. Otherwise, the chunk digests are collected in the chunk_digests field of a SplitBlobResponse message in the same order as the chunks occur within the binary blob data, and the message is sent back to the client with the status code google.rpc.Code.OK.
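
The following Python sketch outlines this server-side procedure; the cas interface, the chunk_data function, and the returned status strings are hypothetical stand-ins for an actual server implementation and the google.rpc status codes:

import hashlib

def handle_split_blob(cas, blob_digest, chunk_data):
    # The requested blob must already exist in the server's CAS.
    if not cas.contains(blob_digest):
        return ("NOT_FOUND", [])
    data = cas.get(blob_digest)
    chunk_digests = []
    # chunk_data implements the data chunking algorithm; the concatenation
    # of its results must equal the original data.
    for chunk in chunk_data(data):
        digest = hashlib.sha256(chunk).hexdigest()
        try:
            if not cas.contains(digest):
                cas.put(digest, chunk)
        except OSError:
            # Storing a chunk failed due to storage shortage.
            return ("RESOURCE_EXHAUSTED", [])
        chunk_digests.append(digest)  # keep the original chunk order
    return ("OK", chunk_digests)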

Note: Since the same data is basically stored twice for each blob, the storage consumption for the remote server is roughly doubled. Some savings occur when chunks are reused in more than one blob or even several times in the same blob.

Client side

If a client wants to take advantage of blob splitting, it requests to split a blob into chunks at the remote side by calling the SplitBlob rpc from the ContentAddressableStorage service with a SplitBlobRequest message containing the blob_digest of the respective blob. If the status code returned by the SplitBlob rpc indicates an error, the split operation failed and the client needs to fall back to fetching the entire blob. Otherwise, blob splitting was successful, and the remote server returns an ordered list of chunk_digests, guaranteeing that the chunks are available in its CAS and that the concatenation of all chunks in the given order results in the requested blob. The client checks which chunks are available locally, fetches only the locally missing chunks from the remote side via the BatchReadBlobs rpc or the ByteStream API, and assembles the requested blob by concatenating the binary data of its chunks. Finally, it stores the chunks as well as the resulting blob in its local CAS.
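
A minimal sketch of this client-side flow, assuming hypothetical remote and local_cas interfaces (the real transfers would go through the SplitBlob, BatchReadBlobs, or ByteStream rpcs):

def fetch_blob_via_splitting(remote, local_cas, blob_digest):
    status, chunk_digests = remote.split_blob(blob_digest)
    if status != "OK":
        # Split failed or is unsupported: fall back to fetching the whole blob.
        data = remote.read_blob(blob_digest)
        local_cas.put(blob_digest, data)
        return data
    # Fetch only the chunks that are missing in the local CAS.
    missing = [d for d in chunk_digests if not local_cas.contains(d)]
    for digest, chunk in remote.batch_read_blobs(missing):
        local_cas.put(digest, chunk)
    # Assemble the blob by concatenating the chunk data in the given order
    # and store it locally as well.
    data = b"".join(local_cas.get(d) for d in chunk_digests)
    local_cas.put(blob_digest, data)
    return data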

Note: Since the same data is basically stored twice for each blob, the local storage consumption is roughly doubled. Some savings occur when chunks are reused in more than one blob or even several times in the same blob.

Compatibility issues

We aim to get this protocol extension into the official remote execution API mentioned above. Once this is done and field numbers are assigned, servers that have upgraded to the new API version inform clients that have upgraded to the new API version whether they support blob splitting or not by setting the blob_split_support field in the CacheCapabilities message accordingly. A server only provides the aforementioned guarantee to clients once it has announced to support blob splitting using this field.

An old client can communicate with a server implementing the new API without any modification. Nobody is forced to use the blob splitting functionality, and unknown fields are simply ignored at the client side. A client implementing the new API can communicate with an old server by evaluating the blob_split_support field, which will be set to its default value false at the client side.

Until the protocol extension is officially accepted, the field number for the blob_split_support field is not known. In this case, early implementors can use a trial request to determine whether the remote server supports blob splitting. This allows implementing a prototypical client and server employing blob splitting without requiring the protocol extension to be officially accepted. It also allows new clients to communicate with old servers: they still behave correctly and do not need to be adapted. The client implementation needs to be rebuilt once a field number has been assigned.
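
One possible shape of such a trial request, sketched in Python with a hypothetical remote interface; the status strings stand in for google.rpc codes:

def probe_blob_split_support(remote, known_blob_digest):
    # Issue one SplitBlob request for a blob known to exist remotely.
    # An old server that does not implement the rpc answers with an
    # UNIMPLEMENTED-style error; in that case the client simply falls
    # back to fetching entire blobs.
    status, _ = remote.split_blob(known_blob_digest)
    return status == "OK"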

Data chunking algorithm

As stated earlier, a blob split request from a client is answered by the server with the promise that the concatenation of the returned chunks results in the requested blob. While this property is guaranteed by the server, it says nothing about the quality of the split result. It is desirable to generate chunks that are likely to be known by the client already, because then less data needs to be transferred.

A trivial implementation of the split operation would be to return a singleton list containing the original blob as its single chunk. While this implementation is correct and fast, it is not very useful, since it does not save any traffic.

A better approach is fixed-block chunking, where the data is split into chunks of fixed, equal size. While this approach typically results in more than one chunk and improves the probability of saving some traffic, it suffers from the data shifting problem: the insertion of a single character at the beginning of a data stream shifts the entire data while the chunk boundaries stay fixed. Thus, completely new chunks, unknown to the client, are created even though the data patterns are mostly similar.
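
The shifting problem can be made concrete with a small sketch (illustration only): splitting at fixed 1 KiB boundaries, prepending a single byte changes essentially every chunk of an otherwise identical blob.

def fixed_block_chunks(data: bytes, block_size: int = 1024) -> list[bytes]:
    # Chunk boundaries at fixed multiples of block_size.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

original = bytes(range(256)) * 32        # 8 KiB of varying data
modified = b"\x00" + original            # one byte inserted at the front
# All boundaries shift by one byte, so the later chunks no longer match.
assert fixed_block_chunks(original)[1] != fixed_block_chunks(modified)[1]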

A more intelligent way of splitting blobs is to select chunk boundaries based on locations in the content of the data, such that splitting a slightly modified blob results in mostly identical chunks and is insensitive to the data shifting problem of fixed-block chunking. Since the resulting chunk sizes typically differ, this approach is also called variable-block chunking and is explained in the following section.

Variable-block chunking

In variable-block chunking, the chunk boundaries are content defined and shift with the data pattern. This can be achieved by computing hash values of data blocks of a specific size at every byte position in the data stream and matching those hash values against a predefined pattern. In case of a match, that byte position is declared a chunk boundary. This approach avoids the data shifting problem of fixed-block chunking and ensures that unmodified data more than a block size away from the changes will have the same boundaries. A possible pattern to be matched is all zeroes in the lower n bits, which results in chunks with an average size of 2^n bytes.
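
The following Python sketch (illustration only, using a Gear-style rolling hash as discussed in the next section) declares a boundary wherever the hash has all zeroes in its lower 13 bits, giving chunks of roughly 2^13 bytes on average; the concatenation property required by the protocol holds by construction:

import random

random.seed(0)                                      # fixed table of 64-bit values per byte
GEAR = [random.getrandbits(64) for _ in range(256)]
MASK = (1 << 13) - 1                                # n = 13 lower bits, ~8 KiB average chunks

def variable_block_chunks(data: bytes) -> list[bytes]:
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFFFFFFFFFF
        if h & MASK == 0:                           # pattern matched: boundary after byte i
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])                 # remainder after the last boundary
    return chunks

data = bytes(random.getrandbits(8) for _ in range(1 << 16))
assert b"".join(variable_block_chunks(data)) == data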

Rolling hashes

A common technique to efficiently compute hash values of consecutive bytes at every byte position in the data stream is to use a rolling hash function. A rolling hash function allows cheaply computing the hash value of a chunk of bytes of a specific size at byte position i from the hash value of the data chunk of the same size at byte position i-1. Different plain rolling hash functions are available for implementation, such as a polynomial rolling hash, the Rabin fingerprint, or a cyclic polynomial rolling hash (also called Buzhash). However, the fast rolling Gear hash, as used in the FastCDC chunking algorithm, has been shown to be very compute efficient and faster than the other rolling hash algorithms for content-based chunking of a data stream, while achieving deduplication ratios similar to the Rabin fingerprint.
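
As an illustration of the rolling property (a sketch of a plain polynomial rolling hash, not tied to any particular implementation), the following Python code computes the hash of every 64-byte block in O(1) per position by removing the byte that leaves the window and adding the one that enters; hashing each block from scratch yields the same values:

WINDOW, BASE, MOD = 64, 257, (1 << 61) - 1

def block_hash(block: bytes) -> int:
    # Hash of one block, computed from scratch.
    h = 0
    for b in block:
        h = (h * BASE + b) % MOD
    return h

def rolling_hashes(data: bytes):
    # Hash of the WINDOW-byte block ending at every byte position,
    # each derived in O(1) from the previous position's hash.
    h = block_hash(data[:WINDOW])
    yield h
    top = pow(BASE, WINDOW - 1, MOD)     # weight of the byte leaving the window
    for i in range(WINDOW, len(data)):
        h = ((h - data[i - WINDOW] * top) * BASE + data[i]) % MOD
        yield h

data = bytes(range(256)) * 4
hashes = list(rolling_hashes(data))
assert hashes[10] == block_hash(data[10:10 + WINDOW])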