Due to the generic nature of our build system, we do not impose any restrictions on the type of artifacts that are generated by actions or that are inputs to actions. Thus, artifacts might be large executables, libraries, a compiled set of slides in PDF format, or even whole file-system images. Such artifacts result in large blobs, which need to be fetched from a remote CAS if they are to be inspected or used locally. Depending on the network connection, downloading the complete artifact might imply a significant waiting time and causes a lot of network traffic. This situation is especially painful in the case of short modification-inspection cycles: for each small modification of the sources, the complete artifact needs to be downloaded, even though possibly only a small fraction of the compiled binary artifact has changed.
Thus, we introduced a blob splitting API as a conservative extension of the original remote execution API, which allows splitting the binary data of a blob into chunks and then transmitting only the modified chunks instead of the whole blob data. This reduces traffic between the remote server and the local host and avoids unnecessary waiting times for users frequently working on large artifacts. The downside of this API is an increased storage consumption, since the binary data of a split artifact is stored twice.
We extended the existing remote execution API by introducing a new remote procedure call (rpc) to the ContentAddressableStorage service to split a blob into chunks, along with the respective request and response messages, and by adding a flag to the CacheCapabilities message to indicate whether blob splitting is supported by the server instance or not. The following code snippet depicts the required extensions to the remote execution protocol:
service ContentAddressableStorage {
  ...
  rpc SplitBlob(SplitBlobRequest) returns (SplitBlobResponse) {}
}

message SplitBlobRequest {
  string instance_name = 1;
  Digest blob_digest = 2;
}

message SplitBlobResponse {
  repeated Digest chunk_digests = 1;
}

message CacheCapabilities {
  ...
  bool blob_split_support = <free field number>;
}
The contract between the client and the server for this protocol extension is the following: if a client requests to split a blob at the remote execution server, the server answers with a list of chunk digests and promises that, first, all referenced chunks are available in its CAS to be downloaded and, second, the concatenation of the returned chunks in the given order results in the original blob the client has requested.
The client does not give any promise to the server; it is free not to use the blob splitting rpc. However, to benefit from the protocol extension (saving traffic), a meaningful client behavior is to request the chunk digests of a blob from the server, to fetch only those chunks that are missing in its local CAS, and to store the fetched chunks as well as the reconstructed original blob in its local CAS. If the client requests to split a blob, but the server does not support blob splitting, or if an error occurred during the request, the client falls back to fetching the entire blob.
When the server receives a request to split a blob via the SplitBlob rpc, it first checks whether the blob given by the blob_digest from the SplitBlobRequest message exists in its CAS. If not, the status code google.rpc.Code.NOT_FOUND is returned. Otherwise, it loads the blob data and splits it into chunks according to the implemented data chunking algorithm. As explained later, there are different implementation options for this algorithm, but they all must ensure one property: the concatenation of the chunks results in the original blob. After the blob is split, the server computes a digest for each chunk according to the configured digest function and stores each chunk that is not yet present in its CAS. If an error occurs while storing the chunks in the CAS due to storage shortage, the status code google.rpc.Code.RESOURCE_EXHAUSTED is returned. Otherwise, the chunk digests are collected in the chunk_digests field of a SplitBlobResponse message in the same order as the chunks occur within the binary blob data, and the message is sent back to the client with the status code google.rpc.Code.OK.
Note: Since the same data is basically stored twice for each blob, the storage consumption for the remote server is roughly doubled. Some savings occur when chunks are reused in more than one blob or even several times in the same blob.
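To illustrate the server-side control flow described above, the following Python sketch shows one possible structure of a SplitBlob handler. The helpers cas, chunk_blob, and compute_digest are placeholders for the server's actual CAS backend, chunking algorithm, and configured digest function, and the message classes are assumed to be generated from the proto extension above; none of these names are prescribed by the protocol.

import grpc  # status codes correspond to the google.rpc codes mentioned above

# SplitBlobRequest/SplitBlobResponse are assumed to be generated from the
# proto extension above; `cas`, `chunk_blob`, and `compute_digest` are
# hypothetical helpers for the CAS backend, the chunking algorithm, and the
# configured digest function.

def handle_split_blob(request, context, cas, chunk_blob, compute_digest):
    # 1. The requested blob must exist in the CAS; otherwise NOT_FOUND.
    data = cas.get(request.blob_digest)
    if data is None:
        context.abort(grpc.StatusCode.NOT_FOUND, "blob not found in CAS")

    # 2. Split the blob; the concatenation of the chunks must equal the blob.
    chunks = chunk_blob(data)
    assert b"".join(chunks) == data

    # 3. Store every chunk that is not yet present; report storage shortage.
    chunk_digests = []
    for chunk in chunks:
        digest = compute_digest(chunk)
        if not cas.contains(digest):
            try:
                cas.put(digest, chunk)
            except OSError:
                context.abort(grpc.StatusCode.RESOURCE_EXHAUSTED,
                              "cannot store chunk")
        chunk_digests.append(digest)

    # 4. Return the chunk digests in blob order (status google.rpc.Code.OK).
    return SplitBlobResponse(chunk_digests=chunk_digests)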
If a client wants to take advantage of blob splitting, it requests to split a blob into chunks at the remote side by calling the SplitBlob rpc from the ContentAddressableStorage service with a SplitBlobRequest message containing the blob_digest of the respective blob. If the status code returned by the SplitBlob rpc contains an error, the split operation failed and the client needs to fall back to fetching the entire blob. Otherwise, blob splitting was successful, and the remote server returns an ordered list of chunk_digests and guarantees that the chunks are available in its CAS and that the concatenation of all chunks in the given order results in the requested blob. The client checks which chunks are available locally, fetches only the locally missing chunks from the remote side via the BatchReadBlobs rpc or the ByteStream API, and assembles the requested blob by concatenating the binary data of its chunks. Finally, it stores the chunks as well as the resulting blob in its local CAS.
Note: Since the same data is basically stored twice for each blob, the local storage consumption is roughly doubled. Some savings occur when chunks are reused in more than one blob or even several times in the same blob.
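The client-side behavior described above can be sketched as follows. The code assumes generated gRPC stubs for the extended ContentAddressableStorage service; local_cas, fetch_blobs, and compute_digest are hypothetical helpers wrapping the local CAS, the BatchReadBlobs/ByteStream download path, and the configured digest function.

import grpc

def fetch_blob_via_splitting(cas_stub, local_cas, instance_name, blob_digest,
                             fetch_blobs, compute_digest):
    # Ask the server to split the blob; on any error the caller should fall
    # back to fetching the entire blob.
    try:
        response = cas_stub.SplitBlob(
            SplitBlobRequest(instance_name=instance_name,
                             blob_digest=blob_digest))
    except grpc.RpcError:
        return None  # signal fallback to a full fetch

    # Fetch only the chunks missing from the local CAS (e.g., via the
    # BatchReadBlobs rpc or the ByteStream API, wrapped here as fetch_blobs).
    missing = [d for d in response.chunk_digests if not local_cas.contains(d)]
    for digest, data in fetch_blobs(missing):
        local_cas.put(digest, data)

    # Reassemble the blob by concatenating the chunks in the given order,
    # verify it against the requested digest, and store it locally.
    blob = b"".join(local_cas.get(d) for d in response.chunk_digests)
    assert compute_digest(blob) == blob_digest
    local_cas.put(blob_digest, blob)
    return blob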
We aim to get this protocol extension into the official remote execution API mentioned above. Once this is done and field numbers are assigned, servers that have upgraded to the new API version inform upgraded clients whether they support blob splitting or not by setting the blob_split_support field in the CacheCapabilities message accordingly. A server only provides the aforementioned guarantee to clients once it has announced support for blob splitting via this field.
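Once a field number has been assigned, checking the capability is straightforward. The following sketch assumes generated stubs for the existing Capabilities service; blob_split_support is the proposed field and evaluates to its default value false when the server does not set it.

def server_supports_blob_splitting(capabilities_stub, instance_name):
    # GetCapabilities is part of the existing remote execution API; the
    # blob_split_support field is the proposed extension and defaults to
    # False when the server does not know or does not set it.
    response = capabilities_stub.GetCapabilities(
        GetCapabilitiesRequest(instance_name=instance_name))
    return response.cache_capabilities.blob_split_support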
An old client can communicate with a server implementing the new API without any modification. Nobody is forced to use the blob splitting functionality, and unknown fields are just ignored at the client side. A client implementing the new API can communicate with an old server by evaluating the blob_split_support field, which will be set to its default value false at the client side.
Until the protocol extension is officially accepted, the field number for the blob_split_support field is not known. In this case, early implementors can use a trial request to determine whether the remote server supports blob splitting or not. This allows implementing a prototypical client and server employing blob splitting without requiring the protocol extension to be officially accepted. It also allows new clients to communicate with old servers: they still behave correctly and do not need to be adapted. The client implementation needs to be rebuilt once a field number has been assigned.
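Such a trial request could look as follows. The sketch probes the SplitBlob rpc with the digest of a blob known to exist remotely (known_digest is a placeholder) and interprets an UNIMPLEMENTED status, which gRPC returns for unknown methods, as missing support.

import grpc

def probe_blob_split_support(cas_stub, instance_name, known_digest):
    # Issue a trial SplitBlob request for a blob known to exist remotely.
    try:
        cas_stub.SplitBlob(SplitBlobRequest(instance_name=instance_name,
                                            blob_digest=known_digest))
        return True
    except grpc.RpcError as error:
        if error.code() == grpc.StatusCode.UNIMPLEMENTED:
            return False  # the server does not implement the rpc
        raise  # other errors do not tell us anything about support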
As stated earlier, a blob split request from a client is answered by the server with the promise that the concatenation of the returned chunks results in the requested blob. While this property is guaranteed by the server, it does not say anything about the quality of the split result. It is desirable to generate chunks that are likely to be already known to the client, because then less data needs to be transferred.
A trivial implementation of the split operation would be to return a singleton list containing the original blob as its single chunk. While this implementation is correct and fast, it is not very useful, since it does not save any traffic.
A better approach is fixed-block chunking, where the data is split into chunks of fixed, equal size. While this approach typically results in more than one chunk and improves the probability of saving some traffic, it suffers from the data shifting problem: the insertion of a single character at the beginning of a data stream shifts the entire data while the chunk boundaries stay fixed. Thus, completely new chunks, unknown to the client, are created even though the data patterns are mostly similar, as the following example illustrates.
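The data shifting problem can be demonstrated with a minimal fixed-block chunker; the block size of four bytes is chosen only to keep the example small.

def fixed_block_chunks(data: bytes, block_size: int = 4):
    # Split the data into consecutive chunks of block_size bytes each.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

original = b"abcdefghijkl"
modified = b"Xabcdefghijkl"  # a single byte inserted at the front
print(fixed_block_chunks(original))  # [b'abcd', b'efgh', b'ijkl']
print(fixed_block_chunks(modified))  # [b'Xabc', b'defg', b'hijk', b'l']
# Not a single chunk of the modified blob matches a chunk of the original.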
A more intelligent way of splitting blobs is to derive the chunk boundaries from the content of the data, such that splitting a slightly modified blob results in mostly identical chunks and is insensitive to the data shifting problem of fixed-block chunking. Since the resulting chunk sizes typically differ, this approach is also called variable-block chunking; it is explained in the following section.
In variable-block chunking, the borders of the chunks are content-defined and shift with the data pattern. This can be achieved by computing hash values of data blocks of a specific size at every byte position in the data stream and matching those hash values against a predefined pattern. In case of a match, that byte position is declared a chunk boundary. This approach avoids the data shifting problem of fixed-block chunking and ensures that unmodified data more than a block size away from the changes will have the same boundaries. A possible pattern to be matched might be all zeroes in the lower n bits, which results in chunks with an average size of 2^n bytes.
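The following sketch implements this boundary condition directly: a boundary is declared wherever the hash of the preceding window has all zeroes in its lower n bits. For clarity, the window hash is recomputed at every byte position with CRC-32; a real implementation would use a rolling hash as discussed next, and the window size and n are arbitrary example values.

import zlib

def content_defined_chunks(data: bytes, window: int = 48, n: int = 12):
    # A boundary is declared wherever the hash of the `window` bytes ending
    # at the current position has all zeroes in its lower n bits; the
    # expected average chunk size is then about 2^n bytes.
    mask = (1 << n) - 1
    chunks, start = [], 0
    for i in range(window, len(data) + 1):
        if zlib.crc32(data[i - window:i]) & mask == 0:
            chunks.append(data[start:i])
            start = i
    if start < len(data) or not chunks:
        chunks.append(data[start:])  # the remainder forms the last chunk
    return chunks

# The concatenation property required by the protocol holds by construction:
# b"".join(content_defined_chunks(blob)) == blob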
A common technique to efficiently compute hash values of consecutive bytes at every byte position in the data stream is to use a rolling hash function. A rolling hash function allows cheaply computing the hash value of a block of bytes of a specific size at byte position i from the hash value of the block of the same size at byte position i-1. Different plain rolling hash functions are available for implementation, such as a polynomial rolling hash, the Rabin fingerprint, or a cyclic polynomial rolling hash (also called Buzhash). However, the fast rolling Gear hash algorithm (also called FastCDC) has been proven to be very compute efficient and faster than the other rolling hash algorithms specifically for content-based chunking of a data stream, while achieving deduplication ratios similar to those of the Rabin fingerprint.
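As an illustration of the rolling idea, the following Gear-style sketch updates the hash with one shift, one addition, and one table lookup per byte; older bytes drop out automatically once they have been shifted beyond the 64-bit hash width. The random table, the seed, and the boundary mask are placeholders and not the constants of the published Gear/FastCDC algorithms.

import random

random.seed(0)  # placeholder table; real implementations fix this table
GEAR = [random.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1

def gear_chunks(data: bytes, n: int = 12):
    # Declare a boundary when the lower n bits of the rolling hash are zero,
    # giving an expected average chunk size of about 2^n bytes.
    boundary_mask = (1 << n) - 1
    h, chunks, start = 0, [], 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64  # constant-time rolling update
        if h & boundary_mask == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0  # restart the hash for the next chunk
    if start < len(data) or not chunks:
        chunks.append(data[start:])  # the remainder forms the last chunk
    return chunks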