26553: storage: return byte batches in ScanResponse r=jordanlewis a=jordanlewis

Previously, the MVCCScan API call completely deserialized the batch it got from C++ into a slice of roachpb.KeyValue, and sent that slice of roachpb.KeyValue over gRPC via ScanResponse. This is needlessly expensive for several reasons.

- gRPC must re-serialize what we sent it to a flat byte stream. But we already had a flat byte stream to begin with, before inflating it into KeyValues. In effect, we were doing pointless deserialization and reserialization.
- We needed to dynamically allocate a slice of roachpb.KeyValue on every scan request, in buildScanResponse. This was the second largest cause of allocations in our system, besides the first copy from C++ to Go. It's pointless, since we're just going to throw that slice away again, either when we serialize to the network or when we iterate over it and inflate the KeyValues into rows later down the pipe.

Now, MVCCScan can optionally skip this inflation and return the raw write batch that it got from C++. The txnKVFetcher and rowFetcher are modified to use this option. They now deserialize keys from the write batch as necessary.

This results in a large decrease in the number of allocations performed per scan. When going over the network, only 1 object has to be marshalled and unmarshalled (the batch) instead of one per returned key. Also, we don't have to allocate the initial slice of []KeyValue, or any of the slices within Key or Value, to return data.

I haven't delved into modifying the relevant unit tests yet, but logic tests pass and I've been playing around with the resultant binary for some performance testing. I don't see much of a concrete performance change, but pprof reports reduced allocations as I'd expect.

Release note: None

Co-authored-by: Jordan Lewis <[email protected]>
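The core idea above, returning a flat byte batch and decoding key/value pairs lazily at the consumer, can be sketched as follows. This is a minimal illustration, not CockroachDB's actual code: the real batch uses the RocksDB WriteBatch wire format, while this sketch assumes a hypothetical uvarint length-prefixed encoding. The payoff it demonstrates is the same: the iterator hands out subslices of the one flat buffer instead of allocating a `[]KeyValue` up front.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendKV appends a key/value pair to a flat byte batch using
// uvarint length prefixes. (Hypothetical encoding for illustration;
// the real batch is a RocksDB WriteBatch representation.)
func appendKV(batch, key, val []byte) []byte {
	batch = binary.AppendUvarint(batch, uint64(len(key)))
	batch = append(batch, key...)
	batch = binary.AppendUvarint(batch, uint64(len(val)))
	batch = append(batch, val...)
	return batch
}

// nextKV decodes the next pair in place. The returned key and val are
// subslices of batch, so no per-pair allocation happens; rest is the
// remainder of the batch for the next call.
func nextKV(batch []byte) (key, val, rest []byte) {
	kl, n := binary.Uvarint(batch)
	key, batch = batch[n:n+int(kl)], batch[n+int(kl):]
	vl, n := binary.Uvarint(batch)
	val, rest = batch[n:n+int(vl)], batch[n+int(vl):]
	return key, val, rest
}

func main() {
	// Producer side: one flat buffer crosses the API boundary.
	var batch []byte
	batch = appendKV(batch, []byte("a"), []byte("1"))
	batch = appendKV(batch, []byte("b"), []byte("2"))

	// Consumer side (as the fetchers do): decode keys only as needed.
	for len(batch) > 0 {
		var k, v []byte
		k, v, batch = nextKV(batch)
		fmt.Printf("%s=%s\n", k, v)
	}
}
```

Because gRPC then marshals a single byte field rather than a repeated message, the serialize/deserialize round trip over an already-flat buffer described above disappears.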