sql,storage: add support for COL_BATCH_RESPONSE scan format
This commit introduces a new `COL_BATCH_RESPONSE` scan format for Scans
and ReverseScans which results in only the needed columns being returned
from the KV server. In other words, this commit introduces the ability
to perform KV projection pushdown.

The main idea of this feature is to use the injected decoding logic from
SQL in order to process each KV and keep only the needed parts (i.e.
necessary SQL columns). Those needed parts are then propagated back to
the KV client as coldata.Batch'es (serialized in the Apache Arrow format).

Here is the outline of all components involved:

     ┌────────────────────────────────────────────────┐
     │                       SQL                      │
     │________________________________________________│
     │          colfetcher.ColBatchDirectScan         │
     │                        │                       │
     │                        ▼                       │
     │                 row.txnKVFetcher               │
     │    (behind the row.KVBatchFetcher interface)   │
     └────────────────────────────────────────────────┘
                              │
                              ▼
     ┌────────────────────────────────────────────────┐
     │                    KV Client                   │
     └────────────────────────────────────────────────┘
                              │
                              ▼
     ┌────────────────────────────────────────────────┐
     │                    KV Server                   │
     │________________________________________________│
     │           colfetcher.cFetcherWrapper           │
     │ (behind the storage.CFetcherWrapper interface) │
     │                        │                       │
     │                        ▼                       │
     │              colfetcher.cFetcher               │
     │                        │                       │
     │                        ▼                       │
     │          storage.mvccScanFetchAdapter ────────┐│
     │    (behind the storage.NextKVer interface)    ││
     │                        │                      ││
     │                        ▼                      ││
     │           storage.pebbleMVCCScanner           ││
     │  (which puts KVs into storage.singleResults) <┘│
     └────────────────────────────────────────────────┘

On the KV client side, `row.txnKVFetcher` issues Scans and ReverseScans with
the `COL_BATCH_RESPONSE` format and returns the response (which contains the
columnar data) to the `colfetcher.ColBatchDirectScan`.
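
In terms of the KV API, the new request shape looks roughly like the
following sketch. The `ScanFormat` and `IndexFetchSpec` fields match the
`api.proto` changes in this commit, while the helper itself is
hypothetical scaffolding standing in for `row.txnKVFetcher`:

```
import (
	"context"

	"github.com/cockroachdb/cockroach/pkg/kv"
	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/sql/catalog/fetchpb"
)

// scanToCols is a hypothetical helper that issues a single Scan in the
// COL_BATCH_RESPONSE format (a sketch, not the actual fetcher code).
func scanToCols(
	ctx context.Context,
	txn *kv.Txn,
	spec *fetchpb.IndexFetchSpec,
	key, endKey roachpb.Key,
) ([][]byte, error) {
	ba := &roachpb.BatchRequest{}
	// A single "global" IndexFetchSpec on the BatchRequest header tells
	// the server which SQL columns to decode and keep.
	ba.IndexFetchSpec = spec
	ba.Add(&roachpb.ScanRequest{
		RequestHeader: roachpb.RequestHeader{Key: key, EndKey: endKey},
		ScanFormat:    roachpb.COL_BATCH_RESPONSE,
	})
	br, pErr := txn.Send(ctx, ba)
	if pErr != nil {
		return nil, pErr.GoError()
	}
	// Each element is one coldata.Batch serialized in the Apache Arrow
	// format.
	return br.Responses[0].GetInner().(*roachpb.ScanResponse).BatchResponses, nil
}
```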

On the KV server side, we create a `storage.CFetcherWrapper` that asks the
`colfetcher.cFetcher` for the next `coldata.Batch`. The `cFetcher`, in turn,
fetches the next KV, decodes it, and keeps only values for the needed SQL
columns, discarding the rest of the KV. The KV is emitted by the
`mvccScanFetchAdapter`, which exposes access (via the `singleResults`
struct) to the current KV that the `pebbleMVCCScanner` is pointing at.
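
The two server-side interfaces can be sketched as follows; the method
signatures are assumptions reconstructed from the names above rather
than the exact definitions:

```
// NextKVer is implemented by storage.mvccScanFetchAdapter and consumed
// by the cFetcher (assumed shape).
type NextKVer interface {
	// NextKV returns the next KV to decode. partialRow indicates that
	// the last SQL row in the result was cut short, so the cFetcher must
	// remove it from the in-progress coldata.Batch.
	NextKV(ctx context.Context) (ok bool, kv roachpb.KeyValue, partialRow bool, err error)
}

// CFetcherWrapper is implemented by colfetcher.cFetcherWrapper and
// consumed by the storage layer (assumed shape).
type CFetcherWrapper interface {
	// NextBatch returns the next coldata.Batch serialized in the Apache
	// Arrow format, or a zero-length result once the scan is exhausted.
	NextBatch(ctx context.Context) ([]byte, error)
}
```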

Note that there is an additional "implicit synchronization" between
components that is not shown on this diagram. In particular,
`storage.singleResults.maybeTrimPartialLastRow` must be kept in sync with
the `colfetcher.cFetcher`, which is achieved by:
- the `cFetcher` exposing access to the first key of the last incomplete SQL
  row via the `FirstKeyOfRowGetter`,
- the `singleResults` using that key as the resume key for the response,
- and the `cFetcher` removing that last partial SQL row when `NextKV()`
  returns `partialRow=true`.
This "upstream" link (although breaking the layering a bit) allows us to
avoid a performance penalty for handling the case with multiple column
families. (This case is handled by the `storage.pebbleResults` via tracking
offsets into the `pebbleResults.repr`.)
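
A minimal sketch of that handshake, with assumed names and signatures:

```
// FirstKeyOfRowGetter is provided by the cFetcher so that singleResults
// can trim the response to a SQL row boundary (assumed shape).
type FirstKeyOfRowGetter func() roachpb.Key

// maybeTrimPartialLastRow picks the resume key when the scan stops.
func (s *singleResults) maybeTrimPartialLastRow(stopKey roachpb.Key) roachpb.Key {
	if firstKey := s.firstKeyOfRowGetter(); firstKey != nil {
		// The last SQL row is incomplete: resume from its first key so
		// that the next request re-reads the whole row. The cFetcher
		// discards the partial row when NextKV() returns partialRow=true.
		return firstKey
	}
	return stopKey
}
```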

This code structure deserves some elaboration. First, there is a mismatch
between the "push" mode in which the `pebbleMVCCScanner` operates and the
"pull" mode that the `NextKVer` exposes. The adaptation between the two
modes is achieved via the `mvccScanFetchAdapter` grabbing (when the control
returns to it) the current unstable KV pair from the `singleResults` struct,
which serves as a one-KV-pair buffer that the `pebbleMVCCScanner` `put`s
into. Second, in order to be able to use the unstable KV pair without
performing a copy, the `pebbleMVCCScanner` stops at the current KV pair and
returns the control flow (which is exactly what `pebbleMVCCScanner.getOne`
does) back to the `mvccScanFetchAdapter`, with the adapter advancing the
scanner only when the next KV pair is needed.
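
A minimal sketch of that adaptation, with assumed names for the buffer
accessor and for the scanner's return values:

```
// NextKV advances the push-based scanner by exactly one KV and hands
// out the KV pair buffered in singleResults (a sketch, not the actual
// signatures).
func (f *mvccScanFetchAdapter) NextKV(
	ctx context.Context,
) (ok bool, kv roachpb.KeyValue, partialRow bool, err error) {
	// getOne stops right after the scanner puts the current unstable KV
	// pair into the singleResults buffer and returns control to us.
	ok, err = f.scanner.getOne(ctx)
	if !ok || err != nil {
		return false, roachpb.KeyValue{}, false, err
	}
	// Use the buffered KV without copying; it remains valid only until
	// the scanner is advanced again.
	return true, f.results.currentKV(), false, nil
}
```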

There are multiple scenarios which are currently not supported:
- SQL cannot issue Get requests (likely will support in 23.1)
- `TraceKV` option is not supported (likely will support in 23.1)
- user-defined types other than enums are not supported (will _not_
support in 23.1)
- non-default key locking strength as well as SKIP LOCKED wait policy
are not supported (will _not_ support in 23.1).

This feature is currently disabled by default, but I intend to enable it
by default for multi-tenant setups. The rationale is that there is
currently a large performance hit when enabling it for single-tenant
deployments, whereas it offers a significant speedup in the
multi-tenant world.
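
For experimentation, the feature can be toggled via the cluster setting
that the roachtest change below exercises; a minimal sketch using
database/sql:

```
// enableDirectScans flips the setting that gates this feature. The
// setting name comes from the roachtest change in this commit.
func enableDirectScans(db *sql.DB) error {
	_, err := db.Exec(
		"SET CLUSTER SETTING sql.distsql.direct_columnar_scans.enabled = true",
	)
	return err
}
```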

The microbenchmarks [show](https://gist.github.com/yuzefovich/669c295a8a4fdffa6490532284c5a719)
the expected improvement in multi-tenant setups when the tenant runs in
a separate process whenever we don't need to decode all of the columns
from the table.

The TPCH numbers, though, don't show the expected speedup:
```
Q1:	before: 11.47s	after: 8.84s	 -22.89%
Q2:	before: 0.41s	after: 0.29s	 -27.71%
Q3:	before: 7.89s	after: 9.68s	 22.63%
Q4:	before: 4.48s	after: 4.52s	 0.86%
Q5:	before: 10.39s	after: 10.35s	 -0.29%
Q6:	before: 33.57s	after: 33.41s	 -0.48%
Q7:	before: 23.82s	after: 23.81s	 -0.02%
Q8:	before: 3.78s	after: 3.76s	 -0.68%
Q9:	before: 28.15s	after: 28.03s	 -0.42%
Q10:	before: 5.00s	after: 4.98s	 -0.42%
Q11:	before: 2.44s	after: 2.44s	 0.22%
Q12:	before: 34.78s	after: 34.65s	 -0.37%
Q13:	before: 3.20s	after: 2.94s	 -8.28%
Q14:	before: 3.13s	after: 3.21s	 2.43%
Q15:	before: 16.80s	after: 16.73s	 -0.38%
Q16:	before: 1.60s	after: 1.65s	 2.96%
Q17:	before: 0.85s	after: 0.96s	 13.04%
Q18:	before: 16.39s	after: 15.47s	 -5.61%
Q19:	before: 13.76s	after: 13.01s	 -5.45%
Q20:	before: 55.33s	after: 55.12s	 -0.38%
Q21:	before: 24.31s	after: 24.31s	 -0.00%
Q22:	before: 1.28s	after: 1.41s	 10.26%
```
Note that much better numbers were obtained on a buggy prototype. Those
numbers are invalid, and more details can be found
[here](cockroachdb#94438 (comment)).

At the moment, the `coldata.Batch` that is included in the response is
always serialized into the Arrow format, but I intend to introduce a
local fastpath to avoid that serialization. That work will be done in
a follow-up and should reduce the perf hit for single-tenant
deployments.

A quick note on the TODOs sprinkled in this commit:
- `TODO(yuzefovich)` means that this will be left for 23.2 or later.
- `TODO(yuzefovich, 23.1)` means that it should be addressed in 23.1.

A quick note on testing: this commit randomizes whether the new
infrastructure is used in almost all test builds. Introducing some unit
testing (say, in the `storage` package) seems rather annoying since we
must create keys that are valid SQL keys (i.e. have a TableID / Index ID
prefix) and that come with the corresponding `fetchpb.IndexFetchSpec`.
Not having unit tests in the `storage` package seems ok to me given that
the "meat" of the work there is still done by the `pebbleMVCCScanner`,
which is exercised by the regular Scans. End-to-end testing is well
covered by all of our existing tests, which now run with the new code
path chosen at random. I ran CI multiple times with the new feature
enabled by default without failures, so I hope it won't become flaky.

Release note: None
yuzefovich committed Jan 26, 2023
1 parent 6fc1022 commit d153ece
Showing 58 changed files with 1,791 additions and 417 deletions.
2 changes: 1 addition & 1 deletion docs/generated/settings/settings-for-tenants.txt
@@ -297,4 +297,4 @@ trace.jaeger.agent string the address of a Jaeger agent to receive traces using
trace.opentelemetry.collector string address of an OpenTelemetry trace collector to receive traces using the otel gRPC protocol, as <host>:<port>. If no port is specified, 4317 will be used.
trace.span_registry.enabled boolean true if set, ongoing traces can be seen at https://<ui>/#/debug/tracez
trace.zipkin.collector string the address of a Zipkin instance to receive traces, as <host>:<port>. If no port is specified, 9411 will be used.
version version 1000022.2-32 set the active cluster version in the format '<major>.<minor>'
version version 1000022.2-34 set the active cluster version in the format '<major>.<minor>'
2 changes: 1 addition & 1 deletion docs/generated/settings/settings.html
@@ -235,6 +235,6 @@
<tr><td><div id="setting-trace-opentelemetry-collector" class="anchored"><code>trace.opentelemetry.collector</code></div></td><td>string</td><td><code></code></td><td>address of an OpenTelemetry trace collector to receive traces using the otel gRPC protocol, as &lt;host&gt;:&lt;port&gt;. If no port is specified, 4317 will be used.</td></tr>
<tr><td><div id="setting-trace-span-registry-enabled" class="anchored"><code>trace.span_registry.enabled</code></div></td><td>boolean</td><td><code>true</code></td><td>if set, ongoing traces can be seen at https://&lt;ui&gt;/#/debug/tracez</td></tr>
<tr><td><div id="setting-trace-zipkin-collector" class="anchored"><code>trace.zipkin.collector</code></div></td><td>string</td><td><code></code></td><td>the address of a Zipkin instance to receive traces, as &lt;host&gt;:&lt;port&gt;. If no port is specified, 9411 will be used.</td></tr>
<tr><td><div id="setting-version" class="anchored"><code>version</code></div></td><td>version</td><td><code>1000022.2-32</code></td><td>set the active cluster version in the format &#39;&lt;major&gt;.&lt;minor&gt;&#39;</td></tr>
<tr><td><div id="setting-version" class="anchored"><code>version</code></div></td><td>version</td><td><code>1000022.2-34</code></td><td>set the active cluster version in the format &#39;&lt;major&gt;.&lt;minor&gt;&#39;</td></tr>
</tbody>
</table>

3 changes: 3 additions & 0 deletions pkg/ccl/logictestccl/tests/3node-tenant/generated_test.go

9 changes: 9 additions & 0 deletions pkg/clusterversion/cockroach_versions.go
@@ -404,6 +404,10 @@ const (
// V23_1KeyVisualizerTablesAndJobs adds the system tables that support the key visualizer.
V23_1KeyVisualizerTablesAndJobs

// V23_1_KVDirectColumnarScans introduces the support of the "direct"
// columnar scans in the KV layer.
V23_1_KVDirectColumnarScans

// *************************************************
// Step (1): Add new versions here.
// Do not add new versions to a patch release.
@@ -694,6 +698,11 @@ var rawVersionsSingleton = keyedVersions{
Key: V23_1KeyVisualizerTablesAndJobs,
Version: roachpb.Version{Major: 22, Minor: 2, Internal: 32},
},
{
Key: V23_1_KVDirectColumnarScans,
Version: roachpb.Version{Major: 22, Minor: 2, Internal: 34},
},

// *************************************************
// Step (2): Add new versions here.
// Do not add new versions to a patch release.
3 changes: 3 additions & 0 deletions pkg/cmd/generate-logictest/templates.go
@@ -74,6 +74,9 @@ func runExecBuildLogicTest(t *testing.T, file string) {
serverArgs := logictest.TestServerArgs{
DisableWorkmemRandomization: true,{{ if .ForceProductionValues }}
ForceProductionValues: true,{{end}}
// Disable the direct scans in order to keep the output of EXPLAIN (VEC)
// deterministic.
DisableDirectColumnarScans: true,
}
logictest.RunLogicTest(t, serverArgs, configIdx, filepath.Join(execBuildLogicTestDir, file))
}
21 changes: 19 additions & 2 deletions pkg/cmd/roachtest/tests/multitenant_tpch.go
@@ -26,7 +26,9 @@ import (
// runMultiTenantTPCH runs TPCH queries on a cluster that is first used as a
// single-tenant deployment followed by a run of all queries in a multi-tenant
// deployment with a single SQL instance.
func runMultiTenantTPCH(ctx context.Context, t test.Test, c cluster.Cluster) {
func runMultiTenantTPCH(
ctx context.Context, t test.Test, c cluster.Cluster, enableDirectScans bool,
) {
c.Put(ctx, t.Cockroach(), "./cockroach", c.All())
c.Put(ctx, t.DeprecatedWorkload(), "./workload", c.Node(1))
c.Start(ctx, t.L(), option.DefaultStartOpts(), install.MakeClusterSettings(install.SecureOption(true)), c.All())
@@ -40,6 +42,11 @@ func runMultiTenantTPCH(ctx context.Context, t test.Test, c cluster.Cluster) {
// one at a time (using the given url as a parameter to the 'workload run'
// command). The runtimes are accumulated in the perf helper.
runTPCH := func(conn *gosql.DB, url string, setupIdx int) {
setting := fmt.Sprintf("SET CLUSTER SETTING sql.distsql.direct_columnar_scans.enabled = %t", enableDirectScans)
t.Status(setting)
if _, err := conn.Exec(setting); err != nil {
t.Fatal(err)
}
t.Status("restoring TPCH dataset for Scale Factor 1 in ", setupNames[setupIdx])
if err := loadTPCHDataset(
ctx, t, c, conn, 1 /* sf */, c.NewMonitor(ctx), c.All(), false, /* disableMergeQueue */
@@ -93,6 +100,16 @@ func registerMultiTenantTPCH(r registry.Registry) {
Name: "multitenant/tpch",
Owner: registry.OwnerSQLQueries,
Cluster: r.MakeClusterSpec(1 /* nodeCount */),
Run: runMultiTenantTPCH,
Run: func(ctx context.Context, t test.Test, c cluster.Cluster) {
runMultiTenantTPCH(ctx, t, c, false /* enableDirectScans */)
},
})
r.Add(registry.TestSpec{
Name: "multitenant/tpch_direct_scans",
Owner: registry.OwnerSQLQueries,
Cluster: r.MakeClusterSpec(1 /* nodeCount */),
Run: func(ctx context.Context, t test.Test, c cluster.Cluster) {
runMultiTenantTPCH(ctx, t, c, true /* enableDirectScans */)
},
})
}
8 changes: 8 additions & 0 deletions pkg/col/coldataext/extended_column_factory.go
@@ -32,6 +32,14 @@ func NewExtendedColumnFactory(evalCtx *eval.Context) coldata.ColumnFactory {
return &extendedColumnFactory{evalCtx: evalCtx}
}

// NewExtendedColumnFactoryNoEvalCtx returns an extendedColumnFactory that will
// be producing coldata.DatumVecs that aren't fully initialized - the eval
// context is not set on those vectors. This can be acceptable if the caller
// cannot provide the eval.Context but also doesn't intend to compare datums.
func NewExtendedColumnFactoryNoEvalCtx() coldata.ColumnFactory {
return &extendedColumnFactory{}
}

func (cf *extendedColumnFactory) MakeColumn(t *types.T, n int) coldata.Column {
if typeconv.TypeFamilyToCanonicalTypeFamily(t.Family()) == typeconv.DatumVecCanonicalTypeFamily {
return newDatumVec(t, n, cf.evalCtx)
2 changes: 2 additions & 0 deletions pkg/col/colserde/arrowbatchconverter.go
@@ -569,6 +569,8 @@ func getValueBytesAndOffsets(

// Release should be called once the converter is no longer needed so that its
// memory could be GCed.
// TODO(yuzefovich): consider renaming this to Close in order to not be confused
// with execreleasable.Releasable interface.
func (c *ArrowBatchConverter) Release(ctx context.Context) {
if c.acc != nil {
c.acc.Shrink(ctx, c.accountedFor)
9 changes: 9 additions & 0 deletions pkg/kv/kvserver/batcheval/cmd_reverse_scan.go
@@ -63,6 +63,15 @@ func ReverseScan(
return result.Result{}, err
}
reply.BatchResponses = scanRes.KVData
case roachpb.COL_BATCH_RESPONSE:
scanRes, err = storage.MVCCScanToCols(
ctx, reader, cArgs.Header.IndexFetchSpec, args.Key, args.EndKey,
h.Timestamp, opts, cArgs.EvalCtx.ClusterSettings(),
)
if err != nil {
return result.Result{}, err
}
reply.BatchResponses = scanRes.KVData
case roachpb.KEY_VALUES:
scanRes, err = storage.MVCCScan(
ctx, reader, args.Key, args.EndKey, h.Timestamp, opts)
9 changes: 9 additions & 0 deletions pkg/kv/kvserver/batcheval/cmd_scan.go
@@ -64,6 +64,15 @@ func Scan(
return result.Result{}, err
}
reply.BatchResponses = scanRes.KVData
case roachpb.COL_BATCH_RESPONSE:
scanRes, err = storage.MVCCScanToCols(
ctx, reader, cArgs.Header.IndexFetchSpec, args.Key, args.EndKey,
h.Timestamp, opts, cArgs.EvalCtx.ClusterSettings(),
)
if err != nil {
return result.Result{}, err
}
reply.BatchResponses = scanRes.KVData
case roachpb.KEY_VALUES:
scanRes, err = storage.MVCCScan(
ctx, reader, args.Key, args.EndKey, h.Timestamp, opts)
2 changes: 2 additions & 0 deletions pkg/kv/kvserver/batcheval/intent.go
@@ -125,6 +125,8 @@ func acquireUnreplicatedLocksOnKeys(
res.Local.AcquiredLocks[i] = roachpb.MakeLockAcquisition(txn, copyKey(row.Key), lock.Unreplicated)
}
return nil
case roachpb.COL_BATCH_RESPONSE:
return errors.AssertionFailedf("unexpectedly acquiring unreplicated locks with COL_BATCH_RESPONSE scan format")
default:
panic("unexpected scanFormat")
}
3 changes: 3 additions & 0 deletions pkg/kv/kvserver/replica_read.go
@@ -531,6 +531,9 @@ func (r *Replica) collectSpansRead(
resp := br.Responses[i].GetInner()

if ba.WaitPolicy == lock.WaitPolicy_SkipLocked && roachpb.CanSkipLocked(req) {
if ba.IndexFetchSpec != nil {
return nil, nil, errors.AssertionFailedf("unexpectedly IndexFetchSpec is set with SKIP LOCKED wait policy")
}
// If the request is using a SkipLocked wait policy, it behaves as if run
// at a lower isolation level for any keys that it skips over. If the read
// request did not return a key, it does not need to check for conflicts
4 changes: 4 additions & 0 deletions pkg/kv/kvserver/replica_tscache.go
@@ -23,6 +23,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/util/hlc"
"github.com/cockroachdb/cockroach/pkg/util/log"
"github.com/cockroachdb/cockroach/pkg/util/uuid"
"github.com/cockroachdb/errors"
"github.com/cockroachdb/redact"
)

@@ -99,6 +100,9 @@ func (r *Replica) updateTimestampCache(
start, end := header.Key, header.EndKey

if ba.WaitPolicy == lock.WaitPolicy_SkipLocked && roachpb.CanSkipLocked(req) {
if ba.IndexFetchSpec != nil {
log.Errorf(ctx, "%v", errors.AssertionFailedf("unexpectedly IndexFetchSpec is set with SKIP LOCKED wait policy"))
}
// If the request is using a SkipLocked wait policy, it behaves as if run
// at a lower isolation level for any keys that it skips over. If the read
// request did not return a key, it does not make a claim about whether
2 changes: 2 additions & 0 deletions pkg/roachpb/BUILD.bazel
@@ -151,6 +151,7 @@ proto_library(
"//pkg/kv/kvserver/concurrency/lock:lock_proto",
"//pkg/kv/kvserver/readsummary/rspb:rspb_proto",
"//pkg/settings:settings_proto",
"//pkg/sql/catalog/fetchpb:fetchpb_proto",
"//pkg/storage/enginepb:enginepb_proto",
"//pkg/util:util_proto",
"//pkg/util/admission/admissionpb:admissionpb_proto",
@@ -173,6 +174,7 @@ go_proto_library(
"//pkg/kv/kvserver/concurrency/lock",
"//pkg/kv/kvserver/readsummary/rspb",
"//pkg/settings",
"//pkg/sql/catalog/fetchpb",
"//pkg/storage/enginepb",
"//pkg/util",
"//pkg/util/admission/admissionpb",
66 changes: 46 additions & 20 deletions pkg/roachpb/api.proto
@@ -20,6 +20,7 @@ import "roachpb/errors.proto";
import "roachpb/metadata.proto";
import "roachpb/span_config.proto";
import "settings/encoding.proto";
import "sql/catalog/fetchpb/index_fetch.proto";
import "storage/enginepb/mvcc.proto";
import "storage/enginepb/mvcc3.proto";
import "util/hlc/timestamp.proto";
@@ -557,6 +558,10 @@ enum ScanFormat {
// The batch_response format: a byte slice of alternating keys and values,
// each prefixed by their length as a varint.
BATCH_RESPONSE = 1;
// The serialized (in the Apache Arrow format) representation of
// coldata.Batch'es that contain only necessary (according to the
// fetchpb.IndexFetchSpec) columns.
COL_BATCH_RESPONSE = 2;
}


@@ -568,9 +573,9 @@ message ScanRequest {

RequestHeader header = 1 [(gogoproto.nullable) = false, (gogoproto.embed) = true];

// The desired format for the response. If set to BATCH_RESPONSE, the server
// will set the batch_responses field in the ScanResponse instead of the rows
// field.
// The desired format for the response. If set to BATCH_RESPONSE or
// COL_BATCH_RESPONSE, the server will set the batch_responses field in the
// ScanResponse instead of the rows field.
ScanFormat scan_format = 4;

// The desired key-level locking mode used during this scan. When set to None
@@ -601,12 +606,19 @@ message ScanResponse {
// revisit this decision if this ever becomes a problem.
repeated KeyValue intent_rows = 3 [(gogoproto.nullable) = false];

// If set, each item in this repeated bytes field contains part of the results
// in batch format - the key/value pairs are a buffer of varint-prefixed
// slices, alternating from key to value. Each entry in this field is
// complete - there are no key/value pairs that are split across more than one
// entry. There are num_keys total pairs across all entries, as defined by the
// ResponseHeader. If set, rows will not be set and vice versa.
// If set, then depending on the ScanFormat, each item in this repeated bytes
// field contains part of the results in batch format:
// - for BATCH_RESPONSE - the key/value pairs are a buffer of varint-prefixed
// slices, alternating from key to value. Each entry in this field is complete
// (i.e. there are no key/value pairs that are split across more than one
// entry). There are num_keys total pairs across all entries, as defined by
// the ResponseHeader.
// - for COL_BATCH_RESPONSE - each []byte is a single serialized (in the
// Apache Arrow format) coldata.Batch. Each SQL row in that coldata.Batch is
// complete. num_keys total key-value pairs were used to populate all of the
// coldata.Batch'es in this field.
//
// If set, rows will not be set and vice versa.
repeated bytes batch_responses = 4;
}

Expand All @@ -618,9 +630,9 @@ message ReverseScanRequest {

RequestHeader header = 1 [(gogoproto.nullable) = false, (gogoproto.embed) = true];

// The desired format for the response. If set to BATCH_RESPONSE, the server
// will set the batch_responses field in the ScanResponse instead of the rows
// field.
// The desired format for the response. If set to BATCH_RESPONSE or
// COL_BATCH_RESPONSE, the server will set the batch_responses field in the
// ScanResponse instead of the rows field.
ScanFormat scan_format = 4;

// The desired key-level locking mode used during this scan. When set to None
Expand Down Expand Up @@ -651,12 +663,19 @@ message ReverseScanResponse {
// revisit this decision if this ever becomes a problem.
repeated KeyValue intent_rows = 3 [(gogoproto.nullable) = false];

// If set, each item in this repeated bytes field contains part of the results
// in batch format - the key/value pairs are a buffer of varint-prefixed
// slices, alternating from key to value. Each entry in this field is
// complete - there are no key/value pairs that are split across more than one
// entry. There are num_keys total pairs across all entries, as defined by the
// ResponseHeader. If set, rows will not be set and vice versa.
// If set, then depending on the ScanFormat, each item in this repeated bytes
// field contains part of the results in batch format:
// - for BATCH_RESPONSE - the key/value pairs are a buffer of varint-prefixed
// slices, alternating from key to value. Each entry in this field is complete
// (i.e. there are no key/value pairs that are split across more than one
// entry). There are num_keys total pairs across all entries, as defined by
// the ResponseHeader.
// - for COL_BATCH_RESPONSE - each []byte is a single serialized (in the
// Apache Arrow format) coldata.Batch. Each SQL row in that coldata.Batch is
// complete. num_keys total key-value pairs were used to populate all of the
// coldata.Batch'es in this field.
//
// If set, rows will not be set and vice versa.
repeated bytes batch_responses = 4;
}

@@ -2525,8 +2544,6 @@ message Header {
// The given value specifies the maximum number of keys in a row (i.e. the
// number of column families). If any larger rows are found at the end of the
// result, an error is returned.
//
// Added in 22.1, callers must check the ScanWholeRows version gate first.
int32 whole_rows_of_size = 26;
// If true, allow returning an empty result when the first result exceeds a
// limit (e.g. TargetBytes). Only supported by Get, Scan, and ReverseScan.
@@ -2605,6 +2622,15 @@ message Header {
// If a recording mode is set, the response will have collect_spans filled in.
util.tracing.tracingpb.TraceInfo trace_info = 25;

// index_fetch_spec, if set, is used for ScanRequests and ReverseScanRequests
// with COL_BATCH_RESPONSE ScanFormat.
//
// The rationale for having this as a single "global" field on the
// BatchRequest is that SQL never issues requests touching different indexes
// in a single BatchRequest, so it would be redundant to copy this field into
// each Scan and ReverseScan.
sql.sqlbase.IndexFetchSpec index_fetch_spec = 29;

reserved 7, 10, 12, 14, 20;
}
