multi-tenant: Validate unified architecture performance using TPC-H #87610

Closed
ajstorm opened this issue Sep 8, 2022 · 6 comments
Labels
A-multitenancy Related to multi-tenancy A-unification branch-master Failures and bugs on the master branch. branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) GA-blocker T-multitenant Issues owned by the multi-tenant virtual team

Comments

@ajstorm
Collaborator

ajstorm commented Sep 8, 2022

With #87084 we identified that multi-tenant configurations are significantly slower than single-tenant configurations when running TPC-H. The theory is that this is due to the inability to push certain SQL operations down into the KV layer. This, however, would only be an issue for separate-process MT configurations. For the shared-process UA implementation, we should validate that there is no discernible performance difference when running TPC-H.

Epic CRDB-23559

Jira issue: CRDB-19444

@ajstorm ajstorm added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-multitenancy Related to multi-tenancy A-unification labels Sep 8, 2022
craig bot pushed a commit that referenced this issue Jan 4, 2023
94348: sql,storage: some preliminary changes for KV projection pushdown r=yuzefovich a=yuzefovich

This PR contains a couple of commits that are mostly mechanical changes in preparation for the KV pushdown work. Some microbenchmarks of this PR are [here](https://gist.github.com/yuzefovich/24d3238bc638cc1121fd345c68ca3d0b), and they show effectively no change in the scan speed.

Epic: CRDB-14837

Informs: #82323
Informs: #87610

Co-authored-by: Yahor Yuzefovich <[email protected]>
@exalate-issue-sync exalate-issue-sync bot added the T-multitenant Issues owned by the multi-tenant virtual team label Jan 10, 2023
craig bot pushed a commit that referenced this issue Jan 25, 2023
94438: sql,storage: add support for COL_BATCH_RESPONSE scan format r=yuzefovich a=yuzefovich

This commit introduces a new `COL_BATCH_RESPONSE` scan format for Scans
and ReverseScans which results in only the needed columns being returned
from the KV server. In other words, this commit introduces the ability
to perform the KV projection pushdown.

The main idea of this feature is to use the injected decoding logic from
SQL in order to process each KV and keep only the needed parts (i.e.
necessary SQL columns). Those needed parts are then propagated back to
the KV client as coldata.Batch'es (serialized in the Apache Arrow format).

Here is the outline of all components involved:
```
     ┌────────────────────────────────────────────────┐
     │                       SQL                      │
     │________________________________________________│
     │          colfetcher.ColBatchDirectScan         │
     │                        │                       │
     │                        ▼                       │
     │                 row.txnKVFetcher               │
     │    (behind the row.KVBatchFetcher interface)   │
     └────────────────────────────────────────────────┘
                              │
                              ▼
     ┌────────────────────────────────────────────────┐
     │                    KV Client                   │
     └────────────────────────────────────────────────┘
                              │
                              ▼
     ┌────────────────────────────────────────────────┐
     │                    KV Server                   │
     │________________________________________________│
     │           colfetcher.cFetcherWrapper           │
     │ (behind the storage.CFetcherWrapper interface) │
     │                        │                       │
     │                        ▼                       │
     │              colfetcher.cFetcher               │
     │                        │                       │
     │                        ▼                       │
     │          storage.mvccScanFetchAdapter ────────┐│
     │    (behind the storage.NextKVer interface)    ││
     │                        │                      ││
     │                        ▼                      ││
     │           storage.pebbleMVCCScanner           ││
     │ (which puts KVs into storage.singleResults)  <┘│
     └────────────────────────────────────────────────┘
```
On the KV client side, `row.txnKVFetcher` issues Scans and ReverseScans with
the `COL_BATCH_RESPONSE` format and returns the response (which contains the
columnar data) to the `colfetcher.ColBatchDirectScan`.

On the KV server side, we create a `storage.CFetcherWrapper` that asks the
`colfetcher.cFetcher` for the next `coldata.Batch`. The `cFetcher`, in turn,
fetches the next KV, decodes it, and keeps only values for the needed SQL
columns, discarding the rest of the KV. The KV is emitted by the
`mvccScanFetchAdapter` which - via the `singleResults` struct - exposes
access to the current KV that the `pebbleMVCCScanner` is pointing at.
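
To make the server-side flow concrete, here is a minimal, self-contained Go sketch of the projection idea. It is not the actual CockroachDB code: `kvPair`, `nextKVer`, `sliceKVer`, and `projectingFetcher` are hypothetical stand-ins for the real `storage.NextKVer`, `pebbleMVCCScanner`, and `colfetcher.cFetcher`.

```go
package main

import "fmt"

// kvPair is a hypothetical decoded key-value pair: the key identifies the SQL
// row, and vals holds one datum per column of the table.
type kvPair struct {
	key  string
	vals []string
}

// nextKVer is a pull-style source of KVs, loosely mirroring the role that
// storage.NextKVer plays in the diagram above.
type nextKVer interface {
	NextKV() (kvPair, bool)
}

// sliceKVer serves KVs from a slice (a stand-in for the MVCC scanner).
type sliceKVer struct {
	kvs []kvPair
	idx int
}

func (s *sliceKVer) NextKV() (kvPair, bool) {
	if s.idx >= len(s.kvs) {
		return kvPair{}, false
	}
	s.idx++
	return s.kvs[s.idx-1], true
}

// projectingFetcher plays the role of the cFetcher: it pulls KVs and keeps
// only the needed columns, assembling them into a column-major batch.
type projectingFetcher struct {
	source     nextKVer
	neededCols []int // indices of the columns SQL actually asked for
}

// nextBatch decodes up to batchSize KVs, discarding every column that was not
// requested; the result is what would be shipped back as a columnar batch.
func (f *projectingFetcher) nextBatch(batchSize int) [][]string {
	batch := make([][]string, len(f.neededCols))
	for i := 0; i < batchSize; i++ {
		next, ok := f.source.NextKV()
		if !ok {
			break
		}
		for out, tableCol := range f.neededCols {
			batch[out] = append(batch[out], next.vals[tableCol])
		}
	}
	return batch
}

func main() {
	source := &sliceKVer{kvs: []kvPair{
		{key: "/Table/104/1/1", vals: []string{"1", "apple", "red"}},
		{key: "/Table/104/1/2", vals: []string{"2", "pear", "green"}},
	}}
	// Only columns 0 and 2 were requested, so column 1 never leaves the server.
	fetcher := &projectingFetcher{source: source, neededCols: []int{0, 2}}
	fmt.Println(fetcher.nextBatch(10)) // [[1 2] [red green]]
}
```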

Note that there is an additional "implicit synchronization" between
components that is not shown on this diagram. In particular,
`storage.singleResults.maybeTrimPartialLastRow` must be in sync with the
`colfetcher.cFetcher` which is achieved by
- the `cFetcher` exposing access to the first key of the last incomplete SQL
  row via the `FirstKeyOfRowGetter`,
- the `singleResults` using that key as the resume key for the response,
- and the `cFetcher` removing that last partial SQL row when `NextKV()`
  returns `partialRow=true`.
This "upstream" link (although breaking the layering a bit) allows us to
avoid a performance penalty for handling the case with multiple column
families. (This case is handled by the `storage.pebbleResults` via tracking
offsets into the `pebbleResults.repr`.)
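
A minimal sketch of that handshake, with hypothetical types (`rowKV`, `trimPartialLastRow`) standing in for `storage.singleResults.maybeTrimPartialLastRow` and the `FirstKeyOfRowGetter`: if the scan stops mid-row, the partial row is dropped and its first key becomes the resume key.

```go
package main

import "fmt"

// rowKV is a hypothetical KV pair tagged with the SQL row it belongs to
// (several column families of one row share the same rowID).
type rowKV struct {
	rowID int
	key   string
}

// trimPartialLastRow drops the KVs of an incomplete last row from the result
// and reports the first key of that row as the resume key, so the next
// request restarts exactly at the incomplete row.
func trimPartialLastRow(kvs []rowKV, lastRowComplete bool) (trimmed []rowKV, resumeKey string) {
	if lastRowComplete || len(kvs) == 0 {
		return kvs, ""
	}
	lastRow := kvs[len(kvs)-1].rowID
	cut := len(kvs)
	for cut > 0 && kvs[cut-1].rowID == lastRow {
		cut--
	}
	return kvs[:cut], kvs[cut].key
}

func main() {
	kvs := []rowKV{
		{rowID: 1, key: "/t/1/cf0"}, {rowID: 1, key: "/t/1/cf1"},
		{rowID: 2, key: "/t/2/cf0"}, // row 2's second column family didn't fit
	}
	trimmed, resume := trimPartialLastRow(kvs, false /* lastRowComplete */)
	fmt.Println(len(trimmed), resume) // 2 /t/2/cf0
}
```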

This code structure deserves some elaboration. First, there is a mismatch
between the "push" mode in which the `pebbleMVCCScanner` operates and the
"pull" mode that the `NextKVer` exposes. The adaptation between the two modes
is achieved via the `mvccScanFetchAdapter` grabbing (when the control
returns to it) the current unstable KV pair from the `singleResults` struct,
which serves as a one-KV-pair buffer that the `pebbleMVCCScanner` `put`s into.
Second, in order to be able to use the unstable KV pair without performing a
copy, the `pebbleMVCCScanner` stops at the current KV pair and returns the
control flow (which is exactly what `pebbleMVCCScanner.getOne` does) back to
the `mvccScanFetchAdapter`, with the adapter advancing the scanner only when
the next KV pair is needed.
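
Here is a self-contained sketch of that push-to-pull adaptation. All names (`singleResult`, `pushScanner`, `scanAdapter`) are hypothetical stand-ins for `storage.singleResults`, `pebbleMVCCScanner`, and `mvccScanFetchAdapter`, and the real code deals with unstable KV memory rather than strings.

```go
package main

import "fmt"

// singleResult is a one-KV-pair buffer: the push-style scanner "puts" the
// current KV here and then yields control to the caller.
type singleResult struct {
	key, value string
}

func (s *singleResult) put(key, value string) {
	s.key, s.value = key, value
}

// pushScanner operates in "push" mode, but getOne advances exactly one KV and
// then returns control, so the current KV can be consumed without copying it.
type pushScanner struct {
	kvs  [][2]string
	pos  int
	sink *singleResult
}

// getOne pushes the next KV into the sink and reports whether it produced one.
func (p *pushScanner) getOne() bool {
	if p.pos >= len(p.kvs) {
		return false
	}
	p.sink.put(p.kvs[p.pos][0], p.kvs[p.pos][1])
	p.pos++
	return true
}

// scanAdapter exposes a pull-style NextKV by advancing the push-style scanner
// only when the next KV pair is actually needed.
type scanAdapter struct {
	scanner *pushScanner
	buf     *singleResult
}

func (a *scanAdapter) NextKV() (key, value string, ok bool) {
	if !a.scanner.getOne() {
		return "", "", false
	}
	return a.buf.key, a.buf.value, true
}

func main() {
	buf := &singleResult{}
	scanner := &pushScanner{kvs: [][2]string{{"a", "1"}, {"b", "2"}}, sink: buf}
	adapter := &scanAdapter{scanner: scanner, buf: buf}
	for {
		k, v, ok := adapter.NextKV()
		if !ok {
			break
		}
		fmt.Println(k, v)
	}
}
```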

There are multiple scenarios which are currently not supported:
- SQL cannot issue Get requests (likely will support in 23.1)
- `TraceKV` option is not supported (likely will support in 23.1)
- user-defined types other than enums are not supported (will _not_
support in 23.1)
- non-default key locking strength as well as SKIP LOCKED wait policy
are not supported (will _not_ support in 23.1).

This feature is currently disabled by default, but I intend
to enable it by default for multi-tenant setups. The rationale is that
currently there is a large performance hit when enabling it for
single-tenant deployments whereas it offers a significant speed-up in the
multi-tenant world.

The microbenchmarks [show](https://gist.github.com/yuzefovich/669c295a8a4fdffa6490532284c5a719)
the expected improvement in multi-tenant setups when the tenant runs in
a separate process whenever we don't need to decode all of the columns
from the table.

The TPCH numbers, though, don't show the expected speedup:
```
Q1:	before: 11.47s	after: 8.84s	 -22.89%
Q2:	before: 0.41s	after: 0.29s	 -27.71%
Q3:	before: 7.89s	after: 9.68s	 22.63%
Q4:	before: 4.48s	after: 4.52s	 0.86%
Q5:	before: 10.39s	after: 10.35s	 -0.29%
Q6:	before: 33.57s	after: 33.41s	 -0.48%
Q7:	before: 23.82s	after: 23.81s	 -0.02%
Q8:	before: 3.78s	after: 3.76s	 -0.68%
Q9:	before: 28.15s	after: 28.03s	 -0.42%
Q10:	before: 5.00s	after: 4.98s	 -0.42%
Q11:	before: 2.44s	after: 2.44s	 0.22%
Q12:	before: 34.78s	after: 34.65s	 -0.37%
Q13:	before: 3.20s	after: 2.94s	 -8.28%
Q14:	before: 3.13s	after: 3.21s	 2.43%
Q15:	before: 16.80s	after: 16.73s	 -0.38%
Q16:	before: 1.60s	after: 1.65s	 2.96%
Q17:	before: 0.85s	after: 0.96s	 13.04%
Q18:	before: 16.39s	after: 15.47s	 -5.61%
Q19:	before: 13.76s	after: 13.01s	 -5.45%
Q20:	before: 55.33s	after: 55.12s	 -0.38%
Q21:	before: 24.31s	after: 24.31s	 -0.00%
Q22:	before: 1.28s	after: 1.41s	 10.26%
```

At the moment, the `coldata.Batch` that is included in the response is
always serialized into the Arrow format, but I intend to introduce a
local fastpath to avoid that serialization. That work will be done in
a follow-up and should be able to reduce the perf hit for single-tenant
deployments.

A quick note on the TODOs sprinkled in this commit:
- `TODO(yuzefovich)` means that this will be left for 23.2 or later.
- `TODO(yuzefovich, 23.1)` means that it should be addressed in 23.1.

A quick note on testing: this commit randomizes whether the new
infrastructure is used in almost all test builds. Introducing some unit
testing (say, in the `storage` package) seems rather annoying since we must
create keys that are valid SQL keys (i.e. have a TableID / Index ID
prefix) and they need to come with the corresponding
`fetchpb.IndexFetchSpec`. Not having unit tests in the `storage` package seems
ok to me given that the "meat" of the work there is still done by the
`pebbleMVCCScanner` which is exercised using the regular Scans.
End-to-end testing is well covered by all of our existing tests, which
now run with the new infrastructure randomly enabled. I did run the CI
multiple times with the new feature enabled by default with no failures,
so I hope that it shouldn't become flaky.

Addresses: #82323.
Informs: #87610.

Epic: CRDB-14837

Release note: None

95701: gossip: Track latency by nodeID rather than addr r=kvoli,erikgrinaker a=andrewbaptist

Previously, the latency to remote nodes was tracked by address rather than the node's ID. This could result in a few problems. First, the remote address could be reused across nodes, which could result in incorrect information. Additionally, places that used this information (such as the allocator) needed to unnecessarily map the node ID to an address just to do a lookup.

Finally, in preparation for dialback on heartbeat (#84289), the use of the OriginAddr field in the PingRequest will change to be the actual address that a node should use to dial back. Currently this field is not set correctly.

Epic: none
Release note: None

95796: ui: add CPU Time chart do statement details r=maryliag a=maryliag

This commit adds a new chart for CPU time on the Statement Details page.

Part Of #87213

(Screenshot: the new CPU Time chart on the Statement Details page.)


Release note (ui change): Add CPU Time chart to Statement Details page.

95832: cdc: remove 'nonsensitive' tag from changefeed description in telemetry logs r=jayshrivastava a=jayshrivastava

Previously, the description field in changefeed telemetry logs was marked as `nonsensitive`. This is incorrect because the description field may contain an SQL statement which is not safe to report. This change removes the `nonsensitive` tag so the field is redacted by default.

Fixes: #95823
Epic: none
Release note: none

95838: logictest: remove smallEngineBlocks randomization r=yuzefovich a=yuzefovich

This metamorphic randomization has caused some flakiness (due to a subset of tests taking a very long time), so it is now removed. This feature should be tested in a more targeted fashion.

Fixes: #95799.
Fixes: #95829

Release note: None

95840: opt: replace make with dev in test instructions r=mgartner a=mgartner

Epic: None

Release note: None

95842: roachtest: fix parameters passed to require.NoError r=yuzefovich,srosenberg,herkolategan a=renatolabs

When context is passed to an assertion, the parameters *must* be a string format, followed by arguments (as you would in a call to `fmt.Sprintf`). The previous code would panic when trying to cast an int to a string.

Informs #95416

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Andrew Baptist <[email protected]>
Co-authored-by: maryliag <[email protected]>
Co-authored-by: Jayant Shrivastava <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Renato Costa <[email protected]>
craig bot pushed a commit that referenced this issue Mar 8, 2023
95033: storage,colfetcher: implement local fast-path for COL_BATCH_RESPONSE r=yuzefovich a=yuzefovich

This commit implements the local fast-path for the COL_BATCH_RESPONSE
scan format. The idea is that if a Scan request is evaluated locally
(i.e. on the same node for single-tenant deployments, or within the
shared process for multi-tenant deployments), then we can avoid the
redundant serialization of the columnar batches in the Apache Arrow
format and just pass the batches as a slice of pointers through the
protobuf. Additionally, this allows us to avoid a copy of the data from
`ScanResponse.BatchResponse` into the columnar batch.

To achieve this, the ScanResponses and the ReverseScanResponses now
contain a new custom `ColBatches` message which only includes a
`[]coldata.Batch` field that is not marshalled as part of the protobuf
serialization.
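
A rough sketch of the two paths (hypothetical types; the real `ColBatches` message and its marshalling hooks live in `kvpb` and are more involved):

```go
package main

import "fmt"

// colBatch is a stand-in for coldata.Batch: a column-major chunk of rows.
type colBatch struct {
	cols [][]int64
}

// scanResponse is a hypothetical response shape: remote responses carry the
// batches serialized (here just byte blobs), while local responses carry the
// batches as a slice of in-memory pointers that is never marshalled.
type scanResponse struct {
	serialized [][]byte    // set when the response crossed the wire
	colBatches []*colBatch // set on the local fast path; skipped by marshalling
}

// makeResponse picks the path based on whether the request is evaluated
// locally (same node / shared-process tenant) or remotely.
func makeResponse(batches []*colBatch, local bool, serialize func(*colBatch) []byte) scanResponse {
	if local {
		// Local fast path: hand the batches over by pointer, with no Arrow
		// round-trip and no copy out of the response.
		return scanResponse{colBatches: batches}
	}
	var blobs [][]byte
	for _, b := range batches {
		blobs = append(blobs, serialize(b))
	}
	return scanResponse{serialized: blobs}
}

func main() {
	batches := []*colBatch{{cols: [][]int64{{1, 2, 3}}}}
	serialize := func(b *colBatch) []byte { return []byte(fmt.Sprint(b.cols)) }
	local := makeResponse(batches, true, serialize)
	remote := makeResponse(batches, false, serialize)
	fmt.Println(len(local.colBatches), len(remote.serialized)) // 1 1
}
```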

Now that we can have a single multi-range request result in locally- and
remotely-executed single-range requests, we need to be careful when
combining them. In particular, in order to preserve the ordering between
single-range requests, we now always deserialize the remotely-executed
ones (since this "combining" happens on the KV client side and won't be
sent over the wire again) while "merging" them accordingly. This
required the introduction of an injected helper for the deserialization
from the Apache Arrow format into the `kvpb` package. This deserialization
also required that we have access to the `fetchpb.IndexFetchSpec` proto
that is stored in the BatchRequest; thus, the signature of the `combine`
method has been adjusted to include a reference to the BatchRequest.
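
A simplified sketch of that combining step, with hypothetical types standing in for the per-range responses and the injected Arrow deserialization helper:

```go
package main

import "fmt"

// rangeResponse is a hypothetical per-range result: either already in-memory
// batches (local execution) or serialized blobs (remote execution).
type rangeResponse struct {
	colBatches []string // in-memory batches (stand-in for []coldata.Batch)
	serialized [][]byte // Arrow-encoded batches from a remote node
}

// combine merges per-range responses in range order. Because the combined
// result stays on the KV client and is never re-marshalled, the remote,
// serialized batches are deserialized here so that ordering between local and
// remote pieces is preserved in a single in-memory slice.
func combine(resps []rangeResponse, deserialize func([]byte) string) []string {
	var out []string
	for _, r := range resps {
		if len(r.serialized) > 0 {
			for _, blob := range r.serialized {
				out = append(out, deserialize(blob))
			}
			continue
		}
		out = append(out, r.colBatches...)
	}
	return out
}

func main() {
	deserialize := func(b []byte) string { return string(b) }
	resps := []rangeResponse{
		{colBatches: []string{"batch-r1"}},         // local range
		{serialized: [][]byte{[]byte("batch-r2")}}, // remote range
		{colBatches: []string{"batch-r3"}},         // local range
	}
	fmt.Println(combine(resps, deserialize)) // [batch-r1 batch-r2 batch-r3]
}
```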

An additional quirk of this commit is that the `cFetcher` cannot reuse the
same batch when it is used by the `cFetcherWrapper` while skipping the
serialization. (If it did reuse batches, then the slice of batches would
contain multiple references to the same batch, so only the last reference
would be correct - all previous ones would have been reset.) To handle that,
the `colmem.SetAccountingHelper` has been adjusted to be able to keep
the same heuristic when it comes to the sizing of the batch while always
allocating a new one, even if under other circumstances it would have
reused the old batch.

It's also worth noting the story about memory accounting of these local
batches. The `SetAccountingHelper` used by the `cFetcher` always tracks
the memory usage only of the last batch, so we need to account for all
other batches ourselves. We get around this by providing the `cFetcher`
with a "detached" memory account (i.e. an account that is not connected
to the memory accounting system) that is used by the `cFetcher` to limit
the batch size based on the footprint, and modifying the
`cFetcherWrapper` to perform the accounting against the proper memory
account. This commit also clarifies the contract of
`CFetcherWrapper.NextBatch` that it is the wrapper's responsibility to
perform memory accounting of all batches, regardless of the return
format, against the provided memory account.

This only covers part of the story from the KV server side. On the KV
client side the memory accounting is done in `txnKVFetcher`. When the
batches are serialized, they are included in
`ScanResponse.BatchResponse` field and, thus, are included into
`BatchResponse.Size` which we use for accounting. For the non-serialized
batches this commit implements the custom `Size()` method so that the
true footprint of all `coldata.Batch`es is included into
`BatchResponse.Size`. As a result, all local batches (including the ones
that were deserialized when combining responses to locally- and
remotely-executed requests) are tracked by the `txnKVFetcher` until a
new `BatchRequest` is issued, so the ColBatchDirectScan doesn't need to
perform the accounting. (Note that we perform the accounting for
`ScanResponse.BatchResponse` slices in a similar manner - we don't
shrink the memory account when a single response becomes garbage (due
to likely under-accounting in other places).)
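
A sketch of the idea behind the custom `Size()` method (hypothetical types; the real accounting uses `coldata.Batch` memory footprints):

```go
package main

import "fmt"

// colBatch is a stand-in for coldata.Batch.
type colBatch struct {
	cols [][]int64
}

// memorySize is a rough footprint of the batch, standing in for the size
// accounting performed on the KV client side.
func (b *colBatch) memorySize() int64 {
	var size int64
	for _, col := range b.cols {
		size += int64(len(col)) * 8 // 8 bytes per int64 element
	}
	return size
}

// scanResponse mirrors the idea above: serialized bytes are naturally counted
// by their length, while local batches need a custom Size() so that their
// true footprint still flows into BatchResponse.Size for memory accounting.
type scanResponse struct {
	batchResponse []byte
	colBatches    []*colBatch
}

func (r *scanResponse) Size() int64 {
	size := int64(len(r.batchResponse))
	for _, b := range r.colBatches {
		size += b.memorySize()
	}
	return size
}

func main() {
	resp := &scanResponse{
		batchResponse: []byte("arrow-bytes"),
		colBatches:    []*colBatch{{cols: [][]int64{{1, 2, 3, 4}}}},
	}
	fmt.Println(resp.Size()) // 11 + 32 = 43
}
```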

A special note on type schemas with enums: since enums require type
hydration that is not easily available on the KV server side and we
treat them simply as bytes values, the presence of enums forces us to
serialize the batches even for locally-executed requests. This seems
like a minor limitation in comparison to not supporting enums at all.

Another note on the datum-backed vectors: since the `cFetcherWrapper`
also doesn't have access to a valid `eval.Context`, the datum-backed
vectors produced by the wrapper are "incomplete". Previously, since we
always serialized the batches, it wasn't an issue. However, now if we
get a non-serialized batch from the locally-executed request, we must
update all datum-backed vectors with the proper eval context. This is
done by the `ColBatchDirectScan`.

The microbenchmarks of this change when the direct columnar scans are
always enabled are [here](https://gist.github.com/yuzefovich/a9b28669f35ff658b2e89ed7b1d43e38).
Note that there are three distinct operation modes in that gist:
- `Cockroach` and `MultinodeCockroach` - single-tenant deployments
- `SharedProcessTenant` - this is how we imagine that dedicated clusters
will run once the Unified Architecture is achieved
- `SepProcessTenant` - this is how we run Serverless.

For the first two, this commit results mostly in a minor improvement in
latency and a sometimes noticeable reduction in allocations, as expected.
The SepProcessTenant config - which cannot take advantage of the local
fastpath - sees a minor slowdown in latency and no change in
allocations, as expected (I'm attributing this to the increased overhead of
the direct columnar scans and the increased size of `ScanResponse` objects).

However, these are micro-benchmarks, and they don't show the full
picture. In particular, they don't process enough data, and they often select
all columns in the table, for this feature to show its benefits. I'm more
excited about the results on the TPC-H queries. Here is the impact of
this commit on a 3-node cluster running in single-tenant mode (averaged
over 10 runs):
```
Q1:	before: 4.46s	after: 4.23s	 -5.15%
Q2:	before: 3.18s	after: 3.30s	 3.45%
Q3:	before: 2.43s	after: 2.11s	 -13.20%
Q4:	before: 1.83s	after: 1.84s	 0.44%
Q5:	before: 2.65s	after: 2.48s	 -6.34%
Q6:	before: 7.59s	after: 7.46s	 -1.65%
Q7:	before: 5.56s	after: 5.72s	 2.71%
Q8:	before: 1.14s	after: 1.11s	 -2.29%
Q9:	before: 5.77s	after: 5.31s	 -7.86%
Q10:	before: 1.98s	after: 1.94s	 -1.92%
Q11:	before: 0.73s	after: 0.69s	 -5.52%
Q12:	before: 7.18s	after: 6.91s	 -3.79%
Q13:	before: 1.24s	after: 1.24s	 0.16%
Q14:	before: 0.70s	after: 0.66s	 -5.32%
Q15:	before: 3.99s	after: 3.64s	 -8.89%
Q16:	before: 0.95s	after: 0.94s	 -1.16%
Q17:	before: 0.27s	after: 0.26s	 -5.49%
Q18:	before: 2.67s	after: 2.15s	 -19.39%
Q19:	before: 4.03s	after: 2.96s	 -26.46%
Q20:	before: 12.91s	after: 11.49s	 -10.98%
Q21:	before: 7.14s	after: 6.99s	 -2.13%
Q22:	before: 0.60s	after: 0.57s	 -5.48%
```

Furthermore, here is the comparison of the direct columnar scans
disabled vs enabled:
```
Q1:	before: 4.36s	after: 4.23s	 -2.91%
Q2:	before: 3.57s	after: 3.30s	 -7.63%
Q3:	before: 2.31s	after: 2.11s	 -8.61%
Q4:	before: 1.88s	after: 1.84s	 -2.07%
Q5:	before: 2.55s	after: 2.48s	 -2.70%
Q6:	before: 7.94s	after: 7.46s	 -6.04%
Q7:	before: 5.87s	after: 5.72s	 -2.61%
Q8:	before: 1.12s	after: 1.11s	 -1.07%
Q9:	before: 5.79s	after: 5.31s	 -8.27%
Q10:	before: 1.97s	after: 1.94s	 -1.47%
Q11:	before: 0.69s	after: 0.69s	 -0.29%
Q12:	before: 6.99s	after: 6.91s	 -1.16%
Q13:	before: 1.24s	after: 1.24s	 -0.48%
Q14:	before: 0.68s	after: 0.66s	 -3.37%
Q15:	before: 3.72s	after: 3.64s	 -2.23%
Q16:	before: 0.96s	after: 0.94s	 -1.88%
Q17:	before: 0.28s	after: 0.26s	 -6.18%
Q18:	before: 2.47s	after: 2.15s	 -12.87%
Q19:	before: 3.20s	after: 2.96s	 -7.35%
Q20:	before: 11.71s	after: 11.49s	 -1.88%
Q21:	before: 7.00s	after: 6.99s	 -0.06%
Q22:	before: 0.58s	after: 0.57s	 -2.07%
```
In other words, on TPC-H queries it is now already beneficial to
enable the direct columnar scans in the single-tenant world (and I think
there are more minor optimizations ahead). For reference, [here](https://gist.github.com/yuzefovich/0afce5c0692713cf28712f076bab415b)
is the comparison of direct columnar scans disabled vs enabled on this
commit. It also shows that we're not that far off from reaching
performance parity in micro-benchmarks.

Addresses: #82323.
Informs: #87610.

Epic: CRDB-14837

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
@knz knz added the GA-blocker label Apr 2, 2023
@blathers-crl

blathers-crl bot commented Apr 2, 2023

Hi @knz, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@knz knz added branch-master Failures and bugs on the master branch. branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 labels Apr 2, 2023
@ecwall ecwall added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-multitenant Issues owned by the multi-tenant virtual team labels Apr 3, 2023
@ecwall
Contributor

ecwall commented Apr 5, 2023

Workload Results:
workload run tpch output: system-tenant.txt shared-tenant.txt
CSV: https://docs.google.com/spreadsheets/d/13aOen7pJ9oOtnBued82WVUX1TBkKquuGF7OsLTmsff8/edit?usp=sharing

Setup:
Check out branch #100634
Apply diff to fix #100654

```
diff --git a/pkg/server/server_controller_new_server.go b/pkg/server/server_controller_new_server.go
index 838cf7027a2..7fac12964ca 100644
--- a/pkg/server/server_controller_new_server.go
+++ b/pkg/server/server_controller_new_server.go
@@ -323,7 +323,7 @@ func makeSharedProcessTenantServerConfig(
        // see: https://github.com/cockroachdb/cockroach/issues/84588.
        useStore := kvServerCfg.SQLConfig.TempStorageConfig.Spec
        tempStorageCfg := base.TempStorageConfigFromEnv(
-               ctx, st, useStore, "" /* parentDir */, kvServerCfg.SQLConfig.TempStorageConfig.Mon.MaximumBytes())
+               ctx, st, useStore, "" /* parentDir */, base.DefaultTempStorageMaxSizeBytes)
        // TODO(knz): Make tempDir configurable.
        tempDir := useStore.Path
        if tempStorageCfg.Path, err = fs.CreateTempDir(tempDir, TempDirPrefix, stopper); err != nil {
```
```
# create roachprod cluster
roachprod create $CLUSTER -n 4
roachprod setup-ssh $CLUSTER

# setup cockroach
./dev build short --cross
roachprod put $CLUSTER:1-3 artifacts/cockroach-short cockroach
roachprod start $CLUSTER:1-3

# setup workload
./dev build workload --cross
roachprod put $CLUSTER:4 artifacts/workload

# system tenant
roachprod ssh $CLUSTER:4 "./workload init tpch {pgurl:1}"
roachprod ssh $CLUSTER:4 "./workload run tpch {pgurl:1-3} --duration=30m" > system-tenant.txt 2>&1
awk 'BEGIN{print "Line,Query,NumRows,Seconds"}{if ($13 == "seconds") print $6","$7","$9","$12}' system-tenant.txt > system-tenant.csv

# shared tenant
roachprod ssh $CLUSTER:4 "./workload init tpch {pgurl:1} --tenant-type=shared"
roachprod ssh $CLUSTER:4 "./workload run tpch {pgurl:1-3} --duration=30m --tenant-type=shared" > shared-tenant.txt 2>&1
awk 'BEGIN{print "Line,Query,NumRows,Seconds"}{if ($13 == "seconds") print $6","$7","$9","$12}' shared-tenant.txt > shared-tenant.csv
```

@rafiss rafiss removed the T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) label Apr 6, 2023
@rafiss rafiss added the T-multitenant Issues owned by the multi-tenant virtual team label Apr 6, 2023
@knz
Contributor

knz commented Apr 6, 2023

Here are the benchstat results for the measurements above:

```
        │ system-bench.log │            shared-bench.log             │
        │       s/op       │     s/op      vs base                   │
Q1             9.575 ± 11%    7.800 ± 16%  -18.54% (p=0.007 n=68+64)
Q2             7.090 ±  7%   11.230 ±  8%  +58.39% (p=0.000 n=68+63)
Q3             6.150 ±  9%    6.720 ±  8%   +9.27% (p=0.021 n=67+63)
Q4             5.150 ±  9%    3.220 ±  7%  -37.48% (p=0.000 n=67+63)
Q5             6.420 ± 12%    7.050 ± 11%        ~ (p=0.077 n=67+63)
Q6             14.53 ±  8%    14.69 ±  8%        ~ (p=0.893 n=66+63)
Q7             16.98 ±  7%    19.68 ±  6%  +15.94% (p=0.003 n=66+61)
Q8             3.015 ± 10%    3.110 ±  8%        ~ (p=0.480 n=66+61)
Q9             23.58 ±  5%    16.81 ± 14%  -28.71% (p=0.000 n=64+61)
Q10            5.465 ±  8%    4.320 ±  8%  -20.95% (p=0.000 n=64+61)
Q11            2.690 ± 10%    2.520 ± 16%        ~ (p=0.949 n=64+61)
Q12            15.77 ±  9%    10.52 ±  9%  -33.32% (p=0.000 n=64+60)
Q13            3.370 ±  8%    3.820 ± 12%  +13.35% (p=0.007 n=64+60)
Q14            1.530 ± 13%    1.375 ± 11%        ~ (p=0.166 n=64+60)
Q15            31.70 ± 14%    45.72 ± 27%  +44.23% (p=0.000 n=62+59)
Q16            2.045 ± 11%    2.500 ±  6%  +22.25% (p=0.000 n=62+59)
Q17            2.395 ±  9%    2.470 ± 17%        ~ (p=0.194 n=62+59)
Q18            6.680 ±  9%    6.665 ±  8%        ~ (p=0.753 n=62+58)
Q19            1.455 ± 11%    1.055 ± 12%  -27.49% (p=0.000 n=62+58)
Q20            24.74 ±  7%    33.14 ±  6%  +33.95% (p=0.000 n=61+57)
Q21            16.63 ± 11%    17.49 ±  4%        ~ (p=0.431 n=60+56)
Q22            1.650 ± 12%    1.955 ± 16%  +18.48% (p=0.006 n=60+56)
geomean        6.226          6.202         -0.37%
```

@knz
Contributor

knz commented Apr 6, 2023

Yahor suggests re-running with a 1-node cluster, and using BACKUP/RESTORE to fix the data.

@ecwall
Contributor

ecwall commented Apr 6, 2023

Results:

```
benchstat system-benchstat-input.txt shared-benchstat-input.txt
name   old s/op   new s/op    delta
[q1]   31.3 ±31%   30.4 ±26%    ~     (p=0.534 n=35+35)
[q2]   17.7 ±23%   18.3 ±30%    ~     (p=0.205 n=34+35)
[q3]   20.8 ±25%   20.4 ±38%    ~     (p=0.935 n=33+34)
[q4]   7.65 ±21%   7.98 ±18%    ~     (p=0.118 n=34+33)
[q5]   15.2 ±22%   14.8 ±21%    ~     (p=0.530 n=33+33)
[q6]   30.0 ±42%   28.7 ±42%    ~     (p=0.134 n=33+34)
[q7]   33.1 ±15%   32.3 ±20%    ~     (p=0.379 n=32+33)
[q8]   5.00 ±19%   4.97 ±18%    ~     (p=0.717 n=29+31)
[q9]   44.7 ±12%   41.9 ±16%  -6.24%  (p=0.001 n=31+33)
[q10]  10.4 ±10%   10.5 ±31%    ~     (p=0.550 n=31+33)
[q11]  4.74 ±42%   4.56 ±32%    ~     (p=0.383 n=32+32)
[q12]  32.3 ±47%   30.7 ±44%    ~     (p=0.136 n=32+32)
[q13]  9.17 ±39%   8.84 ±27%    ~     (p=0.617 n=31+31)
[q14]  3.17 ±26%   3.29 ±34%    ~     (p=0.343 n=31+32)
[q15]  50.6 ±59%  49.4 ±128%    ~     (p=0.244 n=24+25)
[q16]  4.99 ±46%   4.63 ±55%    ~     (p=0.178 n=28+30)
[q17]  1.67 ±27%   1.71 ±41%    ~     (p=0.651 n=26+30)
[q18]  23.8 ±17%   22.8 ±31%    ~     (p=0.337 n=26+27)
[q19]  2.33 ±39%   2.14 ±42%    ~     (p=0.079 n=27+29)
[q20]  61.5 ±20%   58.7 ±26%    ~     (p=0.216 n=26+28)
[q21]  29.7 ±19%   28.6 ±19%    ~     (p=0.364 n=27+27)
[q22]  3.05 ±29%   2.74 ±39%  -9.99%  (p=0.011 n=26+28)
```

system-workload-output.txt
system-benchstat-input.txt

shared-workload-output.txt
shared-benchstat-input.txt

Setup:

```
# create roachprod cluster
roachprod create $CLUSTER -n 2
roachprod setup-ssh $CLUSTER

# setup cockroach
./dev build short --cross
roachprod put $CLUSTER:1 artifacts/cockroach-short cockroach
roachprod start $CLUSTER:1

# setup workload
./dev build workload --cross
roachprod put $CLUSTER:2 artifacts/workload

# system tenant
roachprod ssh $CLUSTER:2 "./workload init tpch {pgurl:1}"
roachprod ssh $CLUSTER:2 "./workload run tpch {pgurl:1} --duration=30m" > system-workload-output.txt 2>&1
awk '{if ($13 == "seconds") printf "Benchmark%s 1 %s s/op\n", $7, $12}' system-workload-output.txt > system-benchstat-input.txt

# shared tenant
roachprod sql $CLUSTER:1
> BACKUP INTO 'nodelocal://1/backup.out';
> CREATE TENANT "secondary-tenant";
> ALTER TENANT "secondary-tenant" START SERVICE SHARED;
> \connect postgres://root@localhost:26257/cluster:secondary-tenant/
> SET sql_safe_updates = false;
> RESTORE FROM LATEST IN 'nodelocal://1/backup.out';
roachprod ssh $CLUSTER:2 "./workload run tpch {pgurl:1} --duration=30m --tenant-type=shared" > shared-workload-output.txt 2>&1
awk '{if ($13 == "seconds") printf "Benchmark%s 1 %s s/op\n", $7, $12}' shared-workload-output.txt > shared-benchstat-input.txt
```

@knz
Contributor

knz commented Apr 10, 2023

The results are super noisy, but the average latency and error bars are the same in both deployment styles, so we can consider this a wash.

In any case, we're observing here:

  • that there is no significant performance drawback.
  • that the performance doesn't tank after a while due to rate restrictions and whatnot.

This is as much as we wanted to validate in v23.1. Through the v23.2 cycle, we can evaluate more specific workloads.
