Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
53911: roachtest: fix tpchvec/perf in some cases r=yuzefovich a=yuzefovich

Previously, whenever a slowness threshold is exceeded, we would run
EXPLAIN ANALYZE (DEBUG) on the query at fault using the same connection
for all cluster setups. This would result in running the query only with
vectorize=on which is not what we want. We will now be opening up new
connections to the database after the cluster settings have been updated
in order to get the correct behavior for each of the setups.

Informs: #53884.

Release note: None

54072: server: introduce Test{ClusterConnectivity,JoinVersionGate} r=irfansharif a=irfansharif

Re-adding a few tests that I didn't check in while working on #52526.

---

TestClusterConnectivity sets up an uninitialized cluster with custom join
flags (individual nodes point to specific others, constructed using
connectivity matrices). Included in the test configurations are the more
"standard" ones where every node points to n1, or where there are
bidirectional links set up between nodes. It tests various topologies
and ensures that cluster IDs are distributed throughout connected
components, and store/node IDs are allocated in the way we'd expect them
to be.

TestJoinVersionGate checks to see that improperly versioned cockroach
nodes are not able to join a cluster.

Release note: None


54159: opt: reduce allocations in Implicator and JoinOrderBuilder r=RaduBerinde a=mgartner

#### partialidx: lazily initialize the implicator constraint cache

This commit makes the constructing of the Implicator's constrain cache
map lazy. Instead of making the map in the `Init` method, it is made
only when it is needed.

This fixes performance regressions in microbenchmarks.

    name                                  old time/op  new time/op  delta
    Phases/kv-read-const/Normalize-16      170ns ± 0%   139ns ± 0%  -18.66%  (p=0.008 n=5+5)
    Phases/kv-read-const/OptBuild-16       173ns ± 1%   141ns ± 1%  -18.38%  (p=0.008 n=5+5)
    Phases/kv-read-const/Explore-16        183ns ± 1%   151ns ± 0%  -17.70%  (p=0.008 n=5+5)
    Phases/kv-read-const/ExecBuild-16      674ns ± 2%   636ns ± 1%   -5.58%  (p=0.008 n=5+5)
    Phases/kv-read/ExecBuild-16           13.3µs ± 1%  13.1µs ± 1%   -1.78%  (p=0.008 n=5+5)
    Phases/tpcc-delivery/ExecBuild-16     29.4µs ± 1%  28.9µs ± 0%   -1.64%  (p=0.016 n=5+4)
    Phases/tpcc-stock-level/Explore-16     153µs ± 1%   151µs ± 1%   -1.24%  (p=0.008 n=5+5)
    Phases/kv-read-no-prep/Explore-16     42.3µs ± 1%  41.9µs ± 0%   -0.85%  (p=0.008 n=5+5)
    Phases/tpcc-delivery/OptBuild-16      12.1µs ± 1%  12.1µs ± 0%   -0.65%  (p=0.032 n=5+5)
    Phases/kv-read/Parse-16               2.49ns ± 1%  2.48ns ± 0%     ~     (p=0.413 n=5+4)
    Phases/kv-read/OptBuild-16            6.46µs ± 2%  6.44µs ± 1%     ~     (p=0.841 n=5+5)
    Phases/kv-read/Normalize-16           6.48µs ± 3%  6.48µs ± 2%     ~     (p=1.000 n=5+5)
    Phases/kv-read/Explore-16             12.1µs ± 1%  11.9µs ± 2%     ~     (p=0.056 n=5+5)
    Phases/kv-read-no-prep/OptBuild-16    32.4µs ± 3%  32.2µs ± 1%     ~     (p=1.000 n=5+5)
    Phases/kv-read-no-prep/Normalize-16   34.9µs ± 1%  34.7µs ± 2%     ~     (p=0.222 n=5+5)
    Phases/kv-read-no-prep/ExecBuild-16   44.0µs ± 3%  43.7µs ± 1%     ~     (p=0.310 n=5+5)
    Phases/kv-read-const/Parse-16         2.60ns ± 1%  2.60ns ± 1%     ~     (p=0.643 n=5+5)
    Phases/tpcc-new-order/Normalize-16    12.8µs ± 1%  13.9µs ±10%     ~     (p=0.690 n=5+5)
    Phases/tpcc-new-order/ExecBuild-16    38.5µs ± 1%  38.6µs ± 2%     ~     (p=1.000 n=5+5)
    Phases/tpcc-delivery/Parse-16         2.55ns ± 1%  2.56ns ± 1%     ~     (p=0.921 n=5+5)
    Phases/tpcc-delivery/Normalize-16     12.6µs ± 2%  12.5µs ± 1%     ~     (p=0.151 n=5+5)
    Phases/tpcc-delivery/Explore-16       28.0µs ± 2%  27.6µs ± 1%     ~     (p=0.063 n=5+5)
    Phases/tpcc-stock-level/OptBuild-16   52.8µs ± 1%  52.9µs ± 1%     ~     (p=1.000 n=5+5)
    Phases/tpcc-stock-level/Normalize-16  53.7µs ± 1%  53.7µs ± 1%     ~     (p=1.000 n=5+5)
    Phases/tpcc-stock-level/ExecBuild-16   160µs ± 1%   161µs ± 1%     ~     (p=0.222 n=5+5)
    Phases/tpcc-new-order/Explore-16      36.2µs ± 0%  36.7µs ± 1%   +1.47%  (p=0.016 n=4+5)
    Phases/kv-read-no-prep/Parse-16       12.6µs ± 1%  12.8µs ± 1%   +1.58%  (p=0.016 n=4+5)
    Phases/tpcc-new-order/Parse-16        2.58ns ± 1%  2.67ns ± 7%   +3.33%  (p=0.048 n=5+5)
    Phases/tpcc-stock-level/Parse-16      2.52ns ± 1%  2.63ns ± 4%   +4.29%  (p=0.016 n=5+5)
    Phases/tpcc-new-order/OptBuild-16     13.0µs ± 1%  14.7µs ± 2%  +12.50%  (p=0.008 n=5+5)

Release note: None

#### opt: embed JoinOrderBuilder in Optimizer

This commit embeds the JoinOrderBuilder in the Optimizer struct. This
avoids allocating a new JoinOrderBuilder on the heap every time
`Optimizer.Init` is called.

BenchmarkPhase microbenchmarks comparing the current master branch to
this commit:

    name                                  old time/op  new time/op  delta
    Phases/kv-read-const/Normalize-16      171ns ± 1%    77ns ± 1%  -55.11%  (p=0.008 n=5+5)
    Phases/kv-read-const/OptBuild-16       175ns ± 1%    79ns ± 2%  -54.83%  (p=0.008 n=5+5)
    Phases/kv-read-const/Explore-16        189ns ± 7%    89ns ± 0%  -53.14%  (p=0.008 n=5+5)
    Phases/tpcc-new-order/OptBuild-16     16.6µs ± 6%  12.7µs ± 2%  -23.47%  (p=0.008 n=5+5)
    Phases/tpcc-new-order/Normalize-16    14.7µs ± 8%  12.4µs ± 1%  -15.39%  (p=0.008 n=5+5)
    Phases/kv-read-const/ExecBuild-16      687ns ± 4%   592ns ± 1%  -13.89%  (p=0.008 n=5+5)
    Phases/tpcc-new-order/Parse-16        2.94ns ± 8%  2.58ns ± 2%  -12.23%  (p=0.008 n=5+5)
    Phases/tpcc-delivery/Parse-16         2.86ns ± 8%  2.54ns ± 1%  -11.33%  (p=0.008 n=5+5)
    Phases/tpcc-new-order/ExecBuild-16    43.0µs ±10%  38.6µs ± 0%  -10.21%  (p=0.016 n=5+4)
    Phases/tpcc-new-order/Explore-16      39.0µs ± 3%  36.6µs ± 0%   -6.12%  (p=0.008 n=5+5)
    Phases/tpcc-delivery/OptBuild-16      12.4µs ± 1%  12.0µs ± 0%   -3.49%  (p=0.008 n=5+5)
    Phases/kv-read/ExecBuild-16           13.4µs ± 0%  12.9µs ± 0%   -3.37%  (p=0.008 n=5+5)
    Phases/kv-read/Explore-16             12.1µs ± 1%  11.7µs ± 2%   -3.04%  (p=0.008 n=5+5)
    Phases/tpcc-delivery/Normalize-16     12.8µs ± 1%  12.4µs ± 1%   -3.00%  (p=0.008 n=5+5)
    Phases/kv-read-no-prep/ExecBuild-16   44.1µs ± 2%  43.2µs ± 1%   -2.16%  (p=0.008 n=5+5)
    Phases/kv-read/Normalize-16           6.48µs ± 1%  6.36µs ± 1%   -1.73%  (p=0.008 n=5+5)
    Phases/kv-read-no-prep/Explore-16     42.6µs ± 1%  41.9µs ± 1%   -1.55%  (p=0.032 n=5+5)
    Phases/tpcc-delivery/ExecBuild-16     29.5µs ± 1%  29.1µs ± 0%   -1.41%  (p=0.008 n=5+5)
    Phases/kv-read-no-prep/OptBuild-16    32.2µs ± 0%  31.7µs ± 1%   -1.34%  (p=0.008 n=5+5)
    Phases/kv-read-no-prep/Normalize-16   34.8µs ± 1%  34.4µs ± 1%   -1.25%  (p=0.016 n=5+4)
    Phases/tpcc-stock-level/OptBuild-16   53.1µs ± 0%  52.7µs ± 0%   -0.83%  (p=0.016 n=5+5)
    Phases/kv-read/Parse-16               2.49ns ± 0%  2.49ns ± 1%     ~     (p=0.952 n=4+5)
    Phases/kv-read/OptBuild-16            6.48µs ± 0%  6.44µs ± 1%     ~     (p=0.310 n=5+5)
    Phases/kv-read-no-prep/Parse-16       12.7µs ± 2%  12.7µs ± 1%     ~     (p=0.508 n=5+5)
    Phases/kv-read-const/Parse-16         2.59ns ± 1%  2.59ns ± 1%     ~     (p=0.857 n=5+5)
    Phases/tpcc-delivery/Explore-16       27.9µs ± 0%  27.9µs ± 2%     ~     (p=0.310 n=5+5)
    Phases/tpcc-stock-level/Parse-16      2.57ns ± 2%  2.54ns ± 3%     ~     (p=0.254 n=5+5)
    Phases/tpcc-stock-level/Normalize-16  53.8µs ± 0%  53.5µs ± 2%     ~     (p=0.151 n=5+5)
    Phases/tpcc-stock-level/Explore-16     153µs ± 0%   152µs ± 3%     ~     (p=0.151 n=5+5)
    Phases/tpcc-stock-level/ExecBuild-16   162µs ± 1%   161µs ± 1%     ~     (p=0.095 n=5+5)

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
  • Loading branch information
4 people committed Sep 10, 2020
4 parents 6227db0 + 53e9ebf + 78b5535 + 6949697 commit 8547a97
Show file tree
Hide file tree
Showing 9 changed files with 435 additions and 47 deletions.
8 changes: 8 additions & 0 deletions pkg/base/test_server_args.go
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ package base

import (
"context"
"net"
"time"

"github.com/cockroachdb/cockroach/pkg/roachpb"
Expand Down Expand Up @@ -44,6 +45,13 @@ type TestServerArgs struct {
// is always set to true when the server is started via a TestCluster.
PartOfCluster bool

// Listener (if nonempty) is the listener to use for all incoming RPCs.
// If a listener is installed, it informs the RPC `Addr` used below. The
// Server itself knows to close it out. This is useful for when a test wants
// manual control over how the join flags (`JoinAddr`) are populated, and
// installs listeners manually to know which addresses to point to.
Listener net.Listener

// Addr (if nonempty) is the RPC address to use for the test server.
Addr string
// SQLAddr (if nonempty) is the SQL address to use for the test server.
Expand Down
11 changes: 10 additions & 1 deletion pkg/cmd/roachtest/tpchvec.go
Original file line number Diff line number Diff line change
Expand Up @@ -331,9 +331,18 @@ func (p *tpchVecPerfTest) postTestRunHook(
// "catch" the slowness).
for setupIdx, setup := range runConfig.clusterSetups {
performClusterSetup(t, conn, setup)
// performClusterSetup has changed the cluster settings;
// however, the session variables might contain the old values,
// so we will open up new connections for each of the setups in
// order to get the correct cluster setup on each.
tempConn := c.Conn(ctx, 1)
defer tempConn.Close()
if _, err := tempConn.Exec("USE tpch;"); err != nil {
t.Fatal(err)
}
for i := 0; i < runConfig.numRunsPerQuery; i++ {
t.Status(fmt.Sprintf("\nRunning EXPLAIN ANALYZE (DEBUG) for setup=%s\n", runConfig.setupNames[setupIdx]))
rows, err := conn.Query(fmt.Sprintf(
rows, err := tempConn.Query(fmt.Sprintf(
"EXPLAIN ANALYZE (DEBUG) %s;", tpch.QueriesByNumber[queryNum],
))
if err != nil {
Expand Down
334 changes: 334 additions & 0 deletions pkg/server/connectivity_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,334 @@
// Copyright 2016 The Cockroach Authors.
//
// Use of this software is governed by the Business Source License
// included in the file licenses/BSL.txt.
//
// As of the Change Date specified in that file, in accordance with
// the Business Source License, use of this software will be governed
// by the Apache License, Version 2.0, included in the file
// licenses/APL.txt.

package server_test

import (
"context"
"fmt"
"net"
"sort"
"sync"
"testing"

"github.com/cockroachdb/cockroach/pkg/base"
"github.com/cockroachdb/cockroach/pkg/clusterversion"
"github.com/cockroachdb/cockroach/pkg/roachpb"
"github.com/cockroachdb/cockroach/pkg/server"
"github.com/cockroachdb/cockroach/pkg/server/serverpb"
"github.com/cockroachdb/cockroach/pkg/testutils"
"github.com/cockroachdb/cockroach/pkg/testutils/testcluster"
"github.com/cockroachdb/cockroach/pkg/util/leaktest"
"github.com/cockroachdb/cockroach/pkg/util/log"
"github.com/cockroachdb/cockroach/pkg/util/uuid"
"github.com/cockroachdb/errors"
"google.golang.org/grpc"
)

// TestClusterConnectivity sets up an uninitialized cluster with custom join
// flags (individual nodes point to specific others, instead of all pointing to
// n1), and tests that the cluster/node IDs are distributed correctly
// throughout.
func TestClusterConnectivity(t *testing.T) {
defer leaktest.AfterTest(t)()
defer log.Scope(t).Close(t)

// TODO(irfansharif): Teach TestServer to accept a list of join addresses
// instead of just one.

var testConfigurations = []struct {
// bootstrapNode controls which node is `cockroach init`-ialized.
// Everything is 0-indexed.
bootstrapNode int

// joinConfig[i] returns the node the i-th node is pointing to through
// its join flags. Everything is 0-indexed.
joinConfig []int
}{
// 0. Every node points to the first, including the first.
{0, []int{0, 0, 0, 0, 0}},

// 1. Every node points to the previous, except the first, which points to
// itself.
//
// 0 <-- 1 <-- 2 <-- 3 <-- 4
{0, []int{0, 0, 1, 2, 3}},

// 2. Same as previous, but a few links switched around.
//
// 0 <-- 2 <-- 1 <-- 3 <-- 4
{0, []int{0, 2, 0, 1, 3}},

// 3. Introduce a bidirectional link.
//
// 1 <-> 2 <-- 0 <-- 3
// 1 <-- 4
{1, []int{2, 2, 1, 0, 1}},

// 4. Same as above, but bootstrap the other node in the bidirectional
// link.
//
// 1 <-> 2 <-- 0 <-- 3
// 1 <-- 4
{2, []int{2, 2, 1, 0, 1}},

// 5. Another topology centered around node 1, which itself is pointed
// to node 0.
//
// 0 <-> 1 <-- 2
// 1 <-- 3
{0, []int{1, 0, 1, 1}},

// 6. Same as above, but bootstrapping the centered node directly.
//
// 0 <-> 1 <-- 2
// 1 <-- 3
{1, []int{1, 0, 1, 1}},

// TODO(irfansharif): We would really like to be able to set up test
// clusters that are only partially connected, and assert that only
// nodes that are supposed to find out about bootstrap, actually do.
// Something like:
//
// 0 <-> 1 <-- 2
// 5 <-- 4 <-- 3 <-- 5
//
// A version of this was originally prototyped in #52526 but the changes
// required in Test{Cluster,Server} were too invasive to justify at the
// time.
}

// getListener is a short hand to allocate a listener to an unbounded port.
getListener := func() net.Listener {
t.Helper()

listener, err := net.Listen("tcp", "127.0.0.1:0")
if err != nil {
t.Fatal(err)
}
return listener
}
baseServerArgs := base.TestServerArgs{
// We're going to manually control initialization in this test.
NoAutoInitializeCluster: true,
StoreSpecs: []base.StoreSpec{{InMemory: true}},
}

for i, test := range testConfigurations {
t.Run(fmt.Sprintf("topology=%d", i), func(t *testing.T) {
numNodes := len(test.joinConfig)
var serverArgsPerNode = make(map[int]base.TestServerArgs)

// We start off with installing a listener for each server. We
// pre-bind a listener so the kernel can go ahead and assign an
// address for us. We'll later use this address to populate join
// flags for neighboring nodes.
var listeners = make([]net.Listener, numNodes)
for i := 0; i < numNodes; i++ {
listener := getListener()

serverArg := baseServerArgs
serverArg.Listener = listener
serverArg.Addr = listener.Addr().String()
serverArgsPerNode[i] = serverArg
listeners[i] = listener
}

// We'll annotate the server args with the right join flags.
for i := 0; i < numNodes; i++ {
joinNode := test.joinConfig[i]
joinAddr := listeners[joinNode].Addr().String()

serverArg := serverArgsPerNode[i]
serverArg.JoinAddr = joinAddr
serverArgsPerNode[i] = serverArg
}

tcArgs := base.TestClusterArgs{
// Saves time in this test.
ReplicationMode: base.ReplicationManual,
ServerArgsPerNode: serverArgsPerNode,

// We have to start servers in parallel because we're looking to
// bootstrap the cluster manually in a separate thread. Each
// individual Server.Start is a blocking call (it waits for
// init). We want to start all of them in parallel to simulate a
// bunch of servers each waiting for init.
ParallelStart: true,
}

// The test structure here is a bit convoluted, but necessary given
// the current implementation of TestCluster. TestCluster.Start
// wants to wait for all the nodes in the test cluster to be fully
// initialized before returning. Given we're testing initialization
// behavior, we do all the real work in a separate thread and keep
// the main thread limited to simply starting and stopping the test
// cluster.
//
// NB: That aside, TestCluster very much wants to live on the main
// goroutine running the test. That's mostly to do with its internal
// error handling and the limitations imposed by
// https://golang.org/pkg/testing/#T.FailNow (which sits underneath
// t.Fatal).

tc := testcluster.NewTestCluster(t, numNodes, tcArgs)

var wg sync.WaitGroup
wg.Add(1)
go func() {
defer wg.Done()

// Attempt to bootstrap the cluster through the configured node.
bootstrapNode := test.bootstrapNode
testutils.SucceedsSoon(t, func() (e error) {
ctx := context.Background()
serv := tc.Server(bootstrapNode)

dialOpts, err := tc.Server(bootstrapNode).RPCContext().GRPCDialOptions()
if err != nil {
return err
}

conn, err := grpc.DialContext(ctx, serv.ServingRPCAddr(), dialOpts...)
if err != nil {
return err
}
defer func() {
_ = conn.Close()
}()

client := serverpb.NewInitClient(conn)
_, err = client.Bootstrap(context.Background(), &serverpb.BootstrapRequest{})
return err
})

// Wait to get a real cluster ID (doesn't always get populated
// right after bootstrap).
testutils.SucceedsSoon(t, func() error {
clusterID := tc.Server(bootstrapNode).ClusterID()
if clusterID.Equal(uuid.UUID{}) {
return errors.New("cluster ID still not recorded")
}
return nil
})

clusterID := tc.Server(bootstrapNode).ClusterID()
testutils.SucceedsSoon(t, func() error {
var nodeIDs []roachpb.NodeID
var storeIDs []roachpb.StoreID

// Sanity check that all the nodes we expect to join this
// network actually do (by checking they discover the right
// cluster ID). Also collect node/store IDs for below.
for i := 0; i < numNodes; i++ {
if got := tc.Server(i).ClusterID(); got != clusterID {
return errors.Newf("mismatched cluster IDs; %s (for node %d) != %s (for node %d)",
clusterID.String(), bootstrapNode, got.String(), i)
}

nodeIDs = append(nodeIDs, tc.Server(i).NodeID())
storeIDs = append(storeIDs, tc.Server(i).GetFirstStoreID())
}

sort.Slice(nodeIDs, func(i, j int) bool {
return nodeIDs[i] < nodeIDs[j]
})
sort.Slice(storeIDs, func(i, j int) bool {
return storeIDs[i] < storeIDs[j]
})

// Double check that we have the full set of node/store IDs
// we expect.
for i := 1; i <= len(nodeIDs); i++ {
expNodeID := roachpb.NodeID(i)
if got := nodeIDs[i-1]; got != expNodeID {
return errors.Newf("unexpected node ID; expected %s, got %s", expNodeID.String(), got.String())
}

expStoreID := roachpb.StoreID(i)
if got := storeIDs[i-1]; got != expStoreID {
return errors.Newf("unexpected store ID; expected %s, got %s", expStoreID.String(), got.String())
}
}

return nil
})
}()

// Start the test cluster. This is a blocking call, and expects the
// configured number of servers in the cluster to be fully
// initialized before it returns. Given that the initialization
// happens in the other thread, we'll only get past it after having
// bootstrapped the test cluster in the thread above.
tc.Start(t)
defer tc.Stopper().Stop(context.Background())

wg.Wait()
})
}
}

// TestJoinVersionGate checks to see that improperly versioned cockroach nodes
// are not able to join a cluster.
func TestJoinVersionGate(t *testing.T) {
defer leaktest.AfterTest(t)()
defer log.Scope(t).Close(t)

commonArg := base.TestServerArgs{
StoreSpecs: []base.StoreSpec{
{InMemory: true},
},
}

numNodes := 3
tcArgs := base.TestClusterArgs{
ReplicationMode: base.ReplicationManual, // Saves time in this test.
ServerArgs: commonArg,
ParallelStart: true,
}

tc := testcluster.StartTestCluster(t, numNodes, tcArgs)
defer tc.Stopper().Stop(context.Background())

testutils.SucceedsSoon(t, func() error {
for i := 0; i < numNodes; i++ {
clusterID := tc.Server(0).ClusterID()
got := tc.Server(i).ClusterID()

if got != clusterID {
return errors.Newf("mismatched cluster IDs; %s (for node %d) != %s (for node %d)", clusterID.String(), 0, got.String(), i)
}
}
return nil
})

var newVersion = clusterversion.TestingBinaryVersion
var oldVersion = prev(newVersion)

knobs := base.TestingKnobs{
Server: &server.TestingKnobs{
BinaryVersionOverride: oldVersion,
},
}

oldVersionServerArgs := commonArg
oldVersionServerArgs.Knobs = knobs
oldVersionServerArgs.JoinAddr = tc.Servers[0].ServingRPCAddr()

serv, err := tc.AddServer(oldVersionServerArgs)
if err != nil {
t.Fatal(err)
}
defer serv.Stop()

if err := serv.Start(); !errors.Is(errors.Cause(err), server.ErrIncompatibleBinaryVersion) {
t.Fatalf("expected error %s, got %v", server.ErrIncompatibleBinaryVersion.Error(), err.Error())
}
}
5 changes: 5 additions & 0 deletions pkg/server/testserver.go
Original file line number Diff line number Diff line change
Expand Up @@ -292,6 +292,11 @@ func (ts *TestServer) Node() interface{} {
return ts.node
}

// NodeID returns the ID of this node within its cluster.
func (ts *TestServer) NodeID() roachpb.NodeID {
return ts.rpcContext.NodeID.Get()
}

// Stopper returns the embedded server's Stopper.
func (ts *TestServer) Stopper() *stop.Stopper {
return ts.stopper
Expand Down
Loading

0 comments on commit 8547a97

Please sign in to comment.