Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
90097: kv: collect hot ranges per tenant on store r=koorosh a=koorosh

The main goal of this change is to provide ability to request hot ranges info per tenant (as part of changes for UA). To track down hot ranges per tenant, `ReplicaRankingMap` struct is added that works almost identically as existing `ReplicaRankings` struct with only difference that hottest replicas are stored in an underlying map structure to keep separately ranges for every tenant. Also, it was decided to not extend existing `ReplicaRankings` functionality because it is used for replica rebalancing that doesn't need to know about tenants.

This change might bring small overhead for self-hosted clusters as it will accumulate same hot ranges in both `replRankings` and `replRankingsByTenant` fields in Store.

Changes in next PRs are supposed to rely on this change.

Release note: None

Epic: https://cockroachlabs.atlassian.net/browse/CRDB-19124

92313: roachtest: fix transfer-leases r=erikgrinaker a=tbg

`checkNoLeases` verifies that once a node is drained, for each range
there is one replica of the other nodes that sees the lease on one of
the non-drained stores. The reason it asks justs for *one* replica to
have the lease as opposed to all of them is because some followers may
be behind.

However, even just the assumption that there is one can be violated.
The drain succeeds once the draining node has locally seen the lease
transfer succeed. It is likely the raft leader at this point, so it
will be the first one to see this event. Other nodes will only see
it after one additional round-trip (when they learn that the log
index has committed, and then go and apply it).

So going and looking for a replica that sees the new lease immediately
after the drain succeeds may fail.

Work around this by sleeping for one second before checking, which ought
to be enough, and is also a small enough delay to make sure that if
leases are actually not getting transferred, the check will continue to
fail (non-cooperative lease failover is on the order of multiple
seconds).

This commit also improves a debugging file which was previously
clobbered over the multiple iterations of the surrounding loop.

It also makes it clearer that we're pulling the range data from
each node in turn (we were previously hitting otherNodeID but asking
it to proxy to the node. Now we're hitting each node directly without
relying on the internal redirect, which is less confusing).

See: #91801 (comment)

Fixes #91801.

Epic: none
Release note: None


92834: tree: improve type-checking for placeholders with ambiguous type r=jordanlewis a=rafiss

fixes #90364

The key fix is to change the typeCheckSplitExprs function so that it marks _all_
placeholder indexes. This then causes the existing type-checking logic
in typeCheckOverloadedExprs to check all placeholder expressions, rather
than just ones that don't have type hints.

Release note (bug fix): Prepared statements that use type hints can now succeed type-checking in more cases when the placeholder type is ambiguous.

93047: base: remove old testing knobs r=andreimatei a=andreimatei

The testing knobs for startup migrations don't exist anymore.

Release note: None
Epic: None

93074: testutils: add missing comment r=andreimatei a=andreimatei

Release note: None
Epic: None

93079: insights: record more txn insights r=xinhaoz a=xinhaoz

Closes #93076

This commit adds the following fields at the txn level when recording
insights in the sql insights system:
- contention: total txn contention time
- start_time: txn start time
- end_time: txn end time
- auto_retry_reason: last reason for auto txn retry
- retry_count
- rows_written
- rows_read

In addition, the following fields have been moved from recording at
the stmt level to recording at the txn level for insights:
- user
- application_name

Release note: None

93160: bazci: insert `--config ci` if necessary r=healthy-pod a=rickystewart

All builds and tests in CI need this --config argument.

Epic: None
Release note: None

93177: dev: fix dev build error when cross building r=rickystewart a=healthy-pod

This is a temporary fix for the issue. In a future change, we should let beaver hub distinguish between normal and cross builds, and then have an actual fix for this.

Release note: None
Epic: none

93193: vendor: bump Pebble to 0d6d19018632 r=nicktrav a=itsbilal

```
0d6d1901 crossversion: don't stream TestMeta output on verbose
d05a6f0e vfs: use SequentialReadsOption in vfs.[Limited]Copy
b85fc64f merging_iter: don't relative-seek past prefix in isNextEntryDeleted
32ad55f8 *: use github.com/cockroachdb/datadriven
6b644274 sstable: don't fatal if file no longer exists during readahead
```

Fixes #93191.

Release note: None.

Co-authored-by: Andrii Vorobiov <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Xin Hao Zhang <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Co-authored-by: healthy-pod <[email protected]>
Co-authored-by: Bilal Akhtar <[email protected]>
  • Loading branch information
9 people committed Dec 7, 2022
10 parents 593bcb2 + b4d8726 + 4dbbafa + 33ead8e + 00c7330 + b849f61 + 69f16c6 + e6fae10 + 9d19eca + cd000ee commit 8248dce
Show file tree
Hide file tree
Showing 28 changed files with 468 additions and 107 deletions.
6 changes: 3 additions & 3 deletions DEPS.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -1485,10 +1485,10 @@ def go_deps():
patches = [
"@com_github_cockroachdb_cockroach//build/patches:com_github_cockroachdb_pebble.patch",
],
sha256 = "4452117f35d8c00d73e8384dc2ba3c9bb69bf4507b30d0fec2a4287c7f318efa",
strip_prefix = "github.com/cockroachdb/[email protected]20221205175550-4a63cdb3a71e",
sha256 = "b422de55eea4f2662a4e1b32807d699f4f7feb0fab40dc0e99455473561c689c",
strip_prefix = "github.com/cockroachdb/[email protected]20221206222826-0d6d19018632",
urls = [
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/pebble/com_github_cockroachdb_pebble-v0.0.0-20221205175550-4a63cdb3a71e.zip",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/pebble/com_github_cockroachdb_pebble-v0.0.0-20221206222826-0d6d19018632.zip",
],
)
go_repository(
Expand Down
2 changes: 1 addition & 1 deletion build/bazelutil/distdir_files.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -193,7 +193,7 @@ DISTDIR_FILES = {
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/google-api-go-client/com_github_cockroachdb_google_api_go_client-v0.80.1-0.20221117193156-6a9f7150cb93.zip": "b3378c579f4f4340403038305907d672c86f615f8233118a8873ebe4229c4f39",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/gostdlib/com_github_cockroachdb_gostdlib-v1.19.0.zip": "c4d516bcfe8c07b6fc09b8a9a07a95065b36c2855627cb3514e40c98f872b69e",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/logtags/com_github_cockroachdb_logtags-v0.0.0-20211118104740-dabe8e521a4f.zip": "1972c3f171f118add3fd9e64bcea6cbb9959a3b7fa0ada308e8a7310813fea74",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/pebble/com_github_cockroachdb_pebble-v0.0.0-20221205175550-4a63cdb3a71e.zip": "4452117f35d8c00d73e8384dc2ba3c9bb69bf4507b30d0fec2a4287c7f318efa",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/pebble/com_github_cockroachdb_pebble-v0.0.0-20221206222826-0d6d19018632.zip": "b422de55eea4f2662a4e1b32807d699f4f7feb0fab40dc0e99455473561c689c",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/redact/com_github_cockroachdb_redact-v1.1.3.zip": "7778b1e4485e4f17f35e5e592d87eb99c29e173ac9507801d000ad76dd0c261e",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/returncheck/com_github_cockroachdb_returncheck-v0.0.0-20200612231554-92cdbca611dd.zip": "ce92ba4352deec995b1f2eecf16eba7f5d51f5aa245a1c362dfe24c83d31f82b",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/sentry-go/com_github_cockroachdb_sentry_go-v0.6.1-cockroachdb.2.zip": "fbb2207d02aecfdd411b1357efe1192dbb827959e36b7cab7491731ac55935c9",
Expand Down
2 changes: 1 addition & 1 deletion dev
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ fi
set -euo pipefail

# Bump this counter to force rebuilding `dev` on all machines.
DEV_VERSION=64
DEV_VERSION=65

THIS_DIR=$(cd "$(dirname "$0")" && pwd)
BINARY_DIR=$THIS_DIR/bin/dev-versions
Expand Down
2 changes: 1 addition & 1 deletion go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -112,7 +112,7 @@ require (
github.com/cockroachdb/go-test-teamcity v0.0.0-20191211140407-cff980ad0a55
github.com/cockroachdb/gostdlib v1.19.0
github.com/cockroachdb/logtags v0.0.0-20211118104740-dabe8e521a4f
github.com/cockroachdb/pebble v0.0.0-20221205175550-4a63cdb3a71e
github.com/cockroachdb/pebble v0.0.0-20221206222826-0d6d19018632
github.com/cockroachdb/redact v1.1.3
github.com/cockroachdb/returncheck v0.0.0-20200612231554-92cdbca611dd
github.com/cockroachdb/stress v0.0.0-20220803192808-1806698b1b7b
Expand Down
4 changes: 2 additions & 2 deletions go.sum
Original file line number Diff line number Diff line change
Expand Up @@ -475,8 +475,8 @@ github.com/cockroachdb/gostdlib v1.19.0/go.mod h1:+dqqpARXbE/gRDEhCak6dm0l14AaTy
github.com/cockroachdb/logtags v0.0.0-20190617123548-eb05cc24525f/go.mod h1:i/u985jwjWRlyHXQbwatDASoW0RMlZ/3i9yJHE2xLkI=
github.com/cockroachdb/logtags v0.0.0-20211118104740-dabe8e521a4f h1:6jduT9Hfc0njg5jJ1DdKCFPdMBrp/mdZfCpa5h+WM74=
github.com/cockroachdb/logtags v0.0.0-20211118104740-dabe8e521a4f/go.mod h1:Vz9DsVWQQhf3vs21MhPMZpMGSht7O/2vFW2xusFUVOs=
github.com/cockroachdb/pebble v0.0.0-20221205175550-4a63cdb3a71e h1:ASgNX7mpOQnyi6P3ZMXgfB0V9jUAzuLVDxL5AV7oSP0=
github.com/cockroachdb/pebble v0.0.0-20221205175550-4a63cdb3a71e/go.mod h1:qf9bLis2yy1XyNYD01wvIHPabuC1STzQsvGibYVsom4=
github.com/cockroachdb/pebble v0.0.0-20221206222826-0d6d19018632 h1:XD+3KAuihY5B+bvwmvfq/RjFTj8AA/wj5Oq+IB/WSxg=
github.com/cockroachdb/pebble v0.0.0-20221206222826-0d6d19018632/go.mod h1:8vvNzfaCFGp5Yvnqu0+a1LCL5i+NCID7YsNdhe0xhM8=
github.com/cockroachdb/redact v1.1.3 h1:AKZds10rFSIj7qADf0g46UixK8NNLwWTNdCIGS5wfSQ=
github.com/cockroachdb/redact v1.1.3/go.mod h1:BVNblN9mBWFyMyqK1k3AAiSxhvhfK2oOZZ2lK+dpvRg=
github.com/cockroachdb/returncheck v0.0.0-20200612231554-92cdbca611dd h1:KFOt5I9nEKZgCnOSmy8r4Oykh8BYQO8bFOTgHDS8YZA=
Expand Down
1 change: 0 additions & 1 deletion pkg/base/testing_knobs.go
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,6 @@ type TestingKnobs struct {
SQLTypeSchemaChanger ModuleTestingKnobs
GCJob ModuleTestingKnobs
PGWireTestingKnobs ModuleTestingKnobs
StartupMigrationManager ModuleTestingKnobs
DistSQL ModuleTestingKnobs
SQLEvalContext ModuleTestingKnobs
NodeLiveness ModuleTestingKnobs
Expand Down
12 changes: 12 additions & 0 deletions pkg/cmd/bazci/bazci.go
Original file line number Diff line number Diff line change
Expand Up @@ -333,6 +333,18 @@ func bazciImpl(cmd *cobra.Command, args []string) error {
}
args = append(args, fmt.Sprintf("--build_event_binary_file=%s", bepLoc))
args = append(args, fmt.Sprintf("--bes_backend=grpc://127.0.0.1:%d", port))
// Insert `--config ci` if it's not already in the args list.
hasCiConfig := false
for idx, arg := range args {
if arg == "--config=ci" || arg == "--config=cinolint" ||
(arg == "--config" && idx < len(args)-1 && (args[idx+1] == "ci" || args[idx+1] == "cinolint")) {
hasCiConfig = true
break
}
}
if !hasCiConfig {
args = append(args, "--config", "ci")
}
fmt.Println("running bazel w/ args: ", shellescape.QuoteCommand(args))
bazelCmd := exec.Command("bazel", args...)
bazelCmd.Stdout = os.Stdout
Expand Down
10 changes: 5 additions & 5 deletions pkg/cmd/dev/build.go
Original file line number Diff line number Diff line change
Expand Up @@ -167,13 +167,13 @@ func (d *dev) build(cmd *cobra.Command, commandLine []string) error {
return err
}
args = append(args, additionalBazelArgs...)
if buildutil.CrdbTestBuild {
args = append(args, "--build_event_binary_file=/tmp/path")
} else {
args = append(args, fmt.Sprintf("--build_event_binary_file=%s", filepath.Join(tmpDir, bepFileBasename)))
}

if cross == "" {
if buildutil.CrdbTestBuild {
args = append(args, "--build_event_binary_file=/tmp/path")
} else {
args = append(args, fmt.Sprintf("--build_event_binary_file=%s", filepath.Join(tmpDir, bepFileBasename)))
}
logCommand("bazel", args...)
if err := d.exec.CommandContextInheritingStdStreams(ctx, "bazel", args...); err != nil {
return err
Expand Down
28 changes: 17 additions & 11 deletions pkg/cmd/roachtest/tests/quit.go
Original file line number Diff line number Diff line change
Expand Up @@ -202,10 +202,6 @@ ALTER TABLE t SPLIT AT TABLE generate_series(%[1]d,%[1]d-99,-1)`, i)); err != ni
// checkNoLeases verifies that no range has a lease on the node
// that's just been shut down.
func (q *quitTest) checkNoLeases(ctx context.Context, nodeID int) {
// We need to use SQL against a node that's not the one we're
// shutting down.
otherNodeID := 1 + nodeID%q.c.Spec().NodeCount

// Now we're going to check two things:
//
// 1) *immediately*, that every range in the cluster has a lease
Expand All @@ -218,12 +214,21 @@ func (q *quitTest) checkNoLeases(ctx context.Context, nodeID int) {
// drain does not wait for followers to catch up.
// https://github.com/cockroachdb/cockroach/issues/47100
//
// Additionally, the way the test is architected right now has a tiny race:
// when n3 has transferred the lease, the result is visible to n3, but we
// are only checking the other nodes. Even if some of them must have acked
// the raft log entry, there is an additional delay until they apply it. So
// we may still, in this test, find that a node has drained and there is a
// lease transfer that is not yet visible (= has applied) on any other
// node. To work around this, we sleep for one second prior to checking.
//
// 2) *eventually* that every other node than nodeID has no range
// replica whose lease refers to nodeID, i.e. the followers
// have all caught up.
// Note: when issue #47100 is fixed, this 2nd condition
// must be true immediately -- drain is then able to wait
// for all followers to learn who the new leaseholder is.
time.Sleep(time.Second)

if err := testutils.SucceedsSoonError(func() error {
// To achieve that, we ask first each range in turn for its range
Expand All @@ -248,18 +253,18 @@ func (q *quitTest) checkNoLeases(ctx context.Context, nodeID int) {
// Get the report via HTTP.
// Flag -s is to remove progress on stderr, so that the buffer
// contains the JSON of the response and nothing else.
adminAddrs, err := q.c.InternalAdminUIAddr(ctx, q.t.L(), q.c.Node(otherNodeID))
adminAddrs, err := q.c.InternalAdminUIAddr(ctx, q.t.L(), q.c.Node(i))
if err != nil {
q.Fatal(err)
}
result, err := q.c.RunWithDetailsSingleNode(ctx, q.t.L(), q.c.Node(otherNodeID),
"curl", "-s", fmt.Sprintf("http://%s/_status/ranges/%d",
adminAddrs[0], i))
result, err := q.c.RunWithDetailsSingleNode(ctx, q.t.L(), q.c.Node(i),
"curl", "-s", fmt.Sprintf("http://%s/_status/ranges/local",
adminAddrs[0]))
if err != nil {
q.Fatal(err)
}
// Persist the response to artifacts to aid debugging. See #75438.
_ = os.WriteFile(filepath.Join(q.t.ArtifactsDir(), "status_ranges.json"),
_ = os.WriteFile(filepath.Join(q.t.ArtifactsDir(), fmt.Sprintf("status_ranges_n%d.json", i)),
[]byte(result.Stdout), 0644,
)
// We need just a subset of the response. Make an ad-hoc
Expand Down Expand Up @@ -342,10 +347,11 @@ func (q *quitTest) checkNoLeases(ctx context.Context, nodeID int) {
q.Fatal(err)
}

// For good measure, also write to the table. This ensures it remains
// available. We pick a node that's not the drained node.
otherNodeID := 1 + nodeID%q.c.Spec().NodeCount
db := q.c.Conn(ctx, q.t.L(), otherNodeID)
defer db.Close()
// For good measure, also write to the table. This ensures it
// remains available.
if _, err := db.ExecContext(ctx, `UPDATE t SET y = y + 1`); err != nil {
q.Fatal(err)
}
Expand Down
79 changes: 79 additions & 0 deletions pkg/kv/kvserver/replica_rankings.go
Original file line number Diff line number Diff line change
Expand Up @@ -199,3 +199,82 @@ func (pq *rrPriorityQueue) Pop() interface{} {
pq.entries = old[0 : n-1]
return item
}

// ReplicaRankingMap maintains top-k orderings of the replicas per tenant in a store by QPS.
type ReplicaRankingMap struct {
mu struct {
syncutil.Mutex
items RRAccumulatorByTenant
}
}

// NewReplicaRankingsMap returns a new ReplicaRankingMap struct.
func NewReplicaRankingsMap() *ReplicaRankingMap {
return &ReplicaRankingMap{}
}

// NewAccumulator returns a new rrAccumulator.
func (rr *ReplicaRankingMap) NewAccumulator() *RRAccumulatorByTenant {
return &RRAccumulatorByTenant{}
}

// Update sets the accumulator for replica tracking to be the passed in value.
func (rr *ReplicaRankingMap) Update(acc *RRAccumulatorByTenant) {
rr.mu.Lock()
rr.mu.items = *acc
rr.mu.Unlock()
}

// TopQPS returns the highest QPS CandidateReplicas that are tracked.
func (rr *ReplicaRankingMap) TopQPS(tenantID roachpb.TenantID) []CandidateReplica {
rr.mu.Lock()
defer rr.mu.Unlock()
r, ok := rr.mu.items[tenantID]
if !ok {
return []CandidateReplica{}
}
if r.Len() > 0 {
r.entries = consumeAccumulator(&r)
rr.mu.items[tenantID] = r
}
return r.entries
}

// RRAccumulatorByTenant accumulates replicas per tenant to update the replicas tracked by ReplicaRankingMap.
// It should be used in the same way as RRAccumulator (see doc string).
type RRAccumulatorByTenant map[roachpb.TenantID]rrPriorityQueue

// AddReplica adds a replica to the replica accumulator.
func (a RRAccumulatorByTenant) AddReplica(repl CandidateReplica) {
// Do not consider ranges as hot when they are accessed once or less times.
if repl.QPS() <= 1 {
return
}

tID, ok := repl.Repl().TenantID()
if !ok {
return
}

r, ok := a[tID]
if !ok {
q := rrPriorityQueue{
val: func(r CandidateReplica) float64 { return r.QPS() },
}
heap.Push(&q, repl)
a[tID] = q
return
}

if r.Len() < numTopReplicasToTrack {
heap.Push(&r, repl)
a[tID] = r
return
}

if repl.QPS() > r.entries[0].QPS() {
heap.Pop(&r)
heap.Push(&r, repl)
a[tID] = r
}
}
70 changes: 70 additions & 0 deletions pkg/kv/kvserver/replica_rankings_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -484,3 +484,73 @@ func TestReadLoadMetricAccounting(t *testing.T) {
})
}
}

func TestNewReplicaRankingsMap(t *testing.T) {
defer leaktest.AfterTest(t)()
defer log.Scope(t).Close(t)

rr := NewReplicaRankingsMap()

type testCase struct {
tenantID uint64
qps float64
}

testCases := [][]testCase{
{},
{{1, 1}, {1, 2}, {1, 3}, {1, 4}},
{{1, 1}, {1, 2}, {2, 0}, {3, 0}},
{{1, 1}, {1, 2}, {1, 3}, {1, 4},
{2, 1}, {2, 2}, {2, 3}, {2, 4},
{3, 1}, {3, 2}, {3, 3}, {3, 4}},
}

for _, tc := range testCases {
acc := rr.NewAccumulator()

// Randomize the order of the inputs each time the test is run.
rand.Shuffle(len(tc), func(i, j int) {
tc[i], tc[j] = tc[j], tc[i]
})

expectedReplicasPerTenant := make(map[uint64]int)

for i, c := range tc {
cr := candidateReplica{
Replica: &Replica{RangeID: roachpb.RangeID(i)},
qps: c.qps,
}
cr.mu.tenantID = roachpb.MustMakeTenantID(c.tenantID)
acc.AddReplica(cr)

if c.qps <= 1 {
continue
}

if l, ok := expectedReplicasPerTenant[c.tenantID]; ok {
expectedReplicasPerTenant[c.tenantID] = l + 1
} else {
expectedReplicasPerTenant[c.tenantID] = 1
}
}
rr.Update(acc)

for tID, count := range expectedReplicasPerTenant {
repls := rr.TopQPS(roachpb.MustMakeTenantID(tID))
if len(repls) != count {
t.Errorf("wrong number of replicas in output; got: %v; want: %v", repls, tc)
continue
}
for i := 0; i < len(repls)-1; i++ {
if repls[i].QPS() < repls[i+1].QPS() {
t.Errorf("got %f for %d'th element; it's smaller than QPS of the next element %f", repls[i].QPS(), i, repls[i+1].QPS())
break
}
}
replsCopy := rr.TopQPS(roachpb.MustMakeTenantID(tID))
if !reflect.DeepEqual(repls, replsCopy) {
t.Errorf("got different replicas on second call to topQPS; first call: %v, second call: %v", repls, replsCopy)
}
}
}
}
Loading

0 comments on commit 8248dce

Please sign in to comment.