Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
103945: jobs: add a Profiler tab to the job details page r=maryliag a=adityamaru

This change adds a Profiler tab to the job details page. This change also adds a row that allows collection of a cluster-wide CPU profile for 5 seconds, of all the samples corresponding to the job's execution.

Fixes: #102735

Release note (ui change): job details page now has a profiler tab for more advanced observability into a job's execution. Currently, we support collecting a cluster-wide CPU profile of the job.

104423: server: make SpanStats authoritative r=zachlite a=zachlite

This commit guarantees that SpanStats uses up-to-date range
descriptors on all nodes. The dependency on DistSender is removed
and is replaced with a dependency on rangedesc.Scanner.

rangedesc.Scanner is used to:
1) Locate nodes that house replicas of a span to avoid cluster-wide fan-outs.
2) Find Range Descriptors that touch a span, to build a SpanStatsResponse.

This commit also fixes #103809, where a SpanStatsResponse incorrectly returned
the replica_count for a span, instead of the range_count.

Epic: https://cockroachlabs.atlassian.net/browse/CRDB-24928
Issue: #103957
Release note (bug fix): SpanStats is no longer subject to stale
information, and should be considered authoritative.

104933: roachtest: fix regressions introduced by metamorphic arm64 and fips c… r=yuzefovich,renatolabs a=srosenberg

…hanges

There were two regressions introduced in [1]. This PR resolves them in a quick way. Subsequent PRs will address cpu architecture validation for attaching to pre-existing clusters, as well as enable benchmarks to run on arm64.

When using "local mode", don't fall back to amd64 if a test happens to be a benchmark. When attaching to a pre-existing cluster, pretend that its (cpu) architecture is the same as the one that was chosen for the given test.

[1] #103710

Epic: none
Fixes: #104678

Release note: None

104982: go.mod: bump Pebble to e9a280333ef7 r=jbowens a=jbowens

```
e9a28033 db: tighten missized tombstone metric count
17e0ab3c sharedcache: use a pool of workers for writes
3726f551 db: don't use internal types in ScanInternal
2fed2e71 sharedcache: refactor block math
```

Epic: none
Release note: None

104986: changefeedccl: add version gate to `SHOW CHANGEFEED JOBS` r=Xiang-Gu a=jayshrivastava

On 23.1 binaries, `SHOW CHANGEFEED JOBS` queries `crdb_internal.system_jobs`, which was added in 23.1. In 22.2-23.1 mixed version clusters, this query fails.

This change adds a version gate such that before an upgrade to 23.1 is finalized, `SHOW CHANGEFEED JOBS` will query `system.jobs` instead.

This change also adds regression testing in the `changefeed` logic test, ensuring that it runs in a mixed version environment.

Release note (bug fix): `SHOW CHANGEFEED JOBS` no
longer fails on 22.2, 23.1 mixed version clusters.

Epic: None

Co-authored-by: adityamaru <[email protected]>
Co-authored-by: Zach Lite <[email protected]>
Co-authored-by: Stan Rosenberg <[email protected]>
Co-authored-by: Jackson Owens <[email protected]>
Co-authored-by: Jayant Shrivastava <[email protected]>
  • Loading branch information
6 people committed Jun 15, 2023
6 parents 8dc422f + acd6d23 + 455148e + ab50817 + 7fc8834 + 1c227ea commit faaa732
Show file tree
Hide file tree
Showing 17 changed files with 326 additions and 98 deletions.
6 changes: 3 additions & 3 deletions DEPS.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -1565,10 +1565,10 @@ def go_deps():
patches = [
"@com_github_cockroachdb_cockroach//build/patches:com_github_cockroachdb_pebble.patch",
],
sha256 = "e58192b93c07d773152894f78a9b47808a4d90dc1649f53949b0f4eafa58ce2a",
strip_prefix = "github.com/cockroachdb/[email protected]20230614133735-32834aa62738",
sha256 = "2379e7d32e1579131ada504c332c8e938d85c2048fa57684a84a8400f123a844",
strip_prefix = "github.com/cockroachdb/[email protected]20230615160155-e9a280333ef7",
urls = [
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/pebble/com_github_cockroachdb_pebble-v0.0.0-20230614133735-32834aa62738.zip",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/pebble/com_github_cockroachdb_pebble-v0.0.0-20230615160155-e9a280333ef7.zip",
],
)
go_repository(
Expand Down
2 changes: 1 addition & 1 deletion build/bazelutil/distdir_files.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -313,7 +313,7 @@ DISTDIR_FILES = {
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/go-test-teamcity/com_github_cockroachdb_go_test_teamcity-v0.0.0-20191211140407-cff980ad0a55.zip": "bac30148e525b79d004da84d16453ddd2d5cd20528e9187f1d7dac708335674b",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/gostdlib/com_github_cockroachdb_gostdlib-v1.19.0.zip": "c4d516bcfe8c07b6fc09b8a9a07a95065b36c2855627cb3514e40c98f872b69e",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/logtags/com_github_cockroachdb_logtags-v0.0.0-20230118201751-21c54148d20b.zip": "ca7776f47e5fecb4c495490a679036bfc29d95bd7625290cfdb9abb0baf97476",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/pebble/com_github_cockroachdb_pebble-v0.0.0-20230614133735-32834aa62738.zip": "e58192b93c07d773152894f78a9b47808a4d90dc1649f53949b0f4eafa58ce2a",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/pebble/com_github_cockroachdb_pebble-v0.0.0-20230615160155-e9a280333ef7.zip": "2379e7d32e1579131ada504c332c8e938d85c2048fa57684a84a8400f123a844",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/redact/com_github_cockroachdb_redact-v1.1.5.zip": "11b30528eb0dafc8bc1a5ba39d81277c257cbe6946a7564402f588357c164560",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/returncheck/com_github_cockroachdb_returncheck-v0.0.0-20200612231554-92cdbca611dd.zip": "ce92ba4352deec995b1f2eecf16eba7f5d51f5aa245a1c362dfe24c83d31f82b",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/cockroachdb/sentry-go/com_github_cockroachdb_sentry_go-v0.6.1-cockroachdb.2.zip": "fbb2207d02aecfdd411b1357efe1192dbb827959e36b7cab7491731ac55935c9",
Expand Down
2 changes: 1 addition & 1 deletion go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -116,7 +116,7 @@ require (
github.com/cockroachdb/go-test-teamcity v0.0.0-20191211140407-cff980ad0a55
github.com/cockroachdb/gostdlib v1.19.0
github.com/cockroachdb/logtags v0.0.0-20230118201751-21c54148d20b
github.com/cockroachdb/pebble v0.0.0-20230614133735-32834aa62738
github.com/cockroachdb/pebble v0.0.0-20230615160155-e9a280333ef7
github.com/cockroachdb/redact v1.1.5
github.com/cockroachdb/returncheck v0.0.0-20200612231554-92cdbca611dd
github.com/cockroachdb/stress v0.0.0-20220803192808-1806698b1b7b
Expand Down
4 changes: 2 additions & 2 deletions go.sum
Original file line number Diff line number Diff line change
Expand Up @@ -486,8 +486,8 @@ github.com/cockroachdb/gostdlib v1.19.0/go.mod h1:+dqqpARXbE/gRDEhCak6dm0l14AaTy
github.com/cockroachdb/logtags v0.0.0-20211118104740-dabe8e521a4f/go.mod h1:Vz9DsVWQQhf3vs21MhPMZpMGSht7O/2vFW2xusFUVOs=
github.com/cockroachdb/logtags v0.0.0-20230118201751-21c54148d20b h1:r6VH0faHjZeQy818SGhaone5OnYfxFR/+AzdY3sf5aE=
github.com/cockroachdb/logtags v0.0.0-20230118201751-21c54148d20b/go.mod h1:Vz9DsVWQQhf3vs21MhPMZpMGSht7O/2vFW2xusFUVOs=
github.com/cockroachdb/pebble v0.0.0-20230614133735-32834aa62738 h1:pKUZ8PCP23se600eVZs1HM7zwipLqWAKldxeWfgDGCk=
github.com/cockroachdb/pebble v0.0.0-20230614133735-32834aa62738/go.mod h1:ZoLpjYdSY0BC1it7VZT20L4tm0OhBawe5awAtC0S5Vk=
github.com/cockroachdb/pebble v0.0.0-20230615160155-e9a280333ef7 h1:/pKQG5yaM44AY8ODkUToGfn/6hpXq5Qb8KWIUzbjWcU=
github.com/cockroachdb/pebble v0.0.0-20230615160155-e9a280333ef7/go.mod h1:ZoLpjYdSY0BC1it7VZT20L4tm0OhBawe5awAtC0S5Vk=
github.com/cockroachdb/redact v1.1.3/go.mod h1:BVNblN9mBWFyMyqK1k3AAiSxhvhfK2oOZZ2lK+dpvRg=
github.com/cockroachdb/redact v1.1.5 h1:u1PMllDkdFfPWaNGMyLD1+so+aq3uUItthCFqzwPJ30=
github.com/cockroachdb/redact v1.1.5/go.mod h1:BVNblN9mBWFyMyqK1k3AAiSxhvhfK2oOZZ2lK+dpvRg=
Expand Down
18 changes: 17 additions & 1 deletion pkg/ccl/logictestccl/testdata/logic_test/changefeed
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# LogicTest: local
# LogicTest: local local-mixed-22.2-23.1

statement ok
CREATE TABLE t ()
Expand Down Expand Up @@ -49,3 +49,19 @@ user testuser

statement error user testuser requires the CHANGEFEED privilege on all target tables to be able to run an enterprise changefeed
CREATE CHANGEFEED FOR t INTO 'null://sink' with initial_scan='only'


user root

let $job_id
SELECT job_id FROM [SHOW CHANGEFEED JOBS] WHERE user_name = 'testuser'

query TT
SELECT user_name, description FROM [SHOW CHANGEFEED JOB $job_id]
----
testuser CREATE CHANGEFEED FOR TABLE t INTO 'null://sink' WITH initial_scan = 'only'

query TT
SELECT user_name, description FROM [SHOW CHANGEFEED JOBS]
----
testuser CREATE CHANGEFEED FOR TABLE t INTO 'null://sink' WITH initial_scan = 'only'
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ go_test(
"//c-deps:libgeos", # keep
"//pkg/ccl/logictestccl:testdata", # keep
],
shard_count = 5,
shard_count = 6,
tags = [
"ccl_test",
"cpu:1",
Expand Down

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

6 changes: 4 additions & 2 deletions pkg/cmd/roachtest/test_runner.go
Original file line number Diff line number Diff line change
Expand Up @@ -427,6 +427,8 @@ func defaultClusterAllocator(
lopt.l.PrintfCtx(ctx, "Attaching to existing cluster %s for test %s", existingClusterName, t.Name)
c, err := attachToExistingCluster(ctx, existingClusterName, clusterL, t.Cluster, opt, r.cr)
if err == nil {
// Pretend pre-existing's cluster architecture matches the desired one; see the above TODO wrt validation.
c.arch = arch
return c, nil, nil
}
if !errors.Is(err, errClusterNotFound) {
Expand Down Expand Up @@ -632,7 +634,7 @@ func (r *testRunner) runWorker(
// N.B. FIPS is only supported on 'amd64' at this time.
arch = vm.ArchFIPS
}
if testToRun.spec.Benchmark {
if testToRun.spec.Benchmark && testToRun.spec.Cluster.Cloud != spec.Local {
// TODO(srosenberg): enable after https://github.com/cockroachdb/cockroach/issues/104213
l.PrintfCtx(ctx, "Disabling randomly chosen arch=%q, %s", arch, testToRun.spec.Name)
arch = vm.ArchAMD64
Expand All @@ -645,7 +647,7 @@ func (r *testRunner) runWorker(
if testToRun.canReuseCluster && c != nil && c.arch != arch {
// Non-local cluster that's being reused must have the same architecture as was ensured above.
if c.spec.Cloud != spec.Local {
return errors.New("infeasible path: non-local cluster arch mismatch")
return errors.Newf("infeasible path: non-local cluster arch=%q differs from selected arch=%q", c.arch, arch)
}
// Local cluster is now reused to emulate a different CPU architecture.
c.arch = arch
Expand Down
21 changes: 20 additions & 1 deletion pkg/roachpb/span_stats.go
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,11 @@

package roachpb

import "github.com/cockroachdb/cockroach/pkg/settings"
import (
"fmt"

"github.com/cockroachdb/cockroach/pkg/settings"
)

// Put span statistics cluster settings here to avoid import cycle.

Expand All @@ -37,3 +41,18 @@ var RangeStatsBatchLimit = settings.RegisterIntSetting(
defaultRangeStatsBatchLimit,
settings.PositiveInt,
)

// RangeDescPageSize controls the page size when iterating through range
// descriptors.
var RangeDescPageSize = settings.RegisterIntSetting(
settings.TenantWritable,
"server.span_stats.range_desc_page_size",
"the page size when iterating through range descriptors",
100,
func(i int64) error {
if i < 5 || i > 25000 {
return fmt.Errorf("expected range_desc_page_size to be in range [5, 25000], got %d", i)
}
return nil
},
)
32 changes: 24 additions & 8 deletions pkg/server/admin.go
Original file line number Diff line number Diff line change
Expand Up @@ -1368,8 +1368,9 @@ func (s *adminServer) statsForSpan(
return nil, err
}

// Get a list of node ids and range count for the specified span.
nodeIDs, rangeCount, err := nodeIDsAndRangeCountForSpan(
// Get a list of nodeIDs, range counts, and replica counts per node
// for the specified span.
nodeIDs, rangeCount, replCounts, err := getNodeIDsRangeCountReplCountForSpan(
ctx, s.distSender, rSpan,
)
if err != nil {
Expand Down Expand Up @@ -1439,6 +1440,15 @@ func (s *adminServer) statsForSpan(
return nil, err
}
}

// The semantics of tableStatResponse.ReplicaCount counts replicas
// found for this span returned by a cluster-wide fan-out.
// We can use descriptors to know what the final count _should_ be,
// if we assume every request succeeds (nodes and replicas are reachable).
for _, replCount := range replCounts {
tableStatResponse.ReplicaCount += replCount
}

for remainingResponses := len(nodeIDs); remainingResponses > 0; remainingResponses-- {
select {
case resp := <-responses:
Expand All @@ -1449,6 +1459,10 @@ func (s *adminServer) statsForSpan(
return nil, serverError(ctx, resp.err)
}

// If this node is unreachable,
// it's replicas can not be counted.
tableStatResponse.ReplicaCount -= replCounts[resp.nodeID]

tableStatResponse.MissingNodes = append(
tableStatResponse.MissingNodes,
serverpb.TableStatsResponse_MissingNode{
Expand All @@ -1458,7 +1472,6 @@ func (s *adminServer) statsForSpan(
)
} else {
tableStatResponse.Stats.Add(resp.resp.SpanToStats[span.String()].TotalStats)
tableStatResponse.ReplicaCount += int64(resp.resp.SpanToStats[span.String()].RangeCount)
tableStatResponse.ApproximateDiskBytes += resp.resp.SpanToStats[span.String()].ApproximateDiskBytes
}
case <-ctx.Done():
Expand All @@ -1470,24 +1483,27 @@ func (s *adminServer) statsForSpan(
return &tableStatResponse, nil
}

// Returns the list of node ids for the specified span.
func nodeIDsAndRangeCountForSpan(
// Returns the list of node ids, range count,
// and replica count for the specified span.
func getNodeIDsRangeCountReplCountForSpan(
ctx context.Context, ds *kvcoord.DistSender, rSpan roachpb.RSpan,
) (nodeIDList []roachpb.NodeID, rangeCount int64, _ error) {
) (nodeIDList []roachpb.NodeID, rangeCount int64, replCounts map[roachpb.NodeID]int64, _ error) {
nodeIDs := make(map[roachpb.NodeID]struct{})
replCountForNodeID := make(map[roachpb.NodeID]int64)
ri := kvcoord.MakeRangeIterator(ds)
ri.Seek(ctx, rSpan.Key, kvcoord.Ascending)
for ; ri.Valid(); ri.Next(ctx) {
rangeCount++
for _, repl := range ri.Desc().Replicas().Descriptors() {
replCountForNodeID[repl.NodeID]++
nodeIDs[repl.NodeID] = struct{}{}
}
if !ri.NeedAnother(rSpan) {
break
}
}
if err := ri.Error(); err != nil {
return nil, 0, err
return nil, 0, nil, err
}

nodeIDList = make([]roachpb.NodeID, 0, len(nodeIDs))
Expand All @@ -1497,7 +1513,7 @@ func nodeIDsAndRangeCountForSpan(
sort.Slice(nodeIDList, func(i, j int) bool {
return nodeIDList[i] < nodeIDList[j]
})
return nodeIDList, rangeCount, nil
return nodeIDList, rangeCount, replCountForNodeID, nil
}

// Users returns a list of users, stripped of any passwords.
Expand Down
1 change: 0 additions & 1 deletion pkg/server/server.go
Original file line number Diff line number Diff line change
Expand Up @@ -998,7 +998,6 @@ func NewServer(cfg Config, stopper *stop.Stopper) (*Server, error) {
serverIterator,
spanConfig.reporter,
clock,
distSender,
rangestats.NewFetcher(db),
node,
)
Expand Down
Loading

0 comments on commit faaa732

Please sign in to comment.