server,tracing: integrate on-demand profiling with CRDB tracing
This change introduces a BackgroundProfiler service that is started
during server startup on each node in the cluster. The BackgroundProfiler
is responsible for collecting on-demand CPU profiles and runtime traces
for a particular operation. The profiler can be subscribed to by an in-process
listener. The first Subscriber initializes the collection of the CPU and execution
trace profiles. While the profiles are being collected, only Subscribers carrying
the same `profileID` are allowed to join the running profiles. The profiles
are stopped and persisted to local storage when the last Subscriber
unsubscribes. The `profileID` is a unique identifier of the operation that is
being traced. Since only one profile can be running in a process at a time,
any Subscribers with different `profileID`s than the current one will be rejected.
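The subscription rules above can be sketched as a small reference-counted state machine. This is only an illustration: `toyProfiler`, `errBusy`, and the method signatures are hypothetical and simplified, and no real profile is started or persisted.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ProfileID identifies the operation being profiled; zero means "unset".
type ProfileID int

// SubscriberID identifies a single subscriber.
type SubscriberID int

// errBusy is a hypothetical error for subscribers carrying a different ProfileID.
var errBusy = errors.New("another profile is already running")

// toyProfiler is a hypothetical stand-in for the BackgroundProfiler. It only
// tracks subscription state and never collects real profiles.
type toyProfiler struct {
	mu          sync.Mutex
	current     ProfileID
	subscribers map[SubscriberID]struct{}
}

func newToyProfiler() *toyProfiler {
	return &toyProfiler{subscribers: make(map[SubscriberID]struct{})}
}

// Subscribe starts a profile for the first subscriber, lets subscribers with
// the same ProfileID join, and rejects all others.
func (p *toyProfiler) Subscribe(id ProfileID, sub SubscriberID) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if id == 0 {
		return errors.New("profile ID must be set")
	}
	if p.current != 0 && p.current != id {
		return errBusy
	}
	if p.current == 0 {
		p.current = id // A real profiler would start CPU and trace collection here.
	}
	p.subscribers[sub] = struct{}{}
	return nil
}

// Unsubscribe removes the subscriber and reports whether it was the last one,
// i.e. whether the running profile was finished.
func (p *toyProfiler) Unsubscribe(sub SubscriberID) (finished bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if _, ok := p.subscribers[sub]; !ok {
		return false
	}
	delete(p.subscribers, sub)
	if len(p.subscribers) == 0 {
		p.current = 0 // A real profiler would stop and persist the profiles here.
		return true
	}
	return false
}

func main() {
	p := newToyProfiler()
	fmt.Println(p.Subscribe(1, 10)) // first subscriber starts the profile
	fmt.Println(p.Subscribe(1, 11)) // same ProfileID joins
	fmt.Println(p.Subscribe(2, 12)) // different ProfileID is rejected
	fmt.Println(p.Unsubscribe(10))  // false: one subscriber remains
	fmt.Println(p.Unsubscribe(11))  // true: last subscriber finishes the profile
}
```

Once the last subscriber leaves, the sketch resets `current`, so a subscriber with a new `ProfileID` can start the next profile.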

The in-process listeners described above will be CRDB's internal tracing
spans. This change introduces a `WithBackgroudProfiling` option that can be used
to instruct a tracing span to subscribe to the BackgroundProfiler. This option
is propagated to all local and remote child spans created as part of the trace.
Only local root spans that have background profiling enabled will subscribe to
the profiler on creation. As mentioned above, only one operation can be profiled
at a time. We use the first root span's `TraceID` as the BackgroundProfiler's `profileID`.
All subsequent root spans that are part of the same trace will be able to join
the running profile. Tracing spans unsubscribe from the profile on Finish().
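A rough sketch of the span-side rule, under simplifying assumptions: `toySpan` and its methods are hypothetical and are not the tracing package's API. The sketch only shows that the profiling flag is inherited by every child, while only spans that are roots on their node subscribe, using the trace ID as the profile ID.

```go
package main

import "fmt"

// toySpan is a hypothetical, heavily simplified tracing span.
type toySpan struct {
	traceID   uint64
	localRoot bool // true if the span has no parent on this node
	profiling bool // true if background profiling was requested for the trace
}

// newRoot creates the first span of a trace.
func newRoot(traceID uint64, profiling bool) toySpan {
	return toySpan{traceID: traceID, localRoot: true, profiling: profiling}
}

// child creates a span on the same node; it inherits the profiling flag but
// is not a local root.
func (s toySpan) child() toySpan {
	return toySpan{traceID: s.traceID, localRoot: false, profiling: s.profiling}
}

// remoteChild creates a span on another node; it inherits the profiling flag
// and is a local root on that node.
func (s toySpan) remoteChild() toySpan {
	return toySpan{traceID: s.traceID, localRoot: true, profiling: s.profiling}
}

// profileID returns the ID the span would subscribe with and whether it
// subscribes at all.
func (s toySpan) profileID() (uint64, bool) {
	if s.profiling && s.localRoot {
		return s.traceID, true
	}
	return 0, false
}

func main() {
	root := newRoot(7, true)
	fmt.Println(root.profileID())               // 7 true
	fmt.Println(root.child().profileID())       // 0 false
	fmt.Println(root.remoteChild().profileID()) // 7 true
}
```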

Every Subscriber is returned a wrapped ctx with pprof labels that tie its execution
to the profile being collected by the BackgroundProfiler. These labels are used
to post-process the collected CPU profile and keep only the samples that correspond
to our subscribers. The end result is a filtered CPU profile prefixed `cpuprofiler.`
and a process-wide execution trace prefixed `runtimetrace.`, both persisted to local storage.
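As an illustration of the labelling and filtering idea, the sketch below uses the standard `runtime/pprof` label API. The helper names (`withProfileLabel`, `labelledValue`, `keepLabelled`) and the `sample` type are hypothetical; the real post-processing operates on a parsed CPU profile, not on this toy struct.

```go
package main

import (
	"context"
	"fmt"
	"runtime/pprof"
	"strconv"
)

// withProfileLabel returns a context carrying a pprof label whose key is the
// profile ID and whose value names the subscriber, plus a closure that
// restores the goroutine's previous labels.
func withProfileLabel(ctx context.Context, profileID int, value string) (context.Context, func()) {
	labelled := pprof.WithLabels(ctx, pprof.Labels(strconv.Itoa(profileID), value))
	pprof.SetGoroutineLabels(labelled)
	return labelled, func() { pprof.SetGoroutineLabels(ctx) }
}

// labelledValue labels a context and reads the label back.
func labelledValue(profileID int, value string) (string, bool) {
	ctx, restore := withProfileLabel(context.Background(), profileID, value)
	defer restore()
	return pprof.Label(ctx, strconv.Itoa(profileID))
}

// sample is a hypothetical stand-in for a CPU profile sample with its labels.
type sample struct {
	labels   map[string][]string
	cpuNanos int64
}

// keepLabelled keeps only the samples that carry the given label key.
func keepLabelled(samples []sample, key string) []sample {
	var kept []sample
	for _, s := range samples {
		if _, ok := s.labels[key]; ok {
			kept = append(kept, s)
		}
	}
	return kept
}

func main() {
	fmt.Println(labelledValue(42, "op0")) // op0 true

	samples := []sample{
		{labels: map[string][]string{"42": {"op0"}}, cpuNanos: 900},
		{labels: nil, cpuNanos: 300}, // unrelated work on the node
	}
	fmt.Println(len(keepLabelled(samples, "42"))) // 1
}
```

Keying the label on the profile ID means a single lookup per sample is enough to separate the profiled operation from everything else running in the process.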

This change only introduces the infrastructure to enable on-demand profiling.
The test in `profiler_test.go` results in a CPU profile with information about
each labelled root operation collected on-demand:

❯ go tool pprof cpuprofiler.2023-03-08T14_51_52.402
Type: cpu
Time: Mar 8, 2023 at 9:51am (EST)
Duration: 10.11s, Total samples = 8.57s (84.77%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) tags
 9171346462634118014: Total 8.6s
                      906.0ms (10.57%): op2
                      902.0ms (10.53%): op1
                      890.0ms (10.39%): op0
                      886.0ms (10.34%): op7
                      866.0ms (10.11%): op4
                      866.0ms (10.11%): op5
                      854.0ms ( 9.96%): op3
                      806.0ms ( 9.40%): op8
                      804.0ms ( 9.38%): op6
                      790.0ms ( 9.22%): op9

Execution traces do not surface pprof labels in Go yet, but a future
patch could consider cherry-picking https://go-review.googlesource.com/c/go/+/446975.
That change would allow the user to focus on goroutines run with the specified pprof labels.

With this framework in place, one could envision the following use cases:

- stmt diagnostics requests get a new option to request profiling.
When requested, any local root trace span (i.e. while any part of the trace is
active on a given node) subscribes to profiles, and references to the profiles collected
are stored as payloads in the span. They're then included in the stmt bundle.

- even outside of diagnostics, we could mark traces as wanting to capture debug info
for "slow spans". Such spans on creation could set a timer that, once it fires,
subscribes to (short) execution traces periodically as a way to snapshot the goroutine's
actions. These could be referenced in the span for later retrieval.

Informs: cockroachdb#97215
adityamaru committed Mar 8, 2023
1 parent b84f10c commit b12cbd1
Showing 24 changed files with 854 additions and 16 deletions.
7 changes: 7 additions & 0 deletions pkg/BUILD.bazel
@@ -269,6 +269,7 @@ ALL_TESTS = [
"//pkg/security/username:username_disallowed_imports_test",
"//pkg/security/username:username_test",
"//pkg/security:security_test",
"//pkg/server/backgroundprofiler/profiler:profiler_test",
"//pkg/server/debug/goroutineui:goroutineui_test",
"//pkg/server/debug/pprofui:pprofui_test",
"//pkg/server/debug:debug_test",
@@ -1413,6 +1414,10 @@ GO_TARGETS = [
"//pkg/security/username:username_test",
"//pkg/security:security",
"//pkg/security:security_test",
"//pkg/server/backgroundprofiler/profiler:profiler",
"//pkg/server/backgroundprofiler/profiler:profiler_test",
"//pkg/server/backgroundprofiler:backgroundprofiler",
"//pkg/server/backgroundprofiler:executiontracer",
"//pkg/server/debug/goroutineui:goroutineui",
"//pkg/server/debug/goroutineui:goroutineui_test",
"//pkg/server/debug/pprofui:pprofui",
@@ -2783,6 +2788,8 @@ GET_X_DATA_TARGETS = [
"//pkg/security/sessionrevival:get_x_data",
"//pkg/security/username:get_x_data",
"//pkg/server:get_x_data",
"//pkg/server/backgroundprofiler:get_x_data",
"//pkg/server/backgroundprofiler/profiler:get_x_data",
"//pkg/server/debug:get_x_data",
"//pkg/server/debug/goroutineui:get_x_data",
"//pkg/server/debug/pprofui:get_x_data",
4 changes: 4 additions & 0 deletions pkg/base/constants.go
@@ -46,6 +46,10 @@ const (
// stores profiles when the periodic CPU profile dump is enabled.
CPUProfileDir = "pprof_dump"

// RuntimeProfileDir is the directory name where the
// backgroundprofiler.Profiler stores profiles.
RuntimeProfileDir = "runtime_profiler"

// InflightTraceDir is the directory name where the job trace dumper stores traces
// when a job opts in to dumping its execution traces.
InflightTraceDir = "inflight_trace_dump"
12 changes: 12 additions & 0 deletions pkg/base/test_server_args.go
@@ -142,6 +142,11 @@ type TestServerArgs struct {
// If set, a TraceDir is initialized at the provided path.
TraceDir string

// If set, a RuntimeProfileDir is initialized at the provided path. Runtime
// profiles that are collected by backgroundprofiler.Profiler during the
// execution of the test will be written to this directory.
RuntimeProfileDir string

// DisableSpanConfigs disables the use of the span configs infrastructure
// (in favor of the gossiped system config span). It's equivalent to setting
// COCKROACH_DISABLE_SPAN_CONFIGS, and is only intended for tests written
@@ -353,6 +358,13 @@ type TestTenantArgs struct {
// If set, this directory should be cleaned up after the test completes.
HeapProfileDirName string

// RuntimeProfileDirName is used to initialize the same named field on the
// SQLServer.BaseConfig field. It is the directory name for runtime profiles
// using backgroundprofiler.Profiler. If empty, no runtime profiles will be
// collected during the test. If set, this directory should be cleaned up
// after the test completes.
RuntimeProfileDirName string

// StartDiagnosticsReporting checks cluster.TelemetryOptOut(), and
// if not disabled starts the asynchronous goroutine that checks for
// CockroachDB upgrades and periodically reports diagnostics to
1 change: 1 addition & 0 deletions pkg/cli/log_flags.go
@@ -227,6 +227,7 @@ func setupLogging(ctx context.Context, cmd *cobra.Command, isServerCmd, applyCon
serverCfg.HeapProfileDirName = filepath.Join(outputDirectory, base.HeapProfileDir)
serverCfg.CPUProfileDirName = filepath.Join(outputDirectory, base.CPUProfileDir)
serverCfg.InflightTraceDirName = filepath.Join(outputDirectory, base.InflightTraceDir)
serverCfg.RuntimeProfileDirName = filepath.Join(outputDirectory, base.RuntimeProfileDir)

return nil
}
1 change: 1 addition & 0 deletions pkg/gen/protobuf.bzl
@@ -49,6 +49,7 @@ PROTOBUF_SRCS = [
"//pkg/repstream/streampb:streampb_go_proto",
"//pkg/roachpb:roachpb_go_proto",
"//pkg/rpc:rpc_go_proto",
"//pkg/server/backgroundprofiler/profiler:profiler_go_proto",
"//pkg/server/diagnostics/diagnosticspb:diagnosticspb_go_proto",
"//pkg/server/serverpb:serverpb_go_proto",
"//pkg/server/status/statuspb:statuspb_go_proto",
1 change: 1 addition & 0 deletions pkg/server/BUILD.bazel
@@ -158,6 +158,7 @@ go_library(
"//pkg/security/password",
"//pkg/security/securityassets",
"//pkg/security/username",
"//pkg/server/backgroundprofiler/profiler",
"//pkg/server/debug",
"//pkg/server/diagnostics",
"//pkg/server/diagnostics/diagnosticspb",
20 changes: 20 additions & 0 deletions pkg/server/backgroundprofiler/BUILD.bazel
@@ -0,0 +1,20 @@
load("//build/bazelutil/unused_checker:unused.bzl", "get_x_data")
load("@io_bazel_rules_go//go:def.bzl", "go_library")

go_library(
name = "executiontracer",
srcs = ["executiontracer.go"],
importpath = "github.com/cockroachdb/cockroach/pkg/server/executiontracer",
visibility = ["//visibility:public"],
deps = ["//pkg/util/protoutil"],
)

go_library(
name = "backgroundprofiler",
srcs = ["background_profiler.go"],
importpath = "github.com/cockroachdb/cockroach/pkg/server/backgroundprofiler",
visibility = ["//visibility:public"],
deps = ["//pkg/util/protoutil"],
)

get_x_data(name = "get_x_data")
59 changes: 59 additions & 0 deletions pkg/server/backgroundprofiler/background_profiler.go
@@ -0,0 +1,59 @@
// Copyright 2023 The Cockroach Authors.
//
// Use of this software is governed by the Business Source License
// included in the file licenses/BSL.txt.
//
// As of the Change Date specified in that file, in accordance with
// the Business Source License, use of this software will be governed
// by the Apache License, Version 2.0, included in the file
// licenses/APL.txt.

package backgroundprofiler

import (
"context"

"github.com/cockroachdb/cockroach/pkg/util/protoutil"
)

// ProfileID is a unique identifier of the operation being profiled by the
// Profiler.
type ProfileID int

// SubscriberID is a unique identifier of the Subscriber subscribing to the
// background profile collection.
type SubscriberID int

// IsSet returns true if the BackgroundProfiler is currently associated with a
// profileID.
func (r ProfileID) IsSet() bool {
return r != 0
}

// Subscriber is the interface that describes an object that can subscribe to
// the background profiler.
type Subscriber interface {
// LabelValue returns the value that will be used when setting the pprof
// labels of the Subscriber. The key of the label will always be the ProfileID
// thereby allowing us to identify all samples that describe the operation
// being profiled.
LabelValue() string
// Identifier returns the unique identifier of the Subscriber.
Identifier() SubscriberID
// ProfileID returns the unique identifier of the operation that the
// Subscriber is executing on behalf of.
ProfileID() ProfileID
}

// Profiler is the interface that exposes methods to subscribe and unsubscribe
// from a background profiler.
type Profiler interface {
// Subscribe registers the subscriber with the background profiler. This
// method returns a context wrapped with pprof labels along with a closure to
// restore the original labels of the context.
Subscribe(ctx context.Context, subscriber Subscriber) (context.Context, func())
// Unsubscribe unregisters the subscriber from the background profiler. If the
// subscriber is responsible for finishing the profile the method will also
// return metadata describing the collected profile.
Unsubscribe(subscriber Subscriber) (finishedProfile bool, msg protoutil.Message)
}
64 changes: 64 additions & 0 deletions pkg/server/backgroundprofiler/profiler/BUILD.bazel
@@ -0,0 +1,64 @@
load("//build/bazelutil/unused_checker:unused.bzl", "get_x_data")
load("@rules_proto//proto:defs.bzl", "proto_library")
load("@io_bazel_rules_go//go:def.bzl", "go_library", "go_test")
load("@io_bazel_rules_go//proto:def.bzl", "go_proto_library")

proto_library(
name = "profiler_proto",
srcs = ["profiler.proto"],
strip_import_prefix = "/pkg",
visibility = ["//visibility:public"],
deps = ["@com_github_gogo_protobuf//gogoproto:gogo_proto"],
)

go_proto_library(
name = "profiler_go_proto",
compilers = ["//pkg/cmd/protoc-gen-gogoroach:protoc-gen-gogoroach_compiler"],
importpath = "github.com/cockroachdb/cockroach/pkg/server/backgroundprofiler/profiler",
proto = ":profiler_proto",
visibility = ["//visibility:public"],
deps = ["@com_github_gogo_protobuf//gogoproto"],
)

go_library(
name = "profiler",
srcs = ["profiler.go"],
embed = [":profiler_go_proto"],
importpath = "github.com/cockroachdb/cockroach/pkg/server/backgroundprofiler/profiler",
visibility = ["//visibility:public"],
deps = [
"//pkg/server/backgroundprofiler",
"//pkg/server/dumpstore",
"//pkg/settings",
"//pkg/settings/cluster",
"//pkg/util/log",
"//pkg/util/pprofutil",
"//pkg/util/protoutil",
"//pkg/util/stop",
"//pkg/util/syncutil",
"//pkg/util/timeutil",
"@com_github_cockroachdb_errors//:errors",
"@com_github_google_pprof//profile",
],
)

go_test(
name = "profiler_test",
srcs = ["profiler_test.go"],
args = ["-test.timeout=295s"],
deps = [
":profiler",
"//pkg/settings/cluster",
"//pkg/testutils",
"//pkg/util/ctxgroup",
"//pkg/util/log",
"//pkg/util/stop",
"//pkg/util/tracing",
"//pkg/util/tracing/tracingpb",
"@com_github_gogo_protobuf//types",
"@com_github_google_pprof//profile",
"@com_github_stretchr_testify//require",
],
)

get_x_data(name = "get_x_data")