Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: v22.2.3: nil pointer in (quotapool.RateLimiter).WaitN #96231

Closed
cockroach-teamcity opened this issue Jan 30, 2023 · 2 comments · Fixed by #100554
Closed

kvserver: v22.2.3: nil pointer in (quotapool.RateLimiter).WaitN #96231

cockroach-teamcity opened this issue Jan 30, 2023 · 2 comments · Fixed by #100554
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-sentry Originated from an in-the-wild panic report. T-kv KV Team

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Jan 30, 2023

This issue was autofiled by Sentry. It represents a crash or reported error on a live cluster with telemetry enabled.

Sentry link: https://sentry.io/organizations/cockroach-labs/issues/3906561739/?referrer=webhooks_plugin

Panic message:

panic.go:965: runtime error: invalid memory address or nil pointer dereference
--
runtime.errorString
panic.go:965: *withstack.withStack (top exception)

Stacktrace (expand for inline code snippets):

/usr/local/go/src/runtime/panic.go#L964-L966 in runtime.gopanic
/usr/local/go/src/runtime/panic.go#L211-L213 in runtime.panicmem
/usr/local/go/src/runtime/signal_unix.go#L733-L735 in runtime.sigpanic

defer rl.putRateRequest(r)
if err := rl.qp.Acquire(ctx, r); err != nil {
return err
in pkg/util/quotapool.(*RateLimiter).WaitN
// Rate Limit the scan through the range
if err := limiter.WaitN(ctx, int64(len(unsafeKey.Key)+len(unsafeValue))); err != nil {
return err
in pkg/kv/kvserver.(*Replica).sha512.func1

cockroach/pkg/storage/mvcc.go

Lines 3985 to 3987 in af7f257

for _, f := range callbacks {
if err := f(unsafeKey, unsafeValue); err != nil {
return enginepb.MVCCStats{}, err
in pkg/storage.ComputeStatsForRange
storage.IterOptions{UpperBound: span.End.Key})
spanMS, err := storage.ComputeStatsForRange(
iter, span.Start.Key, span.End.Key, 0 /* nowNanos */, visitor,
in pkg/kv/kvserver.(*Replica).sha512
result, err := r.sha512(ctx, desc, snap, snapshot, cc.Mode, r.store.consistencyLimiter)
if err != nil {
in pkg/kv/kvserver.(*Replica).computeChecksumPostApply.func1.1
r.computeChecksumDone(ctx, cc.ChecksumID, result, snapshot)
}()
in pkg/kv/kvserver.(*Replica).computeChecksumPostApply.func1
f(ctx)
}()
in pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2
/usr/local/go/src/runtime/asm_amd64.s#L1370-L1372 in runtime.goexit

/usr/local/go/src/runtime/panic.go in runtime.gopanic at line 965
/usr/local/go/src/runtime/panic.go in runtime.panicmem at line 212
/usr/local/go/src/runtime/signal_unix.go in runtime.sigpanic at line 734
pkg/util/quotapool/int_rate.go in pkg/util/quotapool.(*RateLimiter).WaitN at line 59
pkg/kv/kvserver/replica_consistency.go in pkg/kv/kvserver.(*Replica).sha512.func1 at line 581
pkg/storage/mvcc.go in pkg/storage.ComputeStatsForRange at line 3986
pkg/kv/kvserver/replica_consistency.go in pkg/kv/kvserver.(*Replica).sha512 at line 636
pkg/kv/kvserver/replica_proposal.go in pkg/kv/kvserver.(*Replica).computeChecksumPostApply.func1.1 at line 244
pkg/kv/kvserver/replica_proposal.go in pkg/kv/kvserver.(*Replica).computeChecksumPostApply.func1 at line 250
pkg/util/stop/stopper.go in pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2 at line 442
/usr/local/go/src/runtime/asm_amd64.s in runtime.goexit at line 1371
Tag Value
Cockroach Release v21.2.2
Cockroach SHA: af7f257
Platform linux amd64
Distribution CCL
Environment development
Command start-single-node
Go Version ``
# of CPUs
# of Goroutines

Jira issue: CRDB-24018

@cockroach-teamcity cockroach-teamcity added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-sentry Originated from an in-the-wild panic report. labels Jan 30, 2023
@philip-stoev
Copy link

We are also saw this issue once in our CI with v22.2.3 :

upgrade-materialized-1  | * ERROR: a panic has occurred!
upgrade-materialized-1  | * runtime error: invalid memory address or nil pointer dereference
upgrade-materialized-1  | * (1) attached stack trace
upgrade-materialized-1  | *   -- stack trace:
upgrade-materialized-1  | *   | runtime.gopanic
upgrade-materialized-1  | *   | 	GOROOT/src/runtime/panic.go:884
upgrade-materialized-1  | *   | runtime.panicmem
upgrade-materialized-1  | *   | 	GOROOT/src/runtime/panic.go:260
upgrade-materialized-1  | *   | runtime.sigpanic
upgrade-materialized-1  | *   | 	GOROOT/src/runtime/signal_unix.go:835
upgrade-materialized-1  | *   | github.com/cockroachdb/cockroach/pkg/util/quotapool.(*RateLimiter).WaitN
upgrade-materialized-1  | *   | 	github.com/cockroachdb/cockroach/pkg/util/quotapool/int_rate.go:62
upgrade-materialized-1  | *   | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).sha512.func1
upgrade-materialized-1  | *   | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_consistency.go:586
upgrade-materialized-1  | *   | github.com/cockroachdb/cockroach/pkg/storage.computeStatsForIterWithVisitors
upgrade-materialized-1  | *   | 	github.com/cockroachdb/cockroach/pkg/storage/mvcc.go:5566
upgrade-materialized-1  | *   | github.com/cockroachdb/cockroach/pkg/storage.ComputeStatsWithVisitors
upgrade-materialized-1  | *   | 	github.com/cockroachdb/cockroach/pkg/storage/mvcc.go:5465
upgrade-materialized-1  | *   | github.com/cockroachdb/cockroach/pkg/kv/kvserver/rditer.ComputeStatsForRangeWithVisitors
upgrade-materialized-1  | *   | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/rditer/stats.go:38
upgrade-materialized-1  | *   | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).sha512
upgrade-materialized-1  | *   | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_consistency.go:687
upgrade-materialized-1  | *   | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).computeChecksumPostApply.func2.1
upgrade-materialized-1  | *   | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_consistency.go:811
upgrade-materialized-1  | *   | github.com/cockroachdb/cockroach/pkg/util/contextutil.RunWithTimeout
upgrade-materialized-1  | *   | 	github.com/cockroachdb/cockroach/pkg/util/contextutil/context.go:91
upgrade-materialized-1  | *   | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).computeChecksumPostApply.func2
upgrade-materialized-1  | *   | 	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_consistency.go:803
upgrade-materialized-1  | *   | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2
upgrade-materialized-1  | *   | 	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:489
upgrade-materialized-1  | *   | runtime.goexit
upgrade-materialized-1  | *   | 	GOROOT/src/runtime/asm_amd64.s:1594
upgrade-materialized-1  | * Wraps: (2) runtime error: invalid memory address or nil pointer dereference
upgrade-materialized-1  | * Error types: (1) *withstack.withStack (2) runtime.errorString
upgrade-materialized-1  | *

@yuzefovich yuzefovich changed the title sentry: panic.go:965: runtime error: invalid memory address or nil pointer dereference -- runtime.errorString panic.go:965: *withstack.withStack (top exception) storage: v22.2.3: nil pointer in computeStatsForIterWithVisitors Feb 23, 2023
@blathers-crl blathers-crl bot added the T-storage Storage Team label Feb 23, 2023
@blathers-crl blathers-crl bot added the T-kv KV Team label Mar 1, 2023
@jbowens jbowens removed the T-storage Storage Team label Mar 1, 2023
@jbowens jbowens changed the title storage: v22.2.3: nil pointer in computeStatsForIterWithVisitors kvserver: v22.2.3: nil pointer in (quotapool.RateLimiter).WaitN Mar 1, 2023
@jbowens
Copy link
Collaborator

jbowens commented Mar 1, 2023

Looks like the quota pool backing the rate limiter is nil? Appears to be above storage.

craig bot pushed a commit that referenced this issue Apr 4, 2023
100539: sql: add telemetry for UDFs with RETURNS TABLE r=mgartner a=mgartner

Informs #100226

Release note: None

100554: kv: initialize consistencyLimiter during Store construction, before Start r=aliher1911 a=nvanbenschoten

Fixes #96231.

This commit attempts to fix #96231. It moves the initialization of `Store.consistencyLimiter` up from the bottom of `Store.Start` into `NewStore`. It is unsafe to initialize this variable after the call to `Store.processRaft`, which starts Raft processing. Beyond that point, incoming Raft traffic is permitted and calls to `computeChecksumPostApply` are possible.

The two stacktraces we have in that issue conclusively point to the `Store.consistencyLimiter` being nil during a call to `(*Replica).computeChecksumPostApply`. This startup-time race is the only possible explanation I could come up with.

Release note (bug fix): Fixed a rare startup race that could cause an `invalid memory address or nil pointer dereference` error.

100592: storage: remove unnecessary version override in unit test r=nicktrav a=jbowens

Remove an unnecessary use of cluster.MakeTestingClusterSettingsWithVersions in favor of cluster.MakeTestingClusterSettings.

Epic: None
Informs: #100552
Release note: None

Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Jackson Owens <[email protected]>
@craig craig bot closed this as completed in 35c9283 Apr 4, 2023
blathers-crl bot pushed a commit that referenced this issue Apr 4, 2023
…tart

Fixes #96231.

This commit attempts to fix #96231. It moves the initialization of
`Store.consistencyLimiter` up from the bottom of `Store.Start` into
`NewStore`. It is unsafe to initialize this variable after the call to
`Store.processRaft`, which starts Raft processing. Beyond that point,
incoming Raft traffic is permitted and calls to `computeChecksumPostApply`
are possible.

The two stacktraces we have in that issue conclusively point to the
`Store.consistencyLimiter` being nil during a call to
`(*Replica).computeChecksumPostApply`. This startup-time race is the
only possible explanation I could come up with.

Release note (bug fix): Fixed a rare startup race that could cause a
`invalid memory address or nil pointer dereference` error.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-sentry Originated from an in-the-wild panic report. T-kv KV Team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants