storepool can panic if relocation happens before gossip update. #96654

Closed
aliher1911 opened this issue Feb 6, 2023 · 0 comments · Fixed by #96668
Assignees
Labels
A-kv-distribution Relating to rebalancing and leasing. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-kv KV Team

Comments

Contributor

aliher1911 commented Feb 6, 2023

When restarting a node in an active cluster, it sometimes fails with:

I230206 14:50:58.896856 200 kv/kvserver/store_remove_replica.go:150 ⋮ [T1,n5,s5,r110/1:‹/Table/108/1/"{seatt…-washi…}›,raft] 394  removing replica r110/1
E230206 14:50:58.897133 218 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n5,s5,store-rebalancer] 395  a panic has occurred!
E230206 14:50:58.897133 218 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n5,s5,store-rebalancer] 395 +runtime error: invalid memory address or nil pointer dereference
E230206 14:50:58.897133 218 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n5,s5,store-rebalancer] 395 +(1) attached stack trace
E230206 14:50:58.897133 218 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n5,s5,store-rebalancer] 395 +  -- stack trace:
E230206 14:50:58.897133 218 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n5,s5,store-rebalancer] 395 +  | runtime.gopanic
E230206 14:50:58.897133 218 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n5,s5,store-rebalancer] 395 +  |   GOROOT/src/runtime/panic.go:884
E230206 14:50:58.897133 218 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n5,s5,store-rebalancer] 395 +  | runtime.panicmem
E230206 14:50:58.897133 218 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n5,s5,store-rebalancer] 395 +  |   GOROOT/src/runtime/panic.go:260
E230206 14:50:58.897133 218 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n5,s5,store-rebalancer] 395 +  | runtime.sigpanic
E230206 14:50:58.897133 218 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n5,s5,store-rebalancer] 395 +  |   GOROOT/src/runtime/signal_unix.go:835
E230206 14:50:58.897133 218 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n5,s5,store-rebalancer] 395 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/allocator/storepool.(*StorePool).UpdateLocalStoreAfterRelocate.func1
E230206 14:50:58.897133 218 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n5,s5,store-rebalancer] 395 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvserver/allocator/storepool/store_pool.go:624
E230206 14:50:58.897133 218 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n5,s5,store-rebalancer] 395 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/allocator/storepool.(*StorePool).UpdateLocalStoreAfterRelocate
E230206 14:50:58.897133 218 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n5,s5,store-rebalancer] 395 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvserver/allocator/storepool/store_pool.go:636
E230206 14:50:58.897133 218 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n5,s5,store-rebalancer] 395 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*StoreRebalancer).PostRangeRebalance
E230206 14:50:58.897133 218 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n5,s5,store-rebalancer] 395 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_rebalancer.go:683
E230206 14:50:58.897133 218 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n5,s5,store-rebalancer] 395 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*StoreRebalancer).rebalanceStore
E230206 14:50:58.897133 218 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n5,s5,store-rebalancer] 395 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_rebalancer.go:410
E230206 14:50:58.897133 218 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n5,s5,store-rebalancer] 395 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*StoreRebalancer).Start.func1
E230206 14:50:58.897133 218 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n5,s5,store-rebalancer] 395 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_rebalancer.go:281
E230206 14:50:58.897133 218 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n5,s5,store-rebalancer] 395 +  | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2
E230206 14:50:58.897133 218 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n5,s5,store-rebalancer] 395 +  |   github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:470
E230206 14:50:58.897133 218 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n5,s5,store-rebalancer] 395 +  | runtime.goexit
E230206 14:50:58.897133 218 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n5,s5,store-rebalancer] 395 +  |   GOROOT/src/runtime/asm_amd64.s:1594
E230206 14:50:58.897133 218 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n5,s5,store-rebalancer] 395 +Wraps: (2) runtime error: invalid memory address or nil pointer dereference
E230206 14:50:58.897133 218 1@util/log/logcrash/crash_reporting.go:188 ⋮ [T1,n5,s5,store-rebalancer] 395 +Error types: (1) *withstack.withStack (2) runtime.errorString

The panic is not persistent; starting the node again helps.

It seems to be caused by:

StorePool update in allocator:

	updateTargets := func(targets []roachpb.ReplicationTarget) {
		for _, target := range targets {
			if toDetail := sp.GetStoreDetailLocked(target.StoreID); toDetail != nil {
				toDetail.Desc.Capacity.RangeCount++
			}
		}
	}

This runs for a store that has not received any gossip info yet.
The existing nil check is pointless: GetStoreDetailLocked always returns a detail, but the detail can be unpopulated, which would explain the nil pointer dereference on toDetail.Desc.
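
A minimal sketch of one possible guard, not the actual change merged in #96668. It assumes the detail's `Desc` field is a pointer that stays nil until the first gossip update arrives. All types here (`StoreID`, `StoreCapacity`, `StoreDescriptor`, `StoreDetail`, `StorePool`) and the helper `bumpRangeCounts` are simplified, hypothetical stand-ins for the real CockroachDB code.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the real storepool types.
type StoreID int32

type StoreCapacity struct {
	RangeCount int32
}

type StoreDescriptor struct {
	StoreID  StoreID
	Capacity StoreCapacity
}

// Assumption: Desc stays nil until the first gossip update for the store.
type StoreDetail struct {
	Desc *StoreDescriptor
}

type StorePool struct {
	details map[StoreID]*StoreDetail
}

// getStoreDetailLocked mimics the behavior described in the issue: it never
// returns nil, and creates an empty (unpopulated) detail on first access.
func (sp *StorePool) getStoreDetailLocked(id StoreID) *StoreDetail {
	d, ok := sp.details[id]
	if !ok {
		d = &StoreDetail{}
		sp.details[id] = d
	}
	return d
}

// bumpRangeCounts increments RangeCount for each target store and skips
// stores whose descriptor has not been populated yet.
func (sp *StorePool) bumpRangeCounts(targets []StoreID) {
	for _, id := range targets {
		// Check Desc rather than the detail itself, since the detail is never nil.
		if d := sp.getStoreDetailLocked(id); d.Desc != nil {
			d.Desc.Capacity.RangeCount++
		}
	}
}

func main() {
	sp := &StorePool{details: map[StoreID]*StoreDetail{
		1: {Desc: &StoreDescriptor{StoreID: 1}},
	}}
	// Store 2 has received no gossip info yet; it is skipped instead of panicking.
	sp.bumpRangeCounts([]StoreID{1, 2})
	fmt.Println(sp.details[1].Desc.Capacity.RangeCount)
}
```

With this guard the range count for an ungossiped store is simply not adjusted; whether that is acceptable for the rebalancer's accounting is a separate question.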

Jira issue: CRDB-24266

@aliher1911 aliher1911 added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-kv-distribution Relating to rebalancing and leasing. T-kv KV Team labels Feb 6, 2023
@aliher1911 aliher1911 self-assigned this Feb 6, 2023
@craig craig bot closed this as completed in a0f8d9b Feb 6, 2023