storage: inverted range in intervalSkl.AddRange #32149

Closed
tbg opened this issue Nov 5, 2018 · 25 comments · Fixed by #32492
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-sentry Originated from an in-the-wild panic report.

Comments

@tbg
Member

tbg commented Nov 5, 2018

https://sentry.io/cockroach-labs/cockroachdb/issues/752525882/

stopper.go:182: *errors.errorString

github.com/cockroachdb/cockroach/pkg/storage.(*Store).Send.func1

stacktrace: {u'frames': [{u'function': u'1', u'abs_path': u'/go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go', u'module': u'github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server).serveStreams.func1', u'filename': u'github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go', u'lineno': 680, u'in_app': False}, {u'function': u'handleStream', u'abs_path': u'/go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go', u'module': u'github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server)', u'filename': u'github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go', u'lineno': 1249, u'in_app': False}, {u'function': u'processUnaryRPC', u'abs_path': u'/go/src/github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go', u'module': u'github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc.(*Server)', u'filename': u'github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go', u'lineno': 1011, u'in_app': False}, {u'function': u'_Internal_Batch_Handler', u'abs_path': u'/go/src/github.com/cockroachdb/cockroach/pkg/roachpb/api.pb.go', u'module': u'github.com/cockroachdb/cockroach/pkg/roachpb', u'filename': u'github.com/cockroachdb/cockroach/pkg/roachpb/api.pb.go', u'lineno': 6635, u'in_app': True}, {u'function': u'func1', u'abs_path': u'/go/src/github.com/cockroachdb/cockroach/pkg/rpc/context.go', u'module': u'github.com/cockroachdb/cockroach/pkg/rpc.NewServerWithInterceptor', u'filename': u'github.com/cockroachdb/cockroach/pkg/rpc/context.go', u'lineno': 197, u'in_app': True}, {u'function': u'func1', u'abs_path': u'/go/src/github.com/cockroachdb/cockroach/vendor/github.com/grpc-ecosystem/grpc-opentracing/go/otgrpc/server.go', u'module': u'github.com/cockroachdb/cockroach/vendor/github.com/grpc-ecosystem/grpc-opentracing/go/otgrpc.OpenTracingServerInterceptor', u'filename': 
u'github.com/cockroachdb/cockroach/vendor/github.com/grpc-ecosystem/grpc-opentracing/go/otgrpc/server.go', u'lineno': 48, u'in_app': False}, {u'function': u'func1', u'abs_path': u'/go/src/github.com/cockroachdb/cockroach/pkg/roachpb/api.pb.go', u'module': u'github.com/cockroachdb/cockroach/pkg/roachpb._Internal_Batch_Handler', u'filename': u'github.com/cockroachdb/cockroach/pkg/roachpb/api.pb.go', u'lineno': 6633, u'in_app': True}, {u'function': u'Batch', u'abs_path': u'/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go', u'module': u'github.com/cockroachdb/cockroach/pkg/server.(*Node)', u'filename': u'github.com/cockroachdb/cockroach/pkg/server/node.go', u'lineno': 1015, u'in_app': True}, {u'function': u'batchInternal', u'abs_path': u'/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go', u'module': u'github.com/cockroachdb/cockroach/pkg/server.(*Node)', u'filename': u'github.com/cockroachdb/cockroach/pkg/server/node.go', u'lineno': 974, u'in_app': True}, {u'function': u'RunTaskWithErr', u'abs_path': u'/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go', u'module': u'github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper)', u'filename': u'github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go', u'lineno': 303, u'in_app': True}, {u'function': u'func1', u'abs_path': u'/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go', u'module': u'github.com/cockroachdb/cockroach/pkg/server.(*Node).batchInternal', u'filename': u'github.com/cockroachdb/cockroach/pkg/server/node.go', u'lineno': 987, u'in_app': True}, {u'function': u'Send', u'abs_path': u'/go/src/github.com/cockroachdb/cockroach/pkg/storage/stores.go', u'module': u'github.com/cockroachdb/cockroach/pkg/storage.(*Stores)', u'filename': u'github.com/cockroachdb/cockroach/pkg/storage/stores.go', u'lineno': 185, u'in_app': True}, {u'function': u'Send', u'abs_path': u'/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go', u'module': 
u'github.com/cockroachdb/cockroach/pkg/storage.(*Store)', u'filename': u'github.com/cockroachdb/cockroach/pkg/storage/store.go', u'lineno': 3167, u'in_app': True}, {u'function': u'Send', u'abs_path': u'/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go', u'module': u'github.com/cockroachdb/cockroach/pkg/storage.(*Replica)', u'filename': u'github.com/cockroachdb/cockroach/pkg/storage/replica.go', u'lineno': 1969, u'in_app': True}, {u'function': u'sendWithRangeID', u'abs_path': u'/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go', u'module': u'github.com/cockroachdb/cockroach/pkg/storage.(*Replica)', u'filename': u'github.com/cockroachdb/cockroach/pkg/storage/replica.go', u'lineno': 2024, u'in_app': True}, {u'function': u'executeReadOnlyBatch', u'abs_path': u'/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go', u'module': u'github.com/cockroachdb/cockroach/pkg/storage.(*Replica)', u'filename': u'github.com/cockroachdb/cockroach/pkg/storage/replica.go', u'lineno': 3074, u'in_app': True}, {u'function': u'func1', u'abs_path': u'/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go', u'module': u'github.com/cockroachdb/cockroach/pkg/storage.(*Replica).executeReadOnlyBatch', u'filename': u'github.com/cockroachdb/cockroach/pkg/storage/replica.go', u'lineno': 3019, u'in_app': True}, {u'function': u'done', u'abs_path': u'/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go', u'module': u'github.com/cockroachdb/cockroach/pkg/storage.(*endCmds)', u'filename': u'github.com/cockroachdb/cockroach/pkg/storage/replica.go', u'lineno': 2139, u'in_app': True}, {u'function': u'updateTimestampCache', u'abs_path': u'/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go', u'module': u'github.com/cockroachdb/cockroach/pkg/storage.(*Replica)', u'filename': u'github.com/cockroachdb/cockroach/pkg/storage/replica.go', u'lineno': 2208, u'in_app': True}, {u'function': u'Add', u'abs_path': 
u'/go/src/github.com/cockroachdb/cockroach/pkg/storage/tscache/skl_impl.go', u'module': u'github.com/cockroachdb/cockroach/pkg/storage/tscache.(*sklImpl)', u'filename': u'github.com/cockroachdb/cockroach/pkg/storage/tscache/skl_impl.go', u'lineno': 86, u'in_app': True}, {u'function': u'AddRange', u'abs_path': u'/go/src/github.com/cockroachdb/cockroach/pkg/storage/tscache/interval_skl.go', u'module': u'github.com/cockroachdb/cockroach/pkg/storage/tscache.(*intervalSkl)', u'filename': u'github.com/cockroachdb/cockroach/pkg/storage/tscache/interval_skl.go', u'lineno': 243, u'in_app': True}, {u'function': u'gopanic', u'abs_path': u'/usr/local/go/src/runtime/panic.go', u'module': u'runtime', u'filename': u'runtime/panic.go', u'lineno': 502, u'in_app': False}, {u'function': u'call64', u'abs_path': u'/usr/local/go/src/runtime/asm_amd64.s', u'module': u'runtime', u'filename': u'runtime/asm_amd64.s', u'lineno': 574, u'in_app': False}, {u'function': u'func1', u'abs_path': u'/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go', u'module': u'github.com/cockroachdb/cockroach/pkg/storage.(*Store).Send', u'filename': u'github.com/cockroachdb/cockroach/pkg/storage/store.go', u'lineno': 3046, u'in_app': True}, {u'function': u'gopanic', u'abs_path': u'/usr/local/go/src/runtime/panic.go', u'module': u'runtime', u'filename': u'runtime/panic.go', u'lineno': 502, u'in_app': False}, {u'function': u'call32', u'abs_path': u'/usr/local/go/src/runtime/asm_amd64.s', u'module': u'runtime', u'filename': u'runtime/asm_amd64.s', u'lineno': 573, u'in_app': False}]}
type: *log.safeError
value: stopper.go:182: *errors.errorString

@tbg tbg added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-sentry Originated from an in-the-wild panic report. labels Nov 5, 2018
@jordanlewis
Member

github.com/cockroachdb/cockroach/pkg/storage/store.go in func1 at line 3046
Called from: runtime/asm_amd64.s in call64
github.com/cockroachdb/cockroach/pkg/storage/tscache/interval_skl.go in AddRange at line 243
github.com/cockroachdb/cockroach/pkg/storage/tscache/skl_impl.go in Add at line 86
github.com/cockroachdb/cockroach/pkg/storage/replica.go in updateTimestampCache at line 2208
github.com/cockroachdb/cockroach/pkg/storage/replica.go in done at line 2139
github.com/cockroachdb/cockroach/pkg/storage/replica.go in func1 at line 3019
github.com/cockroachdb/cockroach/pkg/storage/replica.go in executeReadOnlyBatch at line 3074
github.com/cockroachdb/cockroach/pkg/storage/replica.go in sendWithRangeID at line 2024
github.com/cockroachdb/cockroach/pkg/storage/replica.go in Send at line 1969
github.com/cockroachdb/cockroach/pkg/storage/store.go in Send at line 3167
github.com/cockroachdb/cockroach/pkg/storage/stores.go in Send at line 185
github.com/cockroachdb/cockroach/pkg/server/node.go in func1 at line 987
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go in RunTaskWithErr at line 303
github.com/cockroachdb/cockroach/pkg/server/node.go in batchInternal at line 974
github.com/cockroachdb/cockroach/pkg/server/node.go in Batch at line 1015
github.com/cockroachdb/cockroach/pkg/roachpb/api.pb.go in func1 at line 6633
Called from: … .com/cockroachdb/cockroach/vendor/github.com/grpc-ecosystem/grpc-opentracing/go/otgrpc/server.go in func1
github.com/cockroachdb/cockroach/pkg/rpc/context.go in func1 at line 197
github.com/cockroachdb/cockroach/pkg/roachpb/api.pb.go in _Internal_Batch_Handler at line 6635
Called from: github.com/cockroachdb/cockroach/vendor/google.golang.org/grpc/server.go in processUnaryRPC

@nvanbenschoten
Member

This panic is coming from here:

case cmp > 0:
// Starting key is after ending key. This shouldn't happen.
panic(interval.ErrInvertedRange)

We're providing an end key to the timestamp cache that sorts before the start key. I looked through the sklImpl logic, specifically around truncating tscache keys when they are too long, but couldn't find anything suspicious looking.

That means that this must be coming from

tc.Add(start, end, ts, txnID, readOnlyUseReadCache)

It's very unlikely that we're messing up the span in the ScanResponse header itself, because that would have been caught in a number of other places first. That leads me to suspect that something is going wrong with the resume span. Is it possible that a BatchRequest contains multiple ScanRequests and they're all being given the same resume span end key (which lives on the ResponseHeader)? If that were the case, then a scan could observe the resume span for a different scan and get an inverted end key. That seems too crazy to be possible.
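For readers following along, the check that fires can be sketched as below. This is a minimal illustration, not the actual `intervalSkl.AddRange` code: `checkRange` and `errInvertedRange` are hypothetical names standing in for the real function and `interval.ErrInvertedRange`, and the sketch returns an error where the real code panics.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// errInvertedRange is a hypothetical stand-in for interval.ErrInvertedRange.
var errInvertedRange = errors.New("interval: inverted range")

// checkRange compares the two keys bytewise and rejects a range whose
// start key sorts after its end key. Returning an error instead of
// panicking is a choice made for this sketch only.
func checkRange(start, end []byte) error {
	if bytes.Compare(start, end) > 0 {
		// Starting key is after ending key. This shouldn't happen.
		return errInvertedRange
	}
	return nil
}

func main() {
	fmt.Println(checkRange([]byte("a"), []byte("b"))) // well-formed range
	fmt.Println(checkRange([]byte("b"), []byte("a"))) // inverted range
}
```

Any caller that hands over an end key sorting before the start key trips this branch, which is why the suspicion falls on whatever computes the end key upstream.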

@nvanbenschoten
Member

Never mind, I was mixing up the ResponseHeader and the BatchResponse_Header. Maybe there's an error in the resume span generation.

@tbg
Member Author

tbg commented Nov 13, 2018

Another unsubstantiated theory: could be a bit flip in the end key, which could easily reverse the ordering.

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Nov 15, 2018
Closes cockroachdb#32149 as unactionable past this point.

Release note: None
@nvanbenschoten
Member

Yeah, it's tough to say on these. I think the best we can do is add more info to this panic (#32397) and see if it ever happens again.

craig bot pushed a commit that referenced this issue Nov 15, 2018
32397: storage/tscache: improve debug info in panic msg for inverted range r=nvanbenschoten a=nvanbenschoten

Closes #32149 as unactionable past this point.

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
@craig craig bot closed this as completed in #32397 Nov 15, 2018
@danhhz
Contributor

danhhz commented Nov 16, 2018

This just reproduced on my roachprod cluster, but sadly the build I was running was pulled hours short of your better panic message patch. @nvanbenschoten anything I should save here before I wipe the cluster?

@danhhz
Contributor

danhhz commented Nov 16, 2018

Never mind, it's not like there's any state to recover here. Here are the relevant logs for posterity:

E181116 15:46:41.239816 71734 util/log/crash_reporting.go:203  [n1] a panic has occurred!
E181116 15:46:41.268990 71734 util/log/crash_reporting.go:477  [n1] Reported as error b471182342cf4124be61fcc01ffe16af
E181116 15:46:41.271737 71734 sql/conn_executor.go:633  [n1,client=10.142.0.2:33080,user=root] a SQL panic has occurred while executing "SELECT o_id, o_entry_d, o_carrier_id FROM \"order\" WHERE ((o_w_id = $1) AND (o_d_id = $2)) AND (o_c_id = $3) ORDER BY o_id DESC LIMIT 1": interval: inverted range
E181116 15:46:41.273614 71734 util/log/crash_reporting.go:203  [n1,client=10.142.0.2:33080,user=root] a panic has occurred!
E181116 15:46:41.275977 71734 util/log/crash_reporting.go:477  [n1,client=10.142.0.2:33080,user=root] Reported as error d0e5c0bef7324b90b57bc1ca2b7c4276
panic: interval: inverted range [recovered]
        panic: interval: inverted range [recovered]
        panic: interval: inverted range [recovered]
        panic: panic while executing 1 statements: SELECT _, _, _ FROM _ WHERE ((_ = $1) AND (_ = $2)) AND (_ = $3) ORDER BY _ DESC LIMIT _; caused by interval: inverted range

goroutine 71734 [running]:
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).closeWrapper(0xc42d165900, 0x3293860, 0xc4322bae80, 0x2a57c80, 0x4b51080)
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:647 +0x36f
github.com/cockroachdb/cockroach/pkg/sql.(*Server).ServeConn.func1(0xc42d165900, 0x3293860, 0xc4322bae80)
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:388 +0x61
panic(0x2a57c80, 0x4b51080)
        /usr/local/go/src/runtime/panic.go:502 +0x229
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc4206166c0, 0x3293920, 0xc44a08a600)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:184 +0x11f
panic(0x2a57c80, 0x4b51080)
        /usr/local/go/src/runtime/panic.go:502 +0x229
github.com/cockroachdb/cockroach/pkg/storage.(*Store).Send.func1(0xc43a31c7f8, 0xc43a31c878, 0x1567a5598108f7d3, 0x0, 0xc43a31c870, 0xc420c06580)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:2882 +0x273
panic(0x2a57c80, 0x4b51080)
        /usr/local/go/src/runtime/panic.go:502 +0x229
github.com/cockroachdb/cockroach/pkg/storage/tscache.(*intervalSkl).AddRange(0xc4201e8f30, 0xc43337cf40, 0xc, 0xc, 0xc442945848, 0x7, 0x7, 0x2, 0x1567a55980715582, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/tscache/interval_skl.go:243 +0x2ab
github.com/cockroachdb/cockroach/pkg/storage/tscache.(*sklImpl).Add(0xc420cac680, 0xc43337cf40, 0xc, 0xc, 0xc442945848, 0x7, 0x7, 0x1567a55980715582, 0x0, 0x614d01283fa18622, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/tscache/skl_impl.go:86 +0x1b5
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).updateTimestampCache(0xc4296d6e00, 0xc433ae82a8, 0xc444716420, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:2265 +0x9e5
github.com/cockroachdb/cockroach/pkg/storage.(*endCmds).done(0xc433ae8280, 0xc444716420, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:2186 +0xc6
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).executeReadOnlyBatch.func1(0xc433ae8280, 0xc43a31c040, 0xc43a31c048)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:3135 +0x4e
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).executeReadOnlyBatch(0xc4296d6e00, 0x3293920, 0xc44a08a690, 0x1567a55980715582, 0x0, 0x100000001, 0x1, 0x6a9, 0x0, 0xc442c22e00, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:3190 +0x726
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).sendWithRangeID(0xc4296d6e00, 0x3293920, 0xc44a08a690, 0x6a9, 0x1567a55980715582, 0x0, 0x100000001, 0x1, 0x6a9, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:2066 +0x2c2
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).Send(0xc4296d6e00, 0x3293920, 0xc44a08a660, 0x1567a55980715582, 0x0, 0x100000001, 0x1, 0x6a9, 0x0, 0xc442c22e00, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:2011 +0x90
github.com/cockroachdb/cockroach/pkg/storage.(*Store).Send(0xc420c06580, 0x3293920, 0xc44a08a660, 0x1567a55980715582, 0x0, 0x100000001, 0x1, 0x6a9, 0x0, 0xc442c22e00, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3003 +0x60c
github.com/cockroachdb/cockroach/pkg/storage.(*Stores).Send(0xc4208199a0, 0x3293920, 0xc44a08a600, 0x0, 0x0, 0x100000001, 0x1, 0x6a9, 0x0, 0xc442c22e00, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/stores.go:185 +0xdb
github.com/cockroachdb/cockroach/pkg/server.(*Node).batchInternal.func1(0x3293920, 0xc44a08a600, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:987 +0x1c1
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTaskWithErr(0xc4206166c0, 0x3293920, 0xc44a08a600, 0x2d081fd, 0x10, 0xc43a31ca60, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:303 +0xed
github.com/cockroachdb/cockroach/pkg/server.(*Node).batchInternal(0xc4204e1800, 0x3293920, 0xc44a08a600, 0xc449c86e00, 0xc44a08a600, 0xc43a31cae8, 0x790b35)
        /go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:974 +0x165
github.com/cockroachdb/cockroach/pkg/server.(*Node).Batch(0xc4204e1800, 0x3293920, 0xc44a08a600, 0xc449c86e00, 0x0, 0xc43a31cb60, 0x7921cc)
        /go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:1015 +0x9c
github.com/cockroachdb/cockroach/pkg/rpc.internalClientAdapter.Batch(0x3268a60, 0xc4204e1800, 0x3293920, 0xc44a08a5d0, 0xc449c86e00, 0x0, 0x0, 0x0, 0xc44a08a540, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/rpc/context.go:431 +0x4f
github.com/cockroachdb/cockroach/pkg/kv.(*grpcTransport).sendBatch(0xc44a08a5a0, 0x3293920, 0xc44a08a5d0, 0x3272720, 0xc4203c80b0, 0x0, 0x0, 0x100000001, 0x1, 0x6a9, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/transport.go:199 +0x138
github.com/cockroachdb/cockroach/pkg/kv.(*grpcTransport).SendNext(0xc44a08a5a0, 0x3293920, 0xc44a08a540, 0x0, 0x0, 0x100000001, 0x1, 0x6a9, 0x0, 0xc442c22e00, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/transport.go:169 +0x138
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).sendToReplicas(0xc4204a2500, 0x3293920, 0xc44a08a540, 0xc4204a2550, 0x6a9, 0xc46880a0f0, 0x3, 0x3, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:1325 +0x30a
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).sendRPC(0xc4204a2500, 0x3293920, 0xc44a08a540, 0x6a9, 0xc46880a0f0, 0x3, 0x3, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:392 +0x27c
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).sendSingleRange(0xc4204a2500, 0x3293920, 0xc44a08a540, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc442c22e00, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:470 +0x231
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).sendPartialBatch(0xc4204a2500, 0x3293920, 0xc44a08a540, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc442c22e00, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:1101 +0x322
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).divideAndSendBatchToRanges(0xc4204a2500, 0x3293920, 0xc44a08a540, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc442c22e00, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:772 +0x1364
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).Send(0xc4204a2500, 0x3293920, 0xc44a08a540, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc442c22d00, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:684 +0x4e3
github.com/cockroachdb/cockroach/pkg/kv.(*txnLockGatekeeper).SendLocked(0xc45d35abc8, 0x3293920, 0xc44a08a540, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc442c22d00, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_coord_sender.go:234 +0xf5
github.com/cockroachdb/cockroach/pkg/kv.(*txnMetrics).SendLocked(0xc45d35ab90, 0x3293920, 0xc44a08a540, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc442c22d00, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_metrics.go:58 +0x12d
github.com/cockroachdb/cockroach/pkg/kv.(*txnSpanRefresher).sendLockedWithRefreshAttempts(0xc45d35aaf8, 0x3293920, 0xc44a08a540, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc442c22d00, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_span_refresher.go:167 +0x98
github.com/cockroachdb/cockroach/pkg/kv.(*txnSpanRefresher).SendLocked(0xc45d35aaf8, 0x3293920, 0xc44a08a540, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc442c22d00, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_span_refresher.go:105 +0x11e
github.com/cockroachdb/cockroach/pkg/kv.(*txnPipeliner).SendLocked(0xc45d35aa78, 0x3293920, 0xc44a08a540, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc442c22d00, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_pipeliner.go:161 +0x165
github.com/cockroachdb/cockroach/pkg/kv.(*txnIntentCollector).SendLocked(0xc45d35aa38, 0x3293920, 0xc44a08a540, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc442c22d00, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_intent_collector.go:106 +0x45d
github.com/cockroachdb/cockroach/pkg/kv.(*txnSeqNumAllocator).SendLocked(0xc45d35ab78, 0x3293920, 0xc44a08a540, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc442c22d00, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_sequence_nums.go:62 +0x1f3
github.com/cockroachdb/cockroach/pkg/kv.(*txnHeartbeat).SendLocked(0xc45d35a990, 0x3293920, 0xc44a08a540, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc442c22d00, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_heartbeat.go:230 +0x542
github.com/cockroachdb/cockroach/pkg/kv.(*TxnCoordSender).Send(0xc45d35a800, 0x3293920, 0xc44a08a540, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc442c22d00, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_coord_sender.go:648 +0x531
github.com/cockroachdb/cockroach/pkg/internal/client.(*DB).sendUsingSender(0xc420522280, 0x3293920, 0xc44a08a4e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/internal/client/db.go:623 +0x135
github.com/cockroachdb/cockroach/pkg/internal/client.(*Txn).Send(0xc44cf8e990, 0x3293920, 0xc44a08a4e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/internal/client/txn.go:806 +0x14c
github.com/cockroachdb/cockroach/pkg/sql/row.(*txnKVFetcher).fetch(0xc44479c5b0, 0x3293920, 0xc44a08a4e0, 0xc420559980, 0xc4206166c0)
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/row/kvfetcher.go:227 +0x597
github.com/cockroachdb/cockroach/pkg/sql/row.(*txnKVFetcher).nextBatch(0xc44479c5b0, 0x3293920, 0xc44a08a4e0, 0x0, 0x0, 0x0, 0x0, 0xc43a31fc68, 0x6d0bc6, 0xc44a08a510, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/row/kvfetcher.go:294 +0x6d
github.com/cockroachdb/cockroach/pkg/sql/row.(*Fetcher).nextKV(0xc44627a498, 0x3293920, 0xc44a08a4e0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/row/fetcher.go:522 +0x2cf
github.com/cockroachdb/cockroach/pkg/sql/row.(*Fetcher).NextKey(0xc44627a498, 0x3293920, 0xc44a08a4e0, 0x68, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/row/fetcher.go:541 +0x8a
github.com/cockroachdb/cockroach/pkg/sql/row.(*Fetcher).StartScanFrom(0xc44627a498, 0x3293920, 0xc44a08a4e0, 0x326dea0, 0xc44479c5b0, 0x2, 0xcccc01)
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/row/fetcher.go:490 +0xd9
github.com/cockroachdb/cockroach/pkg/sql/row.(*Fetcher).StartScan(0xc44627a498, 0x3293920, 0xc44a08a4e0, 0xc44cf8e990, 0xc443e1cf00, 0x1, 0x1, 0xc43413d801, 0x1, 0xc43e490800, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/row/fetcher.go:478 +0x208
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun.(*tableReader).Start(0xc44627a000, 0x3293920, 0xc44a08a3c0, 0xc43a320118, 0x199c4b0)
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/tablereader.go:248 +0x173
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun.(*indexJoiner).Start(0xc4695cf000, 0x3293920, 0xc44a08a3c0, 0xc442945948, 0x285c000)
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/indexjoiner.go:130 +0x52
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun.(*ProcessorBase).Run(0xc4695cf000, 0x3293920, 0xc44a08a3c0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/processors.go:767 +0x58
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun.(*Flow).StartSync(0xc429e3c8c0, 0x3293920, 0xc44a08a3c0, 0x2e51158, 0xc445d4f420, 0x3282ca0)
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/flow.go:630 +0x191
github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).Run(0xc420c92000, 0xc4447162a0, 0xc44cf8e990, 0xc43a320a18, 0xc436cc78c0, 0xc42d165dc0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:261 +0x868
github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).PlanAndRun(0xc420c92000, 0x3293920, 0xc42d25ac90, 0xc42d165dc0, 0xc4447162a0, 0xc44cf8e990, 0x3283960, 0xc468134840, 0xc436cc78c0)
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:792 +0x24c
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execWithDistSQLEngine(0xc42d165900, 0x3293920, 0xc42d25ac90, 0xc42d165d28, 0x3, 0x7f34b8bd3748, 0xc44e0327e0, 0xc433067800, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:998 +0x27a
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).dispatchToExecutionEngine(0xc42d165900, 0x3293920, 0xc42d25ac90, 0x3297860, 0xc44d79c400, 0xc433067800, 0x3, 0x3, 0xc44fba2b40, 0x86, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:840 +0xa96
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmtInOpenState(0xc42d165900, 0x3293920, 0xc42d25ac90, 0x3297860, 0xc44d79c400, 0xc433067800, 0x3, 0x3, 0xc44fba2b40, 0x86, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:416 +0xe7a
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmt(0xc42d165900, 0x3293920, 0xc42d25ac90, 0x3297860, 0xc44d79c400, 0xc433067800, 0x3, 0x3, 0xc44fba2b40, 0x86, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:99 +0x341
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run(0xc42d165900, 0x3293860, 0xc4322bae80, 0xc420e340d8, 0x5400, 0x15000, 0xc420e34170, 0xc429be5600, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1170 +0x13f1
github.com/cockroachdb/cockroach/pkg/sql.(*Server).ServeConn(0xc420e40000, 0x3293860, 0xc4322bae80, 0xc42d165900, 0x5400, 0x15000, 0xc420e34170, 0xc429be5600, 0x0, 0x0)
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:390 +0xce
github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*conn).serveImpl.func4(0xc420e40000, 0x3293860, 0xc4322bae80, 0xc42d165900, 0x5400, 0x15000, 0xc420e34170, 0xc429be5600, 0xc429be5610, 0xc42ec68930)
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:309 +0x81
created by github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*conn).serveImpl
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:308 +0x107c

@tbg
Member Author

tbg commented Nov 16, 2018

The "better panic" actually doesn't log the keys. It should do so at least to the local logs.

@tbg tbg reopened this Nov 16, 2018
@nvanbenschoten
Member

Logging the keys before hitting the panic is a good idea now that we know this reproduces occasionally. I'll do that.

In the meantime, I don't think there's much to do with your cluster @danhhz. It is interesting that the query had a LIMIT. That makes it a little more likely that this is an issue with ResumeSpans.
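As background on why a LIMIT points at resume spans, here is a rough sketch of a limited forward scan. It assumes the scan stops after `limit` keys and reports the unread remainder as a resume span, and that the range actually read is then truncated at the resume key. The `span` type and `scan` function are hypothetical and are not CockroachDB's implementation.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// span is a hypothetical half-open key range [start, end).
type span struct{ start, end []byte }

// scan returns up to limit keys from the sorted slice keys that fall in
// [s.start, s.end). If it stops early, it also returns a resume span
// covering the part of the request that was not read.
func scan(keys [][]byte, s span, limit int) (out [][]byte, resume *span) {
	i := sort.Search(len(keys), func(i int) bool {
		return bytes.Compare(keys[i], s.start) >= 0
	})
	for ; i < len(keys) && bytes.Compare(keys[i], s.end) < 0; i++ {
		if len(out) == limit {
			// keys[i] is the first unread key and is known to be < s.end.
			return out, &span{start: keys[i], end: s.end}
		}
		out = append(out, keys[i])
	}
	return out, nil
}

func main() {
	keys := [][]byte{[]byte("a"), []byte("b"), []byte("c"), []byte("d")}
	req := span{start: []byte("a"), end: []byte("z")}

	out, resume := scan(keys, req, 2)
	fmt.Printf("read %d keys\n", len(out))
	if resume != nil {
		// Under the stated assumption, the range read is [req.start, resume.start).
		fmt.Printf("read span [%s,%s), resume span [%s,%s)\n",
			req.start, resume.start, resume.start, resume.end)
		fmt.Println("read span well-formed:", bytes.Compare(req.start, resume.start) < 0)
	}
}
```

In this model the end of the read range comes from the resume span rather than from the request, so a miscomputed resume key would be one way to end up with an end key that sorts before the start key.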

@nvanbenschoten nvanbenschoten changed the title sentry: stopper.go:182: *errors.errorString storage: inverted range in intervalSkl.AddRange Nov 16, 2018
@nvanbenschoten
Member

We just saw this fail in #30886: panic: inverted range: key lens = [14,9), diff @ index 0.

The short lengths rule out the possibility that the key inversion is caused by sklImpl.boundKeyLengths.
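A summary of that shape can be produced without including any key contents. The sketch below mirrors the message format quoted above; `firstDiff` and `describeInverted` are hypothetical helpers and are not the code added in #32397.

```go
package main

import "fmt"

// firstDiff returns the index of the first byte at which a and b differ,
// or the length of the shorter key when one is a prefix of the other.
func firstDiff(a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			return i
		}
	}
	return n
}

// describeInverted formats the key lengths and first differing index,
// leaving the key bytes themselves out of the message.
func describeInverted(start, end []byte) string {
	return fmt.Sprintf("inverted range: key lens = [%d,%d), diff @ index %d",
		len(start), len(end), firstDiff(start, end))
}

func main() {
	// Made-up keys; prints "inverted range: key lens = [5,5), diff @ index 0".
	fmt.Println(describeInverted([]byte("zebra"), []byte("apple")))
}
```

The lengths and the diff index are enough to tell whether length-bounding logic could be involved, even when the keys themselves are not reported.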

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Nov 16, 2018
craig bot pushed a commit that referenced this issue Nov 16, 2018
32424: storage/tscache: locally log full keys for inverted range Fatal r=nvanbenschoten a=nvanbenschoten

Informs #32149.

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
@nvanbenschoten
Member

Another repro: inverted range (issue #32149): key lens = [8,8), diff @ index 0, [( k key , key-26 )

@nvanbenschoten
Member

So if I'm reading this right, the keys are "( k key " and " key-26 ". This seems to be failing reliably enough now that we'll be able to find a pattern in here. We'll probably also want some logging around resume spans.

@petermattis
Collaborator

What workload is generating keys like that?

Note:

hex("( k key ") == {0x28, 0x20, 0x6b, 0x20, 0x6b, 0x65, 0x79, 0x20, 0x0}
hex(" key-26 ") == {0x20, 0x6b, 0x65, 0x79, 0x2d, 0x32, 0x36, 0x20, 0x0}

The first byte of these keys differs by a single bit:

0x28 == 101000
0x20 == 100000

I'm not sure if that is adequate evidence of a bit flip. I'd like to see more reproductions of this.

@tbg
Member Author

tbg commented Nov 19, 2018

More repros, you say? #32446:

For your convenience:

package main

import "fmt"

func main() {
	b1 := []byte("J�†��")
	b2 := []byte("�½‰É�")
	fmt.Println(b1)
	fmt.Println(b2)
}
E181117 11:37:17.044978 12003 storage/tscache/interval_skl.go:254  inverted range (issue #32149): key lens = [5,5), diff @ index 0, [J�†��,�½‰É�)
panic: inverted range (issue #32149): key lens = [5,5), diff @ index 0 [recovered]
	panic: inverted range (issue #32149): key lens = [5,5), diff @ index 0 [recovered]
	panic: inverted range (issue #32149): key lens = [5,5), diff @ index 0 [recovered]
	panic: inverted range (issue #32149): key lens = [5,5), diff @ index 0

goroutine 12003 [running]:
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc421b96360, 0x3d96ec0, 0xc421405920)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:184 +0x12e
panic(0x33bc7c0, 0xc421ccde00)
	/usr/local/go/src/runtime/panic.go:502 +0x24a
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc421b963f0, 0x3d96ec0, 0xc4228bc7e0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:184 +0x12e
panic(0x33bc7c0, 0xc421ccde00)
	/usr/local/go/src/runtime/panic.go:502 +0x24a
github.com/cockroachdb/cockroach/pkg/storage.(*Store).Send.func1(0xc4220658d8, 0xc422065958, 0x1567e651f2ed3e35, 0x0, 0xc422065950, 0xc420776000)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:2883 +0x6f3
panic(0x33bc7c0, 0xc421ccde00)
	/usr/local/go/src/runtime/panic.go:502 +0x24a
github.com/cockroachdb/cockroach/pkg/storage/tscache.(*intervalSkl).AddRange(0xc421ba23f0, 0xc4228d0018, 0x5, 0x5, 0xc42287cfb8, 0x5, 0x5, 0x2, 0x1567e651f29915d8, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/tscache/interval_skl.go:255 +0x8d4
github.com/cockroachdb/cockroach/pkg/storage/tscache.(*sklImpl).Add(0xc421348140, 0xc4228d0018, 0x5, 0x5, 0xc42287cfb8, 0x5, 0x5, 0x1567e651f29915d8, 0x0, 0xc44f8478ba641c2e, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/tscache/skl_impl.go:86 +0x26d
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).updateTimestampCache(0xc422da8000, 0xc421c44028, 0xc421a80060, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:2265 +0xd6f
github.com/cockroachdb/cockroach/pkg/storage.(*endCmds).done(0xc421c44000, 0xc421a80060, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:2186 +0x173
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).executeReadOnlyBatch.func1(0xc421c44000, 0xc422065038, 0xc422065040)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:3135 +0x78
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).executeReadOnlyBatch(0xc422da8000, 0x3d96ec0, 0xc421ccdc80, 0x1567e651f29915d8, 0x0, 0x100000001, 0x1, 0x1, 0x0, 0xc421ad5f00, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:3190 +0x92a
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).sendWithRangeID(0xc422da8000, 0x3d96ec0, 0xc421ccdc80, 0x1, 0x1567e651f29915d8, 0x0, 0x100000001, 0x1, 0x1, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:2066 +0x337
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).Send(0xc422da8000, 0x3d96ec0, 0xc421ccdc50, 0x1567e651f29915d8, 0x0, 0x100000001, 0x1, 0x1, 0x0, 0xc421ad5f00, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:2011 +0xb6
github.com/cockroachdb/cockroach/pkg/storage.(*Store).Send(0xc420776000, 0x3d96ec0, 0xc421ccdc50, 0x1567e651f29915d8, 0x0, 0x100000001, 0x1, 0x1, 0x0, 0xc421ad5f00, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3004 +0x745
github.com/cockroachdb/cockroach/pkg/storage.(*Stores).Send(0xc4226b6630, 0x3d96ec0, 0xc4228bc7e0, 0x0, 0x0, 0x100000001, 0x1, 0x1, 0x0, 0xc421ad5f00, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/stores.go:185 +0x134
github.com/cockroachdb/cockroach/pkg/server.(*Node).batchInternal.func1(0x3d96ec0, 0xc4228bc7e0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:987 +0x213
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTaskWithErr(0xc421b963f0, 0x3d96ec0, 0xc4228bc7e0, 0x34ff249, 0x10, 0xc422065ba8, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:303 +0xfb
github.com/cockroachdb/cockroach/pkg/server.(*Node).batchInternal(0xc421bc6c00, 0x3d96ec0, 0xc4228bc7e0, 0xc421aab000, 0xc4228bc7e0, 0x15000568eb70, 0x20)
	/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:974 +0x2b3
github.com/cockroachdb/cockroach/pkg/server.(*Node).Batch(0xc421bc6c00, 0x3d96ec0, 0xc4228bc7e0, 0xc421aab000, 0x0, 0xc422065cc8, 0x9e52a5)
	/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:1015 +0xbb
github.com/cockroachdb/cockroach/pkg/rpc.internalClientAdapter.Batch(0x3d6e7c0, 0xc421bc6c00, 0x3d96ec0, 0xc4228bc7b0, 0xc421aab000, 0x0, 0x0, 0x0, 0x9e52a5, 0x8a241c, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/rpc/context.go:431 +0x64
github.com/cockroachdb/cockroach/pkg/kv.(*grpcTransport).sendBatch(0xc4228bc780, 0x3d96ec0, 0xc4228bc7b0, 0x3d776c0, 0xc4203b8190, 0x0, 0x0, 0x100000001, 0x1, 0x1, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/transport.go:199 +0x18e
github.com/cockroachdb/cockroach/pkg/kv.(*grpcTransport).SendNext(0xc4228bc780, 0x3d96ec0, 0xc4228bc6f0, 0x0, 0x0, 0x100000001, 0x1, 0x1, 0x0, 0xc421ad5f00, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/transport.go:169 +0x1aa
github.com/cockroachdb/cockroach/pkg/kv.raceTransport.SendNext(0x3da1080, 0xc4228bc780, 0x3d96ec0, 0xc4228bc6f0, 0x0, 0x0, 0x0, 0x0, 0x1, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/transport_race.go:83 +0x3d1
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).sendToReplicas(0xc42266e300, 0x3d96ec0, 0xc4228bc6f0, 0xc42266e350, 0x1, 0xc42034b300, 0x1, 0x1, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:1325 +0x3a6
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).sendRPC(0xc42266e300, 0x3d96ec0, 0xc4228bc6f0, 0x1, 0xc42034b300, 0x1, 0x1, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:392 +0x2f6
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).sendSingleRange(0xc42266e300, 0x3d96ec0, 0xc4228bc6f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc421ad5f00, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:470 +0x1bf
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).sendPartialBatch(0xc42266e300, 0x3d96ec0, 0xc4228bc6f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc421ad5f00, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:1101 +0x3db
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).divideAndSendBatchToRanges(0xc42266e300, 0x3d96ec0, 0xc4228bc6f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc421ad5f00, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:772 +0x17d5
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).Send(0xc42266e300, 0x3d96ec0, 0xc4228bc6f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc421ad5e00, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:684 +0x687
github.com/cockroachdb/cockroach/pkg/kv.(*txnLockGatekeeper).SendLocked(0xc4235f1bd0, 0x3d96ec0, 0xc4228bc6f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc421ad5e00, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_coord_sender.go:234 +0x145
github.com/cockroachdb/cockroach/pkg/kv.(*txnMetrics).SendLocked(0xc4235f1b98, 0x3d96ec0, 0xc4228bc6f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc421ad5e00, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_metrics.go:58 +0x1c5
github.com/cockroachdb/cockroach/pkg/kv.(*txnSpanRefresher).sendLockedWithRefreshAttempts(0xc4235f1b00, 0x3d96ec0, 0xc4228bc6f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc421ad5e00, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_span_refresher.go:167 +0xbe
github.com/cockroachdb/cockroach/pkg/kv.(*txnSpanRefresher).SendLocked(0xc4235f1b00, 0x3d96ec0, 0xc4228bc6f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc421ad5e00, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_span_refresher.go:105 +0x152
github.com/cockroachdb/cockroach/pkg/kv.(*txnPipeliner).SendLocked(0xc4235f1a80, 0x3d96ec0, 0xc4228bc6f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc421ad5e00, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_pipeliner.go:161 +0x1be
github.com/cockroachdb/cockroach/pkg/kv.(*txnIntentCollector).SendLocked(0xc4235f1a40, 0x3d96ec0, 0xc4228bc6f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc421ad5e00, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_intent_collector.go:106 +0x5fe
github.com/cockroachdb/cockroach/pkg/kv.(*txnSeqNumAllocator).SendLocked(0xc4235f1b80, 0x3d96ec0, 0xc4228bc6f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc421ad5e00, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_sequence_nums.go:62 +0x341
github.com/cockroachdb/cockroach/pkg/kv.(*txnHeartbeat).SendLocked(0xc4235f1998, 0x3d96ec0, 0xc4228bc6f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc421ad5e00, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_heartbeat.go:230 +0x1cc
github.com/cockroachdb/cockroach/pkg/kv.(*TxnCoordSender).Send(0xc4235f1800, 0x3d96ec0, 0xc4228bc6f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc421ad5e00, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_coord_sender.go:648 +0x6bb
github.com/cockroachdb/cockroach/pkg/internal/client.(*DB).sendUsingSender(0xc4229d2480, 0x3d96e40, 0xc4200dc050, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/db.go:623 +0x124
github.com/cockroachdb/cockroach/pkg/internal/client.(*Txn).Send(0xc42363b680, 0x3d96e40, 0xc4200dc050, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/txn.go:806 +0x1c1
github.com/cockroachdb/cockroach/pkg/internal/client.(*Txn).Send-fm(0x3d96e40, 0xc4200dc050, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/txn.go:508 +0xa4
github.com/cockroachdb/cockroach/pkg/internal/client.sendAndFill(0x3d96e40, 0xc4200dc050, 0xc422069240, 0xc421c4ce00, 0xc42034b280, 0x32f9560)
	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/db.go:548 +0x152
github.com/cockroachdb/cockroach/pkg/internal/client.(*Txn).Run(0xc42363b680, 0x3d96e40, 0xc4200dc050, 0xc421c4ce00, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/txn.go:508 +0xfe
github.com/cockroachdb/cockroach/pkg/internal/client.(*Txn).scan(0xc42363b680, 0x3d96e40, 0xc4200dc050, 0x3385580, 0xc42034b280, 0x32f9560, 0xc42034b2a0, 0x1, 0x1, 0xc422069418, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/txn.go:415 +0x15a
github.com/cockroachdb/cockroach/pkg/internal/client.(*Txn).ReverseScan(0xc42363b680, 0x3d96e40, 0xc4200dc050, 0x3385580, 0xc42034b280, 0x32f9560, 0xc42034b2a0, 0x1, 0xc42034b260, 0x1, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/txn.go:442 +0xb1
github.com/cockroachdb/cockroach/pkg/server.(*TestServer).SplitRange.func1.1(0xc420377b28, 0x3, 0x4, 0x1, 0x1c, 0xc42287cf98, 0x3, 0x8, 0xc42287cfb0, 0x2, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/server/testserver.go:728 +0x217
github.com/cockroachdb/cockroach/pkg/server.(*TestServer).SplitRange.func1(0x3d96e40, 0xc4200dc050, 0xc42363b680, 0xc422069a58, 0x8a22d2)
	/go/src/github.com/cockroachdb/cockroach/pkg/server/testserver.go:753 +0x3ad
github.com/cockroachdb/cockroach/pkg/internal/client.(*DB).Txn.func1(0x3d96e40, 0xc4200dc050, 0xc42363b680, 0x866a69, 0x1500074479b0)
	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/db.go:586 +0x5f
github.com/cockroachdb/cockroach/pkg/internal/client.(*Txn).exec(0xc42363b680, 0x3d96e40, 0xc4200dc050, 0xc4202e8d90, 0x1, 0xc42363b680)
	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/txn.go:705 +0xf7
github.com/cockroachdb/cockroach/pkg/internal/client.(*DB).Txn(0xc4229d2480, 0x3d96e40, 0xc4200dc050, 0xc4224b8b00, 0x3dc0f00, 0xc421b3cb40)
	/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/db.go:585 +0x15e
github.com/cockroachdb/cockroach/pkg/server.(*TestServer).SplitRange(0xc4207581c0, 0xc420377b28, 0x3, 0x4, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/server/testserver.go:723 +0x561
github.com/cockroachdb/cockroach/pkg/sql.TestPlanningDuringSplitsAndMerges.func2(0x3d96ec0, 0xc421405920)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_physical_planner_test.go:134 +0x394
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc421736590, 0xc421b96360, 0xc42139d9c0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:199 +0x14b
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:192 +0xbb

@tbg
Member Author

tbg commented Nov 19, 2018

It took 34s to get another repro on a 5-node cluster with:

make roachprod-stressrace CLUSTER=tobias-stress TESTS=TestPlanningDuringSplitsAndMerges PKG=github.com/cockroachdb/cockroach/pkg/sql TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

I bet you get a repro every minute on a gceworker.

@tbg
Member Author

tbg commented Nov 19, 2018

PS the error is during a split again:

github.com/cockroachdb/cockroach/pkg/server.(*TestServer).SplitRange.func1.1(0xc4202c7070, 0x3, 0x4, 0x1, 0x1b, 0xc4214b1cb0, 0x3, 0x8, 0xc4214b1cb8, 0x2, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/server/testserver.go:728 +0x217
github.com/cockroachdb/cockroach/pkg/server.(*TestServer).SplitRange.func1(0x3d96e80, 0xc4200cc050, 0xc421acca00, 0x0, 0x8a22d2)

@tbg
Member Author

tbg commented Nov 19, 2018

(I'm going to leave this to either @petermattis or @nvanbenschoten to figure out, hopefully straightforward now that we have an instant repro). These keys sure look like garbage to me.

@nvanbenschoten
Member

Awesome, now that we have a path to reproduction, this should be easy to track down. I'll take care of that.

@nvanbenschoten
Member

Ok, there's definitely something funky going on with ResumeSpans and ReverseScans. I see a reverse scan over Meta2 keys get a ResumeSpan in the normal keyspace. When its start key is replaced by the resume span key, we invert the interval and hit the panic. So where is this resume span coming from?
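To make the failure mode concrete, here is a rough sketch of how swapping in a bogus resume key inverts an interval. The `span` type, `narrowToRead` and the key bytes are all hypothetical stand-ins, not the actual timestamp cache or `roachpb` code.

```go
package main

import (
	"bytes"
	"fmt"
)

// span is a hypothetical stand-in for a [start, end) key interval.
type span struct {
	start, end []byte
}

// inverted reports whether the start key sorts after the end key.
func (s span) inverted() bool {
	return bytes.Compare(s.start, s.end) > 0
}

// narrowToRead models the step described above: the start key of the
// scanned interval is replaced by the key taken from the resume span.
func narrowToRead(s span, resumeKey []byte) span {
	return span{start: resumeKey, end: s.end}
}

func main() {
	// Made-up keys: the scan covers a low-sorting prefix, while the resume
	// key sorts far above it, as a key from another keyspace would.
	scanned := span{start: []byte("\x03a"), end: []byte("\x03z")}
	resumeKey := []byte("\xbd\x89")

	updated := narrowToRead(scanned, resumeKey)
	fmt.Println("scanned inverted:", scanned.inverted())
	fmt.Println("updated inverted:", updated.inverted())
}
```

Any resume key that sorts above the scan's end key produces an inverted interval, which is the condition the panic in `AddRange` reports.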

@nvanbenschoten
Member

Interestingly, we don't see a crazy resume span from

rows, resumeSpan, intents, err = engine.MVCCScan(
	ctx, batch, args.Key, args.EndKey, cArgs.MaxKeys, h.Timestamp, engine.MVCCScanOptions{
		Inconsistent: h.ReadConsistency != roachpb.CONSISTENT,
		Txn:          h.Txn,
		Reverse:      true,
	})

but it ends up corrupted before we apply it to the timestamp cache. We must be corrupting memory somehow.

I tracked the failure back to bd57a2b, which makes a lot of sense. Now we just need to figure out what that's doing wrong.

I also think this means that the original sentry crash is unrelated to the rest of these failures. I was beginning to suspect this after seeing how common the panic was becoming.

@petermattis
Collaborator

> I tracked the failure back to bd57a2b, which makes a lot of sense. Now we just need to figure out what that's doing wrong.

Yeah, that does make a lot of sense. It is easy to foul up the memory handling when crossing the Cgo boundary. I'll take a close look at that commit.

@nvanbenschoten
Member

We are using cSliceToGoBytes when copying the resume key from C++, but I wonder if that's having issues with the call to (roachpb.Key).Next.

@petermattis
Collaborator

cSliceToGoBytes doesn't copy the data. That might be the problem right there. When the iterator is closed (or used again), the underlying memory might change.

@nvanbenschoten
Member

Doesn't it? I thought cSliceToUnsafeGoBytes was the function that doesn't copy the data.

@petermattis
Collaborator

Ah, I misremembered. cSliceToGoBytes does copy the data. The call to roachpb.Key.Next should be fine. At least nothing appears wrong about it on the surface.
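The copy versus no-copy distinction can be sketched in pure Go, without Cgo. `aliasBytes` and `copyBytes` below are illustrative analogues only, not the real `cSliceToUnsafeGoBytes` and `cSliceToGoBytes` helpers. The point is that an aliased slice changes when its owner reuses the buffer, while a copied one does not.

```go
package main

import "fmt"

// aliasBytes returns a slice sharing memory with buf, analogous to a
// conversion that does not copy. Hypothetical helper for illustration.
func aliasBytes(buf []byte) []byte {
	return buf[:len(buf):len(buf)]
}

// copyBytes returns an independent copy of buf, analogous to a conversion
// that copies. Hypothetical helper for illustration.
func copyBytes(buf []byte) []byte {
	return append([]byte(nil), buf...)
}

func main() {
	// scratch plays the role of memory owned by a short-lived scanner.
	scratch := []byte("key-26")
	aliased := aliasBytes(scratch)
	copied := copyBytes(scratch)

	// The owner reuses its buffer for something else.
	copy(scratch, "XXXXXX")

	fmt.Printf("aliased: %q\n", aliased)
	fmt.Printf("copied:  %q\n", copied)
}
```

In Go the garbage collector keeps the aliased memory alive, so the worst case here is stale contents. Across the Cgo boundary the memory can be freed outright, which is why a pointer that outlives its owner can surface as garbage keys.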

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Nov 19, 2018
Fixes cockroachdb#32149.

Before this change, it was possible for `DBScanResults.resume_key` to
point into memory owned by `mvccScanner`, which went out of scope after
`MVCCScan` returned. This allowed for memory corruption when returning
the key to Go.

This change fixes this corruption by copying the memory to the `DBIterator`
before returning, which should have a lifetime which exceeds that of the
`DBScanResults`.

Release note: None
craig bot pushed a commit that referenced this issue Nov 20, 2018
32492: libroach: ensure correct lifetime for resume_key on reverse iteration r=nvanbenschoten a=nvanbenschoten

Fixes #32149.

Release note: None

Co-authored-by: Nathan VanBenschoten
@craig craig bot closed this as completed in #32492 Nov 20, 2018
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Dec 26, 2018
Not related to cockroachdb#32149.

Before this change, we would treat a scan with limit 0 as a point
lookup when updating the timestamp cache. After the change, we no
longer update the timestamp cache if a scan had a limit of 0 and
never looked at any keys.

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jul 11, 2019
Not related to cockroachdb#32149.
