Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: TestGetTenantWeights failed #97350

Closed
cockroach-teamcity opened this issue Feb 20, 2023 · 11 comments
Closed

server: TestGetTenantWeights failed #97350

cockroach-teamcity opened this issue Feb 20, 2023 · 11 comments
Assignees
Labels
A-storage Relating to our storage engine (Pebble) on-disk storage. branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). GA-blocker O-robot Originated from a bot. T-storage Storage Team
Milestone

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Feb 20, 2023

server.TestGetTenantWeights failed with artifacts on master @ a9d4e7040c538aeaa0e0e049e5525e2569eb364b:

Fatal error:

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x11bb98f]

Stack:

goroutine 509548 [running]:
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).recover(0xc016baf9b8?, {0x5bf0208, 0xc006eef3e0})
	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:229 +0x6a
panic({0x4333120, 0x820fa40})
	GOROOT/src/runtime/panic.go:884 +0x212
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).recover(0x3797a26?, {0x5bf0208, 0xc006eef890})
	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:229 +0x6a
panic({0x4333120, 0x820fa40})
	GOROOT/src/runtime/panic.go:884 +0x212
github.com/cockroachdb/pebble/sstable.(*Writer).writeCompressedBlock(0xc00dfa1500, {0xc00dd6f000?, 0x8, 0xc016bafd30?}, {0xc00acd7680, 0x463bff?, 0x78})
	github.com/cockroachdb/pebble/sstable/external/com_github_cockroachdb_pebble/sstable/writer.go:1673 +0x8f
github.com/cockroachdb/pebble/sstable.(*Writer).writeBlock(0xc00acd7720?, {0xc00dd6f000?, 0x178?, 0x21?}, 0xc0141a2490?, 0xc00acd7680)
	github.com/cockroachdb/pebble/sstable/external/com_github_cockroachdb_pebble/sstable/writer.go:1708 +0x5a
github.com/cockroachdb/pebble/sstable.(*Writer).Close(0xc00dfa1500)
	github.com/cockroachdb/pebble/sstable/external/com_github_cockroachdb_pebble/sstable/writer.go:1787 +0x2b1
github.com/cockroachdb/cockroach/pkg/storage.(*SSTWriter).Close(...)
	github.com/cockroachdb/cockroach/pkg/storage/sst_writer.go:411
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*multiSSTWriter).Close(0xc016bb07c0?)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_snapshot.go:289 +0x2a
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*kvBatchSnapshotStrategy).Receive(_, {_, _}, {_, _}, {{0x17, 0xa, 0xc00643a4d0, 0xc004300960, 0xc018b9db40, ...}, ...}, ...)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_snapshot.go:433 +0xbd9
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).receiveSnapshot(0xc011c06000, {0x5bf0208, 0xc01646a750}, 0xc007e301e0, {0x7f167993f1b0, 0xc008dc0a20})
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_snapshot.go:1081 +0x97d
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).HandleSnapshot.func1({0x5bf0208, 0xc006eef890})
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:211 +0x9c
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTaskWithErr(0xc0033f7280, {0x5bf0208, 0xc006eef890}, {0x6a87ef?, 0x42f7be0?}, 0xc007cc71a0)
	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:322 +0xd1
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).HandleSnapshot(0xc011c06000, {0x5bf0160?, 0xc00eee9540?}, 0xc007e301e0, {0x7f167993f188, 0xc008dc0a20})
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go:208 +0xc5
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*RaftTransport).RaftSnapshot(0xc007110540, {0x5c2c848, 0xc008dc0a20})
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/raft_transport.go:382 +0x1a6
github.com/cockroachdb/cockroach/pkg/kv/kvserver._MultiRaft_RaftSnapshot_Handler({0x48e58a0?, 0xc007110540}, {0x5c20808?, 0xc00bb2a3c0})
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/bazel-out/k8-fastbuild/bin/pkg/kv/kvserver/kvserver_go_proto_/github.com/cockroachdb/cockroach/pkg/kv/kvserver/storage_services.pb.go:262 +0x9f
github.com/cockroachdb/cockroach/pkg/util/tracing/grpcinterceptor.StreamServerInterceptor.func1({0x48e58a0, 0xc007110540}, {0x5c20d60?, 0xc00a522000?}, 0xc01849bc38, 0x4c86ba8)
	github.com/cockroachdb/cockroach/pkg/util/tracing/grpcinterceptor/grpc_interceptor.go:164 +0x6c4
google.golang.org/grpc.chainStreamInterceptors.func1.1({0x48e58a0?, 0xc007110540?}, {0x5c20d60?, 0xc00a522000?})
	google.golang.org/grpc/external/org_golang_google_grpc/server.go:1482 +0x5b
github.com/cockroachdb/cockroach/pkg/rpc.NewServerEx.func4({0x48e58a0, 0xc007110540}, {0x5c20d60, 0xc00a522000}, 0xc006eef3e0?, 0xc00eee94c0)
	github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:275 +0x74
google.golang.org/grpc.chainStreamInterceptors.func1.1({0x48e58a0?, 0xc007110540?}, {0x5c20d60?, 0xc00a522000?})
	google.golang.org/grpc/external/org_golang_google_grpc/server.go:1485 +0x83
github.com/cockroachdb/cockroach/pkg/rpc.kvAuth.streamInterceptor({0xc00b9a6000?, {{0x0?}, {0x5bf07f0?, 0xc011906d80?}}}, {0x48e58a0, 0xc007110540}, {0x5c20d60, 0xc00a522000}, 0xc01849bc38, 0xc00eee94c0)
	github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/auth.go:132 +0x373
google.golang.org/grpc.chainStreamInterceptors.func1.1({0x48e58a0?, 0xc007110540?}, {0x5c20d60?, 0xc00a522000?})
	google.golang.org/grpc/external/org_golang_google_grpc/server.go:1485 +0x83
github.com/cockroachdb/cockroach/pkg/rpc.NewServerEx.func2.1({0x0?, 0x0?})
	github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:242 +0x2d
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTaskWithErr(0xc0033f7280, {0x5bf0208, 0xc006eef3e0}, {0xc00777ba98?, 0x438eff?}, 0xc00777ba50)
	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:322 +0xd1
github.com/cockroachdb/cockroach/pkg/rpc.NewServerEx.func2({0x48e58a0, 0xc007110540}, {0x5c20d60?, 0xc00a522000?}, 0xc01849bc38, 0xc00eee94c0)
	github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:241 +0xef
google.golang.org/grpc.chainStreamInterceptors.func1.1({0x48e58a0?, 0xc007110540?}, {0x5c20d60?, 0xc00a522000?})
	google.golang.org/grpc/external/org_golang_google_grpc/server.go:1485 +0x83
google.golang.org/grpc.chainStreamInterceptors.func1({0x48e58a0, 0xc007110540}, {0x5c20d60, 0xc00a522000}, 0xc01849bc38, 0x4c86ba8)
	google.golang.org/grpc/external/org_golang_google_grpc/server.go:1487 +0x12b
google.golang.org/grpc.(*Server).processStreamingRPC(0xc009657a40, {0x5c371c0, 0xc0130eeea0}, 0xc00a5378c0, 0xc00aa77320, 0x8238fc0, 0x0)
	google.golang.org/grpc/external/org_golang_google_grpc/server.go:1636 +0x11c6
google.golang.org/grpc.(*Server).handleStream(0xc009657a40, {0x5c371c0, 0xc0130eeea0}, 0xc00a5378c0, 0x0)
	google.golang.org/grpc/external/org_golang_google_grpc/server.go:1717 +0x9ea
google.golang.org/grpc.(*Server).serveStreams.func1.2()
	google.golang.org/grpc/external/org_golang_google_grpc/server.go:965 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
	google.golang.org/grpc/external/org_golang_google_grpc/server.go:963 +0x28a
Log preceding fatal error

=== RUN   TestGetTenantWeights
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/5b2c9b3a394428c7572d34050aad8975/logTestGetTenantWeights2842268568
    test_log_scope.go:79: use -show-logs to present logs inline

Parameters: TAGS=bazel,gss

Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/server

This test on roachdash | Improve this report!

Jira issue: CRDB-24651

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Feb 20, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.1 milestone Feb 20, 2023
@blathers-crl blathers-crl bot added the T-server-and-security DB Server & Security label Feb 20, 2023
@knz
Copy link
Contributor

knz commented Feb 21, 2023

This looks like storage / DR. @stevendanna @nicktrav wanna have a look?

@knz knz removed the T-server-and-security DB Server & Security label Feb 21, 2023
@knz knz added the GA-blocker label Feb 21, 2023
@stevendanna
Copy link
Collaborator

From the stack my guess is storage or kv might have the most fruitful first swing at this.

@nicktrav
Copy link
Collaborator

nicktrav commented Mar 1, 2023

I must have missed this. I'll have someone take a look.

@nicktrav nicktrav added A-storage Relating to our storage engine (Pebble) on-disk storage. T-storage Storage Team labels Mar 1, 2023
@jbowens
Copy link
Collaborator

jbowens commented Mar 1, 2023

Possibly a double Writer close?

@renatolabs
Copy link
Contributor

FWIW, just saw the same failure on a run of Example_demo_locality on my branch: https://teamcity.cockroachdb.com/viewLog.html?buildId=8891242&buildTypeId=Cockroach_BazelEssentialCi

@blathers-crl blathers-crl bot added the T-kv KV Team label Mar 1, 2023
@jbowens jbowens removed A-storage Relating to our storage engine (Pebble) on-disk storage. T-storage Storage Team labels Mar 1, 2023
@jbowens
Copy link
Collaborator

jbowens commented Mar 1, 2023

This looks like the individual SSTWriter is being closed twice; once before the defer and once during the defer. The one during the defer panics, because we nil out the file handle before returning in Close.

@blathers-crl blathers-crl bot added the T-storage Storage Team label Mar 3, 2023
@jbowens jbowens added A-storage Relating to our storage engine (Pebble) on-disk storage. and removed T-kv KV Team labels Mar 3, 2023
@jbowens
Copy link
Collaborator

jbowens commented Mar 5, 2023

@RaduBerinde could this possibly be introduced by cockroachdb/pebble#2267?

@RaduBerinde
Copy link
Member

This looks like the individual SSTWriter is being closed twice; once before the defer and once during the defer. The one during the defer panics, because we nil out the file handle before returning in Close.

So the hypothesis is that Finish got an error during tw.tw.Close(), and then we close it again in Close? Yeah, the stack trace in Example_demo_locality shows that at least the higher level multiSSTWriter.Finish() erorrs out.

But in Pebble, the second Close should exit very early when it checks w.err no? I guess that's the idea, but I looked and found this exit path which doesn't set w.err:
https://github.com/cockroachdb/pebble/blob/fdf055ddb6e438a63b81ac2d751da8232b07c893/sstable/writer.go#L1900

Not sure if we can confirm that this is the issue. In any case we should fix it and make this more robust: have the deferred function set w.err in error cases rather than each exit path having to worry about it.

FWIW, just saw the same failure on a run of Example_demo_locality on my branch: https://teamcity.cockroachdb.com/viewLog.html?buildId=8891242&buildTypeId=Cockroach_BazelEssentialCi

Do we expect Example_demo_locality to hit errors? If the problem is the above, then we're just obscuring another error. Or is this the kind of error that is handled internally?

@RaduBerinde
Copy link
Member

Opened cockroachdb/pebble#2380

RaduBerinde added a commit to RaduBerinde/pebble that referenced this issue Mar 5, 2023
`Close()` records any error encountered and exits early if called
again. One of the return paths was not setting `w.err`, possibly
causing the panic in github.com/cockroachdb/cockroach/issues/97350.

This change fixes this and improves the structure of the function -
instead of each exit path having to set `w.err`, we set it in the
deferred function.
RaduBerinde added a commit to cockroachdb/pebble that referenced this issue Mar 6, 2023
`Close()` records any error encountered and exits early if called
again. One of the return paths was not setting `w.err`, possibly
causing the panic in github.com/cockroachdb/cockroach/issues/97350.

This change fixes this and improves the structure of the function -
instead of each exit path having to set `w.err`, we set it in the
deferred function.
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Mar 6, 2023
e9a8c4ad sstable: fix error path in writer.Close
fdf055dd db: improve documentation of tableNewIters
717cbce0 db: add Options.LoggerAndTracer for tracing
7751e381 objstorage: better Writable API
78c8001e db: add missing error check in TestIngestLoad
1334233b metamorphic: add unit test for Options propagation
ed824c76 objstorage: rename ReadaheadHandle to ReadHandle

Informs cockroachdb#97350.

Release note: none
Epic: none
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Mar 6, 2023
e9a8c4ad sstable: fix error path in writer.Close
fdf055dd db: improve documentation of tableNewIters
717cbce0 db: add Options.LoggerAndTracer for tracing
7751e381 objstorage: better Writable API
78c8001e db: add missing error check in TestIngestLoad
1334233b metamorphic: add unit test for Options propagation
ed824c76 objstorage: rename ReadaheadHandle to ReadHandle

The "better Writable API" changed the interface to write to SST files.
This commit contains the required changes in Cockroach: MemFile is now
MemObject and implements objstorage.Writable. There were many test
places that were using MemFile as just an io.Writer; these are updated
to just use bytes.Buffer.

Informs cockroachdb#97350.

Release note: none
Epic: none
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Mar 6, 2023
e9a8c4ad sstable: fix error path in writer.Close
fdf055dd db: improve documentation of tableNewIters
717cbce0 db: add Options.LoggerAndTracer for tracing
7751e381 objstorage: better Writable API
78c8001e db: add missing error check in TestIngestLoad
1334233b metamorphic: add unit test for Options propagation
ed824c76 objstorage: rename ReadaheadHandle to ReadHandle

The "better Writable API" changed the interface to write to SST files.
This commit contains the required changes in Cockroach: MemFile is now
MemObject and implements objstorage.Writable. There were many test
places that were using MemFile as just an io.Writer; these are updated
to just use bytes.Buffer.

Informs cockroachdb#97350.

Release note: none
Epic: none
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Mar 6, 2023
e9a8c4ad sstable: fix error path in writer.Close
fdf055dd db: improve documentation of tableNewIters
717cbce0 db: add Options.LoggerAndTracer for tracing
7751e381 objstorage: better Writable API
78c8001e db: add missing error check in TestIngestLoad
1334233b metamorphic: add unit test for Options propagation
ed824c76 objstorage: rename ReadaheadHandle to ReadHandle

The "better Writable API" changed the interface to write to SST files.
This commit contains the required changes in Cockroach: MemFile is now
MemObject and implements objstorage.Writable. There were many test
places that were using MemFile as just an io.Writer; these are updated
to just use bytes.Buffer.

Informs cockroachdb#97350.

Release note: none
Epic: none
craig bot pushed a commit that referenced this issue Mar 7, 2023
98078: go.mod: bump Pebble to e9a8c4ad65c5 r=RaduBerinde a=RaduBerinde

e9a8c4ad sstable: fix error path in writer.Close
fdf055dd db: improve documentation of tableNewIters
717cbce0 db: add Options.LoggerAndTracer for tracing
7751e381 objstorage: better Writable API
78c8001e db: add missing error check in TestIngestLoad
1334233b metamorphic: add unit test for Options propagation
ed824c76 objstorage: rename ReadaheadHandle to ReadHandle

Informs #97350.

Release note: none
Epic: none

Co-authored-by: Radu Berinde <[email protected]>
@jbowens
Copy link
Collaborator

jbowens commented Mar 8, 2023

Fixed by #98078. Verified by stress:

22296 runs so far, 0 failures, over 39m55s

@jbowens jbowens closed this as completed Mar 8, 2023
@RaduBerinde
Copy link
Member

@jbowens Thanks for verifying!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-storage Relating to our storage engine (Pebble) on-disk storage. branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). GA-blocker O-robot Originated from a bot. T-storage Storage Team
Projects
Archived in project
Development

No branches or pull requests

7 participants