store: signal SIGSEGV #3665

Closed
Shan1024 opened this issue Dec 23, 2020 · 13 comments

@Shan1024

Thanos, Prometheus and Golang version used:
quay.io/thanos/thanos:v0.17.2

Object Storage Provider:
S3

What happened:
This setup runs on EKS. The Thanos Store Gateway crashed with the error below when I ran a query on Thanos Query. I have 2 pods (part of a StatefulSet) for the Thanos Store Gateway (not sure whether this can cause an issue like this). The crash does not happen with every query (I tried running the same query again after the new pods were created), so I am not exactly sure what causes it. Could it be related to the pod's resources?

What you expected to happen:
The Thanos Store Gateway should not crash when I run a query 😅

How to reproduce it (as minimally and precisely as possible):

Full logs to relevant components:

Logs

level=info ts=2020-12-23T04:25:11.346751948Z caller=fetcher.go:458 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=40.109928863s cached=3969 returned=3920 partial=0
level=info ts=2020-12-23T04:27:10.071564755Z caller=main.go:168 msg="caught signal. Exiting." signal=terminated
level=warn ts=2020-12-23T04:27:10.071690977Z caller=intrumentation.go:54 msg="changing probe status" status=not-ready reason=null
level=info ts=2020-12-23T04:27:10.0717037Z caller=http.go:65 service=http/server component=store msg="internal server is shutting down" err=null
level=info ts=2020-12-23T04:27:10.588694698Z caller=http.go:84 service=http/server component=store msg="internal server is shutdown gracefully" err=null
level=info ts=2020-12-23T04:27:10.588758622Z caller=intrumentation.go:66 msg="changing probe status" status=not-healthy reason=null
level=warn ts=2020-12-23T04:27:10.588792163Z caller=intrumentation.go:54 msg="changing probe status" status=not-ready reason=null
level=info ts=2020-12-23T04:27:10.588809898Z caller=grpc.go:123 service=gRPC/server component=store msg="internal server is shutting down" err=null
level=info ts=2020-12-23T04:27:10.58884675Z caller=grpc.go:136 service=gRPC/server component=store msg="gracefully stopping internal server"
unexpected fault address 0x7f05740044d1
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0x7f05740044d1 pc=0x402b31]

goroutine 3592047 [running]:
runtime.throw(0x1ae9f96, 0x5)
        /usr/local/go/src/runtime/panic.go:1116 +0x72 fp=0xc0005d3270 sp=0xc0005d3240 pc=0x4377d2
runtime.sigpanic()
        /usr/local/go/src/runtime/signal_unix.go:727 +0x405 fp=0xc0005d32a0 sp=0xc0005d3270 pc=0x44df65
memeqbody()
        /usr/local/go/src/internal/bytealg/equal_amd64.s:102 +0xd1 fp=0xc0005d32a8 sp=0xc0005d32a0 pc=0x402b31
strings.Compare(...)
        /usr/local/go/src/strings/compare.go:21
github.com/thanos-io/thanos/pkg/strutil.mergeTwoStringSlices(0xc01d134000, 0x6d9, 0x795, 0xc01d23a000, 0x729, 0x7f1, 0xc01d016000, 0x7df, 0x891)
        /go/src/github.com/thanos-io/thanos/pkg/strutil/merge.go:43 +0x4ec fp=0xc0005d3358 sp=0xc0005d32a8 pc=0x10a710c
github.com/thanos-io/thanos/pkg/strutil.MergeSlices(0xc01bd29128, 0x3e, 0xf49, 0xc01d016000, 0x7df, 0x891)
        /go/src/github.com/thanos-io/thanos/pkg/strutil/merge.go:21 +0xfb fp=0xc0005d33c8 sp=0xc0005d3358 pc=0x10a6a1b
github.com/thanos-io/thanos/pkg/strutil.MergeSlices(0xc01bd28b70, 0x7b, 0xf86, 0xc01cd9c000, 0x7e6, 0x8a0)
        /go/src/github.com/thanos-io/thanos/pkg/strutil/merge.go:21 +0xd9 fp=0xc0005d3438 sp=0xc0005d33c8 pc=0x10a69f9
github.com/thanos-io/thanos/pkg/strutil.MergeSlices(0xc01bd28000, 0xf5, 0x1000, 0x203003, 0x203003, 0x2a45998)
        /go/src/github.com/thanos-io/thanos/pkg/strutil/merge.go:21 +0xd9 fp=0xc0005d34a8 sp=0xc0005d3438 pc=0x10a69f9
github.com/thanos-io/thanos/pkg/strutil.MergeSlices(0xc01bd28000, 0x1ea, 0x1000, 0x203003, 0x30, 0x30)
        /go/src/github.com/thanos-io/thanos/pkg/strutil/merge.go:21 +0x6a fp=0xc0005d3518 sp=0xc0005d34a8 pc=0x10a698a
github.com/thanos-io/thanos/pkg/strutil.MergeSlices(0xc01bd28000, 0x3d4, 0x1000, 0xc018a51912, 0xc018258050, 0x0)
        /go/src/github.com/thanos-io/thanos/pkg/strutil/merge.go:21 +0x6a fp=0xc0005d3588 sp=0xc0005d3518 pc=0x10a698a
github.com/thanos-io/thanos/pkg/strutil.MergeSlices(0xc01bd28000, 0x7a8, 0x1000, 0xc0005d3640, 0x4a0ac7, 0xc018258040)
        /go/src/github.com/thanos-io/thanos/pkg/strutil/merge.go:21 +0x6a fp=0xc0005d35f8 sp=0xc0005d3588 pc=0x10a698a
github.com/thanos-io/thanos/pkg/strutil.MergeSlices(0xc01bd28000, 0xf50, 0x1000, 0xc00a4dd480, 0xc0203d8260, 0xc0156ff2c0)
        /go/src/github.com/thanos-io/thanos/pkg/strutil/merge.go:21 +0x6a fp=0xc0005d3668 sp=0xc0005d35f8 pc=0x10a698a
github.com/thanos-io/thanos/pkg/store.(*BucketStore).LabelValues(0xc000026a20, 0x1d46a00, 0xc0156ff2c0, 0xc0156ff200, 0xc000026a20, 0xc0156ff290, 0x1871be0)
        /go/src/github.com/thanos-io/thanos/pkg/store/bucket.go:1144 +0x3c8 fp=0xc0005d3770 sp=0xc0005d3668 pc=0x11dca08
github.com/thanos-io/thanos/pkg/store/storepb._Store_LabelValues_Handler.func1(0x1d46a00, 0xc0156ff2c0, 0x1a1b960, 0xc0156ff200, 0xbdc6ce, 0x1d46a00, 0xc0156ff290, 0x1d67c00)
        /go/src/github.com/thanos-io/thanos/pkg/store/storepb/rpc.pb.go:873 +0x89 fp=0xc0005d37b8 sp=0xc0005d3770 pc=0xbcbac9
github.com/grpc-ecosystem/go-grpc-middleware/recovery.UnaryServerInterceptor.func1(0x1d46a00, 0xc0156ff2c0, 0x1a1b960, 0xc0156ff200, 0xc0041d2620, 0xc0041d2660, 0x0, 0x0, 0x0, 0x0)
        /go/pkg/mod/github.com/grpc-ecosystem/[email protected]/recovery/interceptors.go:30 +0xbc fp=0xc0005d3838 sp=0xc0005d37b8 pc=0x134affc
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1(0x1d46a00, 0xc0156ff2c0, 0x1a1b960, 0xc0156ff200, 0x1b0d2de, 0x19, 0x1d46a00, 0xc0156ff2c0)
        /go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25 +0x63 fp=0xc0005d3898 sp=0xc0005d3838 pc=0xbd7c03
github.com/grpc-ecosystem/go-grpc-middleware/tracing/opentracing.UnaryServerInterceptor.func1(0x1d46a00, 0xc0156ff290, 0x1a1b960, 0xc0156ff200, 0xc0041d2620, 0xc0041d2680, 0x1d46a00, 0xc0156ff290, 0x1d56960, 0xc0002014a0)
        /go/pkg/mod/github.com/grpc-ecosystem/[email protected]/tracing/opentracing/server_interceptors.go:31 +0xda fp=0xc0005d3930 sp=0xc0005d3898 pc=0xbdd51a
github.com/thanos-io/thanos/pkg/tracing.UnaryServerInterceptor.func1(0x1d46a00, 0xc0156ff1d0, 0x1a1b960, 0xc0156ff200, 0xc0041d2620, 0xc0041d2680, 0x3, 0x3, 0x1d56960, 0xc0002014a0)
        /go/src/github.com/thanos-io/thanos/pkg/tracing/grpc.go:30 +0xc3 fp=0xc0005d3998 sp=0xc0005d3930 pc=0xbdf823
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1(0x1d46a00, 0xc0156ff1d0, 0x1a1b960, 0xc0156ff200, 0x19, 0xc0103d1b80, 0x203001, 0xc000335000)
        /go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25 +0x63 fp=0xc0005d39f8 sp=0xc0005d3998 pc=0xbd7c03
github.com/grpc-ecosystem/go-grpc-prometheus.(*ServerMetrics).UnaryServerInterceptor.func1(0x1d46a00, 0xc0156ff1d0, 0x1a1b960, 0xc0156ff200, 0xc0041d2620, 0xc0041d26c0, 0xbd7c9a, 0x1978760, 0xc0041d26e0, 0xc0041d2620)
        /go/pkg/mod/github.com/grpc-ecosystem/[email protected]/server_metrics.go:107 +0xad fp=0xc0005d3a70 sp=0xc0005d39f8 pc=0x1347e8d
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1(0x1d46a00, 0xc0156ff1d0, 0x1a1b960, 0xc0156ff200, 0xc018a99800, 0x0, 0xc000163b30, 0x40e338)
        /go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:25 +0x63 fp=0xc0005d3ad0 sp=0xc0005d3a70 pc=0xbd7c03
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1(0x1d46a00, 0xc0156ff1d0, 0x1a1b960, 0xc0156ff200, 0xc0041d2620, 0xc0041d2660, 0xc000163ba0, 0x4a0586, 0x19bc9a0, 0xc0156ff1d0)
        /go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:34 +0xd7 fp=0xc0005d3b40 sp=0xc0005d3ad0 pc=0xbd7df7
github.com/thanos-io/thanos/pkg/store/storepb._Store_LabelValues_Handler(0x1a0bf20, 0xc000026a20, 0x1d46a00, 0xc0156ff1d0, 0xc018a69260, 0xc0005aa810, 0x1d46a00, 0xc0156ff1d0, 0xc006b63300, 0x1f)
        /go/src/github.com/thanos-io/thanos/pkg/store/storepb/rpc.pb.go:875 +0x150 fp=0xc0005d3bb0 sp=0xc0005d3b40 pc=0xbb77b0
google.golang.org/grpc.(*Server).processUnaryRPC(0xc00009ed00, 0x1d61880, 0xc01c181c80, 0xc018a99800, 0xc0005aa990, 0x29faf30, 0x0, 0x0, 0x0)
        /go/pkg/mod/google.golang.org/[email protected]/server.go:1082 +0x522 fp=0xc0005d3e40 sp=0xc0005d3bb0 pc=0xb98322
google.golang.org/grpc.(*Server).handleStream(0xc00009ed00, 0x1d61880, 0xc01c181c80, 0xc018a99800, 0x0)
        /go/pkg/mod/google.golang.org/[email protected]/server.go:1405 +0xcc5 fp=0xc0005d3f68 sp=0xc0005d3e40 pc=0xb9c425
google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc001f3f6c0, 0xc00009ed00, 0x1d61880, 0xc01c181c80, 0xc018a99800)
        /go/pkg/mod/google.golang.org/[email protected]/server.go:746 +0xa5 fp=0xc0005d3fb8 sp=0xc0005d3f68 pc=0xbaa385
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc0005d3fc0 sp=0xc0005d3fb8 pc=0x46fe21
created by google.golang.org/grpc.(*Server).serveStreams.func1
        /go/pkg/mod/google.golang.org/[email protected]/server.go:744 +0xa5

goroutine 1 [select]:
github.com/thanos-io/thanos/pkg/server/grpc.(*Server).Shutdown(0xc0001b6460, 0x0, 0x0)
        /go/src/github.com/thanos-io/thanos/pkg/server/grpc/grpc.go:141 +0x453
main.runStore.func7(0x0, 0x0)
        /go/src/github.com/thanos-io/thanos/cmd/thanos/store.go:368 +0x6e
github.com/oklog/run.(*Group).Run(0xc0001177a0, 0xc000156910, 0xc000655aa0)
        /go/pkg/mod/github.com/oklog/[email protected]/group.go:47 +0x153
main.main()
        /go/src/github.com/thanos-io/thanos/cmd/thanos/main.go:155 +0xf28


goroutine 187 [runnable]:
syscall.Syscall(0x3, 0x5cf, 0x0, 0x0, 0x0, 0x0, 0x0)
        /usr/local/go/src/syscall/asm_linux_amd64.s:18 +0x5
syscall.Close(0x5cf, 0xc001aa6a78, 0x46ae85)
        /usr/local/go/src/syscall/zsyscall_linux_amd64.go:285 +0x45
internal/poll.(*FD).destroy(0xc0076860c0, 0xc001aa6a01, 0x0)
        /usr/local/go/src/internal/poll/fd_unix.go:77 +0x43
internal/poll.(*FD).decref(0xc0076860c0, 0x1, 0x0)
        /usr/local/go/src/internal/poll/fd_mutex.go:213 +0x45
internal/poll.(*FD).Close(0xc0076860c0, 0xc000095560, 0x7f059392f25b)
        /usr/local/go/src/internal/poll/fd_unix.go:99 +0x4f
os.(*file).close(0xc0076860c0, 0x7f0593912000, 0x1d25c)
        /usr/local/go/src/os/file_unix.go:235 +0x38
os.(*File).Close(...)
        /usr/local/go/src/os/file_posix.go:25
github.com/prometheus/prometheus/tsdb/fileutil.(*MmapFile).Close(0xc007841520, 0x0, 0x0)
        /go/pkg/mod/github.com/prometheus/[email protected]/tsdb/fileutil/mmap.go:59 +0xba
github.com/thanos-io/thanos/pkg/block/indexheader.(*BinaryReader).Close(0xc009cebf80, 0x0, 0x0)
        /go/src/github.com/thanos-io/thanos/pkg/block/indexheader/binary_reader.go:888 +0x34
github.com/thanos-io/thanos/pkg/store.(*bucketBlock).Close(0xc004d36c00, 0x0, 0x1b03f9a)
        /go/src/github.com/thanos-io/thanos/pkg/store/bucket.go:1438 +0x4b
github.com/thanos-io/thanos/pkg/runutil.CloseWithErrCapture(0xc001aa6d58, 0x1d13d60, 0xc004d36c00, 0x1b03f9a, 0x14, 0x0, 0x0, 0x0)
        /go/src/github.com/thanos-io/thanos/pkg/runutil/runutil.go:143 +0x113
github.com/thanos-io/thanos/pkg/store.(*BucketStore).Close(0xc000026a20, 0x0, 0x0)
        /go/src/github.com/thanos-io/thanos/pkg/store/bucket.go:356 +0x126
github.com/thanos-io/thanos/pkg/runutil.CloseWithLogOnErr(0x1d12880, 0xc000120960, 0x1d13d20, 0xc000026a20, 0x1af55d9, 0xc, 0x0, 0x0, 0x0)
        /go/src/github.com/thanos-io/thanos/pkg/runutil/runutil.go:110 +0x49
main.runStore.func4(0x0, 0x0)
        /go/src/github.com/thanos-io/thanos/cmd/thanos/store.go:342 +0x5a5
github.com/oklog/run.(*Group).Run.func1(0xc00007f6e0, 0xc0001318b0, 0xc0006109d0)
        /go/pkg/mod/github.com/oklog/[email protected]/group.go:38 +0x27
created by github.com/oklog/run.(*Group).Run
        /go/pkg/mod/github.com/oklog/[email protected]/group.go:37 +0xbb

goroutine 18 [select]:
go.opencensus.io/stats/view.(*worker).start(0xc000124200)
        /go/pkg/mod/[email protected]/stats/view/worker.go:276 +0x105
created by go.opencensus.io/stats/view.init.0
        /go/pkg/mod/[email protected]/stats/view/worker.go:34 +0x68

goroutine 190 [select, 835 minutes]:
main.reload(0x1d12880, 0xc000120960, 0xc000655aa0, 0xc000654300, 0x0, 0x0)
        /go/src/github.com/thanos-io/thanos/cmd/thanos/main.go:179 +0x145
main.main.func6(0x441376, 0x1b7de18)
        /go/src/github.com/thanos-io/thanos/cmd/thanos/main.go:149 +0x45
github.com/oklog/run.(*Group).Run.func1(0xc00007f6e0, 0xc00025bd70, 0xc000156910)
        /go/pkg/mod/github.com/oklog/[email protected]/group.go:38 +0x27
created by github.com/oklog/run.(*Group).Run
        /go/pkg/mod/github.com/oklog/[email protected]/group.go:37 +0xbb

goroutine 3596032 [sync.Cond.Wait]:
runtime.goparkunlock(...)
        /usr/local/go/src/runtime/proc.go:312
sync.runtime_notifyListWait(0xc00004a310, 0x2)
        /usr/local/go/src/runtime/sema.go:513 +0xf8
sync.(*Cond).Wait(0xc00004a300)
        /usr/local/go/src/sync/cond.go:56 +0x9d
google.golang.org/grpc.(*Server).GracefulStop(0xc00009ed00)
        /go/pkg/mod/google.golang.org/[email protected]/server.go:1555 +0x1ee
github.com/thanos-io/thanos/pkg/server/grpc.(*Server).Shutdown.func1(0xc0001b6460, 0xc006f523c0)
        /go/src/github.com/thanos-io/thanos/pkg/server/grpc/grpc.go:137 +0x11e
created by github.com/thanos-io/thanos/pkg/server/grpc.(*Server).Shutdown
        /go/src/github.com/thanos-io/thanos/pkg/server/grpc/grpc.go:135 +0x3bc

goroutine 188 [chan receive]:
google.golang.org/grpc.(*Server).Serve.func1(0xc00009ed00)
        /go/pkg/mod/google.golang.org/[email protected]/server.go:597 +0x77
google.golang.org/grpc.(*Server).Serve(0xc00009ed00, 0x1d3f300, 0xc010083080, 0x0, 0x0)
        /go/pkg/mod/google.golang.org/[email protected]/server.go:651 +0x685
github.com/thanos-io/thanos/pkg/server/grpc.(*Server).ListenAndServe(0xc0001b6460, 0x0, 0xc000208620)
        /go/src/github.com/thanos-io/thanos/pkg/server/grpc/grpc.go:117 +0x21d
main.runStore.func6(0x441376, 0x1b7de18)
        /go/src/github.com/thanos-io/thanos/cmd/thanos/store.go:365 +0x70
github.com/oklog/run.(*Group).Run.func1(0xc00007f6e0, 0xc00018c510, 0xc000618b80)
        /go/pkg/mod/github.com/oklog/[email protected]/group.go:38 +0x27
created by github.com/oklog/run.(*Group).Run
        /go/pkg/mod/github.com/oklog/[email protected]/group.go:37 +0xbb

goroutine 192 [syscall]:
os/signal.signal_recv(0x1d31840)
        /usr/local/go/src/runtime/sigqueue.go:147 +0x9d
os/signal.loop()
        /usr/local/go/src/os/signal/signal_unix.go:23 +0x25
created by os/signal.Notify.func1.1
        /usr/local/go/src/os/signal/signal.go:150 +0x45

goroutine 3548229 [semacquire]:
sync.runtime_Semacquire(0xc001f3f6c8)
        /usr/local/go/src/runtime/sema.go:56 +0x45
sync.(*WaitGroup).Wait(0xc001f3f6c0)
        /usr/local/go/src/sync/waitgroup.go:130 +0x65
google.golang.org/grpc.(*Server).serveStreams(0xc00009ed00, 0x1d61880, 0xc01c181c80)
        /go/pkg/mod/google.golang.org/[email protected]/server.go:755 +0xe9
google.golang.org/grpc.(*Server).handleRawConn.func1(0xc00009ed00, 0x1d61880, 0xc01c181c80)
        /go/pkg/mod/google.golang.org/[email protected]/server.go:703 +0x3f
created by google.golang.org/grpc.(*Server).handleRawConn
        /go/pkg/mod/google.golang.org/[email protected]/server.go:702 +0x50b

Anything else we need to know:

@yeya24
Contributor

yeya24 commented Dec 23, 2020

Thanks for the report.
This panic is really strange. I'll investigate it later.

@arvidsnet

I also get occasional SIGSEGVs with store 0.17.2. The logs look different, so I'm not sure it is the same issue. Attaching the full log.
store-0.17.2.SIGSEGV.log

@bwplotka
Member

bwplotka commented Dec 29, 2020

Valid bug.

First of all, on this version the store should never fault, just panic: https://github.com/thanos-io/thanos/blob/v0.17.2/cmd/thanos/main.go#L34

Maybe we don't recover from panic? Or somehow this statement does not work? 🤔

Secondly, we need to look at LabelValues; it looks like we don't properly lock the block while reading from it.
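
For illustration only, here is a minimal Go sketch of the kind of locking hinted at above. The `guardedBlock` type, its fields, and `errClosed` are hypothetical names and not Thanos code; a plain string slice stands in for label values backed by an mmapped index header. Readers hold a read lock and copy what they return, so `Close` cannot release the backing memory while a read is in flight:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync"
)

// errClosed is a hypothetical sentinel error for reads after Close.
var errClosed = errors.New("block is closed")

// guardedBlock is a hypothetical stand-in for a block whose label
// strings point into mmapped memory.
type guardedBlock struct {
	mu     sync.RWMutex
	closed bool
	values []string // stands in for strings backed by the mmapped file
}

// LabelValues returns copies of the label values. The read lock is held
// for the whole read, so Close has to wait until the read is done.
func (b *guardedBlock) LabelValues() ([]string, error) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	if b.closed {
		return nil, errClosed
	}
	out := make([]string, 0, len(b.values))
	for _, v := range b.values {
		// Clone so the result does not alias memory released by Close.
		out = append(out, strings.Clone(v))
	}
	return out, nil
}

// Close takes the write lock, so it only proceeds once no reader is active.
func (b *guardedBlock) Close() error {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.closed {
		return errClosed
	}
	b.closed = true
	b.values = nil // real code would unmap the file here
	return nil
}

func main() {
	b := &guardedBlock{values: []string{"job", "instance"}}
	vals, err := b.LabelValues()
	fmt.Println(vals, err)
	fmt.Println(b.Close())
	_, err = b.LabelValues()
	fmt.Println(err)
}
```

In the trace above, one goroutine is inside `MmapFile.Close` while another is still comparing strings under `LabelValues`. A guard like this is one way to rule out that kind of overlap, at the cost of copying the returned strings.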

@GiedriusS
Member

GiedriusS commented Dec 30, 2020

> Valid bug.
>
> First of all, on this version the store should never fault, just panic: https://github.com/thanos-io/thanos/blob/v0.17.2/cmd/thanos/main.go#L34
>
> Maybe we don't recover from panic? Or somehow this statement does not work?

Well, it only applies to the current goroutine and we spawn a bunch of them when responding to calls so it doesn't work like it is supposed to 😛
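
A small Go sketch of that point, assuming the statement linked above is a call to `runtime/debug.SetPanicOnFault(true)` (the link target is not quoted in this thread, so treat that as an assumption). The Go documentation states that `SetPanicOnFault` applies only to the current goroutine, so each goroutine that touches mmapped memory would need to set it and recover on its own. `newInaccessiblePage` and `readGuarded` are hypothetical helpers, and a `PROT_NONE` page stands in for an unmapped file region (Linux/Unix only):

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
	"syscall"
)

// newInaccessiblePage maps one anonymous page with no access rights.
// Reading it faults at a non-nil address, much like reading a region
// whose mapping has gone away.
func newInaccessiblePage() ([]byte, error) {
	return syscall.Mmap(-1, 0, os.Getpagesize(),
		syscall.PROT_NONE, syscall.MAP_ANON|syscall.MAP_PRIVATE)
}

// readResult carries either the byte that was read or the recovered fault.
type readResult struct {
	b   byte
	err error
}

// readGuarded reads mem[0] in a new goroutine and turns a memory fault
// into an error instead of letting it crash the process.
func readGuarded(mem []byte) (byte, error) {
	done := make(chan readResult, 1)
	go func() {
		// Set it here: a setting made in another goroutine (for example
		// in main) does not carry over to this one.
		debug.SetPanicOnFault(true)
		defer func() {
			if r := recover(); r != nil {
				done <- readResult{err: fmt.Errorf("recovered fault: %v", r)}
			}
		}()
		done <- readResult{b: mem[0]} // faults while evaluating mem[0]
	}()
	res := <-done
	return res.b, res.err
}

func main() {
	mem, err := newInaccessiblePage()
	if err != nil {
		fmt.Println("mmap failed:", err)
		return
	}
	defer syscall.Munmap(mem)

	_, err = readGuarded(mem)
	fmt.Println(err)
}
```

Removing the `SetPanicOnFault` call inside the goroutine should bring back an `unexpected fault address` / `fatal error: fault` crash like the one in the report, even if `main` had set the flag for itself.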

@stale

stale bot commented Mar 5, 2021

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Mar 5, 2021
@stale

stale bot commented Mar 19, 2021

Closing for now as promised, let us know if you need this to be reopened! 🤗

@stale stale bot closed this as completed Mar 19, 2021
@roidelapluie

roidelapluie commented Mar 19, 2021

SIGSEGV happens when accessing un-mmapped files.

@Wander1024
Contributor

Any update on this issue?
Is it being worked on?

@GiedriusS
Member

> Any update on this issue?
> Is it being worked on?

Is it still an issue on the newest version?

@Wander1024
Contributor

> > Any update on this issue?
> > Is it being worked on?
>
> Is it still an issue on the newest version?

Sorry, let me upgrade to the newest version.
Thanks.

@chalut01

chalut01 commented Aug 6, 2022

I still get this issue on version 0.26.0. Anybody else?
Screen Shot 2565-08-06 at 15 47 43

@jimethn

jimethn commented Aug 15, 2022

Just got this on 0.25.1

@mrnonz

mrnonz commented Sep 7, 2022

> Just got this on 0.25.1

Can you try with 0.28.0?
