Sidecar: fix startup sequence #7403

Merged 1 commit into main from mhoffmann-fix-sidecar-reloader on Jun 10, 2024
Conversation

@MichaHoffmann (Contributor) commented on May 30, 2024

Previously we deferred starting the gRPC server by blocking the whole startup until we could ping Prometheus. This breaks use cases that rely on the config reloader to start Prometheus.
We fix it by using a channel to defer starting the gRPC server, while loading external labels concurrently in an actor.

Should fix #7400.

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Make loading external labels async again but use a channel to block starting the servers.
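Roughly, the pattern is: one actor polls Prometheus and closes a channel once it is reachable (and external labels are loaded), while the actor that serves gRPC waits on that channel before listening; every other actor (HTTP server, config reloader) keeps running in the meantime. The sketch below only illustrates the idea; the identifiers (`promReady`, the poll loop, the placeholder for the gRPC server) are illustrative and not the actual Thanos code.

```go
package main

import (
	"context"
	"log"
	"net/http"
	"time"
)

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// promReady is closed once Prometheus answers, signalling that the
	// gRPC server may start. (Illustrative name, not the Thanos identifier.)
	promReady := make(chan struct{})

	// Actor 1: poll Prometheus until it responds, then signal readiness.
	// In the real sidecar this step also loads the external labels.
	go func() {
		ticker := time.NewTicker(5 * time.Second)
		defer ticker.Stop()
		for {
			resp, err := http.Get("http://localhost:9090/api/v1/status/buildinfo")
			if err == nil {
				resp.Body.Close()
				log.Println("successfully reached prometheus")
				close(promReady)
				return
			}
			log.Printf("failed to fetch prometheus version, retrying: %v", err)
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
			}
		}
	}()

	// Actor 2: only this actor blocks on the channel; the HTTP server and
	// config reloader keep running, which is the point of the fix.
	select {
	case <-ctx.Done():
		return
	case <-promReady:
	}
	log.Println("starting gRPC server")
	// grpcServer.Serve(listener) would go here in the real code.
}
```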

Verification

Checked it locally:

$ thanos sidecar --prometheus.get_config_interval=5s
ts=2024-05-30T13:55:09.246166909Z caller=sidecar.go:140 level=info msg="no supported bucket was configured, uploads will be disabled"
ts=2024-05-30T13:55:09.246227282Z caller=options.go:26 level=info protocol=gRPC msg="disabled TLS, key and cert must be set to enable"
ts=2024-05-30T13:55:09.246506827Z caller=sidecar.go:427 level=info msg="starting sidecar"
ts=2024-05-30T13:55:09.246607536Z caller=intrumentation.go:75 level=info msg="changing probe status" status=healthy
ts=2024-05-30T13:55:09.246642509Z caller=http.go:73 level=info service=http/server component=sidecar msg="listening for requests and metrics" address=0.0.0.0:10902
ts=2024-05-30T13:55:09.246633798Z caller=reloader.go:262 level=info component=reloader msg="nothing to be watched"
ts=2024-05-30T13:55:09.246841577Z caller=tls_config.go:313 level=info service=http/server component=sidecar msg="Listening on" address=[::]:10902
ts=2024-05-30T13:55:09.24685536Z caller=tls_config.go:316 level=info service=http/server component=sidecar msg="TLS is disabled." http2=false address=[::]:10902
ts=2024-05-30T13:55:09.246984484Z caller=sidecar.go:217 level=warn msg="failed to fetch prometheus version. Is Prometheus running? Retrying" err="perform GET request against http://localhost:9090/api/v1/status/buildinfo: Get \"http://localhost:9090/api/v1/status/buildinfo\": dial tcp [::1]:9090: connect: connection refused"
ts=2024-05-30T13:55:14.250528233Z caller=sidecar.go:217 level=warn msg="failed to fetch prometheus version. Is Prometheus running? Retrying" err="perform GET request against http://localhost:9090/api/v1/status/buildinfo: Get \"http://localhost:9090/api/v1/status/buildinfo\": dial tcp [::1]:9090: connect: connection refused"
ts=2024-05-30T13:55:19.249863255Z caller=sidecar.go:224 level=info msg="successfully loaded prometheus version"
ts=2024-05-30T13:55:19.251212186Z caller=sidecar.go:247 level=info msg="successfully loaded prometheus external labels" external_labels="{cluster=\"cluster_1\"}"
ts=2024-05-30T13:55:19.252019191Z caller=intrumentation.go:56 level=info msg="changing probe status" status=ready
ts=2024-05-30T13:55:19.252303243Z caller=grpc.go:167 level=info service=gRPC/server component=sidecar msg="listening for serving gRPC" address=0.0.0.0:10901
^Cts=2024-05-30T13:55:21.76238197Z caller=main.go:182 level=info msg="caught signal. Exiting." signal=interrupt
ts=2024-05-30T13:55:21.762502109Z caller=intrumentation.go:67 level=warn msg="changing probe status" status=not-ready reason=null
ts=2024-05-30T13:55:21.762531393Z caller=http.go:91 level=info service=http/server component=sidecar msg="internal server is shutting down" err=null
ts=2024-05-30T13:55:21.762729392Z caller=http.go:110 level=info service=http/server component=sidecar msg="internal server is shutdown gracefully" err=null
ts=2024-05-30T13:55:21.762817333Z caller=intrumentation.go:81 level=info msg="changing probe status" status=not-healthy reason=null
ts=2024-05-30T13:55:21.762931491Z caller=grpc.go:174 level=info service=gRPC/server component=sidecar msg="internal server is shutting down" err=null
ts=2024-05-30T13:55:21.762973115Z caller=grpc.go:187 level=info service=gRPC/server component=sidecar msg="gracefully stopping internal server"
ts=2024-05-30T13:55:21.763094802Z caller=grpc.go:200 level=info service=gRPC/server component=sidecar msg="internal server is shutdown gracefully" err=null
ts=2024-05-30T13:55:21.763139606Z caller=main.go:174 level=info msg=exiting

@MichaHoffmann force-pushed the mhoffmann-fix-sidecar-reloader branch 2 times, most recently from 20a8f3e to d5c30b2 on May 30, 2024 13:53
@MichaHoffmann force-pushed the mhoffmann-fix-sidecar-reloader branch 2 times, most recently from 40e798c to 52ba763 on May 30, 2024 14:09
Previously we deferred starting the gRPC server by blocking the whole
startup until we could ping Prometheus. This breaks use cases that rely
on the config reloader to start Prometheus.
We fix it by using a channel to defer starting the gRPC server
and loading external labels in an actor concurrently.

Signed-off-by: Michael Hoffmann <[email protected]>
@MichaHoffmann force-pushed the mhoffmann-fix-sidecar-reloader branch from 52ba763 to a1bee35 on May 30, 2024 14:10
@saswatamcode (Member) left a comment:

Thanks, but not super familiar with sidecar, maybe @GiedriusS could take a look too?

@yeya24 (Contributor) left a comment:


Thanks. The fix makes sense to me!

@yeya24 merged commit 8aa42c8 into main on Jun 10, 2024. 20 checks passed.
hczhu-db pushed commits to databricks/thanos that referenced this pull request on Aug 22, 2024.
jnyi pushed a commit to jnyi/thanos that referenced this pull request on Oct 17, 2024.
Successfully merging this pull request may close these issues: Regression in thanos v0.35.1 (#7400)