Sidecar: fix startup sequence #7403

Merged 1 commit into main from mhoffmann-fix-sidecar-reloader on Jun 10, 2024
Conversation

@MichaHoffmann (Contributor) commented on May 30, 2024

Previously we deferred starting the gRPC server by blocking the whole startup until we could ping Prometheus. This breaks use cases that rely on the config reloader to start Prometheus.
We fix it by using a channel to defer starting the gRPC server, while loading external labels concurrently in an actor.

Should fix #7400.

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Make loading external labels async again but use a channel to block starting the servers.
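Roughly, the pattern is: one actor polls Prometheus and closes a channel once it is reachable (and external labels are loaded), while the actor that serves gRPC waits on that channel before listening; every other actor (HTTP server, config reloader) keeps running in the meantime. The sketch below only illustrates the idea; the identifiers (`promReady`, the poll loop, the placeholder for the gRPC server) are illustrative and not the actual Thanos code.

```go
package main

import (
	"context"
	"log"
	"net/http"
	"time"
)

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// promReady is closed once Prometheus answers, signalling that the
	// gRPC server may start. (Illustrative name, not the Thanos identifier.)
	promReady := make(chan struct{})

	// Actor 1: poll Prometheus until it responds, then signal readiness.
	// In the real sidecar this step also loads the external labels.
	go func() {
		ticker := time.NewTicker(5 * time.Second)
		defer ticker.Stop()
		for {
			resp, err := http.Get("http://localhost:9090/api/v1/status/buildinfo")
			if err == nil {
				resp.Body.Close()
				log.Println("successfully reached prometheus")
				close(promReady)
				return
			}
			log.Printf("failed to fetch prometheus version, retrying: %v", err)
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
			}
		}
	}()

	// Actor 2: only this actor blocks on the channel; the HTTP server and
	// config reloader keep running, which is the point of the fix.
	select {
	case <-ctx.Done():
		return
	case <-promReady:
	}
	log.Println("starting gRPC server")
	// grpcServer.Serve(listener) would go here in the real code.
}
```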

Verification

Checked it locally:

$ thanos sidecar --prometheus.get_config_interval=5s
ts=2024-05-30T13:55:09.246166909Z caller=sidecar.go:140 level=info msg="no supported bucket was configured, uploads will be disabled"
ts=2024-05-30T13:55:09.246227282Z caller=options.go:26 level=info protocol=gRPC msg="disabled TLS, key and cert must be set to enable"
ts=2024-05-30T13:55:09.246506827Z caller=sidecar.go:427 level=info msg="starting sidecar"
ts=2024-05-30T13:55:09.246607536Z caller=intrumentation.go:75 level=info msg="changing probe status" status=healthy
ts=2024-05-30T13:55:09.246642509Z caller=http.go:73 level=info service=http/server component=sidecar msg="listening for requests and metrics" address=0.0.0.0:10902
ts=2024-05-30T13:55:09.246633798Z caller=reloader.go:262 level=info component=reloader msg="nothing to be watched"
ts=2024-05-30T13:55:09.246841577Z caller=tls_config.go:313 level=info service=http/server component=sidecar msg="Listening on" address=[::]:10902
ts=2024-05-30T13:55:09.24685536Z caller=tls_config.go:316 level=info service=http/server component=sidecar msg="TLS is disabled." http2=false address=[::]:10902
ts=2024-05-30T13:55:09.246984484Z caller=sidecar.go:217 level=warn msg="failed to fetch prometheus version. Is Prometheus running? Retrying" err="perform GET request against http://localhost:9090/api/v1/status/buildinfo: Get \"http://localhost:9090/api/v1/status/buildinfo\": dial tcp [::1]:9090: connect: connection refused"
ts=2024-05-30T13:55:14.250528233Z caller=sidecar.go:217 level=warn msg="failed to fetch prometheus version. Is Prometheus running? Retrying" err="perform GET request against http://localhost:9090/api/v1/status/buildinfo: Get \"http://localhost:9090/api/v1/status/buildinfo\": dial tcp [::1]:9090: connect: connection refused"
ts=2024-05-30T13:55:19.249863255Z caller=sidecar.go:224 level=info msg="successfully loaded prometheus version"
ts=2024-05-30T13:55:19.251212186Z caller=sidecar.go:247 level=info msg="successfully loaded prometheus external labels" external_labels="{cluster=\"cluster_1\"}"
ts=2024-05-30T13:55:19.252019191Z caller=intrumentation.go:56 level=info msg="changing probe status" status=ready
ts=2024-05-30T13:55:19.252303243Z caller=grpc.go:167 level=info service=gRPC/server component=sidecar msg="listening for serving gRPC" address=0.0.0.0:10901
^Cts=2024-05-30T13:55:21.76238197Z caller=main.go:182 level=info msg="caught signal. Exiting." signal=interrupt
ts=2024-05-30T13:55:21.762502109Z caller=intrumentation.go:67 level=warn msg="changing probe status" status=not-ready reason=null
ts=2024-05-30T13:55:21.762531393Z caller=http.go:91 level=info service=http/server component=sidecar msg="internal server is shutting down" err=null
ts=2024-05-30T13:55:21.762729392Z caller=http.go:110 level=info service=http/server component=sidecar msg="internal server is shutdown gracefully" err=null
ts=2024-05-30T13:55:21.762817333Z caller=intrumentation.go:81 level=info msg="changing probe status" status=not-healthy reason=null
ts=2024-05-30T13:55:21.762931491Z caller=grpc.go:174 level=info service=gRPC/server component=sidecar msg="internal server is shutting down" err=null
ts=2024-05-30T13:55:21.762973115Z caller=grpc.go:187 level=info service=gRPC/server component=sidecar msg="gracefully stopping internal server"
ts=2024-05-30T13:55:21.763094802Z caller=grpc.go:200 level=info service=gRPC/server component=sidecar msg="internal server is shutdown gracefully" err=null
ts=2024-05-30T13:55:21.763139606Z caller=main.go:174 level=info msg=exiting

@MichaHoffmann force-pushed the mhoffmann-fix-sidecar-reloader branch 2 times, most recently from 20a8f3e to d5c30b2 on May 30, 2024 13:53
@MichaHoffmann force-pushed the mhoffmann-fix-sidecar-reloader branch 2 times, most recently from 40e798c to 52ba763 on May 30, 2024 14:09
Previously we deferred starting the gRPC server by blocking the whole
startup until we could ping Prometheus. This breaks use cases that rely
on the config reloader to start Prometheus.
We fix it by using a channel to defer starting the gRPC server
and loading external labels in an actor concurrently.

Signed-off-by: Michael Hoffmann <[email protected]>
@MichaHoffmann force-pushed the mhoffmann-fix-sidecar-reloader branch from 52ba763 to a1bee35 on May 30, 2024 14:10
@saswatamcode (Member) left a comment:

Thanks, but not super familiar with sidecar, maybe @GiedriusS could take a look too?

@yeya24 (Contributor) left a comment:


Thanks. The fix makes sense to me!

@yeya24 merged commit 8aa42c8 into main on Jun 10, 2024. 20 checks passed.
hczhu-db pushed commits to databricks/thanos that referenced this pull request on Aug 22, 2024.
jnyi pushed a commit to jnyi/thanos that referenced this pull request on Oct 17, 2024.
Successfully merging this pull request may close these issues: Regression in thanos v0.35.1 (#7400)