Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cherry-pick #3204 after it has been inadvertently removed #3979

Merged
merged 1 commit into from
Mar 26, 2021

Conversation

dgrisonnet
Copy link
Contributor

@dgrisonnet dgrisonnet commented Mar 26, 2021

Cherry-pick #3204 after it has been inadvertently removed by #3856.

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

During prometheus updates the alert was firing because the metric was
initialized with a value of '0' before the first heartbeat was sent. As
such, the evaluation of the alert results into actually taking just the
value of time() into consideration which led to misleading information
about the health of the sidecar.

As the thanos_sidecar_last_heartbeat_success_time_seconds metric is
effectively just a timestamp that resets on new deployments, we can
simply wrap it around the timestamp() function which should return
almost the same value of the metric itself with the added benefit that
heartbeat resets will be ignored.

This also refactors the relevant tests and drops the timeout to 4
minutes in order to ensure that we do not get hit by stale data if
the sidecar takes longer to start.

Signed-off-by: Markos Chandras <[email protected]>
Signed-off-by: Damien Grisonnet <[email protected]>
Copy link
Member

@kakkoyun kakkoyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing 💯 Thanks @dgrisonnet for catching this one up!

@kakkoyun kakkoyun enabled auto-merge (squash) March 26, 2021 15:02
@kakkoyun kakkoyun merged commit 5139e33 into thanos-io:main Mar 26, 2021
onprem pushed a commit that referenced this pull request Jun 23, 2021
…er healthy (#4342)

* Revert "mixin: Use sidecar's metric timestamp for healthcheck (#3204) (#3979)"

This reverts commit 5139e33.

Signed-off-by: Arunprasad Rajkumar <[email protected]>

* fix(mixin): ThanosSidecarUnhealthy doesn't fire if the sidecar is never healthy

Signed-off-by: Arunprasad Rajkumar <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants