Refactor application health monitoring #82

Merged
merged 7 commits into from
Aug 30, 2024

istreeter
Contributor

Application health is needed for two distinct reasons:

  1. Sending alerts to a monitoring webhook for setup errors. Relatedly, sending heartbeat messages to the webhook when healthy.
  2. The HTTP health probe. Needed so that the orchestration environment (Kubernetes or whatever) can kill the pod when it is unhealthy.

Several Snowplow apps each implement their own logic of when to toggle the health probe and when to send an alert. This PR consolidates that logic into one place.

After this PR, the application code just needs to call methods on the AppHealth class. This lib then manages the webhook events and the health probe, based on the current status of the AppHealth.
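
To illustrate the idea of one health state feeding the probe, here is a minimal sketch. Every name in it (`ServiceHealth`, `HealthState`, `HealthProbe`) is hypothetical and chosen for illustration only; it is not the AppHealth API from this PR. The sketch assumes the probe answers 200 when healthy and 503 otherwise.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical names throughout; this is not the library's actual API.
sealed trait ServiceHealth
object ServiceHealth {
  case object Healthy extends ServiceHealth
  final case class Unhealthy(reason: String) extends ServiceHealth
}

// Single source of truth that application code updates.
final class HealthState(initial: ServiceHealth) {
  private val ref = new AtomicReference[ServiceHealth](initial)

  def becomeHealthy(): Unit = ref.set(ServiceHealth.Healthy)
  def becomeUnhealthy(reason: String): Unit = ref.set(ServiceHealth.Unhealthy(reason))
  def current: ServiceHealth = ref.get()
}

object HealthProbe {
  // HTTP status code a probe endpoint could return for the current state.
  def statusCode(state: HealthState): Int =
    state.current match {
      case ServiceHealth.Healthy      => 200
      case ServiceHealth.Unhealthy(_) => 503
    }
}
```

The application only calls `becomeHealthy` or `becomeUnhealthy`. Anything that reports health, such as the probe here, reads the same state rather than keeping its own toggle.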

@istreeter istreeter force-pushed the feature/application-health-monitoring branch from 4b3a25f to b29df23 Compare August 13, 2024 17:37

private[runtime] sealed trait SetupStatus[+Alert]
private[runtime] object SetupStatus {
case object AwaitingHealth extends SetupStatus[Nothing]
Contributor

It only affects what is printed in logs, right? In health probe it still means 503 unhealthy?

Contributor Author

  • When we first transition to Healthy then we send a heartbeat event to the webhook
  • When we first transition to Unhealthy then we send an alert event to the webhook

Therefore we need a type which is neither Healthy nor Unhealthy so that we can detect the transition. Hence AwaitingHealth.
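
A minimal sketch of that transition logic, under stated assumptions: only the name `AwaitingHealth` comes from the PR. `Status`, `WebhookEvent`, `Transitions.eventFor`, and the use of a plain `String` for the alert are hypothetical simplifications, not the library's code.

```scala
// Hypothetical sketch; only the name AwaitingHealth comes from the PR.
sealed trait Status
object Status {
  case object AwaitingHealth extends Status
  case object Healthy extends Status
  final case class Unhealthy(alert: String) extends Status
}

sealed trait WebhookEvent
object WebhookEvent {
  case object Heartbeat extends WebhookEvent
  final case class Alert(message: String) extends WebhookEvent
}

object Transitions {
  // Returns the event to send on a status change, or None when nothing new happened.
  def eventFor(previous: Status, next: Status): Option[WebhookEvent] =
    (previous, next) match {
      case (Status.Healthy, Status.Healthy)           => None // already healthy
      case (_, Status.Healthy)                        => Some(WebhookEvent.Heartbeat)
      case (Status.Unhealthy(_), Status.Unhealthy(_)) => None // already alerted
      case (_, Status.Unhealthy(msg))                 => Some(WebhookEvent.Alert(msg))
      case (_, Status.AwaitingHealth)                 => None
    }
}
```

Starting in `AwaitingHealth` means the first move to either `Healthy` or `Unhealthy` counts as a transition. If the app started as `Healthy`, the first heartbeat would never fire; if it started as `Unhealthy`, the first alert would be suppressed.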

@istreeter istreeter force-pushed the feature/application-health-monitoring branch from 2011180 to f261713 Compare August 14, 2024 13:59
@istreeter istreeter force-pushed the feature/application-health-monitoring branch from 4b492d9 to 0e72774 Compare August 14, 2024 21:02
@istreeter istreeter force-pushed the feature/application-health-monitoring branch from 1135ccf to 8d43c85 Compare August 16, 2024 11:15
@istreeter istreeter merged commit 65b7725 into develop Aug 30, 2024
1 check passed
@istreeter istreeter deleted the feature/application-health-monitoring branch August 30, 2024 10:26
2 participants