This repository has been archived by the owner on Jun 1, 2021. It is now read-only.


Eventuate replication endpoint health information for Dropwizard's health-checks library

The dropwizard-healthchecks module of eventuate-tools provides health-monitoring facilities for Eventuate components based on Dropwizard's health-checks library. The following can be monitored:

  • health of the replication from remote source logs based on Available/Unavailable messages.
  • health of the connection to the storage backend for persisting new events based on information from an event-log's circuit breaker.
  • health of Eventuate's actors based on Akka's death watch.

Project dependency

The following artifact is published to JFrog's snapshot and release repositories:

  • Artifact Id: dropwizard-healthchecks_<scala-version>
  • Group Id: com.rbmhtechnology.eventuate-tools

Settings for an sbt build:

libraryDependencies += "com.rbmhtechnology.eventuate-tools" %% "dropwizard-healthchecks" % "<version>"

// for snapshots
resolvers += "OJO Snapshots" at "https://oss.jfrog.org/oss-snapshot-local"

// for releases
resolvers += "OJO Releases" at "https://oss.jfrog.org/oss-release-local"

Register with HealthCheckRegistry

Given a ReplicationEndpoint (endpoint) to be monitored and a HealthCheckRegistry (healthRegistry), health checks for each of the components listed above can be registered under an optional prefix (namePrefix) as follows:

  • replication health:
    val monitor = new ReplicationHealthMonitor(endpoint, healthRegistry, namePrefix)
  • circuit breaker health:
    val monitor = new CircuitBreakerHealthMonitor(endpoint, healthRegistry, namePrefix)
  • actor health:
    val monitor = new ActorHealthMonitor(endpoint, healthRegistry, namePrefix)

There is also a convenience class that registers all of them at once in a single HealthCheckRegistry under a single namePrefix:

val monitor = new ReplicationEndpointHealthMonitor(endpoint, healthRegistry, namePrefix)

In each case, monitoring can be stopped, which removes the registered health checks, as follows:

monitor.stopMonitoring()
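The register/stop lifecycle can be illustrated with a small, self-contained sketch using an in-memory registry. It is not the library's code: SimpleRegistry, ExampleMonitor, Check, Healthy and Unhealthy are hypothetical stand-ins for Dropwizard's HealthCheckRegistry and the monitors above, assuming that a monitor registers its checks when constructed and unregisters exactly those checks in stopMonitoring().

```scala
import scala.collection.mutable

// Hypothetical result type standing in for a health-check result.
sealed trait Check
case object Healthy extends Check
final case class Unhealthy(reason: String) extends Check

// Hypothetical in-memory stand-in for Dropwizard's HealthCheckRegistry.
final class SimpleRegistry {
  private val checks = mutable.Map.empty[String, () => Check]

  def register(name: String, check: () => Check): Unit = checks.update(name, check)
  def unregister(name: String): Unit = checks -= name
  def names: Set[String] = checks.keySet.toSet
}

// Hypothetical monitor: registers one check per local log on construction
// and removes exactly those checks again in stopMonitoring().
final class ExampleMonitor(registry: SimpleRegistry, namePrefix: String, logIds: Seq[String]) {
  private val registered: Seq[String] =
    logIds.map(id => s"$namePrefix.circuit-breaker-of.$id")

  registered.foreach(name => registry.register(name, () => Healthy))

  def stopMonitoring(): Unit = registered.foreach(registry.unregister)
}

object LifecycleDemo extends App {
  val registry = new SimpleRegistry
  registry.register("other.check", () => Healthy) // a check owned by someone else
  val monitor = new ExampleMonitor(registry, "app", Seq("log1", "log2"))
  println(registry.names.toList.sorted)
  monitor.stopMonitoring()
  println(registry.names.toList.sorted) // only other.check remains
}
```

In this sketch the monitor remembers the names it registered, so stopping it removes only its own checks and leaves other checks in a shared registry untouched.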

If the actor system stops before monitoring has been stopped, all registered health checks turn unhealthy and report the monitored component as being in an unknown state. This ensures that all components are reported as unhealthy after an unexpected actor-system stop (for example, one triggered by Eventuate's Cassandra extension when the database cannot be accessed at startup).
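A minimal sketch of this fallback, assuming each check consults a flag that is set once the actor system has terminated. TerminationAwareCheck and onActorSystemTerminated are hypothetical names; in Akka such a callback could, for example, be attached with ActorSystem.registerOnTermination, but this is an assumption and not a description of the library's implementation. Here the callback is invoked directly so the example runs without Akka.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical result type standing in for a health-check result.
sealed trait Check
case object Healthy extends Check
final case class Unhealthy(reason: String) extends Check

// Hypothetical check wrapper: reports the component's own status until the
// actor system has terminated, after which it always reports an unknown state.
final class TerminationAwareCheck(componentStatus: () => Check) {
  private val systemTerminated = new AtomicBoolean(false)

  // Stand-in for a termination callback; called directly in this sketch.
  def onActorSystemTerminated(): Unit = systemTerminated.set(true)

  def check(): Check =
    if (systemTerminated.get) Unhealthy("actor system terminated: component state unknown")
    else componentStatus()
}

object TerminationDemo extends App {
  val check = new TerminationAwareCheck(() => Healthy)
  println(check.check())
  check.onActorSystemTerminated()
  println(check.check())
}
```

Because the flag overrides whatever the component would report, a component that looked healthy right before the stop is still reported as unhealthy afterwards.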

By default, the ReplicationHealthMonitor does not report a health status for replication from remote logs to which no connection has been established yet, because health monitoring is based on Eventuate's Available and Unavailable notifications, which are only sent after a successful initial connect. The ReplicationHealthMonitor can be initialized with an optional parameter initiallyUnhealty that takes a set of ids of remote logs whose replication status is reported as unhealthy initially (because no connection can be established) until the first Available message is received. Because Eventuate does not know the ids of remote endpoints until a connection has been established, these ids have to be provided explicitly.
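The reporting rules above can be modeled as a small state machine. This is a hypothetical sketch, not the library's implementation: ReplicationStatus and the Available/Unavailable case classes are simplified stand-ins (the real Eventuate notifications carry more information), and the parameter initiallyUnhealthyIds only loosely mirrors initiallyUnhealty.

```scala
// Hypothetical result type standing in for a health-check result.
sealed trait Check
case object Healthy extends Check
final case class Unhealthy(reason: String) extends Check

// Hypothetical stand-ins for Eventuate's Available/Unavailable notifications,
// reduced to the id of the remote log they refer to.
sealed trait ReplicationEvent { def remoteLogId: String }
final case class Available(remoteLogId: String) extends ReplicationEvent
final case class Unavailable(remoteLogId: String) extends ReplicationEvent

// Hypothetical model of the replication status tracking described above.
// Only ids listed in initiallyUnhealthyIds have a status before the first event.
final class ReplicationStatus(initiallyUnhealthyIds: Set[String]) {
  private var status: Map[String, Check] =
    initiallyUnhealthyIds.map(id => id -> (Unhealthy("not connected yet"): Check)).toMap

  def handle(event: ReplicationEvent): Unit = event match {
    case Available(id)   => status += (id -> Healthy)
    case Unavailable(id) => status += (id -> Unhealthy("Unavailable received"))
  }

  // None means that no health status is reported for this remote log (yet).
  def statusOf(remoteLogId: String): Option[Check] = status.get(remoteLogId)
}

object ReplicationStatusDemo extends App {
  val status = new ReplicationStatus(Set("remote-A"))
  println(status.statusOf("remote-A")) // unhealthy until the first Available
  println(status.statusOf("remote-B")) // not reported at all
  status.handle(Available("remote-A"))
  println(status.statusOf("remote-A"))
}
```

A remote log that is neither listed initially nor has sent any notification yields None, which corresponds to the default behavior of not reporting a status for it.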

Health-check names

For a given prefix the individual monitors register the following health checks:

  • ReplicationHealthMonitor registers for each local log that is replicated to a remote endpoint:
    <prefix>.replication-from.<remote-endpoint-id>.<log-name>
    
    This turns unhealthy as soon as an Unavailable message for this particular log arrives and back to healthy when a corresponding Available message arrives. See also the corresponding section in the Eventuate documentation.
  • CircuitBreakerHealthMonitor registers for each local log:
    <prefix>.circuit-breaker-of.<log-id>
    
    This turns unhealthy as soon as the circuit breaker opens and back to healthy when it closes. Currently only the CassandraEventLog uses the circuit breaker, and it only applies when persisting locally emitted events fails.
  • ActorHealthMonitor registers for each local log:
    <prefix>.actor.eventlog.<log-id>
    
    and for the acceptor:
    <prefix>.actor.acceptor
    
    These turn unhealthy as soon as the corresponding actors terminate.
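The naming scheme above can be captured in a small helper. HealthCheckNames is a hypothetical helper, not part of the library; it assumes a non-empty prefix, since how an absent prefix is rendered is not specified here, and the ids used in the demo ("myapp", "endpoint-B", "L", "log-1") are made-up examples.

```scala
// Hypothetical helper that builds the health-check names listed above.
object HealthCheckNames {
  def replicationFrom(prefix: String, remoteEndpointId: String, logName: String): String =
    s"$prefix.replication-from.$remoteEndpointId.$logName"

  def circuitBreakerOf(prefix: String, logId: String): String =
    s"$prefix.circuit-breaker-of.$logId"

  def actorEventLog(prefix: String, logId: String): String =
    s"$prefix.actor.eventlog.$logId"

  def actorAcceptor(prefix: String): String =
    s"$prefix.actor.acceptor"
}

object NamesDemo extends App {
  import HealthCheckNames._

  println(replicationFrom("myapp", "endpoint-B", "L")) // myapp.replication-from.endpoint-B.L
  println(circuitBreakerOf("myapp", "log-1"))          // myapp.circuit-breaker-of.log-1
  println(actorEventLog("myapp", "log-1"))             // myapp.actor.eventlog.log-1
  println(actorAcceptor("myapp"))                      // myapp.actor.acceptor
}
```

Building the names in one place like this can help when querying a specific check from the registry, for example when wiring individual checks into alerting.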