[dbnode] Emit metric with dbnode health status (#2588)
Problem:
In a large M3DB cluster, a database node that becomes non-functional (the service fails to start or the host is down) may go unnoticed. If it goes unnoticed for long enough and one more node that owns the same shard(s) also becomes non-functional, quorum may be lost, which blocks writes to the database.

Solution:
The connection pool in `src/dbnode/client/connection_pool.go` already performs a periodic health check from the client's node/process. Let that code emit a gauge metric with the result of the health check. The metrics scope passed to `newConnectionPool` is already tagged with `hostID`. Because the health check is done from the client, a checked node is by implication in the M3DB placement and expected to be functional. Alerting can therefore be set up based on this metric alone.

This behavior is optional and disabled by default, to prevent an accidental explosion of metric cardinality. When it is enabled, call sites must ensure that the tags they set on the scope passed to the M3DB node client do not produce a high-cardinality set of combinations with the `hostID` tag.

Considered Alternatives:
1. Emit a heartbeat metric from `src/dbnode/network/server/tchannelthrift/node/service.go`. Alerting on a lost heartbeat requires knowing whether the node is in the placement, i.e. expected to be functional.
2. Let an independent monitoring/canary system actively probe the health check endpoint of every database node, determine whether the node is expected to be functional by comparing against M3DB placement data, and alert the operator. Such a solution would be ideal but has a much higher cost.
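The approach in the Solution section can be illustrated with a minimal, standard-library-only sketch. It is not the code from this commit: the `Gauge` interface, `memGauge`, `healthReporter`, `checkAndReport`, and the choice of 1 for healthy and 0 for unhealthy are all assumptions made for illustration. In the real client the gauge would come from the metrics scope passed to `newConnectionPool`, which is already tagged with `hostID`.

```go
package main

import (
	"errors"
	"sync"
)

// errUnhealthy is a hypothetical error returned by a failed health check.
var errUnhealthy = errors.New("host unhealthy")

// Gauge is a hypothetical stand-in for a gauge obtained from a metrics scope.
type Gauge interface {
	Update(value float64)
}

// memGauge keeps the last reported value in memory (illustration only).
type memGauge struct {
	mu    sync.Mutex
	value float64
}

// Update stores the most recent value.
func (g *memGauge) Update(v float64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.value = v
}

// Value returns the most recently stored value.
func (g *memGauge) Value() float64 {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.value
}

// healthReporter emits the health check result only when enabled,
// mirroring the "optional, disabled by default" behavior described above.
type healthReporter struct {
	enabled bool
	gauge   Gauge
}

// report writes 1 for a passing check and 0 for a failing one (assumed encoding).
func (r healthReporter) report(err error) {
	if !r.enabled || r.gauge == nil {
		return
	}
	if err != nil {
		r.gauge.Update(0)
		return
	}
	r.gauge.Update(1)
}

// checkAndReport runs one health check and reports its outcome.
// A connection pool would call this from its periodic health check loop.
func checkAndReport(check func() error, r healthReporter) error {
	err := check()
	r.report(err)
	return err
}
```

With a gauge encoded this way, an alert can fire when the value for a given `hostID` stays at 0, without consulting placement data separately. Keeping the reporter behind an `enabled` flag means callers who do not opt in emit no per-host series at all.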
Showing 4 changed files with 96 additions and 1 deletion.