[DocDB] Expose master's raft heartbeat delay for each follower as a metric #21178
Closed
Labels: area/docdb (YugabyteDB core features), kind/enhancement (This is an enhancement of an existing feature), priority/medium (Medium priority issue)
Comments
iSignal added labels area/docdb (YugabyteDB core features) and status/awaiting-triage (Issue awaiting triage) on Feb 23, 2024
yugabyte-ci added labels kind/enhancement (This is an enhancement of an existing feature) and priority/medium (Medium priority issue) on Feb 23, 2024
@druzac, assigned this to you since you added the API. The ask is a metric for the same issue.
druzac added a commit that referenced this issue on Sep 25, 2024
Summary: Adds a metric for the maximum master follower heartbeat delay. Ideally this metric would report the raw follower delays directly, and clients could calculate the maximum themselves. However, there is no metric entity we could use to export a metric from the leader master with the right fields. Adding a new metric entity for just this metric seems too involved given the current use case (alert if the follower delay is too large). Another alternative would be adding a server entity metric to each master and having followers report their own delay. But we are interested in the perspective from the master leader here, and that approach could introduce surprising behaviour.

Jira: DB-10113

Test Plan:
```
./yb_build.sh --with-tests --cxx-test-filter-re tablet_health_manager-itest --cxx-test tablet_health_manager-itest --gtest_filter 'AreNodesSafeToTakeDownItest.GetFollowerUpdate*'
```

Reviewers: asrivastava, sanketh
Reviewed By: asrivastava
Subscribers: ybase, slingam
Differential Revision: https://phorge.dev.yugabyte.com/D38319
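A minimal sketch of the leader-side aggregation the summary describes: the leader knows each follower's delay and exports only the maximum. It assumes "heartbeat delay" means the time since the leader last received a successful heartbeat response from that follower; `FollowerStatus`, `max_follower_heartbeat_delay_ms`, `should_alert`, and the 2 s threshold are hypothetical names and values for illustration, not YugabyteDB's actual code, metric name, or default.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FollowerStatus:
    """Hypothetical view the leader master keeps of one follower master."""
    uuid: str
    # Leader-local time (ms) of the last successful heartbeat response from
    # this follower. Assumed definition, not taken from YugabyteDB.
    last_heartbeat_ms: int


def max_follower_heartbeat_delay_ms(followers: Iterable[FollowerStatus], now_ms: int) -> int:
    """Collapse per-follower delays into the single value a leader-side metric would export.

    Returns 0 when there are no followers (e.g. a single-master cluster).
    """
    return max((now_ms - f.last_heartbeat_ms for f in followers), default=0)


def should_alert(delay_ms: int, threshold_ms: int) -> bool:
    """Alert when the worst follower lags by more than the threshold."""
    return delay_ms > threshold_ms


if __name__ == "__main__":
    followers = [FollowerStatus("master-2", 9_800), FollowerStatus("master-3", 7_000)]
    worst = max_follower_heartbeat_delay_ms(followers, now_ms=10_000)
    print(worst)                       # prints 3000 (master-3 is 3 s behind)
    print(should_alert(worst, 2_000))  # prints True
```

The trade-off mentioned in the summary shows up here: exporting only the maximum tells an alert that some follower is lagging, but not which one; pinpointing it would still need the per-follower delays, such as those returned by the RPC added in #18788.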
timothy-e pushed a commit that referenced this issue on Sep 26, 2024
Summary:
35b12d2 [PLAT-15404] Average YSQL operations latency alert is using incorrect units (ms vs microsecs)
Excluded: 008f885 [#23788] YSQL, QueryDiagnostics: Fixing issues in pg_stat_statements when no query executed
6ca8cc4 [#23810] yugabyted-ui: UI is displaying incorrect disk size when multiple data directories
dca5923 [PLAT-15034][K8s] Add changes to apply master_join_existing_cluster gflag
fa9b370 [docs] Update content for getting started page for CDC logical replication (#23916)
8db0ffb [PLAT-15380] clock drift alert did not reference nodes
44ae377 [PLAT-15349] Mark universe update as success after update lb config
Excluded: 9f90819 [#24121] xCluster: Fix xcluster_outbound_replication_group-itest TestGetStreamByTableId
250a4d5 [#24026] docdb: Fix SIGSEGV from MaxPersistentOpId after flush
0d1046a [DEVOPS-3238] Move macOS build to macos13 (Ventura)
87cffc6 [#24137] DocDB: Add gflag_allowlist to yb_release_manifest
678d277 [#21178] docdb: Add metric for the max master follower heartbeat delay.
ff97f51 [doc][ybm] Certificate links (#24139)
Excluded: d26b62d [#21733] YSQL: ParallelAppend and pg_hint_plan
3ffe5a7 [PLAT-10519]Lack of Client-Side Inactivity Timeout - Part 1
254e164 [PLAT-15432] remove status,sizeInBytes from manifest.json file

Test Plan: Jenkins: rebase: pg15-cherrypicks

Reviewers: tfoucher, fizaa, telgersma
Differential Revision: https://phorge.dev.yugabyte.com/D38454
druzac added a commit that referenced this issue on Sep 30, 2024
…er heartbeat delay.

Summary: Adds a metric for the maximum master follower heartbeat delay. Ideally this metric would report the raw follower delays directly, and clients could calculate the maximum themselves. However, there is no metric entity we could use to export a metric from the leader master with the right fields. Adding a new metric entity for just this metric seems too involved given the current use case (alert if the follower delay is too large). Another alternative would be adding a server entity metric to each master and having followers report their own delay. But we are interested in the perspective from the master leader here, and that approach could introduce surprising behaviour.

Jira: DB-10113

Original commit: 678d277 / D38319

Test Plan:
```
./yb_build.sh --with-tests --cxx-test-filter-re tablet_health_manager-itest --cxx-test tablet_health_manager-itest --gtest_filter 'AreNodesSafeToTakeDownItest.GetFollowerUpdate*'
```

Reviewers: asrivastava, sanketh
Reviewed By: asrivastava
Subscribers: slingam, ybase
Differential Revision: https://phorge.dev.yugabyte.com/D38441
Jira Link: DB-10113
Description
#18788 added a new RPC to expose the RAFT heartbeat delay for masters. Could we also expose this as a metric so that it can be alerted on?
@lingamsandeep @PrarabdhGarg
Issue Type
kind/enhancement