[Voyager Adaptive Parallelism] Implement YSQL function to get metrics for all nodes in cluster #23542
Labels
area/ecosystem
Label for all ecosystem related projects
jira-originated
kind/new-feature
This is a request for a completely new feature
priority/low
Low priority
Comments
yugabyte-ci added the area/ecosystem, jira-originated, kind/new-feature, and priority/low labels on Aug 19, 2024
makalaaneesh added a commit that referenced this issue on Sep 20, 2024

…trics such as cpu/memory usage from all nodes in cluster

Summary:
To enable adaptive parallelism in Voyager (https://docs.google.com/document/d/1beD7zNtpmfYflXV1hVJ9mq_uqyCTJ9Es4titPEksSNE/edit#heading=h.3c3bf00hwf), a YSQL function yb_servers_metrics() is added that fetches certain metrics for all nodes in the cluster. This lets Voyager monitor the state of the cluster and adapt its parallelism while importing data into the target YB cluster. A YSQL API is needed to provide a deployment-agnostic interface, so that metrics do not have to be fetched through different mechanisms for YBA, YBM, and on-prem deployments.

Additionally, made a few changes to `MetricsSnapshotter`:
- Introduced a function GetCpuUsageInInterval(int ms).
- Made the GetCpuUsage function static.
- Introduced a `GetMemoryUsage` function to get memory usage (from /proc/meminfo on Linux and sysctl on macOS).

Sample output:
```
yugabyte=# select uuid, jsonb_pretty(metrics), status, error from yb_servers_metrics();
               uuid               |                    jsonb_pretty                     | status | error
----------------------------------+-----------------------------------------------------+--------+-------
 bf98c74dd7044b34943c5bff7bd3d0d1 | {                                                  +| OK     |
                                  |     "memory_free": "0",                            +|        |
                                  |     "memory_total": "17179869184",                 +|        |
                                  |     "cpu_usage_user": "0.135827",                  +|        |
                                  |     "cpu_usage_system": "0.118110",                +|        |
                                  |     "memory_available": "0",                       +|        |
                                  |     "tserver_root_memory_limit": "11166914969",    +|        |
                                  |     "tserver_root_memory_soft_limit": "9491877723",+|        |
                                  |     "tserver_root_memory_consumption": "52346880"  +|        |
                                  | }                                                   |        |
 d105c3a6128640f5a25cc74435e48ae3 | {                                                  +| OK     |
                                  |     "memory_free": "0",                            +|        |
                                  |     "memory_total": "17179869184",                 +|        |
                                  |     "cpu_usage_user": "0.135189",                  +|        |
                                  |     "cpu_usage_system": "0.119284",                +|        |
                                  |     "memory_available": "0",                       +|        |
                                  |     "tserver_root_memory_limit": "11166914969",    +|        |
                                  |     "tserver_root_memory_soft_limit": "9491877723",+|        |
                                  |     "tserver_root_memory_consumption": "55074816"  +|        |
                                  | }                                                   |        |
 a321e13e5bf24060a764b35894cd4070 | {                                                  +| OK     |
                                  |     "memory_free": "0",                            +|        |
                                  |     "memory_total": "17179869184",                 +|        |
                                  |     "cpu_usage_user": "0.135827",                  +|        |
                                  |     "cpu_usage_system": "0.118110",                +|        |
                                  |     "memory_available": "0",                       +|        |
                                  |     "tserver_root_memory_limit": "11166914969",    +|        |
                                  |     "tserver_root_memory_soft_limit": "9491877723",+|        |
                                  |     "tserver_root_memory_consumption": "62062592"  +|        |
                                  | }                                                   |        |
```

**Upgrade/Rollback safety:**
This is a new YSQL function, so it has no prior users. During an upgrade or rollback, the SQL migration that adds the function to pg_proc runs only when the upgrade is finalized (i.e. after all tservers are updated). Until then the function is not available to call, so a partially upgraded set of tservers cannot cause errors.

Test Plan: ./yb_build.sh --java-test 'org.yb.pgsql.TestYbServersMetrics#testYBServersMetricsFunction'

Reviewers: asaha, djiang, telgersma
Reviewed By: djiang, telgersma
Subscribers: hbhanawat, yql, ybase, amakala
Differential Revision: https://phorge.dev.yugabyte.com/D37267
foucher pushed a commit that referenced this issue on Sep 23, 2024

Summary:
ead90cc [#23645] docdb: Fix tests timing out on TSAN after 15786f3
0a6a31e [doc] Fix BNL flag defaults (#23945)
54793c8 [#22925] docdb: Persist tserver registry entries to sys catalog
fbef568 [PLAT-15378][localProvider][dr] Deflake testDrConfigSetup local provider test
64ac031 [#23978] xCluster: set up sequences_data stream(s) on target universe
8d228a8 [#23923] YSQL: Fix DDL atomicity check failure
903d793 [PLAT-15328] Configure cgroup for non rhel9 machines as part of provision
Excluded: 5dc71ea [#23882] YSQL: Improve cache re-invalidation for alter table commands
1e70024 [DOC-480] CDC metric description and voyager minor fixes (#24028)
7a4b409 [#23700] CDCSDK: Use leader epoch instead of leader term in table removal bg task
4d922ca [#23922] docdb: Handle colocated tablets correctly in tablet limit checks.
487bc77 [PLAT-15158 Update replication frequency tooltip
2059eee [#24001] docdb: Replace tablet in tablegroup manager on repartition of colocated table
90d4e93 [#24020] DocDB: Vector LSM
294b7bb [PLAT-14435]Fix args parsing in failure detection py script
Excluded: 872b59e [#23542] YSQL: Add new YSQL function yb_servers_metrics() to fetch metrics such as cpu/memory usage from all nodes in cluster
252717b [PLAT-12263] G-Flag upgrade fails for tmp_dir if Rolling restart used

Test Plan: Jenkins: rebase: pg15-cherrypicks

Reviewers: jason, jenkins-bot
Differential Revision: https://phorge.dev.yugabyte.com/D38266
makalaaneesh pushed a commit that referenced this issue on Sep 25, 2024

…rvers_metrics() to fetch metrics such as cpu/memory usage from all nodes in cluster

Summary:
- catalog.h
  - YB_LAST_USED_OID
    - YB master 872b59e changes `YB_LAST_USED_OID` to 8072
    - YB PG15 7989b01610e9d7ca5dbbcb4da1ebb25c7864f1c changed `YB_LAST_USED_OID` to 8071
    - Kept 8072
- pg_proc.dat
  - At the end of the file
    - YB master 872b59e adds proc entry for oid 8072
    - YB PG15 7989b01610e9d7ca5dbbcb4da1ebb25c7864f1c adds proc entry for oid 8071
    - Added the 8072 entry
- pg_yb_migration.dat
  - # here: (line 15)
    - YB master 872b59e changes major's value to 58 and adds the `V58__23542__yb_servers_metrics.sql` comment.
    - YB PG15 7989b01610e9d7ca5dbbcb4da1ebb25c7864f1c changes major's value to 57 and adds the `V57__23312__binary_upgrade_set_next_tablegroup_default` comment.
    - Kept the YB master commit's changes.
- yb_system_views.sql
  - CREATE VIEW yb_servers_metrics AS
    - YB master 872b59e added a new view yb_servers_metrics
    - YB PG 55782d5 removed a lot of views and functions just next to this line
    - Kept the yb_servers_metrics definition and removed the other views and functions

Original summary: identical to the summary, sample output, and Upgrade/Rollback safety note of master commit 872b59e (D37267), quoted in full under the Sep 20, 2024 commit above.

Test Plan: ./yb_build.sh --java-test 'org.yb.pgsql.TestYbServersMetrics#testYBServersMetricsFunction'

Reviewers: jason, tfoucher, fizaa, ishan.chhangani
Reviewed By: fizaa
Subscribers: fizaa
Differential Revision: https://phorge.dev.yugabyte.com/D38307
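The `yb_servers_metrics()` sample output quoted in this issue returns one row per node, with every metric encoded as a JSON string. The Python sketch below shows one way a client could turn such rows into numbers and adjust its parallelism. It is not Voyager's algorithm: the thresholds `CPU_HIGH` and `MEM_HIGH`, the halve-or-increment policy, and the function names are hypothetical, and the row is passed in directly rather than fetched over a database connection.

```python
import json

# Hypothetical thresholds for this sketch; they are not Voyager's values.
CPU_HIGH = 0.80  # combined user + system CPU fraction
MEM_HIGH = 0.85  # tserver memory consumption / soft limit


def parse_row(uuid, metrics_json, status, error):
    """Turn one yb_servers_metrics() row into numeric load figures."""
    if status != "OK":
        return {"uuid": uuid, "ok": False, "error": error}
    raw = json.loads(metrics_json)
    # The sample output shows every metric as a JSON string, so cast explicitly.
    cpu = float(raw["cpu_usage_user"]) + float(raw["cpu_usage_system"])
    mem = (int(raw["tserver_root_memory_consumption"])
           / int(raw["tserver_root_memory_soft_limit"]))
    return {"uuid": uuid, "ok": True, "cpu": cpu, "mem": mem}


def next_parallelism(current, nodes, min_jobs=1, max_jobs=16):
    """Halve the job count if any node is unreachable or loaded, else add one."""
    overloaded = any(
        (not n["ok"]) or n["cpu"] > CPU_HIGH or n["mem"] > MEM_HIGH
        for n in nodes
    )
    if overloaded or not nodes:
        return max(min_jobs, current // 2)
    return min(max_jobs, current + 1)


# First node from the sample output in this issue.
sample = json.dumps({
    "cpu_usage_user": "0.135827",
    "cpu_usage_system": "0.118110",
    "tserver_root_memory_soft_limit": "9491877723",
    "tserver_root_memory_consumption": "52346880",
})
nodes = [parse_row("bf98c74dd7044b34943c5bff7bd3d0d1", sample, "OK", None)]
print(next_parallelism(4, nodes))
```

Treating a row whose status is not `OK` as a reason to back off is a conservative choice made for this sketch; a real client might instead retry or ignore that node.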
makalaaneesh added a commit that referenced this issue on Oct 15, 2024

…ics() to fetch metrics such as cpu/memory usage from all nodes in cluster

Summary:
Same summary, sample output, and Upgrade/Rollback safety note as the original commit, quoted in full under the Sep 20, 2024 commit above.

Original commit: 872b59e / D37267

Test Plan: ./yb_build.sh --java-test 'org.yb.pgsql.TestYbServersMetrics#testYBServersMetricsFunction'

Reviewers: asaha, djiang, telgersma
Reviewed By: asaha
Subscribers: amakala, ybase, yql, hbhanawat
Differential Revision: https://phorge.dev.yugabyte.com/D39000
makalaaneesh added a commit that referenced this issue on Oct 15, 2024

…ics() to fetch metrics such as cpu/memory usage from all nodes in cluster

Summary:
Same summary, sample output, and Upgrade/Rollback safety note as the original commit, quoted in full under the Sep 20, 2024 commit above.

Original commit: 872b59e / D37267

While resolving merge conflicts:
- Removed any RPCs, methods, or entries in yb_system_views.sql, pg_proc.dat, and yb_pg_rules.out that were not part of the original diff.
- The last used OID on 2024.1 was 8067 and the entry I added uses 8072, so I set the last used OID to 8072.
- The last migration script on 2024.1 was V51 and mine is V58, so I changed my migration to V51.1. (Did not change the OID of the entry in pg_proc.)

Test Plan: ./yb_build.sh --java-test 'org.yb.pgsql.TestYbServersMetrics#testYBServersMetricsFunction'

Reviewers: asaha, djiang, telgersma
Reviewed By: asaha
Subscribers: amakala, ybase, yql, hbhanawat
Differential Revision: https://phorge.dev.yugabyte.com/D39052
makalaaneesh added a commit that referenced this issue on Oct 18, 2024

…s() to fetch metrics such as cpu/memory usage from all nodes in cluster

Summary:
Same summary, sample output, and Upgrade/Rollback safety note as the original commit, quoted in full under the Sep 20, 2024 commit above.

While resolving backport merge conflicts:
- Removed any RPCs, methods, or entries in yb_system_views.sql, pg_proc.dat, and yb_pg_rules.out that were not part of the original diff.
- The last used OID on 2.20 was 8064 and the entry I added uses 8072, so I set the last used OID to 8072.
- The last migration script on 2.20 was V43.1 and the original migration of my diff was V58, so I changed the migration to V43.2. (Did not change the OID of the entry in pg_proc.)

Original commit: 872b59e / D37267

Test Plan: ./yb_build.sh --java-test 'org.yb.pgsql.TestYbServersMetrics#testYBServersMetricsFunction'

Reviewers: asaha, djiang, telgersma
Reviewed By: telgersma
Subscribers: amakala, ybase, yql, hbhanawat
Differential Revision: https://phorge.dev.yugabyte.com/D39125
Jira Link: DB-12460