[Voyager Adaptive Parallelism] Implement YSQL function to get metrics for all nodes in cluster #23542

Closed
yugabyte-ci opened this issue Aug 19, 2024 · 0 comments
Labels: area/ecosystem (Label for all ecosystem related projects), jira-originated, kind/new-feature (This is a request for a completely new feature), priority/low (Low priority)

yugabyte-ci commented Aug 19, 2024

Jira Link: DB-12460

yugabyte-ci added the area/ecosystem, jira-originated, kind/new-feature, and priority/low labels Aug 19, 2024
makalaaneesh added a commit that referenced this issue Sep 20, 2024
…trics such as cpu/memory usage from all nodes in cluster

Summary:
To enable adaptive parallelism in Voyager (https://docs.google.com/document/d/1beD7zNtpmfYflXV1hVJ9mq_uqyCTJ9Es4titPEksSNE/edit#heading=h.3c3bf00hwf), a YSQL function yb_servers_metrics() is added
which fetches certain metrics for all nodes in the cluster. This allows Voyager to monitor the state of the cluster and adapt the parallelism while importing data into the target YB cluster. A YSQL API is needed
to provide a deployment-agnostic API (without having to fetch metrics for YBA/YBM/on-prem using different mechanisms).
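
To illustrate how a client such as Voyager might use these metrics, here is a minimal Python sketch of one possible adaptation rule. It is not Voyager's actual logic: the `NodeMetrics` type, the `next_parallelism` function, and the thresholds are all hypothetical, and it assumes the CPU values are fractions between 0 and 1, as the sample output suggests.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeMetrics:
    """Per-node values taken from the metrics column (hypothetical container)."""
    cpu_usage_user: float        # assumed to be a fraction in [0, 1]
    cpu_usage_system: float      # assumed to be a fraction in [0, 1]
    memory_consumption: int      # tserver_root_memory_consumption, bytes
    memory_soft_limit: int       # tserver_root_memory_soft_limit, bytes


# Hypothetical thresholds, chosen only for illustration.
CPU_HIGH = 0.70
CPU_LOW = 0.40


def next_parallelism(current: int, nodes: List[NodeMetrics],
                     min_jobs: int = 1, max_jobs: int = 16) -> int:
    """Return the number of parallel import jobs to use next."""
    if not nodes:
        # No metrics available: keep the current setting.
        return current
    busiest_cpu = max(n.cpu_usage_user + n.cpu_usage_system for n in nodes)
    over_memory = any(n.memory_consumption > n.memory_soft_limit for n in nodes)
    if busiest_cpu > CPU_HIGH or over_memory:
        # Some node is under pressure: back off by one job.
        return max(min_jobs, current - 1)
    if busiest_cpu < CPU_LOW:
        # Every node has headroom: add one job.
        return min(max_jobs, current + 1)
    return current
```

Keying the decision on the busiest node rather than the average is a deliberate choice in this sketch, so that a single overloaded node is enough to slow the import down.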

Additionally, made a few changes to `MetricsSnapshotter`:
- Introduced a function `GetCpuUsageInInterval(int ms)`.
- Made the `GetCpuUsage` function static.
- Introduced a `GetMemoryUsage` function to get memory usage (from /proc/meminfo on Linux and sysctl on macOS).
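
The idea of measuring CPU usage over an interval can be sketched in Python as follows. This is not the C++ `MetricsSnapshotter` code: it assumes a Linux host, reads the aggregate `cpu` line of /proc/stat twice, and divides the user and system deltas by the total delta. The function names are hypothetical, and whether the real implementation treats `nice` time as user time is not stated in this thread.

```python
import time
from typing import List, Tuple


def parse_cpu_line(stat_text: str) -> List[int]:
    """Return the counters of the aggregate 'cpu' line from /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def cpu_usage_between(before_text: str, after_text: str) -> Tuple[float, float]:
    """Return (user, system) CPU fractions between two /proc/stat snapshots."""
    before = parse_cpu_line(before_text)
    after = parse_cpu_line(after_text)
    deltas = [a - b for a, b in zip(after, before)]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest and guest_nice are already counted in user and nice, so sum only
    # the first eight fields to avoid double counting.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0, 0.0
    return deltas[0] / total, deltas[2] / total


def cpu_usage_in_interval(interval_ms: int) -> Tuple[float, float]:
    """Sample /proc/stat, sleep for interval_ms, sample again (Linux only)."""
    with open("/proc/stat") as stat_file:
        before = stat_file.read()
    time.sleep(interval_ms / 1000.0)
    with open("/proc/stat") as stat_file:
        after = stat_file.read()
    return cpu_usage_between(before, after)
```

Keeping the arithmetic in `cpu_usage_between` separate from the file reads makes it possible to check the calculation against fixed snapshots without depending on the host.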

Sample output:

```
yugabyte=# select uuid, jsonb_pretty(metrics), status, error from yb_servers_metrics();
               uuid               |                    jsonb_pretty                     | status | error
----------------------------------+-----------------------------------------------------+--------+-------
 bf98c74dd7044b34943c5bff7bd3d0d1 | {                                                  +| OK     |
                                  |     "memory_free": "0",                            +|        |
                                  |     "memory_total": "17179869184",                 +|        |
                                  |     "cpu_usage_user": "0.135827",                  +|        |
                                  |     "cpu_usage_system": "0.118110",                +|        |
                                  |     "memory_available": "0",                       +|        |
                                  |     "tserver_root_memory_limit": "11166914969",    +|        |
                                  |     "tserver_root_memory_soft_limit": "9491877723",+|        |
                                  |     "tserver_root_memory_consumption": "52346880"  +|        |
                                  | }                                                   |        |
 d105c3a6128640f5a25cc74435e48ae3 | {                                                  +| OK     |
                                  |     "memory_free": "0",                            +|        |
                                  |     "memory_total": "17179869184",                 +|        |
                                  |     "cpu_usage_user": "0.135189",                  +|        |
                                  |     "cpu_usage_system": "0.119284",                +|        |
                                  |     "memory_available": "0",                       +|        |
                                  |     "tserver_root_memory_limit": "11166914969",    +|        |
                                  |     "tserver_root_memory_soft_limit": "9491877723",+|        |
                                  |     "tserver_root_memory_consumption": "55074816"  +|        |
                                  | }                                                   |        |
 a321e13e5bf24060a764b35894cd4070 | {                                                  +| OK     |
                                  |     "memory_free": "0",                            +|        |
                                  |     "memory_total": "17179869184",                 +|        |
                                  |     "cpu_usage_user": "0.135827",                  +|        |
                                  |     "cpu_usage_system": "0.118110",                +|        |
                                  |     "memory_available": "0",                       +|        |
                                  |     "tserver_root_memory_limit": "11166914969",    +|        |
                                  |     "tserver_root_memory_soft_limit": "9491877723",+|        |
                                  |     "tserver_root_memory_consumption": "62062592"  +|        |
                                  | }                                                   |        |
```
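
Note that every value in the `metrics` JSON above is a string, so a client has to convert before doing arithmetic. A minimal Python sketch of that step, using one row of the sample output as input (the `summarize` helper and the derived field names are hypothetical):

```python
import json

# One metrics value copied from the sample output above.
SAMPLE_METRICS = """{
    "memory_free": "0",
    "memory_total": "17179869184",
    "cpu_usage_user": "0.135827",
    "cpu_usage_system": "0.118110",
    "memory_available": "0",
    "tserver_root_memory_limit": "11166914969",
    "tserver_root_memory_soft_limit": "9491877723",
    "tserver_root_memory_consumption": "52346880"
}"""


def summarize(metrics_json: str) -> dict:
    """Convert the string-valued metrics into two derived numbers."""
    metrics = json.loads(metrics_json)
    cpu_total = float(metrics["cpu_usage_user"]) + float(metrics["cpu_usage_system"])
    limit = int(metrics["tserver_root_memory_limit"])
    used = int(metrics["tserver_root_memory_consumption"])
    return {
        "cpu_total": cpu_total,
        # Share of the tserver root memory limit currently consumed.
        "tserver_memory_fraction": used / limit if limit else 0.0,
    }


summary = summarize(SAMPLE_METRICS)
```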

**Upgrade/Rollback safety:**
This is a new YSQL function, so it has no prior users. During an upgrade or rollback, the SQL migration (which adds the function to pg_proc) runs only when the upgrade is being finalized (i.e. after all tservers are updated). Hence, errors caused by only a subset of tservers being upgraded are not possible, because the function itself is not available to call until then.
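
Because the function exists only once the migration has added it to pg_proc, a client can check for it before relying on it. A minimal Python sketch of such a check, assuming a DB-API cursor that uses the `%s` parameter style (as psycopg2 does); the helper name is hypothetical and nothing here is taken from Voyager:

```python
# Looks the function up by name in the pg_proc system catalog.
FUNCTION_EXISTS_SQL = "SELECT 1 FROM pg_proc WHERE proname = %s LIMIT 1"


def metrics_function_available(cursor, name: str = "yb_servers_metrics") -> bool:
    """Return True if a function with this name is present in pg_proc."""
    cursor.execute(FUNCTION_EXISTS_SQL, (name,))
    return cursor.fetchone() is not None
```

A caller could fall back to a fixed parallelism when this returns False, for example against a cluster whose upgrade has not been finalized yet.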

Test Plan: ./yb_build.sh --java-test 'org.yb.pgsql.TestYbServersMetrics#testYBServersMetricsFunction'

Reviewers: asaha, djiang, telgersma

Reviewed By: djiang, telgersma

Subscribers: hbhanawat, yql, ybase, amakala

Differential Revision: https://phorge.dev.yugabyte.com/D37267
foucher pushed a commit that referenced this issue Sep 23, 2024
Summary:
 ead90cc [#23645] docdb: Fix tests timing out on TSAN after 15786f3
 0a6a31e [doc] Fix BNL flag defaults (#23945)
 54793c8 [#22925] docdb: Persist tserver registry entries to sys catalog
 fbef568 [PLAT-15378][localProvider][dr] Deflake testDrConfigSetup local provider test
 64ac031 [#23978] xCluster: set up sequences_data stream(s) on target universe
 8d228a8 [#23923] YSQL: Fix DDL atomicity check failure
 903d793 [PLAT-15328] Configure cgroup for non rhel9 machines as part of provision
 Excluded: 5dc71ea [#23882] YSQL: Improve cache re-invalidation for alter table commands
 1e70024 [DOC-480] CDC metric description and voyager minor fixes (#24028)
 7a4b409 [#23700] CDCSDK: Use leader epoch instead of leader term in table removal bg task
 4d922ca [#23922] docdb: Handle colocated tablets correctly in tablet limit checks.
 487bc77 [PLAT-15158] Update replication frequency tooltip
 2059eee [#24001] docdb: Replace tablet in tablegroup manager on repartition of colocated table
 90d4e93 [#24020] DocDB: Vector LSM
 294b7bb [PLAT-14435]Fix args parsing in failure detection py script
 Excluded: 872b59e [#23542] YSQL: Add new YSQL function yb_servers_metrics() to fetch metrics such as cpu/memory usage from all nodes in cluster
 252717b [PLAT-12263] G-Flag upgrade fails for tmp_dir if Rolling restart used

Test Plan: Jenkins: rebase: pg15-cherrypicks

Reviewers: jason, jenkins-bot

Differential Revision: https://phorge.dev.yugabyte.com/D38266
makalaaneesh pushed a commit that referenced this issue Sep 25, 2024
…rvers_metrics() to fetch metrics such as cpu/memory usage from all nodes in cluster

Summary:
- catalog.h
 - YB_LAST_USED_OID
  - YB master 872b59e changes `YB_LAST_USED_OID` to 8072
  - YB PG15 7989b01610e9d7ca5dbbcb4da1ebb25c7864f1c changed `YB_LAST_USED_OID` to 8071
  - Kept 8072

- pg_proc.dat
 - At the end of the file
  - YB master 872b59e adds proc entry for oid 8072
  - YB PG15 7989b01610e9d7ca5dbbcb4da1ebb25c7864f1c adds proc entry for oid 8071
  - Added the 8072 entry

- pg_yb_migration.dat
 - # here: (line 15)
  - YB master 872b59e changes major's value to 58 and adds `V58__23542__yb_servers_metrics.sql` comment.
  - YB PG15 7989b01610e9d7ca5dbbcb4da1ebb25c7864f1c changes major's value to 57 and adds `V57__23312__binary_upgrade_set_next_tablegroup_default` comment.
  - Kept the YB master commit's changes.

- yb_system_views.sql
 - CREATE VIEW yb_servers_metrics AS
  - YB master 872b59e added a new view yb_servers_metrics
  - YB PG 55782d5 removed a lot of views and functions just next to this line
  - Kept yb_servers_metrics definition and removed the other views and functions

Original summary:
```
To enable adaptive parallelism in Voyager (https://docs.google.com/document/d/1beD7zNtpmfYflXV1hVJ9mq_uqyCTJ9Es4titPEksSNE/edit#heading=h.3c3bf00hwf), a YSQL function yb_servers_metrics() is added
which fetches certain metrics for all nodes in the cluster. This allows Voyager to monitor the state of the cluster and adapt the parallelism while importing data into the target YB cluster. A YSQL API is needed
to provide a deployment-agnostic API (without having to fetch metrics for YBA/YBM/on-prem using different mechanisms).

Additionally, made a few changes to MetricsSnapshotter:
- Introduced a function GetCpuUsageInInterval(int ms).
- Made the GetCpuUsage function static.
- Introduced a GetMemoryUsage function to get memory usage (from /proc/meminfo on Linux and sysctl on macOS).

Sample output:

yugabyte=# select uuid, jsonb_pretty(metrics), status, error from yb_servers_metrics();
               uuid               |                    jsonb_pretty                     | status | error
----------------------------------+-----------------------------------------------------+--------+-------
 bf98c74dd7044b34943c5bff7bd3d0d1 | {                                                  +| OK     |
                                  |     "memory_free": "0",                            +|        |
                                  |     "memory_total": "17179869184",                 +|        |
                                  |     "cpu_usage_user": "0.135827",                  +|        |
                                  |     "cpu_usage_system": "0.118110",                +|        |
                                  |     "memory_available": "0",                       +|        |
                                  |     "tserver_root_memory_limit": "11166914969",    +|        |
                                  |     "tserver_root_memory_soft_limit": "9491877723",+|        |
                                  |     "tserver_root_memory_consumption": "52346880"  +|        |
                                  | }                                                   |        |
 d105c3a6128640f5a25cc74435e48ae3 | {                                                  +| OK     |
                                  |     "memory_free": "0",                            +|        |
                                  |     "memory_total": "17179869184",                 +|        |
                                  |     "cpu_usage_user": "0.135189",                  +|        |
                                  |     "cpu_usage_system": "0.119284",                +|        |
                                  |     "memory_available": "0",                       +|        |
                                  |     "tserver_root_memory_limit": "11166914969",    +|        |
                                  |     "tserver_root_memory_soft_limit": "9491877723",+|        |
                                  |     "tserver_root_memory_consumption": "55074816"  +|        |
                                  | }                                                   |        |
 a321e13e5bf24060a764b35894cd4070 | {                                                  +| OK     |
                                  |     "memory_free": "0",                            +|        |
                                  |     "memory_total": "17179869184",                 +|        |
                                  |     "cpu_usage_user": "0.135827",                  +|        |
                                  |     "cpu_usage_system": "0.118110",                +|        |
                                  |     "memory_available": "0",                       +|        |
                                  |     "tserver_root_memory_limit": "11166914969",    +|        |
                                  |     "tserver_root_memory_soft_limit": "9491877723",+|        |
                                  |     "tserver_root_memory_consumption": "62062592"  +|        |
                                  | }                                                   |        |
Upgrade/Rollback safety:
This is a new YSQL function, so it has no prior users. During an upgrade or rollback, the SQL migration (which adds the function to pg_proc) runs only when the upgrade is being finalized (i.e. after all tservers are updated). Hence, errors caused by only a subset of tservers being upgraded are not possible, because the function itself is not available to call until then.
```

Test Plan: ./yb_build.sh --java-test 'org.yb.pgsql.TestYbServersMetrics#testYBServersMetricsFunction'

Reviewers: jason, tfoucher, fizaa, ishan.chhangani

Reviewed By: fizaa

Subscribers: fizaa

Differential Revision: https://phorge.dev.yugabyte.com/D38307
makalaaneesh added a commit that referenced this issue Oct 15, 2024
…ics() to fetch metrics such as cpu/memory usage from all nodes in cluster

Summary:
To enable adaptive parallelism in Voyager (https://docs.google.com/document/d/1beD7zNtpmfYflXV1hVJ9mq_uqyCTJ9Es4titPEksSNE/edit#heading=h.3c3bf00hwf), a YSQL function yb_servers_metrics() is added
which fetches certain metrics for all nodes in the cluster. This allows Voyager to monitor the state of the cluster and adapt the parallelism while importing data into the target YB cluster. A YSQL API is needed
to provide a deployment-agnostic API (without having to fetch metrics for YBA/YBM/on-prem using different mechanisms).

Additionally, made a few changes to `MetricsSnapshotter`:
- Introduced a function `GetCpuUsageInInterval(int ms)`.
- Made the `GetCpuUsage` function static.
- Introduced a `GetMemoryUsage` function to get memory usage (from /proc/meminfo on Linux and sysctl on macOS).
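
For the Linux side of memory usage, the following Python sketch shows how `memory_total`, `memory_free`, and `memory_available` could be derived from /proc/meminfo text. It is only an illustration of the file format, not the C++ `GetMemoryUsage` code; the `parse_meminfo` name and the mapping of fields to metric names are assumptions.

```python
from typing import Dict

# Assumed mapping from /proc/meminfo keys to the metric names in the output.
_WANTED = {
    "MemTotal": "memory_total",
    "MemFree": "memory_free",
    "MemAvailable": "memory_available",
}


def parse_meminfo(text: str) -> Dict[str, int]:
    """Extract selected /proc/meminfo fields, converted to bytes."""
    result: Dict[str, int] = {}
    for line in text.splitlines():
        key, separator, rest = line.partition(":")
        if not separator or key not in _WANTED:
            continue
        parts = rest.split()
        if not parts:
            continue
        value = int(parts[0])
        # Values in /proc/meminfo carry a "kB" unit, meaning 1024 bytes.
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        result[_WANTED[key]] = value
    return result
```

On a Linux host this could be called as `parse_meminfo(open("/proc/meminfo").read())`; the macOS path via sysctl is not covered by this sketch.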

Sample output:

```
yugabyte=# select uuid, jsonb_pretty(metrics), status, error from yb_servers_metrics();
               uuid               |                    jsonb_pretty                     | status | error
----------------------------------+-----------------------------------------------------+--------+-------
 bf98c74dd7044b34943c5bff7bd3d0d1 | {                                                  +| OK     |
                                  |     "memory_free": "0",                            +|        |
                                  |     "memory_total": "17179869184",                 +|        |
                                  |     "cpu_usage_user": "0.135827",                  +|        |
                                  |     "cpu_usage_system": "0.118110",                +|        |
                                  |     "memory_available": "0",                       +|        |
                                  |     "tserver_root_memory_limit": "11166914969",    +|        |
                                  |     "tserver_root_memory_soft_limit": "9491877723",+|        |
                                  |     "tserver_root_memory_consumption": "52346880"  +|        |
                                  | }                                                   |        |
 d105c3a6128640f5a25cc74435e48ae3 | {                                                  +| OK     |
                                  |     "memory_free": "0",                            +|        |
                                  |     "memory_total": "17179869184",                 +|        |
                                  |     "cpu_usage_user": "0.135189",                  +|        |
                                  |     "cpu_usage_system": "0.119284",                +|        |
                                  |     "memory_available": "0",                       +|        |
                                  |     "tserver_root_memory_limit": "11166914969",    +|        |
                                  |     "tserver_root_memory_soft_limit": "9491877723",+|        |
                                  |     "tserver_root_memory_consumption": "55074816"  +|        |
                                  | }                                                   |        |
 a321e13e5bf24060a764b35894cd4070 | {                                                  +| OK     |
                                  |     "memory_free": "0",                            +|        |
                                  |     "memory_total": "17179869184",                 +|        |
                                  |     "cpu_usage_user": "0.135827",                  +|        |
                                  |     "cpu_usage_system": "0.118110",                +|        |
                                  |     "memory_available": "0",                       +|        |
                                  |     "tserver_root_memory_limit": "11166914969",    +|        |
                                  |     "tserver_root_memory_soft_limit": "9491877723",+|        |
                                  |     "tserver_root_memory_consumption": "62062592"  +|        |
                                  | }                                                   |        |
```

**Upgrade/Rollback safety:**
This is a new YSQL function, so it has no prior users. During an upgrade or rollback, the SQL migration (which adds the function to pg_proc) runs only when the upgrade is being finalized (i.e. after all tservers are updated). Hence, errors caused by only a subset of tservers being upgraded are not possible, because the function itself is not available to call until then.

Original commit: 872b59e / D37267

Test Plan: ./yb_build.sh --java-test 'org.yb.pgsql.TestYbServersMetrics#testYBServersMetricsFunction'

Reviewers: asaha, djiang, telgersma

Reviewed By: asaha

Subscribers: amakala, ybase, yql, hbhanawat

Differential Revision: https://phorge.dev.yugabyte.com/D39000
makalaaneesh added a commit that referenced this issue Oct 15, 2024
…ics() to fetch metrics such as cpu/memory usage from all nodes in cluster

Summary:
To enable adaptive parallelism in Voyager (https://docs.google.com/document/d/1beD7zNtpmfYflXV1hVJ9mq_uqyCTJ9Es4titPEksSNE/edit#heading=h.3c3bf00hwf), a YSQL function yb_servers_metrics() is added
which fetches certain metrics for all nodes in the cluster. This allows Voyager to monitor the state of the cluster and adapt the parallelism while importing data into the target YB cluster. A YSQL API is needed
to provide a deployment-agnostic API (without having to fetch metrics for YBA/YBM/on-prem using different mechanisms).

Additionally, made a few changes to `MetricsSnapshotter`:
- Introduced a function `GetCpuUsageInInterval(int ms)`.
- Made the `GetCpuUsage` function static.
- Introduced a `GetMemoryUsage` function to get memory usage (from /proc/meminfo on Linux and sysctl on macOS).

Sample output:

```
yugabyte=# select uuid, jsonb_pretty(metrics), status, error from yb_servers_metrics();
               uuid               |                    jsonb_pretty                     | status | error
----------------------------------+-----------------------------------------------------+--------+-------
 bf98c74dd7044b34943c5bff7bd3d0d1 | {                                                  +| OK     |
                                  |     "memory_free": "0",                            +|        |
                                  |     "memory_total": "17179869184",                 +|        |
                                  |     "cpu_usage_user": "0.135827",                  +|        |
                                  |     "cpu_usage_system": "0.118110",                +|        |
                                  |     "memory_available": "0",                       +|        |
                                  |     "tserver_root_memory_limit": "11166914969",    +|        |
                                  |     "tserver_root_memory_soft_limit": "9491877723",+|        |
                                  |     "tserver_root_memory_consumption": "52346880"  +|        |
                                  | }                                                   |        |
 d105c3a6128640f5a25cc74435e48ae3 | {                                                  +| OK     |
                                  |     "memory_free": "0",                            +|        |
                                  |     "memory_total": "17179869184",                 +|        |
                                  |     "cpu_usage_user": "0.135189",                  +|        |
                                  |     "cpu_usage_system": "0.119284",                +|        |
                                  |     "memory_available": "0",                       +|        |
                                  |     "tserver_root_memory_limit": "11166914969",    +|        |
                                  |     "tserver_root_memory_soft_limit": "9491877723",+|        |
                                  |     "tserver_root_memory_consumption": "55074816"  +|        |
                                  | }                                                   |        |
 a321e13e5bf24060a764b35894cd4070 | {                                                  +| OK     |
                                  |     "memory_free": "0",                            +|        |
                                  |     "memory_total": "17179869184",                 +|        |
                                  |     "cpu_usage_user": "0.135827",                  +|        |
                                  |     "cpu_usage_system": "0.118110",                +|        |
                                  |     "memory_available": "0",                       +|        |
                                  |     "tserver_root_memory_limit": "11166914969",    +|        |
                                  |     "tserver_root_memory_soft_limit": "9491877723",+|        |
                                  |     "tserver_root_memory_consumption": "62062592"  +|        |
                                  | }                                                   |        |
```

**Upgrade/Rollback safety:**
This is a new YSQL function, so it has no prior users. During an upgrade or rollback, the SQL migration (which adds the function to pg_proc) runs only when the upgrade is being finalized (i.e. after all tservers are updated). Hence, errors caused by only a subset of tservers being upgraded are not possible, because the function itself is not available to call until then.

Original commit: 872b59e / D37267

While resolving merge conflicts:
- Removed any RPCs, methods, or entries in yb_system_views.sql, pg_proc.dat, and yb_pg_rules.out that were not part of the original diff.
- The last used OID on 2024.1 was 8067 and the one I added is 8072, so I set the last used OID to 8072.
- The last migration script on 2024.1 was V51 and mine is V58, so I changed my migration to V51.1. (Did not change the OID of the entry in pg_proc.)
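
The renumbering relies on a minor version such as V51.1 sorting after V51 and before the next major version. A small Python sketch of that ordering, assuming migration file names follow the `V<major>[.<minor>]__...` pattern seen in `V58__23542__yb_servers_metrics.sql`; the backported file name used below is hypothetical, and the actual comparison logic in the migration framework is not shown in this thread.

```python
import re
from typing import Tuple

# Matches a leading "V<major>" with an optional ".<minor>", followed by "__".
_VERSION_RE = re.compile(r"^V(\d+)(?:\.(\d+))?__")


def migration_version(filename: str) -> Tuple[int, int]:
    """Return (major, minor) parsed from a migration file name."""
    match = _VERSION_RE.match(filename)
    if match is None:
        raise ValueError(f"not a migration file name: {filename!r}")
    major = int(match.group(1))
    minor = int(match.group(2) or 0)  # a missing minor part counts as 0
    return major, minor


names = [
    "V58__23542__yb_servers_metrics.sql",    # name from the master commit
    "V51.1__23542__yb_servers_metrics.sql",  # hypothetical backport name
]
ordered = sorted(names, key=migration_version)
```

Comparing `(major, minor)` tuples rather than raw strings avoids lexical surprises, for example "V9" sorting after "V58" as text.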

Test Plan: ./yb_build.sh --java-test 'org.yb.pgsql.TestYbServersMetrics#testYBServersMetricsFunction'

Reviewers: asaha, djiang, telgersma

Reviewed By: asaha

Subscribers: amakala, ybase, yql, hbhanawat

Differential Revision: https://phorge.dev.yugabyte.com/D39052
makalaaneesh added a commit that referenced this issue Oct 18, 2024
…s() to fetch metrics such as cpu/memory usage from all nodes in cluster

Summary:
To enable adaptive parallelism in Voyager (https://docs.google.com/document/d/1beD7zNtpmfYflXV1hVJ9mq_uqyCTJ9Es4titPEksSNE/edit#heading=h.3c3bf00hwf), a YSQL function yb_servers_metrics() is added
which fetches certain metrics for all nodes in the cluster. This allows Voyager to monitor the state of the cluster and adapt the parallelism while importing data into the target YB cluster. A YSQL API is needed
to provide a deployment-agnostic API (without having to fetch metrics for YBA/YBM/on-prem using different mechanisms).

Additionally, made a few changes to `MetricsSnapshotter`:
- Introduced a function `GetCpuUsageInInterval(int ms)`.
- Made the `GetCpuUsage` function static.
- Introduced a `GetMemoryUsage` function to get memory usage (from /proc/meminfo on Linux and sysctl on macOS).

Sample output:

```
yugabyte=# select uuid, jsonb_pretty(metrics), status, error from yb_servers_metrics();
               uuid               |                    jsonb_pretty                     | status | error
----------------------------------+-----------------------------------------------------+--------+-------
 bf98c74dd7044b34943c5bff7bd3d0d1 | {                                                  +| OK     |
                                  |     "memory_free": "0",                            +|        |
                                  |     "memory_total": "17179869184",                 +|        |
                                  |     "cpu_usage_user": "0.135827",                  +|        |
                                  |     "cpu_usage_system": "0.118110",                +|        |
                                  |     "memory_available": "0",                       +|        |
                                  |     "tserver_root_memory_limit": "11166914969",    +|        |
                                  |     "tserver_root_memory_soft_limit": "9491877723",+|        |
                                  |     "tserver_root_memory_consumption": "52346880"  +|        |
                                  | }                                                   |        |
 d105c3a6128640f5a25cc74435e48ae3 | {                                                  +| OK     |
                                  |     "memory_free": "0",                            +|        |
                                  |     "memory_total": "17179869184",                 +|        |
                                  |     "cpu_usage_user": "0.135189",                  +|        |
                                  |     "cpu_usage_system": "0.119284",                +|        |
                                  |     "memory_available": "0",                       +|        |
                                  |     "tserver_root_memory_limit": "11166914969",    +|        |
                                  |     "tserver_root_memory_soft_limit": "9491877723",+|        |
                                  |     "tserver_root_memory_consumption": "55074816"  +|        |
                                  | }                                                   |        |
 a321e13e5bf24060a764b35894cd4070 | {                                                  +| OK     |
                                  |     "memory_free": "0",                            +|        |
                                  |     "memory_total": "17179869184",                 +|        |
                                  |     "cpu_usage_user": "0.135827",                  +|        |
                                  |     "cpu_usage_system": "0.118110",                +|        |
                                  |     "memory_available": "0",                       +|        |
                                  |     "tserver_root_memory_limit": "11166914969",    +|        |
                                  |     "tserver_root_memory_soft_limit": "9491877723",+|        |
                                  |     "tserver_root_memory_consumption": "62062592"  +|        |
                                  | }                                                   |        |
```

**Upgrade/Rollback safety:**
This is a new YSQL function, so it has no prior users. During an upgrade or rollback, the SQL migration (which adds the function to pg_proc) runs only when the upgrade is being finalized (i.e. after all tservers are updated). Hence, errors caused by only a subset of tservers being upgraded are not possible, because the function itself is not available to call until then.

While resolving backport merge conflicts:
- Removed any RPCs, methods, or entries in yb_system_views.sql, pg_proc.dat, and yb_pg_rules.out that were not part of the original diff.
- The last used OID on 2.20 was 8064 and the one I added is 8072, so I set the last used OID to 8072.
- The last migration script on 2.20 was V43.1 and the original migration of my diff was V58, so I changed the migration to V43.2. (Did not change the OID of the entry in pg_proc.)

Original commit: 872b59e / D37267

Test Plan: ./yb_build.sh --java-test 'org.yb.pgsql.TestYbServersMetrics#testYBServersMetricsFunction'

Reviewers: asaha, djiang, telgersma

Reviewed By: telgersma

Subscribers: amakala, ybase, yql, hbhanawat

Differential Revision: https://phorge.dev.yugabyte.com/D39125