Feature(new_metrics): migrate meta metrics #1331
Labels
type/enhancement
Indicates new feature requests
Comments
8 tasks
empiredan
changed the title
Feature(new_metrics): migrate meta metrics
Feature(new_metrics): add table-level metric entity migrate meta metrics
Apr 8, 2023
empiredan
changed the title
Feature(new_metrics): add table-level metric entity migrate meta metrics
Feature(new_metrics): add table-level metric entity and migrate meta metrics
Apr 10, 2023
acelyc111
pushed a commit
that referenced
this issue
Apr 10, 2023
…vel metrics for server_state of meta (#1431) #1331
In perf counters, all metrics of server_state are server-level; for example, the number of healthy partitions across all tables of a Pegasus cluster. Sometimes this is not enough. If the metric reports 4 unwritable partitions, those partitions might belong to different tables, or they might all belong to one table. These server-level metrics are therefore changed to table-level, which exposes the status of each table. Whenever a server-level metric is needed, it can be obtained by aggregating the table-level ones.
The server_state metrics that are migrated and changed to table-level include: the number of dead, unreadable, unwritable, writable-ill, and healthy partitions among all partitions of a table; the number of times the configuration has been changed; and the number of times the status of a partition has been changed to unwritable or writable for a table. To implement table-level metrics, a table-level metric entity is also added.
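The aggregation described in this commit message can be illustrated with a minimal Python sketch. It is not Pegasus code: the table names, metric keys, and the function `aggregate_to_server_level` are hypothetical and only show how per-table gauges might be summed into server-level totals.

```python
# Hypothetical per-table gauges; table names and values are made up for illustration.
table_metrics = {
    "orders": {"dead": 0, "unreadable": 0, "unwritable": 4, "writable_ill": 1, "healthy": 3},
    "users": {"dead": 0, "unreadable": 0, "unwritable": 0, "writable_ill": 0, "healthy": 8},
}


def aggregate_to_server_level(per_table):
    """Sum each table-level gauge into one server-level total per metric name."""
    totals = {}
    for gauges in per_table.values():
        for name, value in gauges.items():
            totals[name] = totals.get(name, 0) + value
    return totals


server_metrics = aggregate_to_server_level(table_metrics)
```

In this made-up data the server-level total of 4 unwritable partitions hides the fact that all of them belong to one table, which is the situation the table-level metrics are meant to reveal.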
empiredan
added a commit
that referenced
this issue
Apr 11, 2023
…vel metrics for server_state of meta (#1431) #1331
empiredan
added a commit
that referenced
this issue
Apr 11, 2023
…vel metrics for server_state of meta (#1431) #1331
acelyc111
pushed a commit
that referenced
this issue
Apr 14, 2023
…ition-level metrics for greedy_load_balancer of meta (#1435) #1331
In perf counters, all metrics of greedy_load_balancer are server-level; for example, the number of each kind of operation performed by the greedy balancer, including moving primaries, copying primaries, and copying secondaries. The new metrics should be more fine-grained, since sometimes we want to know which primaries are moved. It is also convenient to calculate table-level or server-level metrics by aggregating the partition-level ones.
The greedy_load_balancer metrics that are changed to partition-level and migrated to the new framework include: the number of balance operations by the greedy balancer that recently needed to be executed, and the numbers of move-primary, copy-primary, and copy-secondary operations.
In addition to the greedy_load_balancer metrics, some server_state metrics that were migrated to table-level in #1431 are changed again, to partition-level, because partition-level is considered more appropriate for them than table-level. These include the number of times the configuration has been changed and the number of times the status of a partition has been changed to unwritable or writable. To implement partition-level metrics, a partition-level metric entity is also added.
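As a rough illustration of rolling partition-level metrics up to table and server level, here is a small Python sketch. The `(table_id, partition_index)` key layout, the sample values, and both function names are assumptions made for the example and are not taken from the Pegasus code base.

```python
from collections import defaultdict

# Hypothetical partition-level counters keyed by (table_id, partition_index).
move_primary_ops = {
    (1, 0): 2,
    (1, 3): 1,
    (2, 0): 5,
}


def roll_up_by_table(per_partition):
    """Aggregate partition-level values into one value per table."""
    per_table = defaultdict(int)
    for (table_id, _partition_index), value in per_partition.items():
        per_table[table_id] += value
    return dict(per_table)


def roll_up_to_server(per_partition):
    """Aggregate partition-level values into a single server-level value."""
    return sum(per_partition.values())


table_totals = roll_up_by_table(move_primary_ops)
server_total = roll_up_to_server(move_primary_ops)
```

Keeping the finest granularity as the stored form means the coarser views can always be derived, while the reverse is not possible.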
empiredan
added a commit
that referenced
this issue
Apr 14, 2023
…vel metrics for server_state of meta (#1431) #1331
empiredan
added a commit
that referenced
this issue
Apr 14, 2023
…ition-level metrics for greedy_load_balancer of meta (#1435) #1331
empiredan
added a commit
that referenced
this issue
Apr 14, 2023
#1331 Migrate the meta_service metrics to the new framework, including the number of disconnections from replica servers and the numbers of unalive and alive replica servers. All of these metrics are server-level and maintained in the meta server. In perf counters, the number of disconnections is a volatile counter; it is changed to a non-volatile counter, while the other 2 metrics keep the gauge type.
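The difference between the metric types mentioned here can be sketched in Python. The sketch assumes that "volatile" means the value is reset to zero each time it is read, and that a gauge simply holds the last value set; the three classes are hypothetical and do not mirror the Pegasus metric API.

```python
class VolatileCounter:
    """Counter whose value is reset to zero every time it is read (assumed semantics)."""

    def __init__(self):
        self._value = 0

    def increment(self, by=1):
        self._value += by

    def read(self):
        value, self._value = self._value, 0
        return value


class MonotonicCounter:
    """Non-volatile counter: reading never resets the accumulated value."""

    def __init__(self):
        self._value = 0

    def increment(self, by=1):
        self._value += by

    def read(self):
        return self._value


class Gauge:
    """Holds the most recently set value, e.g. the current number of alive servers."""

    def __init__(self, value=0):
        self._value = value

    def set(self, value):
        self._value = value

    def read(self):
        return self._value


volatile, monotonic = VolatileCounter(), MonotonicCounter()
for _ in range(3):  # three disconnections observed
    volatile.increment()
    monotonic.increment()

first_reads = (volatile.read(), monotonic.read())
second_reads = (volatile.read(), monotonic.read())
```

Under these assumptions the second read of the volatile counter returns zero, while the non-volatile counter still reports the accumulated total, so a collector can compute rates from successive samples itself.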
empiredan
changed the title
Feature(new_metrics): add table-level metric entity and migrate meta metrics
Feature(new_metrics): migrate meta metrics
Apr 15, 2023
acelyc111
pushed a commit
that referenced
this issue
Apr 16, 2023
…dian (#1440) #1331
In perf counters, there is only one metric for partition_guardian: the number of operations that fail to choose the primary replica, which is server-level. In the new metrics it is changed to partition-level, since this shows which partitions fail to choose primaries and how frequently that happens. As before, table-level or server-level metrics can be computed by aggregating the partition-level ones.
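A minimal Python sketch of why a partition-level failure counter is more informative than a single server-level one. The partition identifiers, the sample events, and the function `record_choose_primary_failure` are hypothetical and only illustrate the idea.

```python
from collections import Counter

# Hypothetical failure counter keyed by (table_id, partition_index).
choose_primary_failures = Counter()


def record_choose_primary_failure(table_id, partition_index):
    """Count one failed attempt to choose a primary for the given partition."""
    choose_primary_failures[(table_id, partition_index)] += 1


# Made-up sequence of failures for illustration.
for gpid in [(1, 0), (1, 0), (2, 5), (1, 0)]:
    record_choose_primary_failure(*gpid)

# The partition with the most failures, and the server-level total.
worst_partition = choose_primary_failures.most_common(1)
server_level_total = sum(choose_primary_failures.values())
```

A server-level counter would only report 4 failures in total; the partition-level view additionally shows that three of them hit the same partition.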
empiredan
added a commit
that referenced
this issue
Apr 18, 2023
…vel metrics for server_state of meta (#1431) #1331
empiredan
added a commit
that referenced
this issue
Apr 18, 2023
…ition-level metrics for greedy_load_balancer of meta (#1435) #1331
empiredan
added a commit
that referenced
this issue
Apr 18, 2023
#1331 Migrate metrics to new framework for meta_service.
empiredan
added a commit
that referenced
this issue
Apr 18, 2023
…dian (#1440) #1331
empiredan
added a commit
that referenced
this issue
Apr 27, 2023
…vel metrics for server_state of meta (#1431) #1331
empiredan
added a commit
that referenced
this issue
Apr 27, 2023
…ition-level metrics for greedy_load_balancer of meta (#1435) #1331
empiredan
added a commit
that referenced
this issue
Apr 27, 2023
#1331 Migrate metrics to new framework for meta_service.
empiredan
added a commit
that referenced
this issue
Apr 27, 2023
…dian (#1440) #1331
empiredan
added a commit
that referenced
this issue
May 5, 2023
…vel metrics for server_state of meta (#1431) #1331
empiredan
added a commit
that referenced
this issue
Jun 5, 2023
…vel metrics for server_state of meta (#1431) #1331
empiredan
added a commit
that referenced
this issue
Jun 5, 2023
…ition-level metrics for greedy_load_balancer of meta (#1435) #1331
empiredan
added a commit
that referenced
this issue
Jun 5, 2023
#1331 Migrate metrics to new framework for meta_service.
empiredan
added a commit
that referenced
this issue
Jun 5, 2023
…dian (#1440) #1331
empiredan
added a commit
that referenced
this issue
Jun 21, 2023
…vel metrics for server_state of meta (#1431) #1331
empiredan
added a commit
that referenced
this issue
Jun 21, 2023
…ition-level metrics for greedy_load_balancer of meta (#1435) #1331
empiredan
added a commit
that referenced
this issue
Jun 21, 2023
#1331 Migrate metrics to new framework for meta_service.
empiredan
added a commit
that referenced
this issue
Jun 21, 2023
…dian (#1440) #1331
empiredan
added a commit
that referenced
this issue
Aug 9, 2023
…vel metrics for server_state of meta (#1431) #1331
empiredan
added a commit
that referenced
this issue
Aug 9, 2023
…ition-level metrics for greedy_load_balancer of meta (#1435) #1331
empiredan
added a commit
that referenced
this issue
Aug 9, 2023
#1331 Migrate metrics to new framework for meta_service.
empiredan
added a commit
that referenced
this issue
Aug 9, 2023
…dian (#1440) #1331
empiredan
added a commit
that referenced
this issue
Aug 11, 2023
…vel metrics for server_state of meta (#1431) #1331
empiredan
added a commit
that referenced
this issue
Aug 11, 2023
…ition-level metrics for greedy_load_balancer of meta (#1435) #1331
empiredan
added a commit
that referenced
this issue
Aug 11, 2023
#1331 Migrate metrics to new framework for meta_service.
empiredan
added a commit
that referenced
this issue
Aug 11, 2023
…dian (#1440) #1331
empiredan
added a commit
to empiredan/pegasus
that referenced
this issue
Dec 6, 2023
…vel metrics for server_state of meta (apache#1431) apache#1331
empiredan
added a commit
to empiredan/pegasus
that referenced
this issue
Dec 6, 2023
…ition-level metrics for greedy_load_balancer of meta (apache#1435) apache#1331 In perf counters, all metrics of greedy_load_balancer are server-level, for example, the number of each kind of operation performed by the greedy balancer, including moving primaries, copying primaries and copying secondaries. The new metrics are intended to be fine-grained, since sometimes we want to know which primaries are moved. It is also convenient to calculate table-level or server-level metrics by aggregating the partition-level ones. The metrics of greedy_load_balancer that are changed to partition-level and migrated to the new framework include the numbers of balance operations by the greedy balancer that recently needed to be executed, that move primaries, that copy primaries, and that copy secondaries. In addition to the metrics of greedy_load_balancer, some metrics of server_state that were migrated to table-level in apache#1431 are changed again, to partition-level, because partition-level is considered more appropriate for them than table-level. The metrics changed to partition-level include the number of times the configuration has been changed and the number of times the status of a partition has been changed to unwritable or writable. To implement partition-level metrics, a partition-level metric entity is also added.
empiredan
added a commit
to empiredan/pegasus
that referenced
this issue
Dec 6, 2023
…backup-policy-level metrics for meta_backup_service (apache#1438) apache#1331 In perf counters, there's only one metric for meta_backup_service, namely the recent backup duration for each policy, which means this metric is policy-level. Therefore, a policy-level entity is also implemented in new metrics.
The meta-related metrics migrated to the new framework will be attached to the server entity. All involved classes are listed below.

The following metrics are members of server_state (server_state.cpp), which is created at the construction of meta_service:

The following metrics are members of greedy_load_balancer (greedy_load_balancer.cpp), which is created at meta_service::start():

The following metrics are members of meta_service (meta_service.cpp), which is created at the construction of meta_service_app:

The following metrics are members of policy_context and are created at policy_context::start() (meta_service.cpp). policy_context is created at meta_service::start() once cold backup is enabled:

The following metrics are members of partition_guardian (meta_service.cpp), which is created at meta_service::start():