Update 1.query-performance-metrics.md #1140

Merged
merged 4 commits into from
Mar 16, 2022
124 changes: 119 additions & 5 deletions docs-2.0/6.monitor-and-metrics/1.query-performance-metrics.md
@@ -2,24 +2,26 @@

Nebula Graph supports querying the monitoring metrics through HTTP ports.

## Metrics
## Metrics structure

Each metric of Nebula Graph consists of three fields: name, type, and time range. The fields are separated by periods, for example, `num_queries.sum.600`. Different Nebula Graph services (Graph, Storage, or Meta) support different metrics. The detailed description is as follows.

|Field|Example|Description|
|-|-|-|
|Metric name|`num_queries`|Indicates the function of the metric. For the detailed description of metrics, see [Metrics](../nebula-dashboard/6.monitor-parameter.md).|
|Metric type|`sum`|Indicates how the metrics are collected. Supported types are SUM, COUNT, AVG, RATE, and the P-th sample quantiles such as P75, P95, P99, and P99.9.|
|Metric name|`num_queries`|Indicates the function of the metric.|
|Metric type|`sum`|Indicates how the metrics are collected. Supported types are SUM, AVG, RATE, and the P-th sample quantiles such as P75, P95, P99, and P99.9.|
|Time range|`600`|The time range in seconds for the metric collection. Supported values are 5, 60, 600, and 3600, representing the last 5 seconds, 1 minute, 10 minutes, and 1 hour.|
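
The three-field structure can be illustrated with a small parser. This is a minimal sketch, not part of Nebula Graph: `MetricKey` and `parse_metric` are hypothetical names, and the splitting rule (name before the first period, time range after the last one) is an assumption chosen so that a metric type containing a period would stay intact.

```python
from typing import NamedTuple


class MetricKey(NamedTuple):
    name: str         # e.g. "num_queries"
    metric_type: str  # e.g. "sum", "avg", "rate"
    time_range: int   # seconds: 5, 60, 600, or 3600


# Time ranges listed in the table above.
SUPPORTED_TIME_RANGES = {5, 60, 600, 3600}


def parse_metric(metric: str) -> MetricKey:
    """Split a full metric string into name, type, and time range."""
    # Assumption: the name ends at the first period and the time range
    # starts after the last one; whatever lies between is the type.
    name, _, rest = metric.partition(".")
    metric_type, _, time_range = rest.rpartition(".")
    if not name or not metric_type or not time_range.isdigit():
        raise ValueError(f"expected <name>.<type>.<time_range>, got {metric!r}")
    seconds = int(time_range)
    if seconds not in SUPPORTED_TIME_RANGES:
        raise ValueError(f"unsupported time range: {seconds}")
    return MetricKey(name, metric_type, seconds)


print(parse_metric("num_queries.sum.600"))
```

Rejecting unknown time ranges early makes typos such as `num_queries.sum.6000` fail loudly instead of producing an empty query result.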

### Space-level metrics

The Graph service supports a set of space-level metrics that record the information of different graph spaces separately.

The name of a space-level metric contains the corresponding graph space name in the form of `{space=space_name}`, for example, `query_latency_us{space=basketballplayer}.avg.3600`.

To enable space-level metrics, set the value of `enable_space_level_metrics` to `true` in the Graph service configuration file before starting Nebula Graph. For details about how to modify the configuration, see [Configuration Management](../5.configurations-and-logs/1.configurations/1.configurations.md).

!!! note

    Space-level metrics can be queried only by querying all metrics. For example, run `curl -G "http://192.168.8.40:19559/stats"` to show all metrics. The returned result contains the graph space name in the form of `{space=space_name}`, such as `num_active_queries{space=basketballplayer}.sum.5=0`.
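
Because space-level metrics only appear in the full `/stats` output, a client has to pick them out itself. The following is a rough sketch of one way to do that: `group_by_space` and `SPACE_METRIC` are hypothetical names, the sample text is illustrative rather than real server output, and the code assumes every metric value is numeric.

```python
import re
from collections import defaultdict

# Matches lines such as: num_active_queries{space=basketballplayer}.sum.5=0
SPACE_METRIC = re.compile(
    r"^(?P<name>\w+)\{space=(?P<space>[^}]+)\}\.(?P<rest>[^=]+)=(?P<value>.+)$"
)


def group_by_space(stats_text: str) -> dict:
    """Collect space-level metrics from /stats text, keyed by graph space."""
    grouped = defaultdict(dict)
    for line in stats_text.splitlines():
        match = SPACE_METRIC.match(line.strip())
        if match is None:
            continue  # not a space-level metric line
        # Rebuild the metric key without the {space=...} label.
        key = f"{match['name']}.{match['rest']}"
        # Assumption: values are numeric, so float() is safe here.
        grouped[match["space"]][key] = float(match["value"])
    return dict(grouped)


# Illustrative input only; the first line is a service-wide metric.
sample = """num_queries.sum.5=3
num_active_queries{space=basketballplayer}.sum.5=0
"""
print(group_by_space(sample))
```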

## Query metrics over HTTP

### Syntax
@@ -103,4 +105,116 @@ curl -G "http://<ip>:<port>/stats?stats=<metric_name_list> [&format=json]"
num_heartbeats.sum.60=40
num_heartbeats.sum.600=394
num_heartbeats.sum.3600=2364
...
```
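
The same query can be issued from a script instead of `curl`. The sketch below uses only the Python standard library; `build_stats_url`, `parse_stats_text`, and `fetch_stats` are hypothetical helper names, and joining multiple metric names with commas is an assumption about the `<metric_name_list>` format.

```python
from urllib.request import urlopen


def build_stats_url(host: str, port: int, metrics=(), as_json: bool = False) -> str:
    """Build a /stats URL following the curl syntax shown above."""
    params = []
    if metrics:
        # Assumption: multiple metric names are separated by commas.
        params.append("stats=" + ",".join(metrics))
    if as_json:
        params.append("format=json")
    url = f"http://{host}:{port}/stats"
    return url + ("?" + "&".join(params) if params else "")


def parse_stats_text(text: str) -> dict:
    """Turn plain-text `name=value` lines into a dict of floats."""
    result = {}
    for line in text.splitlines():
        # Split on the last "=" so a {space=...} label stays in the name.
        name, sep, value = line.strip().rpartition("=")
        if not sep:
            continue  # no "=" on this line, e.g. a "..." placeholder
        try:
            result[name] = float(value)
        except ValueError:
            continue  # skip non-numeric values
    return result


def fetch_stats(host: str, port: int, metrics=()) -> dict:
    """Fetch metrics in the default plain-text format and parse them."""
    with urlopen(build_stats_url(host, port, metrics), timeout=5) as resp:
        return parse_stats_text(resp.read().decode("utf-8"))


print(build_stats_url("192.168.8.40", 19559, ["num_queries.sum.600"], as_json=True))
print(parse_stats_text("num_heartbeats.sum.60=40\nnum_heartbeats.sum.600=394\n..."))
```

`fetch_stats` is defined but not called here, since it needs a running service; the two `print` calls only exercise the URL builder and the parser.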

## Metric description

### Graph

| Parameter | Description |
| ---------------------------------------------- | ------------------------------------------------------------ |
| `num_active_queries` | The number of queries currently being executed. |
| `num_active_sessions` | The number of currently active sessions. |
| `num_aggregate_executors` | The number of executions for the Aggregation operator. |
| `num_auth_failed_sessions_bad_username_password` | The number of sessions where authentication failed due to an incorrect username or password. |
| `num_auth_failed_sessions_out_of_max_allowed` | The number of sessions that failed login authentication because the value of the parameter `FLAG_OUT_OF_MAX_ALLOWED_CONNECTIONS` was exceeded. |
| `num_auth_failed_sessions` | The number of sessions in which login authentication failed. |
| `num_indexscan_executors` | The number of executions for index scan operators. |
| `num_killed_queries` | The number of killed queries. |
| `num_opened_sessions` | The number of sessions connected to the server. |
| `num_queries` | The number of queries. |
| `num_query_errors_leader_changes`              | The number of Raft leader changes due to query errors.   |
| `num_query_errors` | The number of query errors. |
| `num_reclaimed_expired_sessions` | The number of expired sessions actively reclaimed by the server. |
| `num_rpc_sent_to_metad_failed`                 | The number of failed RPC requests that the Graphd service sent to the Metad service.                 |
| `num_rpc_sent_to_metad` | The number of RPC requests that the Graphd service sent to the Metad service. |
| `num_rpc_sent_to_storaged_failed` | The number of failed RPC requests that the Graphd service sent to the Storaged service. |
| `num_rpc_sent_to_storaged` | The number of RPC requests that the Graphd service sent to the Storaged service. |
| `num_sentences` | The number of statements received by the Graphd service. |
| `num_slow_queries` | The number of slow queries. |
| `num_sort_executors` | The number of executions for the Sort operator. |
| `optimizer_latency_us` | The latency of executing optimizer statements. |
| `query_latency_us` | The average latency of queries. |
| `slow_query_latency_us` | The average latency of slow queries. |
| `num_queries_hit_memory_watermark` | The number of queries that reached the memory watermark. |
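
As one illustration of combining the Graph metrics above, a query error ratio can be derived from `num_query_errors` and `num_queries` over the same time range. This is a hedged sketch: `query_error_ratio` is a hypothetical helper, and it assumes the metrics have already been loaded into a dict keyed by full metric string.

```python
def query_error_ratio(stats: dict, time_range: int = 60):
    """Return num_query_errors / num_queries for one time range, or None."""
    total = stats.get(f"num_queries.sum.{time_range}", 0)
    errors = stats.get(f"num_query_errors.sum.{time_range}", 0)
    if not total:
        return None  # no queries in the window, so the ratio is undefined
    return errors / total


# Illustrative values only, not real measurements.
print(query_error_ratio({"num_queries.sum.60": 200, "num_query_errors.sum.60": 5}))
```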

### Meta

| Parameter | Description |
| -------------------------- | ----------------------------------- |
| `commit_log_latency_us` | The latency of committing logs in Raft. |
| `commit_snapshot_latency_us` | The latency of committing snapshots in Raft. |
| `heartbeat_latency_us` | The latency of heartbeats. |
| `num_heartbeats` | The number of heartbeats. |
| `num_raft_votes` | The number of votes in Raft. |
| `transfer_leader_latency_us` | The latency of transferring the Raft leader.    |
| `num_agent_heartbeats` | The number of heartbeats for the AgentHBProcessor.|
| `agent_heartbeat_latency_us` | The average latency of the AgentHBProcessor.|

### Storage

| Parameter | Description |
| ---------------------------- | --------------------------------------------------- |
| `add_edges_atomic_latency_us`  | The average latency of adding a single edge.                  |
| `add_edges_latency_us` | The average latency of adding edges. |
| `add_vertices_latency_us` | The average latency of adding vertices. |
| `commit_log_latency_us` | The latency of committing logs in Raft. |
| `commit_snapshot_latency_us` | The latency of committing snapshots in Raft. |
| `delete_edges_latency_us` | The average latency of deleting edges. |
| `delete_vertices_latency_us` | The average latency of deleting vertices. |
| `get_neighbors_latency_us` | The average latency of querying neighbor vertices. |
| `num_get_prop` | The number of executions for the GetPropProcessor. |
| `num_get_neighbors_errors` | The number of execution errors for the GetNeighborsProcessor. |
| `get_prop_latency_us` | The average latency of executions for the GetPropProcessor.|
| `num_edges_deleted` | The number of deleted edges. |
| `num_edges_inserted` | The number of inserted edges. |
| `num_raft_votes` | The number of votes in Raft. |
| `num_rpc_sent_to_metad_failed` | The number of failed RPC requests that the Storaged service sent to the Metad service.  |
| `num_rpc_sent_to_metad` | The number of RPC requests that the Storaged service sent to the Metad service. |
| `num_tags_deleted` | The number of deleted tags. |
| `num_vertices_deleted` | The number of deleted vertices. |
| `num_vertices_inserted` | The number of inserted vertices. |
| `transfer_leader_latency_us`   | The latency of transferring the Raft leader.              |
| `lookup_latency_us` | The average latency of executions for the LookupProcessor. |
| `num_lookup_errors` | The number of execution errors for the LookupProcessor.|
| `num_scan_vertex` | The number of executions for the ScanVertexProcessor.|
| `num_scan_vertex_errors` | The number of execution errors for the ScanVertexProcessor.|
| `update_edge_latency_us` | The average latency of executions for the UpdateEdgeProcessor.|
| `num_update_vertex` | The number of executions for the UpdateVertexProcessor.|
| `num_update_vertex_errors` | The number of execution errors for the UpdateVertexProcessor.|
| `kv_get_latency_us` | The average latency of executions for the GetProcessor.|
| `kv_put_latency_us` | The average latency of executions for the PutProcessor.|
| `kv_remove_latency_us` | The average latency of executions for the RemoveProcessor.|
| `num_kv_get_errors` | The number of execution errors for the GetProcessor.|
| `num_kv_get` | The number of executions for the GetProcessor.|
| `num_kv_put_errors` | The number of execution errors for the PutProcessor.|
| `num_kv_put` | The number of executions for the PutProcessor.|
| `num_kv_remove_errors` | The number of execution errors for the RemoveProcessor.|
| `num_kv_remove` | The number of executions for the RemoveProcessor.|
| `forward_tranx_latency_us` | The average latency of transmission.|

### Space-level

| Parameter | Description |
| ---------------------------------------------- | ----------------------------------------- |
| `num_active_queries` | The number of queries currently being executed. |
| `num_queries` | The number of queries. |
| `num_sentences` | The number of statements received by the Graphd service. |
| `optimizer_latency_us` | The latency of executing optimizer statements. |
| `query_latency_us` | The average latency of queries. |
| `num_slow_queries` | The number of slow queries. |
| `num_query_errors` | The number of query errors. |
| `num_query_errors_leader_changes`              | The number of Raft leader changes due to query errors.   |
| `num_killed_queries` | The number of killed queries. |
| `num_aggregate_executors` | The number of executions for the Aggregation operator. |
| `num_sort_executors` | The number of executions for the Sort operator. |
| `num_indexscan_executors` | The number of executions for index scan operators. |
| `num_oom_queries` | The number of queries that caused memory to run out. |