docs: update the Observability section with new scrape configuration #6011

Merged 1 commit on Oct 21, 2020
94 changes: 63 additions & 31 deletions docs/content/latest/explore/observability/docker.md
@@ -46,9 +46,9 @@ showAsideToc: true
-->
</ul>

You can monitor your local YugabyteDB cluster with a local instance of [Prometheus](https://prometheus.io/), a popular standard for time-series monitoring of cloud native infrastructure. YugabyteDB services and APIs expose metrics in the Prometheus format at the `/prometheus-metrics` endpoint.
You can monitor your local YugabyteDB cluster with a local instance of [Prometheus](https://prometheus.io/), a popular standard for time-series monitoring of cloud native infrastructure. YugabyteDB services and APIs expose metrics in the Prometheus format at the `/prometheus-metrics` endpoint. For details on the metrics targets for YugabyteDB, see [Prometheus monitoring](../../../reference/configuration/default-ports/#prometheus-monitoring).

For details on the metrics targets for YugabyteDB, see [Monitoring with Prometheus](../../../reference/configuration/default-ports/#monitoring-with-prometheus).
This tutorial uses the [yb-docker-ctl](../../../admin/yb-docker-ctl) local cluster management utility.

## Prerequisite

@@ -74,58 +74,85 @@ Start a new local universe with a replication factor of `3`.
$ ./yb-docker-ctl create --rf 3
```

## 2. Run sample key-value app
## 2. Run the YugabyteDB workload generator

Pull the [yb-sample-apps](https://github.com/yugabyte/yb-sample-apps) docker container. This container has built-in Java client programs for various workloads including SQL inserts and updates.
Pull the [yb-sample-apps](https://github.com/yugabyte/yb-sample-apps) Docker container image. This container image has built-in Java client programs for various workloads including SQL inserts and updates.

```sh
$ docker pull yugabytedb/yb-sample-apps
```

Run the simple `CassandraKeyValue` workload application in a separate shell.
Run the `CassandraKeyValue` workload application in a separate shell.

```sh
$ docker run --name yb-sample-apps --hostname yb-sample-apps --net yb-net yugabytedb/yb-sample-apps --workload CassandraKeyValue \
--nodes yb-tserver-n1:9042 \
--num_threads_write 1 \
--num_threads_read 4
$ docker run --name yb-sample-apps --hostname yb-sample-apps --net yb-net yugabytedb/yb-sample-apps \
--workload CassandraKeyValue \
--nodes yb-tserver-n1:9042 \
--num_threads_write 1 \
--num_threads_read 4
```

## 3. Prepare Prometheus config file
## 3. Prepare Prometheus configuration file

Copy the following into a file called `yugabytedb.yml`. Move this file to the `/tmp` directory so that you can bind the file to the Prometheus container later on.

```sh
```yaml
global:
scrape_interval: 5s # Set the scrape interval to every 5 seconds. Default is every 1 minute.
evaluation_interval: 5s # Evaluate rules every 5 seconds. The default is every 1 minute.
# scrape_timeout is set to the global default (10s).

# YugabyteDB configuration to scrape Prometheus time-series metrics
scrape_configs:
- job_name: 'yugabytedb'
- job_name: "yugabytedb"
metrics_path: /prometheus-metrics
relabel_configs:
- target_label: "node_prefix"
replacement: "cluster-1"
metric_relabel_configs:
# Save the original metric name so queries can group by it, since grouping by __name__ directly is not possible.
- source_labels: ["__name__"]
regex: "(.*)"
target_label: "saved_name"
replacement: "$1"
# The following rules rewrite the handler_latency_* metric names into label form.
- source_labels: ["__name__"]
regex: "handler_latency_(yb_[^_]*)_([^_]*)_([^_]*)(.*)"
target_label: "server_type"
replacement: "$1"
- source_labels: ["__name__"]
regex: "handler_latency_(yb_[^_]*)_([^_]*)_([^_]*)(.*)"
target_label: "service_type"
replacement: "$2"
- source_labels: ["__name__"]
regex: "handler_latency_(yb_[^_]*)_([^_]*)_([^_]*)(_sum|_count)?"
target_label: "service_method"
replacement: "$3"
- source_labels: ["__name__"]
regex: "handler_latency_(yb_[^_]*)_([^_]*)_([^_]*)(_sum|_count)?"
target_label: "__name__"
replacement: "rpc_latency$4"

static_configs:
- targets: ['yb-master-n1:7000', 'yb-master-n2:7000', 'yb-master-n3:7000']
- targets: ["yb-master-n1:7000", "yb-master-n2:7000", "yb-master-n3:7000"]
labels:
group: 'yb-master'
export_type: "master_export"

- targets: ['yb-tserver-n1:9000', 'yb-tserver-n2:9000', 'yb-tserver-n3:9000']
- targets: ["yb-tserver-n1:9000", "yb-tserver-n2:9000", "yb-tserver-n3:9000"]
labels:
group: 'yb-tserver'
export_type: "tserver_export"

- targets: ['yb-tserver-n1:11000', 'yb-tserver-n2:11000', 'yb-tserver-n3:11000']
- targets: ["yb-tserver-n1:12000", "yb-tserver-n2:12000", "yb-tserver-n3:12000"]
labels:
group: 'yedis'
export_type: "cql_export"

- targets: ['yb-tserver-n1:12000', 'yb-tserver-n2:12000', 'yb-tserver-n3:12000']
- targets: ["yb-tserver-n1:13000", "yb-tserver-n2:13000", "yb-tserver-n3:13000"]
labels:
group: 'ycql'
export_type: "ysql_export"

- targets: ['yb-tserver-n1:13000', 'yb-tserver-n2:13000', 'yb-tserver-n3:13000']
- targets: ["yb-tserver-n1:11000", "yb-tserver-n2:11000", "yb-tserver-n3:11000"]
labels:
group: 'ysql'
export_type: "redis_export"
```
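The `metric_relabel_configs` rules above can be hard to follow from the YAML alone. The sketch below applies the same regular expression with `sed` to one concrete metric name, showing which capture group feeds each label; the shell variable is illustrative only:

```sh
# One of the raw metric names exposed at /prometheus-metrics:
metric="handler_latency_yb_cqlserver_SQLProcessor_SelectStmt_count"

# The same ERE used by the relabel rules, applied once to show all four
# capture groups: \1 = server_type, \2 = service_type, \3 = service_method,
# \4 = the optional _sum/_count suffix appended to the new rpc_latency name.
echo "$metric" | sed -E \
  's/^handler_latency_(yb_[^_]*)_([^_]*)_([^_]*)(_sum|_count)?$/server_type="\1" service_type="\2" service_method="\3" __name__="rpc_latency\4"/'
```

This should print `server_type="yb_cqlserver" service_type="SQLProcessor" service_method="SelectStmt" __name__="rpc_latency_count"`, which matches the labels used by the queries in step 5.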

## 4. Start Prometheus server
@@ -134,9 +161,9 @@ Start the Prometheus server as follows. The `prom/prometheus` container image will

```sh
$ docker run \
-p 9090:9090 \
-v /tmp/yugabytedb.yml:/etc/prometheus/prometheus.yml \
--net yb-net \
-p 9090:9090 \
-v /tmp/yugabytedb.yml:/etc/prometheus/prometheus.yml \
--net yb-net \
prom/prometheus
```
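Before opening the UI, you can also confirm from a shell that Prometheus registered the scrape targets by querying its HTTP API at `/api/v1/targets`. The JSON below is a trimmed, hypothetical response used only to illustrate the shape of the output:

```sh
# In practice, query the running server:
#   curl -s http://localhost:9090/api/v1/targets
# A trimmed, hypothetical response for one healthy yb-master target:
targets_json='{"status":"success","data":{"activeTargets":[{"labels":{"export_type":"master_export","instance":"yb-master-n1:7000"},"health":"up"}]}}'

# Each scraped endpoint should report "health":"up":
echo "$targets_json" | grep -o '"health":"[a-z]*"'
```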

@@ -148,22 +175,22 @@ Open the Prometheus UI at http://localhost:9090 and then navigate to the Targets

On the Prometheus Graph UI, you can now plot the read/write throughput and latency for the `CassandraKeyValue` sample app. As you can see from the [source code](https://github.com/yugabyte/yugabyte-db/blob/master/java/yb-loadtester/src/main/java/com/yugabyte/sample/apps/CassandraKeyValue.java) of the app, it uses only SELECT statements for reads and INSERT statements for writes (aside from the initial CREATE TABLE). This means you can measure throughput and latency by simply using the metrics corresponding to the SELECT and INSERT statements.

Paste the following expressions into the Expression box and click Execute followed by Add Graph.
Paste the following expressions into the **Expression** box and click **Execute** followed by **Add Graph**.

### Throughput

> Read IOPS

```sh
sum(irate(handler_latency_yb_cqlserver_SQLProcessor_SelectStmt_count[1m]))
sum(irate(rpc_latency_count{server_type="yb_cqlserver", service_type="SQLProcessor", service_method="SelectStmt"}[1m]))
```

![Prometheus Read IOPS](/images/ce/prom-read-iops.png)

> Write IOPS

```sh
sum(irate(handler_latency_yb_cqlserver_SQLProcessor_InsertStmt_count[1m]))
sum(irate(rpc_latency_count{server_type="yb_cqlserver", service_type="SQLProcessor", service_method="InsertStmt"}[1m]))
```

![Prometheus Read IOPS](/images/ce/prom-write-iops.png)
@@ -173,15 +200,17 @@ sum(irate(handler_latency_yb_cqlserver_SQLProcessor_InsertStmt_count[1m]))
> Read Latency (in microseconds)

```sh
avg(irate(handler_latency_yb_cqlserver_SQLProcessor_SelectStmt_sum[1m])) / avg(irate(handler_latency_yb_cqlserver_SQLProcessor_SelectStmt_count[1m]))
avg(irate(rpc_latency_sum{server_type="yb_cqlserver", service_type="SQLProcessor", service_method="SelectStmt"}[1m])) /
avg(irate(rpc_latency_count{server_type="yb_cqlserver", service_type="SQLProcessor", service_method="SelectStmt"}[1m]))
```

![Prometheus Read IOPS](/images/ce/prom-read-latency.png)

> Write Latency (in microseconds)

```sh
avg(irate(handler_latency_yb_cqlserver_SQLProcessor_InsertStmt_sum[1m])) / avg(irate(handler_latency_yb_cqlserver_SQLProcessor_InsertStmt_count[1m]))
avg(irate(rpc_latency_sum{server_type="yb_cqlserver", service_type="SQLProcessor", service_method="InsertStmt"}[1m])) /
avg(irate(rpc_latency_count{server_type="yb_cqlserver", service_type="SQLProcessor", service_method="InsertStmt"}[1m]))
```

![Prometheus Read IOPS](/images/ce/prom-write-latency.png)
@@ -193,3 +222,6 @@ Optionally, you can shut down the local cluster created in Step 1.
```sh
$ ./yb-docker-ctl destroy
```

## What's next?

You can [set up Grafana](https://prometheus.io/docs/visualization/grafana/) and import the [YugabyteDB dashboard](https://grafana.com/grafana/dashboards/12620 "YugabyteDB dashboard on grafana.com") for better visualization of the metrics collected by Prometheus.
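If you run Grafana in Docker on the same `yb-net` network, a provisioning file along these lines registers Prometheus as a data source automatically. The file path, the data source name, and the `prometheus` hostname are assumptions; substitute the name or address your Prometheus container is actually reachable at:

```yaml
# /etc/grafana/provisioning/datasources/yugabytedb.yml (hypothetical path)
apiVersion: 1
datasources:
  - name: YugabyteDB Prometheus   # assumed display name
    type: prometheus
    access: proxy
    url: http://prometheus:9090   # assumes the Prometheus container is reachable as "prometheus"
```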
78 changes: 54 additions & 24 deletions docs/content/latest/explore/observability/linux.md
@@ -46,11 +46,10 @@ showAsideToc: true
-->
</ul>

You can monitor your local YugabyteDB cluster with a local instance of [Prometheus](https://prometheus.io/), a popular standard for time-series monitoring of cloud native infrastructure. YugabyteDB services and APIs expose metrics in the Prometheus format at the `/prometheus-metrics` endpoint. For details on the metrics targets for YugabyteDB, see [Monitoring with Prometheus](../../../reference/configuration/default-ports/#monitoring-with-prometheus).
You can monitor your local YugabyteDB cluster with a local instance of [Prometheus](https://prometheus.io/), a popular standard for time-series monitoring of cloud native infrastructure. YugabyteDB services and APIs expose metrics in the Prometheus format at the `/prometheus-metrics` endpoint. For details on the metrics targets for YugabyteDB, see [Prometheus monitoring](../../../reference/configuration/default-ports/#prometheus-monitoring).

This tutorial uses the [yb-ctl](../../../admin/yb-ctl) local cluster management utility.


## Prerequisite

Prometheus is installed on your local machine. If you have not done so already, follow the links below.
@@ -66,10 +65,10 @@ If you have a previously running local universe, destroy it using the following.
$ ./bin/yb-ctl destroy
```

Start a new local YugabyteDB cluster - by default, this will create a three-node universe with a replication factor of `3`.
Start a new local YugabyteDB cluster. This creates a three-node universe with a replication factor of `3`.

```sh
$ ./bin/yb-ctl create
$ ./bin/yb-ctl create --rf 3
```

## 2. Run the YugabyteDB workload generator
@@ -80,7 +79,7 @@ Download the [YugabyteDB workload generator](https://github.com/yugabyte/yb-samp
$ wget https://github.com/yugabyte/yb-sample-apps/releases/download/v1.3.0/yb-sample-apps.jar?raw=true -O yb-sample-apps.jar
```

Run the `CassandraKeyValue` workload in a separate shell.
Run the `CassandraKeyValue` workload application in a separate shell.

```sh
$ java -jar ./yb-sample-apps.jar \
@@ -94,37 +93,63 @@ $ java -jar ./yb-sample-apps.jar \

Copy the following into a file called `yugabytedb.yml`.

```sh
```yaml
global:
scrape_interval: 5s # Set the scrape interval to every 5 seconds. Default is every 1 minute.
evaluation_interval: 5s # Evaluate rules every 5 seconds. The default is every 1 minute.
# scrape_timeout is set to the global default (10s).

# YugabyteDB configuration to scrape Prometheus time-series metrics
scrape_configs:
- job_name: 'yugabytedb'
- job_name: "yugabytedb"
metrics_path: /prometheus-metrics
relabel_configs:
- target_label: "node_prefix"
replacement: "cluster-1"
metric_relabel_configs:
# Save the original metric name so queries can group by it, since grouping by __name__ directly is not possible.
- source_labels: ["__name__"]
regex: "(.*)"
target_label: "saved_name"
replacement: "$1"
# The following rules rewrite the handler_latency_* metric names into label form.
- source_labels: ["__name__"]
regex: "handler_latency_(yb_[^_]*)_([^_]*)_([^_]*)(.*)"
target_label: "server_type"
replacement: "$1"
- source_labels: ["__name__"]
regex: "handler_latency_(yb_[^_]*)_([^_]*)_([^_]*)(.*)"
target_label: "service_type"
replacement: "$2"
- source_labels: ["__name__"]
regex: "handler_latency_(yb_[^_]*)_([^_]*)_([^_]*)(_sum|_count)?"
target_label: "service_method"
replacement: "$3"
- source_labels: ["__name__"]
regex: "handler_latency_(yb_[^_]*)_([^_]*)_([^_]*)(_sum|_count)?"
target_label: "__name__"
replacement: "rpc_latency$4"

static_configs:
- targets: ['127.0.0.1:7000', '127.0.0.2:7000', '127.0.0.3:7000']
- targets: ["127.0.0.1:7000", "127.0.0.2:7000", "127.0.0.3:7000"]
labels:
group: 'yb-master'
export_type: "master_export"

- targets: ['127.0.0.1:9000', '127.0.0.2:9000', '127.0.0.3:9000']
- targets: ["127.0.0.1:9000", "127.0.0.2:9000", "127.0.0.3:9000"]
labels:
group: 'yb-tserver'
export_type: "tserver_export"

- targets: ['127.0.0.1:11000', '127.0.0.2:11000', '127.0.0.3:11000']
- targets: ["127.0.0.1:12000", "127.0.0.2:12000", "127.0.0.3:12000"]
labels:
group: 'yedis'
export_type: "cql_export"

- targets: ['127.0.0.1:12000', '127.0.0.2:12000', '127.0.0.3:12000']
- targets: ["127.0.0.1:13000", "127.0.0.2:13000", "127.0.0.3:13000"]
labels:
group: 'ycql'
export_type: "ysql_export"

- targets: ['127.0.0.1:13000', '127.0.0.2:13000', '127.0.0.3:13000']
- targets: ["127.0.0.1:11000", "127.0.0.2:11000", "127.0.0.3:11000"]
labels:
group: 'ysql'
export_type: "redis_export"
```
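Note how the last relabel rule renames the whole histogram family: the optional fourth capture group preserves the `_sum`/`_count` suffix, so the related series collapse onto one `rpc_latency` base name. A quick sketch with `sed`, using metric names derived from the rules above:

```sh
# The bare metric keeps no suffix; _sum and _count carry theirs over.
for m in handler_latency_yb_cqlserver_SQLProcessor_SelectStmt \
         handler_latency_yb_cqlserver_SQLProcessor_SelectStmt_sum \
         handler_latency_yb_cqlserver_SQLProcessor_SelectStmt_count; do
  echo "$m" | sed -E 's/^handler_latency_(yb_[^_]*)_([^_]*)_([^_]*)(_sum|_count)?$/rpc_latency\4/'
done
```

This prints `rpc_latency`, `rpc_latency_sum`, and `rpc_latency_count` on successive lines, which is why the throughput and latency queries in step 5 can use `rpc_latency_count` and `rpc_latency_sum` with label selectors instead of long metric names.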

## 4. Start Prometheus server
@@ -141,7 +166,7 @@ Open the Prometheus UI at http://localhost:9090 and then navigate to the Targets

## 5. Analyze key metrics

On the Prometheus Graph UI, you can now plot the read IOPS and write IOPS for the `CassandraKeyValue` sample app. As you can see from the [source code](https://github.com/yugabyte/yugabyte-db/blob/master/java/yb-loadtester/src/main/java/com/yugabyte/sample/apps/CassandraKeyValue.java) of the app, it uses only SELECT statements for reads and INSERT statements for writes (aside from the initial CREATE TABLE). This means you can measure throughput and latency by simply using the metrics corresponding to the SELECT and INSERT statements.
On the Prometheus Graph UI, you can now plot the read/write throughput and latency for the `CassandraKeyValue` sample app. As you can see from the [source code](https://github.com/yugabyte/yugabyte-db/blob/master/java/yb-loadtester/src/main/java/com/yugabyte/sample/apps/CassandraKeyValue.java) of the app, it uses only SELECT statements for reads and INSERT statements for writes (aside from the initial CREATE TABLE). This means you can measure throughput and latency by simply using the metrics corresponding to the SELECT and INSERT statements.

Paste the following expressions into the **Expression** box and click **Execute** followed by **Add Graph**.

@@ -150,41 +175,46 @@ Paste the following expressions into the **Expression** box and click **Execute*
> Read IOPS

```sh
sum(irate(handler_latency_yb_cqlserver_SQLProcessor_SelectStmt_count[1m]))
sum(irate(rpc_latency_count{server_type="yb_cqlserver", service_type="SQLProcessor", service_method="SelectStmt"}[1m]))
```

![Prometheus Read IOPS](/images/ce/prom-read-iops.png)

> Write IOPS

```sh
sum(irate(handler_latency_yb_cqlserver_SQLProcessor_InsertStmt_count[1m]))
sum(irate(rpc_latency_count{server_type="yb_cqlserver", service_type="SQLProcessor", service_method="InsertStmt"}[1m]))
```

![Prometheus Read IOPS](/images/ce/prom-write-iops.png)

### Latency

> Read Latency (in microseconds)

```sh
avg(irate(handler_latency_yb_cqlserver_SQLProcessor_SelectStmt_sum[1m])) / avg(irate(handler_latency_yb_cqlserver_SQLProcessor_SelectStmt_count[1m]))
avg(irate(rpc_latency_sum{server_type="yb_cqlserver", service_type="SQLProcessor", service_method="SelectStmt"}[1m])) /
avg(irate(rpc_latency_count{server_type="yb_cqlserver", service_type="SQLProcessor", service_method="SelectStmt"}[1m]))
```

![Prometheus Read IOPS](/images/ce/prom-read-latency.png)

> Write Latency (in microseconds)

```sh
avg(irate(handler_latency_yb_cqlserver_SQLProcessor_InsertStmt_sum[1m])) / avg(irate(handler_latency_yb_cqlserver_SQLProcessor_InsertStmt_count[1m]))
avg(irate(rpc_latency_sum{server_type="yb_cqlserver", service_type="SQLProcessor", service_method="InsertStmt"}[1m])) /
avg(irate(rpc_latency_count{server_type="yb_cqlserver", service_type="SQLProcessor", service_method="InsertStmt"}[1m]))
```

![Prometheus Read IOPS](/images/ce/prom-write-latency.png)

## 6. [Optional] Clean up
## 6. Clean up (optional)

Optionally, you can shut down the local cluster created in Step 1.

```sh
$ ./bin/yb-ctl destroy
```

## What's next?

You can [set up Grafana](https://prometheus.io/docs/visualization/grafana/) and import the [YugabyteDB dashboard](https://grafana.com/grafana/dashboards/12620 "YugabyteDB dashboard on grafana.com") for better visualization of the metrics collected by Prometheus.