Skip to content

Commit

Permalink
Add detail to tmpnet metrics documentation (#2854)
Browse files Browse the repository at this point in the history
Signed-off-by: Stephen Buttolph <[email protected]>
Co-authored-by: Stephen Buttolph <[email protected]>
  • Loading branch information
marun and StephenButtolph authored Mar 26, 2024
1 parent f57f0f2 commit 0f6b123
Showing 1 changed file with 81 additions and 22 deletions.
103 changes: 81 additions & 22 deletions tests/fixture/tmpnet/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -234,38 +234,75 @@ The process details of a node are written by avalanchego to
process, the URI of the node's API, and the address other nodes can
use to bootstrap themselves (aka staking address).

## Metrics

### Prometheus configuration

When nodes are started, prometheus configuration for each node is
written to `~/.tmpnet/prometheus/file_sd_configs/` with a filename of
`[network uuid]-[node id].json`. Prometheus can be configured to
scrape the nodes as per the following example:

```yaml
scrape_configs:
- job_name: "avalanchego"
metrics_path: "/ext/metrics"
file_sd_configs:
- files:
- '/home/me/.tmpnet/prometheus/file_sd_configs/*.yaml'
## Monitoring

Monitoring is an essential part of understanding the workings of a
distributed system such as avalanchego. The tmpnet fixture enables
collection of logs and metrics from temporary networks to a monitoring
stack (prometheus+loki+grafana) to enable results to be analyzed and
shared.

### Example usage

```bash
# Start prometheus to collect metrics
PROMETHEUS_ID=<id> PROMETHEUS_PASSWORD=<password> ./scripts/run_prometheus.sh

# Start promtail to collect logs
LOKI_ID=<id> LOKI_PASSWORD=<password> ./scripts/run_promtail.sh

# Network start emits link to grafana displaying collected logs and metrics
./build/tmpnetctl start-network
```

### Viewing metrics
### Metrics collection

When a node is started, configuration enabling collection of metrics
from the node is written to
`~/.tmpnet/prometheus/file_sd_configs/[network uuid]-[node id].json`.

The `scripts/run_prometheus.sh` script starts prometheus in agent mode
configured to scrape metrics from configured nodes and forward the
metrics to a persistent prometheus instance. The script requires that
the `PROMETHEUS_ID` and `PROMETHEUS_PASSWORD` env vars be set. By
default the prometheus instance at
https://prometheus-experimental.avax-dev.network will be targeted and
this can be overridden via the `PROMETHEUS_URL` env var.

### Log collection

When a network is started with `tmpnet`, a grafana link for the
network's metrics will be emitted.
Nodes log are stored at `~/.tmpnet/networks/[network id]/[node
id]/logs` by default, and can optionally be forwarded to loki with
promtail.

The metrics emitted by temporary networks configured with tmpnet will
have the following labels applied:
When a node is started, promtail configuration enabling
collection of logs for the node is written to
`~/.tmpnet/promtail/file_sd_configs/[network
uuid]-[node id].json`.

The `scripts/run_promtail.sh` script starts promtail configured to
collect logs from configured nodes and forward the results to loki. The
script requires that the `LOKI_ID` and `LOKI_PASSWORD` env vars be
set. By default the loki instance at
https://loki-experimental.avax-dev.network will be targeted and this
can be overridden via the `LOKI_URL` env var.

### Labels

The logs and metrics collected for temporary networks will have the
following labels applied:

- `network_uuid`
- uniquely identifies a network across hosts
- `node_id`
- `is_ephemeral_node`
- 'ephemeral' nodes are expected to run for only a fraction of the
life of a network
- `network_owner`
- an arbitrary string that can be used to differentiate results
when a CI job runs more than one network

When a tmpnet network runs as part of github CI, the following
When a network runs as part of a github CI job, the following
additional labels will be applied:

- `gh_repo`
Expand All @@ -274,3 +311,25 @@ additional labels will be applied:
- `gh_run_number`
- `gh_run_attempt`
- `gh_job_id`

These labels are sourced from Github Actions' `github` context as per
https://docs.github.com/en/actions/learn-github-actions/contexts#github-context.

### Viewing

#### Local networks

When a network is started with tmpnet, a link to the [default grafana
instance](https://grafana-experimental.avax-dev.network) will be
emitted. The dashboards will only be populated if prometheus and
promtail are running locally (as per previous sections) to collect
metrics and logs.

#### CI

Collection of logs and metrics is enabled for CI jobs that use
tmpnet. Each job will execute a step titled `Notify of metrics
availability` that emits a link to grafana parametized to show results
for the job. Additional links to grafana parametized to show results
for individual network will appear in the logs displaying the start of
those networks.

0 comments on commit 0f6b123

Please sign in to comment.