Skip to content
This repository has been archived by the owner on Jan 13, 2025. It is now read-only.

adding metrics deployment scripts (#30926)

* adding metrics deployment scripts

* fix trailing whitespaces

* fix more trailing whitespaces

* fix typos

* fix trailing whitespace

* fix loops

* update env vars

* fix shellcheck source

* add source references
joeaba authored Mar 28, 2023
1 parent 75abfc7 commit 77aac98
Showing 53 changed files with 4,385 additions and 131 deletions.
90 changes: 45 additions & 45 deletions metrics/README.md
@@ -1,47 +1,47 @@
# Metrics

## Testnet Grafana Dashboard

There are three versions of the testnet dashboard, corresponding to the three
release channels:
* https://metrics.solana.com:3000/d/monitor-edge/cluster-telemetry-edge
* https://metrics.solana.com:3000/d/monitor-beta/cluster-telemetry-beta
* https://metrics.solana.com:3000/d/monitor/cluster-telemetry
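
The channel-to-dashboard mapping above can be captured in a small helper. This is a hypothetical sketch (`dashboard_url` is not part of the repository), and it assumes the un-suffixed `monitor` dashboard belongs to the `stable` channel, matching the `(edge|beta|stable)` argument accepted by the deploy script.

```bash
# Hypothetical helper: print the Grafana dashboard URL for a release channel.
# Assumes "stable" maps to the un-suffixed "monitor" dashboard.
dashboard_url() {
  local base="https://metrics.solana.com:3000/d"
  case "$1" in
    edge)   echo "$base/monitor-edge/cluster-telemetry-edge" ;;
    beta)   echo "$base/monitor-beta/cluster-telemetry-beta" ;;
    stable) echo "$base/monitor/cluster-telemetry" ;;
    *)      echo "unknown channel: $1" >&2; return 1 ;;
  esac
}
```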

The dashboard for each channel is defined from the
`metrics/scripts/grafana-provisioning/dashboards/cluster-monitor.json` source
file in the git branch associated with that channel, and deployed by automation
running `ci/publish-metrics-dashboard.sh`.

A deploy can be triggered at any time via the `New Build` button of
https://buildkite.com/solana-labs/publish-metrics-dashboard.

### Modifying a Dashboard

Dashboard updates are made by modifying
`metrics/scripts/grafana-provisioning/dashboards/cluster-monitor.json`;
**manual edits made directly in Grafana will be overwritten**.

* Browse the metrics available to add in the data explorer at https://metrics.solana.com:8888/.
* When editing a query for a dashboard graph, use the "Toggle Edit Mode" option
  behind the hamburger button to switch to a raw InfluxQL query, then copy the query
  from the data explorer into the text field. You may have to fix up the query with
  dashboard variables such as `$testnet` or `$timeFilter`; check other working panels
  in the dashboard for examples.
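
As a rough illustration of that fixup, the hypothetical `to_dashboard_query` function below rewrites a query copied from the data explorer so that it uses dashboard variables. It assumes the explorer query uses a relative predicate such as `time > now() - 1h` and a fully qualified `"database"."retention_policy"."measurement"` name, and that `$testnet` holds the database name; compare the result against working panels before relying on it.

```bash
# Hypothetical helper: replace a literal database name with "$testnet" and a
# relative time predicate with $timeFilter. Both patterns are assumptions
# about what the copied explorer query looks like.
to_dashboard_query() {
  local query="$1" database="$2"
  printf '%s\n' "$query" | sed -E \
    -e 's/time > now\(\) - [0-9]+[a-z]+/$timeFilter/' \
    -e "s/\"${database}\"\./\"\$testnet\"./"
}
```

For example, `to_dashboard_query "$query" mainnet-beta` turns `"mainnet-beta".` into `"$testnet".` and `time > now() - 1h` into `$timeFilter`, leaving the rest of the query untouched.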

1. Open the desired dashboard in Grafana.
2. Create a development copy of the dashboard by selecting `Save As...` in the
   `Settings` menu for the dashboard.
3. Edit the dashboard as desired.
4. Extract the JSON Model by selecting `JSON Model` in the `Settings` menu. Copy the JSON to the clipboard
   and paste it into `metrics/scripts/grafana-provisioning/dashboards/cluster-monitor.json`.
5. Delete your development dashboard: `Settings` => `Delete`.
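
Before committing the pasted JSON Model, it may help to confirm that the file still parses. A minimal sketch, assuming `python3` is on `PATH` (it relies only on the standard library `json.tool` module); `validate_dashboard_json` is a hypothetical name, and the check covers JSON syntax only, not whether Grafana accepts the dashboard.

```bash
# Hypothetical helper: fail if the dashboard file is not well-formed JSON.
# Defaults to the provisioning path used above.
validate_dashboard_json() {
  local file="${1:-metrics/scripts/grafana-provisioning/dashboards/cluster-monitor.json}"
  if python3 -m json.tool "$file" > /dev/null; then
    echo "ok: $file"
  else
    echo "invalid JSON: $file" >&2
    return 1
  fi
}
```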

### Deploying a Dashboard Manually

If you need to immediately deploy a dashboard using the contents of
`cluster-monitor.json` in your local workspace:
```
$ export GRAFANA_API_TOKEN="an API key from https://metrics.solana.com:3000/org/apikeys"
$ metrics/publish-metrics-dashboard.sh (edge|beta|stable)
```
Note that automation will eventually overwrite your manual deploy.

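
A hypothetical pre-flight check for the manual deploy might look like the sketch below: it validates the channel name and confirms that `GRAFANA_API_TOKEN` is exported. It does not contact Grafana and is not part of `publish-metrics-dashboard.sh`; it only catches the two most common invocation mistakes.

```bash
# Hypothetical pre-flight check; returns non-zero if the deploy would fail early.
preflight_dashboard_deploy() {
  local channel="$1"
  case "$channel" in
    edge|beta|stable) ;;
    *) echo "usage: preflight_dashboard_deploy (edge|beta|stable)" >&2; return 1 ;;
  esac
  if [[ -z "${GRAFANA_API_TOKEN:-}" ]]; then
    echo "GRAFANA_API_TOKEN is not set" >&2
    return 1
  fi
  echo "ready to deploy channel: $channel"
}
```

Usage: `preflight_dashboard_deploy beta && metrics/publish-metrics-dashboard.sh beta`.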
## InfluxDB

In order to explore validator-specific metrics from mainnet-beta, testnet, or devnet, you can use Chronograf:

* https://metrics.solana.com:8888/ (production environment)
* https://metrics.solana.com:8889/ (testing environment)

For local cluster deployments you should use:

* https://internal-metrics.solana.com:8888/

## Public Grafana Dashboards

There are three main public dashboards for cluster related metrics:

* https://metrics.solana.com/d/monitor-edge/cluster-telemetry
* https://metrics.solana.com/d/0n54roOVz/fee-market
* https://metrics.solana.com/d/UpIWbId4k/ping-result

For local cluster deployments you should use:

* https://internal-metrics.solana.com:3000/

### Cluster Telemetry

The cluster telemetry dashboard shows the current state of the cluster:

1. Cluster Stability
2. Validator Streamer
3. Tower Consensus
4. IP Network
5. Snapshots
6. RPC Send Transaction Service

### Fee Market

The fee market dashboard shows:

1. Total Prioritization Fees
2. Block Min Prioritization Fees
3. Cost Tracker Stats

### Ping Results

The ping results dashboard displays relevant information about the Ping API.
15 changes: 0 additions & 15 deletions metrics/grafcli.conf

This file was deleted.

10 changes: 10 additions & 0 deletions metrics/influx-enterprise/README.md
@@ -0,0 +1,10 @@
![image](https://user-images.githubusercontent.com/110216567/182764431-504557e4-92ac-41ff-82a5-b87c88c19c1d.png)
# InfluxDB Enterprise
[Influx Enterprise Integration](https://solana-labs.atlassian.net/wiki/spaces/DEVOPS/pages/25788425/Influx+Enterprise+Integration)


`influxdb-meta.conf` is the meta node configuration file, where the servers and meta node settings are defined.

`influxdb.conf` is the data node configuration file, where the servers and data node settings are defined.
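
The meta and data node files have to agree on some settings: `influxdb-meta.conf` notes that `internal-shared-secret` must match the data nodes' `meta.meta-internal-shared-secret`, and that `license-key` and `license-path` are mutually exclusive. A minimal sketch of a check for those two rules, assuming simple one-line `key = "value"` entries (it ignores TOML sections); `conf_value` and `check_influx_configs` are hypothetical names.

```bash
# Print the quoted value of the first uncommented `key = "value"` line.
# Section headers are ignored, so a key repeated in several sections is misread.
conf_value() {
  local key="$1" file="$2"
  sed -nE "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*\"(.*)\"[[:space:]]*\$/\1/p" "$file" | head -n 1
}

# Compare the meta node config against a data node config.
check_influx_configs() {
  local meta_conf="$1" data_conf="$2" status=0
  local meta_secret data_secret license_key license_path
  meta_secret="$(conf_value internal-shared-secret "$meta_conf")"
  data_secret="$(conf_value meta-internal-shared-secret "$data_conf")"
  if [[ "$meta_secret" != "$data_secret" ]]; then
    echo "internal-shared-secret differs between meta and data nodes" >&2
    status=1
  fi
  # The comments in influxdb-meta.conf say to use only one of these two.
  license_key="$(conf_value license-key "$meta_conf")"
  license_path="$(conf_value license-path "$meta_conf")"
  if [[ -n "$license_key" && -n "$license_path" ]]; then
    echo "license-key and license-path are mutually exclusive" >&2
    status=1
  fi
  return "$status"
}
```

Usage: `check_influx_configs influxdb-meta.conf influxdb.conf`.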

`default` is the nginx load balancer configuration for the VM named `influxdb-enterprise`.
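
Every `server` in the `default` upstream is declared with `max_fails=0`, which disables nginx's failure accounting, so a failing data node is never taken out of rotation. One way to spot such a node is to probe each backend directly. A sketch assuming `curl` and the InfluxDB 1.x `/ping` endpoint (which replies `204 No Content` when the node is up); `check_backends` is a hypothetical helper.

```bash
# Hypothetical helper: probe each host:port for a healthy /ping response.
check_backends() {
  local host code failed=0
  for host in "$@"; do
    # curl prints "000" as the status code when the connection fails.
    code="$(curl -s -o /dev/null -m 5 -w '%{http_code}' "http://${host}/ping")"
    if [[ "$code" == "204" ]]; then
      echo "up   $host"
    else
      echo "DOWN $host (HTTP ${code:-none})"
      failed=1
    fi
  done
  return "$failed"
}
```

Usage: `check_backends 145.40.69.29:8086 147.28.151.45:8086`.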
102 changes: 102 additions & 0 deletions metrics/influx-enterprise/default
@@ -0,0 +1,102 @@
##
# You should look at the following URL's in order to grasp a solid understanding
# of Nginx configuration files in order to fully unleash the power of Nginx.
# https://www.nginx.com/resources/wiki/start/
# https://www.nginx.com/resources/wiki/start/topics/tutorials/config_pitfalls/
# https://wiki.debian.org/Nginx/DirectoryStructure
#
# In most cases, administrators will remove this file from sites-enabled/ and
# leave it as reference inside of sites-available where it will continue to be
# updated by the nginx packaging team.
#
# This file will automatically load configuration files provided by other
# applications, such as Drupal or Wordpress. These applications will be made
# available underneath a path with that package name, such as /drupal8.
#
# Please see /usr/share/doc/nginx-doc/examples/ for more detailed examples.
##

# Default server configuration
#
upstream backend {
    server 145.40.69.29:8086 max_fails=0;
    server 147.28.151.45:8086 max_fails=0;
    server 147.28.151.201:8086 max_fails=0;
    server 86.109.7.147:8086 max_fails=0;
    server 147.28.151.73:8086 max_fails=0;
    server 147.28.129.143:8086 max_fails=0;
}
server {
    listen 8086 default_server;
    listen [::]:8086 default_server;

    # SSL configuration
    #
    # listen 443 ssl default_server;
    # listen [::]:443 ssl default_server;
    #
    # Note: You should disable gzip for SSL traffic.
    # See: https://bugs.debian.org/773332
    #
    # Read up on ssl_ciphers to ensure a secure configuration.
    # See: https://bugs.debian.org/765782
    #
    # Self signed certs generated by the ssl-cert package
    # Don't use them in a production server!
    #
    # include snippets/snakeoil.conf;

    root /var/www/html;

    # Add index.php to the list if you are using PHP
    index index.html index.htm index.nginx-debian.html;

    server_name _;

    location / {
        proxy_connect_timeout 1200s;
        proxy_send_timeout 1200s;
        proxy_read_timeout 1200s;
        proxy_pass http://backend;
        # First attempt to serve request as file, then
        # as directory, then fall back to displaying a 404.
        # try_files $uri $uri/ =404;
    }

    # pass PHP scripts to FastCGI server
    #
    #location ~ \.php$ {
    #    include snippets/fastcgi-php.conf;
    #
    #    # With php-fpm (or other unix sockets):
    #    fastcgi_pass unix:/var/run/php/php7.4-fpm.sock;
    #    # With php-cgi (or other tcp sockets):
    #    fastcgi_pass 127.0.0.1:9000;
    #}

    # deny access to .htaccess files, if Apache's document root
    # concurs with nginx's one
    #
    #location ~ /\.ht {
    #    deny all;
    #}
}

# Virtual Host configuration for example.com
#
# You can move that to a different file under sites-available/ and symlink that
# to sites-enabled/ to enable it.
#
#server {
#    listen 80;
#    listen [::]:80;
#
#    server_name example.com;
#
#    root /var/www/example.com;
#    index index.html;
#
#    location / {
#        try_files $uri $uri/ =404;
#    }
#}
140 changes: 140 additions & 0 deletions metrics/influx-enterprise/influxdb-meta.conf
@@ -0,0 +1,140 @@
### Welcome to the InfluxDB Enterprise configuration file.

# The values in this file override the default values used by the system if
# a config option is not specified. The commented out lines are the configuration
# field and the default value used. Uncommenting a line and changing the value
# will change the value used at runtime when the process is restarted.

# Once every 24 hours InfluxDB Enterprise will report usage data to usage.influxdata.com
# The data includes a random ID, os, arch, version, the number of series and other
# usage data. No data from user databases is ever transmitted.
# Change this option to true to disable reporting.
# reporting-disabled = false

# The TCP bind address to use for the cluster-internal meta services.
# bind-address = ":8091"

# Hostname advertised by this host for remote addresses. This must be resolvable by all
# other nodes in the cluster.
hostname = "dev-equinix-washington-24"

###
### [enterprise]
###
### Settings related to enterprise licensing.
###

[enterprise]
# Must be set to true to use the Enterprise Web UI
# registration-enabled = false

# Must include the protocol (http://)
# registration-server-url = ""

# license-key and license-path are mutually exclusive, use only one and leave the other blank
license-key = ""

# license-key and license-path are mutually exclusive, use only one and leave the other blank
license-path = ""

###
### [meta]
###
### Settings specific to meta node operation.
###
#
[meta]
# Directory where cluster meta data is stored.
dir = "/var/lib/influxdb/meta"

# The default address for raft, cluster, snapshot, etc.
# bind-address = ":8089"

# The default address to bind the API to.
# http-bind-address = ":8091"

# Determines whether meta nodes use HTTPS to communicate with each other.
# https-enabled = false

# The SSL certificate to use when HTTPS is enabled. The certificate should be a PEM encoded
# bundle of the certificate and key. If it is just the certificate, a key must be
# specified in https-private-key.
# https-certificate = ""

# Use a separate private key location.
# https-private-key = ""

# Whether meta nodes will skip certificate validation communicating with each other over HTTPS.
# This is useful when testing with self-signed certificates.
# https-insecure-tls = false

# Whether to use TLS to communicate with data nodes.
# data-use-tls = false

# Whether meta nodes will skip certificate validation communicating with data nodes over TLS.
# This is useful when testing with self-signed certificates.
# data-insecure-tls = false

# The default frequency with which the node will gossip its known announcements.
# gossip-frequency = "5s"

# The default length of time an announcement is kept before it is considered too old.
# announcement-expiration = "30s"

# Automatically create a default retention policy when creating a database.
# retention-autocreate = true

# The amount of time in candidate state without a leader before we attempt an election.
# election-timeout = "1s"

# The amount of time in follower state without a leader before we attempt an election.
# heartbeat-timeout = "1s"

# Control how long the "lease" lasts for being the leader without being able to contact a quorum
# of nodes. If we reach this interval without contact, we will step down as leader.
# leader-lease-timeout = "500ms"

# The amount of time without an Apply() operation before we heartbeat to ensure a timely
# commit. Due to random staggering, may be delayed as much as 2x this value.
# commit-timeout = "50ms"

# Timeout waiting for consensus before getting the latest Raft snapshot.
# consensus-timeout = "30s"

# Enables cluster level trace logging.
# cluster-tracing = false

# Enables cluster API level trace logging.
# logging-enabled = true

# Determines whether the pprof endpoint is enabled. This endpoint is used for
# troubleshooting and monitoring.
# pprof-enabled = true

# The default duration of leases.
# lease-duration = "1m0s"

# If true, HTTP endpoints require authentication.
# This setting must have the same value as the data nodes' meta.meta-auth-enabled
# configuration.
# auth-enabled = false

# Whether LDAP is allowed to be set.
# If true, you will need to use `influxd ldap set-config` and set enabled=true to use LDAP authentication.
# ldap-allowed = false

# The shared secret used by the API for JWT authentication.
# shared-secret = ""

# The shared secret used by the internal API for JWT authentication.
# This setting must have the same value as the data nodes'
# meta.meta-internal-shared-secret configuration.
internal-shared-secret = "this is meta node"

# Configures password hashing scheme. Use "pbkdf2-sha256" or "pbkdf2-sha512"
# for a FIPS-ready password hash. This setting must have the same value as
# the data nodes' meta.password-hash configuration.
# password-hash = "bcrypt"

# Configures strict FIPS-readiness check on startup.
# ensure-fips = false