[Web service] prometheus metrics exporter
Hosted by the laitos web server, the endpoint serves metrics collected from the following sources in the prometheus exporter format:
- All web service handlers: time to first byte, processing duration, size of response.
- Program resource usage: CPU time consumed, number of context switches, time spent on run queue and wait queue.
- All web proxy requests: time to first byte, connection duration, size of response.
Under the JSON key HTTPHandlers, add a string property called PrometheusMetricsEndpoint, value being the URL location of the service.
Keep the location a secret to yourself and make it difficult to guess.
Check out the additional process and system activity metrics available from the system maintenance daemon, and enable them as you wish.
Here is an example:
    {
      ...
      "Maintenance": {
        ...
        "PrometheusScrapeIntervalSec": 60,
        "RegsiterSystemActivityMetrics": true,
        "RegsiterProcessActivityMetrics": true,
        "RegisterPrometheusMetrics": true,
        ...
      },
      "HTTPHandlers": {
        ...
        "PrometheusMetricsEndpoint": "/my-precious-metrics",
        ...
      },
      ...
    }
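Because the endpoint location works like a shared secret, one way to choose it is to generate a random path. The following is a minimal Python sketch and not part of laitos; the helper name `random_endpoint` and the `/metrics-` prefix are made up for illustration.

```python
import secrets


def random_endpoint(prefix: str = "/metrics-", num_bytes: int = 24) -> str:
    """Return a URL path ending in a random, URL-safe suffix (hypothetical helper)."""
    # token_urlsafe only emits letters, digits, '-' and '_', so the result
    # can be pasted into a URL path without any escaping.
    return prefix + secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Use the printed value for PrometheusMetricsEndpoint and for
    # metrics_path in the prometheus scrape configuration.
    print(random_endpoint())
```

The same value has to be used on both sides: in the laitos configuration and in the prometheus scrape job.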
Modify the laitos program launch command by adding the parameter -prominteg to it. The parameter works as the master switch to turn on all points of integration with prometheus:
sudo ./laitos -prominteg -config <CONFIG FILE> -daemons ...,httpd,...
The exporter is hosted by the web server, so remember to run the web server daemon.
The prometheus web handler serves all of these metrics:
- Web server (httpd and insecurehttpd) statistics are always included, such as individual handler's processing duration, response size, time-to-first-byte, etc.
- If web proxy daemon is enabled, the exporter will automatically include statistics such as data transfer per proxy destination, number of connections, connection duration, etc.
- If maintenance daemon RegisterPrometheusMetrics is enabled, the exporter will automatically include laitos program's process statistics such as CPU usage and scheduler performance. This relies on Linux (procfs).
- If maintenance daemon RegsiterProcessActivityMetrics is enabled, the exporter will automatically include the file and network activities of the laitos process. This relies on the bpftrace tool.
- If maintenance daemon RegsiterSystemActivityMetrics is enabled, the exporter will automatically include the system-wide file and network activities. This relies on the bpftrace tool.
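To check which of these metric groups actually show up, the text served by the endpoint can be inspected by hand or with a small script. Below is a simplified Python sketch of a parser for the prometheus text format; it keeps label sets as raw strings and is not a complete implementation. The sample text, including its label and values, is invented for illustration; only the metric name prefix is taken from the queries on this page.

```python
import re

# metric name, optional {labels}, value, optional integer timestamp
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)(?:\s+-?\d+)?$')

# Invented sample of what an exporter response may look like.
SAMPLE_TEXT = """\
# TYPE laitos_httpd_response_size_bytes histogram
laitos_httpd_response_size_bytes_bucket{le="1024"} 3
laitos_httpd_response_size_bytes_bucket{le="+Inf"} 5
laitos_httpd_response_size_bytes_sum 9000
laitos_httpd_response_size_bytes_count 5
"""


def parse_samples(text):
    """Return a list of (name, raw_labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser cannot read
        name, labels, value = match.groups()
        samples.append((name, labels or "", float(value)))
    return samples


if __name__ == "__main__":
    # List the distinct metric names found in the sample text.
    for name in sorted({s[0] for s in parse_samples(SAMPLE_TEXT)}):
        print(name)
```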
Next, follow the installation instructions of prometheus to install and start the prometheus daemon. Feel free to run the daemon on a home desktop, a dedicated server, or on the same computer that runs laitos.
Edit the prometheus configuration file (often located at /etc/prometheus/prometheus.yml) and tell it to periodically fetch the exported data from the web service's endpoint:
    ...
    scrape_configs:
      - job_name: 'laitos'
        scrape_interval: 60s # set to the same value as the system maintenance daemon's PrometheusScrapeIntervalSec
        scrape_timeout: 5s
        scheme: https # or http
        metrics_path: '/my-precious-metrics'
        static_configs:
          - targets: ['laitos-server.example.com:443', 'another-laitos-server.example.com:443']
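The scrape interval is written as a duration string in the prometheus configuration and as a number of seconds in the laitos configuration, so the two are easy to get out of step. The sketch below shows one way to compare them in Python. It assumes the usual prometheus duration suffixes (y, w, d, h, m, s, ms, in that order); the function names are made up for illustration.

```python
import re

# Optional components in descending unit order, e.g. "1h30m" or "60s".
_DURATION_RE = re.compile(
    r"^(?:(\d+)y)?(?:(\d+)w)?(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?(?:(\d+)ms)?$"
)
_UNIT_SECONDS = (365 * 86400, 7 * 86400, 86400, 3600, 60, 1, 0.001)


def duration_seconds(text):
    """Convert a duration string such as '60s' or '1h30m' into seconds."""
    match = _DURATION_RE.match(text)
    if not text or match is None:
        raise ValueError("unrecognised duration: %r" % (text,))
    return sum(int(n) * unit for n, unit in zip(match.groups(), _UNIT_SECONDS) if n)


def interval_matches(scrape_interval, maintenance_interval_sec):
    """True if scrape_interval equals PrometheusScrapeIntervalSec."""
    return duration_seconds(scrape_interval) == maintenance_interval_sec


if __name__ == "__main__":
    # Values taken from the two configuration examples on this page.
    print(interval_matches("60s", 60))
```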
Make the endpoint difficult to guess; this helps to prevent misuse of the service.
Visit the prometheus web UI (or a Grafana dashboard if they are integrated), and try out the following queries for plotting program resource usage:
- Percentage of involuntary context switches, 3-minute running average:
(sum(rate(laitos_proc_num_involuntary_switches[3m])) by (instance) / (sum(rate(laitos_proc_num_involuntary_switches[3m])) by (instance) + sum(rate(laitos_proc_num_voluntary_switches[3m])) by (instance))) * 100
- Seconds of CPU time spent by laitos server (including children) in user and kernel mode, 3-minute running average:
sum(rate(laitos_proc_num_kernel_mode_sec_incl_children[3m]) + rate(laitos_proc_num_user_mode_sec_incl_children[3m])) by (instance)
- Percentage of time spent as runnable according to the OS scheduler (higher is better), 3-minute running average:
(sum(rate(laitos_proc_num_run_sec[3m])) by (instance) / (sum(rate(laitos_proc_num_run_sec[3m])) by (instance) + sum(rate(laitos_proc_num_wait_sec[3m])) by (instance))) * 100
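The percentage queries above all follow the same pattern: take the per-second rate of two counters and divide one by their sum. The Python sketch below walks through that arithmetic for the context switch query. It is a simplification: the real `rate()` function also handles counter resets and extrapolates to the window edges. The counter readings are invented for illustration.

```python
import math


def simple_rate(samples):
    """Per-second increase between the first and last (unix_time, value) sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def percentage(part_rate, other_rate):
    """Share of part_rate in the sum of both rates, as a percentage."""
    total = part_rate + other_rate
    if total == 0:
        return math.nan  # PromQL also yields NaN for 0 / 0
    return 100.0 * part_rate / total


if __name__ == "__main__":
    # Invented counter readings taken 180 seconds (3 minutes) apart.
    involuntary = [(0, 100), (180, 160)]
    voluntary = [(0, 1000), (180, 1540)]
    # Comes out at roughly 10 percent involuntary switches.
    print(percentage(simple_rate(involuntary), simple_rate(voluntary)))
```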
And try out these for plotting web server stats:
- Time-to-first-byte across all handlers at 95% quantile, 3-minute running average:
histogram_quantile(0.95, sum(rate(laitos_httpd_response_time_to_first_byte_seconds_bucket[3m])) by (le, instance))
- Processing duration (including IO) across all handlers at 95% quantile, 3-minute running average:
histogram_quantile(0.95, sum(rate(laitos_httpd_handler_duration_seconds_bucket[3m])) by (le, instance))
- Size of HTTP response across all handlers at 95% quantile, 3-minute running average:
histogram_quantile(0.95, sum(rate(laitos_httpd_response_size_bytes_bucket[3m])) by (le, instance))
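The quantile queries above rely on `histogram_quantile`, which estimates a quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. The Python sketch below illustrates that idea for classic histograms with positive bucket bounds. It is a simplified re-implementation for understanding, not the code prometheus runs, and the bucket values are invented.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from (upper_bound, cumulative_count) pairs.

    The pairs must include a bucket with upper bound math.inf.
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if len(buckets) < 2 or total == 0:
        return math.nan
    rank = q * total
    # The first bucket is assumed to start at zero.
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The rank falls into the +Inf bucket: report the highest finite bound.
                return buckets[-2][0]
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


if __name__ == "__main__":
    # Invented cumulative counts for bucket bounds measured in seconds.
    buckets = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
    print(histogram_quantile(0.95, buckets))
```

Because of the interpolation, the result is only as precise as the bucket boundaries allow.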
And try out these for plotting web proxy stats:
- Number of proxy requests per second, 1-minute running average:
sum(rate(laitos_httpproxy_response_size_bytes_count[1m])) by (instance)
- Bytes transferred to proxy clients per second, 1-minute running average:
sum(rate(laitos_httpproxy_response_size_bytes_sum[1m])) by (instance)
- Top 10 proxy destinations by data transfer (total MB over 3 hours):
topk(10, sum by (host) (rate(laitos_httpproxy_response_size_bytes_sum[180m]))) * 180 * 60 / 1048576
- Top 10 proxy destinations by number of connections (total over 3 hours):
topk(10, sum by (host) (rate(laitos_httpproxy_response_size_bytes_count[180m]))) * 180 * 60
- Top 10 proxy destinations by connection duration (total seconds over 3 hours):
topk(10, sum by (host) (rate(laitos_httpproxy_handler_duration_seconds_sum[180m]))) * 180 * 60
- Size of proxy response across all destinations at 90% quantile, 3-minute running average:
histogram_quantile(0.90, sum(rate(laitos_httpproxy_response_size_bytes_bucket[3m])) by (le, instance))
- Time-to-first-byte across all proxy destinations at 50% quantile, 3-minute running average:
histogram_quantile(0.50, sum(rate(laitos_httpproxy_response_time_to_first_byte_seconds_bucket[3m])) by (le, instance))
- Processing duration (including IO) across all proxy destinations at 50% quantile, 3-minute running average:
histogram_quantile(0.50, sum(rate(laitos_httpproxy_handler_duration_seconds_bucket[3m])) by (le, instance))
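The "top 10" queries above multiply a per-second rate by `180 * 60` to turn it back into a total over the 3-hour window, and the data transfer query further divides by 1048576 to convert bytes into megabytes. The Python sketch below spells out that arithmetic. The function names and the example rate are made up for illustration.

```python
def total_over_window(rate_per_second, window_minutes):
    """Approximate total increase over the window from an average per-second rate."""
    return rate_per_second * window_minutes * 60


def bytes_to_megabytes(num_bytes):
    """Convert bytes to megabytes, using 1 MB = 1048576 bytes as the query does."""
    return num_bytes / 1048576


if __name__ == "__main__":
    # A destination averaging 2048 bytes per second over 180 minutes.
    total_bytes = total_over_window(2048, 180)
    print(bytes_to_megabytes(total_bytes))  # prints 21.09375
```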