Monitoring for host and each docker container #56

Merged on Jul 2, 2020 (29 commits)
ecadb44
monitoring: add cadvisor to collect per container metrics
tlvu Jun 16, 2020
c466f0c
monitoring: add node-exporter to collect system-wide metrics
tlvu Jun 16, 2020
79a3c85
monitoring: add prometheus to monitor and store collected cadvisor an…
tlvu Jun 16, 2020
4e9fdd4
monitoring: persist prometheus storage db
tlvu Jun 16, 2020
919f8b4
monitoring: increase prom storage retention to 90d from 15d default
tlvu Jun 16, 2020
be4dad9
monitoring: add Grafana to visualize metrics from Prometheus
tlvu Jun 16, 2020
c4179e8
monitoring: provision grafana datasources
tlvu Jun 17, 2020
5c744dd
monitoring: provision grafana dashboards
tlvu Jun 17, 2020
5a31d85
monitoring: replace ${DS_PROMETHEUS} with real DS name in grafana das…
tlvu Jun 17, 2020
e995ec9
env.local: add mandatory GRAFANA_ADMIN_PASSWORD for monitoring component
tlvu Jun 17, 2020
e1fef8a
monitoring: add persistance to grafana
tlvu Jun 17, 2020
feab3df
monitoring: make grafana dashboard more readeable when lots of contai…
tlvu Jun 17, 2020
3fa8728
monitoring: remove 3 unnecessary panels about container memory in gra…
tlvu Jun 17, 2020
1ff64bb
monitoring: fix system load grafana dashboard
tlvu Jun 18, 2020
6a8af62
monitoring: fix the other Load single number grafana dashboard
tlvu Jun 18, 2020
32afa00
monitoring: fix Uptime grafana dashboard
tlvu Jun 18, 2020
431f860
monitoring: fix Disk Space grafana dashboard
tlvu Jun 18, 2020
a5f8ecf
monitoring: fix Memory grafana dashboard
tlvu Jun 18, 2020
46cc1ec
monitoring: fix Swap and server variable in grafana dashboard
tlvu Jun 18, 2020
f5821ee
monitoring: fix Used Disk Space grafana dashboard
tlvu Jun 18, 2020
bead29c
monitoring: remove legend from Load and Used Disk Space since hover w…
tlvu Jun 18, 2020
04a956d
monitoring: generalize Disk Space gauge panel
tlvu Jun 18, 2020
faaa4e3
monitoring: fix Available Memory grafana graph
tlvu Jun 18, 2020
93ad45b
monitoring: fix Disk I/O graph
tlvu Jun 18, 2020
6db38aa
monitoring: show time on X-axis for Network Traffic and CPU Usage graph
tlvu Jun 18, 2020
2335d1a
monitoring: show total memory on Available Memory graph
tlvu Jun 18, 2020
b9215dd
monitoring: swap position of Memory Usage per Container and Memory Sw…
tlvu Jun 18, 2020
fb151af
monitoring: add Disk Usage per Container graph
tlvu Jun 18, 2020
39c577b
README: add instructions how to enable the monitoring stack
tlvu Jun 20, 2020
6 changes: 6 additions & 0 deletions birdhouse/README.md
@@ -44,6 +44,12 @@ enabled and configured in the `env.local` file (a copy from
desired, full documentation in [`env.local.example`](env.local.example).
* Run once [`fix-write-perm`](deployment/fix-write-perm), see doc in script.

Resource usage monitoring (CPU, memory, ...) for the host and each container
can be turned on by enabling the `./components/monitoring` component in the `env.local` file.

* Add `./components/monitoring` to `EXTRA_CONF_DIRS`.
* Set `GRAFANA_ADMIN_PASSWORD` to a value of your own.

To launch all the containers, use the following command:
```
./pavics-compose.sh up -d
```
3 changes: 3 additions & 0 deletions birdhouse/components/monitoring/.gitignore
@@ -0,0 +1,3 @@
prometheus.yml
grafana_datasources.yml
grafana_dashboards.yml
79 changes: 79 additions & 0 deletions birdhouse/components/monitoring/docker-compose-extra.yml
@@ -0,0 +1,79 @@
version: '2.1'

services:
  # https://github.com/google/cadvisor/blob/master/docs/running.md
  # Collect per container metrics.
  cadvisor:
    image: gcr.io/google-containers/cadvisor:v0.36.0
    container_name: cadvisor
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker:/var/lib/docker:ro
    ports:
      - 9999:8080
    devices:
      - /dev/kmsg
    restart: unless-stopped

  # https://github.com/prometheus/node_exporter
  # Collect system-wide metrics.
  node-exporter:
    image: quay.io/prometheus/node-exporter:v1.0.0
    container_name: node-exporter
    volumes:
      - /:/host:ro,rslave
    ports:
      - 9100:9100
    network_mode: "host"
    pid: "host"
    command: --path.rootfs=/host
    restart: unless-stopped

  # https://prometheus.io/docs/prometheus/latest/installation
  # Monitor and store collected metrics.
  prometheus:
    image: prom/prometheus:v2.19.0
    container_name: prometheus
    volumes:
      - ./components/monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus_persistence:/prometheus:rw
    ports:
      - 9090:9090
    command:
      # restore original CMD from image
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --web.console.libraries=/usr/share/prometheus/console_libraries
      - --web.console.templates=/usr/share/prometheus/consoles
      # https://prometheus.io/docs/prometheus/latest/storage/
      - --storage.tsdb.retention.time=90d
    restart: unless-stopped

  # https://grafana.com/docs/grafana/latest/installation/docker/
  # https://grafana.com/docs/grafana/latest/installation/configure-docker/
  # Visualize metrics from Prometheus
  grafana:
    image: grafana/grafana:7.0.3
    container_name: grafana
    volumes:
      - ./components/monitoring/grafana_datasources.yml:/etc/grafana/provisioning/datasources/grafana_datasources.yml:ro
      - ./components/monitoring/grafana_dashboards.yml:/etc/grafana/provisioning/dashboards/grafana_dashboards.yml:ro
      - ./components/monitoring/grafana_dashboards:/etc/grafana/dashboards:ro
      - grafana_persistence:/var/lib/grafana:rw
    environment:
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_ADMIN_PASSWORD}
    ports:
      - 3001:3000
    restart: unless-stopped

volumes:
  prometheus_persistence:
    external:
      name: prometheus_persistence
  grafana_persistence:
    external:
      name: grafana_persistence

# vi: tabstop=8 expandtab shiftwidth=2 softtabstop=2
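
The compose file above mounts `./components/monitoring/prometheus.yml`, which is listed in `.gitignore` and is not shown in this diff. As a rough sketch only, the scrape configuration might look like the following. It assumes Prometheus reaches cAdvisor by its service name on the compose network and reaches node-exporter through the host's own address, since node-exporter runs with `network_mode: "host"`. The job names, the 15s interval, and the `host.example.org` placeholder are all assumptions, not values taken from the PR.

```yaml
# Hypothetical prometheus.yml; the real file is generated and not part of this diff.
global:
  scrape_interval: 15s        # assumed value, not from the PR

scrape_configs:
  # Per-container metrics. cAdvisor listens on 8080 inside its container
  # (published on the host as 9999 in the compose file).
  - job_name: cadvisor        # assumed job name
    static_configs:
      - targets:
          - cadvisor:8080

  # System-wide metrics. node-exporter uses host networking, so it is
  # addressed through the host rather than a compose service name.
  - job_name: node-exporter   # assumed job name
    static_configs:
      - targets:
          - host.example.org:9100   # placeholder for the Docker host's name or IP
```

If cAdvisor is not reachable by service name in a given setup, the published host port (`9999` in the compose file) would be the alternative target. Also note that, as far as the Docker documentation describes it, published ports are ignored for a container in host network mode, so node-exporter listens directly on the host's port 9100.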
13 changes: 13 additions & 0 deletions birdhouse/components/monitoring/grafana_dashboards.yml.template
@@ -0,0 +1,13 @@
# https://grafana.com/docs/grafana/latest/administration/provisioning/#dashboards
apiVersion: 1

providers:
  - name: 'default'
    folder: 'Local-PAVICS'
    folderUid: 'local-pavics'
    disableDeletion: false
    type: file
    editable: false
    allowUiUpdates: false
    options:
      path: "/etc/grafana/dashboards"
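
The dashboard provider above only tells Grafana where to find dashboard JSON files. The dashboards still need a datasource to query, and the compose file mounts a separate `grafana_datasources.yml` for that, which is generated and not part of this diff. A minimal sketch of such a file is given below, assuming the datasource is named `Prometheus` and that Grafana reaches Prometheus at `http://prometheus:9090` over the compose network. Both the name and the URL are assumptions.

```yaml
# Hypothetical grafana_datasources.yml; the real file is not shown in this diff.
apiVersion: 1

datasources:
  - name: Prometheus            # assumed name; provisioned dashboards must reference the same name
    type: prometheus
    access: proxy               # Grafana's backend performs the queries
    url: http://prometheus:9090 # assumed: compose service name and container port
    isDefault: true
    editable: false             # mirrors the read-only dashboard provider above
```

Whatever name is chosen has to match the datasource name used inside the dashboard JSON, which is what the commit replacing `${DS_PROMETHEUS}` with a real datasource name appears to address.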