feat: add support for rendering an optional PrometheusOperator ServiceMonitor resource definition #271

Merged
merged 21 commits into from
Aug 22, 2023
Commits
029df12
feat: add helper scripts to manage prometheus deployment
leninmehedy Aug 13, 2023
07fe722
fix: use network-node-svc label selector
leninmehedy Aug 13, 2023
cc4eac2
fix: install prometheus operator only if crds are not installed yet
leninmehedy Aug 14, 2023
36c9145
feat: deploy prometheus service monitor conditionally along with hede…
leninmehedy Aug 14, 2023
a59ab4b
style: add missing new lines
leninmehedy Aug 14, 2023
3fa482d
fix: allow prometheus service monitor endpoints to be set by user
leninmehedy Aug 14, 2023
29a7b8a
Merge branch 'main' into 227-prometheus-servicemonitor
leninmehedy Aug 14, 2023
77e427d
ci: setup prometheus operator in CI/CD pipeline
leninmehedy Aug 14, 2023
4a68fa2
fix: update dev script to deploy prometheus operator locally if not i…
leninmehedy Aug 14, 2023
3664f81
style: fix spotless lint issue
leninmehedy Aug 14, 2023
cfcd596
fix: function definition in bash script for consistency
leninmehedy Aug 14, 2023
ee18328
Merge branch 'main' into 227-prometheus-servicemonitor
leninmehedy Aug 16, 2023
54e5cb3
Update charts/hedera-network/templates/services/network-node-svc.yaml
leninmehedy Aug 21, 2023
f5d5fc9
fix: remove otel ports from value file
leninmehedy Aug 21, 2023
e68165a
Merge branch 'main' into 227-prometheus-servicemonitor
leninmehedy Aug 21, 2023
9cb82c2
fix: port name and updated README for manual tests
leninmehedy Aug 21, 2023
59248e6
fix: port name
leninmehedy Aug 21, 2023
bff50ba
fix: install minio operator if not installed already during network d…
leninmehedy Aug 21, 2023
00d9b9c
fix: increase timeout during example app deployment
leninmehedy Aug 21, 2023
2072ad3
fix: only expose OTel metrics port from node svc
leninmehedy Aug 22, 2023
5e57f43
fix: update health-check port name for otel collector
leninmehedy Aug 22, 2023
7 changes: 7 additions & 0 deletions .github/workflows/zxc-compile-code.yaml
@@ -106,11 +106,18 @@ jobs:
with:
version: "v3.12.1" # helm version

- name: Setup Kubernetes Operators
working-directory: dev
if: ${{ inputs.enable-unit-tests && !cancelled() && !failure() }}
run: |
make deploy-prometheus-operator

- name: Kubernetes Cluster Info
if: ${{ inputs.enable-unit-tests && !cancelled() }}
run: |
kubectl config set-context --current --namespace=default
kubectl config get-contexts
kubectl get crd
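The `kubectl get crd` step above only prints the CRD list. If the pipeline should fail early when the operator setup did not register the ServiceMonitor CRD, a check along these lines could be added. This is a minimal sketch: `has_servicemonitor_crd` is a hypothetical helper that is not part of this PR, and it assumes the CRD is registered under its usual name `servicemonitors.monitoring.coreos.com`.

```shell
# Hypothetical helper (not part of this PR): succeeds when the CRD listing
# piped on stdin mentions the ServiceMonitor CRD.
has_servicemonitor_crd() {
  grep -qF "servicemonitors.monitoring.coreos.com"
}

# Usage (needs cluster access), kept as a comment so the block has no side effects:
# kubectl get crd | has_servicemonitor_crd || { echo "ServiceMonitor CRD missing" >&2; exit 1; }
```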

- name: Authenticate to Google Cloud
id: google-auth
@@ -4,6 +4,9 @@ apiVersion: v1
kind: Service
metadata:
name: network-{{ $nodeConfig.name }}-svc
labels:
fullstack.hedera.com/type: network-node-svc
fullstack.hedera.com/node-name: {{ $nodeConfig.name }}
spec:
selector:
app: network-{{ $nodeConfig.name }}
@@ -20,4 +23,8 @@ spec:
protocol: TCP
port: 50212 # tls grpc client port
targetPort: 50212
- name: otel-metrics
protocol: TCP
port: 8888
targetPort: 8888
{{- end }}
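The two labels added to the node Service make it selectable by type and by node name. A small sketch of building that label selector in shell follows; `node_service_selector` is a hypothetical helper, and `node0` is only an assumed node name.

```shell
# Hypothetical helper: print a label selector matching the Service of one
# network node, using the two labels added in this template.
node_service_selector() {
  local node_name="$1"
  printf 'fullstack.hedera.com/type=network-node-svc,fullstack.hedera.com/node-name=%s\n' "${node_name}"
}

# Usage (needs cluster access):
# kubectl get svc -l "$(node_service_selector node0)"
```

Dropping the `node-name` part of the selector would match every node Service, which is what the ServiceMonitor selector in this PR relies on.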
12 changes: 9 additions & 3 deletions charts/hedera-network/templates/sidecars/_otel-collector.tpl
@@ -7,10 +7,16 @@
imagePullPolicy: {{ include "fullstack.images.pullPolicy" (dict "image" $otel.image "defaults" $defaults) }}
securityContext:
{{- include "fullstack.root.security.context" . | nindent 4 }}
{{- with default $defaults.ports $otel.ports }}
ports:
{{- toYaml . | nindent 4 }}
{{- end }}
- name: otel-health
containerPort: 13133
protocol: TCP
- name: otel-metrics
containerPort: 8888
protocol: TCP
- name: otel-otlp
containerPort: 4317
protocol: TCP
{{- with default $defaults.livenessProbe $otel.livenessProbe }}
livenessProbe:
{{- toYaml . | nindent 4 }}
@@ -0,0 +1,15 @@
{{- if $.Values.telemetry.prometheus.svcMonitor.enable | eq "true" }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: network-node-monitor
  labels:
    fullstack.hedera.com/type: network-node-svc-monitor
spec:
  selector:
    matchLabels:
      fullstack.hedera.com/type: network-node-svc
  endpoints:
    - port: otel-metrics
      interval: 5s
{{- end }}
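The `port: otel-metrics` entry under `endpoints` refers to a Service port by name, so scraping depends on the node Service exposing a port with exactly that name. A quick sanity check could look like the sketch below; `service_has_port_name` is a hypothetical helper, and `network-node0-svc` assumes a node named `node0`.

```shell
# Hypothetical helper: succeeds when the wanted port name ($1) appears among
# the remaining arguments (the port names exposed by a Service).
service_has_port_name() {
  local wanted="$1" name
  shift
  for name in "$@"; do
    if [ "${name}" = "${wanted}" ]; then
      return 0
    fi
  done
  return 1
}

# Usage (needs cluster access); the jsonpath prints space-separated port names:
# names=$(kubectl get svc network-node0-svc -o jsonpath='{.spec.ports[*].name}')
# service_has_port_name otel-metrics ${names} || echo "otel-metrics port is not exposed" >&2
```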
22 changes: 9 additions & 13 deletions charts/hedera-network/values.yaml
@@ -1,4 +1,4 @@
# WARNING: Except numbers use double quote for all values. This is because we need to be careful about booleans.
# WARNING: Use double quotes for all values. This is because we need to be careful about booleans.

# cloud configuration
cloud:
@@ -8,6 +8,12 @@ cloud:
streamBucket: "fst-streams"
backupBucket: "fst-backups"

# telemetry configurations
telemetry:
  prometheus:
    svcMonitor:
      enable: "true"
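Because the template compares this value against the string `"true"`, command-line overrides probably need to stay strings as well. Helm's `--set` parses `true` and `false` as booleans, while `--set-string` keeps them as strings, so the latter looks like the safer choice here. The sketch below builds such an override; `svc_monitor_flag` is a hypothetical helper, not something this PR adds.

```shell
# Hypothetical helper: print a Helm override that keeps the flag a string,
# matching the string comparison used by the ServiceMonitor template.
svc_monitor_flag() {
  case "$1" in
    true|false)
      printf '%s\n' "--set-string telemetry.prometheus.svcMonitor.enable=$1"
      ;;
    *)
      echo "expected 'true' or 'false', got '$1'" >&2
      return 1
      ;;
  esac
}

# Usage (release name and chart path are placeholders):
# helm upgrade --install <release> <chart-dir> $(svc_monitor_flag false)
```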

# reduce default termination grace period
terminationGracePeriodSeconds: 10

@@ -140,24 +146,14 @@ defaults:
repository: "otel/opentelemetry-collector-contrib"
tag: "0.72.0"
pullPolicy: "IfNotPresent"
ports:
- name: healthcheck
containerPort: 13133
protocol: TCP
- name: metrics
containerPort: 8888
protocol: TCP
- name: otlp
containerPort: 4317
protocol: TCP
livenessProbe:
httpGet:
path: /
port: healthcheck
port: otel-health
readinessProbe:
httpGet:
path: /
port: healthcheck
port: otel-health
resources: {}

# This configures the minio tenant subchart
36 changes: 34 additions & 2 deletions dev/Makefile
@@ -8,13 +8,22 @@ SHELLOPTS:=$(if $(SHELLOPTS),$(SHELLOPTS):)pipefail:errexit
.ONESHELL:

# Here we tell make not to output the actual command before execution in order to reduce noise in the logs.
.SILENT: setup setup-cluster deploy-chart helm-test deploy-network destroy-test-container destroy-network test
.SILENT: \
setup \
setup-cluster \
deploy-chart \
helm-test \
deploy-network \
destroy-test-container \
destroy-network test


# Setup variables
SCRIPTS_DIR=$(PWD)/scripts
CHART_DIR=$(PWD)/../charts/hedera-network
SCRIPT_NAME=direct-install.sh
TMP_DIR=${SCRIPTS_DIR}/../temp
TELEMETRY_SCRIPT="telemetry.sh"

.PHONY: all
all: setup setup-cluster reset
@@ -33,7 +42,7 @@ update-helm-dependencies:
helm dependency update ../charts/hedera-network

.PHONY: deploy-chart
deploy-chart:
deploy-chart: deploy-minio-operator-if-required deploy-prometheus-operator
echo ">> Deploying helm chart..." && \
echo "" && \
if [ "${SCRIPT_NAME}" = "nmt-install.sh" ]; then \
@@ -128,6 +137,29 @@ restart: stop-nodes start-nodes
.PHONY: reset
reset: destroy-network start

######################################### Prometheus #################################
.PHONY: fetch-prometheus-operator-bundle
fetch-prometheus-operator-bundle:
source "${SCRIPTS_DIR}/${TELEMETRY_SCRIPT}" && fetch-prometheus-operator-bundle

.PHONY: deploy-prometheus-operator
deploy-prometheus-operator: fetch-prometheus-operator-bundle
source "${SCRIPTS_DIR}/${TELEMETRY_SCRIPT}" && deploy-prometheus-operator

.PHONY: destroy-prometheus-operator
destroy-prometheus-operator:
source "${SCRIPTS_DIR}/${TELEMETRY_SCRIPT}" && destroy-prometheus-operator

.PHONY: deploy-prometheus
deploy-prometheus: deploy-prometheus-operator
source "${SCRIPTS_DIR}/${TELEMETRY_SCRIPT}" && deploy-prometheus

.PHONY: destroy-prometheus
destroy-prometheus:
-source "${SCRIPTS_DIR}/${TELEMETRY_SCRIPT}" && destroy-prometheus
make destroy-prometheus-operator
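Each Prometheus target above follows the same pattern: source the telemetry script, then call one of its functions. If that pattern were factored out, it might look like the sketch below. `run_script_function` is a hypothetical helper; it runs in a subshell, which mirrors make starting a fresh shell per recipe and avoids re-declaring the script's `readonly` variables in the caller.

```shell
# Hypothetical helper: source a script in a subshell and call one of the
# functions it defines, failing loudly if the script or function is missing.
run_script_function() (
  script="$1"
  fn="$2"
  shift 2
  if [ ! -f "${script}" ]; then
    echo "ERROR: script not found: ${script}" >&2
    exit 1
  fi
  . "${script}" || exit 1
  if ! command -v "${fn}" >/dev/null 2>&1; then
    echo "ERROR: ${fn} is not available after sourcing ${script}" >&2
    exit 1
  fi
  "${fn}" "$@"
)

# Usage (paths as defined in this Makefile):
# run_script_function "${SCRIPTS_DIR}/telemetry.sh" deploy-prometheus-operator
```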

######################################### MinIO #################################
.PHONY: deploy-minio-operator
deploy-minio-operator:
@echo ">> Deploying minio operator..."; \
16 changes: 14 additions & 2 deletions dev/scripts/helper.sh
@@ -305,9 +305,21 @@ function prep_address_book() {
local addresses=()
for node_name in "${NODE_NAMES[@]}"; do
local pod="network-${node_name}-0" # pod name
while [[ $(kubectl get pod ${pod} -o 'jsonpath={..status.conditions[?(@.type=="Ready")].status}') != "True" ]]; do
echo "waiting for pod" && sleep 1
local max_attempts=$MAX_ATTEMPTS
local attempts=0
local status=$(kubectl get pod "${pod}" -o 'jsonpath={..status.conditions[?(@.type=="Ready")].status}')

while [[ "${attempts}" -lt "${max_attempts}" && "${status}" != "True" ]]; do
kubectl get pod "${pod}" -o 'jsonpath={..status.conditions[?(@.type=="Ready")]}'

echo ""
echo "Waiting for the pod to be ready - ${pod}: Attempt# ${attempts}/${max_attempts} ..."
sleep 5

status=$(kubectl get pod "${pod}" -o 'jsonpath={..status.conditions[?(@.type=="Ready")].status}')
attempts=$((attempts + 1))
done

echo "${KCTL} get pod ${pod} -o jsonpath='{.status.podIP}' | xargs"
local POD_IP=$("${KCTL}" get pod "${pod}" -o jsonpath='{.status.podIP}' | xargs)
if [ -z "${POD_IP}" ]; then
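The bounded wait loop in this hunk could also be expressed as a generic retry helper. The version below is a sketch under assumptions, not what the PR implements: `retry_until` and `pod_ready` are hypothetical names, and unlike the loop above it returns a non-zero status once the attempts are exhausted, so the caller can stop instead of continuing with a pod that never became ready.

```shell
# Hypothetical helper: run a command up to max_attempts times, sleeping
# delay seconds between attempts; returns 0 on the first success, 1 otherwise.
retry_until() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  while [ "${attempt}" -le "${max_attempts}" ]; do
    if "$@"; then
      return 0
    fi
    echo "Attempt ${attempt}/${max_attempts} failed: $*" >&2
    attempt=$((attempt + 1))
    if [ "${attempt}" -le "${max_attempts}" ]; then
      sleep "${delay}"
    fi
  done
  return 1
}

# Usage (needs cluster access); pod_ready is an illustrative wrapper:
# pod_ready() { [ "$(kubectl get pod "$1" -o 'jsonpath={..status.conditions[?(@.type=="Ready")].status}')" = "True" ]; }
# retry_until "${MAX_ATTEMPTS}" 5 pod_ready "network-node0-0" || exit 1
```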
94 changes: 94 additions & 0 deletions dev/scripts/telemetry.sh
@@ -0,0 +1,94 @@
#!/usr/bin/env bash

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" &>/dev/null && pwd)"
readonly SCRIPT_DIR
readonly TELEMETRY_DIR="${SCRIPT_DIR}/../telemetry"
readonly PROMETHEUS_DIR="${TELEMETRY_DIR}/prometheus"

# Run the below command to retrieve the latest version
# curl -s "https://api.github.com/repos/prometheus-operator/prometheus-operator/releases/latest" | jq -cr .tag_name
readonly PROMETHEUS_VERSION=v0.67.1
readonly PROMETHEUS_OPERATOR_YAML="${PROMETHEUS_DIR}/prometheus-operator.yaml"
readonly PROMETHEUS_YAML="${PROMETHEUS_DIR}/prometheus.yaml"
readonly PROMETHEUS_RBAC_YAML="${PROMETHEUS_DIR}/prometheus-rbac.yaml"
readonly PROMETHEUS_EXAMPLE_APP_YAML="${PROMETHEUS_DIR}/example-app.yaml"

function fetch-prometheus-operator-bundle() {
if [[ ! -f "${PROMETHEUS_OPERATOR_YAML}" ]]; then \
echo ""
echo "Fetching prometheus bundle: https://github.com/prometheus-operator/prometheus-operator/releases/download/${PROMETHEUS_VERSION}/bundle.yaml"
echo "PROMETHEUS_OPERATOR_YAML: ${PROMETHEUS_OPERATOR_YAML}"
echo "-----------------------------------------------------------------------------------------------------"
echo "Fetching prometheus bundle: https://github.com/prometheus-operator/prometheus-operator/releases/download/${PROMETHEUS_VERSION}/bundle.yaml > ${PROMETHEUS_OPERATOR_YAML}"
curl -sL --fail-with-body "https://github.com/prometheus-operator/prometheus-operator/releases/download/${PROMETHEUS_VERSION}/bundle.yaml" -o "${PROMETHEUS_OPERATOR_YAML}"
local status="$?"
[[ "${status}" != 0 ]] && rm "${PROMETHEUS_OPERATOR_YAML}" && echo "ERROR: Failed to fetch prometheus bundle"
return "${status}"
fi
}
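The fetch function above removes the bundle file when the download fails, so a later run does not mistake a partial file for a valid bundle. An alternative that never leaves a partial file at the final path is to download to a temporary file and move it into place only on success. The sketch below assumes that approach; `fetch_atomically` is a hypothetical helper and takes the download command as arguments, with the temporary path appended as the last argument.

```shell
# Hypothetical helper: run a download command that writes to a temp file and
# move the result to dest only when the command succeeds.
fetch_atomically() {
  local dest="$1"
  shift
  local tmp
  tmp="$(mktemp "${dest}.XXXXXX")" || return 1
  if "$@" "${tmp}"; then
    mv "${tmp}" "${dest}"
  else
    local status="$?"
    rm -f "${tmp}"
    echo "ERROR: download failed with status ${status}" >&2
    return "${status}"
  fi
}

# Usage (needs network access); -o receives the temp path appended by the helper:
# fetch_atomically "${PROMETHEUS_OPERATOR_YAML}" \
#   curl -sL --fail "https://github.com/prometheus-operator/prometheus-operator/releases/download/${PROMETHEUS_VERSION}/bundle.yaml" -o
```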

function deploy-prometheus-operator() {
echo ""
echo "Deploying prometheus operator"
echo "PROMETHEUS_OPERATOR_YAML: ${PROMETHEUS_OPERATOR_YAML}"
echo "-----------------------------------------------------------------------------------------------------"
local crd_count=$(kubectl get crd | grep "monitoring.coreos.com" | wc -l)
if [[ $crd_count -ne 10 ]]; then
kubectl create -f "${PROMETHEUS_OPERATOR_YAML}"
kubectl wait --for=condition=Ready pods -l app.kubernetes.io/name=prometheus-operator -n default
else
echo "Prometheus operator CRDs are already installed"
fi
}

function destroy-prometheus-operator() {
echo ""
echo "Destroying prometheus operator"
echo "PROMETHEUS_OPERATOR_YAML: ${PROMETHEUS_OPERATOR_YAML}"
echo "-----------------------------------------------------------------------------------------------------"
kubectl delete -f "${PROMETHEUS_OPERATOR_YAML}"
sleep 10
}

function deploy-prometheus() {
echo ""
echo "Deploying prometheus"
echo "PROMETHEUS_RBAC_YAML: ${PROMETHEUS_RBAC_YAML}"
echo "PROMETHEUS_YAML: ${PROMETHEUS_YAML}"
echo "-----------------------------------------------------------------------------------------------------"
kubectl create -f "${PROMETHEUS_RBAC_YAML}"
sleep 10
kubectl create -f "${PROMETHEUS_YAML}"
echo "Waiting for prometheus to be running..."
kubectl wait --for=condition=Ready pods -l app.kubernetes.io/name=prometheus -n default --timeout 300s
}

function destroy-prometheus() {
echo ""
echo "Destroying prometheus"
echo "PROMETHEUS_RBAC_YAML: ${PROMETHEUS_RBAC_YAML}"
echo "PROMETHEUS_YAML: ${PROMETHEUS_YAML}"
echo "-----------------------------------------------------------------------------------------------------"
kubectl delete -f "${PROMETHEUS_YAML}"
kubectl delete -f "${PROMETHEUS_RBAC_YAML}"
sleep 5
}

function deploy-prometheus-example-app() {
echo ""
echo "Deploying prometheus-example-app"
echo "PROMETHEUS_EXAMPLE_APP_YAML: ${PROMETHEUS_EXAMPLE_APP_YAML}"
echo "-----------------------------------------------------------------------------------------------------"
kubectl create -f "${PROMETHEUS_EXAMPLE_APP_YAML}"
kubectl wait --for=condition=Ready pods -l app=prometheus-example-app -n default --timeout 60s
}

function destroy-prometheus-example-app() {
echo ""
echo "Destroying prometheus-example-app"
echo "PROMETHEUS_EXAMPLE_APP_YAML: ${PROMETHEUS_EXAMPLE_APP_YAML}"
echo "-----------------------------------------------------------------------------------------------------"
kubectl delete -f "${PROMETHEUS_EXAMPLE_APP_YAML}"
local status="$?"
[[ "${status}" = 0 ]] && sleep 10
}
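The `kubectl wait` calls in this script take `--timeout` as a duration such as `300s`, and a bare number without a unit may be rejected. If timeouts were ever passed in from variables, a small normalizer could guard against that. This is a sketch only; `normalize_timeout` is a hypothetical helper, and treating a bare integer as seconds is an assumption.

```shell
# Hypothetical helper: append "s" to a bare integer so it reads as seconds;
# values that already carry a unit (for example 5m) pass through unchanged.
normalize_timeout() {
  case "$1" in
    ''|*[!0-9]*) printf '%s\n' "$1" ;;
    *) printf '%ss\n' "$1" ;;
  esac
}

# Usage (needs cluster access):
# kubectl wait --for=condition=Ready pods -l app=prometheus-example-app -n default --timeout "$(normalize_timeout 60)"
```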
59 changes: 59 additions & 0 deletions dev/telemetry/prometheus/Makefile
@@ -0,0 +1,59 @@
# Force the use of bash as the shell for more features
SHELL=/bin/bash

# Ensure we can catch error to run cleanup when multiple make commands are run in sequence.
# Here we tell make to run all scripts as one-shell and also set 'pipefail' and 'errexit' flags.
# https://stackoverflow.com/questions/28597794/how-can-i-clean-up-after-an-error-in-a-makefile
SHELLOPTS:=$(if $(SHELLOPTS),$(SHELLOPTS):)pipefail:errexit
.ONESHELL:

# Run the below command to retrieve the latest version
# curl -s "https://api.github.com/repos/prometheus-operator/prometheus-operator/releases/latest" | jq -cr .tag_name
PROMETHEUS_VERSION=v0.67.1

SCRIPTS_DIR=$(PWD)/../../scripts
TELEMETRY_SCRIPT="telemetry.sh"

.SILENT: fetch-operator-bundle

.PHONY: fetch-operator-bundle
fetch-operator-bundle:
source "${SCRIPTS_DIR}/${TELEMETRY_SCRIPT}" && fetch-prometheus-operator-bundle


.PHONY: deploy-prometheus-operator
deploy-prometheus-operator: fetch-operator-bundle
source "${SCRIPTS_DIR}/${TELEMETRY_SCRIPT}" && deploy-prometheus-operator

.PHONY: destroy-prometheus-operator
destroy-prometheus-operator:
source "${SCRIPTS_DIR}/${TELEMETRY_SCRIPT}" && destroy-prometheus-operator

.PHONY: deploy-prometheus
deploy-prometheus: deploy-prometheus-operator
source "${SCRIPTS_DIR}/${TELEMETRY_SCRIPT}" && deploy-prometheus

.PHONY: destroy-prometheus
destroy-prometheus:
source "${SCRIPTS_DIR}/${TELEMETRY_SCRIPT}" && destroy-prometheus

.PHONY: destroy-all
destroy-all:
# Note: - prefix ensures errors are ignored and continues
-${MAKE} destroy-prometheus-example-app
-${MAKE} destroy-prometheus
-${MAKE} destroy-prometheus-operator

.PHONY: deploy-all
deploy-all: deploy-prometheus-example-app deploy-prometheus

# Prometheus example app
.PHONY: deploy-prometheus-example-app
deploy-prometheus-example-app:
source "${SCRIPTS_DIR}/${TELEMETRY_SCRIPT}" && deploy-prometheus-example-app

.PHONY: destroy-prometheus-example-app
destroy-prometheus-example-app:
source "${SCRIPTS_DIR}/${TELEMETRY_SCRIPT}" && destroy-prometheus-example-app


20 changes: 20 additions & 0 deletions dev/telemetry/prometheus/README.md
@@ -0,0 +1,20 @@
# Setup Prometheus
This folder contains helper files to set up a Prometheus instance locally.

## Commands
- Deploy the Prometheus operator
  - `make deploy-prometheus-operator`
- Deploy Prometheus
  - `make deploy-prometheus`
- Deploy the Prometheus example app
  - `make deploy-prometheus-example-app`
- Deploy all
  - `make deploy-all`
- Destroy all
  - `make destroy-all`

## Manual Test
- From the `dev` folder, deploy the network: `make deploy-network`
- From this folder, run `make deploy-all`
- Forward the Prometheus service port: `kubectl port-forward svc/prometheus 9090:9090`
- Browse to `http://localhost:9090/tsdb-status` and ensure the reported statistics are non-zero
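The last manual step can probably be scripted against the Prometheus HTTP API instead of the browser page. The sketch below assumes the `/api/v1/status/tsdb` endpoint and its `headStats.numSeries` field are available in the deployed Prometheus version; `tsdb_num_series` is a hypothetical helper that extracts that number without requiring `jq`.

```shell
# Hypothetical helper: read a TSDB status JSON document on stdin and print
# the first numSeries value it contains (prints nothing if the field is absent).
tsdb_num_series() {
  grep -o '"numSeries":[[:space:]]*[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Usage (needs the port-forward from the manual test to be running):
# series=$(curl -s http://localhost:9090/api/v1/status/tsdb | tsdb_num_series)
# [ "${series:-0}" -gt 0 ] || { echo "no series scraped yet" >&2; exit 1; }
```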