Update telemetry adoption guide
yadneshk committed Jul 18, 2024
1 parent 8a4093d commit 782225d
Showing 19 changed files with 349 additions and 122 deletions.
100 changes: 38 additions & 62 deletions docs_user/modules/proc_adopting-autoscaling.adoc
@@ -1,6 +1,6 @@
[id="adopting-autoscaling_{context}"]

= Adopting autoscaling
= Adopting Autoscaling services

Adopting autoscaling means that an existing `OpenStackControlPlane` custom resource (CR), where Aodh services are supposed to be disabled, should be patched to start the service with the configuration parameters provided by the source environment.

@@ -20,87 +20,63 @@ should be already adopted.
. Patch the `OpenStackControlPlane` CR to deploy autoscaling services:
+
----
cat << EOF > aodh_patch.yaml
oc patch openstackcontrolplane openstack --type=merge --patch '
spec:
autoscaling:
telemetry:
enabled: true
prometheus:
deployPrometheus: false
aodh:
customServiceConfig: |
[DEFAULT]
debug=true
secret: osp-secret
ifeval::["{build}" != "downstream"]
apiImage: "quay.io/podified-antelope-centos9/openstack-aodh-api:current-podified"
evaluatorImage: "quay.io/podified-antelope-centos9/openstack-aodh-evaluator:current-podified"
notifierImage: "quay.io/podified-antelope-centos9/openstack-aodh-notifier:current-podified"
listenerImage: "quay.io/podified-antelope-centos9/openstack-aodh-listener:current-podified"
endif::[]
ifeval::["{build}" == "downstream"]
apiImage: "registry.redhat.io/rhosp-dev-preview/openstack-aodh-api-rhel9:18.0"
evaluatorImage: "registry.redhat.io/rhosp-dev-preview/openstack-aodh-evaluator-rhel9:18.0"
notifierImage: "registry.redhat.io/rhosp-dev-preview/openstack-aodh-notifier-rhel9:18.0"
listenerImage: "registry.redhat.io/rhosp-dev-preview/openstack-aodh-listener-rhel9:18.0"
endif::[]
passwordSelectors:
databaseUser: aodh
databaseInstance: openstack
memcachedInstance: memcached
EOF
----

. Optional: If you have previously backed up your {OpenStackShort} services configuration file from the old environment, you can use os-diff to compare and make sure the configuration is correct. This will produce the difference between both ini configuration files:
+
----
os-diff diff /tmp/collect_tripleo_configs/aodh/etc/aodh/aodh.conf aodh_patch.yaml --crd
----
+
For more information, see xref:reviewing-the-openstack-control-plane-configuration_{context}[Reviewing the {rhos_prev_long} control plane configuration].

. Patch the `OpenStackControlPlane` CR to deploy Aodh services:
+
----
oc patch openstackcontrolplane openstack --type=merge --patch-file aodh_patch.yaml
template:
autoscaling:
enabled: true
aodh:
passwordSelectors:
databaseAccount: aodh
databaseInstance: openstack
secret: osp-secret
heatInstance: heat
'
----

.Verification

. If autoscaling services are enabled, inspect Aodh pods:
+
----
AODH_POD=`oc get pods -l service=aodh | tail -n 1 | cut -f 1 -d' '`
AODH_POD=`oc get pods -l service=aodh -n openstack | tail -n 1 | cut -f 1 -d' '`
oc exec -t $AODH_POD -c aodh-api -- cat /etc/aodh/aodh.conf
----

. Check whether Aodh API service is registered in {identity_service}:
+
----
openstack endpoint list | grep aodh
| 6a805bd6c9f54658ad2f24e5a0ae0ab6 | regionOne | aodh | network | True | public | http://aodh-public-openstack.apps-crc.testing |
| b943243e596847a9a317c8ce1800fa98 | regionOne | aodh | network | True | internal | http://aodh-internal.openstack.svc:9696 |
| f97f2b8f7559476bb7a5eafe3d33cee7 | regionOne | aodh | network | True | admin | http://192.168.122.99:9696 |
| d05d120153cd4f9b8310ac396b572926 | regionOne | aodh | alarming | True | internal | http://aodh-internal.openstack.svc:8042 |
| d6daee0183494d7a9a5faee681c79046 | regionOne | aodh | alarming | True | public | http://aodh-public.openstack.svc:8042 |
----

. Create sample resources. You can test whether you can create alarms:
.Autoscaling template adoption

* `PrometheusAlarm` alarm type must be used instead of `GnocchiAggregationByResourcesAlarm`

* Create Aodh alarms of type prometheus
+
----
openstack alarm create \
--name low_alarm \
--type gnocchi_resources_threshold \
--metric cpu \
--resource-id b7ac84e4-b5ca-4f9e-a15c-ece7aaf68987 \
--threshold 35000000000 \
--comparison-operator lt \
--aggregation-method rate:mean \
--granularity 300 \
openstack alarm create --name high_cpu_alarm \
--type prometheus \
--query "(rate(ceilometer_cpu{resource_name=~'cirros'})) * 100" \
--alarm-action 'log://' \
--granularity 15 \
--evaluation-periods 3 \
--alarm-action 'log:\\' \
--ok-action 'log:\\' \
--resource-type instance
--comparison-operator gt \
--threshold 7000000000
----

//=== (TODO)

//* Include adopted autoscaling heat templates
//* Include adopted Aodh alarm create commands of type prometheus
* Verify the state of alarm
+
----
openstack alarm list
+--------------------------------------+------------+------------------+-------------------+----------+---------+
| alarm_id                             | type       | name             | state             | severity | enabled |
+--------------------------------------+------------+------------------+-------------------+----------+---------+
| 209dc2e9-f9d6-40e5-aecc-e767ce50e9c0 | prometheus | prometheus_alarm | ok                | low      | True    |
+--------------------------------------+------------+------------------+-------------------+----------+---------+
----
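
The verification step in this file selects the Aodh pod by taking the last line of `oc get pods` output and cutting its first space-delimited field. A minimal sketch of that text-processing step in isolation follows, fed with canned sample output rather than a live cluster. The helper name `last_pod_name` and the sample pod listing are hypothetical and are not part of the commit.

```shell
# Hypothetical helper: reads `oc get pods`-style text on stdin and prints
# the first space-delimited field (the pod name) of the last line.
last_pod_name() {
  tail -n 1 | cut -f 1 -d' '
}

# Canned sample standing in for: oc get pods -l service=aodh -n openstack
sample_output='NAME     READY   STATUS    RESTARTS   AGE
aodh-0   4/4     Running   0          46s'

AODH_POD=$(printf '%s\n' "$sample_output" | last_pod_name)
echo "$AODH_POD"
```

If the label selector matches no pod, the variable may end up empty, so checking it before passing it to `oc exec` is probably worthwhile.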
@@ -261,6 +261,7 @@ spec:
- nova
- ovn
- neutron-metadata
- telemetry
env:
- name: ANSIBLE_CALLBACKS_ENABLED
value: "profile_tasks"
@@ -426,6 +427,7 @@ spec:
- nova
- ovn
- neutron-metadata
- telemetry
nodeTemplate:
extraMounts:
- extraVolType: Ceph
147 changes: 91 additions & 56 deletions docs_user/modules/proc_adopting-telemetry-services.adoc
@@ -14,93 +14,128 @@ This guide also assumes that:
* Previous Adoption steps completed. MariaDB, the {identity_service_first_ref} and the data plane should be already adopted.
//kgilliga:Should this procedure be moved after the "Adopting the data plane" chapter?

.Procedure

. Patch the `OpenStackControlPlane` CR to deploy Ceilometer services:
// TODO(jistr): There are still some quay.io images in the downstream build.
* Patch the `OpenStackControlPlane` CR to deploy `cluster-observability-operator`:
+
----
cat << EOF > ceilometer_patch.yaml
oc create -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
name: cluster-observability-operator
namespace: openshift-operators
spec:
ceilometer:
enabled: true
template:
ifeval::["{build}" != "downstream"]
centralImage: quay.io/podified-antelope-centos9/openstack-ceilometer-central:current-podified
computeImage: quay.io/podified-antelope-centos9/openstack-ceilometer-compute:current-podified
customServiceConfig: |
[DEFAULT]
debug=true
ipmiImage: quay.io/podified-antelope-centos9/openstack-ceilometer-ipmi:current-podified
nodeExporterImage: quay.io/prometheus/node-exporter:v1.5.0
notificationImage: quay.io/podified-antelope-centos9/openstack-ceilometer-notification:current-podified
secret: osp-secret
sgCoreImage: quay.io/infrawatch/sg-core:v5.1.1
endif::[]
ifeval::["{build}" == "downstream"]
centralImage: registry.redhat.io/rhosp-dev-preview/openstack-ceilometer-central-rhel9:18.0
computeImage: registry.redhat.io/rhosp-dev-preview/openstack-ceilometer-compute-rhel9:18.0
customServiceConfig: |
[DEFAULT]
debug=true
ipmiImage: registry.redhat.io/rhosp-dev-preview/openstack-ceilometer-ipmi-rhel9:18.0
nodeExporterImage: quay.io/prometheus/node-exporter:v1.5.0
notificationImage: registry.redhat.io/rhosp-dev-preview/openstack-ceilometer-notification-rhel9:18.0
secret: osp-secret
sgCoreImage: quay.io/infrawatch/sg-core:v5.1.1
endif::[]
channel: development
installPlanApproval: Automatic
name: cluster-observability-operator
source: redhat-operators
sourceNamespace: openshift-marketplace
EOF
----

. Optional: If you previously backed up your {OpenStackShort} services configuration file from the old environment, you can use os-diff to compare and make sure the configuration is correct. This will produce the difference between both ini configuration files:
* Wait for the installation to succeed
+
----
os-diff diff /tmp/collect_tripleo_configs/ceilometer/etc/ceilometer/ceilometer.conf ceilometer_patch.yaml --crd
oc wait --for jsonpath="{.status.phase}"=Succeeded csv --namespace=openshift-operators -l operators.coreos.com/cluster-observability-operator.openshift-operators
----

* Enable metrics storage backend
+
For more information, see xref:reviewing-the-openstack-control-plane-configuration_{context}[Reviewing the {rhos_prev_long} control plane configuration].
----
oc patch openstackcontrolplane openstack --type=merge --patch '
spec:
telemetry:
template:
metricStorage:
enabled: true
monitoringStack:
alertingEnabled: true
scrapeInterval: 30s
storage:
strategy: persistent
retention: 24h
persistent:
pvcStorageRequest: 20G
'
----

* Verify that `alertmanager` and `prometheus` pods are available
+
----
oc get pods -l alertmanager=metric-storage -n openstack
NAME READY STATUS RESTARTS AGE
alertmanager-metric-storage-0 2/2 Running 0 46s
alertmanager-metric-storage-1 2/2 Running 0 46s
oc get pods -l prometheus=metric-storage -n openstack
NAME READY STATUS RESTARTS AGE
prometheus-metric-storage-0 3/3 Running 0 46s
----

. Patch the `OpenStackControlPlane` CR to deploy Ceilometer services:
.Procedure

* Patch the `OpenStackControlPlane` CR to deploy Ceilometer services:
+
----
oc patch openstackcontrolplane openstack --type=merge --patch-file ceilometer_patch.yaml
oc patch openstackcontrolplane openstack --type=merge --patch '
spec:
telemetry:
enabled: true
template:
ceilometer:
enabled: true
secret: osp-secret
'
----

.Verification

. Inspect the resulting Ceilometer pods:
+
----
CEILOMETETR_POD=`oc get pods -l service=ceilometer | tail -n 1 | cut -f 1 -d' '`
CEILOMETETR_POD=`oc get pods -l service=ceilometer -n openstack | tail -n 1 | cut -f 1 -d' '`
oc exec -t $CEILOMETETR_POD -c ceilometer-central-agent -- cat /etc/ceilometer/ceilometer.conf
----

. Inspect the resulting Ceilometer IPMI agent pod on data plane nodes:
. Inspect enabled pollsters:
+
----
podman ps | grep ceilometer-ipmi
oc get secret ceilometer-config-data -o jsonpath="{.data['polling\.yaml\.j2']}" | base64 -d
----

. Inspect enabled pollsters:
. Optional: Override default pollsters according to requirements:
+
----
oc get secret ceilometer-config-data -o jsonpath="{.data['polling\.yaml']}" | base64 -d
oc patch openstackcontrolplane controlplane --type=merge --patch '
spec:
telemetry:
template:
ceilometer:
defaultConfigOverwrite:
polling.yaml.j2: |
---
sources:
- name: pollsters
interval: 100
meters:
- volume.*
- image.size
enabled: true
secret: osp-secret
'
----

. Enable pollsters according to requirements:
* Patch the `OpenStackControlPlane` CR to include `logging`
+
----
cat << EOF > polling.yaml
---
sources:
- name: pollsters
interval: 300
meters:
- volume.size
- image.size
- cpu
- memory
EOF
oc patch secret ceilometer-config-data --patch="{\"data\": { \"polling.yaml\": \"$(base64 -w0 polling.yaml)\"}}"
oc patch openstackcontrolplane openstack --type=merge --patch '
spec:
telemetry:
template:
logging:
enabled: false
network: internalapi
ipaddr: 172.17.0.80
port: 10514
cloNamespace: openshift-logging
'
----
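
The pollster inspection step pipes a Secret value through `base64 -d` because Kubernetes stores Secret `data` values base64-encoded. The sketch below shows that round trip locally, without a cluster. The indentation of the sample polling definition is an assumption, since the diff view above has lost the original YAML indentation, and the variable names are illustrative only.

```shell
# Sample polling definition modelled on the override in the diff
# (indentation assumed; the diff view flattened the original YAML).
polling_yaml='---
sources:
  - name: pollsters
    interval: 100
    meters:
      - volume.*
      - image.size'

# Encode the way a Secret data value is stored: base64 on a single line.
encoded=$(printf '%s\n' "$polling_yaml" | base64 | tr -d '\n')

# Decode again, mirroring the trailing `| base64 -d` of the inspection command.
decoded=$(printf '%s' "$encoded" | base64 -d)

printf '%s\n' "$decoded"
```

Piping through `tr -d '\n'` is used here instead of the GNU-specific `base64 -w0` flag so that the sketch does not depend on one `base64` implementation's wrapping option.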
2 changes: 0 additions & 2 deletions docs_user/modules/proc_deploying-backend-services.adoc
@@ -56,7 +56,6 @@ passwords can be extracted like this:
----
AODH_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' AodhPassword:' | awk -F ': ' '{ print $2; }')
BARBICAN_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' BarbicanPassword:' | awk -F ': ' '{ print $2; }')
CEILOMETER_METERING_SECRET=$(cat ~/tripleo-standalone-passwords.yaml | grep ' CeilometerMeteringSecret:' | awk -F ': ' '{ print $2; }')
CEILOMETER_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' CeilometerPassword:' | awk -F ': ' '{ print $2; }')
CINDER_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' CinderPassword:' | awk -F ': ' '{ print $2; }')
GLANCE_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' GlancePassword:' | awk -F ': ' '{ print $2; }')
@@ -106,7 +105,6 @@ account passwords from the original deployment:
----
oc set data secret/osp-secret "AodhPassword=$AODH_PASSWORD"
oc set data secret/osp-secret "BarbicanPassword=$BARBICAN_PASSWORD"
oc set data secret/osp-secret "CeilometerMeteringSecret=$CEILOMETER_METERING_SECRET"
oc set data secret/osp-secret "CeilometerPassword=$CEILOMETER_PASSWORD"
oc set data secret/osp-secret "CinderPassword=$CINDER_PASSWORD"
oc set data secret/osp-secret "GlancePassword=$GLANCE_PASSWORD"
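
Each password in this file is extracted with the same `grep` and `awk` pattern. As a sketch only, that pattern can be wrapped in a small function and exercised against a throwaway file. The function name `get_password`, the layout of the demo file, and its values are made up for illustration and are not taken from a real `tripleo-standalone-passwords.yaml`.

```shell
# Hypothetical helper: print the value stored under key $1 in file $2,
# using the same grep/awk pattern as the lines in the diff.
get_password() {
  grep " $1:" "$2" | awk -F ': ' '{ print $2; }'
}

# Throwaway demo file with made-up values (assumed layout).
demo_file=$(mktemp)
cat > "$demo_file" << 'EOF'
parameter_defaults:
  AodhPassword: example-aodh
  CeilometerPassword: example-ceilometer
EOF

AODH_PASSWORD=$(get_password AodhPassword "$demo_file")
CEILOMETER_PASSWORD=$(get_password CeilometerPassword "$demo_file")
echo "$AODH_PASSWORD $CEILOMETER_PASSWORD"

rm -f "$demo_file"
```

The leading space in the `grep` pattern is kept from the original commands; it avoids matching a key whose name merely ends in the requested one.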
13 changes: 12 additions & 1 deletion docs_user/modules/proc_stopping-openstack-services.adoc
@@ -80,7 +80,14 @@ environmental variables and function:

----
# Update the services list to be stopped
ServicesToStop=("tripleo_horizon.service"
ServicesToStop=("tripleo_aodh_api.service"
"tripleo_aodh_api_cron.service"
"tripleo_aodh_evaluator.service"
"tripleo_aodh_listener.service"
"tripleo_aodh_notifier.service"
"tripleo_ceilometer_agent_central.service"
"tripleo_ceilometer_agent_notification.service"
"tripleo_horizon.service"
"tripleo_keystone.service"
"tripleo_barbican_api.service"
"tripleo_barbican_worker.service"
@@ -90,7 +97,11 @@ ServicesToStop=("tripleo_horizon.service"
"tripleo_cinder_scheduler.service"
"tripleo_cinder_volume.service"
"tripleo_cinder_backup.service"
"tripleo_collectd.service"
"tripleo_glance_api.service"
"tripleo_gnocchi_api.service"
"tripleo_gnocchi_metricd.service"
"tripleo_gnocchi_statsd.service"
"tripleo_manila_api.service"
"tripleo_manila_api_cron.service"
"tripleo_manila_scheduler.service"
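
This commit adds eleven telemetry-related units (Aodh, Ceilometer, collectd, and Gnocchi) to `ServicesToStop`. The dry-run sketch below lists only those added units and prints the stop command for each instead of running it. It uses a plain whitespace-separated list rather than the bash array from the procedure so that it also runs under a POSIX `sh`; the helper name `print_stop_commands` and the exact `sudo systemctl stop` invocation are assumptions, not the procedure's own function.

```shell
# Units added to ServicesToStop by this commit (copied from the diff).
TELEMETRY_SERVICES="tripleo_aodh_api.service
tripleo_aodh_api_cron.service
tripleo_aodh_evaluator.service
tripleo_aodh_listener.service
tripleo_aodh_notifier.service
tripleo_ceilometer_agent_central.service
tripleo_ceilometer_agent_notification.service
tripleo_collectd.service
tripleo_gnocchi_api.service
tripleo_gnocchi_metricd.service
tripleo_gnocchi_statsd.service"

# Hypothetical dry-run helper: prints one command per unit, runs nothing.
print_stop_commands() {
  for service in $1; do
    echo "sudo systemctl stop $service"
  done
}

print_stop_commands "$TELEMETRY_SERVICES"
```

Printing the commands first makes it easy to review which units would be affected on a given controller before anything is actually stopped.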
6 changes: 6 additions & 0 deletions tests/playbooks/test_minimal.yaml
@@ -61,6 +61,12 @@
- role: heat_adoption
tags:
- heat_adoption
- role: telemetry_adoption
tags:
- telemetry_adoption
- role: autoscaling_adoption
tags:
- autoscaling_adoption
- role: stop_remaining_services
tags:
- stop_remaining_services