Update telemetry adoption guide #290

Merged
merged 1 commit on Jul 29, 2024
102 changes: 40 additions & 62 deletions docs_user/modules/proc_adopting-autoscaling.adoc
@@ -1,6 +1,6 @@
[id="adopting-autoscaling_{context}"]

= Adopting autoscaling
= Adopting Autoscaling services

Adopting autoscaling means that you patch an existing `OpenStackControlPlane` custom resource (CR), in which the Aodh services are disabled, to start the services with the configuration parameters from the source environment.

@@ -20,87 +20,65 @@ should be already adopted.
. Patch the `OpenStackControlPlane` CR to deploy autoscaling services:
+
----
cat << EOF > aodh_patch.yaml
oc patch openstackcontrolplane openstack --type=merge --patch '
spec:
autoscaling:
telemetry:
enabled: true
prometheus:
deployPrometheus: false
aodh:
customServiceConfig: |
[DEFAULT]
debug=true
secret: osp-secret
ifeval::["{build}" != "downstream"]
apiImage: "quay.io/podified-antelope-centos9/openstack-aodh-api:current-podified"
evaluatorImage: "quay.io/podified-antelope-centos9/openstack-aodh-evaluator:current-podified"
notifierImage: "quay.io/podified-antelope-centos9/openstack-aodh-notifier:current-podified"
listenerImage: "quay.io/podified-antelope-centos9/openstack-aodh-listener:current-podified"
endif::[]
ifeval::["{build}" == "downstream"]
apiImage: "registry.redhat.io/rhosp-dev-preview/openstack-aodh-api-rhel9:18.0"
evaluatorImage: "registry.redhat.io/rhosp-dev-preview/openstack-aodh-evaluator-rhel9:18.0"
notifierImage: "registry.redhat.io/rhosp-dev-preview/openstack-aodh-notifier-rhel9:18.0"
listenerImage: "registry.redhat.io/rhosp-dev-preview/openstack-aodh-listener-rhel9:18.0"
endif::[]
passwordSelectors:
databaseAccount: aodh
databaseInstance: openstack
memcachedInstance: memcached
EOF
----

. Optional: If you previously backed up your {OpenStackShort} services configuration file from the old environment, you can use `os-diff` to compare the two and confirm that the configuration is correct. This produces the differences between the two INI configuration files:
+
----
os-diff diff /tmp/collect_tripleo_configs/aodh/etc/aodh/aodh.conf aodh_patch.yaml --crd
----
+
For more information, see xref:reviewing-the-openstack-control-plane-configuration_{context}[Reviewing the {rhos_prev_long} control plane configuration].

. Patch the `OpenStackControlPlane` CR to deploy Aodh services:
+
----
oc patch openstackcontrolplane openstack --type=merge --patch-file aodh_patch.yaml
template:
autoscaling:
enabled: true
aodh:
passwordSelector:
aodhService: AodhPassword
databaseAccount: aodh
databaseInstance: openstack
secret: osp-secret
serviceUser: aodh
heatInstance: heat
'
----

.Verification

. If the autoscaling services are enabled, inspect the Aodh pods:
+
----
AODH_POD=`oc get pods -l service=aodh | tail -n 1 | cut -f 1 -d' '`
AODH_POD=`oc get pods -l service=aodh -n openstack | tail -n 1 | cut -f 1 -d' '`
oc exec -t $AODH_POD -c aodh-api -- cat /etc/aodh/aodh.conf
----
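The `tail -n 1 | cut` pipeline above depends on the column layout of `oc get pods`, and returns the header word `NAME` when no pod matches. A minimal sketch of a more robust lookup, assuming bash and that any matching pod is acceptable; `first_pod` is a hypothetical helper name, not part of the adoption tooling:

```shell
# Hypothetical helper: print the name of the first pod that matches a label
# selector, using a JSONPath query instead of parsing the table output.
# If no pod matches, oc reports an index error instead of printing "NAME".
first_pod() {
    local selector="$1" namespace="${2:-openstack}"
    oc get pods -l "$selector" -n "$namespace" \
        -o jsonpath='{.items[0].metadata.name}'
}

# Usage: inspect the Aodh configuration in the first matching pod.
# AODH_POD=$(first_pod service=aodh)
# oc exec -t "$AODH_POD" -c aodh-api -- cat /etc/aodh/aodh.conf
```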

. Check whether the Aodh API service is registered in the {identity_service}:
+
----
openstack endpoint list | grep aodh
| 6a805bd6c9f54658ad2f24e5a0ae0ab6 | regionOne | aodh | network | True | public | http://aodh-public-openstack.apps-crc.testing |
| b943243e596847a9a317c8ce1800fa98 | regionOne | aodh | network | True | internal | http://aodh-internal.openstack.svc:9696 |
| f97f2b8f7559476bb7a5eafe3d33cee7 | regionOne | aodh | network | True | admin | http://192.168.122.99:9696 |
| d05d120153cd4f9b8310ac396b572926 | regionOne | aodh | alarming | True | internal | http://aodh-internal.openstack.svc:8042 |
| d6daee0183494d7a9a5faee681c79046 | regionOne | aodh | alarming | True | public | http://aodh-public.openstack.svc:8042 |
----
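Instead of reading the `grep` output by eye, the registration check can fail explicitly when an interface is missing. A minimal sketch, assuming bash, an authenticated `openstack` client, and that the service is registered under the name `aodh`; `check_aodh_endpoints` is a hypothetical helper:

```shell
# Hypothetical helper: succeed only if the aodh service has an endpoint for
# every interface given as an argument (default: public and internal).
check_aodh_endpoints() {
    local interfaces iface
    local -a wanted=("$@")
    if [ "${#wanted[@]}" -eq 0 ]; then
        wanted=(public internal)
    fi
    # One interface name per line, for example "public".
    interfaces=$(openstack endpoint list --service aodh -f value -c Interface) || return 1
    for iface in "${wanted[@]}"; do
        if ! grep -qx -- "$iface" <<<"$interfaces"; then
            echo "missing aodh $iface endpoint" >&2
            return 1
        fi
    done
}

# Usage:
# check_aodh_endpoints public internal
```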

. Create sample resources to test whether you can create alarms:
.Autoscaling template adoption

* You must use the `PrometheusAlarm` alarm type instead of `GnocchiAggregationByResourcesAlarm`.

* Create Aodh alarms of type `prometheus`:
+
----
openstack alarm create \
--name low_alarm \
--type gnocchi_resources_threshold \
--metric cpu \
--resource-id b7ac84e4-b5ca-4f9e-a15c-ece7aaf68987 \
--threshold 35000000000 \
--comparison-operator lt \
--aggregation-method rate:mean \
--granularity 300 \
openstack alarm create --name high_cpu_alarm \
--type prometheus \
--query "(rate(ceilometer_cpu{resource_name=~'cirros'})) * 100" \
--alarm-action 'log://' \
--granularity 15 \
--evaluation-periods 3 \
--alarm-action 'log:\\' \
--ok-action 'log:\\' \
--resource-type instance
--comparison-operator gt \
--threshold 7000000000
----

//=== (TODO)

//* Include adopted autoscaling heat templates
//* Include adopted Aodh alarm create commands of type prometheus
* Verify the state of the alarm:
+
----
openstack alarm list
+--------------------------------------+------------+------------------+-------+----------+---------+
| alarm_id                             | type       | name             | state | severity | enabled |
+--------------------------------------+------------+------------------+-------+----------+---------+
| 209dc2e9-f9d6-40e5-aecc-e767ce50e9c0 | prometheus | prometheus_alarm | ok    | low      | True    |
+--------------------------------------+------------+------------------+-------+----------+---------+
----
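A newly created Aodh alarm normally starts in the `insufficient data` state until enough samples have been evaluated, so a single `openstack alarm list` right after creation might not show a meaningful state. A minimal polling sketch, assuming bash and that you pass the alarm ID; `wait_for_alarm_state` and its default retry values are hypothetical:

```shell
# Hypothetical helper: poll an alarm until it leaves the "insufficient data"
# state, then print the state it reached ("ok" or "alarm").
wait_for_alarm_state() {
    local alarm_id="$1" attempts="${2:-20}" delay="${3:-15}" state i
    for ((i = 1; i <= attempts; i++)); do
        state=$(openstack alarm show "$alarm_id" -f value -c state) || return 1
        if [ "$state" != "insufficient data" ]; then
            printf '%s\n' "$state"
            return 0
        fi
        sleep "$delay"
    done
    # Still "insufficient data" after all attempts.
    return 1
}

# Usage:
# wait_for_alarm_state 209dc2e9-f9d6-40e5-aecc-e767ce50e9c0
```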
@@ -261,6 +261,7 @@ spec:
- nova
- ovn
- neutron-metadata
- telemetry
env:
- name: ANSIBLE_CALLBACKS_ENABLED
value: "profile_tasks"
@@ -426,6 +427,7 @@ spec:
- nova
- ovn
- neutron-metadata
- telemetry
nodeTemplate:
extraMounts:
- extraVolType: Ceph
149 changes: 93 additions & 56 deletions docs_user/modules/proc_adopting-telemetry-services.adoc
@@ -14,93 +14,130 @@ This guide also assumes that:
* The previous adoption steps are complete. MariaDB, the {identity_service_first_ref}, and the data plane are already adopted.
//kgilliga:Should this procedure be moved after the "Adopting the data plane" chapter?

.Procedure

. Patch the `OpenStackControlPlane` CR to deploy Ceilometer services:
// TODO(jistr): There are still some quay.io images in the downstream build.
* Create a `Subscription` to deploy the `cluster-observability-operator`:
+
----
cat << EOF > ceilometer_patch.yaml
oc create -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
name: cluster-observability-operator
namespace: openshift-operators
spec:
ceilometer:
enabled: true
template:
ifeval::["{build}" != "downstream"]
centralImage: quay.io/podified-antelope-centos9/openstack-ceilometer-central:current-podified
computeImage: quay.io/podified-antelope-centos9/openstack-ceilometer-compute:current-podified
customServiceConfig: |
[DEFAULT]
debug=true
ipmiImage: quay.io/podified-antelope-centos9/openstack-ceilometer-ipmi:current-podified
nodeExporterImage: quay.io/prometheus/node-exporter:v1.5.0
notificationImage: quay.io/podified-antelope-centos9/openstack-ceilometer-notification:current-podified
secret: osp-secret
sgCoreImage: quay.io/infrawatch/sg-core:v5.1.1
endif::[]
ifeval::["{build}" == "downstream"]
centralImage: registry.redhat.io/rhosp-dev-preview/openstack-ceilometer-central-rhel9:18.0
computeImage: registry.redhat.io/rhosp-dev-preview/openstack-ceilometer-compute-rhel9:18.0
customServiceConfig: |
[DEFAULT]
debug=true
ipmiImage: registry.redhat.io/rhosp-dev-preview/openstack-ceilometer-ipmi-rhel9:18.0
nodeExporterImage: quay.io/prometheus/node-exporter:v1.5.0
notificationImage: registry.redhat.io/rhosp-dev-preview/openstack-ceilometer-notification-rhel9:18.0
secret: osp-secret
sgCoreImage: quay.io/infrawatch/sg-core:v5.1.1
endif::[]
channel: development
installPlanApproval: Automatic
name: cluster-observability-operator
source: redhat-operators
sourceNamespace: openshift-marketplace
EOF
----

. Optional: If you previously backed up your {OpenStackShort} services configuration file from the old environment, you can use `os-diff` to compare the two and confirm that the configuration is correct. This produces the differences between the two INI configuration files:
* Wait for the installation to succeed:
+
----
os-diff diff /tmp/collect_tripleo_configs/ceilometer/etc/ceilometer/ceilometer.conf ceilometer_patch.yaml --crd
oc wait --for jsonpath="{.status.phase}"=Succeeded csv --namespace=openshift-operators -l operators.coreos.com/cluster-observability-operator.openshift-operators
----
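Right after the `Subscription` is created, the CSV might not exist yet, and `oc wait` with a label selector can then exit immediately with a "no matching resources found" error instead of waiting. A minimal retry sketch, assuming bash; `wait_for_coo_csv` and its retry and timeout values are hypothetical:

```shell
# Hypothetical helper: retry `oc wait` until the cluster-observability-operator
# CSV exists and reports the Succeeded phase, or give up after N attempts.
wait_for_coo_csv() {
    local attempts="${1:-30}" delay="${2:-10}" i
    local selector="operators.coreos.com/cluster-observability-operator.openshift-operators"
    for ((i = 1; i <= attempts; i++)); do
        if oc wait --for jsonpath="{.status.phase}"=Succeeded csv \
            --namespace=openshift-operators -l "$selector" --timeout=60s; then
            return 0
        fi
        # The CSV might not have been created yet; wait and try again.
        sleep "$delay"
    done
    return 1
}
```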

* Enable the metrics storage backend:
+
For more information, see xref:reviewing-the-openstack-control-plane-configuration_{context}[Reviewing the {rhos_prev_long} control plane configuration].
----
oc patch openstackcontrolplane openstack --type=merge --patch '
spec:
telemetry:
template:
metricStorage:
enabled: true
monitoringStack:
alertingEnabled: true
scrapeInterval: 30s
storage:
strategy: persistent
retention: 24h
persistent:
pvcStorageRequest: 20G
'
----

* Verify that the `alertmanager` and `prometheus` pods are available:
+
----
oc get pods -l alertmanager=metric-storage -n openstack
NAME READY STATUS RESTARTS AGE
alertmanager-metric-storage-0 2/2 Running 0 46s
alertmanager-metric-storage-1 2/2 Running 0 46s

oc get pods -l prometheus=metric-storage -n openstack
NAME READY STATUS RESTARTS AGE
prometheus-metric-storage-0 3/3 Running 0 46s
----
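The same check can block until the pods are Ready instead of relying on a one-off `oc get pods`. A minimal sketch, assuming bash, the `alertmanager=metric-storage` and `prometheus=metric-storage` labels shown in the output above, and the `openstack` namespace; `wait_for_metric_storage` is a hypothetical helper, and like any `oc wait` with a selector it can fail early if the pods have not been created yet:

```shell
# Hypothetical helper: block until the Alertmanager and Prometheus pods of the
# "metric-storage" stack report the Ready condition, or fail after the timeout.
wait_for_metric_storage() {
    local timeout="${1:-300s}" label
    for label in alertmanager=metric-storage prometheus=metric-storage; do
        oc wait --for=condition=Ready pod -l "$label" \
            -n openstack --timeout="$timeout" || return 1
    done
}

# Usage:
# wait_for_metric_storage 600s
```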

. Patch the `OpenStackControlPlane` CR to deploy Ceilometer services:
.Procedure

* Patch the `OpenStackControlPlane` CR to deploy Ceilometer services:
+
----
oc patch openstackcontrolplane openstack --type=merge --patch-file ceilometer_patch.yaml
oc patch openstackcontrolplane openstack --type=merge --patch '
spec:
telemetry:
enabled: true
template:
ceilometer:
passwordSelector:
ceilometerService: CeilometerPassword
enabled: true
secret: osp-secret
serviceUser: ceilometer
'
----

.Verification

. Inspect the resulting Ceilometer pods:
+
----
CEILOMETER_POD=`oc get pods -l service=ceilometer | tail -n 1 | cut -f 1 -d' '`
CEILOMETER_POD=`oc get pods -l service=ceilometer -n openstack | tail -n 1 | cut -f 1 -d' '`
oc exec -t $CEILOMETER_POD -c ceilometer-central-agent -- cat /etc/ceilometer/ceilometer.conf
----

. Inspect the resulting Ceilometer IPMI agent pod on data plane nodes:
. Inspect enabled pollsters:
+
----
podman ps | grep ceilometer-ipmi
oc get secret ceilometer-config-data -o jsonpath="{.data['polling\.yaml\.j2']}" | base64 -d
----

. Inspect enabled pollsters:
. Optional: Override the default pollsters to meet your requirements:
+
----
oc get secret ceilometer-config-data -o jsonpath="{.data['polling\.yaml']}" | base64 -d
oc patch openstackcontrolplane openstack --type=merge --patch '
spec:
telemetry:
template:
ceilometer:
defaultConfigOverwrite:
polling.yaml.j2: |
---
sources:
- name: pollsters
interval: 100
meters:
- volume.*
- image.size
enabled: true
secret: osp-secret
'
----
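After the override is reconciled, you can confirm that specific meters appear in the rendered polling template. A minimal sketch, assuming bash, GNU `base64`, the `ceilometer-config-data` secret and `polling.yaml.j2` key used by the inspection command above, and that you run it in the `openstack` project; `check_pollsters` is a hypothetical helper that matches each meter as a literal `- <meter>` list entry:

```shell
# Hypothetical helper: decode polling.yaml.j2 from the ceilometer-config-data
# secret and check that every meter given as an argument is listed in it.
check_pollsters() {
    local polling meter
    polling=$(oc get secret ceilometer-config-data \
        -o jsonpath="{.data['polling\.yaml\.j2']}" | base64 -d) || return 1
    for meter in "$@"; do
        # Strip indentation, then compare whole lines literally, so patterns
        # such as "volume.*" are not treated as regular expressions.
        if ! sed 's/^[[:space:]]*//' <<<"$polling" | grep -qFx -- "- $meter"; then
            echo "pollster not enabled: $meter" >&2
            return 1
        fi
    done
}

# Usage:
# check_pollsters 'volume.*' image.size
```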

. Enable pollsters according to your requirements:
* Patch the `OpenStackControlPlane` CR to include the `logging` configuration:
+
----
cat << EOF > polling.yaml
---
sources:
- name: pollsters
interval: 300
meters:
- volume.size
- image.size
- cpu
- memory
EOF

oc patch secret ceilometer-config-data --patch="{\"data\": { \"polling.yaml\": \"$(base64 -w0 polling.yaml)\"}}"
oc patch openstackcontrolplane openstack --type=merge --patch '
spec:
telemetry:
template:
logging:
enabled: false
ipaddr: 172.17.0.80
port: 10514
cloNamespace: openshift-logging
'
----
2 changes: 0 additions & 2 deletions docs_user/modules/proc_deploying-backend-services.adoc
@@ -53,7 +53,6 @@ For example, in developer environments with {OpenStackPreviousInstaller} Standal
----
AODH_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' AodhPassword:' | awk -F ': ' '{ print $2; }')
BARBICAN_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' BarbicanPassword:' | awk -F ': ' '{ print $2; }')
CEILOMETER_METERING_SECRET=$(cat ~/tripleo-standalone-passwords.yaml | grep ' CeilometerMeteringSecret:' | awk -F ': ' '{ print $2; }')
CEILOMETER_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' CeilometerPassword:' | awk -F ': ' '{ print $2; }')
CINDER_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' CinderPassword:' | awk -F ': ' '{ print $2; }')
GLANCE_PASSWORD=$(cat ~/tripleo-standalone-passwords.yaml | grep ' GlancePassword:' | awk -F ': ' '{ print $2; }')
@@ -105,7 +104,6 @@ account passwords from the original deployment:
----
$ oc set data secret/osp-secret "AodhPassword=$AODH_PASSWORD"
$ oc set data secret/osp-secret "BarbicanPassword=$BARBICAN_PASSWORD"
$ oc set data secret/osp-secret "CeilometerMeteringSecret=$CEILOMETER_METERING_SECRET"
$ oc set data secret/osp-secret "CeilometerPassword=$CEILOMETER_PASSWORD"
$ oc set data secret/osp-secret "CinderPassword=$CINDER_PASSWORD"
$ oc set data secret/osp-secret "GlancePassword=$GLANCE_PASSWORD"
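The password extraction commands in this file repeat the same `cat | grep | awk` pipeline for every service. A minimal sketch of a reusable helper, assuming bash and the `  Name: value` layout of `~/tripleo-standalone-passwords.yaml` that those commands rely on; `tripleo_password` is a hypothetical helper, and unlike `awk -F ': '` it keeps values that themselves contain `": "`:

```shell
# Hypothetical helper: print the value of one password key from a TripleO
# passwords file whose entries look like "  AodhPassword: <value>".
tripleo_password() {
    local key="$1" file="${2:-$HOME/tripleo-standalone-passwords.yaml}"
    awk -v key="$key" '
        {
            name = $0
            sub(/^[ \t]+/, "", name)          # drop indentation
            sep = index(name, ": ")
            if (sep > 0 && substr(name, 1, sep - 1) == key) {
                print substr(name, sep + 2)   # everything after the first ": "
                exit
            }
        }' "$file"
}

# Usage:
# CEILOMETER_PASSWORD=$(tripleo_password CeilometerPassword)
# oc set data secret/osp-secret "CeilometerPassword=$CEILOMETER_PASSWORD"
```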
13 changes: 12 additions & 1 deletion docs_user/modules/proc_stopping-openstack-services.adoc
@@ -57,7 +57,14 @@ sudo pcs constraint remove order-ceph-nfs-openstack-manila-share-Optional
+
----
# Update the services list to be stopped
ServicesToStop=("tripleo_horizon.service"
ServicesToStop=("tripleo_aodh_api.service"
"tripleo_aodh_api_cron.service"
"tripleo_aodh_evaluator.service"
"tripleo_aodh_listener.service"
"tripleo_aodh_notifier.service"
"tripleo_ceilometer_agent_central.service"
"tripleo_ceilometer_agent_notification.service"
"tripleo_horizon.service"
"tripleo_keystone.service"
"tripleo_barbican_api.service"
"tripleo_barbican_worker.service"
@@ -67,7 +74,11 @@ ServicesToStop=("tripleo_horizon.service"
"tripleo_cinder_scheduler.service"
"tripleo_cinder_volume.service"
"tripleo_cinder_backup.service"
"tripleo_collectd.service"
"tripleo_glance_api.service"
"tripleo_gnocchi_api.service"
"tripleo_gnocchi_metricd.service"
"tripleo_gnocchi_statsd.service"
"tripleo_manila_api.service"
"tripleo_manila_api_cron.service"
"tripleo_manila_scheduler.service"
6 changes: 6 additions & 0 deletions tests/playbooks/test_minimal.yaml
@@ -61,6 +61,12 @@
- role: heat_adoption
tags:
- heat_adoption
- role: telemetry_adoption
tags:
- telemetry_adoption
- role: autoscaling_adoption
tags:
- autoscaling_adoption
- role: stop_remaining_services
tags:
- stop_remaining_services