-
Notifications
You must be signed in to change notification settings - Fork 207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Monitoring for ironic-prometheus-exporter #340
Monitoring for ironic-prometheus-exporter #340
Conversation
@openshift/sig-monitoring |
I don’t actually see the service or daemonset for this exporter so it’s hard to say if this is correct. Generally speaking it looks ok, but I’d like to avoid having servicemonitors that don’t actually discover/select anything. |
This will need to be reworked so that it's only applied when the platform is of type "baremetal". I think it would make sense to include this as part of #302, since that's the PR that would deploy the ironic promethus exporter as well. |
Yes it would be easier to verify this if it included the deployment of the exporter. |
Agreed, this should either be rolled into #302 or this PR can be rebased onto that one when ready. |
b8d1ced
to
c2aa4fc
Compare
c2aa4fc
to
e25d72e
Compare
I would like some reviews to see if this is ok even if it will depends on PR #302 =) |
install/0000_30_machine-api-operator_15_ironic-exporter-prometheusrules.yaml
Outdated
Show resolved
Hide resolved
install/0000_30_machine-api-operator_15_ironic-exporter-prometheusrules.yaml
Outdated
Show resolved
Hide resolved
130867c
to
b10f7c0
Compare
install/0000_90_machine-api-operator_03_ironic-exporter-service.yaml
Outdated
Show resolved
Hide resolved
install/0000_90_machine-api-operator_04_ironic-exporter-servicemonitor.yaml
Outdated
Show resolved
Hide resolved
b10f7c0
to
bbd25b6
Compare
Since we had to switch to an older version of ironic that doesn't contain the exporter, ref #380 this should probably be abandoned until we can consume a newer ironic version? |
@hardys Is the version of Ironic we are using now in deploying "metal3" appropriate to restart this work? |
@sadasu I'm re-working this PR, the version on metal3 is fine but for openshift we will first need openshift/ironic-image#27 and also an update on BMO to be aware of the |
bbd25b6
to
d56877f
Compare
@sadasu can you take another look at this please - in particular I'd like to confirm how we can ensure the Service/ServiceMonitor etc only get created for the platform: baremetal case where we expect the metal3 pod part to be enabled? |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
apiVersion: monitoring.coreos.com/v1 | ||
kind: Service | ||
metadata: | ||
name: metal3-baremetalhost-controller |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what the "name" here should be. But, "metal3-baremetalhost-controller" is the name of the baremetal_operator controller. See https://github.com/metal3-io/baremetal-operator/blob/master/pkg/controller/baremetalhost/baremetalhost_controller.go#L105
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name we are using for the deployment is "metal3". See https://github.com/openshift/machine-api-operator/blob/master/pkg/operator/baremetal_pod.go#L119.
metadata: | ||
labels: | ||
app: ironic-exporter | ||
name: metal3-baremetalhost-controller |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apiVersion: monitoring.coreos.com/v1 | ||
kind: PrometheusRule | ||
metadata: | ||
name: metal3-baremetalhost-controller |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .yaml files added in the /install dir will be installed by the CVO for all platforms. What would the behavior be when the ironic-prometheus-exporter container is not running for non "BareMetal" platform types? |
You also need to add the ironic-prometheus-exporter image to https://github.com/openshift/machine-api-operator/blob/master/install/image-references and https://github.com/openshift/machine-api-operator/blob/master/pkg/operator/fixtures/images.json. |
The exporter doesn't need a new image AFAIK, we will be using the ironic image (https://github.com/openshift/ironic-image) , we only need to add the container for the exporter. |
AFAICT creating the service, service monitor and prometheus rule with CVO shouldn't be an issue for non-baremetal platforms as the service monitor would match 0 endpoints and the Prometheus rule would generate 0 series. That being said, a better option would be to have the machine API operator manage those resources as it knows whether or not the Ironic exporter is deployed. |
0155292
to
5214e4b
Compare
- alert: HighCPUTemperature | ||
annotations: | ||
summary: "The baremetal node {{ $labels.node_name }} CPU {{ $labels.entity_id }} is too high" | ||
description: "The baremetal node {{ $labels.node_name }} CPU {{ $labels.entity_id }} was too high in the past 5 minutes. Last measurement {{ $value }}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't this trigger for non baremetal envs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should porbably rename syncBaremetalControllers
to something more generic e.g syncBaremetal
Then syncBaremetal
calls syncBaremetalControllers
and syncBaremetalMonitoring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@enxebre I don't think it would trigger in non baremetal envs, check @simonpasquier comment.
The sync part you are talking would be if we make the yamls go code right?
This commit adds the changes that will allow Prometheus to collect data from the ironic-prometheus-exporter[1] that runs in the ironic-image [2]. - Added Service for the ironic-prometheus-exporter with same run level of the openshift-machine-api. - Added the ServiceMonitor for the ironic-prometheus-exporter - Added PrometheusRule with alerts for baremetal_temp_celsius metric. - Added the ironic-exporter container Note: Using the run level 90 to ServiceMonitor and PrometheusRule to ensure that the Service and the Prometheus Role and RoleBinding have been applied. [1] https://github.com/metal3-io/ironic-prometheus-exporter [2] https://github.com/metal3-io/ironic-image
5214e4b
to
cdfe1f8
Compare
@iurygregory: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Does anyone know people from the CVO that can take a look at the PR to see if this approach (using the if in the yamls is valid?) |
What's that coming from Can we please move any baremetal specific under https://github.com/openshift/machine-api-operator/blob/master/pkg/operator/sync.go#L97? Any monitoring related resource could be managed there, you could rename |
Hi @enxebre , I was talking with some people from kuryr and they pointed me to https://github.com/openshift/cluster-network-operator/blob/master/bindata/network/kuryr/008-admission.secret.yaml#L1 (not sure if MAO has support for that, but the conditions would ensure that the yamls would only apply when it's true). Is there any other directory where I can place the yamls and make a call in syncBaremetalControllers to apply? |
That's using bindata to compile and embed the yamls into static golang code that you can template at runtime as you wish. You could follow the same approach here. e.g Or alternatively you can define your resources as API golang types e.g https://github.com/openshift/cluster-autoscaler-operator/blob/8e6f95038c9eee84ef7e305e2e1f4960c918b30d/pkg/controller/clusterautoscaler/monitoring.go |
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale |
@iurygregory: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle rotten |
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /close |
@openshift-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This PR depends on #302 and adds the changes to enable monitoring
in the machine-api-operator that will allow Prometheus
to collect data from the ironic-prometheus-exporter[1] that
runs in the ironic-image [2].
metric.
[1] https://github.com/metal3-io/ironic-prometheus-exporter
[2] https://github.com/metal3-io/ironic-image