Group SyncOperator: PrometheusOperatorRejectedResources #312

Open
rbaumgar opened this issue Apr 12, 2024 · 5 comments

Comments

@rbaumgar
Contributor

This looks like the same problem as https://access.redhat.com/solutions/6992399, which has also been seen with GitOps, Tempo Operator...

Fix:
oc label ns group-sync-operator openshift.io/cluster-monitoring=true
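
The same label can also be set declaratively, for example when the namespace is managed as a manifest. This is only a minimal sketch: the namespace name is taken from the command above, and the rest of the manifest is assumed to be an otherwise empty Namespace object.

```yaml
# Sketch: declarative equivalent of the `oc label ns` command above.
# Assumes the operator is installed into the group-sync-operator namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: group-sync-operator
  labels:
    # Label values are strings, so "true" must be quoted.
    openshift.io/cluster-monitoring: "true"
```

Applying it with `oc apply -f` on an existing namespace should add the label without touching the workloads running in it.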

@sabre1041
Collaborator

This is currently documented in the README.md

@rbaumgar
Contributor Author

Correct, I missed it.
But when the operator automatically creates the ServiceMonitor, the label should be applied to the namespace.

@sabre1041
Collaborator

> Correct, I missed it. But when the operator automatically creates the ServiceMonitor, the label should be applied to the namespace.

The operator does not create the namespace; the user does, either before or at the same time as installing the operator.

@rbaumgar
Contributor Author

When the operator automatically installs the ServiceMonitor, the label should be applied, or at least this should be documented as a post-install task.
Alternatively, don't create the ServiceMonitor automatically during installation, and document that anyone who requires monitoring has to create the ServiceMonitor and set the label.

@ocpvkb

ocpvkb commented Jul 26, 2024

Unfortunately, with the adjustments made to the ServiceMonitor object in version 0.0.29, the metrics functionality is unusable.

For use case #1, "OpenShift Platform Monitoring" (namespace labeled openshift.io/cluster-monitoring="true"), the following configuration is required on the ServiceMonitor:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
..
spec:
  endpoints:
  - tlsConfig:
      caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
...

Otherwise, the OpenShift platform Prometheus Operator cannot establish a trust relationship and discards the target.
The current configuration (0.0.30) instead references an explicit secret in the group-sync-operator namespace, which the OpenShift platform Prometheus, correctly, has no access to.
Log:
caller=resource_selector.go:174 component=prometheusoperator msg="skipping servicemonitor" error="failed to get CA: unable to get secret \"group-sync-operator-certs\": secrets \"group-sync-operator-certs\" not found" servicemonitor=group-sync-operator/group-sync-operator-controller-manager-metrics-monitor namespace=openshift-monitoring prometheus=k8s


For use case #2, "OpenShift User-Workload Monitoring" (namespace without the label openshift.io/cluster-monitoring="true"), the following configuration is not permitted on the ServiceMonitor:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
..
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
...

The OpenShift User-Workload Prometheus correctly prohibits accessing the file system via a bearer token file, which is what the current configuration (0.0.30) does.
Log:
resource_selector.go:174 component=prometheusoperator msg="skipping servicemonitor" error="it accesses file system via bearer token file which Prometheus specification prohibits" servicemonitor=group-sync-operator/group-sync-operator-controller-manager-metrics-monitor namespace=openshift-user-workload-monitoring prometheus=user-workload

To use the OpenShift User-Workload Prometheus with a bearer token, you have to reference a Secret like this:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
..
spec:
  endpoints:
  - path: /metrics
    port: https
    scheme: https
    # The secret exists in the same namespace as this service monitor and is accessible by the *Prometheus Operator*.
    bearerTokenSecret:
      name: host-operator-prometheus-user-workload
      key: token


So neither approach currently works.

Unfortunately, I also have to agree with the following issues:
#311
#312

The simplest option, as I see it, is for the operator not to create the ServiceMonitor at all. Each use case can then be configured individually; the documentation only needs to be updated with recommendations. BTW, this is also the current procedure at Red Hat. :-)

3 participants