Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add securedMetricsPort #495

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

AleksZimin
Copy link

@AleksZimin AleksZimin commented Jun 29, 2023

If a Linstor controller has more than one replica, only one target will be up in Prometheus. This is because of the leader election mechanism for Linstor controllers. Once a leader is elected, other pods of Linstor controllers won't provide any metrics.

This PR addresses this issue when using a service monitor. When the securedMetricsPort parameter is set, the operator will add a metrics port to the Linstor controller service and Linstor controller endpoints.

Signed-off-by: Aleksandr Zimin <[email protected]>
@WanzenBug
Copy link
Member

Sorry for the late response.

If a Linstor controller has more than one replica, only one target will be up in Prometheus. This is because of the leader election mechanism for Linstor controllers. Once a leader is elected, other pods of Linstor controllers won't provide any metrics.

That part made sense to me. The way the ServiceMonitor resource works you would have the metrics for the non-leader controllers appear to be down (I guess they are down, technically).

This PR addresses this issue when using a service monitor. When the securedMetricsPort parameter is set, the operator will add a metrics port to the Linstor controller service and Linstor controller endpoints.

I don't quite get how this fixes the issue. We only configure a second port, but how does that help? Also, there is nothing "secured" about the port?

@AleksZimin
Copy link
Author

I don't quite get how this fixes the issue. We only configure a second port, but how does that help? Also, there is nothing "secured" about the port?

Apologies for the confusion. Let me provide more details on how this fix addresses the issue.

By replacing the PodMonitor with ServiceMonitor, we change the way Prometheus scrapes the metrics. With PodMonitor, Prometheus attempts to scrape metrics from every pod, which can lead to issues when multiple replicas of the LINSTOR controller are present.

When we switch to ServiceMonitor, Prometheus only scrapes the endpoints of the service itself, rather than each individual pod. This resolves the problem of having only one target up in Prometheus when multiple replicas are deployed.

Additionally, we mentioned "secured" in the context of using RBAC proxy to secure communications for collecting metrics. When RBAC proxy is in use, the metrics are provided through another port, which needs to be added to the endpoints of the LINSTOR controller service. By configuring the securedMetricsPort in the LinstorController custom resource, we ensure that Prometheus can scrape metrics from this secured port.

I hope this clarifies how the changes address the issue and the relevance of the "secured" aspect. Please let me know if you have any further questions or concerns.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants