Create ServiceMonitors to gather metrics from the OTEL Operands #1874
func desiredServiceMonitors(_ context.Context, params Params) []monitoringv1.ServiceMonitor {
	col := params.Instance
	return []monitoringv1.ServiceMonitor{
I don't love the idea of the OpenTelemetry Operator being able to create objects that depend on other CRDs being present; it feels like an opportunity for our users to get into weird situations.
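One "weird situation" is trying to create a ServiceMonitor in a cluster where the Prometheus Operator's CRD is not installed. The sketch below shows a defensive check under stated assumptions: `APIResource` is a hypothetical, trimmed-down stand-in for a Kubernetes discovery response (in a real operator this information would come from something like client-go's discovery client), and `serviceMonitorCRDServed` is an illustrative helper, not code from this PR.

```go
package main

import "fmt"

// APIResource is a hypothetical, trimmed-down stand-in for one entry of a
// Kubernetes discovery response (group/version plus kind).
type APIResource struct {
	GroupVersion string // e.g. "monitoring.coreos.com/v1"
	Kind         string // e.g. "ServiceMonitor"
}

// serviceMonitorCRDServed reports whether the cluster serves the Prometheus
// Operator's ServiceMonitor kind. Creating a ServiceMonitor when this
// returns false would fail, because the API server does not know the kind.
func serviceMonitorCRDServed(resources []APIResource) bool {
	for _, r := range resources {
		if r.GroupVersion == "monitoring.coreos.com/v1" && r.Kind == "ServiceMonitor" {
			return true
		}
	}
	return false
}

func main() {
	plainCluster := []APIResource{{GroupVersion: "apps/v1", Kind: "Deployment"}}
	promCluster := []APIResource{
		{GroupVersion: "apps/v1", Kind: "Deployment"},
		{GroupVersion: "monitoring.coreos.com/v1", Kind: "ServiceMonitor"},
	}
	fmt.Println(serviceMonitorCRDServed(plainCluster)) // false
	fmt.Println(serviceMonitorCRDServed(promCluster))  // true
}
```

Checking the served kinds at runtime, rather than assuming them, lets the operator skip ServiceMonitor reconciliation instead of erroring in clusters without the Prometheus Operator.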
I agree. I think this opens up some issues if we assume that every operator user wants to use ServiceMonitors. This is a good topic for our next SIG meeting.
This also creates a direct dependency from our operator on the Prometheus Operator, which concerns me. Right now we have a dependency in the TA (Target Allocator), which makes sense because it is the component pulling in that functionality, but I worry about what happens with this added dependency.
> I don't love the idea of the OpenTelemetry Operator being able to create objects that depend on other CRDs being present
From the conversations we had during the 25 May SIG meeting, I understood it was OK to implement the proposal described in #1768.
> it feels like an opportunity for our users to get into weird situations.
What situations? I would be happy to check them and apply a fix if needed.
> I agree, I think this opens up some issues if we make the assumption that every operator user wants to use service monitors.
I don't think this will be a problem, since everything is protected by a feature flag. If the feature flag is not enabled, the ServiceMonitors will not be created. In addition, the feature must be set to true in the OpenTelemetry Collector instance.
> This is a good topic for our next SIG meeting.
I won't be able to attend since I'll be on vacation next week, but I'd be happy to talk about it another time.
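The two-level opt-in described in the reply above (an operator-wide feature gate plus a per-instance setting) reduces to a logical AND. This is a minimal sketch: `MetricsConfigSpec` mirrors only the `CreateServiceMonitors` field visible in the PR diff, and `shouldCreateServiceMonitors` is a hypothetical helper name, not the PR's actual function.

```go
package main

import "fmt"

// MetricsConfigSpec mirrors only the field visible in the PR diff.
type MetricsConfigSpec struct {
	CreateServiceMonitors bool
}

// shouldCreateServiceMonitors implements the two-level opt-in: the
// operator-wide feature gate AND the per-instance setting must both be
// true before any ServiceMonitor is reconciled.
func shouldCreateServiceMonitors(featureGateEnabled bool, metrics MetricsConfigSpec) bool {
	return featureGateEnabled && metrics.CreateServiceMonitors
}

func main() {
	for _, gate := range []bool{false, true} {
		for _, instance := range []bool{false, true} {
			create := shouldCreateServiceMonitors(gate, MetricsConfigSpec{CreateServiceMonitors: instance})
			fmt.Printf("gate=%-5v instance=%-5v -> create=%v\n", gate, instance, create)
		}
	}
}
```

Only the last of the four combinations creates ServiceMonitors, so users who never touch either switch see no change in behavior.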
We discussed in the SIG meeting that the operator will create Service/Pod monitors for operands, on the assumption that the functionality will be put behind a feature flag. The implementation has to be flexible enough to allow OTLP in the future (once the collector reports OTLP metrics).
This work is needed to move the OTEL operator to operator capability level 4 (Deep Insights): https://operatorhub.io/operator/opentelemetry-operator. Many other operators do the same.
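For reference, this is roughly the kind of object the operator would create for an operand. The PR builds typed `monitoringv1.ServiceMonitor` values; the sketch below renders equivalent YAML with only the standard library so it runs without the Prometheus Operator module. The object name suffix (`-collector`), the `app.kubernetes.io/instance` label value (`<namespace>.<name>`), and the port name passed in are assumptions for illustration; `serviceMonitorManifest` is a hypothetical helper.

```go
package main

import "fmt"

// serviceMonitorTemplate uses indexed verbs: %[1]s = namespace,
// %[2]s = collector name, %[3]s = Service port name.
const serviceMonitorTemplate = `apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: %[2]s-collector
  namespace: %[1]s
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: %[1]s.%[2]s
  namespaceSelector:
    matchNames:
      - %[1]s
  endpoints:
    - port: %[3]s
`

// serviceMonitorManifest renders a minimal ServiceMonitor that selects the
// collector's Service in its own namespace and scrapes the named port.
func serviceMonitorManifest(namespace, name, portName string) string {
	return fmt.Sprintf(serviceMonitorTemplate, namespace, name, portName)
}

func main() {
	fmt.Print(serviceMonitorManifest("observability", "otel", "monitoring"))
}
```

Selecting by label and by named port (rather than a port number) keeps the ServiceMonitor valid if the underlying port number changes.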
Ah okay, sorry, I checked my notes and see this now. Proceed :) I'll review now.
@@ -345,6 +352,26 @@ type AutoscalerSpec struct {
	TargetMemoryUtilization *int32 `json:"targetMemoryUtilization,omitempty"`
}

// MetricsConfigSpec defines a metrics config.
type MetricsConfigSpec struct {
	// CreateServiceMonitors specifies if ServiceMonitors should be created for the OpenTelemetry Collector.
Document here how this feature is enabled.
The approach here makes sense. Could you also add an e2e test, please?
@@ -71,6 +73,7 @@ func init() {

	utilruntime.Must(otelv1alpha1.AddToScheme(scheme))
	utilruntime.Must(routev1.AddToScheme(scheme))
	utilruntime.Must(monitoringv1.AddToScheme(scheme))
Should we also put this behind the flag?
This code runs in init(), before the flags are parsed, so the flag value is not yet known at that point.
I'm trying to fix the problems associated with the E2E tests.
tests/e2e/prometheus-config-validation/03-promreceiver-nopromconfig.yaml
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: iblancasa, jaronoff97.
It still needs approval from an approver in each of the affected files.
@iblancasa: one of the PR's tests failed.
@TylerHelmuth, mind giving this one more review?
@jaronoff97 I am still concerned about #1874 (comment). It is bad that users can't modify their metrics endpoint today; I am hesitant to add more functionality that depends on the metrics endpoint without the flexibility to configure it. While I recognize we could release this first and add support for configuration later, it feels like an implementation that drives the Operator to Level 4 should also allow users to configure the managed object (in this instance, the Collector) however they need.
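One way to address the configurability concern is to derive the scraped port from the Collector's own `service.telemetry.metrics.address` setting instead of hard-coding it. This is a minimal sketch, assuming the address string has already been extracted from the Collector YAML; `metricsPortFromAddress` is a hypothetical helper, and it falls back to 8888, the Collector's default port for its own metrics.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// defaultMetricsPort is the port the Collector uses for its own metrics
// when service.telemetry.metrics.address is not set.
const defaultMetricsPort int32 = 8888

// metricsPortFromAddress derives the port a Service/ServiceMonitor should
// target from the service.telemetry.metrics.address value.
func metricsPortFromAddress(address string) (int32, error) {
	if address == "" {
		return defaultMetricsPort, nil
	}
	_, portStr, err := net.SplitHostPort(address)
	if err != nil {
		return 0, fmt.Errorf("invalid metrics address %q: %w", address, err)
	}
	port, err := strconv.Atoi(portStr)
	if err != nil || port < 1 || port > 65535 {
		return 0, fmt.Errorf("invalid metrics port %q in address %q", portStr, address)
	}
	return int32(port), nil
}

func main() {
	for _, addr := range []string{"", "0.0.0.0:9090", ":8888", "localhost"} {
		port, err := metricsPortFromAddress(addr)
		fmt.Printf("%q -> port=%d err=%v\n", addr, port, err)
	}
}
```

Returning an error for a malformed address, instead of silently using the default, surfaces a misconfiguration rather than creating a Service that points at a port nothing listens on.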
I see. In that case, let's block merging this until #1931 is closed.
pkg/featuregate/featuregate.go
PrometheusOperatorIsAvailable = featuregate.GlobalRegistry().MustRegister(
	"operator.observability.prometheus",
	featuregate.StageAlpha,
	featuregate.WithRegisterDescription("enables features associated to the Prometheus Operator"))
Please add a featuregate.FromVersion that marks the Operator version in which this feature gate was added. (This was a rather late addition to the featuregate package, and we should go back and fix our other gates to include it.)
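The request is to record the introducing release alongside the gate. In the Collector's featuregate package this is done with a register option (`featuregate.WithRegisterFromVersion`); the runnable sketch below does not import that package. It is a toy, stdlib-only registry that mirrors the shape of the call above. `Registry`, `Gate`, and `Set` are hypothetical, the rule that `MustRegister` panics without a `FromVersion` is an assumption that enforces the review request (not a rule of the real package), and `"vX.Y.Z"` is a placeholder version.

```go
package main

import "fmt"

// Stage is a toy version of a feature-gate stage.
type Stage int

const (
	StageAlpha Stage = iota // disabled unless explicitly enabled
	StageBeta               // enabled unless explicitly disabled
)

// Gate is a toy feature gate that also records the release it first appeared in.
type Gate struct {
	ID          string
	Stage       Stage
	Description string
	FromVersion string
	enabled     bool
}

// IsEnabled reports the gate's current state.
func (g *Gate) IsEnabled() bool { return g.enabled }

// Registry is a toy registry keyed by gate ID.
type Registry struct {
	gates map[string]*Gate
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{gates: make(map[string]*Gate)}
}

// MustRegister adds a gate and panics on a duplicate ID or a missing FromVersion.
func (r *Registry) MustRegister(id string, stage Stage, description, fromVersion string) *Gate {
	if _, exists := r.gates[id]; exists {
		panic(fmt.Sprintf("feature gate %q already registered", id))
	}
	if fromVersion == "" {
		panic(fmt.Sprintf("feature gate %q has no FromVersion", id))
	}
	g := &Gate{ID: id, Stage: stage, Description: description, FromVersion: fromVersion, enabled: stage == StageBeta}
	r.gates[id] = g
	return g
}

// Set toggles a registered gate, e.g. from a command-line flag.
func (r *Registry) Set(id string, enabled bool) error {
	g, ok := r.gates[id]
	if !ok {
		return fmt.Errorf("unknown feature gate %q", id)
	}
	g.enabled = enabled
	return nil
}

func main() {
	reg := NewRegistry()
	// "vX.Y.Z" is a placeholder; a real gate would record an actual operator release.
	gate := reg.MustRegister("operator.observability.prometheus", StageAlpha,
		"enables features associated to the Prometheus Operator", "vX.Y.Z")
	fmt.Println(gate.ID, "from", gate.FromVersion, "enabled:", gate.IsEnabled()) // enabled: false
	if err := reg.Set(gate.ID, true); err != nil {
		panic(err)
	}
	fmt.Println("enabled after Set:", gate.IsEnabled()) // true
}
```

Recording the introducing version makes it possible to tell, from the gate alone, how long an alpha gate has existed when deciding whether to promote or remove it.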
Even without merging this PR, the problem is still there, because the Service created by the operator points to a port that does nothing. I'll work on #1931, but TBH I don't think there is a real need to block this PR until #1931 is merged.
Correct. In my opinion, it feels wrong to add more complex logic on top of an inflexible configuration.
…-telemetry#1874)
* Allow the creation of ServiceMonitors to gather metrics from the OpenTelemetry Collector instances
* Add missing changelog
* Fix unprotected statement
* Fix lint issues
* Apply changes requested in code review
* Add missing generated files
* Change the way to enable the feature flag
* Change the way to enable the feature flag
* Fix merge
* Fix enable feature flag
* Change the name of the option and move the E2E tests to their own folder
* Fix unit test
* Fix docs
* Fix CRD field
* Fix CRD field
* Add from version to feature gate
* Move the E2E tests to their own section for the CI
---------
Signed-off-by: Israel Blancas <[email protected]>
Resolves #1768