
Create ServiceMonitors to gather metrics from the OTEL Operands #1874

Merged: 38 commits merged into open-telemetry:main on Jul 25, 2023

Conversation

iblancasa (Contributor):

Resolves #1768

@iblancasa iblancasa marked this pull request as ready for review June 28, 2023 14:37
@iblancasa iblancasa requested a review from a team June 28, 2023 14:37

func desiredServiceMonitors(_ context.Context, params Params) []monitoringv1.ServiceMonitor {
	col := params.Instance
	return []monitoringv1.ServiceMonitor{
Member:
I don't love the idea of the OpenTelemetry Operator being able to create objects that depend on other CRDs being present; it feels like an opportunity for our users to get into weird situations.

jaronoff97 (Contributor), Jun 28, 2023:

I agree, I think this opens up some issues if we make the assumption that every operator user wants to use service monitors. This is a good topic for our next SIG meeting.

Contributor:

This also creates a direct dependency on the Prometheus Operator from ours, which concerns me. Right now we have a dependency in the TA (target allocator), which makes sense since it is the component pulling down that functionality, but I worry about what happens with this added dependency.

Contributor Author (iblancasa):

> I don't love the idea of the OpenTelemetry Operator being able to create objects that depend on other CRDs being present

From the conversations we had during the 25 May SIG meeting, I understood it was OK to implement the proposal described in #1768.

> it feels like an opportunity for our users to get into weird situations.

What situations? I would be happy to check them and apply a fix if needed.

> I agree, I think this opens up some issues if we make the assumption that every operator user wants to use service monitors.

I don't think this will be a problem since everything is protected with a feature flag. If the feature flag is not enabled, the ServiceMonitors will not be created. Also, you need to set the feature to true in the OpenTelemetry Collector instance.

> This is a good topic for our next SIG meeting.

I won't be able to attend since I'll be on vacation next week, but I'll be happy to talk about it at another time.

Member:

We discussed at the SIG meeting that the operator will create Service/Pod monitors for the operands, with the assumption that the functionality will be put behind a feature flag. The implementation has to be flexible enough to allow OTLP in the future (once the collector reports OTLP metrics).

This work is needed to move the OTEL operator to operator capability level 4 (https://operatorhub.io/operator/opentelemetry-operator). There are a lot of other operators that do the same.

Contributor:

Ah okay, sorry, I checked my notes and see this now. Proceed :) I'll review now.
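For readers following along, here is a minimal sketch of the kind of object `desiredServiceMonitors` returns. The helper name, the label keys, and the `monitoring` port name are assumptions for illustration; the exact values used in this PR may differ.

```go
package collector

import (
	monitoringv1 "github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// serviceMonitorFor is a hypothetical helper showing the shape of a
// ServiceMonitor that scrapes a collector's monitoring Service. The label
// keys and the port name "monitoring" are assumptions, not the PR's exact values.
func serviceMonitorFor(name, namespace string) monitoringv1.ServiceMonitor {
	return monitoringv1.ServiceMonitor{
		ObjectMeta: metav1.ObjectMeta{
			Name:      name,
			Namespace: namespace,
		},
		Spec: monitoringv1.ServiceMonitorSpec{
			// Select the collector's monitoring Service by label (assumed labels).
			Selector: metav1.LabelSelector{
				MatchLabels: map[string]string{
					"app.kubernetes.io/managed-by": "opentelemetry-operator",
					"app.kubernetes.io/instance":   namespace + "." + name,
				},
			},
			// Restrict scraping to the collector's own namespace.
			NamespaceSelector: monitoringv1.NamespaceSelector{
				MatchNames: []string{namespace},
			},
			// Scrape the named metrics port on the Service.
			Endpoints: []monitoringv1.Endpoint{
				{Port: "monitoring"},
			},
		},
	}
}
```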

@@ -345,6 +352,26 @@ type AutoscalerSpec struct {
	TargetMemoryUtilization *int32 `json:"targetMemoryUtilization,omitempty"`
}

// MetricsConfigSpec defines a metrics config.
type MetricsConfigSpec struct {
	// CreateServiceMonitors specifies if ServiceMonitors should be created for the OpenTelemetry Collector.
Member:

Document here how this feature is enabled.
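One way to address this ask, shown here as a hedged sketch rather than the PR's final wording (the option was later renamed during review), is to state both requirements in the field's doc comment: the Prometheus Operator CRDs must be installed and the operator.observability.prometheus feature gate must be enabled. The json tag below is illustrative.

```go
// MetricsConfigSpec defines a metrics config.
type MetricsConfigSpec struct {
	// CreateServiceMonitors specifies whether ServiceMonitors should be created
	// for the OpenTelemetry Collector. This only takes effect when the
	// Prometheus Operator CRDs are installed in the cluster and the
	// operator.observability.prometheus feature gate is enabled on the operator.
	// (Field and tag names are illustrative; see the merged PR for the final API.)
	CreateServiceMonitors bool `json:"createServiceMonitors,omitempty"`
}
```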

jaronoff97 (Contributor) left a comment:

The approach here makes sense, could you also add an e2e test please?

main.go (review comment outdated, resolved)
@@ -71,6 +73,7 @@ func init() {

	utilruntime.Must(otelv1alpha1.AddToScheme(scheme))
	utilruntime.Must(routev1.AddToScheme(scheme))
	utilruntime.Must(monitoringv1.AddToScheme(scheme))
Contributor:

Should we also put this behind the flag?

Contributor Author (iblancasa):

When this code is executed (during init), the flag value is not yet known.

iblancasa (Contributor Author):

I'm trying to fix the problems associated with the E2E tests.

openshift-ci bot commented Jul 12, 2023:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: iblancasa, jaronoff97

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot commented Jul 12, 2023:

@iblancasa: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/opentelemetry-e2e-tests | 17e4bb5 | link | true | /test opentelemetry-e2e-tests |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

jaronoff97 (Contributor):

@TylerHelmuth mind giving one more review?

TylerHelmuth (Member):

@jaronoff97 I am still concerned about #1874 (comment). It is bad that users can't modify their metrics endpoint today, and I am hesitant to add more functionality that depends on the metrics endpoint without the flexibility to configure it. While I recognize we could release this first and add support for configuration later, it feels like an implementation that drives the Operator to Level 4 should also allow users to configure the managed object (in this instance the Collector) however they need.

jaronoff97 (Contributor):

I see, in that case let's block merging this until we have #1931 closed

	PrometheusOperatorIsAvailable = featuregate.GlobalRegistry().MustRegister(
		"operator.observability.prometheus",
		featuregate.StageAlpha,
		featuregate.WithRegisterDescription("enables features associated to the Prometheus Operator"))
Member:

Please add a featuregate.FromVersion that marks the Operator version in which this feature gate was added. (This was a rather late addition to the featuregate package, and we should go back and fix our other gates to include it.)
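A sketch of the registration with the requested option; the collector featuregate package provides featuregate.WithRegisterFromVersion for this. The version string below is a placeholder for illustration, not the actual operator release in which the gate landed.

```go
PrometheusOperatorIsAvailable = featuregate.GlobalRegistry().MustRegister(
	"operator.observability.prometheus",
	featuregate.StageAlpha,
	featuregate.WithRegisterDescription("enables features associated to the Prometheus Operator"),
	featuregate.WithRegisterFromVersion("v0.82.0"), // placeholder version, for illustration only
)
```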

iblancasa (Contributor Author):

> I see, in that case let's block merging this until we have #1931 closed

Even without merging this PR, the problem is still there because the service created by the operator will be pointing to a port that does nothing. I'll work on #1931 but, TBH, I don't think there is a real need to block this PR until #1931 is merged.

TylerHelmuth (Member):

> Even without merging this PR, the problem is still there because the service created by the operator will be pointing to a port that does nothing

Correct. In my opinion, it feels bad to add more complex logic to an inflexible configuration.

@jaronoff97 jaronoff97 merged commit 42ff92f into open-telemetry:main Jul 25, 2023
@iblancasa iblancasa deleted the feature/1768 branch July 25, 2023 17:13
ItielOlenick pushed a commit to ItielOlenick/opentelemetry-operator that referenced this pull request May 1, 2024
Create ServiceMonitors to gather metrics from the OTEL Operands (open-telemetry#1874)

* Allow the creation of ServiceMonitors to gather metrics from the OpenTelemetry Collector instances

Signed-off-by: Israel Blancas <[email protected]>

* Add missing changelog

Signed-off-by: Israel Blancas <[email protected]>

* Fix unprotected statement

Signed-off-by: Israel Blancas <[email protected]>

* Fix lint issues

Signed-off-by: Israel Blancas <[email protected]>

* Apply changes requested in code review

Signed-off-by: Israel Blancas <[email protected]>

* Add missing generated files

Signed-off-by: Israel Blancas <[email protected]>

* Change the way to enable the feature flag

Signed-off-by: Israel Blancas <[email protected]>

* Change the way to enable the feature flag

Signed-off-by: Israel Blancas <[email protected]>

* Fix merge

* Fix enable feature flag

* Change the name of the option and move the E2E tests to their own folder

* Fix unit test

* Fix docs

* Fix CRD field

* Fix CRD field

* Add from version to feature gate

Signed-off-by: Israel Blancas <[email protected]>

* Move the E2E tests to their own section for the CI

Signed-off-by: Israel Blancas <[email protected]>

---------

Signed-off-by: Israel Blancas <[email protected]>
Development: successfully merging this pull request may close these issues: Support collecting metrics from instances
4 participants