Add feature flag for target allocator config addition #1688

Merged
merged 6 commits from otelallocator-flag into open-telemetry:main on May 5, 2023

Conversation

swiatekm
Contributor

@swiatekm swiatekm commented Apr 28, 2023

These are changes from #1557, rebased on main and using the feature flag machinery from #1619.

One addition I've made is clearing the scrape configs in the prometheus receiver itself. This doesn't actually change the semantics, but it makes it clearer what is actually going on. The change is in a separate commit here: 15ac0ce.

I've done some manual E2E testing on these changes and everything worked as expected.

Fixes #1581

@swiatekm swiatekm force-pushed the otelallocator-flag branch from 15ac0ce to b3a56ab on April 28, 2023 09:42
@swiatekm swiatekm marked this pull request as ready for review April 28, 2023 10:09
@swiatekm swiatekm requested a review from a team April 28, 2023 10:09
// EnableTargetAllocatorRewrite is the feature gate that controls whether the collector's configuration should
// automatically be rewritten when the target allocator is enabled.
EnableTargetAllocatorRewrite = featuregate.GlobalRegistry().MustRegister(
"operator.enableTargetAllocatorRewrite",
Member

Sticking with the hierarchical pattern, how do you feel about operator.targetallocator.rewrite?

Contributor Author

That makes it sound like this changes target allocator behaviour, but in reality we're changing how the operator rewrites prometheusreceiver config. So maybe something like operator.prometheusreceiver.rewritetargetallocator? Kind of a mouthful, but more descriptive imo.

Member

Since the flag ultimately affects the collector, how about operator.collector.rewritetargetallocator? I think that would scale well with any other flags that mess with collector configuration/capabilities.

Contributor Author

Yeah, that sounds better.

Contributor Author

Renamed.
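
For context, a minimal sketch of what the registration might look like after the rename, using the collector's `featuregate` package. The stage, description, and package name here are illustrative assumptions, not necessarily the PR's exact values:

```go
package featuregates // hypothetical package name

import "go.opentelemetry.io/collector/featuregate"

// EnableTargetAllocatorRewrite controls whether the operator rewrites the
// collector's prometheus receiver configuration when the target allocator is enabled.
var EnableTargetAllocatorRewrite = featuregate.GlobalRegistry().MustRegister(
	"operator.collector.rewritetargetallocator",
	featuregate.StageAlpha, // assumed: disabled by default while the feature matures
	featuregate.WithRegisterDescription("automatically configure the collector's prometheus receiver to use the target allocator"),
)
```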

scrape_interval: 1m
scrape_timeout: 10s
evaluation_interval: 1m
target_allocator:
Contributor

do we have an explicit case for when the flag is disabled?

Contributor Author

It's disabled in previous tests, but only implicitly. Think it's worth disabling it explicitly there?

Contributor

Hmmm... I think it's alright then; when the flag is moved to beta we'll just do the reverse.
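
If an explicit disabled case ever becomes useful, a test could flip the gate for its duration. A rough sketch, assuming the gate has already been registered under the renamed ID and using hypothetical test and package names:

```go
package config_test // hypothetical test package

import (
	"testing"

	"github.com/stretchr/testify/require"
	"go.opentelemetry.io/collector/featuregate"
)

func TestConfigRenderingWithRewriteDisabled(t *testing.T) {
	// Alpha gates are disabled by default, so this Set mainly documents intent;
	// it also keeps the case correct if the gate later graduates to beta.
	// Assumes the gate was registered by importing the operator's feature gate package.
	require.NoError(t, featuregate.GlobalRegistry().Set("operator.collector.rewritetargetallocator", false))

	// ... render the collector configuration here and assert that the original
	// scrape_configs are preserved and no target_allocator section is added.
}
```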


// EnableTargetAllocatorRewrite is the feature gate that controls whether the collector's configuration should
// automatically be rewritten when the target allocator is enabled.
EnableTargetAllocatorRewrite = featuregate.GlobalRegistry().MustRegister(
Contributor

Can you also update the README with this?

Contributor Author

Done.

@@ -28,7 +28,7 @@ func errorNotAMap(component string) error {
return fmt.Errorf("%s property in the configuration doesn't contain valid %s", component, component)
}

-// ConfigToPromConfig converts the incoming configuration object into a the Prometheus receiver config.
+// ConfigToPromConfig converts the incoming configuration object into the Prometheus receiver config.
func ConfigToPromConfig(cfg string) (map[interface{}]interface{}, error) {
Contributor

This is technically a breaking change, could we mark that in the chloggen?

Contributor Author

I think I'd rather properly fix this check, as it's currently incorrect. Right now, it requires that the config property exists, but in reality we should check if one of the following is true:

  • config exists
  • target_allocator exists and target allocator is enabled on the resource
  • target allocator is enabled on the resource and the feature flag is enabled

Then this is really a fix, not a breaking change, and the actual breaking change will happen when the flag is enabled by default. How's that sound?
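
Roughly, the relaxed check described above could look something like this. A sketch only; the function name, arguments, and error text are hypothetical, assuming the Prometheus config arrives as the plain map that ConfigToPromConfig returns:

```go
package ta // hypothetical package

import "errors"

// validatePromConfig accepts the receiver configuration if any one of the three
// conditions listed above holds; this is a sketch, not the PR's actual code.
func validatePromConfig(promConfig map[interface{}]interface{}, allocatorEnabled, rewriteFlagEnabled bool) error {
	_, hasConfig := promConfig["config"]
	_, hasTargetAllocator := promConfig["target_allocator"]
	switch {
	case hasConfig:
		// An explicit Prometheus config block is present.
		return nil
	case allocatorEnabled && hasTargetAllocator:
		// The user supplied a target_allocator section themselves.
		return nil
	case allocatorEnabled && rewriteFlagEnabled:
		// The operator will inject the target_allocator section during config rewriting.
		return nil
	default:
		return errors.New("the Prometheus receiver configuration contains neither a config nor a target_allocator property, and automatic rewriting is not enabled")
	}
}
```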

Contributor

Yes, technically the comment on the function doesn't match what the code is actually doing, but regardless, anyone potentially using this function and expecting it to return the prometheus config blob would be broken by this change. I'm okay with calling this a 'fix', as long as we call that out in the changelog's subtext; if someone is using this for some reason, they should be able to see the difference in the release notes.

Contributor Author

For now I just added a note to the changelog about the function's semantics changing. For the validation, I took a stab at adding it, and it's a large enough change that I think it should go into a separate PR. Do you reckon we can merge this as is and do the validation afterwards? We can also go the other way around; the validation changes are mostly independent of this PR.

Contributor

@jaronoff97 jaronoff97 left a comment

One more Q: were you able to test this with a real cluster and CRD to be sure it works as expected E2E?

CollectorID: "${POD_NAME}",
}
// we don't need the scrape configs here anymore with target allocator enabled
cfg.PromConfig.ScrapeConfigs = []*promconfig.ScrapeConfig{}
Contributor

I actually think the collector may fail to start up if it doesn't have any scrape configs set, at least that was the case a few months ago...

Contributor

Oh ha, never mind, I fixed this a few months ago (link).

Contributor Author

Yeah, I actually tested this E2E, no issues.

@swiatekm swiatekm requested a review from jaronoff97 May 5, 2023 16:56
@swiatekm swiatekm force-pushed the otelallocator-flag branch from 5727d1d to 80173ff on May 5, 2023 16:57
@swiatekm
Contributor Author

swiatekm commented May 5, 2023

> One more Q: were you able to test this with a real cluster and CRD to be sure it works as expected E2E?

Yeah. For reference, I'm adding experimental support for replacing Prometheus with otel to our Helm Chart here: SumoLogic/sumologic-kubernetes-collection#2988, and I ran our integration tests against the changes in this PR.

#### Target Allocator config rewriting

The Prometheus receiver now has explicit support for acquiring scrape targets from the target allocator. As such, it is now possible to have the Operator add the necessary target allocator configuration automatically. This feature currently requires the `operator.collector.rewritetargetallocator` feature flag to be enabled. With the flag enabled, the configuration from the previous section would be rendered as:
Contributor

Note for @TylerHelmuth: in a follow-up to this we should move the feature flag section to its own thing to include this.
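
For illustration, the rendered prometheus receiver section that the README excerpt above refers to might end up looking roughly like the following. This is a sketch only: the endpoint is derived from the OpenTelemetryCollector name (assumed here to be `my-collector`), and the exact keys, values, and nesting depend on the operator and receiver versions in use; only `collector_id: ${POD_NAME}` and the emptied scrape configs are taken from this PR's code.

```yaml
receivers:
  prometheus:
    config:
      scrape_configs: []   # cleared by the operator when rewriting is enabled
    target_allocator:
      endpoint: http://my-collector-targetallocator   # assumed service name for the CR "my-collector"
      interval: 30s                                   # illustrative value
      collector_id: ${POD_NAME}
```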

@jaronoff97 jaronoff97 merged commit a74dc8e into open-telemetry:main May 5, 2023
@swiatekm swiatekm deleted the otelallocator-flag branch May 5, 2023 20:04
ItielOlenick pushed a commit to ItielOlenick/opentelemetry-operator that referenced this pull request May 1, 2024
…#1688)

* add feature flag for allocator config rewrite

* clear Prometheus scrape configs if allocator rewrite enabled

* add changelog entry

* rename flag to operator.collector.rewritetargetallocator

* fix changelog entry

* document the rewritetargetallocator flag

---------

Co-authored-by: Jacob Aronoff <[email protected]>
Linked issue (#1581): Automatically add target_allocator section to prometheus receiver config if allocator is enabled