Add feature flag for target allocator config addition #1688
Conversation
Force-pushed from 15ac0ce to b3a56ab.
pkg/featuregate/featuregate.go (outdated)
// EnableTargetAllocatorRewrite is the feature gate that controls whether the collector's configuration should
// automatically be rewritten when the target allocator is enabled.
EnableTargetAllocatorRewrite = featuregate.GlobalRegistry().MustRegister(
	"operator.enableTargetAllocatorRewrite",
Sticking with the hierarchical pattern, how do you feel about `operator.targetallocator.rewrite`?
That makes it sound like this changes target allocator behaviour, but in reality we're changing how the operator rewrites prometheusreceiver config. So maybe something like `operator.prometheusreceiver.rewritetargetallocator`? Kind of a mouthful, but more descriptive imo.
Since the flag ultimately affects the collector, how about `operator.collector.rewritetargetallocator`? I think that would scale well with any other flags that mess with collector configuration/capabilities.
Yeah, that sounds better.
Renamed.
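For reference, a minimal sketch of what the renamed registration in pkg/featuregate could look like; only the gate ID `operator.collector.rewritetargetallocator` is confirmed by this thread, while the stage and description wording are illustrative assumptions (the collector featuregate API does provide `StageAlpha` and `WithRegisterDescription`):

```go
// EnableTargetAllocatorRewrite is the feature gate that controls whether the collector's
// configuration should automatically be rewritten when the target allocator is enabled.
EnableTargetAllocatorRewrite = featuregate.GlobalRegistry().MustRegister(
	"operator.collector.rewritetargetallocator",
	featuregate.StageAlpha, // assumed: new gates typically start in alpha, disabled by default
	featuregate.WithRegisterDescription("controls whether the operator should rewrite the collector's Prometheus receiver config to use the target allocator"), // assumed wording
)
```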
scrape_interval: 1m
scrape_timeout: 10s
evaluation_interval: 1m
target_allocator:
do we have an explicit case for when the flag is disabled?
It's disabled in previous tests, but only implicitly. Think it's worth disabling it explicitly there?
hmmm... I think it's alright then, when the flag is moved to beta we'll just do the reverse.
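If an explicit case is ever wanted, a hypothetical test sketch could toggle the gate directly and restore it afterwards. The import paths, aliases, and test body here are assumptions; `Registry.Set`, `Gate.ID`, and `Gate.IsEnabled` are real collector featuregate APIs:

```go
import (
	"testing"

	"github.com/stretchr/testify/require"
	colfeaturegate "go.opentelemetry.io/collector/featuregate"

	"github.com/open-telemetry/opentelemetry-operator/pkg/featuregate"
)

// TestRewriteFlagDisabled is a hypothetical explicit-disable case.
func TestRewriteFlagDisabled(t *testing.T) {
	// Remember the gate's current state and restore it when the test ends.
	original := featuregate.EnableTargetAllocatorRewrite.IsEnabled()
	require.NoError(t, colfeaturegate.GlobalRegistry().Set(featuregate.EnableTargetAllocatorRewrite.ID(), false))
	t.Cleanup(func() {
		require.NoError(t, colfeaturegate.GlobalRegistry().Set(featuregate.EnableTargetAllocatorRewrite.ID(), original))
	})

	// ... render the collector config here and assert that no target_allocator
	// section has been added ...
}
```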
// EnableTargetAllocatorRewrite is the feature gate that controls whether the collector's configuration should
// automatically be rewritten when the target allocator is enabled.
EnableTargetAllocatorRewrite = featuregate.GlobalRegistry().MustRegister(
Can you also update the README with this?
Done.
@@ -28,7 +28,7 @@ func errorNotAMap(component string) error {
 	return fmt.Errorf("%s property in the configuration doesn't contain valid %s", component, component)
 }
 
-// ConfigToPromConfig converts the incoming configuration object into a the Prometheus receiver config.
+// ConfigToPromConfig converts the incoming configuration object into the Prometheus receiver config.
 func ConfigToPromConfig(cfg string) (map[interface{}]interface{}, error) {
This is technically a breaking change, could we mark that in the `chloggen`?
I think I'd rather properly fix this check, as it's currently incorrect. Right now, it requires that the `config` property exists, but in reality we should check if one of the following is true:

- `config` exists
- `target_allocator` exists and target allocator is enabled on the resource
- target allocator is enabled on the resource and the feature flag is enabled

Then this is really a fix, not a breaking change, and the actual breaking change will happen when the flag is enabled by default. How's that sound?
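A rough sketch of that check, assuming the map shape returned by `ConfigToPromConfig`; the function and parameter names here are hypothetical, not the PR's actual code:

```go
import "fmt"

// validatePromConfig sketches the proposed check: the receiver config is
// acceptable if any one of the three conditions above holds.
func validatePromConfig(promReceiver map[interface{}]interface{}, taEnabled, taRewriteEnabled bool) error {
	_, hasConfig := promReceiver["config"]
	_, hasTargetAllocator := promReceiver["target_allocator"]
	switch {
	case hasConfig:
		return nil // an explicit config section is present
	case hasTargetAllocator && taEnabled:
		return nil // an explicit target_allocator section, and the TA is enabled on the resource
	case taEnabled && taRewriteEnabled:
		return nil // the operator will inject the target_allocator section itself
	default:
		return fmt.Errorf("no Prometheus config present and target allocator rewriting is not enabled")
	}
}
```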
Yes, technically the comment on the function doesn't match what the code is actually doing, but regardless, anyone potentially using this function and expecting it to return the Prometheus config blob would be broken by this change. I think I'm okay with calling this a 'fix', as long as we call that out in the changelog's subtext – in case someone is using this for some reason, they should be able to see the difference in the release notes.
For now I just added a note to the changelog about the function's semantics changing. For the validation, I took a stab at adding it, and it's a large enough change that I think it should go into a separate PR. You reckon we can merge this as is, and do validation afterwards? Can also go the other way around, the validation changes are mostly independent of this PR.
One more Q: were you able to test this with a real cluster and CRD to be sure it works as expected E2E?
	CollectorID: "${POD_NAME}",
}
// we don't need the scrape configs here anymore with target allocator enabled
cfg.PromConfig.ScrapeConfigs = []*promconfig.ScrapeConfig{}
I actually think the collector may fail to start up if it doesn't have any scrape configs set; at least that was the case a few months ago...
oh ha, nvm i fixed this a few months ago (link)
Yeah, I actually tested this E2E, no issues.
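For context, the snippet above closes a struct literal; the full rewrite step presumably looks something like this sketch (the struct name, endpoint construction, and variable names are assumptions — only the visible lines come from the diff):

```go
// Point the receiver at the target allocator; scrape targets are now fetched
// from it at runtime instead of being rendered statically into the config.
cfg.TargetAllocConfig = &targetAllocator{ // assumed struct name
	Endpoint:    fmt.Sprintf("http://%s:80", taServiceName), // hypothetical service name variable
	Interval:    30 * time.Second,
	CollectorID: "${POD_NAME}",
}
// we don't need the scrape configs here anymore with target allocator enabled
cfg.PromConfig.ScrapeConfigs = []*promconfig.ScrapeConfig{}
```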
Force-pushed from 5727d1d to 80173ff.
Yeah. For reference, I'm adding experimental support for replacing Prometheus with otel to our Helm Chart here: SumoLogic/sumologic-kubernetes-collection#2988, and I ran our integration tests against the changes in this PR.
#### Target Allocator config rewriting

The Prometheus receiver now has explicit support for acquiring scrape targets from the target allocator. As such, it is now possible to have the Operator add the necessary target allocator configuration automatically. This feature currently requires the `operator.collector.rewritetargetallocator` feature flag to be enabled. With the flag enabled, the configuration from the previous section would be rendered as:
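A sketch of roughly what that rendered configuration looks like (the endpoint and interval values are illustrative assumptions, not quoted from the README):

```yaml
receivers:
  prometheus:
    config:
      scrape_configs: []
    target_allocator:
      endpoint: http://my-collector-targetallocator # illustrative endpoint
      interval: 30s
      collector_id: "${POD_NAME}"
```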
Note for @TylerHelmuth: in a follow-up to this, we should move the feature flag section to its own thing to include this.
Add feature flag for target allocator config addition (#1688)

* add feature flag for allocator config rewrite
* clear Prometheus scrape configs if allocator rewrite enabled
* add changelog entry
* rename flag to operator.collector.rewritetargetallocator
* fix changelog entry
* document the rewritetargetallocator flag

Co-authored-by: Jacob Aronoff <[email protected]>
These are changes from #1557, rebased on main and using the feature flag machinery from #1619.
One addition I've made is clearing the scrape configs in the prometheus receiver itself. This doesn't actually change the semantics, but it makes it clearer what is actually going on. The change is in a separate commit here: 15ac0ce.
I've done some manual E2E testing on these changes and everything worked as expected.
Fixes #1581
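For readers following the feature-flag machinery, a minimal sketch of how such a gate is typically consulted in the rendering path (the surrounding call site is an assumption; `Gate.IsEnabled` is a real collector featuregate API):

```go
if featuregate.EnableTargetAllocatorRewrite.IsEnabled() {
	// Rewrite the Prometheus receiver section: add target_allocator and
	// clear the statically rendered scrape_configs.
}
```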