Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve config validation for prometheus receiver and target allocator #1729

Merged
merged 6 commits into from
May 24, 2023

Conversation

swiatekm
Copy link
Contributor

@swiatekm swiatekm commented May 10, 2023

In a follow-up to #1688, improve validation for prometheus receiver configuration. This turned out to be more complicated than expected, so I'd like some more eyes to verify I haven't missed anything.

The validation logic for the prometheus receiver config itself is:

  • if target allocator is not enabled on the Collector CR, prometheus config under the config key is required
  • if target allocator is enabled
    • either config has to exist to inject http_sd_configs, or target_allocator needs to exist
    • if additionally target allocator rewrite is enabled, nothing is required

However, I also noticed that target allocator itself doesn't correctly validate its configuration. For example, it's possible to pass an empty Prometheus config, but this will cause a crash in the targetDiscoverer. So I added an additional check, requiring that either:

  • there is a non-empty Prometheus config
  • or PrometheusCR is enabled

I submitted both of these changes in a single PR, as they're related, but I can resubmit separately if that'll make review easier.

@swiatekm swiatekm force-pushed the fix/validate-ta-config branch 5 times, most recently from e2507b9 to e2c14ac Compare May 11, 2023 11:35
@swiatekm swiatekm marked this pull request as ready for review May 11, 2023 11:41
@swiatekm swiatekm requested review from a team May 11, 2023 11:41
Copy link
Contributor

@matej-g matej-g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a nice addition! 💪 I added only a few nits.

apis/v1alpha1/opentelemetrycollector_webhook.go Outdated Show resolved Hide resolved
setupLog.Error(err, "Unable to apply initial configuration")
return err
if cfg.Config != nil {
err = targetDiscoverer.ApplyConfig(allocatorWatcher.EventSourceConfigMap, cfg.Config)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether it would make more sense to check for cfg.Config == nil inside of ApplyConfig and if so, just do early return

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it, but realistically, this is the only place in the whole package where this can happen, because we're getting the config from user input. So I think it's better to keep this bit of complexity here instead of in the component. We could also just have the Config itself come with a non-nil default if the user doesn't supply it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you actually reverse this condition such that we do

if cfg.Config == nil {
  setupLog.Error("cannot apply an empty configuration")
  return fmt.Error("...")
}

That way we prevent startup correctly on bad config?

Copy link
Contributor Author

@swiatekm swiatekm May 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't reverse it if it stays where it is in main, as cfg.Config == nil by itself does not imply an error. Or do you mean doing this inside ApplyConfig?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't cfg.Config == nil imply an error is i guess my conjecture here. i.e. if we don't have a config to apply at all we should fail fast.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@swiatekm-sumo IMHO still it's an exported method, could be used anywhere else. That way whoever re-uses this method in whichever place they do, do not need to worry about the nil condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matej-g makes sense, you've convinced me.

@jaronoff97 Having the raw Prometheus config be nil and Prometheus CR handling enabled looks like a valid configuration to me. That's why I added the validate method and check for this specific state.

Copy link
Contributor Author

@swiatekm swiatekm May 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the check to ApplyConfig and added a Discoverer test for this.

if !*cliConfig.PromCRWatcherConf.Enabled &&
(config.Config == nil ||
len(config.Config.ScrapeConfigs) == 0) {
return fmt.Errorf("either at least one scrape config must be defined, or Prometheus CR watching must be enabled")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit

Suggested change
return fmt.Errorf("either at least one scrape config must be defined, or Prometheus CR watching must be enabled")
return fmt.Errorf("at least one scrape config must be defined, or Prometheus CR watching must be enabled")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd prefer us to actually split up this if in to three statements. Although it's more LOC we'll be able to give a more descriptive error back which I think would be better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote it this way anticipating that we might want to add more checks in the future, in which case they all need to be negative checks (as in "this error condition doesn't occur"). And I can't really rewrite it into three separate statements, as it's logically an OR.

I could make it nested instead, but I'm not sure if you can really make the error message any more useful that way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote this condition to be clearer, let me know if this helps.

cmd/otel-allocator/config/config.go Outdated Show resolved Hide resolved
Copy link
Contributor

@jaronoff97 jaronoff97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a few thoughts, overall this is going to be super useful though. thank you!

if !*cliConfig.PromCRWatcherConf.Enabled &&
(config.Config == nil ||
len(config.Config.ScrapeConfigs) == 0) {
return fmt.Errorf("either at least one scrape config must be defined, or Prometheus CR watching must be enabled")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd prefer us to actually split up this if in to three statements. Although it's more LOC we'll be able to give a more descriptive error back which I think would be better.

setupLog.Error(err, "Unable to apply initial configuration")
return err
if cfg.Config != nil {
err = targetDiscoverer.ApplyConfig(allocatorWatcher.EventSourceConfigMap, cfg.Config)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you actually reverse this condition such that we do

if cfg.Config == nil {
  setupLog.Error("cannot apply an empty configuration")
  return fmt.Error("...")
}

That way we prevent startup correctly on bad config?

Copy link
Contributor

@jaronoff97 jaronoff97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a few thoughts, overall this is going to be super useful though. thank you!

@swiatekm swiatekm requested a review from jaronoff97 May 16, 2023 13:36
@jaronoff97
Copy link
Contributor

@swiatekm-sumo my colleague @kristinapathak is also going to take a look at this as well before i merge. We've been deep in to configuration related PRs for the TA as of late so this will be a good continuation.

Copy link
Contributor

@kristinapathak kristinapathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Left two questions/suggestions.

cmd/otel-allocator/config/config.go Outdated Show resolved Hide resolved
@@ -57,6 +57,11 @@ func ReplaceConfig(instance v1alpha1.OpenTelemetryCollector) (string, error) {
return "", getCfgPromErr
}

validateCfgPromErr := ta.ValidatePromConfig(promCfgMap, instance.Spec.TargetAllocator.Enabled, featuregate.EnableTargetAllocatorRewrite.IsEnabled())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ValidatePromConfig() meant to be called every time ta.ConfigToPromConfig() is called? Put another way, is there ever a situation when we want to call one function without the other? I'm wondering the benefit of two separate functions. Is it separation of concerns? Not making a breaking change to the adapters pkg API?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not making a breaking change and keeping current semantics of ta.ConfigToPromConfig() is part of it, although the separation of concerns is a nice bonus from that. Though overall there isn't any strong reason behind it, if we're ok breaking ConfigToPromConfig, we could simply make validation a private function and have ConfigToPromConfig call it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that sounds like a good followup for the future :)

@swiatekm swiatekm requested a review from kristinapathak May 22, 2023 16:05
@swiatekm swiatekm force-pushed the fix/validate-ta-config branch from 7479876 to 1515f43 Compare May 24, 2023 19:04
@swiatekm swiatekm requested a review from jaronoff97 May 24, 2023 19:05
@swiatekm swiatekm force-pushed the fix/validate-ta-config branch 2 times, most recently from b774b95 to c09cce3 Compare May 24, 2023 20:24
@@ -74,6 +74,10 @@ func main() {
setupLog.Error(configLoadErr, "Unable to load configuration")
}

if validationErr := config.ValidateConfig(&cfg, &cliConf); validationErr != nil {
setupLog.Error(validationErr, "Invalid configuration")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in a future version should we fast fail here?

@@ -57,6 +57,11 @@ func ReplaceConfig(instance v1alpha1.OpenTelemetryCollector) (string, error) {
return "", getCfgPromErr
}

validateCfgPromErr := ta.ValidatePromConfig(promCfgMap, instance.Spec.TargetAllocator.Enabled, featuregate.EnableTargetAllocatorRewrite.IsEnabled())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that sounds like a good followup for the future :)

@jaronoff97 jaronoff97 merged commit 0a92a11 into open-telemetry:main May 24, 2023
@swiatekm swiatekm deleted the fix/validate-ta-config branch May 25, 2023 07:42
ItielOlenick pushed a commit to ItielOlenick/opentelemetry-operator that referenced this pull request May 1, 2024
open-telemetry#1729)

* Improve validation for prometheus receiver config

* Improve config validation for target allocator

* add changelog entry

* review fixes

* log empty Prometheus configs passed to discoverer

* add tests for ValidateConfig
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants