Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add feature flag for target allocator config addition #1688

Merged
merged 6 commits into from
May 5, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 18 additions & 0 deletions .chloggen/otelallocator-flag.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the component, or a single word describing the area of concern, (e.g. operator, target allocator, github action)
component: operator

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Introduces a new feature flag "`operator.collector.rewritetargetallocator`" that allows an operator to add the target_allocator configuration to the collector configuration

# One or more tracking issues related to the change
issues: [1581]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: |
Note that the ConfigToPromConfig function in pkg/targetallocator/adapters now correctly returns the prometheus receiver config
in accordance with its docstring. It used to erroneously return the actual Prometheus config from a level lower.
58 changes: 58 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -422,6 +422,64 @@ Note how the Operator added a `global` section and a new `http_sd_configs` to th

More info on the TargetAllocator can be found [here](cmd/otel-allocator/README.md).

#### Target Allocator config rewriting

Prometheus receiver now has explicit support for acquiring scrape targets from the target allocator. As such, it is now possible to have the
Operator add the necessary target allocator configuration automatically. This feature currently requires the `operator.collector.rewritetargetallocator` feature flag to be enabled. With the flag enabled, the configuration from the previous section would be rendered as:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note for @TylerHelmuth in a follow up to this we should move the feature flag section to its own thing to include this.


```yaml
receivers:
prometheus:
config:
global:
scrape_interval: 1m
scrape_timeout: 10s
evaluation_interval: 1m
target_allocator:
endpoint: http://collector-with-ta-targetallocator:80
interval: 30s
collector_id: $POD_NAME

exporters:
logging:

service:
pipelines:
metrics:
receivers: [prometheus]
processors: []
exporters: [logging]
```

This also allows for a more straightforward collector configuration for target discovery using prometheus-operator CRDs. See below for a minimal example:

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
name: collector-with-ta-prometheus-cr
spec:
mode: statefulset
targetAllocator:
enabled: true
serviceAccount: everything-prometheus-operator-needs
prometheusCR:
enabled: true
config: |
receivers:
prometheus:

exporters:
logging:

service:
pipelines:
metrics:
receivers: [prometheus]
processors: []
exporters: [logging]
```

## Compatibility matrix

### OpenTelemetry Operator vs. OpenTelemetry Collector
Expand Down
29 changes: 20 additions & 9 deletions pkg/collector/reconcile/config_replace.go
Original file line number Diff line number Diff line change
Expand Up @@ -17,8 +17,8 @@ package reconcile
import (
"fmt"
"net/url"
"time"

"github.com/mitchellh/mapstructure"
promconfig "github.com/prometheus/prometheus/config"
"github.com/prometheus/prometheus/discovery"
"github.com/prometheus/prometheus/discovery/http"
Expand All @@ -27,12 +27,20 @@ import (

"github.com/open-telemetry/opentelemetry-operator/apis/v1alpha1"
"github.com/open-telemetry/opentelemetry-operator/pkg/collector/adapters"
"github.com/open-telemetry/opentelemetry-operator/pkg/featuregate"
"github.com/open-telemetry/opentelemetry-operator/pkg/naming"
ta "github.com/open-telemetry/opentelemetry-operator/pkg/targetallocator/adapters"
)

type targetAllocator struct {
Endpoint string `yaml:"endpoint"`
Interval time.Duration `yaml:"interval"`
CollectorID string `yaml:"collector_id"`
}

type Config struct {
PromConfig *promconfig.Config `yaml:"config"`
PromConfig *promconfig.Config `yaml:"config"`
TargetAllocConfig *targetAllocator `yaml:"target_allocator,omitempty"`
}

func ReplaceConfig(instance v1alpha1.OpenTelemetryCollector) (string, error) {
Expand All @@ -50,9 +58,7 @@ func ReplaceConfig(instance v1alpha1.OpenTelemetryCollector) (string, error) {
}

// yaml marshaling/unsmarshaling is preferred because of the problems associated with the conversion of map to a struct using mapstructure
promCfg, marshalErr := yaml.Marshal(map[string]interface{}{
"config": promCfgMap,
})
promCfg, marshalErr := yaml.Marshal(promCfgMap)
if marshalErr != nil {
return "", marshalErr
}
Expand All @@ -71,13 +77,18 @@ func ReplaceConfig(instance v1alpha1.OpenTelemetryCollector) (string, error) {
}
}

updPromCfgMap := make(map[string]interface{})
if err := mapstructure.Decode(cfg, &updPromCfgMap); err != nil {
return "", err
if featuregate.EnableTargetAllocatorRewrite.IsEnabled() {
cfg.TargetAllocConfig = &targetAllocator{
Endpoint: fmt.Sprintf("http://%s:80", naming.TAService(instance)),
Interval: 30 * time.Second,
CollectorID: "${POD_NAME}",
}
// we don't need the scrape configs here anymore with target allocator enabled
cfg.PromConfig.ScrapeConfigs = []*promconfig.ScrapeConfig{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think the collector may fail to startup if it doesn't have any scrape configs set, at least that was the case a few months ago...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh ha, nvm i fixed this a few months ago (link)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I actually tested this E2E, no issues.

}

// type coercion checks are handled in the ConfigToPromConfig method above
config["receivers"].(map[interface{}]interface{})["prometheus"].(map[interface{}]interface{})["config"] = updPromCfgMap["PromConfig"]
config["receivers"].(map[interface{}]interface{})["prometheus"] = cfg

out, err := yaml.Marshal(config)
if err != nil {
Expand Down
44 changes: 38 additions & 6 deletions pkg/collector/reconcile/config_replace_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -16,11 +16,15 @@ package reconcile

import (
"testing"
"time"

colfeaturegate "go.opentelemetry.io/collector/featuregate"

"github.com/prometheus/prometheus/discovery/http"
"github.com/stretchr/testify/assert"
"gopkg.in/yaml.v2"

"github.com/open-telemetry/opentelemetry-operator/pkg/featuregate"
ta "github.com/open-telemetry/opentelemetry-operator/pkg/targetallocator/adapters"
)

Expand All @@ -37,9 +41,7 @@ func TestPrometheusParser(t *testing.T) {
promCfgMap, err := ta.ConfigToPromConfig(actualConfig)
assert.NoError(t, err)

promCfg, err := yaml.Marshal(map[string]interface{}{
"config": promCfgMap,
})
promCfg, err := yaml.Marshal(promCfgMap)
assert.NoError(t, err)

err = yaml.UnmarshalStrict(promCfg, &cfg)
Expand All @@ -59,6 +61,37 @@ func TestPrometheusParser(t *testing.T) {
for k := range expectedMap {
assert.True(t, expectedMap[k], k)
}
assert.True(t, cfg.TargetAllocConfig == nil)
})

t.Run("should update config with targetAllocator block", func(t *testing.T) {
err := colfeaturegate.GlobalRegistry().Set(featuregate.EnableTargetAllocatorRewrite.ID(), true)
param.Instance.Spec.TargetAllocator.Enabled = true
assert.NoError(t, err)
actualConfig, err := ReplaceConfig(param.Instance)
assert.NoError(t, err)

// prepare
var cfg Config
promCfgMap, err := ta.ConfigToPromConfig(actualConfig)
assert.NoError(t, err)

promCfg, err := yaml.Marshal(promCfgMap)
assert.NoError(t, err)

err = yaml.UnmarshalStrict(promCfg, &cfg)
assert.NoError(t, err)

// test
assert.Len(t, cfg.PromConfig.ScrapeConfigs, 0)
expectedTAConfig := &targetAllocator{
Endpoint: "http://test-targetallocator:80",
Interval: 30 * time.Second,
CollectorID: "${POD_NAME}",
}
assert.Equal(t, expectedTAConfig, cfg.TargetAllocConfig)
err = colfeaturegate.GlobalRegistry().Set(featuregate.EnableTargetAllocatorRewrite.ID(), false)
assert.NoError(t, err)
})

t.Run("should not update config with http_sd_config", func(t *testing.T) {
Expand All @@ -71,9 +104,7 @@ func TestPrometheusParser(t *testing.T) {
promCfgMap, err := ta.ConfigToPromConfig(actualConfig)
assert.NoError(t, err)

promCfg, err := yaml.Marshal(map[string]interface{}{
"config": promCfgMap,
})
promCfg, err := yaml.Marshal(promCfgMap)
assert.NoError(t, err)

err = yaml.UnmarshalStrict(promCfg, &cfg)
Expand All @@ -93,6 +124,7 @@ func TestPrometheusParser(t *testing.T) {
for k := range expectedMap {
assert.True(t, expectedMap[k], k)
}
assert.True(t, cfg.TargetAllocConfig == nil)
})

}
3 changes: 2 additions & 1 deletion pkg/collector/reconcile/configmap.go
Original file line number Diff line number Diff line change
Expand Up @@ -114,7 +114,8 @@ func desiredTAConfigMap(params Params) (corev1.ConfigMap, error) {
"app.kubernetes.io/managed-by": "opentelemetry-operator",
"app.kubernetes.io/component": "opentelemetry-collector",
}
taConfig["config"] = promConfig
// We only take the "config" from the returned object, we don't need the "target_allocator" configuration here.
taConfig["config"] = promConfig["config"]
if len(params.Instance.Spec.TargetAllocator.AllocationStrategy) > 0 {
taConfig["allocation_strategy"] = params.Instance.Spec.TargetAllocator.AllocationStrategy
} else {
Expand Down
56 changes: 54 additions & 2 deletions pkg/collector/reconcile/configmap_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,8 @@ import (
"context"
"testing"

colfeaturegate "go.opentelemetry.io/collector/featuregate"

"github.com/stretchr/testify/assert"
"gopkg.in/yaml.v2"
v1 "k8s.io/api/core/v1"
Expand All @@ -28,6 +30,7 @@ import (

"github.com/open-telemetry/opentelemetry-operator/apis/v1alpha1"
"github.com/open-telemetry/opentelemetry-operator/internal/config"
"github.com/open-telemetry/opentelemetry-operator/pkg/featuregate"
ta "github.com/open-telemetry/opentelemetry-operator/pkg/targetallocator/adapters"
)

Expand Down Expand Up @@ -184,6 +187,55 @@ service:

})

t.Run("should return expected escaped collector config map with target_allocator config block", func(t *testing.T) {
expectedLables["app.kubernetes.io/component"] = "opentelemetry-collector"
expectedLables["app.kubernetes.io/name"] = "test-collector"
expectedLables["app.kubernetes.io/version"] = "latest"
err := colfeaturegate.GlobalRegistry().Set(featuregate.EnableTargetAllocatorRewrite.ID(), true)
assert.NoError(t, err)

expectedData := map[string]string{
"collector.yaml": `exporters:
logging: null
processors: null
receivers:
prometheus:
config:
global:
scrape_interval: 1m
scrape_timeout: 10s
evaluation_interval: 1m
target_allocator:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have an explicit case for when the flag is disabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's disabled in previous tests, but only implicitly. Think it's worth disabling it explicitly there?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm... I think it's alright then, when the flag is moved to beta we'll just do the reverse.

endpoint: http://test-targetallocator:80
interval: 30s
collector_id: ${POD_NAME}
service:
pipelines:
metrics:
exporters:
- logging
processors: []
receivers:
- prometheus
`,
}

param, err := newParams("test/test-img", "../testdata/http_sd_config_servicemonitor_test.yaml")
assert.NoError(t, err)
param.Instance.Spec.TargetAllocator.Enabled = true
actual := desiredConfigMap(context.Background(), param)

assert.Equal(t, "test-collector", actual.Name)
assert.Equal(t, expectedLables, actual.Labels)
assert.Equal(t, expectedData, actual.Data)

// Reset the value
expectedLables["app.kubernetes.io/version"] = "0.47.0"
err = colfeaturegate.GlobalRegistry().Set(featuregate.EnableTargetAllocatorRewrite.ID(), false)
assert.NoError(t, err)

})

t.Run("should return expected target allocator config map", func(t *testing.T) {
expectedLables["app.kubernetes.io/component"] = "opentelemetry-targetallocator"
expectedLables["app.kubernetes.io/name"] = "test-targetallocator"
Expand Down Expand Up @@ -358,7 +410,7 @@ func TestExpectedConfigMap(t *testing.T) {
assert.True(t, exists)
assert.Equal(t, instanceUID, actual.OwnerReferences[0].UID)

parmConfig, err := ta.ConfigToPromConfig(params().Instance.Spec.Config)
promConfig, err := ta.ConfigToPromConfig(params().Instance.Spec.Config)
assert.NoError(t, err)

taConfig := make(map[interface{}]interface{})
Expand All @@ -367,7 +419,7 @@ func TestExpectedConfigMap(t *testing.T) {
"app.kubernetes.io/managed-by": "opentelemetry-operator",
"app.kubernetes.io/component": "opentelemetry-collector",
}
taConfig["config"] = parmConfig
taConfig["config"] = promConfig["config"]
taConfig["allocation_strategy"] = "least-weighted"
taConfigYAML, _ := yaml.Marshal(taConfig)

Expand Down
7 changes: 7 additions & 0 deletions pkg/featuregate/featuregate.go
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,13 @@ var (
"operator.autoinstrumentation.java",
featuregate.StageBeta,
featuregate.WithRegisterDescription("controls whether the operator supports Java auto-instrumentation"))

// EnableTargetAllocatorRewrite is the feature gate that controls whether the collector's configuration should
// automatically be rewritten when the target allocator is enabled.
EnableTargetAllocatorRewrite = featuregate.GlobalRegistry().MustRegister(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also update the README with this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

"operator.collector.rewritetargetallocator",
featuregate.StageAlpha,
featuregate.WithRegisterDescription("controls whether the operator should configure the collector's targetAllocator configuration"))
)

// Flags creates a new FlagSet that represents the available featuregate flags using the supplied featuregate registry.
Expand Down
14 changes: 2 additions & 12 deletions pkg/targetallocator/adapters/config_to_prom_config.go
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ func errorNotAMap(component string) error {
return fmt.Errorf("%s property in the configuration doesn't contain valid %s", component, component)
}

// ConfigToPromConfig converts the incoming configuration object into a the Prometheus receiver config.
// ConfigToPromConfig converts the incoming configuration object into the Prometheus receiver config.
func ConfigToPromConfig(cfg string) (map[interface{}]interface{}, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is technically a breaking change, could we mark that in the chloggen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd rather properly fix this check, as it's currently incorrect. Right now, it requires that the config property exists, but in reality we should check if one of the following is true:

  • config exists
  • target_allocator exists and target allocator is enabled on the resource
  • target allocator is enabled on the resource and the feature flag is enabled

Then this is really a fix, not a breaking change, and the actual breaking change will happen when the flag is enabled by default. How's that sound?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, technically the comment on the function doesn't match what the code is actually doing, but regardless anyone potentially using this function and expecting it to return the prometheus config blob would be broken by this change. I think i'm okay to call this a 'fix', as long as we call that out in the changelog's subtext – in case someone is using this for some reason they should be able to see the difference in the release notes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I just added a note to the changelog about the function's semantics changing. For the validation, I took a stab at adding it, and it's a large enough change that I think it should go into a separate PR. You reckon we can merge this as is, and do validation afterwards? Can also go the other way around, the validation changes are mostly independent of this PR.

config, err := adapters.ConfigFromString(cfg)
if err != nil {
Expand All @@ -55,15 +55,5 @@ func ConfigToPromConfig(cfg string) (map[interface{}]interface{}, error) {
return nil, errorNotAMap("prometheus")
}

prometheusConfigProperty, ok := prometheus["config"]
if !ok {
return nil, errorNoComponent("prometheusConfig")
}

prometheusConfig, ok := prometheusConfigProperty.(map[interface{}]interface{})
if !ok {
return nil, errorNotAMap("prometheusConfig")
}

return prometheusConfig, nil
return prometheus, nil
}
Loading