Fix TA in clusters with global proxy #3187

Merged
merged 3 commits into from
Aug 5, 2024
20 changes: 20 additions & 0 deletions .chloggen/fix-ta-with-proxy.yaml
@@ -0,0 +1,20 @@
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: bug_fix

# The name of the component, or a single word describing the area of concern, (e.g. collector, target allocator, auto-instrumentation, opamp, github action)
component: target allocator

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Fix the collector-to-target allocator connection in clusters with a global proxy.

# One or more tracking issues related to the change
issues: [3187]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: |
  On clusters with a global proxy, the collector might fail to reach the target allocator
  because the endpoint is set to `<ta-service-name>:port`, so the request goes through the
  proxy and might be forwarded to the internet. Such clusters configure `NO_PROXY` with
  `.svc.cluster.local`, so calls to the fully qualified service endpoint bypass the proxy.
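To make the failure mode concrete: a global proxy typically injects proxy variables into every container, and Go's HTTP client matches `NO_PROXY` entries by domain suffix. A minimal sketch with a hypothetical proxy host — only the `NO_PROXY` value and the two endpoint forms come from this PR:

```yaml
# Proxy environment as a cluster-wide proxy might inject it; the proxy
# host and the extra NO_PROXY entries are illustrative.
env:
  - name: HTTP_PROXY
    value: http://proxy.internal.example:3128
  - name: HTTPS_PROXY
    value: http://proxy.internal.example:3128
  - name: NO_PROXY
    value: .svc.cluster.local,10.0.0.0/8,localhost
# Old endpoint http://test-targetallocator:80
#   -> no NO_PROXY entry matches, so the request is sent to the proxy.
# New endpoint http://test-targetallocator.test.svc.cluster.local:80
#   -> suffix-matches .svc.cluster.local, so the proxy is bypassed.
```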
4 changes: 2 additions & 2 deletions controllers/builder_test.go
@@ -1291,7 +1291,7 @@ service:
Annotations: map[string]string{},
},
Data: map[string]string{
"collector.yaml": "exporters:\n logging: null\nreceivers:\n prometheus:\n config: {}\n target_allocator:\n collector_id: ${POD_NAME}\n endpoint: http://test-targetallocator:80\n interval: 30s\nservice:\n pipelines:\n metrics:\n exporters:\n - logging\n receivers:\n - prometheus\n",
"collector.yaml": "exporters:\n logging: null\nreceivers:\n prometheus:\n config: {}\n target_allocator:\n collector_id: ${POD_NAME}\n endpoint: http://test-targetallocator.test.svc.cluster.local:80\n interval: 30s\nservice:\n pipelines:\n metrics:\n exporters:\n - logging\n receivers:\n - prometheus\n",
},
},
&corev1.ServiceAccount{
@@ -1682,7 +1682,7 @@ prometheus_cr:
Annotations: map[string]string{},
},
Data: map[string]string{
"collector.yaml": "exporters:\n logging: null\nreceivers:\n prometheus:\n config: {}\n target_allocator:\n collector_id: ${POD_NAME}\n endpoint: http://test-targetallocator:80\n interval: 30s\nservice:\n pipelines:\n metrics:\n exporters:\n - logging\n receivers:\n - prometheus\n",
"collector.yaml": "exporters:\n logging: null\nreceivers:\n prometheus:\n config: {}\n target_allocator:\n collector_id: ${POD_NAME}\n endpoint: http://test-targetallocator.test.svc.cluster.local:80\n interval: 30s\nservice:\n pipelines:\n metrics:\n exporters:\n - logging\n receivers:\n - prometheus\n",
},
},
&corev1.ServiceAccount{
2 changes: 1 addition & 1 deletion internal/manifests/collector/config_replace.go
@@ -71,7 +71,7 @@ func ReplaceConfig(otelcol v1beta1.OpenTelemetryCollector, targetAllocator *v1al

// To avoid issues caused by Prometheus validation logic, which fails regex validation when it encounters
// $$ in the prom config, we update the YAML file directly without marshaling and unmarshalling.
- updPromCfgMap, getCfgPromErr := ta.AddTAConfigToPromConfig(promCfgMap, naming.TAService(targetAllocator.Name))
+ updPromCfgMap, getCfgPromErr := ta.AddTAConfigToPromConfig(promCfgMap, naming.TAService(targetAllocator.Name), targetAllocator.Namespace)
if getCfgPromErr != nil {
return "", getCfgPromErr
}
4 changes: 2 additions & 2 deletions internal/manifests/collector/config_replace_test.go
@@ -43,7 +43,7 @@ func TestPrometheusParser(t *testing.T) {
assert.NotContains(t, prometheusConfig, "scrape_configs")

expectedTAConfig := map[interface{}]interface{}{
"endpoint": "http://test-targetallocator:80",
"endpoint": "http://test-targetallocator.default.svc.cluster.local:80",
"interval": "30s",
"collector_id": "${POD_NAME}",
}
@@ -68,7 +68,7 @@ func TestPrometheusParser(t *testing.T) {
assert.NotContains(t, prometheusConfig, "scrape_configs")

expectedTAConfig := map[interface{}]interface{}{
"endpoint": "http://test-targetallocator:80",
"endpoint": "http://test-targetallocator.default.svc.cluster.local:80",
"interval": "30s",
"collector_id": "${POD_NAME}",
}
2 changes: 1 addition & 1 deletion internal/manifests/collector/configmap_test.go
@@ -84,7 +84,7 @@ receivers:
config: {}
target_allocator:
collector_id: ${POD_NAME}
- endpoint: http://test-targetallocator:80
+ endpoint: http://test-targetallocator.default.svc.cluster.local:80
interval: 30s
service:
pipelines:
@@ -9,7 +9,7 @@ receivers:
scrape_timeout: 10s
target_allocator:
collector_id: ${POD_NAME}
- endpoint: http://test-targetallocator:80
+ endpoint: http://test-targetallocator.default.svc.cluster.local:80
interval: 30s
service:
pipelines:
@@ -10,7 +10,7 @@ receivers:
scrape_configs:
- honor_labels: true
http_sd_configs:
- - url: http://test-targetallocator:80/jobs/service-x/targets?collector_id=$POD_NAME
+ - url: http://test-targetallocator.default.svc.cluster.local:80/jobs/service-x/targets?collector_id=$POD_NAME
job_name: service-x
metric_relabel_configs:
- action: keep
@@ -196,7 +196,7 @@ func UnescapeDollarSignsInPromConfig(cfg string) (map[interface{}]interface{}, e
// This function removes any existing service discovery configurations (e.g., `sd_configs`, `dns_sd_configs`, `file_sd_configs`, etc.)
// from the `scrape_configs` section and adds a single `http_sd_configs` configuration.
// The `http_sd_configs` points to the TA (Target Allocator) endpoint that provides the list of targets for the given job.
- func AddHTTPSDConfigToPromConfig(prometheus map[interface{}]interface{}, taServiceName string) (map[interface{}]interface{}, error) {
+ func AddHTTPSDConfigToPromConfig(prometheus map[interface{}]interface{}, taServiceName string, taNamespace string) (map[interface{}]interface{}, error) {
prometheusConfigProperty, ok := prometheus["config"]
if !ok {
return nil, errorNoComponent("prometheusConfig")
@@ -249,7 +249,7 @@ func AddHTTPSDConfigToPromConfig(prometheus map[interface{}]interface{}, taServi
escapedJob := url.QueryEscape(jobName)
scrapeConfig["http_sd_configs"] = []interface{}{
map[string]interface{}{
"url": fmt.Sprintf("http://%s:80/jobs/%s/targets?collector_id=$POD_NAME", taServiceName, escapedJob),
"url": fmt.Sprintf("http://%s.%s.svc.cluster.local:80/jobs/%s/targets?collector_id=$POD_NAME", taServiceName, taNamespace, escapedJob),
},
}
}
@@ -260,7 +260,7 @@ func AddHTTPSDConfigToPromConfig(prometheus map[interface{}]interface{}, taServi
// AddTAConfigToPromConfig adds or updates the target_allocator configuration in the Prometheus configuration.
// If the `EnableTargetAllocatorRewrite` feature flag for the target allocator is enabled, this function
// removes the existing scrape_configs from the collector's Prometheus configuration as it's not required.
- func AddTAConfigToPromConfig(prometheus map[interface{}]interface{}, taServiceName string) (map[interface{}]interface{}, error) {
+ func AddTAConfigToPromConfig(prometheus map[interface{}]interface{}, taServiceName string, taNamespace string) (map[interface{}]interface{}, error) {
prometheusConfigProperty, ok := prometheus["config"]
if !ok {
return nil, errorNoComponent("prometheusConfig")
Expand All @@ -281,7 +281,7 @@ func AddTAConfigToPromConfig(prometheus map[interface{}]interface{}, taServiceNa
return nil, errorNotAMap("target_allocator")
}

targetAllocatorCfg["endpoint"] = fmt.Sprintf("http://%s:80", taServiceName)
targetAllocatorCfg["endpoint"] = fmt.Sprintf("http://%s.%s.svc.cluster.local:80", taServiceName, taNamespace)
targetAllocatorCfg["interval"] = "30s"
targetAllocatorCfg["collector_id"] = "${POD_NAME}"

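A sketch of the rewrite `AddHTTPSDConfigToPromConfig` performs, per its doc comment above; the `kubernetes_sd_configs` input is hypothetical, while the output URL mirrors the test data earlier in this PR:

```yaml
# Input: a scrape job carrying its own service discovery (hypothetical).
scrape_configs:
  - job_name: service-x
    kubernetes_sd_configs:
      - role: pod

# Output: discovery replaced by a single http_sd_configs entry that asks
# the target allocator for this job's targets.
scrape_configs:
  - job_name: service-x
    http_sd_configs:
      - url: http://test-targetallocator.default.svc.cluster.local:80/jobs/service-x/targets?collector_id=$POD_NAME
```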
@@ -279,15 +279,15 @@ func TestAddHTTPSDConfigToPromConfig(t *testing.T) {
"job_name": "test_job",
"http_sd_configs": []interface{}{
map[string]interface{}{
"url": fmt.Sprintf("http://%s:80/jobs/%s/targets?collector_id=$POD_NAME", taServiceName, url.QueryEscape("test_job")),
"url": fmt.Sprintf("http://%s.default.svc.cluster.local:80/jobs/%s/targets?collector_id=$POD_NAME", taServiceName, url.QueryEscape("test_job")),
Contributor: i think this should always work... do we have any concerns around custom DNS resolvers? i guess this would probably fail today for those use cases, but want to be sure this couldn't possibly break a user.

Member Author: I am not sure I follow you. Could you please explain more how this fix could break users with a custom DNS resolver?

Using default.svc.cluster.local is pretty standard on k8s.

Contributor: yes, i guess my concern is for users with custom DNS policies (https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-dns-config). I think i'm being overly cautious.
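For reference, the kind of custom pod DNS setup the linked page describes — a sketch with illustrative values, not part of this PR:

```yaml
# With dnsPolicy "None" the pod gets only the resolvers and search domains
# listed here, so names under .svc.cluster.local resolve only if the custom
# nameserver also serves the cluster zone.
dnsPolicy: "None"
dnsConfig:
  nameservers:
    - 192.0.2.1   # documentation-range IP standing in for a custom resolver
  searches:
    - my.dns.search.suffix
```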

},
},
},
},
},
}

- actualCfg, err := ta.AddHTTPSDConfigToPromConfig(cfg, taServiceName)
+ actualCfg, err := ta.AddHTTPSDConfigToPromConfig(cfg, taServiceName, "default")
assert.NoError(t, err)
assert.Equal(t, expectedCfg, actualCfg)
})
@@ -308,7 +308,7 @@ func TestAddHTTPSDConfigToPromConfig(t *testing.T) {

taServiceName := "test-service"

- _, err := ta.AddHTTPSDConfigToPromConfig(cfg, taServiceName)
+ _, err := ta.AddHTTPSDConfigToPromConfig(cfg, taServiceName, "default")
assert.Error(t, err)
assert.EqualError(t, err, "no scrape_configs available as part of the configuration")
})
@@ -338,13 +338,13 @@ func TestAddTAConfigToPromConfig(t *testing.T) {
expectedResult := map[interface{}]interface{}{
"config": map[interface{}]interface{}{},
"target_allocator": map[interface{}]interface{}{
"endpoint": "http://test-targetallocator:80",
"endpoint": "http://test-targetallocator.default.svc.cluster.local:80",
"interval": "30s",
"collector_id": "${POD_NAME}",
},
}

- result, err := ta.AddTAConfigToPromConfig(cfg, taServiceName)
+ result, err := ta.AddTAConfigToPromConfig(cfg, taServiceName, "default")

assert.NoError(t, err)
assert.Equal(t, expectedResult, result)
@@ -374,7 +374,7 @@ func TestAddTAConfigToPromConfig(t *testing.T) {

for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
- _, err := ta.AddTAConfigToPromConfig(tc.cfg, taServiceName)
+ _, err := ta.AddTAConfigToPromConfig(tc.cfg, taServiceName, "default")

assert.Error(t, err)
assert.EqualError(t, err, tc.errText)
@@ -26,7 +26,7 @@ data:
config: {}
target_allocator:
collector_id: ${POD_NAME}
- endpoint: http://prometheus-kubernetessd-targetallocator:80
+ endpoint: http://prometheus-kubernetessd-targetallocator.chainsaw-targetallocator-kubernetessd.svc.cluster.local:80
interval: 30s
service:
pipelines:
@@ -5,6 +5,7 @@ metadata:
creationTimestamp: null
name: targetallocator-kubernetessd
spec:
+ namespace: chainsaw-targetallocator-kubernetessd
Member Author: Switching to static namespaces for the tests that assert the collector config map with the TA endpoint (it now contains the namespace).

We will be able to get rid of this once we migrate to v1beta1 and use the namespace substitution provided by chainsaw.

steps:
- name: step-00
try:
@@ -31,7 +31,7 @@ data:
config: {}
target_allocator:
collector_id: ${POD_NAME}
- endpoint: http://prometheus-cr-targetallocator:80
+ endpoint: http://prometheus-cr-targetallocator.chainsaw-targetallocator-prometheuscr.svc.cluster.local:80
interval: 30s
service:
pipelines:
@@ -5,6 +5,7 @@ metadata:
creationTimestamp: null
name: targetallocator-prometheuscr
spec:
+ namespace: chainsaw-targetallocator-prometheuscr
steps:
- name: step-00
try:
2 changes: 1 addition & 1 deletion tests/e2e/smoke-targetallocator/00-assert.yaml
@@ -33,7 +33,7 @@ data:
config: {}
target_allocator:
collector_id: ${POD_NAME}
- endpoint: http://stateful-targetallocator:80
+ endpoint: http://stateful-targetallocator.chainsaw-smoke-targetallocator.svc.cluster.local:80
interval: 30s
service:
pipelines:
1 change: 1 addition & 0 deletions tests/e2e/smoke-targetallocator/chainsaw-test.yaml
@@ -5,6 +5,7 @@ metadata:
creationTimestamp: null
name: smoke-targetallocator
spec:
+ namespace: chainsaw-smoke-targetallocator
steps:
- name: step-00
try: