reload_interval TLS option not working for otlphttp exporter #11265

Open
Morgan-Li opened this issue Sep 24, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@Morgan-Li

Describe the bug
The reload_interval TLS option is not working for the otlphttp exporter. I am seeing this in an OpenShift Kubernetes environment where an Open Liberty pod exports logs and metrics to an OpenTelemetry Collector, which in turn exports to a Fluent Bit pod. TLS between the pods is set up with certificates created by the cert-manager operator.

Steps to reproduce
To test hot reloading of certificates, I manually delete the certificate secrets mounted to the Open Liberty server, the otel collector, and the Fluent Bit pods so that they are regenerated. I then restart the Open Liberty pod, but not the otelcol or Fluent Bit pod. This works as expected, and logs still flow from openliberty->otelcol->fluentbit. But when I restart the Fluent Bit pod, I get these errors on otelcol:

2024-09-23T15:45:26.244Z info	exporterhelper/retry_sender.go:118	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "metrics", "name": "otlphttp", "error": "failed to make an HTTP request: Post \"https://fluent-bit.morgan-certs.svc.cluster.local:4318/v1/metrics\": dial tcp 172.30.127.25:4318: connect: connection refused", "interval": "2.569047697s"}
2024-09-23T15:45:28.834Z info	exporterhelper/retry_sender.go:118	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "metrics", "name": "otlphttp", "error": "failed to make an HTTP request: Post \"https://fluent-bit.morgan-certs.svc.cluster.local:4318/v1/metrics\": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"www.test.com\")", "interval": "10.061425455s"}

This indicates that the exporter is still using the old certificates, even though I set the reload_interval option in the exporter config. The receiver does seem to reload its certificate, since restarting the Liberty pod (so that it picks up new certs) without restarting otelcol works. After restarting the otelcol pod so that the exporter uses the new certificates, the exporter errors go away.

What did you expect to see?
I expect to see otel collector hot reloading the certificate without a restart when exporting logs and metrics using otlphttp exporter.

What did you see instead?
The otel collector was still exporting logs using the old certificates and the exporting failed with errors until the pod was restarted.

What version did you use?
OpenTelemetry Collector v0.105.0
Fluent Bit v3.1.8

What config did you use?

extensions:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        tls:
          cert_file: /opt/certs/tls.crt
          key_file: /opt/certs/tls.key
          ca_file: /opt/certs/ca.crt
          reload_interval: 15s

processors:
  batch:

exporters:
  otlphttp:
    endpoint: https://fluent-bit.morgan-certs.svc.cluster.local:4318/
    tls:
      insecure: false
      insecure_skip_verify: false
      ca_file: /opt/outbound/certs/ca.crt
      reload_interval: 15s
  debug:
    verbosity: detailed
    sampling_initial: 5
    sampling_thereafter: 200

service:
  extensions: []
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, otlphttp]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, otlphttp]

Environment
OpenShift (Kubernetes-based) environment, v4.16.7

Additional context

Morgan-Li added the bug label on Sep 24, 2024

shivanthzen commented Oct 4, 2024

@Morgan-Li I see separate cert files for the receiver and the exporter in the config: receivers.otlp.protocols.grpc.tls.ca_file is under /opt/certs/ and exporters.otlphttp.tls.ca_file is under /opt/outbound/certs/. Could you verify whether both of them were changed at once?
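
One way to check this from inside the collector pod is to compare fingerprints of the two CA files before and after regenerating the secrets. The following is only a rough sketch: the helper name fingerprint is made up for illustration, and the default paths are the two ca_file values from the config above.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/pem"
	"errors"
	"fmt"
	"os"
)

// fingerprint returns the hex SHA-256 digest of the DER bytes in the first
// PEM block of pemBytes. It is a hypothetical helper, not collector code.
func fingerprint(pemBytes []byte) (string, error) {
	block, _ := pem.Decode(pemBytes)
	if block == nil {
		return "", errors.New("no PEM block found")
	}
	sum := sha256.Sum256(block.Bytes)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	// Paths can be passed as arguments; otherwise fall back to the two
	// ca_file locations used in the config from this issue.
	paths := os.Args[1:]
	if len(paths) == 0 {
		paths = []string{"/opt/certs/ca.crt", "/opt/outbound/certs/ca.crt"}
	}
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			fmt.Printf("%s: %v\n", p, err)
			continue
		}
		fp, err := fingerprint(data)
		if err != nil {
			fmt.Printf("%s: %v\n", p, err)
			continue
		}
		fmt.Printf("%s: %s\n", p, fp)
	}
}
```

If the printed fingerprints differ between a run before and a run after the rotation, the mounted files did change on disk, and any stale trust is on the process side.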


Morgan-Li commented Oct 4, 2024

@shivanthzen Yes, both the /opt/certs/ca.crt and /opt/outbound/certs/ca.crt certificates are changed at the same time (maybe a few seconds apart). I have also tried using the same certificate for both receiver and exporter, but the result is the same: the exporter certificate is not refreshed.


shivanthzen commented Oct 7, 2024

Apparently we can only hot reload server and client keys/certificates, not CA certificates.

// The CA pool is loaded once here, when the TLS config is built.
certPool, err := c.loadCACertPool()
if err != nil {
	return nil, err
}
var getCertificate func(*tls.ClientHelloInfo) (*tls.Certificate, error)
var getClientCertificate func(*tls.CertificateRequestInfo) (*tls.Certificate, error)
if c.hasCert() || c.hasKey() {
	var certReloader *certReloader
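
To make the difference concrete, here is a small standalone sketch of why a key/certificate pair can be swapped at runtime while the CA pool cannot. It is not the collector's code: reloadableCert, newClientConfig, and demo are hypothetical names, and the certificates are dummy byte strings rather than real X.509 data. The point is only that tls.Config consults GetClientCertificate on each handshake, whereas RootCAs is a plain field set once.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"sync"
)

// reloadableCert is a hypothetical stand-in for a certificate reloader:
// it holds the current certificate behind a lock so it can be swapped.
type reloadableCert struct {
	mu   sync.RWMutex
	cert *tls.Certificate
}

func (r *reloadableCert) set(c *tls.Certificate) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.cert = c
}

// get has the signature tls.Config.GetClientCertificate expects.
func (r *reloadableCert) get(*tls.CertificateRequestInfo) (*tls.Certificate, error) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.cert, nil
}

// newClientConfig wires the callback and a fixed CA pool into a tls.Config.
func newClientConfig(r *reloadableCert, pool *x509.CertPool) *tls.Config {
	return &tls.Config{
		RootCAs:              pool,  // evaluated once, at construction
		GetClientCertificate: r.get, // evaluated on every handshake
	}
}

// demo swaps the certificate after the config is built and reports what the
// config hands out before and after, plus whether the CA pool is unchanged.
func demo() (before, after string, samePool bool) {
	r := &reloadableCert{}
	r.set(&tls.Certificate{Certificate: [][]byte{[]byte("old")}}) // dummy data
	pool := x509.NewCertPool()
	cfg := newClientConfig(r, pool)

	c1, _ := cfg.GetClientCertificate(nil)
	r.set(&tls.Certificate{Certificate: [][]byte{[]byte("new")}}) // dummy data
	c2, _ := cfg.GetClientCertificate(nil)

	return string(c1.Certificate[0]), string(c2.Certificate[0]), cfg.RootCAs == pool
}

func main() {
	before, after, samePool := demo()
	fmt.Println(before, after) // prints "old new"
	fmt.Println(samePool)      // prints "true"
}
```

The callback sees the swapped certificate without the config being rebuilt, but nothing comparable exists for RootCAs, which matches the behavior reported in this issue.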

@shivanthzen

This is a known limitation in Go: as of now, there is no built-in way to reload CA certificates for TLS connections.
golang/go#35887
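
For anyone who needs this in their own Go client in the meantime, one possible workaround is to skip the built-in chain check and redo it in the VerifyConnection callback against a CA file that is re-read on each handshake. The sketch below is an assumption-laden illustration, not something the collector does or exposes: reloadingCAConfig, dial, and demo are hypothetical names, the local httptest server stands in for Fluent Bit, and a placeholder file stands in for the stale CA.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
	"net/http/httptest"
	"os"
	"path/filepath"
)

// reloadingCAConfig returns a client config that verifies the server chain
// against caFile, re-reading the file on every handshake (hypothetical helper).
func reloadingCAConfig(caFile, serverName string) *tls.Config {
	return &tls.Config{
		// Turns off only the built-in check; VerifyConnection replaces it.
		InsecureSkipVerify: true,
		VerifyConnection: func(cs tls.ConnectionState) error {
			pemBytes, err := os.ReadFile(caFile)
			if err != nil {
				return err
			}
			roots := x509.NewCertPool()
			if !roots.AppendCertsFromPEM(pemBytes) {
				return errors.New("no CA certificates found in " + caFile)
			}
			if len(cs.PeerCertificates) == 0 {
				return errors.New("server sent no certificate")
			}
			opts := x509.VerifyOptions{
				Roots:         roots,
				DNSName:       serverName,
				Intermediates: x509.NewCertPool(),
			}
			for _, c := range cs.PeerCertificates[1:] {
				opts.Intermediates.AddCert(c)
			}
			_, err = cs.PeerCertificates[0].Verify(opts)
			return err
		},
	}
}

// dial performs a single TLS handshake and returns its error, if any.
func dial(addr string, cfg *tls.Config) error {
	conn, err := tls.Dial("tcp", addr, cfg)
	if err != nil {
		return err
	}
	defer conn.Close()
	return nil
}

// demo dials a local test server twice with the same config: first with a
// CA file that does not contain the server's CA, then after the file is fixed.
func demo() (before, after error) {
	srv := httptest.NewTLSServer(nil)
	defer srv.Close()

	dir, err := os.MkdirTemp("", "ca-reload")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	caFile := filepath.Join(dir, "ca.crt")

	// Placeholder content standing in for a stale CA file.
	if err := os.WriteFile(caFile, []byte("stale placeholder\n"), 0o600); err != nil {
		panic(err)
	}
	cfg := reloadingCAConfig(caFile, "127.0.0.1")
	addr := srv.Listener.Addr().String()
	before = dial(addr, cfg)

	// Simulate the rotation: the file now holds the certificate the server uses.
	caPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: srv.Certificate().Raw})
	if err := os.WriteFile(caFile, caPEM, 0o600); err != nil {
		panic(err)
	}
	after = dial(addr, cfg)
	return before, after
}

func main() {
	before, after := demo()
	fmt.Println("handshake failed with stale CA file:", before != nil)
	fmt.Println("handshake failed after CA file update:", after != nil)
}
```

Reading the file on every handshake keeps the sketch short; a real implementation would more likely cache the pool and refresh it on an interval, similar to what reload_interval does for key/certificate pairs. Setting InsecureSkipVerify is only acceptable here because the callback performs the full chain and hostname verification itself.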
