Flaky e2e-tests-examples2/simple-streaming test #945
Comments
@kevinearls is this something you could take a look at? |
@jpkrohling Yes, please assign it to me. |
We have the same issue when trying to connect to our existing Kafka cluster. Is there any update on the investigation? |
ping @kevinearls |
@sandangel What are you seeing? I'm guessing it's something different from this and you need to open a new issue. If my memory is correct there was a time we were having intermittent test failures because either the Jaeger operator, Kafka (Strimzi) operator, or Kafka instance was taking too long to deploy. I don't think we've seen this in quite some time, but maybe @jpkrohling can confirm this. |
Indeed, it's been a while since I last saw an intermittent issue here. @sandangel, are you able to consistently reproduce this? |
I'm trying to configure the collector to connect to our existing Kafka cluster. However, we consistently get the same error described above. cc @jpkrohling @kevinearls
This is the operator config:
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simple-streaming
spec:
  strategy: streaming
  query:
    replicas: 0
  collector:
    replicas: 1
    volumeMounts:
      - name: kafka-keytab
        mountPath: /etc/security/kafka.keytab
        readonly: true
      - name: krb5-conf
        mountPath: /etc/krb5.conf
        readonly: true
    volumes:
      - name: kafka-keytab
        secret:
          secretName: kafka-keytab
      - name: krb5-conf
        configMap:
          name: krb5-conf
          items:
            - key: krb5.conf
              path: krb5.conf
    resources:
      limits:
        cpu: 1000m
        memory: 512Mi
    options:
      log-level: debug
      kafka:
        producer:
          topic: stg-jaeger_tracing
          authentication: kerberos
          brokers: stg-broker:9092
          kerberos:
            realm: stg-realm
            use-keytab: true
            username: ***
    annotations:
      prometheus.io/scrape: 'false'
  ingester:
    replicas: 0
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: https://stg-elasticsearch:9200
  sampling:
    options:
      default_strategy:
        type: probabilistic
        param: 1
These are the logs:
2021/03/22 09:10:57 maxprocs: Updating GOMAXPROCS=1: determined from CPU quota
{"level":"info","ts":1616404257.8497376,"caller":"flags/service.go:117","msg":"Mounting metrics handler on admin server","route":"/metrics"}
{"level":"info","ts":1616404257.8497589,"caller":"flags/service.go:123","msg":"Mounting expvar handler on admin server","route":"/debug/vars"}
{"level":"info","ts":1616404257.8499005,"caller":"flags/admin.go:121","msg":"Mounting health check on admin server","route":"/"}
{"level":"info","ts":1616404257.8499346,"caller":"flags/admin.go:127","msg":"Starting admin HTTP server","http-addr":":14269"}
{"level":"info","ts":1616404257.8499455,"caller":"flags/admin.go:113","msg":"Admin server started","http.host-port":"[::]:14269","health-status":"unavailable"}
{"level":"info","ts":1616404257.8513517,"caller":"kafka/factory.go:69","msg":"Kafka factory","producer builder":{"Brokers":["stg-broker:9092"],"RequiredAcks":1,"Compression":0,"CompressionLevel":0,"ProtocolVersion":"","BatchLinger":0,"BatchSize":0,"BatchMinMessages":0,"BatchMaxMessages":0,"Authentication":"kerberos","Kerberos":{"ServiceName":"kafka","Realm":"stg-realm","UseKeyTab":true,"Username":"***","ConfigPath":"/etc/krb5.conf","KeyTabPath":"/etc/security/kafka.keytab"},"TLS":{"Enabled":false,"CAPath":"","CertPath":"","KeyPath":"","ServerName":"","ClientCAPath":"","SkipHostVerify":false},"PlainText":{"UserName":""}},"topic":"stg-jaeger_tracing"}
{"level":"fatal","ts":1616404258.7554333,"caller":"command-line-arguments/main.go:74","msg":"Failed to init storage factory","error":"kafka: client has run out of available brokers to talk to (Is your cluster reachable?)","stacktrace":"main.main.func1\n\tcommand-line-arguments/main.go:74\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/[email protected]/command.go:826\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/[email protected]/command.go:914\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/[email protected]/command.go:864\nmain.main\n\tcommand-line-arguments/main.go:133\nruntime.main\n\truntime/proc.go:204"}
This is our krb5.conf:
The realm, username, KDC, topic, and broker have been modified so that this can be shared publicly. |
@sandangel Ok, but that doesn't really have anything to do with the original issue here, which was tests failing because of timing issues. Can you either open a new issue here, or ask on the Jaeger email list or Slack? (See https://www.jaegertracing.io/get-in-touch/#via-chat-or-email) @jpkrohling I think we can close this. |
Sure. I thought it was related because it has the same error message. Anyway, I can create another issue. Thanks for your response ❤️❤️❤️ |
The e2e test simple-streaming from the e2e-tests-examples2 suite is intermittently failing. Running the tests for #914 locally, I was able to reproduce it and get the following information.
State of the cluster before the test timed out:
Logs from the collector:
Logs from Kafka:
Deleting the failing Kafka pod and running the test again made it work. Excerpts: