
aws-for-fluent-bit : Upgrading chart version leading to Liveness Probe failed #995

Open
jatinmehrotra opened this issue Aug 31, 2023 · 2 comments · May be fixed by #1168
Labels
bug Something isn't working

Comments

jatinmehrotra commented Aug 31, 2023

Describe the bug

Until now the Fluent Bit pods were working fine, but the moment I updated the chart from 0.1.19 to 0.1.29, our Fluent Bit pods entered a CrashLoopBackOff state due to failures of the liveness probe newly introduced in #975.

Pod events show the following message:

Liveness probe failed: HTTP Probe failed with statuscode: 500
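For context, the check added in #975 is a Kubernetes httpGet probe against Fluent Bit's built-in monitoring server. A minimal sketch of what such a probe looks like on the rendered DaemonSet container is below; the exact path and timings used by the chart are assumptions here, not copied from it:

```yaml
# Sketch of an httpGet liveness probe of the kind introduced in #975.
# Path, port, and timings are assumptions; check the rendered DaemonSet
# (kubectl get ds -o yaml) for the actual values the chart sets.
livenessProbe:
  httpGet:
    path: /api/v1/health   # assumed endpoint; the chart may simply probe "/"
    port: 2020             # Fluent Bit's default HTTP monitoring port
  initialDelaySeconds: 10
  periodSeconds: 10
  failureThreshold: 3
```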

Steps to reproduce

Spin up an IPv4 EKS cluster and install version 0.1.29 of the aws-for-fluent-bit chart. The pods will enter CrashLoopBackOff.

Expected outcome
The liveness probe should pass with the updated chart configuration.

Environment

Chart name: aws-for-fluent-bit
Chart version: 0.1.29
Kubernetes version: 1.25
Using EKS (yes/no), if so version? Yes, v1.25.12-eks-2d98532

Additional Context:

Note: the pods are running on EC2 nodes.

@jatinmehrotra (Author)

I believe the liveness probe was introduced in commit #975. I am wondering if this is a port issue, i.e. the probe requests are not able to reach port 2020.

(I may be wrong, since I don't know the exact internals of how the Fluent Bit server establishes network connectivity for the probes.)
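One thing worth verifying: the probe can only succeed if Fluent Bit's built-in HTTP server is actually listening on port 2020 inside the container. A rough sketch of the relevant settings, expressed as a Helm values override, is below; the `service.extraService` key is an assumption about this chart's values layout, while the option names are the standard upstream Fluent Bit [SERVICE] settings:

```yaml
# values.yaml override (sketch) -- assumes the chart appends an
# "extraService" block to the generated [SERVICE] section; verify the
# key name against the chart's values.yaml before using it.
service:
  extraService: |
    HTTP_Server  On          # enable the built-in monitoring server
    HTTP_Listen  0.0.0.0     # listen on all interfaces so the kubelet can reach it
    HTTP_Port    2020        # must match the port the liveness probe targets
    Health_Check On          # expose /api/v1/health for health probing
```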

@jbeemster

We are seeing the same issue, though it's somewhat sporadic: when scaling up an EKS cluster, about 30-40% of the Fluent Bit pods enter this crash loop and never seem to pass the health checks. The other pods come up and are fine.
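If the chart exposes the probe fields as overridable values (the keys below are an assumption, not confirmed against the chart), relaxing the timings may give slow-starting pods enough headroom during scale-up:

```yaml
# Sketch of a values override that relaxes the probe timings; the key
# names are assumed to mirror the probe introduced in #975.
livenessProbe:
  initialDelaySeconds: 30  # give Fluent Bit more time to bring up its HTTP server
  failureThreshold: 5      # tolerate transient 500s while the node warms up
```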
