-
Notifications
You must be signed in to change notification settings - Fork 350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(#4922): Fix flaky TestHealthTrait #5346
Conversation
@@ -362,6 +362,8 @@ func TestHealthTrait(t *testing.T) { | |||
|
|||
g.Eventually(IntegrationPodPhase(t, ctx, ns, name), TestTimeoutLong).Should(Equal(corev1.PodRunning)) | |||
g.Eventually(IntegrationPhase(t, ctx, ns, name), TestTimeoutShort).Should(Equal(v1.IntegrationPhaseRunning)) | |||
// Wait for the integration condition to become ready=false and then check that it remains not ready for some time - fixes some test flakiness |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that normally, the Ready
condition should start with a false
value. So, this verification may hide the problem. I think we need to understand the root cause, and, if it's not already the case, initialize a Ready
condition as false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can find my analysis of the root cause in #5345. It is the Deployment ready state that causes the intermediate Integration condition status ready=true
. This is before the health trait is able to set the condition status to false
.
My intention in this PR is to fix the flaky E2E test. In case we are not happy with the intermediate condition status ready=true
I'd suggest to open a new issue and discuss
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point is that, according to your analysis, the test is not wrong. What's wrong is the logic where we are blindly setting a value to something which should not be. If we change this test we'd be promoting a wrong behavior turning the bug into a feature. I am planning to retake the work of #5096 for next release which may intersect with this problem. I'd suggest to either work on the root cause or keep this on hold for the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you say implies that the logic has changed recently and the E2E test has become flaky due to that regression recently. I think the flakiness in this test exists for quite some time since #4922 is reporting it already on Camel K 2.1 and it may be flaky even before that.
I am fine with working on the root cause and fix the condition status setting but let's open a new issue now to track that in particular and let's not keep the flaky test until this is tackled and completely resolved. In a few weeks nobody will remember that the flakiness in this E2E test points to a misbehavior that should be fixed. We need a new issue for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this behavior exists since long time, also in 1.x branches probably. Reason why I advocate for working on the root cause. In any case, it's just my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created #5351 to track the ready condition status flaw
- Avoid failing assertion on condition status ready=false due to temporary deployment ready condition status
6033d54
to
6c6c76f
Compare
Probably it has to be backported, thanks. |
Release Note