-
Notifications
You must be signed in to change notification settings - Fork 382
Re-work TestCreateServiceInstanceWithProvisionFailure to mitigate its flake #2153
Re-work TestCreateServiceInstanceWithProvisionFailure to mitigate its flake #2153
Conversation
Thanks for tackling this! Hit me up when you think it's ready to review after all your kicking. 😀 |
/retest |
6 similar comments
/retest |
/retest |
/retest |
/retest |
/retest |
/retest |
ffae2a4
to
ae300e0
Compare
/retest |
@carolynvs I feel good about this PR fixing the flake. I have not had any test failures in 10 runs of the test. It is ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but honestly I'm not familiar enough with this area of service catalog for that to be meaningful. So we'll need someone else to do a more thorough review on it.
test/util/util.go
Outdated
@@ -345,7 +345,9 @@ func AssertServiceInstanceCondition(t *testing.T, instance *v1beta1.ServiceInsta | |||
} | |||
|
|||
if !foundCondition { | |||
t.Fatalf("%v condition not found", conditionType) | |||
if status != v1beta1.ConditionFalse || len(reason) != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how weird, this function wasn't used before this. and the other one was used only once.
change makes sense, but maybe could use a comment that we have indeed found a condition because we have a set status
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll create a separate function for the false-or-absent assert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant to remove this in lieu of the new function. I've pushed up a new commit with this removed.
conditionReason string | ||
expectFailCondition bool | ||
provisionErrorReason string | ||
failReason string | ||
triggersOrphanMitigation bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we annotate the struct fields with some comments for use?
// the core of the test so that the resource cleanup can proceed. | ||
defer atomic.StoreInt32(&respondWithProvisionSuccess, 1) | ||
defer atomic.StoreInt32(&respondWithDeprovisionSuccess, 1) | ||
defer atomic.StoreInt32(&blockDeprovision, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a super fan of atomic ints for what are essentially bools.
I'm fine with the "throw a big lock around it" solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is more code (which is subjectively more difficult to read). I personally don't find it worthwhile to use a mutex to avoid using as integer as a boolean, but I'll make the change nevertheless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right about select and chan being more idiomatic, though. I'll use that instead.
}, | ||
})) | ||
func(r *osb.ProvisionRequest) (*osb.ProvisionResponse, error) { | ||
if atomic.LoadInt32(&respondWithProvisionSuccess) != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would prefer if this was == 1
. we set it explicitly up above and I find equality easier to read than inequality.
if atomic.LoadInt32(&respondWithDeprovisionSuccess) != 0 { | ||
return &osb.DeprovisionResponse{}, nil | ||
} else { | ||
for atomic.LoadInt32(&blockDeprovision) != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same with these. assert truth rather than other way around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems okay. need to take another pass.
the atomics is pretty straightforward, but wondering if channels and selects would be more idiomatic.
… test to mitigate its flake
ae300e0
to
7da4277
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this flows a lot better from top to bottom.
LGTM
/lgtm
/approve
test/util/util.go
Outdated
@@ -345,7 +345,9 @@ func AssertServiceInstanceCondition(t *testing.T, instance *v1beta1.ServiceInsta | |||
} | |||
|
|||
if !foundCondition { | |||
t.Fatalf("%v condition not found", conditionType) | |||
if status != v1beta1.ConditionFalse || len(reason) != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: MHBauer The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
New changes are detected. LGTM label has been removed. |
Closes #2036. |
I have thought about this flake a lot many different times and have not been able to rationalize either (1) why the test would produce failures if the code being tested is correct or (2) why the code being tested is incorrect. I have never been able to reproduce this flake locally, so I am not sure whether this re-work will do anything to fix the flake. However, if it doesn't, then I will be very concerned about what is going on in the code being tested.
I will kick this PR repeatedly to try to provoke a flake in the unit tests.