Bug 1835851: Update DeleteNodesTwice test #151

Conversation

JoelSpeed

The test relied on TargetSize to determine whether the nodes had been deleted twice. Until #147 was merged, this value was the original value from when the nodegroup was constructed, and it did not reflect the true size of the MachineSet/Deployment underlying the nodegroup.

I've updated the test to:

  • Check that the client sees the deletion timestamp added to Machines
  • Wait for the Nodegroup to have the correct number of replicas after the first iteration
  • Check against the API for the final check of the replica count so that we don't rely on our own implementation to pass the test
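As a rough illustration of the first check in the list above, the sketch below tests whether every machine in a slice carries a deletion timestamp. The `machine` type, `newMachine`, and `allMarkedForDeletion` are hypothetical names invented for this example; the real test works with the cluster-api Machine objects and the nodegroup's own client, which are not reproduced here.

```go
package main

import (
	"fmt"
	"time"
)

// machine is a hypothetical, pared-down stand-in for a Machine object.
// Only the field relevant to the check is modelled.
type machine struct {
	Name              string
	DeletionTimestamp *time.Time
}

// newMachine builds a fixture; when deleting is true the machine gets a
// deletion timestamp, as it would after a delete call has been observed.
func newMachine(name string, deleting bool) machine {
	m := machine{Name: name}
	if deleting {
		now := time.Now()
		m.DeletionTimestamp = &now
	}
	return m
}

// allMarkedForDeletion reports whether every machine in the slice has a
// deletion timestamp set.
func allMarkedForDeletion(machines []machine) bool {
	for _, m := range machines {
		if m.DeletionTimestamp == nil {
			return false
		}
	}
	return true
}

func main() {
	pending := []machine{newMachine("machine-7", true), newMachine("machine-8", false)}
	fmt.Println(allMarkedForDeletion(pending)) // false: machine-8 is not yet marked

	done := []machine{newMachine("machine-7", true), newMachine("machine-8", true)}
	fmt.Println(allMarkedForDeletion(done)) // true
}
```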

return err
}

// Ensure the update worked
Member

curious if there's any particular reason you saw to add this extra check? seems fair to trust the error returned on line 879.

Author

This was actually to make sure the update had propagated and that the client used by the nodegroup could read the deletion timestamp. I'll update the comment, and I'll also check whether the test is as stable if I remove this.

I had used stress before to try to get it to fail, but it didn't. I can retry that.

// To make sure we don't run into any flakes in CI
// I've chosen to make this sleep duration 3s.
time.Sleep(3 * time.Second)
expectedSize := len(testConfig.machines) - len(testConfig.machines[7:])
Member

can we take this chance to put the magic number 7 in a constant with a comment?

Author

Yeah will do
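One way the suggestion might look, as a minimal sketch: the constant name `firstDeletedIndex`, the helper `expectedSizeAfterDelete`, and the slice length of 10 are all assumptions made for illustration, not names or values taken from the PR.

```go
package main

import "fmt"

// firstDeletedIndex is a hypothetical name for the magic number 7:
// machines from this index onward are the ones the test deletes.
const firstDeletedIndex = 7

// expectedSizeAfterDelete returns how many machines should remain once
// everything from firstDeletedIndex onward has been deleted. It assumes
// the slice holds at least firstDeletedIndex machines.
func expectedSizeAfterDelete(machines []string) int {
	return len(machines) - len(machines[firstDeletedIndex:])
}

func main() {
	machines := make([]string, 10) // length chosen only for this example
	fmt.Println(expectedSizeAfterDelete(machines)) // prints 7
}
```

Naming the index keeps the slicing expression and the expected-size arithmetic tied to one documented value.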

@JoelSpeed
Author

@enxebre I reworked this again to make it a bit more obvious and to use the same flow for fetching machines that the NodeGroup does. Just ran stress and it seems happy, plus it fails if I remove the deletion timestamp check

stress ./clusterapi.test -test.run TestNodeGroupDeleteNodesTwice -test.count 5
48 runs so far, 0 failures
96 runs so far, 0 failures
156 runs so far, 0 failures
216 runs so far, 0 failures
276 runs so far, 0 failures
326 runs so far, 0 failures

@elmiko elmiko left a comment

tested this out locally and it passes for me as well. thanks Joel!

/approve

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 7, 2020
@enxebre
Member

enxebre commented May 9, 2020

/hold
I think we should only merge this if we merge it upstream first. Otherwise we introduce go.mod skew and it means we need to revendor after rebasing every release. Re-vendor in this project is something we should currently avoid as it's painful.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 9, 2020
@JoelSpeed
Author

I think we should only merge this if we merge it upstream first. Otherwise we introduce go.mod skew and it means we need to revendor after rebasing every release. Re-vendor in this project is something we should currently avoid as it's painful.

Agreed. I'll open a PR upstream and raise that discussion

@JoelSpeed JoelSpeed force-pushed the update-delete-nodes-twice-test branch from 0979036 to 135b53f Compare May 11, 2020 13:31
@JoelSpeed
Author

Upstream PR: kubernetes#3125

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elmiko

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@JoelSpeed
Author

JoelSpeed commented May 14, 2020

@enxebre This is actually not too bad using wait.PollImmediate and drops the dependency on Gomega, WDYT?
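For readers unfamiliar with the pattern, here is a stdlib-only sketch of polling in place of a fixed sleep. `wait.PollImmediate` in `k8s.io/apimachinery/pkg/util/wait` takes an interval, a timeout, and a condition function returning `(bool, error)`; the `pollImmediate` helper and `errTimeout` below are hypothetical stand-ins that imitate that shape so the example runs without the dependency, and the decrementing replica counter is invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTimeout plays the role of the timeout error returned by the real
// wait package; the name is an assumption for this sketch.
var errTimeout = errors.New("timed out waiting for the condition")

// pollImmediate is a hypothetical stand-in: it checks the condition right
// away, then once per interval until it succeeds, errors, or times out.
func pollImmediate(interval, timeout time.Duration, condition func() (bool, error)) error {
	deadline := time.Now().Add(timeout)
	for {
		done, err := condition()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		if time.Now().After(deadline) {
			return errTimeout
		}
		time.Sleep(interval)
	}
}

func main() {
	// replicas simulates a replica count that converges after a few polls.
	replicas := 10
	err := pollImmediate(10*time.Millisecond, time.Second, func() (bool, error) {
		if replicas > 7 {
			replicas--
		}
		return replicas == 7, nil
	})
	fmt.Println(replicas, err) // 7 <nil>
}
```

Compared with a fixed `time.Sleep`, the loop returns as soon as the condition holds and only waits for the full timeout when something is actually wrong.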

@JoelSpeed JoelSpeed changed the title Update DeleteNodesTwice test Bug 1835851: Update DeleteNodesTwice test May 14, 2020
@openshift-ci-robot openshift-ci-robot added the bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. label May 14, 2020
@openshift-ci-robot

@JoelSpeed: This pull request references Bugzilla bug 1835851, which is invalid:

  • expected the bug to target the "4.5.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1835851: Update DeleteNodesTwice test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label May 14, 2020
@JoelSpeed
Author

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label May 14, 2020
@openshift-ci-robot

@JoelSpeed: This pull request references Bugzilla bug 1835851, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.5.0) matches configured target release for branch (4.5.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot removed the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label May 14, 2020
@JoelSpeed JoelSpeed force-pushed the update-delete-nodes-twice-test branch from e75294a to 7189c10 Compare May 14, 2020 15:45
@JoelSpeed
Author

/hold cancel

This is potentially blocking other bugs from being merged. Let's prioritise this and fix up the upstream later.

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 14, 2020
}
return targetSize == expectedSize, nil
}); err != nil {
t.Fatalf("unexpected error: %v", err)
Member

nit: I think this error might be meaningless when printed this way, along the lines of "you got a timeout but I'm not telling you where or why".
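A sketch of one way to address the nit, assuming the test tracks the last observed value: wrap the polling error with what was being waited for before reporting it. The helper `describeWaitError`, the `errTimeout` variable, and the message wording are hypothetical, not taken from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// errTimeout stands in for the timeout error a polling helper would return.
var errTimeout = errors.New("timed out waiting for the condition")

// describeWaitError wraps a polling error with what was awaited and the
// last observed and expected values, so a timeout says where and why.
func describeWaitError(err error, what string, got, want int) error {
	return fmt.Errorf("waiting for %s: last observed %d, expected %d: %w", what, got, want, err)
}

func main() {
	err := describeWaitError(errTimeout, "nodegroup target size", 10, 7)
	fmt.Println(err)
	// The original error is still reachable through the wrapper.
	fmt.Println(errors.Is(err, errTimeout)) // true
}
```

In a test, the wrapped error could then be passed to `t.Fatalf` in place of the bare "unexpected error" message.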

@enxebre
Member

enxebre commented May 18, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label May 18, 2020
@JoelSpeed JoelSpeed force-pushed the update-delete-nodes-twice-test branch from 7189c10 to 3ec2062 Compare May 18, 2020 10:25
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label May 18, 2020
@enxebre
Member

enxebre commented May 18, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label May 18, 2020
@openshift-bot

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci-robot

openshift-ci-robot commented May 18, 2020

@JoelSpeed: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-azure-operator 3ec2062 link /test e2e-azure-operator

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 8e5c58a into openshift:master May 18, 2020
@openshift-ci-robot

@JoelSpeed: All pull requests linked via external trackers have merged: openshift/kubernetes-autoscaler#151. Bugzilla bug 1835851 has been moved to the MODIFIED state.

In response to this:

Bug 1835851: Update DeleteNodesTwice test

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
6 participants