Attach a byomachine to multiple byohost in some condition by mistake #185

Merged
merged 7 commits into from
Oct 29, 2021

Conversation

huchen2021
Contributor

@huchen2021 huchen2021 commented Oct 29, 2021

A timing gap causes the following sequence:

The first reconciler:

  • Fetch the list of byohosts that reference this machine; len(hostsList.Items) is 0.

  • Attach a new host, "host1", to byomachine1. This triggers a patch of the byohost; see the function attachByoHost.

This is a large patch that touches many fields of the byohost object. Internally the patch helper issues two requests, h.patchStatus and h.patch: one for the status field and one for all other fields. The status patch is slower than the patch of the other fields.

  • Run setProviderID. It fails and triggers another reconcile, because node "host1" does not exist yet: the k8s node must first be bootstrapped successfully on host1.

The second reconciler:

  • Fetch the list of byohosts that reference this machine. len(hostsList.Items) is still 0, which is wrong and is caused by the timing gap: the status patch is slower than the patch of the other fields.

Ideally len(hostsList.Items) would be 1 here, but it is not. When the first reconciler attached host1 to byomachine1, it sent two patch requests to update host1, and the status patch takes longer to complete in the backend. At this point the status patch triggered by the first reconciler has not completed yet. We should not use status.MachineRef as the condition.

  • Attach a new host, "host2", to byomachine1.

  • Run setProviderID. It fails and triggers another reconcile, because node "host2" does not exist yet: the k8s node must first be bootstrapped successfully on host2.

The third reconciler:

  • Fetch the list of byohosts that reference this machine; len(hostsList.Items) is now 2. In our code, whenever len(hostsList.Items) is not 1, the controller goes on to attach a new byohost.

  • Attach a new host, "host3", to byomachine1, because len(hostsList.Items) is not 1.

  • Run setProviderID. It fails and triggers another reconcile, because node "host3" does not exist yet: the k8s node must first be bootstrapped successfully on host3.

Eventually, all of the remaining byohosts get attached to this byomachine.
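The sequence above can be reproduced with a toy model. This is only a sketch, not the controller code: the `store` and `host` types, the `pendingStatus` field, and the `flush` helper are hypothetical stand-ins for the API server and for a status patch that has been sent but has not landed yet. It also assumes that a host with a patch in flight is not picked again.

```go
package main

import "fmt"

// host is a toy stand-in for a byohost object (hypothetical, not the real type).
type host struct {
	name          string
	statusRef     string // models status.MachineRef, written by the slower status patch
	pendingStatus string // a status patch that was sent but has not landed yet
}

// store is a toy stand-in for the API server.
type store struct{ hosts []*host }

// listByStatusRef mimics the old lookup: hosts whose status already references the machine.
func (s *store) listByStatusRef(machine string) []*host {
	var out []*host
	for _, h := range s.hosts {
		if h.statusRef == machine {
			out = append(out, h)
		}
	}
	return out
}

// flush lets every pending status patch land.
func (s *store) flush() {
	for _, h := range s.hosts {
		if h.pendingStatus != "" {
			h.statusRef, h.pendingStatus = h.pendingStatus, ""
		}
	}
}

// reconcile mimics one pass: unless exactly one host is already attached,
// it attaches the first free host and returns its name ("" if nothing was attached).
func (s *store) reconcile(machine string) string {
	if len(s.listByStatusRef(machine)) == 1 {
		return ""
	}
	for _, h := range s.hosts {
		if h.statusRef == "" && h.pendingStatus == "" {
			h.pendingStatus = machine // status patch sent, not yet visible
			return h.name
		}
	}
	return ""
}

func main() {
	s := &store{hosts: []*host{{name: "host1"}, {name: "host2"}, {name: "host3"}}}
	fmt.Println("first reconcile attaches:", s.reconcile("byomachine1"))
	fmt.Println("second reconcile attaches:", s.reconcile("byomachine1")) // status of host1 not visible yet
	s.flush()                                                             // both status patches land: 2 hosts now match
	fmt.Println("third reconcile attaches:", s.reconcile("byomachine1"))
}
```

In this model the second pass runs before the first status patch is visible, so it attaches host2; once both patches land the count is 2, which is still not 1, so the third pass attaches host3.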

The solution is to stop using status.MachineRef as the condition for fetching an attached byohost. Instead, add a new label, "byoh.infrastructure.cluster.x-k8s.io/byomachine-name", and look up the attached host by that label.
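A minimal sketch of the label-based lookup, again on a toy model rather than the real controller. The `host` type and the `attachedHosts` and `reconcile` helpers are hypothetical; only the label key comes from this PR. The sketch assumes the label is written together with the other non-status fields, so it is already visible when the next reconcile runs.

```go
package main

import "fmt"

// Label key proposed in this PR; the constant name is illustrative.
const byoMachineNameLabel = "byoh.infrastructure.cluster.x-k8s.io/byomachine-name"

// host is a toy stand-in for a byohost object (hypothetical, not the real type).
type host struct {
	name   string
	labels map[string]string
}

// attachedHosts returns the hosts whose label points at the given machine.
func attachedHosts(hosts []*host, machine string) []*host {
	var out []*host
	for _, h := range hosts {
		if h.labels[byoMachineNameLabel] == machine {
			out = append(out, h)
		}
	}
	return out
}

// reconcile reuses the host that is already labelled for the machine,
// and only labels a free host when none is attached yet.
func reconcile(hosts []*host, machine string) string {
	if attached := attachedHosts(hosts, machine); len(attached) > 0 {
		return attached[0].name
	}
	for _, h := range hosts {
		if _, taken := h.labels[byoMachineNameLabel]; !taken {
			if h.labels == nil {
				h.labels = map[string]string{}
			}
			h.labels[byoMachineNameLabel] = machine
			return h.name
		}
	}
	return ""
}

func main() {
	hosts := []*host{{name: "host1"}, {name: "host2"}, {name: "host3"}}
	for i := 1; i <= 3; i++ {
		fmt.Printf("reconcile %d uses: %s\n", i, reconcile(hosts, "byomachine1"))
	}
}
```

Because the lookup no longer depends on the status patch, repeated reconciles keep resolving to the same host. In a controller-runtime based controller the equivalent lookup would presumably be a List call with a label selector such as client.MatchingLabels; that call is not shown here.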

Signed-off-by: Hui Chen [email protected]

@huchen2021 huchen2021 added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress label Oct 29, 2021
Signed-off-by: Hui Chen <[email protected]>
@anusha94 anusha94 requested a review from dharmjit October 29, 2021 03:49
@huchen2021 huchen2021 removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress cla-not-required labels Oct 29, 2021
Contributor

@dharmjit dharmjit left a comment

Overall LGTM! Some Nits only

// Remove cluster-name label
delete(byoHost.Labels, clusterv1.ClusterLabelName)

// Remove Byomachine-name label
delete(byoHost.Labels, infrastructurev1beta1.AttachedByoMachineName)

Nit:

Suggested change
- delete(byoHost.Labels, infrastructurev1beta1.AttachedByoMachineName)
+ delete(byoHost.Labels, infrastructurev1beta1.ByoMachineName)
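For context, the cleanup under review can be sketched on a plain label map. The constant names and the `releaseLabels` helper below are hypothetical, and the cluster-name key is assumed to be the value behind clusterv1.ClusterLabelName. The byomachine-name key is the one proposed in this PR.

```go
package main

import "fmt"

const (
	// Assumed value of clusterv1.ClusterLabelName.
	clusterNameLabel = "cluster.x-k8s.io/cluster-name"
	// Label key introduced by this PR; the constant name is illustrative.
	byoMachineNameLabel = "byoh.infrastructure.cluster.x-k8s.io/byomachine-name"
)

// releaseLabels removes both attachment labels when a host is released.
// delete on a nil map or a missing key is a no-op in Go, so no guards are needed.
func releaseLabels(labels map[string]string) {
	delete(labels, clusterNameLabel)
	delete(labels, byoMachineNameLabel)
}

func main() {
	labels := map[string]string{
		clusterNameLabel:    "cluster1",
		byoMachineNameLabel: "byomachine1",
		"site":              "lab",
	}
	releaseLabels(labels)
	fmt.Println(labels) // only the unrelated "site" label remains
	releaseLabels(nil)  // safe: nothing to delete
}
```

Removing the byomachine-name label on release matters for the label-based lookup: a stale label would make a released host still look attached.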

apis/infrastructure/v1beta1/byohost_types.go (resolved)
controllers/infrastructure/byomachine_controller.go (outdated, resolved)
agent/reconciler/host_reconciler.go (resolved)
@huchen2021 huchen2021 merged commit 326a2af into vmware-tanzu:main Oct 29, 2021
5 participants