
📖 Label Sync Between MachineDeployment and underlying Kubernetes Nodes #6255

Conversation

@Arvinderpal (Contributor)

What this PR does / why we need it:

Proposal: Label Sync Between MachineDeployment and underlying Kubernetes Nodes

Google Doc: https://docs.google.com/document/d/17QA2E0GcbWNYb160qs8ArHOW0uMfL-NTYivefPGtn-c/edit?usp=sharing

Which issue(s) this PR fixes
Fixes #493

@linux-foundation-easycla (bot) commented Mar 4, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: Arvinderpal / name: Arvinderpal (b6b2d67)

@k8s-ci-robot added the cncf-cla: no and needs-ok-to-test labels on Mar 4, 2022
@k8s-ci-robot (Contributor)

Hi @Arvinderpal. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the size/L label on Mar 4, 2022
@Arvinderpal (Contributor, Author)

@enxebre PTAL

@enxebre (Member) commented Mar 10, 2022

@Arvinderpal could you please fix the EasyCLA?

Overall this looks reasonable to me. PTAL @sbueringer @fabriziopandini

@Arvinderpal force-pushed the md-node-label-sync-proposal branch from 5f5c038 to b6b2d67 on March 13, 2022 at 22:10
@k8s-ci-robot added the cncf-cla: yes label and removed the cncf-cla: no label on Mar 13, 2022
@Arvinderpal force-pushed the md-node-label-sync-proposal branch from b6b2d67 to 338e17b on March 13, 2022 at 22:45
@Arvinderpal changed the title from "📖 WIP: Label Sync Between MachineDeployment and underlying Kubernetes Nodes" to "📖 Label Sync Between MachineDeployment and underlying Kubernetes Nodes" on Mar 15, 2022
@Arvinderpal (Contributor, Author)

@enxebre I made the suggested changes.
@fabriziopandini @sbueringer PTAL.
Thank you

@fabriziopandini (Member)

/ok-to-test

@k8s-ci-robot added the ok-to-test label and removed the needs-ok-to-test label on Mar 18, 2022
[16 review threads on docs/proposals/20220210-md-node-label-sync.md]
@sbueringer (Member) commented Mar 21, 2022

@Arvinderpal Overall looks good, nice work! A few comments.

@Arvinderpal (Contributor, Author) left a comment

@fabriziopandini @sbueringer Thank you for your feedback. Please see my comments below.

[10 review threads on docs/proposals/20220210-md-node-label-sync.md]
@Arvinderpal force-pushed the md-node-label-sync-proposal branch from 338e17b to 5fae8b4 on March 22, 2022 at 02:38
@Arvinderpal (Contributor, Author) left a comment

@enxebre @sbueringer PTAL at my comments.

[3 review threads on docs/proposals/20220210-md-node-label-sync.md]
@sbueringer (Member) commented May 25, 2022

I think we have two open points which would be nice to get consensus on:

  1. Should we have a separate field for the labels that are synced? (#6255 (comment))
  2. Do we want to implement it for KCP and MD machines as part of this proposal, or try to only implement it for MD machines? (#6255 (comment)) (we can figure out the details during implementation)

@fabriziopandini (Member)

I have commented on both points.
TL;DR: it seems to me that consensus is on using labels and extending this proposal to include KCP.
When those points are addressed and all the pending comments resolved, IMO we can start lazy consensus, or, as @CecileRobertMichon suggested, gate lazy consensus on an lgtm from at least two maintainers.

@vincepri (Member) commented Jun 7, 2022

@Arvinderpal Do you have time to address the above points?

@Arvinderpal force-pushed the md-node-label-sync-proposal branch from ef8b61f to c15fbbe on June 7, 2022 at 17:25
@k8s-ci-robot (Contributor)

New changes are detected. LGTM label has been removed.

@k8s-ci-robot removed the lgtm label on Jun 7, 2022
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please ask for approval from enxebre after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Arvinderpal force-pushed the md-node-label-sync-proposal branch from c15fbbe to 5f96f5a on June 7, 2022 at 17:31
@Arvinderpal (Contributor, Author)

> @Arvinderpal Do you have time to address the above points?

@vincepri Done! We can start lazy consensus.

@vincepri (Member) commented Jun 8, 2022

/assign @CecileRobertMichon @sbueringer

Lazy consensus expires on 06/15

@sbueringer (Member) left a comment

@Arvinderpal a few details, otherwise lgtm from my side

```
MachineSet.spec.template.metadata.labels => Machine.labels
```

As of this writing, changing the `MachineDeployment.spec.template.metadata.labels` will trigger a rollout. This is undesirable. CAPI will be updated to instead ignore updates of labels that fall within the CAPI prefix. There is precedence for this with the handling of `MachineDeploymentUniqueLabel`.
(Member)

Suggested change:
- As of this writing, changing the `MachineDeployment.spec.template.metadata.labels` will trigger a rollout. This is undesirable. CAPI will be updated to instead ignore updates of labels that fall within the CAPI prefix. There is precedence for this with the handling of `MachineDeploymentUniqueLabel`.
+ As of this writing, changing the `MachineDeployment.spec.template.metadata.labels` will trigger a rollout. This is undesirable. CAPI will be updated to instead propagate updates of labels in-place without a rollout. Similar changes will be made to KCP.

I would suggest propagating all labels in place. Otherwise we will end up in a weird state where changes to some labels trigger an MD/MS rollout while others are updated in place (it should be simpler to implement as well).
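
For illustration only, here is a minimal Go sketch of what "propagate all labels in place" could look like one level down the chain quoted above: patching MachineSet template labels straight onto existing Machines instead of rolling out new ones. It assumes the CAPI v1beta1 types and controller-runtime; the helper name `propagateLabelsInPlace` is hypothetical and this is not the project's implementation.

```go
// Illustrative sketch: in-place label propagation (MachineSet template -> Machines)
// via a metadata-only patch, so no new MachineSet or Machines are created.
package main

import (
	"context"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// propagateLabelsInPlace copies MachineSet.spec.template.metadata.labels onto the
// given Machines and patches only their metadata; the Machine specs are untouched,
// so nothing gets recreated. Hypothetical helper, not CAPI code.
func propagateLabelsInPlace(ctx context.Context, c client.Client, ms *clusterv1.MachineSet, machines []*clusterv1.Machine) error {
	for _, m := range machines {
		base := m.DeepCopy()
		if m.Labels == nil {
			m.Labels = map[string]string{}
		}
		for k, v := range ms.Spec.Template.ObjectMeta.Labels {
			m.Labels[k] = v
		}
		if err := c.Patch(ctx, m, client.MergeFrom(base)); err != nil {
			return err
		}
	}
	return nil
}

func main() {}
```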

(Member)

+1, that said this is a breaking behavioral change which needs proper release documentation and a new minor release; in addition, changes to the logic of MachineDeployment might be extensive

@sbueringer (Member) commented Jun 9, 2022

Yeah which is why I favored a separate field ... but yeah that decision has been made now :)

Just to confirm, we're fine with breaking behavior changes of our API fields as long as we do them in new minor releases? (my impression was that this requires a new apiVersion)

P.S. re: MD changes. The real fun part is probably to stop considering labels for the hash we're using while not triggering a MD rollout when the new CAPI version with this behavior is rolled out

(Member)

> Just to confirm, we're fine with breaking behavior changes of our API fields as long as we do them in new minor releases? (my impression was that this requires a new apiVersion)

The API fields wouldn't need to change, but the propagation behavior of those fields would change, which is something that needs to be documented.

> P.S. re: MD changes. The real fun part is probably to stop considering labels for the hash we're using while not triggering a MD rollout when the new CAPI version with this behavior is rolled out

The same problem applies if we add a new field to the Machine template; we'd need a way to make sure that the generated hash is backward compatible. If that's easier though, let's just add a new field?

@sbueringer (Member) commented Jun 9, 2022

> The API fields wouldn't need to change, but the propagation behavior of those fields would change, which is something that needs to be documented.

Okay, good to know that we don't consider this a breaking change :)

> The same problem applies if we add a new field to the Machine template; we'd need a way to make sure that the generated hash is backward compatible. If that's easier though, let's just add a new field?

I think if it's a separate field we just never have to include it in the hash and we don't have to figure out how to remove something from the hash calculation without rollouts. Probably?

(Member)

Between the two solutions proposed above, the second one seems the most sane from a user perspective. If we also think that in the fullness of time we'd want to expand the in place node fields, this might be a good start.

That said, the biggest issue I see with the above is that we're relying on a hashing mechanism that's flaky and fragile. Have we considered breaking the hashing first before introducing new changes that might affect future rollouts and extensibility points?

@sbueringer (Member) commented Jun 10, 2022

> That said, the biggest issue I see with the above is that we're relying on a hashing mechanism that's flaky and fragile. Have we considered breaking the hashing first before introducing new changes that might affect future rollouts and extensibility points?

Good point. I don't think we went that far in thinking about how to implement this proposal. But I think we should try to get to a consensus relatively soon; we've been going back and forth on the field discussion for a while.

Maybe the following is a path forward?

  1. Pick the option that labels will be in a separate node.labels field.
    Here it was basically a 50:50 decision (#6255 (comment)), but it sounds like we now prefer a separate field.
  2. Implement the sync from Machine to Nodes
  3. Figure out how to in-place upgrade node.labels in MD
  4. Figure out how to in-place upgrade node.labels in KCP

I think the feature would already have a big benefit after 2., even without in-place upgrades (as it at least allows propagating labels to Nodes at all, which isn't possible right now).

Then we can continue with 3., doing POCs and figuring out the ideal way to implement in-place upgrades.

(Member)

Damn :/ Forgot that adding the field will already lead to a rollout on CAPI upgrade if we don't change the hashing.
So we definitely have to figure out how to do it for 2. already.

But if the general consensus is that we prefer the new field anyway (and we have the hash problem in both cases), we can at least move the proposal forward and then experiment during implementation with how to actually do it.

@Arvinderpal (Contributor, Author)

@sbueringer Thanks for pointing out the issue!
Introducing a new field is fine by me! I don't believe we have any objections, besides maybe @fabriziopandini :)

Should we also consider replacing the DeepCopy with something that includes only specific fields (i.e. excludes our new field)?

Also, if I'm not mistaken, wouldn't introducing a new field always cause a different hash to be generated, even if that new field is nil? So, when people upgrade, there will always be a rollout? Is this behavior considered okay for a breaking change?

@sbueringer (Member) commented Jun 14, 2022

> Should we also consider replacing the DeepCopy with something that includes only specific fields (i.e. excludes our new field)?

I think as long as we are using a changed struct the hash will change. I think we have a few options to solve this:

  1. "freeze" the struct by creating another one which produces the same output
  2. hack the way we write the struct into the hasher (e.g. by just dropping parts of the string)
  3. Replace Spew with something which writes a compatible output but allows ignoring fields
  4. Replace the entire hash mechanism through something else

There might be other alternatives as well. But for me personally it would be fine to experiment during implementation. I think by making it a separate field we have a few advantages, e.g.:

  • the hash issue is probably simpler to solve (if we keep the hash mechanism, we don't have to figure out how to only include pre-existing labels in the hash and ignore new ones)
  • and that it's not a behavioral change of the label field anymore.

> So, when people upgrade, there will always be a rollout? Is this behavior considered okay for a breaking change?

Up until now we tried to avoid that at all costs (we even have an e2e test which validates it)
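
To make the options above more concrete, here is a rough, hypothetical sketch (in the spirit of options 1/2, and not CAPI's actual hashing code) that clears a new field on a copy before feeding the template to a spew+fnv hasher. The struct `MachineTemplateSketch` and field `NodeLabels` are stand-ins. As already noted above in this thread, this keeps future `NodeLabels` changes out of the hash, but it does not by itself keep the hash identical to one computed before the field existed.

```go
// Hypothetical sketch: excluding a new field from a spew+fnv rollout hash.
// MachineTemplateSketch and NodeLabels are stand-ins, not real CAPI types.
package main

import (
	"fmt"
	"hash/fnv"

	"github.com/davecgh/go-spew/spew"
)

type MachineTemplateSketch struct {
	Labels     map[string]string
	NodeLabels map[string]string // hypothetical new field; must never influence the hash
}

// computeHash writes a deterministic dump of the template into an FNV-1a hasher,
// after clearing NodeLabels on a copy so its value can never change the result.
// Caveat: adding the field to the struct still changes the dump itself, so this
// alone does not avoid a rollout on the CAPI upgrade that introduces the field.
func computeHash(t *MachineTemplateSketch) uint32 {
	frozen := *t
	frozen.NodeLabels = nil

	hasher := fnv.New32a()
	printer := spew.ConfigState{
		Indent:         " ",
		SortKeys:       true, // deterministic map ordering
		DisableMethods: true,
		SpewKeys:       true,
	}
	printer.Fprintf(hasher, "%#v", frozen)
	return hasher.Sum32()
}

func main() {
	t := &MachineTemplateSketch{
		Labels:     map[string]string{"app": "web"},
		NodeLabels: map[string]string{"tier": "gpu"},
	}
	fmt.Println(computeHash(t)) // same value with or without NodeLabels set
}
```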

Comment on lines +105 to +107
Labels that fall under the CAPI specific prefix `node.cluster.k8s.io/*` will need to be kept in sync between the Machine and Node objects. Synchronization must handle these two scenarios:
(A) Label is added/removed on a Machine and must be added/removed on the corresponding Node.
(B) Label is added/removed on the Node and must be removed/re-added on the Node to bring it in sync with the labels on Machine.
(Member)

Suggested change:
- Labels that fall under the CAPI specific prefix `node.cluster.k8s.io/*` will need to be kept in sync between the Machine and Node objects. Synchronization must handle these two scenarios:
- (A) Label is added/removed on a Machine and must be added/removed on the corresponding Node.
- (B) Label is added/removed on the Node and must be removed/re-added on the Node to bring it in sync with the labels on Machine.
+ Labels that fall under the CAPI specific prefix `node.cluster.k8s.io/*` will need to be kept in sync between the Machine and Node objects. Synchronization must handle these three scenarios:
+ (A) Label is added/removed on a Machine and must be added/removed on the corresponding Node.
+ (B) Label is added/removed on the Node and must be removed/re-added on the Node to bring it in sync with the labels on Machine.
+ (C) Label value is changed on the Machine or Node and must be brought in sync with the label value on Machine.

I think we have 3 scenarios (or something similar)
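
As a concrete illustration of the three scenarios (not the proposal's implementation), here is a minimal controller-runtime sketch that reconciles prefix-managed labels from a Machine onto its Node. The function name is hypothetical, and `managedPrefix` simply reuses the prefix from the excerpt above; treat the exact value as illustrative.

```go
// Illustrative sketch: reconciling CAPI-prefixed labels from a Machine onto its Node.
// Covers (A) add/remove on the Machine, (B) drift on the Node, (C) value changes.
package main

import (
	"context"
	"strings"

	corev1 "k8s.io/api/core/v1"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// managedPrefix reuses the prefix from the proposal excerpt above; illustrative only.
const managedPrefix = "node.cluster.k8s.io/"

func syncLabelsToNode(ctx context.Context, c client.Client, machine *clusterv1.Machine, node *corev1.Node) error {
	base := node.DeepCopy()
	if node.Labels == nil {
		node.Labels = map[string]string{}
	}

	// (A) and (C): labels added to or changed on the Machine win on the Node.
	for k, v := range machine.Labels {
		if strings.HasPrefix(k, managedPrefix) {
			node.Labels[k] = v
		}
	}

	// (B): managed labels added out-of-band on the Node, or removed from the
	// Machine, are dropped so the Machine stays the source of truth.
	for k := range node.Labels {
		if !strings.HasPrefix(k, managedPrefix) {
			continue
		}
		if _, ok := machine.Labels[k]; !ok {
			delete(node.Labels, k)
		}
	}

	return c.Patch(ctx, node, client.MergeFrom(base))
}

func main() {}
```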


If you utilize inequality-based selection for workload placement, to prevent unintended scheduling of pods during the initial node startup phase, it is recommended that you specify the following taint in your KubeadmConfigTemplate:
`cluster.x-k8s.io=label-sync-pending:NoSchedule`
@sbueringer (Member) commented Jun 9, 2022

Suggested change:
- `cluster.x-k8s.io=label-sync-pending:NoSchedule`
+ `cluster.x-k8s.io/label-sync-pending:NoSchedule`

Assuming key:effect

(same for other occurrences)
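
For completeness, a small sketch of how the startup taint could be cleared again: the excerpt above only tells users to add the taint, so the assumption that a controller removes it after the first successful label sync (and the key/effect split, per the "Assuming key:effect" note) is mine, for illustration only.

```go
// Illustrative sketch: removing the label-sync-pending startup taint from a Node.
// That a controller clears this taint after the initial label sync is an assumption
// made for this example; it is not stated in the proposal excerpt above.
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

const labelSyncPendingTaintKey = "cluster.x-k8s.io/label-sync-pending"

func removeLabelSyncPendingTaint(ctx context.Context, c client.Client, node *corev1.Node) error {
	base := node.DeepCopy()

	var kept []corev1.Taint
	for _, t := range node.Spec.Taints {
		if t.Key == labelSyncPendingTaintKey && t.Effect == corev1.TaintEffectNoSchedule {
			continue // drop the startup taint; keep everything else
		}
		kept = append(kept, t)
	}
	node.Spec.Taints = kept

	return c.Patch(ctx, node, client.MergeFrom(base))
}

func main() {}
```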


The benefit being that it provides a clear indication to the user that these labels will be synced to the Kubernetes Node(s). This will, however, require api changes.

Option #2: Propogate labels in the top-level metadata
(Member)

Suggested change:
- Option #2: Propogate labels in the top-level metadata
+ Option #2: Propagate labels in the top-level metadata

nit: please search/replace across the whole doc (also "propagation")

@enxebre (Member) commented Sep 6, 2022

I created a POC adding a new field to see how it feels: #7173.

My main concern is around UX for consumers and introducing divergence from the current deployment rollout approach. Another possibility would be to solve in-place rollouts generically first; then node label propagation would not be an exception, but, like any other input change, could be handled either by replacement (the existing MachineDeployment rollout) or in place.

@vincepri (Member)

@Arvinderpal Do you have time to address the above feedback?

@sbueringer (Member)

> @Arvinderpal Do you have time to address the above feedback?

Just to surface that here. Fabrizio and Alberto are currently working on / exploring how to implement it. Afaik they will update the proposal accordingly soon.

@vincepri (Member)

Just want to make sure the proposal PR is up-to-date with current discussions

@Arvinderpal (Contributor, Author)

Hey guys, sorry about the delay. Unfortunately, I don't have the bandwidth to work on this. Please feel free to update the proposal as you see fit. Much appreciated!

@fabriziopandini (Member)

OK, I will take over work on this proposal. Thank you @Arvinderpal for the work done so far, and to everyone who provided valuable feedback.

@fabriziopandini (Member)

/close

created #7296 to start nailing down Machine-Node label propagation

@k8s-ci-robot (Contributor)

@fabriziopandini: Closed this PR.

In response to this:

/close

created #7296 to start nailing down Machine-Node label propagation

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
