📖 Label Sync Between MachineDeployment and underlying Kubernetes Nodes #6255
Conversation
Hi @Arvinderpal. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@enxebre PTAL
@Arvinderpal could you please fix the EasyCLA? Overall this looks reasonable to me. PTAL @sbueringer @fabriziopandini
Force-pushed from 5f5c038 to b6b2d67.
Force-pushed from b6b2d67 to 338e17b.
@enxebre I made the suggested changes.
/ok-to-test
@Arvinderpal Overall looks good, nice work! A few comments.
@fabriziopandini @sbueringer Thank you for your feedback. Please see my comments below.
Force-pushed from 338e17b to 5fae8b4.
@enxebre @sbueringer PTAL at my comments.
I think we have two open points which would be nice to get consensus on:
I have commented on both points.
@Arvinderpal Do you have time to address the above points?
Force-pushed from ef8b61f to c15fbbe.
New changes are detected. LGTM label has been removed.
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Force-pushed from c15fbbe to 5f96f5a.
@vincepri Done! We can start lazy consensus.
/assign @CecileRobertMichon @sbueringer Lazy consensus expires on 06/15
@Arvinderpal a few details, otherwise lgtm from my side
```
MachineSet.spec.template.metadata.labels => Machine.labels
```
As of this writing, changing the `MachineDeployment.spec.template.metadata.labels` will trigger a rollout. This is undesirable. CAPI will be updated to instead ignore updates of labels that fall within the CAPI prefix. There is precedent for this with the handling of `MachineDeploymentUniqueLabel`.
Suggested change:
As of this writing, changing the `MachineDeployment.spec.template.metadata.labels` will trigger a rollout. This is undesirable. CAPI will be updated to instead propagate updates of labels in-place without a rollout. Similar changes will be made to KCP.
I would suggest propagating all labels in place. Otherwise we will end up in a weird state where changes to some labels trigger an MD/MS rollout while others are updated in place (this should also be simpler to implement).
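For illustration only, a minimal sketch of what propagating all template labels in place could look like, assuming a controller-runtime client and the current CAPI types; the `syncTemplateLabels` helper is hypothetical and not part of the proposal:

```go
import (
	"context"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// syncTemplateLabels copies labels from a MachineSet's template onto its
// existing Machines in place, without triggering a rollout. Sketch only;
// error aggregation and ownership checks are omitted.
func syncTemplateLabels(ctx context.Context, c client.Client, ms *clusterv1.MachineSet, machines []*clusterv1.Machine) error {
	for _, m := range machines {
		patchBase := client.MergeFrom(m.DeepCopy())
		if m.Labels == nil {
			m.Labels = map[string]string{}
		}
		for k, v := range ms.Spec.Template.Labels {
			m.Labels[k] = v
		}
		if err := c.Patch(ctx, m, patchBase); err != nil {
			return err
		}
	}
	return nil
}
```

Because this is a merge patch against the existing Machine, labels outside the template are left untouched.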
+1. That said, this is a breaking behavioral change which needs proper release documentation and a new minor release; in addition, changes to the logic of MachineDeployment might be extensive.
Yeah which is why I favored a separate field ... but yeah that decision has been made now :)
Just to confirm, we're fine with breaking behavior changes of our API fields as long as we do them in new minor releases? (my impression was that this requires a new apiVersion)
P.S. re: MD changes. The real fun part is probably to stop considering labels for the hash we're using while not triggering a MD rollout when the new CAPI version with this behavior is rolled out
Just to confirm, we're fine with breaking behavior changes of our API fields as long as we do them in new minor releases? (my impression was that this requires a new apiVersion)
The API fields wouldn't need to change, but the propagation behavior of those fields would change, which is something that needs to be documented.
P.S. re: MD changes. The real fun part is probably to stop considering labels for the hash we're using while not triggering a MD rollout when the new CAPI version with this behavior is rolled out
The same problem applies if we add a new field to the Machine template, we'd need a way to make sure that the generated hash is backward compatible. If that's easier though, let's just add a new field?
The API fields wouldn't need to change, but the propagation behavior of those fields would change, which is something that needs to be documented.
Okay, good to know that we don't consider this a breaking change :)
The same problem applies if we add a new field to the Machine template, we'd need a way to make sure that the generated hash is backward compatible. If that's easier though, let's just add a new field?
I think if it's a separate field we just never have to include it in the hash and we don't have to figure out how to remove something from the hash calculation without rollouts. Probably?
Between the two solutions proposed above, the second one seems the most sane from a user perspective. If we also think that in the fullness of time we'd want to expand the in place node fields, this might be a good start.
That said, the biggest issue I see with the above is that we're relying on a hashing mechanism that's flaky and fragile. Have we considered breaking the hashing first before introducing new changes that might affect future rollouts and extensibility points?
That said, the biggest issue I see with the above is that we're relying on a hashing mechanism that's flaky and fragile. Have we considered breaking the hashing first before introducing new changes that might affect future rollouts and extensibility points?
Good point. I don't think we went that far regarding thinking about how to implement this proposal. But I think we should try to get to a consensus relatively soon; we have been going back and forth on the field discussion for a while now.
Maybe the following is a path forward?
1. Pick the option that labels will be in a separate node.labels field. Here it was basically a 50:50 decision (📖 Label Sync Between MachineDeployment and underlying Kubernetes Nodes #6255 (comment)), but it sounds like we now prefer a separate field
2. Implement the sync from Machine to Nodes
3. Figure out how to in-place upgrade node.labels in MD
4. Figure out how to in-place upgrade node.labels in KCP
I think the feature would already have a big benefit after 2., even without in-place upgrades (as it at least allows propagating labels to nodes at all, which isn't possible right now).
Then we can continue with 3., do POCs, and figure out the ideal way to implement in-place upgrades.
Damn :/ Forgot that adding the field will already lead to a rollout on CAPI upgrade if we don't change the hashing.
So we definitely have to figure out how to do it for 2. already.
But if the general consensus is that we prefer the new field anyway (and we have the hash problem in both cases), we can at least move the proposal forward and then experiment during implementation on how to actually do it.
@sbueringer Thanks for pointing out the issue!
Introducing a new field is fine by me! I don't believe we have any objections, besides maybe @fabriziopandini :)
Should we also consider replacing the DeepCopy with something that includes only specific fields (i.e. excludes our new field)?
Also, if I'm not mistaken, wouldn't introducing a new field always cause a different hash to be generated, even if that new field is nil? So, when people upgrade, there will always be a rollout? Is this behavior considered okay for a breaking change?
Should we also consider replacing the Deepcopy with something that includes specific fields (i.e. excludes our new field).
I think as long as we are using a changed struct the hash will change. I think we have a few options to solve this:
- "freeze" the struct by creating another one which produces the same output
- hack the way we write the struct into the hasher (e.g. by just dropping parts of the string)
- Replace Spew with something which writes a compatible output but allows ignoring fields
- Replace the entire hash mechanism through something else
There might be other alternatives as well. But for me personally it would be fine to experiment during implementation. I think by making it a separate field we have a few advantages, e.g.:
- the hash issue is probably simpler to solve (if we keep the hash mechanism, we don't have to figure out how to only include pre-existing labels in the hash and ignore new ones)
- and that it's not a behavioral change of the label field anymore.
So, when people upgrade, there will always be a rollout? Is this behavior considered okay for a breaking change?
Up until now we tried to avoid that at all costs (we even have an e2e test which validates it)
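To make the field-exclusion idea concrete, here is a rough sketch, assuming the hash stays fnv/spew based as in the upstream Deployment controller; `computeTemplateHash` and the commented-out node-labels field are hypothetical:

```go
import (
	"fmt"
	"hash/fnv"

	"github.com/davecgh/go-spew/spew"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

// computeTemplateHash hashes a MachineTemplateSpec, but first clears any
// fields that must not influence the hash, so that adding a new in-place
// propagated field does not roll out existing MachineSets. Sketch only.
func computeTemplateHash(template clusterv1.MachineTemplateSpec) string {
	// Work on a copy so callers never see the stripped fields.
	stripped := template.DeepCopy()
	// A hypothetical new node-labels field would be cleared here, e.g.:
	// stripped.Spec.NodeLabels = nil

	hasher := fnv.New32a()
	// Deterministic dump of the struct into the hasher, mirroring how the
	// upstream Deployment controller hashes pod templates.
	printer := spew.ConfigState{
		Indent:         " ",
		SortKeys:       true,
		DisableMethods: true,
		SpewKeys:       true,
	}
	printer.Fprintf(hasher, "%#v", stripped)
	return fmt.Sprintf("%d", hasher.Sum32())
}
```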
Labels that fall under the CAPI specific prefix `node.cluster.k8s.io/*` will need to be kept in sync between the Machine and Node objects. Synchronization must handle these two scenarios:
(A) Label is added/removed on a Machine and must be added/removed on the corresponding Node.
(B) Label is added/removed on the Node and must be removed/re-added on the Node to bring it in sync with the labels on Machine.
Suggested change:
Labels that fall under the CAPI specific prefix `node.cluster.k8s.io/*` will need to be kept in sync between the Machine and Node objects. Synchronization must handle these three scenarios:
(A) Label is added/removed on a Machine and must be added/removed on the corresponding Node.
(B) Label is added/removed on the Node and must be removed/re-added on the Node to bring it in sync with the labels on Machine.
(C) Label value is changed on the Machine or Node and must be brought in sync with the label value on Machine.
I think we have 3 scenarios (or something similar)
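For reference, a minimal sketch of how the Machine could act as the source of truth for the prefixed labels, covering all three scenarios; the `desiredNodeLabels` helper is hypothetical and only illustrates the merge logic:

```go
import "strings"

// labelSyncPrefix follows the prefix named in the proposal text.
const labelSyncPrefix = "node.cluster.k8s.io/"

// desiredNodeLabels computes the label set a Node should converge to:
// prefixed labels come exclusively from the Machine (scenarios A and C),
// and prefixed labels added directly on the Node are dropped (scenario B).
// All other Node labels are left untouched.
func desiredNodeLabels(machineLabels, nodeLabels map[string]string) map[string]string {
	desired := map[string]string{}
	for k, v := range nodeLabels {
		if !strings.HasPrefix(k, labelSyncPrefix) {
			desired[k] = v
		}
	}
	for k, v := range machineLabels {
		if strings.HasPrefix(k, labelSyncPrefix) {
			desired[k] = v
		}
	}
	return desired
}
```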
If you utilize inequality-based selection for workload placement, it is recommended that you specify the following taint in your KubeadmConfigTemplate to prevent unintended scheduling of pods during the initial node startup phase:
`cluster.x-k8s.io=label-sync-pending:NoSchedule`
Suggested change:
`cluster.x-k8s.io/label-sync-pending:NoSchedule`
Assuming key:effect
(same for other occurrences)
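For completeness, a hedged sketch of how such a taint could be wired into the kubeadm bootstrap configuration, assuming the bootstrap provider's `JoinConfiguration.NodeRegistration.Taints` field; the helper name is hypothetical and package paths may differ by version:

```go
import (
	corev1 "k8s.io/api/core/v1"
	bootstrapv1 "sigs.k8s.io/cluster-api/bootstrap/kubeadm/api/v1beta1"
)

// labelSyncPendingTaint keeps workloads that rely on inequality-based
// selectors off a Node until the label sync has run and the taint is
// removed. Key/effect follow the suggestion above.
var labelSyncPendingTaint = corev1.Taint{
	Key:    "cluster.x-k8s.io/label-sync-pending",
	Effect: corev1.TaintEffectNoSchedule,
}

// withLabelSyncTaint adds the taint to a KubeadmConfigSpec's join
// configuration; a KubeadmConfigTemplate would carry the same spec.
func withLabelSyncTaint(spec *bootstrapv1.KubeadmConfigSpec) {
	if spec.JoinConfiguration == nil {
		spec.JoinConfiguration = &bootstrapv1.JoinConfiguration{}
	}
	spec.JoinConfiguration.NodeRegistration.Taints = append(
		spec.JoinConfiguration.NodeRegistration.Taints,
		labelSyncPendingTaint,
	)
}
```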
The benefit being that it provides a clear indication to the user that these labels will be synced to the Kubernetes Node(s). This will, however, require API changes.
Option #2: Propogate labels in the top-level metadata
Suggested change:
Option #2: Propagate labels in the top-level metadata
nit: please search/replace across the whole doc (also "propagation")
I created a POC adding a new field to see how it feels: #7173. My main concern is around UX for consumers and introducing divergence from the current deployment rollout approach. Another possibility would be to first solve in-place rollouts generically; then node label propagation would not be an exception but, like any other input change, could be handled either by replacement (an existing MachineDeployment rollout) or in place.
@Arvinderpal Do you have time to address the above feedback?
Just to surface that here: Fabrizio and Alberto are currently working on / exploring how to implement it. AFAIK they will update the proposal accordingly soon.
Just want to make sure the proposal PR is up-to-date with current discussions.
Hey guys, sorry about the delay. Unfortunately, I don't have the bandwidth to work on this. Please feel free to update the proposal as you see fit. Much appreciated!
kk, I will take over work on this proposal. Thank you @Arvinderpal for the work done so far and to everyone who provided valuable feedback.
/close
Created #7296 to start nailing down Machine-Node label propagation.
@fabriziopandini: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What this PR does / why we need it:
Proposal: Label Sync Between MachineDeployment and underlying Kubernetes Nodes
Google Doc: https://docs.google.com/document/d/17QA2E0GcbWNYb160qs8ArHOW0uMfL-NTYivefPGtn-c/edit?usp=sharing
Which issue(s) this PR fixes:
Fixes #493