Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

⚠️ Set MachinePool feature flag default to true + Beta #10141

Merged
merged 1 commit into from
Mar 26, 2024

Conversation

mboersma
Copy link
Contributor

What this PR does / why we need it:

Sets the default for EXP_MACHINE_POOL to true, so MachinePools are an "always on" feature (unless the user disables it).

This seems like the first step en route to moving MachinePools to beta status based on discussion at last week's community meeting, but please let me know if there are other prerequisites or details I've missed.

Which issue(s) this PR fixes:

Refs #9005

/area machinepool

@k8s-ci-robot k8s-ci-robot added area/machinepool Issues or PRs related to machinepools cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 12, 2024
@mboersma
Copy link
Contributor Author

For some reason this change is preventing the "When following the Cluster API quick-start with ClusterClass" spec from deleting its cluster; it times out in AfterEach. I can reproduce this locally and I'll try to figure it out.

@sbueringer
Copy link
Member

sbueringer commented Feb 13, 2024

For some reason this change is preventing the "When following the Cluster API quick-start with ClusterClass" spec from deleting its cluster; it times out in AfterEach. I can reproduce this locally and I'll try to figure it out.

Sounds really strange. I think the feature flag was already enabled in all e2e tests (https://github.com/kubernetes-sigs/cluster-api/blob/main/test/e2e/config/docker.yaml#L263)

@fabriziopandini fabriziopandini changed the title 🌱 Set MachinePool feature flag default to true ⚠️ Set MachinePool feature flag default to true Feb 15, 2024
@fabriziopandini
Copy link
Member

Modified the PR type so this change pops up in the release notes

@jackfrancis
Copy link
Contributor

Any updates here?

@mboersma
Copy link
Contributor Author

mboersma commented Mar 7, 2024

No updates, I was out for a week and have had other tasks. I'll dig into this again today.

It looks like a latent bug somewhere in the DockerMP controller, when ClusterClass is also enabled. I made one attempt to fix the code but it wasn't sufficient. We thought the feature flag was on in all the tests, but perhaps not.

@mboersma mboersma force-pushed the machine-pools-are-on branch 3 times, most recently from f7f9fab to 0e2905a Compare March 13, 2024 16:51
@mboersma
Copy link
Contributor Author

mboersma commented Mar 13, 2024

I'm clearly grasping at straws here, since I can't reproduce this locally (it always passes and cleanup doesn't error out).

One thing I noticed is this at the end of every containerd.log:

Mar 12 19:01:56.278779 q��������uick-start-ur0y9c-md-0-rqn8x-vlkx6-57shl containerd[128]: time="2024-03-12T19:01:56.278678475Z" level=error msg="failed to reload cni configuration after receiving fs change event(WRITE         \"/etc/cni/net.d/10-kindnet.conflist.temp\")" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
Mar 12 19:01:56.279274 quick-start-ur0y9c-md-0-rqn8x-vlkx6-57shl containerd[128]: time="2024-03-12T19:01:56.278842167Z" level=error msg="failed to reload cni configuration after receiving fs change event(WRITE         \"/etc/cni/net.d/10-kindnet.conflist.temp\")" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
Mar 12 19:01:56.279274 quick-start-ur0y9c-md-0-rqn8x-vlkx6-57shl containerd[128]: time="2024-03-12T19:01:56.278941917Z" level=error msg="failed to reload cni configuration after receiving fs change event(WRITE         \"/etc/cni/net.d/10-kindnet.conflist.temp\")" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"

Edit: never mind, that seems to be true for passing PRs as well.

@sbueringer
Copy link
Member

I'll take a closer look when I get around to it

@@ -70,7 +70,7 @@ func init() {
// To add a new feature, define a key for it above and add it here.
var defaultClusterAPIFeatureGates = map[featuregate.Feature]featuregate.FeatureSpec{
// Every feature should be initiated here:
MachinePool: {Default: false, PreRelease: featuregate.Alpha},
MachinePool: {Default: true, PreRelease: featuregate.Alpha},
Copy link
Member

@sbueringer sbueringer Mar 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no clue, but wondering if it should be changed to Beta

(but has nothing to do with the CI failure of course)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 in core k8s features are defaulted to true in beta.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just change it to Beta here then?

The graduation criteria for MachinePools in its proposal are still "TBD," and MachinePool Machines have landed and seem stable, so I think this makes sense; I'll go ahead and do so and squash. But please let me know if there is anything else we should consider.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me. After a few years it's fair to call it beta :)

@sbueringer
Copy link
Member

@mboersma Pushed a commit to also enable MachinePools in CAPD per default. But I don't think this causes the issue.

Based on the logs I can see that MachinePools is enabled in CAPD anyway (because we enable the feature gate via docker.yaml).

This should also mean that changing the manager.yaml's shouldn't be relevant

@mboersma
Copy link
Contributor Author

Pushed a commit to also enable MachinePools in CAPD per default.

Thanks! I'm surprised I missed this... But I agree, it's probably not related.

My guess is that there's some race condition-type bug in deleting a Docker MachinePool this happens to trigger. Locally, I've run e2e many times but only seen a couple of these failures.

But clearly changing the MachinePool feature flag to true also changed some behavior, although we assumed it was already true everywhere.

Copy link
Member

@neolit123 neolit123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we anticipating any users to complain "we don't want MP on by default" and if yes what would be their story?

@mboersma
Copy link
Contributor Author

are we anticipating any users to complain "we don't want MP on by default" and if yes what would be their story?

I'm sure someone will object, and I understand from the point of view of "keep it simple." I don't think there's much cost to turning it on, especially if your clusters don't contain MachinePools. Changing the default here shouldn't be a breaking change (this PR notwithstanding 😛).

In terms of this change, they can still set EXP_MACHINE_POOL to false. That's annoying, but we went through the same evolution with ClusterResourceSet.

@mboersma mboersma force-pushed the machine-pools-are-on branch from 92e5bf2 to 0172398 Compare March 14, 2024 15:41
@sbueringer
Copy link
Member

sbueringer commented Mar 15, 2024

Last one from my side, otherwise lgtm: #10141 (comment)
Feel free to squash

I think it's fine to enable per default and also to eventually get rid of the feature flag in later releases.

I think if folks have a problem with an additional controller being enabled per default they can implement a webhook blocking creations of MachinePools (or of course while we still have the feature flag disable it)

@mboersma mboersma force-pushed the machine-pools-are-on branch from ad34a9f to b13f043 Compare March 15, 2024 15:51
@mboersma
Copy link
Contributor Author

/retitle ⚠️ Set MachinePool feature flag default to true + Beta

@k8s-ci-robot k8s-ci-robot changed the title ⚠️ Set MachinePool feature flag default to true ⚠️ Set MachinePool feature flag default to true + Beta Mar 15, 2024
@mboersma mboersma force-pushed the machine-pools-are-on branch 2 times, most recently from 426ac75 to e961abe Compare March 15, 2024 18:41
@nawazkh
Copy link
Member

nawazkh commented Mar 20, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 20, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 09879fd4885b2042d5eb6a2ed29ca2a9dff325d2

@sbueringer sbueringer added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Mar 26, 2024
Copy link
Member

@sbueringer sbueringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last one

config/manager/manager.yaml Show resolved Hide resolved
@sbueringer
Copy link
Member

cc @mboersma ^^

@mboersma mboersma force-pushed the machine-pools-are-on branch from b13369c to 33def78 Compare March 26, 2024 13:30
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 26, 2024
@sbueringer
Copy link
Member

Thx!

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 26, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 6b7170d400000b57e047c959fc2e85c64f3704bb

Copy link
Member

@fabriziopandini fabriziopandini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@mboersma thanks for taking care of this!
When we will start thinking about GA, we should figure out how to move the code out of the exp folder, but that's not necessary for this step

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 26, 2024
@k8s-ci-robot k8s-ci-robot merged commit 128914b into kubernetes-sigs:main Mar 26, 2024
21 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.7 milestone Mar 26, 2024
@mboersma mboersma deleted the machine-pools-are-on branch March 26, 2024 20:49
@sbueringer sbueringer mentioned this pull request Apr 16, 2024
Dhairya-Arora01 pushed a commit to Dhairya-Arora01/cluster-api that referenced this pull request May 25, 2024
Dhairya-Arora01 pushed a commit to Dhairya-Arora01/cluster-api that referenced this pull request May 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/machinepool Issues or PRs related to machinepools cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants