Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Guidance for skipping Alpha for feature gated changes #7828

Merged
merged 1 commit into from
Apr 24, 2024

Conversation

wojtek-t
Copy link
Member

Follow up from SIG Architecture meeting on Apr 18th 2024

/assign @deads2k @johnbelamaric @thockin @liggitt @jpbetz

@k8s-ci-robot k8s-ci-robot added area/developer-guide Issues or PRs related to the developer guide cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 19, 2024
Comment on lines 21 to 22
- changes that are explicitly considered as bug fixes are recommended to go directly to `Beta`
and should be enabled by-default from the very beginning
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many bug fixes today are merged without a feature gate at all.

Is the intent here that "bug fixes that have a sufficient level of risk that being able to turn the fix off via a feature gate is justified" should be handled this way?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are using Deprecated state to roll out some bug fixes that may have some risk of breaking , see kubernetes/kubernetes#119789

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Deprecated" is really for changing functionality which could be perceived as "removal". It's, not a very clear principle.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it actually works in all cases. Let me try to clarify that [yes, I think that bug fixes with sufficient level of risk should actually go with FGs]

@@ -9,6 +9,17 @@ Feature gates are intended to cover the development life cycle of a feature - th
Features generally progress through `Alpha` -> `Beta` -> `GA`. Sometimes we end up deciding that a feature is not going to be supported and we end up marking them as `Deprecated`.

The majority of features will go through all three stages, but occasionally there are features which may skip stages.
While some exceptions may happen, approvers should use the following guidance:
- features that involve API changes must progress through all `Alpha`, `Beta`, `GA` stages
- larger or more complex changes should progress through all `Alpha`, `Beta`, `GA` stages
Copy link
Contributor

@jpbetz jpbetz Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- larger or more complex changes should progress through all `Alpha`, `Beta`, `GA` stages
- features that are unproven at achieving their goals, have significant complexity, risk of defects/problematic edge cases, or performance/scalability implications should progress through all `Alpha`, `Beta`, `GA` stages

(trying to expand the word "larger" into criteria somehow.. further improvements welcome)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to the clarification.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this too - thanks!

While some exceptions may happen, approvers should use the following guidance:
- features that involve API changes must progress through all `Alpha`, `Beta`, `GA` stages
- larger or more complex changes should progress through all `Alpha`, `Beta`, `GA` stages
- smaller changes that carry non-trivial risk (e.g. due to changing user-facing behavior) might skip
Copy link
Contributor

@jpbetz jpbetz Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- smaller changes that carry non-trivial risk (e.g. due to changing user-facing behavior) might skip
- features achieve their goals with minimal complexity and performance/scalability implications, but that still carry non-trivial risk (e.g. due to changing user-facing behavior, or risk of defects/problematic edge cases) might skip

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to the clarification.

We should also indicate that changes that carry a risk of making previously working functionality no longer work in certain edges should always start in an off-by-default state.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"features which..."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

- larger or more complex changes should progress through all `Alpha`, `Beta`, `GA` stages
- smaller changes that carry non-trivial risk (e.g. due to changing user-facing behavior) might skip
`Alpha` and start directly in `Beta` (provided the appropriate `Beta` quality is achieved)
but should be off by-default until proven in some production environment
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
but should be off by-default until proven in some production environment
but should be off by-default until proven in representative production environment that utilized the feature with the scale or variety of use to prove it's working

- smaller changes that carry non-trivial risk (e.g. due to changing user-facing behavior) might skip
`Alpha` and start directly in `Beta` (provided the appropriate `Beta` quality is achieved)
but should be off by-default until proven in some production environment
- smaller changes with low enough risk might skip `Alpha` and start directly in `Beta`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

those are basically the bug fixes? Is the KEP only needed to introduce the Feature Gate instead of providing config options?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so, we can be open about it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, these are "bugfixes which might need a backout"

Copy link
Member

@SergeyKanzhelev SergeyKanzhelev Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just say it like this:

Suggested change
- smaller changes with low enough risk might skip `Alpha` and start directly in `Beta`
- smaller changes with low enough risk that still may need to be disabled using the feature gate without introducing a new long term configuration option, might skip `Alpha` and start directly in `Beta`

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just want it to be very explicit

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overlaps with my PR on this file, but it's OK. I'll rebase. :)

I am a visual person:

     Trivial     Low             High            API changes
     bugfixes    complexity      complexity            |
        |        changes with    changes with          |
        |        no API          no API                |
        |           |               |                  |
Risk ---O-----------O---------------O------------------O-------->
        |           |               |                  |
     No gate     Start beta      Start alpha     Start alpha,
     required    and enabled     or beta, but    disabled                                                                                                       
                                 disabled

While some exceptions may happen, approvers should use the following guidance:
- features that involve API changes must progress through all `Alpha`, `Beta`, `GA` stages
- larger or more complex changes should progress through all `Alpha`, `Beta`, `GA` stages
- smaller changes that carry non-trivial risk (e.g. due to changing user-facing behavior) might skip
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"features which..."

- smaller changes that carry non-trivial risk (e.g. due to changing user-facing behavior) might skip
`Alpha` and start directly in `Beta` (provided the appropriate `Beta` quality is achieved)
but should be off by-default until proven in some production environment
- smaller changes with low enough risk might skip `Alpha` and start directly in `Beta`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, these are "bugfixes which might need a backout"

Comment on lines 21 to 22
- changes that are explicitly considered as bug fixes are recommended to go directly to `Beta`
and should be enabled by-default from the very beginning
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Deprecated" is really for changing functionality which could be perceived as "removal". It's, not a very clear principle.

@SergeyKanzhelev
Copy link
Member

This overlaps with my PR on this file, but it's OK. I'll rebase. :)

I am a visual person:

     Trivial     Low             High            API changes
     bugfixes    complexity      complexity            |
        |        changes with    changes with          |
        |        no API          no API                |
        |           |               |                  |
Risk ---O-----------O---------------O------------------O-------->
        |           |               |                  |
     No gate     Start beta      Start alpha     Start alpha,
     required    and enabled     or beta, but    disabled                                                                                                       
                                 disabled

Do we need to say explicitly that the trivial bugfix is different from low complexity changed by the fact that we provide the temporary opt out config option. Maybe better name "trivial" -> "no opt-out".

Copy link
Member Author

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL

@@ -9,6 +9,17 @@ Feature gates are intended to cover the development life cycle of a feature - th
Features generally progress through `Alpha` -> `Beta` -> `GA`. Sometimes we end up deciding that a feature is not going to be supported and we end up marking them as `Deprecated`.

The majority of features will go through all three stages, but occasionally there are features which may skip stages.
While some exceptions may happen, approvers should use the following guidance:
- features that involve API changes must progress through all `Alpha`, `Beta`, `GA` stages
- larger or more complex changes should progress through all `Alpha`, `Beta`, `GA` stages
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this too - thanks!

While some exceptions may happen, approvers should use the following guidance:
- features that involve API changes must progress through all `Alpha`, `Beta`, `GA` stages
- larger or more complex changes should progress through all `Alpha`, `Beta`, `GA` stages
- smaller changes that carry non-trivial risk (e.g. due to changing user-facing behavior) might skip
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 21 to 22
- changes that are explicitly considered as bug fixes are recommended to go directly to `Beta`
and should be enabled by-default from the very beginning
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it actually works in all cases. Let me try to clarify that [yes, I think that bug fixes with sufficient level of risk should actually go with FGs]

Copy link
Member

@SergeyKanzhelev SergeyKanzhelev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 22, 2024
@@ -9,6 +9,28 @@ Feature gates are intended to cover the development life cycle of a feature - th
Features generally progress through `Alpha` -> `Beta` -> `GA`. Sometimes we end up deciding that a feature is not going to be supported and we end up marking them as `Deprecated`.

The majority of features will go through all three stages, but occasionally there are features which may skip stages.
While some exceptions may happen, approvers should use the following guidance:
- features that involve API changes must progress through all `Alpha`, `Beta`, `GA` stages
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I posted on slack about this but can we define (or point me to a place) what is meant by an API change?

We have staging api changes which are obvious to me. But do we consider cli flags and configs for kubelet to be a API change?

So any new flag/api change to KubeletConfig would require alpha -> beta -> GA?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general when we use this term I think it refers to PRs that would trigger API review. So, this is about the K8s APIs, not the broader user surface. So, basically, anything that makes you read https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api_changes.md

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we consider cli flags and configs for kubelet to be a API change

Yes. https://github.com/kubernetes/community/blob/master/sig-architecture/api-review-process.md#what-parts-of-a-pr-are-api-changes is a crisp list of "things that are APIs"

APIs are the way things outside the project integrate with our binaries / servers, and changes to that surface area can disrupt or break integrators. It's important that we be very confident in things that increase that surface area, so having at least a single release where we are still able to adjust that surface area is important.

So any new flag/api change to KubeletConfig would require alpha -> beta -> GA?

Kubelet config overall is still beta (:-/) so a field inside it can't really be more mature than beta, but generally, yes, new things start as alpha for a release.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the other important point is the removal of flags, removing a flag in kubelet per example will make that the new version will fail to start with "unknown flag" (I can not remember the exactly) so removal of flags need to consider this problem

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this is about the K8s APIs, not the broader user surface. So, basically, anything that makes you read https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api_changes.md

we should starting thinking about moving these or some of this conventions to the CRI API too

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a suggestion with Jordan's link

@thockin
Copy link
Member

thockin commented Apr 23, 2024

/lgtm

@johnbelamaric
Copy link
Member

/hold

LGTM but I am holding for @deads2k. David go ahead and remove the hold if you are good with this.

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 24, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnbelamaric, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 24, 2024
@@ -9,6 +9,28 @@ Feature gates are intended to cover the development life cycle of a feature - th
Features generally progress through `Alpha` -> `Beta` -> `GA`. Sometimes we end up deciding that a feature is not going to be supported and we end up marking them as `Deprecated`.

The majority of features will go through all three stages, but occasionally there are features which may skip stages.
While some exceptions may happen, approvers should use the following guidance:
- features that involve API changes must progress through all `Alpha`, `Beta`, `GA` stages
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- features that involve API changes must progress through all `Alpha`, `Beta`, `GA` stages
- features that involve [API changes](https://github.com/kubernetes/community/blob/master/sig-architecture/api-review-process.md#what-parts-of-a-pr-are-api-changes) must progress through all `Alpha`, `Beta`, `GA` stages

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 24, 2024
@johnbelamaric
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 24, 2024
@deads2k
Copy link
Contributor

deads2k commented Apr 24, 2024

LGTM but I am holding for @deads2k. David go ahead and remove the hold if you are good with this.

/approve /lgtm

Thanks for the chance for a re-look.

/lgtm
/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 24, 2024
@k8s-ci-robot k8s-ci-robot merged commit ead10e6 into kubernetes:master Apr 24, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/developer-guide Issues or PRs related to the developer guide cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants