
KEP-1965: update with Beta criteria/milestone and PRR questions answered #3589

Merged: 5 commits into kubernetes:master from kep-1965 on Oct 6, 2022

Conversation

@andrewsykim (Member) commented Oct 5, 2022:

Signed-off-by: Andrew Sy Kim [email protected]
Co-authored-by: Monis Khan [email protected]

  • One-line PR description:

Update KEP-1965 to include all Beta criteria and answer remaining PRR questions.

  • Issue link:

#1965

  • Other comments:

@k8s-ci-robot added labels: do-not-merge/work-in-progress, cncf-cla: yes, kind/kep, sig/api-machinery (Oct 5, 2022)
@k8s-ci-robot added label: size/L (Oct 5, 2022)
@andrewsykim force-pushed the kep-1965 branch 2 times, most recently from 42968e1 to 1560bab (Oct 5, 2022 20:58)
@andrewsykim changed the title from "[WIP] KEP-1965: update with Beta criteria/milestone and PRR questions answered" to "KEP-1965: update with Beta criteria/milestone and PRR questions answered" (Oct 5, 2022)
@k8s-ci-robot removed label: do-not-merge/work-in-progress (Oct 5, 2022)
@andrewsykim force-pushed the kep-1965 branch 3 times, most recently from b3b8d0e to 4cbc197 (Oct 6, 2022 01:23)
@wojtek-t (Member) left a comment:

I'm fairly interested in this graduating to Beta too.

@@ -95,7 +120,7 @@ will only delay the storage migration for the same period of time.

## Design Details

- The [kubelet heartbeat](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/0009-node-heartbeat.md)
+ The [kubelet heartbeat](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/589-efficient-node-heartbeats)
Member:

L128: This was slightly misleading to me as it was suggesting the main GC controller in kcm.
In fact, there is a separate controller inside kcm that is doing exactly that:
https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/storageversiongc/gc_controller.go

[and then it makes much more sense to use hardcoded namespaces, etc.]

Member Author (andrewsykim):

My current understanding is that the GC controller in KCM removes stale Leases, but a post-start hook in kube-apiserver starts the lease heartbeat: https://github.com/kubernetes/kubernetes/blob/725993afce11082b973901d2e92553640771d55d/pkg/controlplane/instance.go#L466-L483

Member Author (andrewsykim):

Oh I see what you're saying, let me reword this a little so it's clearer

Member Author (andrewsykim):
Updated, thanks
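
To make the split described above concrete, here is a minimal client-go sketch of the kind of heartbeat loop a kube-apiserver post-start hook could run. This is illustrative only, not the actual instance.go code; the function name `runIdentityLease`, the `refreshPeriod` parameter, and the ignored errors are assumptions for the sketch.

```go
package identitylease

import (
	"context"
	"time"

	coordinationv1 "k8s.io/api/coordination/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/utils/pointer"
)

// runIdentityLease creates and then periodically renews this apiserver's
// identity Lease, the way a kube-apiserver post-start hook would.
// Sketch only; error handling is elided.
func runIdentityLease(ctx context.Context, client kubernetes.Interface, holderID, leaseNS string, refreshPeriod time.Duration) {
	leases := client.CoordinationV1().Leases(leaseNS)
	ticker := time.NewTicker(refreshPeriod)
	defer ticker.Stop()
	for {
		now := metav1.NowMicro()
		lease, err := leases.Get(ctx, holderID, metav1.GetOptions{})
		switch {
		case apierrors.IsNotFound(err):
			// First heartbeat, or the Lease was garbage collected: recreate it.
			_, _ = leases.Create(ctx, &coordinationv1.Lease{
				ObjectMeta: metav1.ObjectMeta{Name: holderID, Namespace: leaseNS},
				Spec: coordinationv1.LeaseSpec{
					HolderIdentity:       pointer.String(holderID),
					LeaseDurationSeconds: pointer.Int32(3600), // the KEP's 1h default
					RenewTime:            &now,
				},
			}, metav1.CreateOptions{})
		case err == nil:
			// Normal path: bump renewTime so the GC controller keeps the Lease.
			lease.Spec.RenewTime = &now
			_, _ = leases.Update(ctx, lease, metav1.UpdateOptions{})
		}
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}
```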


- [X] Feature gate (also fill in values in `kep.yaml`)
- Feature gate name: APIServerIdentity
- Components depending on the feature gate: kube-apiserver
Member:
Should this also be kube-controller-manager?

I guess the lease-gc controller is also gated on this feature?

Member Author (andrewsykim):
Thanks, updated
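
For reference, the gate is toggled at startup with `--feature-gates=APIServerIdentity=true`, and in code both components would sit behind the same check. A sketch of that guard pattern follows; it is not the literal kube-apiserver or kube-controller-manager wiring.

```go
package identitylease

import (
	genericfeatures "k8s.io/apiserver/pkg/features"
	utilfeature "k8s.io/apiserver/pkg/util/feature"
)

// startIfEnabled runs start() only when the APIServerIdentity gate is on.
// The apiserver's lease heartbeat and the kube-controller-manager's lease
// GC controller would both sit behind this same check.
func startIfEnabled(start func()) {
	if utilfeature.DefaultFeatureGate.Enabled(genericfeatures.APIServerIdentity) {
		start()
	}
}
```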

* **Does enabling the feature change any default behavior?**
-   A namespace "kube-apiserver-lease" will be used to store kube-apiserver
-   identity Leases.
+   A namespace "kube-apiserver-lease" will be used to store kube-apiserver identity Leases.
Member:
[This namespace will also be created, and old leases will be GC-ed].

Member Author (andrewsykim):
Done


###### Are there any tests for feature enablement/disablement?

Yes, see [apiserver_identity_test.go](https://github.com/kubernetes/kubernetes/blob/24238425492227fdbb55c687fd4e94c8b58c1ee3/test/integration/controlplane/apiserver_identity_test.go).
Member:
I would say that these test the feature itself (they both are explicitly enabling the feature).

I think we don't have tests that change the FG in the middle of the test (which effectively is the enablement/disablement test).

Member Author (andrewsykim):
Good point -- I added a note saying we should have this done before Beta
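
A minimal sketch of what such an enablement/disablement test could look like, using the component-base feature-gate test helper in the shape it had around the time of this PR. The test body is hypothetical; flipping the gate only takes effect for a freshly started test apiserver.

```go
package identitylease

import (
	"testing"

	genericfeatures "k8s.io/apiserver/pkg/features"
	utilfeature "k8s.io/apiserver/pkg/util/feature"
	featuregatetesting "k8s.io/component-base/featuregate/testing"
)

func TestAPIServerIdentityEnablementDisablement(t *testing.T) {
	// Enable the gate; the returned func restores the previous value.
	defer featuregatetesting.SetFeatureGateDuringTest(t, utilfeature.DefaultFeatureGate, genericfeatures.APIServerIdentity, true)()

	// ... start a test apiserver and assert its identity Lease appears ...

	// Flip the gate off mid-test, restart the test apiserver, and assert
	// that its Lease is no longer created or refreshed.
	defer featuregatetesting.SetFeatureGateDuringTest(t, utilfeature.DefaultFeatureGate, genericfeatures.APIServerIdentity, false)()

	// ... restart the test apiserver and assert the Lease is not refreshed ...
}
```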

###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

A rough SLO here is that kube-apiserver updates leases at the same frequency as kubelet node heart beats,
since the same mechanism is being used.
Member:
But the configured frequency period might be different.

I would rather go towards something like "healthy kube-apiservers has a lease which is not older than 2*frequency XX% of time".

Member Author (andrewsykim):
Done


###### Are there any missing metrics that would be useful to have to improve observability of this feature?

Yes, heart beat latency could be useful.
Member:
LastUpdateTime of my own lease object.
But that has a potential cardinality problem, given the lease is changed on every kube-apiserver restart...

We could consider something like "oldest lease", but given we leave old leases for a longer time (1h), that's not particularly useful...

So I'm not entirely sure how we could construct such a metric here...

Member Author (andrewsykim):
Yeah, my understanding for this section was "metrics that would be nice but we wouldn't add due to cardinality issues, etc."

Member Author (andrewsykim):
Updated so it's clear we're not actually going to add this metric

@@ -0,0 +1,3 @@
kep-number: 1965
beta:
approver: "@deads2k"
Member:
I'm happy to take this too (I already added some comments below).

Member Author (andrewsykim):
Does the format take multiple PRR approvers, maybe comma separated?

@andrewsykim force-pushed the kep-1965 branch 4 times, most recently from 1a94d52 to 41547e0 (Oct 6, 2022 09:54)
@wojtek-t (Member) left a comment:
Just one minor comment - other than that it LGTM.


@@ -0,0 +1,3 @@
kep-number: 1965
beta:
approver: "@deads2k" # and @wojtek-t
Member:
I guess David will be approving from SIG level anyway, so you can also switch to me.
But I don't have any strong opinion (I could probably approve PRR before he wakes up then).

@wojtek-t (Member) commented Oct 6, 2022:

/approve PRR :)

cache. On processing an item, the controller will delete the Lease if the last
`renewTime` was more than `leaseDurationSeconds` ago (default to 1h). The
default `leaseDurationSeconds` is chosen to be way longer than the default
Each kube-apiserver will run a lease controller in a post-start-hook to refresh
Contributor:
Leases have a thing called graceful release in the standard client-go code. For beta, can we leverage this mechanism so that a server in a controlled shutdown can remove itself? Do we not want graceful release for cases where we expect to be restarted and desire our indicated state of the world to persist over the gap?

Contributor:
I think I've settled on wanting the apiserver identity to persist over the gap because we'd generally expect consistency of the identity on a host or member over time.

@lavalamp make sense to you too?

Member Author (andrewsykim):
> I think I've settled on wanting the apiserver identity to persist over the gap because we'd generally expect consistency of the identity on a host or member over time.

So to clarify, we're saying the holder identity should not change over a restart? i.e. remove the PID from the ID?

Member Author (andrewsykim):
Turns out the actual ID being used is just kube-apiserver-<uuid> (https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/server/config.go#L326). We'll have to revisit that during implementation.

Member:
If we want identity to persist then we'd need to make a bunch of changes like not put the PID in the identity -- we put that in there in case you run two apiservers on the same system.

I don't think I want people to get attached to specific apiservers (don't make them pets) but it is handy to have a more durable name so you can find the thing or notice if one is consistently misbehaving. I'd personally keep this lease ephemeral but consider adding a nickname to help with readability / persistence over time.

Member Author (@andrewsykim), Oct 6, 2022:
Maybe a kubernetes.io/hostname label on the Lease might be sufficient?

I see value in having the lease naming mirror what already exists for kube-controller-manager and kube-scheduler for consistency.

Will definitely give this more thought during implementation.
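
For concreteness, an identity Lease carrying such a label might be built like this. A sketch only: whether the kubernetes.io/hostname label is actually adopted is explicitly left to implementation, and `newIdentityLease` is a made-up helper name.

```go
package identitylease

import (
	"os"

	coordinationv1 "k8s.io/api/coordination/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/utils/pointer"
)

// newIdentityLease builds an identity Lease whose name stays ephemeral
// (e.g. kube-apiserver-<uuid>) while a kubernetes.io/hostname label gives
// humans the continuity discussed above. Sketch only.
func newIdentityLease(name, namespace string) *coordinationv1.Lease {
	hostname, _ := os.Hostname()
	now := metav1.NowMicro()
	return &coordinationv1.Lease{
		ObjectMeta: metav1.ObjectMeta{
			Name:      name,
			Namespace: namespace,
			Labels:    map[string]string{"kubernetes.io/hostname": hostname},
		},
		Spec: coordinationv1.LeaseSpec{
			HolderIdentity:       pointer.String(name),
			LeaseDurationSeconds: pointer.Int32(3600),
			RenewTime:            &now,
		},
	}
}
```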


Proposed e2e tests:
- an e2e test that validates the existence of the Lease objects per kube-apiserver
- an e2e test that restarts a kube-apiserver and validates that a new Lease is created
Contributor:
not sure how a standard e2e test could do this.

Contributor:
> not sure how a standard e2e test could do this.

Oh, possibly an upgrade test could since at the end of the test you could compare against the starting state.

Member Author (andrewsykim):
Thinking of re-using something similar to RestartControllerManager, but for kube-apiserver

Contributor:
I'm willing to do either or both.
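
A rough sketch of the restart test shape under discussion. `restartKubeAPIServer` is hypothetical, standing in for machinery like the existing RestartControllerManager helper, and the kube-apiserver-lease namespace is the one proposed by the KEP.

```go
package identitylease

import (
	"context"
	"testing"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// leaseNames lists the identity Leases currently present. Sketch helper.
func leaseNames(ctx context.Context, c kubernetes.Interface, ns string) (map[string]bool, error) {
	list, err := c.CoordinationV1().Leases(ns).List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	names := map[string]bool{}
	for _, l := range list.Items {
		names[l.Name] = true
	}
	return names, nil
}

// testLeaseRecreatedOnRestart restarts a kube-apiserver and checks that a
// Lease with a new name appears, per the proposed e2e test above.
func testLeaseRecreatedOnRestart(ctx context.Context, t *testing.T, c kubernetes.Interface, restartKubeAPIServer func() error) {
	before, _ := leaseNames(ctx, c, "kube-apiserver-lease")
	if err := restartKubeAPIServer(); err != nil {
		t.Fatal(err)
	}
	after, _ := leaseNames(ctx, c, "kube-apiserver-lease")
	for name := range after {
		if !before[name] {
			return // a new Lease appeared for the restarted instance
		}
	}
	t.Fatal("expected a new identity Lease after kube-apiserver restart")
}
```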


###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

A rough SLO here is that healthy kube-apiservers has a lease which is not older than 2 times the frequency of
Contributor:
how would a cluster-admin know whether this has been achieved? Perhaps it's more practical to know that the number of leases in kube-apiserver-lease equals the number of kube-apiservers expected 95% of the time.

Member Author (andrewsykim):
> how would a cluster-admin know whether this has been achieved?

They could check the Lease's spec.renewTime, but I guess there would be no way of knowing if it was under some threshold 95% of the time. Let me add this as another potential SLO
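
As a quick manual check, `kubectl get leases -n kube-apiserver-lease` surfaces the renewTimes, assuming the KEP's namespace. Sketched below is how either SLO variant could be checked programmatically with client-go; `refreshPeriod` and the printed output are assumptions for illustration.

```go
package identitylease

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// checkLeaseFreshness reports leases whose renewTime is older than
// 2*refreshPeriod, and the total lease count (to compare against the
// expected number of kube-apiservers).
func checkLeaseFreshness(ctx context.Context, c kubernetes.Interface, refreshPeriod time.Duration) error {
	leases, err := c.CoordinationV1().Leases("kube-apiserver-lease").List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	threshold := 2 * refreshPeriod
	for _, l := range leases.Items {
		if l.Spec.RenewTime == nil || time.Since(l.Spec.RenewTime.Time) > threshold {
			fmt.Printf("lease %s is stale (renewTime %v)\n", l.Name, l.Spec.RenewTime)
		}
	}
	fmt.Printf("%d identity leases found\n", len(leases.Items))
	return nil
}
```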

@deads2k (Contributor) commented Oct 6, 2022:

/lgtm
/approve
/hold

holding for the afternoon for @lavalamp to have a look

@k8s-ci-robot added labels: do-not-merge/hold, lgtm (Oct 6, 2022)
@k8s-ci-robot (Contributor):
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andrewsykim, deads2k, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of the required OWNERS files.

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added label: approved (Oct 6, 2022)
will be stored in a special namespace `kube-apiserver-lease`. The Lease creation
and heart beat will be managed by a controller that is started in kube-apiserver's
post startup hook. A separate controller in kube-controller-manager will be responsible
to garbage collect expired Leases.
Member:
I'm not sure I agree with putting that in controller manager, apiserver gets updated first and the new version of apiserver needs to get to healthy without assistance from controller-manager.

Member Author (andrewsykim):
The controller in kube-controller-manager is only for garbage collecting stale leases (https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/storageversiongc/gc_controller.go#L17), so I don't think it should impact health of newly started apiservers, but I have not fully tested this myself.

Member:
If it's not running at all, leases will build up over time and be confusing at best and affect the performance of consumers at worst, right? It feels like bad karma but I guess I can't think of anything disastrous...

refresh period, to tolerate clock skew and/or accidental refresh failure. The
default resync period is 1h. By default, assuming negligible clock skew, a Lease
will be deleted if the kube-apiserver fails to refresh its Lease for one to two
- hours. The GC controller will run in kube-controller-manager, to leverage leader
+ hours. The `storageversiongc` controller will run in kube-controller-manager, to leverage leader
Member:
I think this needs to go in apiserver at the very least for the first release in which we turn this on by default. I don't like having the cleanup routine in a different place than the writing routine in general, either.

Member Author (@andrewsykim), Oct 6, 2022:
I think the main reason for running it in kube-controller-manager is to leverage leader election there. If we move this to kube-apiserver, wouldn't we need to wire in leader election there? (maybe that's fine, but seems like a valid reason to avoid it in the first place)

Member:
Since this controller can work purely off of external observed state, it seems like a good candidate to run in KCM?

Member:
It's fine if every apiserver runs the cleanup loop, for things that take an hour to expire it shouldn't hurt to race to delete them.

... I guess I don't care THAT much. Last argument: people forking & reusing apiserver (which I strongly discourage) but not controller manager will get a surprise

Contributor:
> It's fine if every apiserver runs the cleanup loop, for things that take an hour to expire it shouldn't hurt to race to delete them.

I suppose this is true. My natural inclination is still kube-controller-manager, but I probably won't fight too hard since active-active for this purpose is easy to reason about.
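
The GC logic being debated is small either way. A sketch of the core check, simplified from the storageversiongc controller's described behavior rather than copied from its code; `gcExpiredLeases` and the list-then-delete shape are assumptions.

```go
package identitylease

import (
	"context"
	"time"

	coordinationv1 "k8s.io/api/coordination/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// gcExpiredLeases deletes identity Leases whose renewTime is more than
// leaseDurationSeconds in the past. Because expiry is on the order of an
// hour, several replicas racing to delete the same Lease is harmless:
// the losers just get a NotFound error.
func gcExpiredLeases(ctx context.Context, c kubernetes.Interface, ns string) error {
	leases, err := c.CoordinationV1().Leases(ns).List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	for i := range leases.Items {
		if isExpired(&leases.Items[i]) {
			_ = c.CoordinationV1().Leases(ns).Delete(ctx, leases.Items[i].Name, metav1.DeleteOptions{})
		}
	}
	return nil
}

func isExpired(lease *coordinationv1.Lease) bool {
	if lease.Spec.RenewTime == nil || lease.Spec.LeaseDurationSeconds == nil {
		return false
	}
	expiry := lease.Spec.RenewTime.Add(time.Duration(*lease.Spec.LeaseDurationSeconds) * time.Second)
	return time.Now().After(expiry)
}
```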

### Non-Goals

* improving the availability of kube-apiserver

## Proposal

We will use “hostname+PID+random suffix (e.g. 6 base58 digits)” as the ID.
Member:
Are all e.g. cloud providers actually OK with letting the hostname of systems running apiserver be known?

My first idea: just use a UUID. Also add a "nickname" (label on the lease object?), something persisted to disk that is the same across restarts of the "same" apiserver, to help humans understand continuity.

Member Author (andrewsykim):
> Are all e.g. cloud providers actually OK with letting the hostname of systems running apiserver be known?

I think most cloud providers work around this by just overriding the hostname or using an opaque value that doesn't reveal anything about the system. We also use hostnames for kube-controller-manager/kube-scheduler leases
already right?

> My first idea: just use a UUID. Also add a "nickname" (label on the lease object?), something persisted to disk that is the same across restarts of the "same" apiserver, to help humans understand continuity.

Fwiw, seems like we are already using UUID: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/server/config.go#L326, so hostname for apiserver identity specifically doesn't seem like an issue in the current form. I like the label idea but will give it more thought.

Contributor:
> My first idea: just use a UUID. Also add a "nickname" (label on the lease object?), something persisted to disk that is the same across restarts of the "same" apiserver, to help humans understand continuity.

I'm ok with this as long as it's also easy for someone to run as a pod in the cluster. I'm also supportive of hashing whatever name we choose
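
For reference, both ID shapes in this thread are easy to sketch. `ephemeralID` mirrors the kube-apiserver-<uuid> form linked above; `hashedID` is purely illustrative of the "hash whatever name we choose" suggestion, and both function names are made up.

```go
package identitylease

import (
	"crypto/sha256"
	"fmt"

	"k8s.io/apimachinery/pkg/util/uuid"
)

// ephemeralID mirrors the kube-apiserver-<uuid> shape noted above: a fresh
// identity on every process start.
func ephemeralID() string {
	return "kube-apiserver-" + string(uuid.NewUUID())
}

// hashedID illustrates the hashing idea: stable across restarts on the
// same host, without exposing the raw hostname.
func hashedID(hostname string) string {
	sum := sha256.Sum256([]byte(hostname))
	return fmt.Sprintf("kube-apiserver-%x", sum[:6]) // short, stable suffix
}
```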

@k8s-ci-robot removed label: lgtm (Oct 6, 2022)
…e naming, persistence policy and components

Signed-off-by: Andrew Sy Kim <[email protected]>
@lavalamp (Member) commented Oct 6, 2022:

/lgtm

@k8s-ci-robot added label: lgtm (Oct 6, 2022)
@lavalamp (Member) commented Oct 6, 2022:

/hold cancel

@k8s-ci-robot removed label: do-not-merge/hold (Oct 6, 2022)
@k8s-ci-robot merged commit e634c15 into kubernetes:master (Oct 6, 2022)
@k8s-ci-robot added this to the v1.26 milestone (Oct 6, 2022)
Labels:
- approved: Indicates a PR has been approved by an approver from all required OWNERS files.
- cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
- kind/kep: Categorizes KEP tracking issues and PRs modifying the KEP directory.
- lgtm: "Looks good to me", indicates that a PR is ready to be merged.
- sig/api-machinery: Categorizes an issue or PR as relevant to SIG API Machinery.
- size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.