KEP-1965: update with Beta criteria/milestone and PRR questions answered #3589
Conversation
I'm fairly interested in this graduating to Beta too.
@@ -95,7 +120,7 @@ will only delay the storage migration for the same period of time.

## Design Details

- The [kubelet heartbeat](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/0009-node-heartbeat.md)
+ The [kubelet heartbeat](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/589-efficient-node-heartbeats)
L128: This was slightly misleading to me as it was suggesting the main GC controller in kcm.
In fact, there is a separate controller inside kcm that is doing exactly that:
https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/storageversiongc/gc_controller.go
[and then it makes much more sense to use hardcoded namespaces, etc.]
My current understanding is that the GC controller in KCM removes stale Leases, but a post-start hook in kube-apiserver starts the lease heartbeat: https://github.com/kubernetes/kubernetes/blob/725993afce11082b973901d2e92553640771d55d/pkg/controlplane/instance.go#L466-L483
Oh I see what you're saying, let me reword this a little so it's clearer
In the code it's called storageversiongc:
https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/storageversiongc/gc_controller.go#L17
maybe let's make it clear too?
Updated, thanks
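To make the split discussed above concrete, here is a minimal sketch of the GC side only, assuming a plain client-go clientset. The namespace and duration constants mirror the KEP's proposal, and the package and function names are illustrative rather than the actual storageversiongc code:

```go
// Illustrative sketch only, not the actual storageversiongc code: one GC pass
// that deletes identity Leases whose renewTime is older than the lease duration.
package leasegc

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

const (
	leaseNamespace = "kube-apiserver-lease" // namespace proposed by the KEP
	leaseDuration  = time.Hour              // default leaseDurationSeconds in the KEP
)

// GCExpiredLeases deletes every Lease whose last renewTime is older than leaseDuration.
func GCExpiredLeases(ctx context.Context, client kubernetes.Interface) error {
	leases, err := client.CoordinationV1().Leases(leaseNamespace).List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	for _, lease := range leases.Items {
		if lease.Spec.RenewTime == nil {
			continue
		}
		if time.Since(lease.Spec.RenewTime.Time) > leaseDuration {
			if err := client.CoordinationV1().Leases(leaseNamespace).Delete(ctx, lease.Name, metav1.DeleteOptions{}); err != nil {
				return fmt.Errorf("deleting stale lease %s: %w", lease.Name, err)
			}
		}
	}
	return nil
}
```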
- [X] Feature gate (also fill in values in `kep.yaml`)
  - Feature gate name: APIServerIdentity
  - Components depending on the feature gate: kube-apiserver
Should this also be kube-controller-manager?
I guess the lease-gc controller is also gated on this feature?
Thanks, updated
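For reference, the gate itself is defined in `k8s.io/apiserver/pkg/features`, so both components can consult the same feature name. A minimal sketch of the kind of check involved (the package and wrapper function are illustrative, not the actual wiring):

```go
package leasegc

import (
	genericfeatures "k8s.io/apiserver/pkg/features"
	utilfeature "k8s.io/apiserver/pkg/util/feature"
)

// StartIfEnabled is illustrative: run the given controller start function only
// when the APIServerIdentity feature gate is enabled in this component.
func StartIfEnabled(start func()) {
	if utilfeature.DefaultFeatureGate.Enabled(genericfeatures.APIServerIdentity) {
		start()
	}
}
```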
* **Does enabling the feature change any default behavior?**
- A namespace "kube-apiserver-lease" will be used to store kube-apiserver
- identity Leases.
+ A namespace "kube-apiserver-lease" will be used to store kube-apiserver identity Leases.
[This namespace will also be created, and old leases will be GC-ed].
Done
###### Are there any tests for feature enablement/disablement?

Yes, see [apiserver_identity_test.go](https://github.com/kubernetes/kubernetes/blob/24238425492227fdbb55c687fd4e94c8b58c1ee3/test/integration/controlplane/apiserver_identity_test.go).
I would say that these test the feature itself (they both explicitly enable the feature).
I think we don't have tests that change the FG in the middle of the test (which effectively is the enablement/disablement test).
Good point -- I added a note saying we should have this done before Beta
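A minimal sketch of what such a test could exercise, under the assumption that it flips the gate through the component's mutable feature gate; a real integration test would bring up a test apiserver around each phase (and would likely use the shared feature-gate test helper), which is omitted here:

```go
package apiserveridentity

import (
	"testing"

	genericfeatures "k8s.io/apiserver/pkg/features"
	utilfeature "k8s.io/apiserver/pkg/util/feature"
)

// Illustrative sketch only: flip the APIServerIdentity gate within one test,
// which is what an enablement/disablement test would exercise.
func TestIdentityLeaseEnablementDisablement(t *testing.T) {
	setGate := func(enabled bool) {
		if err := utilfeature.DefaultMutableFeatureGate.SetFromMap(map[string]bool{
			string(genericfeatures.APIServerIdentity): enabled,
		}); err != nil {
			t.Fatal(err)
		}
	}

	setGate(true)
	// ... start a test apiserver and verify its identity Lease is created ...

	setGate(false)
	// ... restart it and verify the Lease is no longer refreshed ...
}
```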
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

A rough SLO here is that kube-apiserver updates leases at the same frequency as kubelet node heart beats,
since the same mechanism is being used.
But the configured frequency might be different.
I would rather go towards something like "healthy kube-apiservers have a lease which is not older than 2*frequency XX% of the time".
Done
###### Are there any missing metrics that would be useful to have to improve observability of this feature?

Yes, heart beat latency could be useful.
LastUpdateTime of my own lease object.
But that has a potential cardinality problem, given the lease is changed on every kube-apiserver restart...
We could consider something like "oldest lease", but given we leave old leases for a longer time (1h), that's not particularly useful...
So I'm not entirely sure how we could construct such a metric here...
Yeah, my understanding for this section was "metrics that would be nice but we wouldn't add due to cardinality issues, etc."
Updated so it's clear we're not actually going to add this metric
@@ -0,0 +1,3 @@
kep-number: 1965
beta:
  approver: "@deads2k"
I'm happy to take this too (I already added some comments below).
Does the format take multiple PRR approvers, maybe comma separated?
Just one minor comment - other than that it LGTM.
@@ -0,0 +1,3 @@
kep-number: 1965
beta:
  approver: "@deads2k" # and @wojtek-t
I guess David will be approving from SIG level anyway, so you can also switch to me.
But I don't have any strong opinion (I could probably approve PRR before he wakes up then).
/approve PRR :)
cache. On processing an item, the controller will delete the Lease if the last
`renewTime` was more than `leaseDurationSeconds` ago (default to 1h). The
default `leaseDurationSeconds` is chosen to be way longer than the default
Each kube-apiserver will run a lease controller in a post-start-hook to refresh
Leases have a thing called graceful release in the standard client-go code. For beta, can we leverage this mechanism so that a server in a controlled shutdown can remove itself? Do we not want graceful release for cases where we expect to be restarted and desire our indicated state of the world to persist over the gap?
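For context, the client-go mechanism referred to here is leader election's `ReleaseOnCancel`, which clears the holder on a controlled shutdown. The sketch below shows that generic mechanism only; it is not how the identity lease controller is implemented, and the lock name, namespace, and timings are illustrative:

```go
package main

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

// runWithGracefulRelease shows client-go's "graceful release": with
// ReleaseOnCancel set, the holder clears the Lease when ctx is cancelled,
// i.e. on a controlled shutdown.
func runWithGracefulRelease(ctx context.Context, client kubernetes.Interface, id string) {
	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Namespace: "kube-system", Name: "example-lock"},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: id},
	}

	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true, // release (clear the holder) on controlled shutdown
		LeaseDuration:   15 * time.Second,
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) { /* do work while holding the lock */ },
			OnStoppedLeading: func() { /* stop work; the lock has been released */ },
		},
	})
}
```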
I think I've settled on wanting the apiserver identity to persist over the gap because we'd generally expect consistency of the identity on a host or member over time.
@lavalamp make sense to you too?
> I think I've settled on wanting the apiserver identity to persist over the gap because we'd generally expect consistency of the identity on a host or member over time.

So to clarify, we're saying the holder identity should not change over a restart? i.e. remove the PID from the ID?
Turns out the actual ID being used is just `kube-apiserver-<uuid>` (https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/server/config.go#L326). We'll have to revisit that during implementation.
If we want identity to persist then we'd need to make a bunch of changes like not put the PID in the identity -- we put that in there in case you run two apiservers on the same system.
I don't think I want people to get attached to specific apiservers (don't make them pets) but it is handy to have a more durable name so you can find the thing or notice if one is consistently misbehaving. I'd personally keep this lease ephemeral but consider adding a nickname to help with readability / persistence over time.
Maybe a `kubernetes.io/hostname` label on the Lease might be sufficient?
I see value in having the lease naming mirror what already exists for kube-controller-manager and kube-scheduler for consistency.
Will definitely give this more thought during implementation.
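A hedged sketch of that idea: keep today's ephemeral `kube-apiserver-<uuid>` holder while attaching a `kubernetes.io/hostname` label as the human-readable "nickname". The namespace and label choice are taken from this discussion, not a settled design:

```go
package main

import (
	"os"
	"time"

	coordinationv1 "k8s.io/api/coordination/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/uuid"
	"k8s.io/utils/pointer"
)

// newIdentityLease builds an illustrative kube-apiserver identity Lease:
// the holder keeps the ephemeral kube-apiserver-<uuid> ID, while a
// kubernetes.io/hostname label carries a stable, human-readable name.
func newIdentityLease() *coordinationv1.Lease {
	hostname, _ := os.Hostname()
	id := "kube-apiserver-" + string(uuid.NewUUID())

	return &coordinationv1.Lease{
		ObjectMeta: metav1.ObjectMeta{
			Name:      id,
			Namespace: "kube-apiserver-lease", // namespace proposed by the KEP
			Labels:    map[string]string{"kubernetes.io/hostname": hostname},
		},
		Spec: coordinationv1.LeaseSpec{
			HolderIdentity:       pointer.String(id),
			LeaseDurationSeconds: pointer.Int32(3600),
			RenewTime:            &metav1.MicroTime{Time: time.Now()},
		},
	}
}
```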
Proposed e2e tests:
- an e2e test that validates the existence of the Lease objects per kube-apiserver
- an e2e test that restarts a kube-apiserver and validates that a new Lease is created
not sure how a standard e2e test could do this.
> not sure how a standard e2e test could do this.

Oh, possibly an upgrade test could since at the end of the test you could compare against the starting state.
Thinking of re-using something similar to `RestartControllerManager`, but for kube-apiserver.
I'm willing to do either or both.
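A rough sketch of the first proposed check, assuming the test knows the expected number of kube-apiservers and the KEP's `kube-apiserver-lease` namespace (e2e framework plumbing omitted; the helper name is illustrative):

```go
package e2e

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// verifyOneLeasePerAPIServer is an illustrative helper: it checks that the
// number of identity Leases matches the expected number of kube-apiservers.
func verifyOneLeasePerAPIServer(ctx context.Context, c kubernetes.Interface, expectedAPIServers int) error {
	leases, err := c.CoordinationV1().Leases("kube-apiserver-lease").List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	if got := len(leases.Items); got != expectedAPIServers {
		return fmt.Errorf("expected %d kube-apiserver identity Leases, found %d", expectedAPIServers, got)
	}
	return nil
}
```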
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

A rough SLO here is that healthy kube-apiservers has a lease which is not older than 2 times the frequency of
how would a cluster-admin know whether this has been achieved? Perhaps it's more practical to know that the number of leases in `kube-apiserver-lease` equals the number of kube-apiservers expected 95% of the time.
> how would a cluster-admin know whether this has been achieved?

They could check the Lease's `spec.renewTime`, but I guess there would be no way of knowing if it was under some threshold 95% of the time. Let me add this as another potential SLO.
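For concreteness, a sketch of a point-in-time freshness check against `spec.renewTime`, assuming the KEP's namespace and an illustrative renew period; a real SLO would track how often this holds over time rather than at a single instant:

```go
package e2e

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// checkLeaseFreshness verifies, at one point in time, that every identity
// Lease was renewed within 2x the renew period (mirroring the suggested SLO).
func checkLeaseFreshness(ctx context.Context, c kubernetes.Interface, renewPeriod time.Duration) error {
	leases, err := c.CoordinationV1().Leases("kube-apiserver-lease").List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	for _, l := range leases.Items {
		if l.Spec.RenewTime == nil {
			return fmt.Errorf("lease %s has no renewTime", l.Name)
		}
		if age := time.Since(l.Spec.RenewTime.Time); age > 2*renewPeriod {
			return fmt.Errorf("lease %s is stale: last renewed %v ago", l.Name, age)
		}
	}
	return nil
}
```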
Signed-off-by: Andrew Sy Kim <[email protected]> Co-authored-by: Monis Khan <[email protected]>
Signed-off-by: Andrew Sy Kim <[email protected]>
…unning in kube-controller-manager Signed-off-by: Andrew Sy Kim <[email protected]>
Signed-off-by: Andrew Sy Kim <[email protected]>
/lgtm

holding for the afternoon for @lavalamp to have a look
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andrewsykim, deads2k, wojtek-t

The full list of commands accepted by this bot can be found here. The pull request process is described here.
garbage collect expired Leases.
will be stored in a special namespace `kube-apiserver-lease`. The Lease creation
and heart beat will be managed by a controller that is started in kube-apiserver's
post startup hook. A separate controller in kube-controller-manager will be responsible
I'm not sure I agree with putting that in controller manager; apiserver gets updated first, and the new version of apiserver needs to get to healthy without assistance from controller-manager.
The controller in kube-controller-manager is only for garbage collecting stale leases (https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/storageversiongc/gc_controller.go#L17), so I don't think it should impact the health of newly started apiservers, but I have not fully tested this myself.
If it's not running at all, leases will build up over time and be confusing at best and affect the performance of consumers at worst, right? It feels like bad karma but I guess I can't think of anything disastrous...
refresh period, to tolerate clock skew and/or accidental refresh failure. The
default resync period is 1h. By default, assuming negligible clock skew, a Lease
will be deleted if the kube-apiserver fails to refresh its Lease for one to two
- hours. The GC controller will run in kube-controller-manager, to leverage leader
+ hours. The `storageversiongc` controller will run in kube-controller-manager, to leverage leader
I think this needs to go in apiserver at the very least for the first release in which we turn this on by default. I don't like having the cleanup routine in a different place than the writing routine in general, either.
I think the main reason for running it in kube-controller-manager is to leverage leader election there. If we move this to kube-apiserver, wouldn't we need to wire in leader election there? (maybe that's fine, but seems like a valid reason to avoid it in the first place)
Since this controller can work purely off of externally observed state, it seems like a good candidate to run in KCM?
It's fine if every apiserver runs the cleanup loop; for things that take an hour to expire, it shouldn't hurt to race to delete them.
... I guess I don't care THAT much. Last argument: people forking & reusing apiserver (which I strongly discourage) but not controller manager will get a surprise
> It's fine if every apiserver runs the cleanup loop; for things that take an hour to expire, it shouldn't hurt to race to delete them.

I suppose this is true. My natural inclination is still kube-controller-manager, but I probably won't fight too hard since active-active for this purpose is easy to reason about.
### Non-Goals

* improving the availability of kube-apiserver

## Proposal

We will use “hostname+PID+random suffix (e.g. 6 base58 digits)” as the ID.
Are all cloud providers, for example, actually OK with letting the hostname of systems running apiserver be known?
My first idea: just use a UUID. Also add a "nickname" (label on the lease object?), something persisted to disk that is the same across restarts of the "same" apiserver, to help humans understand continuity.
> Are all cloud providers, for example, actually OK with letting the hostname of systems running apiserver be known?

I think most cloud providers work around this by just overriding the hostname or using an opaque value that doesn't reveal anything about the system. We also use hostnames for kube-controller-manager/kube-scheduler leases already, right?
> My first idea: just use a UUID. Also add a "nickname" (label on the lease object?), something persisted to disk that is the same across restarts of the "same" apiserver, to help humans understand continuity.
Fwiw, seems like we are already using UUID: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/server/config.go#L326, so hostname for apiserver identity specifically doesn't seem like an issue in the current form. I like the label idea but will give it more thought.
> My first idea: just use a UUID. Also add a "nickname" (label on the lease object?), something persisted to disk that is the same across restarts of the "same" apiserver, to help humans understand continuity.

I'm ok with this as long as it's also easy for someone to run as a pod in the cluster. I'm also supportive of hashing whatever name we choose.
…e naming, persistence policy and components Signed-off-by: Andrew Sy Kim <[email protected]>
/lgtm
/hold cancel
Signed-off-by: Andrew Sy Kim [email protected]
Co-authored-by: Monis Khan [email protected]
Update KEP-1965 to include all Beta criteria and answer remaining PRR questions.
#1965