From b64b3bf3ce058ce710b9b41a04162d3b2fa11ea9 Mon Sep 17 00:00:00 2001 From: Maciej Szulik Date: Fri, 14 Jan 2022 17:13:52 +0100 Subject: [PATCH 1/4] TimeZone suppot in CronJob --- keps/prod-readiness/sig-apps/3140.yaml | 3 + .../README.md | 504 ++++++++++++++++++ .../3140-TimeZone-support-in-CronJob/kep.yaml | 44 ++ 3 files changed, 551 insertions(+) create mode 100644 keps/prod-readiness/sig-apps/3140.yaml create mode 100644 keps/sig-apps/3140-TimeZone-support-in-CronJob/README.md create mode 100644 keps/sig-apps/3140-TimeZone-support-in-CronJob/kep.yaml diff --git a/keps/prod-readiness/sig-apps/3140.yaml b/keps/prod-readiness/sig-apps/3140.yaml new file mode 100644 index 00000000000..7d2c3015ef0 --- /dev/null +++ b/keps/prod-readiness/sig-apps/3140.yaml @@ -0,0 +1,3 @@ +kep-number: 3140 +alpha: + approver: deads2k diff --git a/keps/sig-apps/3140-TimeZone-support-in-CronJob/README.md b/keps/sig-apps/3140-TimeZone-support-in-CronJob/README.md new file mode 100644 index 00000000000..7709e32904b --- /dev/null +++ b/keps/sig-apps/3140-TimeZone-support-in-CronJob/README.md @@ -0,0 +1,504 @@ +# KEP-3140: TimeZone support in CronJob + + +- [Release Signoff Checklist](#release-signoff-checklist) +- [Summary](#summary) +- [Motivation](#motivation) + - [Goals](#goals) + - [Non-Goals](#non-goals) +- [Proposal](#proposal) + - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional) + - [Risks and Mitigations](#risks-and-mitigations) +- [Design Details](#design-details) + - [CronJob API](#cronjob-api) + - [CronJob controller](#cronjob-controller) + - [Test Plan](#test-plan) + - [Graduation Criteria](#graduation-criteria) + - [Alpha](#alpha) + - [Beta](#beta) + - [GA](#ga) + - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) + - [Version Skew Strategy](#version-skew-strategy) +- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) + - [Feature Enablement and Rollback](#feature-enablement-and-rollback) + - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) + - [Monitoring Requirements](#monitoring-requirements) + - [Dependencies](#dependencies) + - [Scalability](#scalability) + - [Troubleshooting](#troubleshooting) +- [Implementation History](#implementation-history) +- [Drawbacks](#drawbacks) +- [Alternatives](#alternatives) +- [Infrastructure Needed (Optional)](#infrastructure-needed-optional) + + +## Release Signoff Checklist + +Items marked with (R) are required *prior to targeting to a milestone / release*. + +- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) +- [x] (R) KEP approvers have approved the KEP status as `implementable` +- [x] (R) Design details are appropriately documented +- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) + - [ ] e2e Tests for all Beta API Operations (endpoints) + - [ ] (R) Ensure GA e2e tests for meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) + - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free +- [ ] (R) Graduation criteria is in place + - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) +- [x] (R) Production readiness review completed +- [x] (R) Production readiness review approved +- [x] "Implementation History" section is up-to-date for milestone +- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] +- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes + + + +[kubernetes.io]: https://kubernetes.io/ +[kubernetes/enhancements]: https://git.k8s.io/enhancements +[kubernetes/kubernetes]: https://git.k8s.io/kubernetes +[kubernetes/website]: https://git.k8s.io/website + +## Summary + +CronJob creates Jobs based on the schedule specified by the author, but the Time +Zone used during the creation depens on where kube-controller-manager is running. +This proposal aims to extend CronJob resource with the ability for a user to +define the TimeZone when a Job should be created. + +## Motivation + +Not long after the [introduction of CronJob in kubernetes](https://github.com/kubernetes/kubernetes/pull/11980) +a [request was raised to support setting time zones](https://github.com/kubernetes/kubernetes/issues/47202). +The initial [response from SIG-Apps and SIG-Architecture](https://github.com/kubernetes/kubernetes/issues/47202#issuecomment-360820586) +at the time was that introducing such functionality would require cluster operator +to manually include TimeZone database since golang did not have one. +Starting from [golang 1.15](https://go.dev/doc/go1.15) `time/tzdata` package can +be embedded in the binary thus removing the requirement for external database. +At that time the majority of the focus was towards [moving CronJob to GA](https://github.com/kubernetes/enhancements/issues/19) +so the effort to support TimeZone was again delayed. Now that we have CronJob +fully GA-ed it's time to satisfy the original request. + +### Goals + +- Add the field `.spec.timeZone` which allows specifying a valid TimeZone + +### Non-Goals + + + +## Proposal + +Add the field `.spec.timeZone` to the CronJob resource. The cronjob controller +will take the field into account when scheduling the next Job run. In case the +field is not specified or is empty the controller will maitain the current +behavior which is to rely on the time zone of the kube-controller-manager +process. + +### Notes/Constraints/Caveats (Optional) + +The current mechanism, which will still be the default behavior heavily relies +on the time zone of the kube-controller-manager process which is hard for a +regular user to figure out. Exposing an explicit field for setting a time zone +will allow CronJob authors to simplify the user experience when creating CronJobs. + +### Risks and Mitigations + +- Outdated time zone in the golang might lead to wrong schedule times + +This problem can be mitigated with a fresh build of kube-controller-manager with +updated golang version, but it heavily relies on the go community to keep the +time zone database up-to-date. + +- Malicious user can create multiple CronJobs with different time zone which can +actually trigger Jobs at the exact same time + +Cluster administrators should enforce quota to ensure not that many Jobs and +CronJobs can be created per user. + + +## Design Details + +### CronJob API + +The `CronJobSpec` structure is expanded with new `TimeZone` field which allows +specifying the name of the time zone to be used. Missing or empty value of the +field indicates the current behavior, which relies on the time zone of the +kube-controller-manager process. + +```golang + +type CronJobSpec struct { + + // The schedule in Cron format, see https://en.wikipedia.org/wiki/Cron. + Schedule string + + // Time zone for the above schedule + TimeZone *string + +} +``` + +The value provided in `TimeZone` field will be validated against the embedded +golang timezone database. + +### CronJob controller + +CronJob controller will use non-nil, non-empty value from `TimeZone` field when +parsing the schedule and during scheduling the next run time. Additionally the +time zone will be reflected in the `.status.lastSuccessfulTime` and `.status.lastScheduleTime`. +In all other cases the controller will maintain the current behavior. + +### Test Plan + +Unit and integration tests covering the time zone mechanics of CronJob, including: + +- defaulting +- validation +- creating CronJob +- updating CronJob + +Additionally, all of tests will be performed with feature gate enabled and disabled. + +### Graduation Criteria + +#### Alpha + +- Functionality implemented behind feature gate: + - TimeZone field added to API (kube-apiserver) + - CronJob controller reacting to new field (kube-controller-manager) + +#### Beta + +TBD + +#### GA + +TBD + + + +### Upgrade / Downgrade Strategy + +- Upgrades + +When upgrading from a release without this feature to a release with `TimeZone` +users should not notice any change in behavior. The default value for `TimeZone` +is nil, which is equal to current behavior for backwards compatibility. + +- Downgrades + +When downgrading from a release with this feature, to a release without `TimeZone` +there are a few cases: + 1. If TimeZone feature gate was enabled and user specified a TimeZone, the newly + created Jobs after downgrade will return to previous behavior, as if no TimeZone + was ever specified. + 2. If TimeZone feature gate was enabled and user did not specify TimeZone, there + should be no change in behavior. + 3. If TimeZone feature gate was not enabled there should be no change in behavior. + +### Version Skew Strategy + +This feature has no node runtime implications. + +## Production Readiness Review Questionnaire + +### Feature Enablement and Rollback + +###### How can this feature be enabled / disabled in a live cluster? + + +- [x] Feature gate (also fill in values in `kep.yaml`) + - Feature gate name: CronJobTimeZone + - Components depending on the feature gate: kube-apiserver, kube-controller-manager + +###### Does enabling the feature change any default behavior? + +No, the default behavior is maintained independently of the feature gate. + +###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? + +Yes, the feature can be disabled. The outcome of disabling will depend if the +new field was set by the user: + +1. If there was no value set in `TimeZone` field there will be no change in behavior. +2. If the user set a valid `TimeZone`, the newly created Jobs will be triggered +as if the field was never set. + +###### What happens if we reenable the feature if it was previously rolled back? + +The controller will start reading the new field. + +###### Are there any tests for feature enablement/disablement? + +Yes, both units and integration tests for enablement, disablement and transitions. + +### Rollout, Upgrade and Rollback Planning + + + +###### How can a rollout or rollback fail? Can it impact already running workloads? + + + +###### What specific metrics should inform a rollback? + + + +###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? + + + +###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? + + + +### Monitoring Requirements + + + +###### How can an operator determine if the feature is in use by workloads? + + + +###### How can someone using this feature know that it is working for their instance? + + + +- [ ] Events + - Event Reason: +- [ ] API .status + - Condition name: + - Other field: +- [ ] Other (treat as last resort) + - Details: + +###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? + + + +###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? + + + +- [ ] Metrics + - Metric name: + - [Optional] Aggregation method: + - Components exposing the metric: +- [ ] Other (treat as last resort) + - Details: + +###### Are there any missing metrics that would be useful to have to improve observability of this feature? + + + +### Dependencies + + + +###### Does this feature depend on any specific services running in the cluster? + + + +### Scalability + + + +###### Will enabling / using this feature result in any new API calls? + +No new API calls are expected. + +###### Will enabling / using this feature result in introducing new API types? + +Yes, `.spec.timeZone` will be present in CronJob API. + +###### Will enabling / using this feature result in any new calls to the cloud provider? + +No new calls to cloud provider are expected. + +###### Will enabling / using this feature result in increasing size or count of the existing API objects? + +Yes. + +- API type(s): CronJob +- Estimated increase in size: new field in CronJob spec up to 50 bytes. + +###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? + +No increase it existing SLIs/SLOs is expected. + +###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? + +Additional CPU and memory increase in the kube-controller-manager is negligible +since the current schedule parsing is already covering time zone specification. +We're not using it, yet. + +### Troubleshooting + + + +###### How does this feature react if the API server and/or etcd is unavailable? + +###### What are other known failure modes? + + + +###### What steps should be taken if SLOs are not being met to determine the problem? + +## Implementation History + + + +## Drawbacks + + + +## Alternatives + + + +## Infrastructure Needed (Optional) + + diff --git a/keps/sig-apps/3140-TimeZone-support-in-CronJob/kep.yaml b/keps/sig-apps/3140-TimeZone-support-in-CronJob/kep.yaml new file mode 100644 index 00000000000..61c2cb1a20e --- /dev/null +++ b/keps/sig-apps/3140-TimeZone-support-in-CronJob/kep.yaml @@ -0,0 +1,44 @@ +title: TimeZone support in CronJob +kep-number: 3140 +authors: + - "@soltysh" +owning-sig: sig-apps +participating-sigs: +status: implementable +creation-date: 2022-01-14 +reviewers: + - "@ravig" + - "@iterion" +approvers: + - "@janetkuo" +see-also: + - "/keps/sig-apps/19-Graduate-CronJob-to-Stable" + - "https://github.com/kubernetes/kubernetes/issues/47202" +replaces: + +# The target maturity stage in the current dev cycle for this KEP. +stage: alpha + +# The most recent milestone for which work toward delivery of this KEP has been +# done. This can be the current (upcoming) milestone, if it is being actively +# worked on. +latest-milestone: "v1.24" + +# The milestone at which this feature was, or is targeted to be, at each stage. +milestone: + alpha: "v1.24" + beta: "v1.25" + stable: "v1.27" + +# The following PRR answers are required at alpha release +# List the feature gate name and the components for which it must be enabled +feature-gates: + - name: CronJobTimeZone + components: + - kube-apiserver + - kube-controller-manager +disable-supported: true + +# The following PRR answers are required at beta release +metrics: + - my_feature_metric From 82eb73a3f412899519a5b435520f94a106e8a40e Mon Sep 17 00:00:00 2001 From: Maciej Szulik Date: Mon, 31 Jan 2022 16:51:28 +0100 Subject: [PATCH 2/4] Address comments --- .../README.md | 31 +++++++++++-------- .../3140-TimeZone-support-in-CronJob/kep.yaml | 3 +- 2 files changed, 20 insertions(+), 14 deletions(-) diff --git a/keps/sig-apps/3140-TimeZone-support-in-CronJob/README.md b/keps/sig-apps/3140-TimeZone-support-in-CronJob/README.md index 7709e32904b..bcfbf8b5cf0 100644 --- a/keps/sig-apps/3140-TimeZone-support-in-CronJob/README.md +++ b/keps/sig-apps/3140-TimeZone-support-in-CronJob/README.md @@ -63,7 +63,7 @@ Items marked with (R) are required *prior to targeting to a milestone / release* ## Summary CronJob creates Jobs based on the schedule specified by the author, but the Time -Zone used during the creation depens on where kube-controller-manager is running. +Zone used during the creation depends on where kube-controller-manager is running. This proposal aims to extend CronJob resource with the ability for a user to define the TimeZone when a Job should be created. @@ -76,13 +76,13 @@ at the time was that introducing such functionality would require cluster operat to manually include TimeZone database since golang did not have one. Starting from [golang 1.15](https://go.dev/doc/go1.15) `time/tzdata` package can be embedded in the binary thus removing the requirement for external database. -At that time the majority of the focus was towards [moving CronJob to GA](https://github.com/kubernetes/enhancements/issues/19) +At that time, the majority of the focus was towards [moving CronJob to GA](https://github.com/kubernetes/enhancements/issues/19), so the effort to support TimeZone was again delayed. Now that we have CronJob fully GA-ed it's time to satisfy the original request. ### Goals -- Add the field `.spec.timeZone` which allows specifying a valid TimeZone +- Add the field `.spec.timeZone` which allows specifying a valid TimeZone name ### Non-Goals @@ -95,14 +95,14 @@ and make progress. Add the field `.spec.timeZone` to the CronJob resource. The cronjob controller will take the field into account when scheduling the next Job run. In case the -field is not specified or is empty the controller will maitain the current -behavior which is to rely on the time zone of the kube-controller-manager +field is not specified or is empty, the controller will maintain the current +behavior, which is to rely on the time zone of the kube-controller-manager process. ### Notes/Constraints/Caveats (Optional) -The current mechanism, which will still be the default behavior heavily relies -on the time zone of the kube-controller-manager process which is hard for a +The current mechanism, which will still be the default behavior, heavily relies +on the time zone of the kube-controller-manager process, which is hard for a regular user to figure out. Exposing an explicit field for setting a time zone will allow CronJob authors to simplify the user experience when creating CronJobs. @@ -125,20 +125,22 @@ CronJobs can be created per user. ### CronJob API -The `CronJobSpec` structure is expanded with new `TimeZone` field which allows +The `.spec` for a CronJob is expanded with a new `timeZone` field which allows specifying the name of the time zone to be used. Missing or empty value of the field indicates the current behavior, which relies on the time zone of the kube-controller-manager process. +In the API code, that looks like: + ```golang type CronJobSpec struct { - // The schedule in Cron format, see https://en.wikipedia.org/wiki/Cron. - Schedule string + // The schedule in Cron format, see https://en.wikipedia.org/wiki/Cron. + Schedule string - // Time zone for the above schedule - TimeZone *string + // Time zone for the above schedule + TimeZone *string } ``` @@ -149,7 +151,7 @@ golang timezone database. ### CronJob controller CronJob controller will use non-nil, non-empty value from `TimeZone` field when -parsing the schedule and during scheduling the next run time. Additionally the +parsing the schedule and during scheduling the next run time. Additionally, the time zone will be reflected in the `.status.lastSuccessfulTime` and `.status.lastScheduleTime`. In all other cases the controller will maintain the current behavior. @@ -489,6 +491,9 @@ Why should this KEP _not_ be implemented? ## Alternatives +Another approach was to specify time zone as an offset to UTC, but using the +name instead seems more user friendly. + +An upgrade flow can be vulnerable to the enable, disable, enable if you have +a lease that is acquired by a new kube-controller-manager, then an old +kube-controller-manager, then a new kube-controller-manager. + ###### What specific metrics should inform a rollback? -- [ ] Metrics - - Metric name: - - [Optional] Aggregation method: - - Components exposing the metric: +- [x] Metrics + - Metric name: `cronjob_controller_rate_limiter_use` + - Components exposing the metric: `kube-controller-manager` - [ ] Other (treat as last resort) - Details: From ff618193a929ae42380726e6ef83309f21c8ca7b Mon Sep 17 00:00:00 2001 From: Maciej Szulik Date: Wed, 2 Feb 2022 18:11:25 +0100 Subject: [PATCH 4/4] Add more clarification about tzdata --- .../3140-TimeZone-support-in-CronJob/README.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/keps/sig-apps/3140-TimeZone-support-in-CronJob/README.md b/keps/sig-apps/3140-TimeZone-support-in-CronJob/README.md index a779c163303..bd1713b1686 100644 --- a/keps/sig-apps/3140-TimeZone-support-in-CronJob/README.md +++ b/keps/sig-apps/3140-TimeZone-support-in-CronJob/README.md @@ -126,9 +126,10 @@ CronJobs can be created per user. ### CronJob API The `.spec` for a CronJob is expanded with a new `timeZone` field which allows -specifying the name of the time zone to be used. Missing or empty value of the -field indicates the current behavior, which relies on the time zone of the -kube-controller-manager process. +specifying the name of the time zone to be used, the list of valid time zones +can be found [in tz database](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones). +Missing or empty value of the field indicates the current behavior, which relies +on the time zone of the kube-controller-manager process. In the API code, that looks like: @@ -145,8 +146,9 @@ type CronJobSpec struct { } ``` -The value provided in `TimeZone` field will be validated against the embedded -golang timezone database. +The value provided in `TimeZone` field will be validated against the embedded golang +timezone database, which will result in the `kube-apiserver` and `kube-controller-manager` +binaries growing by roughly extra 500kB. ### CronJob controller