Hi,

Thank you for this controller, great work! Unfortunately, I hit this odd issue recently and was not able to replicate it in a test environment, so I don't know the root cause. I think this is a critical bug because it is an intermittently occurring, unexpected behaviour: when the cronjobber pod is recreated, a job is scheduled immediately for every TZCronJob, regardless of the schedule defined in it.
Platform: AWS EKS 1.14
Scenario
1- cronjobber 0.2.0 was installed with updatetz
2- A TZCronJob is created with schedule "45 14 * * 2" and timezone "Europe/London", with no startingDeadlineSeconds set (see the sketch after this list for how this schedule maps to UTC)
3- A job is scheduled and completed successfully on 12 May at 1:45 pm UTC (2:45 pm in Europe/London, which is on BST in May), as expected
NAME COMPLETIONS DURATION AGE
somejob-1589291100 1/1 7s 2d
4- The cronjobber pod is recreated at "Wed, 13 May 2020 15:13:52 +0000":
cronjobber container logs
{"level":"info","ts":"2020-05-13T15:14:24.116Z","caller":"cronjobber/main.go:75","msg":"Starting Cronjobber version 0.2.0 revision f860bc912c395c58fa72741d8d34e8bf4b1a2c00"}
{"level":"info","ts":"2020-05-13T15:14:24.117Z","caller":"cronjobber/controller.go:106","msg":"Starting TZCronJob Manager"}
updatetz container logs
2020-05-13T15:14:29+0000 Local Time Zone database updated to version 2020a on /tmp/zoneinfo
5- Although its last run was the 12 May 1:45 pm UTC schedule and the next run should not be until the following Tuesday, another job for this TZCronJob is started at "Wed, 13 May 2020 15:14:24 +0000", which matches cronjobber's start time to the second.
NAME COMPLETIONS DURATION AGE
somejob-1589291100 1/1 7s 2d
somejob-1589294700 1/1 17s 23h
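For reference, here is how the expected job's timestamp lines up. This is a minimal stdlib-Go sketch (not cronjobber's actual code) showing that "45 14 * * 2" in Europe/London on Tuesday 12 May 2020 fires at 13:45 UTC, i.e. Unix 1589291100, the suffix of the first job:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// In May, Europe/London is on BST (UTC+1), so a 14:45 local schedule
	// fires at 13:45 UTC.
	loc, err := time.LoadLocation("Europe/London")
	if err != nil {
		panic(err)
	}
	fired := time.Date(2020, time.May, 12, 14, 45, 0, 0, loc)
	fmt.Println(fired.UTC())  // 2020-05-12 13:45:00 +0000 UTC
	fmt.Println(fired.Unix()) // 1589291100, matching somejob-1589291100
}
```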
This happened for all of our TZCronJobs: they were all rescheduled immediately after the cronjobber pod was recreated 😰
Observation
1- updatetz updated the local tz database in the pod for the first time 5 seconds after the cronjobber controller started.
2- The Unix timestamps in the job names are telling. 1589291100 (the expected job) is 12 May 1:45 pm UTC, which is correct.
It gets interesting for 1589294700: that is 12 May 2:45 pm UTC, exactly one hour later. So it looks like cronjobber could not read the timezone data from the mounted volume when it started up, and evaluated the Europe/London schedule in UTC instead. Syncing the existing TZCronJobs against that misinterpreted tzdata made the controller conclude that their last schedules had been missed (although they weren't), and so it immediately started a job for each of them.
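To illustrate the suspected failure mode, here is a hypothetical sketch. The fallback function and its behaviour are my assumptions about what a controller might do, not code taken from cronjobber; but a silent fall-back to UTC would produce exactly the one-hour shift observed above:

```go
package main

import (
	"fmt"
	"time"
)

// loadLocationOrUTC is a hypothetical stand-in for how a controller might
// resolve a TZCronJob's timezone. If the mounted tzdata is not readable
// yet, the schedule silently becomes a UTC schedule.
func loadLocationOrUTC(name string) *time.Location {
	if loc, err := time.LoadLocation(name); err == nil {
		return loc
	}
	return time.UTC // silent fallback: "45 14 * * 2" now means 14:45 UTC
}

func main() {
	// With an empty zoneinfo volume at startup, the last due schedule is
	// evaluated in UTC, one hour ahead of the intended BST time:
	misread := time.Date(2020, time.May, 12, 14, 45, 0, 0, time.UTC)
	fmt.Println(misread.Unix()) // 1589294700, matching somejob-1589294700
}
```

A schedule timestamp that sits in the controller's past but within any effective deadline window would then look like a missed run and be started immediately.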
Potential workarounds/suggestions
Since I couldn't replicate the scenario, the following are only blind guesses:
1- We could set startingDeadlineSeconds, but that would not solve the issue permanently. The issue can still reoccur whenever t' + startingDeadlineSeconds > x, where t' is the misinterpreted timestamp of the TZCronJob's last due schedule and x is the time at which the cronjobber pod is recreated. In the case above, t' = 1589294700 and x ≈ 1589382864 (13 May 3:14 pm UTC), so any deadline longer than about 24.5 hours would still have allowed the spurious run.
2- I haven't written a single line of Go (yet), so this may be gibberish, but I suspect some missing or wrong variable initialisation. For example, if there is a default timestamp that is used when cronjobber cannot read the tzdata, that magic number should be a maximum value rather than a minimum one. In other words, it is better to do nothing (skip the sync) than to do something wrong (reschedule everything).
3- Cronjobber should not start until updatetz has done its job at least once. For example, the docker-entrypoint of updatetz could touch a file on a shared volume, and an init container in the same pod could wait for that file to appear; since init containers must complete before the main containers start, the cronjobber container would not run until the timezone database is in place. (An in-process sketch of the same guard follows this list.)
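For suggestion 3, an in-process variant of the same idea; a minimal sketch in which the function name, the probed zone, and the retry policy are all assumptions, not cronjobber's actual startup code. It blocks the controller until the timezone database is actually loadable, failing closed in the spirit of suggestion 2:

```go
package main

import (
	"fmt"
	"log"
	"time"
)

// waitForTZData blocks until the given zone can be loaded from the
// timezone database, or gives up after timeout. Run before the controller
// loop starts, it fails closed: no readable tzdata, no syncing.
func waitForTZData(zone string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		if _, err := time.LoadLocation(zone); err == nil {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("timezone database still unreadable after %s", timeout)
		}
		time.Sleep(time.Second)
	}
}

func main() {
	// Probe with any zone that updatetz is expected to provide.
	if err := waitForTZData("Europe/London", 2*time.Minute); err != nil {
		log.Fatal(err) // better to crash-loop than to reschedule everything
	}
	// ... start the controller here ...
}
```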