Pull sandbox image periodically #1601

Merged: cartermckinnon merged 1 commit into master from sandbox-image-timer on Jan 31, 2024

Conversation

cartermckinnon
Member

@cartermckinnon cartermckinnon commented Jan 30, 2024

Issue #, if available:

Helps work around #1597; the proper fix will be an updated containerd from Amazon Linux.

Description of changes:

As a hotfix for accidental garbage collections of the sandbox container image, we'll check every minute and re-pull it if necessary.
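For readers who want to see the shape of the check, here is a minimal sketch of the "look for the image, re-pull if it is gone" step. It is not the script from this PR. The function names `list_images`, `pull_image`, and `ensure_sandbox_image` are hypothetical, and the image reference is assumed to be passed in by the caller (the service description suggests the real one is read from containerd's config.toml). Only `ctr --namespace k8s.io image ls` is taken from the logs in this PR; the pull step here omits any registry credentials a private registry would need.

```shell
# Hypothetical wrappers around ctr, kept separate so the check itself
# can be exercised on a machine without containerd.
list_images() { ctr --namespace k8s.io image ls; }
pull_image() { ctr --namespace k8s.io image pull "$1"; }

# Re-pull the given image only if it is absent from the k8s.io namespace.
ensure_sandbox_image() {
  local image="$1"
  # The image reference is the first column of each row in the listing.
  # awk reads the whole listing and exits 0 only if an exact match was seen.
  if list_images | awk -v img="$image" '$1 == img { found = 1 } END { exit !found }'; then
    echo "present: ${image}"
  else
    echo "missing, pulling: ${image}"
    pull_image "$image"
  fi
}
```

A periodic trigger (a systemd timer, as in this PR) would then call something like `ensure_sandbox_image "$SANDBOX_IMAGE"`, where `SANDBOX_IMAGE` is a placeholder for the configured reference.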

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Testing Done

Tested live on a node; the timer has the desired effect:

[ec2-user@ip-172-31-37-28 ~]$ sudo journalctl -u sandbox-image -f
...
Jan 30 20:22:37 ip-172-31-37-28.us-west-2.compute.internal systemd[1]: Starting pull sandbox image defined in containerd config.toml...
Jan 30 20:22:37 ip-172-31-37-28.us-west-2.compute.internal sudo[12946]:     root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/ctr#040--namespace#040k8s.io#040image#040ls
Jan 30 20:22:37 ip-172-31-37-28.us-west-2.compute.internal systemd[1]: Started pull sandbox image defined in containerd config.toml.
Jan 30 20:24:37 ip-172-31-37-28.us-west-2.compute.internal systemd[1]: Starting pull sandbox image defined in containerd config.toml...
Jan 30 20:24:37 ip-172-31-37-28.us-west-2.compute.internal sudo[12961]:     root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/ctr#040--namespace#040k8s.io#040image#040ls
Jan 30 20:24:37 ip-172-31-37-28.us-west-2.compute.internal systemd[1]: Started pull sandbox image defined in containerd config.toml.
Jan 30 20:26:17 ip-172-31-37-28.us-west-2.compute.internal systemd[1]: Starting pull sandbox image defined in containerd config.toml...
Jan 30 20:26:17 ip-172-31-37-28.us-west-2.compute.internal sudo[13007]:     root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/ctr#040--namespace#040k8s.io#040image#040ls
Jan 30 20:26:17 ip-172-31-37-28.us-west-2.compute.internal systemd[1]: Started pull sandbox image defined in containerd config.toml.
Jan 30 20:27:37 ip-172-31-37-28.us-west-2.compute.internal systemd[1]: Starting pull sandbox image defined in containerd config.toml...
Jan 30 20:27:37 ip-172-31-37-28.us-west-2.compute.internal sudo[13034]:     root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/ctr#040--namespace#040k8s.io#040image#040ls
Jan 30 20:27:37 ip-172-31-37-28.us-west-2.compute.internal systemd[1]: Started pull sandbox image defined in containerd config.toml.

@cartermckinnon
Member Author

/ci

Contributor

@cartermckinnon roger that! I've dispatched a workflow. 👍

@cartermckinnon
Member Author

The CI job for 1.29 will fail, but will be fixed by #1602. I'll re-run in a bit

Contributor

@cartermckinnon the workflow that you requested has completed. 🎉

| Kubernetes version | Build | Launch | Test |
|---|---|---|---|
| 1.23 | success ✅ | success ✅ | success ✅ |
| 1.24 | success ✅ | success ✅ | success ✅ |
| 1.25 | success ✅ | success ✅ | success ✅ |
| 1.26 | success ✅ | success ✅ | success ✅ |
| 1.27 | success ✅ | success ✅ | success ✅ |
| 1.28 | success ✅ | success ✅ | success ✅ |
| 1.29 | failure ❌ | skipped ⏭️ | skipped ⏭️ |

@cartermckinnon
Member Author

cartermckinnon commented Jan 30, 2024

/ci

giving 1.29 another go after the CI fix...

Contributor

@cartermckinnon roger that! I've dispatched a workflow. 👍

Contributor

@cartermckinnon the workflow that you requested has completed. 🎉

| Kubernetes version | Build | Launch | Test |
|---|---|---|---|
| 1.23 | success ✅ | success ✅ | success ✅ |
| 1.24 | success ✅ | success ✅ | success ✅ |
| 1.25 | success ✅ | success ✅ | success ✅ |
| 1.26 | success ✅ | success ✅ | success ✅ |
| 1.27 | success ✅ | failure ❌ | skipped ⏭️ |
| 1.28 | success ✅ | success ✅ | success ✅ |
| 1.29 | success ✅ | success ✅ | success ✅ |

@cartermckinnon
Member Author

1.27 hit a resource limit in the CI account, disregard

@Idan-Lazar

Could someone please merge it?

@avisaradir

Is there an expected date and time for the distribution of this solution to #1597?

@spatelwearpact

Please merge this in, we're dealing with a production outage right now because of this issue! None of our pods and jobs are able to deploy into the cluster!

@dims
Member

dims commented Jan 31, 2024

> Please merge this in, we're dealing with a production outage right now because of this issue! None of our pods and jobs are able to deploy into the cluster!

@spatelwearpact Samir, this is NOT the right forum for a production outage discussion, please escalate through your AWS support folks.

Member

@mmerkes mmerkes left a comment

LGTM

@spatelwearpact

@dims your comment is not helpful, the issue we're having is in production and is directly related to this PR. So how about you guys get cracking on this and get it merged instead of telling me to report things to my TAM!

@dims
Member

dims commented Jan 31, 2024

@spatelwearpact thanks for the tip!

@dims
Member

dims commented Jan 31, 2024

@bryantbiggs
Contributor

FYI - the Kubernetes version skew policy does allow you to run 1.28 nodes with a 1.29 control plane if that helps folks get around any immediate issues until a fix is pushed out

@dims
Member

dims commented Jan 31, 2024

cc @henry118

@cartermckinnon cartermckinnon merged commit 824c55e into master Jan 31, 2024
2 checks passed
@cartermckinnon cartermckinnon deleted the sandbox-image-timer branch January 31, 2024 21:45
@RobCannon

About how long does it take for the new AMI to appear in the AWS EKS console now that this fix is merged?

cartermckinnon added a commit that referenced this pull request Feb 1, 2024
@cartermckinnon cartermckinnon added the changelog/exclude Exclude this PR from future changelog entries. label Feb 2, 2024
@dims
Member

dims commented Feb 3, 2024

@RobCannon please see #1597 (comment)

atmosx pushed a commit to gathertown/amazon-eks-ami that referenced this pull request Jun 18, 2024
atmosx pushed a commit to gathertown/amazon-eks-ami that referenced this pull request Jun 18, 2024
Labels
changelog/exclude Exclude this PR from future changelog entries.
8 participants