Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for using ECR as pull-through image cache #16593

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

rsafonseca
Copy link
Contributor

@rsafonseca rsafonseca commented May 30, 2024

This PR introduces a simple way to enable using ECR as a Pull-through image cache, without having to mutate images on the cluster using tools like Kyverno.

Containerd already has support for specifying registry mirrors in kops, but since ECR uses short lived tokens, it's not trivial (or even impossible without adding a few extra hacks on top of it) to use it as a pull-through cache.

This PR also bumps the ecr-credential-provider binary, which before version 1.29.0 specifically tried to parse an ECR repo URL from the image passed, leading to not being possible to enable this feature. This is now resolved in the latest versions.

This PR uses a flag to enable the feature when needed, and adds any server addresses configured in the mirrors to be allowed on the CredentialProviderConfig object that kops configures on the kubelet.

To configure this:

example:

spec:
  containerd:
    useECRCredentialsForMirrors: true
    registryMirrors:
      docker.io:
        - https://<MY-AWS-ACCOUNT>.dkr.ecr.us-east-1.amazonaws.com/v2/<MY cache rule namespace, e.g docker-hub>

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 30, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @rsafonseca. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 30, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign rifelpet for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/api area/nodeup size/M Denotes a PR that changes 30-99 lines, ignoring generated files. area/documentation size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 30, 2024
@hakman
Copy link
Member

hakman commented May 30, 2024

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 30, 2024
@rsafonseca
Copy link
Contributor Author

rsafonseca commented May 30, 2024

Any idea why these e2e tests might be failing @hakman?
It seems odd that they never come up and the log dump references not finding an unrelated role that should belong to another job
e.g.
pull-kops-e2e-cni-amazonvpc:

0530 19:12:47.346328   14816 dumplogs.go:51] /home/prow/go/src/k8s.io/kops/.build/dist/linux/amd64/kops toolbox dump --name e2e-pr16593.pull-kops-e2e-cni-amazonvpc.test-cncf-aws.k8s.io --dir /logs/artifacts --private-key /tmp/kops/e2e-pr16593.pull-kops-e2e-cni-amazonvpc.test-cncf-aws.k8s.io/id_ed25519 --ssh-user ubuntu
W0530 19:12:59.466784   50124 aws.go:2055] could not find role "masters.e2e-pr16593.pull-kops-e2e-cni-cilium-ipv6.test-cn-40qra0". Resource may already have been deleted: operation error IAM: GetInstanceProfile, https response error StatusCode: 404, RequestID: 013035b7-7eba-43a9-98dd-abf842145433, NoSuchEntity: Instance Profile masters.e2e-pr16593.pull-kops-e2e-cni-cilium-ipv6.test-cn-40qra0 cannot be found.
W0530 19:13:02.485316   50124 aws.go:2055] could not find role "nodes.e2e-pr16593.pull-kops-e2e-cni-cilium-ipv6.test-cncf-47o1f8". Resource may already have been deleted: operation error IAM: GetInstanceProfile, https response error StatusCode: 404, RequestID: 4bc7f2c4-9c38-4723-85bd-b493cf0794c8, NoSuchEntity: Instance Profile nodes.e2e-pr16593.pull-kops-e2e-cni-cilium-ipv6.test-cncf-47o1f8 cannot be found.

could something be leaking somewhere? or do you think there's any change here that could be somehow related to this weird behaviour?

EDIT: nvm, found the problem... hidden tabs in string literal.. damn vscode 😂
the above behaviour is still odd though, seems like the logs are leaking across jobs when they fail

@hakman
Copy link
Member

hakman commented May 31, 2024

Please ignore the IPv6 tests

@rsafonseca
Copy link
Contributor Author

All good then, though it looks like pull-kops-e2e-cni-cilium-eni is flaky, passed on retest.
Any thoughts on this PR @hakman ? Does it make sense to you to add the config option under containerd on the CRD? :)

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. do-not-merge/contains-merge-commits and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jun 9, 2024
@k8s-ci-robot k8s-ci-robot added the area/provider/aws Issues or PRs related to aws provider label Jun 10, 2024
@rsafonseca rsafonseca force-pushed the containerd_ecr_mirrors branch 2 times, most recently from 615a93f to a7dbaac Compare June 10, 2024 18:20
@sczelo
Copy link

sczelo commented Jun 21, 2024

Any update about this PR? Do you need any help or test, to be merged faster?
I built the images locally from this repo and tested it on AWS. It's worked for me.

@SzilardS
Copy link

SzilardS commented Jul 2, 2024

@rifelpet @hakman @johngmyers is there any way we can help you guys to get it merged before/into the 1.30 release?

@elliotdobson
Copy link
Contributor

This PR also bumps the ecr-credential-provider binary, which before version 1.29.0 specifically tried to parse an ECR repo URL from the image passed, leading to not being possible to enable this feature. This is now resolved in the latest versions.

@rsafonseca can you provide some more details about this? An issue or PR from the source repo?

@rsafonseca
Copy link
Contributor Author

There is no specific issue regarding this, simply the failure logic was changed some commits ago to not error out when it fails to parse an ECR specific url. This is easy to test with the old vs newer binaries, the version that kops currently has in the assets will drop out and not pass the ecr auth token when the referenced image repo is not ECR, so when containerd tries to use ECR as a mirror it has no creds.

@rsafonseca
Copy link
Contributor Author

Hey folks, can anyone take a look at this? :)

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 7, 2024
@k8s-ci-robot k8s-ci-robot added area/addons size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. area/channels area/kops-controller area/provider/gcp Issues or PRs related to gcp provider area/provider/openstack Issues or PRs related to openstack provider area/provider/scaleway Issues or PRs related to Scaleway provider area/provider/spotinst Issues or PRs related to spotinst provider area/rolling-update and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 24, 2024
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Oct 24, 2024
@rsafonseca
Copy link
Contributor Author

/retest-required

@k8s-ci-robot
Copy link
Contributor

@rsafonseca: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kops-e2e-aws-upgrade-k130-ko130-to-klatest-kolatest-many-addons 8b06aa4 link false /test pull-kops-e2e-aws-upgrade-k130-ko130-to-klatest-kolatest-many-addons
pull-kops-kubernetes-e2e-ubuntu-gce-build 8b06aa4 link false /test pull-kops-kubernetes-e2e-ubuntu-gce-build
pull-kops-e2e-cni-kuberouter 8b06aa4 link false /test pull-kops-e2e-cni-kuberouter
pull-kops-e2e-cni-cilium-eni 9a76a8b link true /test pull-kops-e2e-cni-cilium-eni
pull-kops-e2e-cni-flannel 9a76a8b link true /test pull-kops-e2e-cni-flannel

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/addons area/api area/channels area/documentation area/kops-controller area/nodeup area/provider/aws Issues or PRs related to aws provider area/provider/gcp Issues or PRs related to gcp provider area/provider/openstack Issues or PRs related to openstack provider area/provider/scaleway Issues or PRs related to Scaleway provider area/provider/spotinst Issues or PRs related to spotinst provider area/rolling-update cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants