Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve EKS AMI Lookups #3663

Open
luthermonson opened this issue Aug 11, 2022 · 8 comments
Open

Improve EKS AMI Lookups #3663

luthermonson opened this issue Aug 11, 2022 · 8 comments
Labels
area/provider/eks Issues or PRs related to Amazon EKS provider kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@luthermonson
Copy link
Contributor

luthermonson commented Aug 11, 2022

/kind feature

Describe the solution you'd like
I would like to propose changing the AMI lookups a bit to standardize EKS and create some common terms.

Problem: with these two pull requests (one, two) there was support added for pulling from SSM the latest AMI ID for EKS optimized images. TL;DR on the SSM method is that AWS keeps these public key values up to date when they build new images and it’s a guaranteed place to find the most up to date AMI ID. Problems with this are:

  • Not all flavors of image are available in SSM for example all Ubuntu images don’t have a key
  • Doesn’t allow for wildcards, keys have to be exact
  • This SSM method was made to “eliminate the need for you to manually look up the Amazon EKS optimized AMI IDs”
  • Introduced new AMI “LookupTypes” which don’t really match any CAPA image naming standards.

Solution: The existing AMI search is perfectly adequate to find AMI IDs, this method takes some simple lookup params (template, baseos, kubeversion) and performs an AMI search and then loops over them and finds the most recent one and returns that AMI ID. This method allows for wildcards and more levers to pull in the query params than the EKS keys. So I propose the following:

  • Eliminate the “LookupType” linux flavor concept and move the SSM lookup method to a string where they can pass the exact SSM key they want. This makes code more maintainable (no sprintf templates) and lets the user set their key at pool creation time to exactly what they want. The logic for AMI lookup will be…
    • If AMI.ID set, pull that
    • If SSM Key set, pull that
    • Default to AMI Query with any lookup params passed or their defaults.
  • Rewrite the SSM Keys into DescribeImages input queries.
  • Add ARM64 support, this PR attempts this but I feel the better method than asking the instance type it’s arch is to add an ImageLookupArch which defaults to amd64.
  • Standardize EKS to the CAPA BaseOS terms… change AmazonLinux to amazon-2, AmazonLinuxGPU is amazon-2-gpu.

Pros: Enables ubuntu-18.04 and ubuntu-20.04 support for EKS. Enables ARM64 support across the entire project as currently x86_64 is hardcoded in all DescribeImages queries. Standardizes the support BaseOS and gives them names and constants to validate against.

Cons: Deprecates some APIs that may be in use, however… the one I'm looking at is AMIReference.EKSOptimizedLookupType and it’s not even documented and in experimental EKS features.

Sample list of BaseOS across Self managed/EKS from my current WIP

Amazon2         = "amazon-2"          // capa, eks
Amazon2GPU      = "amazon-2-gpu"      // eks
Ubuntu1804      = "ubuntu-18.04"      // capa, eks
Ubuntu2004      = "ubuntu-20.04"      // capa, eks
CentOS7         = "centos-7"          // capa
FlatcarStable   = "flatcar-stable"    // capa
BottleRocket    = "bottlerocket"      // eks
Windows2019Core = "windows-2019-core" // eks
Windows2019Full = "windows-2019-full" // eks
Windows2022Core = "windows-2022-core" // eks (coming soon)
Windows2022Full = "windows-2022-full" // eks (coming soon)

and the Name templates for EKS from my WIP look like this.

// go template for AMI Name Lookup
var eksAMINameFormats = map[string]string{
	Amazon2:         "amazon-eks-node-{{.K8sVersion}}-v*", //amazon-2
	Amazon2GPU:      "",
	Ubuntu1804:      "ubuntu-eks/k8s_{{.K8sVersion}}/images/*18.04*",
	Ubuntu2004:      "ubuntu-eks/k8s_{{.K8sVersion}}/images/*20.04*",
	BottleRocket:    "bottlerocket-aws-k8s-{{.K8sVersion}}*  ",
	Windows2019Core: "Windows_Server-2019-English-Core-EKS_Optimized-{{.K8sVersion}}-*", // coming soon
	Windows2019Full: "Windows_Server-2019-English-Full-EKS_Optimized-{{.K8sVersion}}-*", // coming soon
	Windows2022Core: "Windows_Server-2022-English-Core-EKS_Optimized-{{.K8sVersion}}-*", // currently unavailable
	Windows2022Full: "Windows_Server-2022-English-Full-EKS_Optimized-{{.K8sVersion}}-*", // currently unavailable
	CentOS7:         "",                                                                 // unavailable, leave empty so ok check fails and return default, add pattern if they become available
	FlatcarStable:   "",                                                                 // unavailable, leave empty so ok check fails and return default, add pattern if they become available
}

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api-provider-aws version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):
@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 11, 2022
@luthermonson
Copy link
Contributor Author

Related Issues

@Skarlso
Copy link
Contributor

Skarlso commented Aug 16, 2022

The proposal looks reasonable to me. But I'm not great in the AMI code lookup part atm. I will take a look at the code review accordingly.

That said, I like this approach. It adds more flexibility to how we would use AMI lookup and add support for other types much easier if I understand it correctly.

@richardcase
Copy link
Member

/triage accepted
/priority important-soon
/milestone v1.6.0
/area provider/eks

@k8s-ci-robot k8s-ci-robot added this to the v1.6.0 milestone Sep 2, 2022
@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. area/provider/eks Issues or PRs related to Amazon EKS provider and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority labels Sep 2, 2022
@Skarlso
Copy link
Contributor

Skarlso commented Sep 8, 2022

Heh, turns out there is an ancient issue about something like this already here #1869

@richardcase richardcase removed this from the v1.6.0 milestone Nov 10, 2022
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-triage-robot
Copy link

This issue is labeled with priority/important-soon but has not been updated in over 90 days, and should be re-triaged.
Important-soon issues must be staffed and worked on either currently, or very soon, ideally in time for the next release.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Deprioritize it with /priority important-longterm or /priority backlog
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Feb 8, 2023
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If CAPA/CAPI contributors determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/provider/eks Issues or PRs related to Amazon EKS provider kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Projects
None yet
Development

No branches or pull requests

5 participants