KEP-3000: Image Promotion and Distribution Policy #3079
Changes from 7 commits
1919dc5
467d84a
bfdee87
300cc1d
b5ffc45
6e68a6c
35d575b
6a1b9bf
f960d6f
0e5fafb
f074245
4a77099
2abfc73
6bfecc8
7030358
f15fdd6
4b14ce8
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,192 @@ | ||||||
# KEP/MST-3000: Image Promotion and Distribution Policy | ||||||
|
||||||
<!-- toc --> | ||||||
|
||||||
- [Summary](#summary) | ||||||
- [Background (from wiki)](#background-from-wiki) | ||||||
- [Motivation](#motivation) | ||||||
- [Why a new domain?](#why-a-new-domain) | ||||||
- [How can we help?](#how-can-we-help) | ||||||
- [Goals](#goals) | ||||||
- [Non-Goals](#non-goals) | ||||||
- [What is not in scope](#what-is-not-in-scope) | ||||||
- [What are good goals to shoot for](#what-are-good-goals-to-shoot-for) | ||||||
- [Proposal](#proposal) | ||||||
- [What exactly are you doing?](#what-exactly-are-you-doing) | ||||||
- [User Stories](#user-stories) | ||||||
- [SIG Release - Image Promotion](#sig-release---image-promotion) | ||||||
- [Cloud Customer - Installing K8s via kubeadm](#cloud-customer---installing-k8s-via-kubeadm) | ||||||
- [Notes/Constraints/Caveats](#notesconstraintscaveats) | ||||||
- [Risks and Mitigations](#risks-and-mitigations) | ||||||
- [Design Details](#design-details) | ||||||
- [Release Promotion](#release-promotion) | ||||||
- [Policy](#policy) | ||||||
- [Process](#process) | ||||||
- [Artifact Distribution](#artifact-distribution) | ||||||
- [Policy](#policy-1) | ||||||
- [Process](#process-1) | ||||||
- [Alternatives / Background](#alternatives--background) | ||||||
- [How much is this going to save us?](#how-much-is-this-going-to-save-us) | ||||||
- [Infrastructure Needed](#infrastructure-needed) | ||||||
- [Hack this doc](#hack-this-doc) | ||||||
<!-- /toc --> | ||||||
|
||||||
## Summary | ||||||
|
||||||
The container images and release binaries produced by our community need a clear path to be hosted by multiple service/cloud providers. | ||||||
|
||||||
The global community should be routed to the appropriate mirror for their country or cloud provider to ensure cost-effective worldwide access.

This KEP covers the policy and distribution mechanisms we will put in place to create a globally distributed, multi-cloud, multi-country solution.
|
||||||
## Background (from wiki) | ||||||
|
||||||
## Motivation | ||||||
|
||||||
For a few years now, we have been using k8s.gcr.io in all our repositories as the default registry for downloading images.

Distributing Kubernetes is expensive: the cost is nearing $150k USD/month (mostly egress) and is covered by donations.

Additionally, some of our community members are unable to access the official release artifacts because country-level firewalls do not let them connect to Google services.

Ideally, we can dramatically reduce cost and allow everyone in the world to download the artifacts released by our community.

We currently use the [image promoter process](https://github.com/kubernetes/enhancements/tree/master/keps/sig-release/1734-k8s-image-promoter) to promote images to the official Kubernetes container registry using the infrastructure (GCR staging repos etc.) provided by [sig-k8s-infra](https://github.com/kubernetes/k8s.io/tree/main/k8s.gcr.io).
|
||||||
## Why a new domain? | ||||||
|
||||||
So far, the whole Kubernetes project has used GCP as its default infrastructure provider for things like GCS, GCR, and GKE-based prow clusters. Google has graciously sponsored a lot of our infrastructure costs as well. However, for about a year our costs have been skyrocketing because much of the community's usage of this infrastructure comes from other cloud providers such as AWS and Azure. So, in conjunction with CNCF staff, we are putting together a plan to host copies of images and binaries nearer to where they are used rather than incur cross-cloud costs.

One part of this plan is to set up a proxy OCI service that can identify where traffic is coming from and redirect it to the nearest image layer/repository. This is why we are setting up a new service, using what we call an [oci-proxy](https://github.com/kubernetes-sigs/oci-proxy), for everyone to use. When this proxy identifies traffic coming from, for example, a certain AWS region, it will issue an HTTP redirect to a source in that AWS region. If we get traffic from GKE/GCP, or we don't know where the traffic is coming from, it will still redirect to the current infrastructure (k8s.gcr.io).
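
The redirect decision described above could be sketched roughly as follows. This is only an illustration in Python, not the oci-proxy implementation: the CIDR blocks are documentation-only ranges standing in for real cloud IP ranges, and the mirror URLs, `region_for`, and `redirect_target` are hypothetical names invented for this example.

```python
import ipaddress
from typing import Optional

# Hypothetical data: documentation-only CIDR blocks stand in for published
# cloud IP ranges, and the mirror URLs are made up for illustration.
REGION_RANGES = {
    "aws/us-east-1": [ipaddress.ip_network("192.0.2.0/24")],
    "aws/eu-west-1": [ipaddress.ip_network("198.51.100.0/24")],
}
REGION_MIRRORS = {
    "aws/us-east-1": "https://mirror-us-east-1.example.com",
    "aws/eu-west-1": "https://mirror-eu-west-1.example.com",
}
# Unknown or GCP traffic keeps going to the current infrastructure.
DEFAULT_UPSTREAM = "https://k8s.gcr.io"


def region_for(client_ip: str) -> Optional[str]:
    """Return the region whose ranges contain client_ip, or None if unknown."""
    addr = ipaddress.ip_address(client_ip)
    for region, networks in REGION_RANGES.items():
        if any(addr in net for net in networks):
            return region
    return None


def redirect_target(client_ip: str, path: str) -> str:
    """Build the URL that a request from client_ip would be redirected to."""
    region = region_for(client_ip)
    base = REGION_MIRRORS.get(region, DEFAULT_UPSTREAM)
    return base + path
```

Keeping the default as k8s.gcr.io means that an incomplete or stale range table degrades to today's behaviour rather than to a failed pull.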
|
||||||
## How can we help? | ||||||
|
||||||
When Kubernetes master opens up for v1.25 development, we need to update all default URLs in our code and test harness to the new registry URL. As a team, sig-k8s-infra is signing up to ensure that the oci-proxy based registry.k8s.io will be as robust and available as the current setup. As a backup, we will continue to run the current k8s.gcr.io as well, so do not worry about that going away. Turning on traffic to the new URL will help us monitor and fix things if/when they break, and we will be able to tune traffic and lower our costs of operation.
|
||||||
### Goals | ||||||
|
||||||
A policy and procedure for use by SIG Release to promote container images and release binaries to multiple registries and mirrors. | ||||||
|
||||||
A solution to allow redirection to appropriate mirrors to lower cost and allow access from any cloud or country globally. | ||||||
|
||||||
### Non-Goals | ||||||
|
||||||
Anything related to creation of artifacts, bom, digital signatures, staging buckets. | ||||||
Review comment: Starting 1.24, the releases will be signed. Digital signatures are now in scope.
Suggested change
Review comment: Updated in f960d6f
||||||
|
||||||
### What is not in scope | ||||||
|
||||||
- Currently we focus on AWS only. We are getting a lot of help from AWS in terms of technical details as well as targeted infrastructure costs for standing up and running this infrastructure | ||||||
|
||||||
### What are good goals to shoot for | ||||||
|
||||||
- In terms of cost reduction, monitor GCP infrastructure and get to the point where we fully avoid serving large binary image layers from GCR/GCS | ||||||
- We can add other AWS regions and clouds as needed in a well-known, documented way
- Seamless transition for the community from the old k8s.gcr.io to registry.k8s.io, with the same rock-solid stability we now have with k8s.gcr.io
|
||||||
## Proposal | ||||||
Review comment: We should also document that we're explicitly OK with a model where the management of the mirror is opaque to us as long as the other criteria are met.
||||||
|
||||||
There are two intertwined concepts that are part of this proposal. | ||||||
|
||||||
First, the policy and procedures to promote/upload our artifacts to multiple providers. Our existing processes upload only to GCS buckets. Ideally we extend the existing software/promotion process to push directly to multiple providers. Alternatively we use a second process to synchronize artifacts from our existing production buckets to similar constructs at other providers. | ||||||
Review comment: I don't think we have to go too much into technical details here, but I'd like to emphasize that modifying existing tools should be preferred over adding another step to the process. Meaning we can put the whole "Alternatively…" down to the alternatives section.
||||||
|
||||||
Additionally we require a registry and artifact url-redirection solution to the local cloud provider or country. | ||||||
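
To make the first concept more concrete, pushing directly to multiple providers can be thought of as fanning each promoted image out to a list of destinations. The sketch below is a simplified illustration in Python, not the promo-tools code: the manifest shape (image name to digest to tags), the `Copy` type, `plan_promotion`, and all registry names are assumptions made for this example.

```python
from typing import Dict, List, NamedTuple


class Copy(NamedTuple):
    """One image copy from the staging registry to a destination."""
    source: str
    destination: str


def plan_promotion(
    staging: str,
    images: Dict[str, Dict[str, List[str]]],
    destinations: List[str],
) -> List[Copy]:
    """List every copy needed to promote `images` to all `destinations`.

    `images` maps image name -> digest -> tags (a hypothetical, simplified
    manifest shape). Sources are pinned by digest so that every destination
    receives exactly the same content.
    """
    plan: List[Copy] = []
    for dest in destinations:
        for name in sorted(images):
            for digest in sorted(images[name]):
                for tag in images[name][digest]:
                    plan.append(
                        Copy(
                            source=f"{staging}/{name}@{digest}",
                            destination=f"{dest}/{name}:{tag}",
                        )
                    )
    return plan
```

Under this model, adding a provider is a one-line change to the destination list rather than a new synchronization step.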
|
||||||
## What exactly are you doing? | ||||||
|
||||||
- We are setting up an AWS account with an IAM role and S3 buckets in AWS regions where we see a large percentage of source image pull traffic
- We will iterate on a sandbox URL (registry-sandbox.k8s.io) for our experiments and ONLY promote things to registry.k8s.io when we have complete confidence
- Both registry and registry-sandbox are serving traffic using oci-proxy on Google Cloud Run
- oci-proxy will be updated to identify incoming traffic from AWS regions based on IP ranges so we can route traffic to S3 buckets in that region. If a specific AWS region does not currently host S3 buckets, we will redirect to the nearest region that does (a tradeoff between storage and network costs)
- We will bulk sync existing image layers (from GCS/GCR) to these S3 buckets as a starting point
- We will update image-promoter to push to these S3 buckets in addition to the current setup
- We will set up monitoring/reporting to check on new costs we incur on the AWS infrastructure, and update what we do in GCP infrastructure to include the new components
- We will have a plan in place for how we could add additional AWS regions in the future
- We will have CI jobs that run against registry-sandbox.k8s.io as well, to monitor stability before we promote code to registry
- We will automate the deployment, monitoring, and testing of code landing in the oci-proxy repository
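
The nearest-region fallback mentioned in the list above could look roughly like this. It is a minimal sketch in Python under stated assumptions: the set of regions with buckets, the preference lists, and `serving_region` are hypothetical, and a real deployment would derive "nearest" from measured latency or cost rather than a hand-written table.

```python
from typing import Dict, List, Optional, Set

# Hypothetical inputs: regions that currently host a bucket, and for other
# regions a preference list ordered from nearest to farthest.
REGIONS_WITH_BUCKETS: Set[str] = {"us-east-1", "eu-west-1"}
NEAREST_REGIONS: Dict[str, List[str]] = {
    "us-east-2": ["us-east-1", "eu-west-1"],
    "eu-central-1": ["eu-west-1", "us-east-1"],
}


def serving_region(client_region: str) -> Optional[str]:
    """Return the region whose bucket should serve client_region.

    Prefers the client's own region, then the nearest region that hosts a
    bucket. Returns None when nothing suitable is known, in which case the
    caller is assumed to fall back to the existing k8s.gcr.io endpoint.
    """
    if client_region in REGIONS_WITH_BUCKETS:
        return client_region
    for candidate in NEAREST_REGIONS.get(client_region, []):
        if candidate in REGIONS_WITH_BUCKETS:
            return candidate
    return None
```

Adding a new AWS region then amounts to creating the bucket, syncing it, and adding the region to the set, which fits the goal of a documented way to grow the footprint.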
|
||||||
### User Stories | ||||||
|
||||||
#### SIG Release - Image Promotion | ||||||
|
||||||
|
||||||
```feature | ||||||
As a SIG Release volunteer | ||||||
I want to promote our binaries/images to multiple clouds | ||||||
|
||||||
|
||||||
Given a promotion / manifest | ||||||
When my PR is merged | ||||||
Then the promotion process occurs | ||||||
``` | ||||||
|
||||||
#### Cloud Customer - Installing K8s via kubeadm | ||||||
Review comment: I'd add a use case for just pulling an official container image from region X.
||||||
|
||||||
```feature | ||||||
As a CLOUD end-user | ||||||
Review comment: We should not stick the user story to a cloud environment. We don't want to break the existing way of consuming those container images produced.
Review comment: Since most of the spend is from cloud users only, this should be fine to focus on for this KEP.
Review comment: Pulled out User Stories for this merge
||||||
I want to install Kubernetes | ||||||
|
||||||
Given some compute resources at CLOUD | ||||||
When I use kubeadm to deploy Kubernetes | ||||||
Then I will be redirected to a local CLOUD registry | ||||||
Review comment: we should be more clear about the meaning of "local":
Suggested change
Review comment: resolved in f960d6f
||||||
``` | ||||||
|
||||||
### Notes/Constraints/Caveats | ||||||
|
||||||
The primary purpose of the KEP is getting consensus on the agreed policy and procedure to unblock our community and move forward together. | ||||||
|
||||||
There has been a lot of activity around the technology and tooling for both goals, but we need shared agreement on policy and procedure first. | ||||||
|
||||||
### Risks and Mitigations | ||||||
|
||||||
This is the primary pipeline for delivering Kubernetes worldwide. Ensuring the appropriate SLAs and support as well as artifact integrity is crucial. | ||||||
|
||||||
## Design Details | ||||||
|
||||||
### Release Promotion | ||||||
|
||||||
#### Policy | ||||||
|
||||||
(more details needed, #sig-release-eng?) | ||||||
|
||||||
#### Process | ||||||
|
||||||
Currently the promotion process is primarily driven by the CIP/[promo-tool#kpromo](https://github.com/kubernetes-sigs/promo-tools#kpromo)? | ||||||
|
||||||
### Artifact Distribution | ||||||
|
||||||
#### Policy | ||||||
Review comment: We need to detail how to on-board a mirror. E.g.:
Then we can add the mirror and have the front-end server start redirecting traffic. We will want to periodically health-check each mirror (e.g. pull a random blob, measure latency). If the health check fails, remove the mirror until it passes N times. We need a site or something indicating which mirrors are healthy, maybe stats. We will want to log all redirects and set up a PII-anonymizing process so we can publish some aggregated information about how much traffic is going to each mirror, top images globally, etc.
Review comment: Going off of what @ameukam says here, we are working closely with the providers who consume the most to bring up infra that we manage.
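
The health-check rule proposed in the review comment above (remove a mirror on failure, re-add it after N consecutive passes) can be sketched as a small state tracker. This is an illustration in Python only: `MirrorHealth`, `record`, and `passes_required` are hypothetical names, and the probe itself (pulling a random blob, measuring latency) is left to the caller.

```python
class MirrorHealth:
    """Tracks whether one mirror should currently receive redirects."""

    def __init__(self, passes_required: int) -> None:
        # Number of consecutive passing checks needed to re-add a mirror.
        self.passes_required = passes_required
        self.in_rotation = True
        self._consecutive_passes = 0

    def record(self, check_passed: bool) -> bool:
        """Record one health-check result and return the rotation state."""
        if not check_passed:
            # Any failure removes the mirror and resets the pass counter.
            self.in_rotation = False
            self._consecutive_passes = 0
        elif not self.in_rotation:
            # Passing checks only count while the mirror is out of rotation.
            self._consecutive_passes += 1
            if self._consecutive_passes >= self.passes_required:
                self.in_rotation = True
        return self.in_rotation
```

Requiring several consecutive passes before re-adding a mirror avoids flapping when a mirror is only intermittently reachable.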
||||||
|
||||||
#### Process | ||||||
Review comment: We will need to detail how we turn up the new DNS name and redirector and how we plan to convert users of old GCR name into the new name.
Review comment: Members of sig-k8s-infra have made PRs against projects like kops and Kubernetes to change the defaults.
||||||
|
||||||
Artifacts will be written to S3 style storage or CDNs provided by cloud providers through a tool in the promo-tools suite. | ||||||
|
||||||
## Alternatives / Background | ||||||
|
||||||
- Original KEP | ||||||
- https://github.com/kubernetes/enhancements/tree/master/keps/sig-release/1734-k8s-image-promoter | ||||||
- Oras | ||||||
- https://github.com/oras-project/oras | ||||||
- KubeCon Talk | ||||||
- https://www.youtube.com/watch?v=F2IFjz7sr9Q | ||||||
- Apache has a widespread mirror network | ||||||
- @dims has experience here
- http://ws.apache.org/mirrors.cgi | ||||||
- https://infra.apache.org/mirrors.html | ||||||
- [Umbrella issue: k8s.gcr.io => registry.k8s.io solution k/k8s.io#1834](https://github.com/kubernetes/k8s.io/issues/1834)
- [ii/registry.k8s.io Implementation proposals](https://github.com/ii/registry.k8s.io#registryk8sio)
- [ii.nz/blog :: Building a data pipline for displaying Kubernetes public artifact traffic](https://ii.nz/post/building-a-data-pipline-for-displaying-kubernetes-public-artifact-traffic/)
|
||||||
### How much is this going to save us? | ||||||
|
||||||
Cost of K8s Artifact hosting - Data Studio Graphs | ||||||
|
||||||
![](https://i.imgur.com/LAn4UIE.png) | ||||||
|
||||||
## Infrastructure Needed | ||||||
|
||||||
It would be good to request donations from some larger providers, including one in China, via cncf.io/credits.
|
||||||
## Hack this doc | ||||||
|
||||||
- [![hackmd-github-sync-badge](https://hackmd.io/KjHufZssQR654ShkZFUzyA/badge)](https://hackmd.io/KjHufZssQR654ShkZFUzyA) | ||||||
- [kubernetes/enhancements!3079](https://github.com/kubernetes/enhancements/pull/3079) |
Original file line number | Diff line number | Diff line change | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,24 @@ | ||||||||||||||||||
title: Artifact Distribution Policy | ||||||||||||||||||
kep-number: 3000 | ||||||||||||||||||
authors: | ||||||||||||||||||
- "@hh" | ||||||||||||||||||
- "@BobyMCbobs" | ||||||||||||||||||
owning-sig: sig-release | ||||||||||||||||||
participating-sigs: | ||||||||||||||||||
- sig-k8s-infra | ||||||||||||||||||
status: provisional | ||||||||||||||||||
creation-date: 2021-11-26 | ||||||||||||||||||
reviewers: | ||||||||||||||||||
- "@cpanato" | ||||||||||||||||||
- "@puerco" | ||||||||||||||||||
- "@spiffxp" | ||||||||||||||||||
- "@thockin" | ||||||||||||||||||
approvers: | ||||||||||||||||||
- "@ameukam" | ||||||||||||||||||
- "@justaugustus" | ||||||||||||||||||
Suggested change
Review comment: updated in 6bfecc8
||||||||||||||||||
stage: alpha | ||||||||||||||||||
latest-milestone: "v1.24" | ||||||||||||||||||
milestone: | ||||||||||||||||||
alpha: "v1.24" | ||||||||||||||||||
beta: "v1.25" | ||||||||||||||||||
stable: "v1.26" | ||||||||||||||||||
Review comment: We should update this to reflect the current progress we make in kubernetes-sigs/promo-tools#533
Suggested change
Review comment: Updated in 0e5fafb
Review comment: I think we should break this into 2 major phases:
We may even want to break it to 2 KEPs so we can "finish" one.
Review comment: Good thought! This KEP just focuses on container images now.