
[cabpk] Support for machine images with pre-pulled container images from non-standard repos #1767

Closed
akutz opened this issue Aug 15, 2019 · 26 comments
Labels
  • area/bootstrap: Issues or PRs related to bootstrap providers
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
  • priority/backlog: Higher priority than priority/awaiting-more-evidence.

Comments

akutz (Contributor) commented Aug 15, 2019

/kind feature

Note: This issue could also have been filed against the kubernetes-sigs/image-builder repository.

Describe the solution you'd like
Machine images built from the CAPI image builder contain pre-pulled container images for the following components:

  • coredns
  • etcd
  • kube-proxy
  • kube-apiserver
  • kube-controller-manager
  • kube-scheduler

Thanks to kubernetes-sigs/image-builder#16 and kubernetes-sigs/image-builder#19 by @figo, machine images may be built using custom package/container repositories as the source for the above container images.

However, this presents a problem as initially reported by @imikushin -- kubeadm's ClusterConfiguration must be used to specify a custom image repository:

// ImageRepository sets the container registry to pull images from.
// If empty, `k8s.gcr.io` will be used by default; in case of kubernetes version is a CI build (kubernetes version starts with `ci/` or `ci-cross/`)
// `gcr.io/kubernetes-ci-images` will be used as a default for control plane components and for kube-proxy, while `k8s.gcr.io`
// will be used for all the other images.
ImageRepository string `json:"imageRepository,omitempty"`
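
For illustration, a minimal kubeadm ClusterConfiguration that sets this field might look like the following sketch (the API version and registry name are illustrative assumptions):

apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.15.3
imageRepository: registry.example.com/k8s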

There is a way around this -- manually loading the custom images into the node's local image store under the k8s.gcr.io namespace -- however, this has longer-term implications, as it hides the true origin of the custom images.

As @imikushin pointed out, his team uses a custom machine image that merges an incoming kubeadm configuration with a local version that overrides the imageRepository setting to ensure the correct, custom images are used. However, custom machine images are not a desirable solution.

On a call with Ivan and @detiber, Jason suggested the possibility of creating a ConfigMap on the management cluster that could contain overrides for the kubeadm bootstrap provider.

What we all agreed is this -- it should not be the responsibility of the end-user who is deploying a cluster to know that the machine images in use may be pre-loaded with custom container images for the Kubernetes components. Nor should it be the end-user's job to create the kubeadm configuration that specifies the use of those container images.

This issue tracks the discussion around how to support deploying machine images with pre-pulled container images from non-standard image repositories without forcing an end-user to specify the non-standard repository as part of the cluster/machine configuration.

cc @imikushin @detiber @shep @zjs @chuckha @figo

@timothysc timothysc assigned akutz, detiber, figo and chuckha and unassigned akutz Aug 15, 2019
akutz (Contributor, Author) commented Aug 19, 2019

/assign @fabriziopandini

Hi Fabrizio,

I'm assigning this to you to work with @figo on this.

detiber (Member) commented Aug 28, 2019

@fabriziopandini @figo any updates on this area?

akutz (Contributor, Author) commented Aug 28, 2019

I think this was handled by kubernetes-sigs/image-builder#26 by @figo

detiber (Member) commented Aug 29, 2019

@akutz that was the image-builder portion, but we had also discussed having a way for users creating a cluster to not need to know which kubeadm parameters must be overridden in the KubeadmConfig bootstrap configuration.

fabriziopandini (Member) commented Aug 29, 2019

Some notes from initial work:

  • We want to preserve the true origin of the custom images, not mask it by retagging them under k8s.gcr.io.
  • Similarly, we want to preserve the tag of the custom images, not mask it by retagging with an upstream release tag.
  • kubeadm can handle both, provided the image repository and the Kubernetes version are set (see the sketch after this list).
  • Preloading images into the CRI should happen before kubeadm init/join runs, as part of the (machine) image build process (we are considering two options for this).
  • How to make the image repository and Kubernetes version settings transparent to the user is still TBD.
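
A minimal sketch of a KubeadmConfig that sets both values, as referenced in the list above (the v1alpha2 API version and the registry name are illustrative assumptions):

apiVersion: bootstrap.cluster.x-k8s.io/v1alpha2
kind: KubeadmConfig
metadata:
  name: controlplane-0
spec:
  clusterConfiguration:
    kubernetesVersion: v1.15.3
    imageRepository: registry.example.com/k8s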

detiber (Member) commented Aug 29, 2019

@chuckha @akutz @fabriziopandini @figo @zjs Since we already have the image-building foundation in place, I think we can isolate any additional changes needed to streamline the UX for setting/defaulting the image configuration in the KubeadmConfig to just CABPK. Given that we still have a non-trivial problem to solve, I suggest we do not block the initial v1alpha2 release on this issue and instead address it as a followup to the initial v1alpha2 release (but prior to v1alpha3). Thoughts?

chuckha (Contributor) commented Aug 29, 2019

Agreed with @detiber. In theory all the plumbing exists to allow custom image repositories, but the experience for the user is quite bad. They have to know magic characters to type in order for their cluster to work.

While this is entirely impractical, I don't see it as blocking the initial release, and it can be addressed in the kubeadm provider afterwards. Worst case, we add a field that only this provider uses/knows about, which is still backwards compatible and doesn't even warrant a minor version change. However, I think there are many things the image builder could do to help this along, such as providing some kind of script or metadata to use before kubeadm init is executed.

figo (Contributor) commented Aug 29, 2019

  • How to make the image repository and Kubernetes version settings transparent to the user is still TBD.

This is where bootstrap-provider-kubeadm plumbing is needed.

Agreed with @detiber. In theory all the plumbing exists to allow custom image repositories, but the experience for the user is quite bad. They have to know magic characters to type in order for their cluster to work.

@chuckha first of all, the default case does not change. The customization-case UX should not be too bad if it is only one additional field to configure, and documentation from the vendor should help alleviate the problem (which vendors are already doing anyway).

detiber (Member) commented Aug 29, 2019

Based on the feedback received, I'm going to bump this to v0.1.x and we can work out the rest of the logistics needed later.

chuckha (Contributor) commented Aug 29, 2019

Had a chat with @figo. Looks like one of the problems is that kubeadm uses 3.1 as a hardcoded constant for the pause container tag. This can likely be addressed at the image-build layer and potentially in kubeadm itself.

angarg commented Sep 17, 2019

I wonder if building on top of kubernetes/enhancements#1159 is a way to get this behavior. In that proposal, a user of kubeadm would supply a kustomize directory structure whose customizations are applied to the static manifests that kubeadm generates. If our provider image contained the location of our kustomization overlays, and if the kubeadm call used it, could that provide us with the mechanism we need to modify the bootstrap process?
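
As a sketch of that idea, one of the patch files in the supplied kustomize directory might retarget a static pod image like this (the patch format is assumed from that proposal, and the registry name is hypothetical):

# Strategic-merge patch applied to the kube-apiserver static pod manifest
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - name: kube-apiserver
    image: registry.example.com/k8s/kube-apiserver:v1.16.0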

detiber (Member) commented Sep 23, 2019

@angarg longer term, yes. However, we would need to support Kubernetes v1.(n-2).x, where n is the latest supported Kubernetes release; for example, if v1.17 were the newest release, the feature would need to exist in v1.15's kubeadm before we could depend on it. Once the minimum supported version of kubeadm has support for kustomize patches, we could start to rely on them.

akutz (Contributor, Author) commented Oct 9, 2019

ping @ncdc @vincepri @juan-lee

ncdc (Contributor) commented Oct 9, 2019

Had a chat with @figo. Looks like one of the problems is that kubeadm uses 3.1 as a hardcoded constant for the pause container tag. This can likely be addressed at the image-build layer and potentially in kubeadm itself.

You can override this, but it's not pretty: you have to pass args to the kubelet to set the --pod-infra-container-image flag.
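
A minimal sketch of plumbing that flag through a KubeadmConfig via nodeRegistration.kubeletExtraArgs (the API version and registry name are illustrative assumptions):

apiVersion: bootstrap.cluster.x-k8s.io/v1alpha2
kind: KubeadmConfig
metadata:
  name: controlplane-0
spec:
  initConfiguration:
    nodeRegistration:
      kubeletExtraArgs:
        # docker-specific kubelet flag; overrides the hardcoded pause image
        pod-infra-container-image: registry.example.com/k8s/pause:3.1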

ncdc (Contributor) commented Oct 9, 2019

I have a couple of thoughts around simplifying the UX. Some of them are covered in #1227. At this point I think I'm interested in researching two complementary approaches:

  1. In the absence of a KubeadmConfig and/or an infra machine (e.g. AWSMachine), a component could create these for you and wire them up to your Machine (and do the same thing for Cluster + infrastructure ref). We could have a way to define defaults (e.g. "I want to use CAPZ for all my clusters").

  2. Either separately or in combination with the previous item, have a way to merge user-specified defaults into user-created KubeadmConfigs/AWSMachines/etc. before proceeding to generate bootstrap data, provision machines, etc. This is somewhat like using kustomize to merge in changes (see the sketch after this list).
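
A rough sketch of what the second approach could look like from the user's point of view (the defaulting mechanism, names, and registry are all hypothetical):

# User-created KubeadmConfig, deliberately minimal
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha2
kind: KubeadmConfig
metadata:
  name: controlplane-0
spec: {}
---
# The same object after admin-supplied defaults are merged in,
# before bootstrap data is generated
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha2
kind: KubeadmConfig
metadata:
  name: controlplane-0
spec:
  clusterConfiguration:
    imageRepository: registry.example.com/k8s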

@ncdc ncdc changed the title from "Support for machine images with pre-pulled container images from non-standard repos" to "[cabpk] Support for machine images with pre-pulled container images from non-standard repos" Nov 13, 2019
@ncdc ncdc transferred this issue from kubernetes-retired/cluster-api-bootstrap-provider-kubeadm Nov 13, 2019
@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 13, 2019
@ncdc ncdc added this to the v0.3.0 milestone Nov 13, 2019
@ncdc ncdc added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Nov 13, 2019
ncdc (Contributor) commented Nov 13, 2019

@akutz I moved this into CAPI

chuckha (Contributor) commented Nov 15, 2019

It's been a while, so I want to recap what I think I'm reading:

The desired outcome is to have CABPK work with non-default images without the user specifying the location of the images AND without automatically masking the images by renaming them to match the expected images.

This sounds provider specific to me, and it will be a matter of the provider coordinating with the bootstrap provider to set the appropriate flags.

We should also outline which flags/options need to be set:

  1. The image repository.
  2. We need to enable the kubelet to use a different pause container; however, --pod-infra-container-image is a docker-only flag according to the flag help:

root@ip-172-31-42-251:~# ./kubelet --help | grep pause
--pod-infra-container-image string The image whose network/ipc namespaces containers in each pod will use. This docker-specific flag only works when container-runtime is set to docker. (default "k8s.gcr.io/pause:3.1")

Is there a way to get the kubelet to use a different image with a different CRI?

ncdc (Contributor) commented Dec 20, 2019

The desired outcome is to have CABPK work with non-default images without the user specifying the location of the images

Is this accurate? If so, I'm not sure it's possible.

Is setting the image repository and version sufficient?

detiber (Member) commented Dec 20, 2019

The desired outcome is to have CABPK work with non-default images without the user specifying the location of the images

Is this accurate? If so, I'm not sure it's possible.

I think the idea is that a management cluster administrator could somehow provide values to be used as additional defaults for clusters created by workload cluster administrators.

ncdc (Contributor) commented Dec 20, 2019

Ah, that makes sense. We could prototype a standalone POC with a defaulting webhook for KubeadmConfigs that is configurable, maybe via a ConfigMap.
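
A minimal sketch of what such a ConfigMap might contain (everything here, including the names and namespace, is hypothetical; no such component exists yet):

apiVersion: v1
kind: ConfigMap
metadata:
  name: kubeadmconfig-defaults
  namespace: capi-system
data:
  # values a defaulting webhook could merge into new KubeadmConfigs
  imageRepository: registry.example.com/k8s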

@ncdc ncdc added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Jan 22, 2020
vincepri (Member) commented

/milestone Next

@k8s-ci-robot k8s-ci-robot modified the milestones: v0.3.0, Next Feb 12, 2020
vincepri (Member) commented

/area bootstrap
/priority backlog

@k8s-ci-robot k8s-ci-robot added area/bootstrap Issues or PRs related to bootstrap providers priority/backlog Higher priority than priority/awaiting-more-evidence. labels Apr 20, 2020
@vincepri vincepri removed the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Apr 20, 2020
fejta-bot commented

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 19, 2020
vincepri (Member) commented

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 20, 2020
vincepri (Member) commented

/close

On October 22nd we discussed this issue and came to the conclusion that it probably shouldn't be solved within Cluster API, but at a layer above. This is already possible today by using KCP's ImageRepository field and pre-baking the container images into a VM image.
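
For reference, an abridged sketch of setting that field on a KubeadmControlPlane (API version, names, and registry are illustrative assumptions; required fields such as the infrastructure template are omitted):

apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: KubeadmControlPlane
metadata:
  name: example
spec:
  replicas: 3
  version: v1.18.2
  kubeadmConfigSpec:
    clusterConfiguration:
      imageRepository: registry.example.com/k8s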

k8s-ci-robot (Contributor) commented

@vincepri: Closing this issue.

In response to this:

/close

On October 22nd we discussed this issue and came to the conclusion that it probably shouldn't be solved within Cluster API, but at a layer above. This is already possible today by using KCP's ImageRepository field and pre-baking the container images into a VM image.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
