
Add ignition support in bootstrap provider #3430

Closed
dongsupark opened this issue Jul 30, 2020 · 25 comments · Fixed by #4172
Assignees
Labels
area/bootstrap Issues or PRs related to bootstrap providers kind/design Categorizes issue or PR as related to design. kind/feature Categorizes issue or PR as related to a new feature. lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor.

Comments

@dongsupark
Member

Detailed Description

In CABPK, KubeadmConfig.Spec already has a Format field, intended to specify the output format of the bootstrap data the controller generates. At the moment the field is not used anywhere, so CABPK always relies on cloud-init.

We can use this field to support bootstrap config formats other than the default cloud-init. A good use case would be adding Ignition, which is used by Flatcar Container Linux and Fedora CoreOS.
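As a sketch, selecting the output format could look like the following (the ignition value is what this issue proposes, not something CABPK supports yet; the apiVersion matches the contemporary v1alpha3 types):

```yaml
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
kind: KubeadmConfig
metadata:
  name: my-machine-bootstrap
spec:
  # Proposed: select the bootstrap data output format.
  # Today this field exists but is ignored; cloud-init is always used.
  format: ignition
  joinConfiguration:
    nodeRegistration:
      name: '{{ ds.meta_data.hostname }}'
```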

Previous attempts:

There was a project, cluster-api-bootstrap-provider-kubeadm-ignition, which is actually a fork of another repo. The original repo from minsheng-fintech was recently removed; I am not sure why. I tried to contact the original author, but so far have not heard back.

Based on that code, I have created a PoC branch on top of the current cluster-api code base. Of course it is still up for discussion.

Related issues:

#1576
#1582
#3064

/kind feature
/area bootstrap

/cc @vbatts @t-lo @ncdc @vincepri @detiber

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. area/bootstrap Issues or PRs related to bootstrap providers labels Jul 30, 2020
@vincepri
Member

/kind design
/milestone v0.4.0

@k8s-ci-robot k8s-ci-robot added this to the v0.4.0 milestone Jul 30, 2020
@k8s-ci-robot k8s-ci-robot added the kind/design Categorizes issue or PR as related to design. label Jul 30, 2020
@rudoi
Contributor

rudoi commented Jul 30, 2020

@dongsupark thanks for opening this!

We use quite a bit of Flatcar and were looking into adding this support a while back, but we struggled to find the right way to handle the "immutability" of it in the AWS provider, for example.

CAPA kind of assumes the AMI has the correct version of kubeadm on it, so I was curious if you'd thought about this at all. If I understand CoreOS/Flatcar correctly, it's not best practice to publish a bunch of different machine images - you'd use the official release image and then use systemd units, etc to make sure your dependencies were installed.

Would love to hear your thoughts. I acknowledge that this isn't strictly related to this issue, though 😅.

@dongsupark
Member Author

CAPA kind of assumes the AMI has the correct version of kubeadm on it, so I was curious if you'd thought about this at all. If I understand CoreOS/Flatcar correctly, it's not best practice to publish a bunch of different machine images - you'd use the official release image and then use systemd units, etc to make sure your dependencies were installed.

You are right.
That is the main reason why we need to create a CAPA AMI for Flatcar, making use of a pending PR. It basically downloads the necessary binaries under /opt, a read-write partition inside Flatcar, so the final AMI includes everything we need, like kubeadm, containerd, crictl, etc.
Once the PR is merged, we will publish the CAPA AMI and support other providers as well.

@vincepri
Member

/milestone v0.3.x

Synced up with @dongsupark on slack, we're going to copy and adapt the CABPK bootstrapper and provide new types and controllers for the kubeadm-ignition based one. Everything will live under the same CABPK group as new types, but under the experimental folder for now.

@vincepri
Member

/milestone Next

@k8s-ci-robot k8s-ci-robot modified the milestones: v0.3.9, Next Aug 25, 2020
@vbatts

vbatts commented Oct 7, 2020

@vincepri curious, would this support need to be included in the v1alpha1 roadmap?

@vbatts

vbatts commented Oct 7, 2020

(asking as I joined the call today and was looking over #3754)

@vincepri
Member

vincepri commented Oct 7, 2020

(Assuming you meant v1alpha4) We can just add it to the roadmap, we just need someone assigned to push it forward.

There are also the node agent talks that came up from #2554, which might be of interest to you all.

cc @randomvariable

@vbatts

vbatts commented Oct 14, 2020

@vincepri yes, sorry, v1alpha4. And yes, that issue is very related. The secrets access is a part of cloud-init on AWS that we cannot support currently, and upstream Ignition has rejected the multi-part MIME support needed for handling these secrets on AWS. It's not an ideal spot.

@vbatts

vbatts commented Oct 14, 2020

I was looking at #3761; it's the basis of my comment above.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 12, 2021
@fabriziopandini
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 12, 2021
@invidian
Member

Hey all, I had a look into #3437 and the comments there, and I plan to open a replacement PR to solve this issue, but I wanted to share my idea for solving it before completing this work.

I would like to propose the following to move things forward:

  1. Add a format field to the bootstrap Secret object, next to the value field, so infrastructure providers can automatically identify the bootstrap data format and act accordingly if needed. Different behavior is required for AWS, for example, as Ignition does not support AWS Secrets Manager, meaning CAPA must upload the Ignition data to S3 so nodes can read it on first boot.
    Such an approach also keeps CABPK cloud-agnostic, as it should be, so each infrastructure provider can use its own technology to securely deliver the bootstrap data
    to the instances.

  2. Add ignition as a valid value for the KubeadmControlPlane.spec.kubeadmConfigSpec.format field. If this field is set to ignition, CABPK will generate bootstrap data in Ignition format rather than cloud-init format.
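To make the first point concrete, a hedged sketch of what a bootstrap Secret carrying such a format hint could look like (key names and contents are illustrative, not the final API):

```yaml
# Sketch: the bootstrap Secret gains a "format" key next to "value",
# letting an infrastructure provider choose a delivery mechanism
# (e.g. CAPA uploading Ignition data to S3 instead of relying on
# cloud-init's user-data handling).
apiVersion: v1
kind: Secret
metadata:
  name: my-machine-bootstrap-data
type: cluster.x-k8s.io/secret
stringData:
  format: ignition
  value: |
    {"ignition": {"version": "3.0.0"}}
```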

Now, the cloud-init config generated by CABPK, together with fields exposed by KubeadmControlPlane.spec.kubeadmConfigSpec like users or ntp, cannot be mapped 1:1 to Ignition, but the same result can be achieved in a different way.

CABPK core uses two features of cloud-init for bootstrapping: runcmd and write_files.

For runcmd, CABPK will generate a kubeadm.service systemd unit and a kubeadm.sh script file, which will include:

  • preKubeadmCommands
  • actual kubeadm join/init command
  • postKubeadmCommands
  • Removal of the kubeadm configuration file, since with Ignition files cannot be written to the /tmp directory the way it is done with cloud-init.
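A rough sketch of how those pieces could be packaged (Container Linux Config form; unit name and paths follow the description above, script contents are illustrative):

```yaml
# Hypothetical sketch: replacing cloud-init's runcmd with a oneshot
# systemd unit that runs a generated bootstrap script once.
systemd:
  units:
    - name: kubeadm.service
      enabled: true
      contents: |
        [Unit]
        Description=kubeadm bootstrap
        ConditionPathExists=!/etc/kubeadm.done
        [Service]
        Type=oneshot
        ExecStart=/etc/kubeadm.sh
        [Install]
        WantedBy=multi-user.target
storage:
  files:
    - path: /etc/kubeadm.sh
      filesystem: root
      mode: 0700
      contents:
        inline: |
          #!/bin/bash
          set -e
          # preKubeadmCommands would be injected here
          kubeadm join --config /etc/kubeadm.yml
          # postKubeadmCommands would be injected here
          rm /etc/kubeadm.yml      # cleanup; /tmp is not usable here
          touch /etc/kubeadm.done  # guard against re-running on reboot
```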

write_files can easily be replaced with storage.files. The code also makes it easy to convert from the internally used format to the other.

However, cloud-init additionally offers Jinja templating for all files to be written, which is used for example by CAPA (example). Such a feature is not available with Ignition.

Fortunately, CABPK does not use templating for bootstrap files, except for settings which might be controlled by the user (like in the CAPA example), which makes things a bit simpler, as templating can be moved to the infrastructure provider (cluster configuration template).

If a user needs to template some parts of the configuration, they can use preKubeadmCommands OR create their own systemd service that runs before the mentioned kubeadm.service unit.

Also, as far as I saw, no major infrastructure provider uses templating extensively or in options other than writing files.

To break down "additional" fields in KubeadmControlPlane.spec.kubeadmConfigSpec:

  • diskSetup - It should be possible to map it to storage.disks.
  • files - Can be mapped 1:1 to storage.files.
  • mounts - It should be possible to map it to storage.filesystems?
  • ntp - Could perhaps be mapped to configuring systemd-timesyncd.
  • users - Can be mapped to passwd.users.
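To illustrate the most direct of these mappings, here is a cloud-init write_files entry and a rough storage.files equivalent (CLC form; the exact schema depends on the Ignition/CLC version, and the file shown is just an example):

```yaml
# cloud-init form (write_files)
write_files:
  - path: /etc/example.conf
    permissions: "0640"
    owner: root:root
    content: |
      key=value
---
# Rough Ignition equivalent in Container Linux Config form (storage.files)
storage:
  files:
    - path: /etc/example.conf
      filesystem: root
      mode: 0640
      contents:
        inline: |
          key=value
```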

  3. Add an ignition field to KubeadmControlPlane.spec.kubeadmConfigSpec with the following structure:
kind: KubeadmControlPlane
spec:
  kubeadmConfigSpec:
    ignition:
      containerLinuxConfig:
        additionalConfig: |
          ---
          systemd:
            units: ...

Right now Fedora CoreOS uses Ignition version 3.0+, while Flatcar is still using 2.3. As suggested in #3437 (comment), only 3.0+ should be used, which can optionally be downgraded to a 2.3-compatible format.

Having the structure above leaves enough room to extend it in the future to something like:

kind: KubeadmControlPlane
spec:
  kubeadmConfigSpec:
    ignition:
      fedoraCoreOSConfig:

Or:

kind: KubeadmControlPlane
spec:
  kubeadmConfigSpec:
    ignition:
      containerLinuxConfig:
        version: 3.0
        additionalConfig: |
          ---
          systemd:
            units: ...

The additionalConfig field will allow users to specify their own configuration in an Ignition-native (or CLC/FCCT) way and to access features not available in kubeadmConfigSpec, like adding systemd units.

The additionalConfig field will be of type string to avoid pulling the entire Ignition config schema into the KubeadmControlPlane CRD, and also to allow using different transpiler versions in the future if desired.

Please let me know what you think and if such addition would be acceptable :)

BTW I've already created a PoC for it, available at https://github.com/kinvolk/cluster-api/commits/invidian/ignition-support. It requires more work though.

@MarcelMue
Contributor

I personally think that the suggestion made by @invidian is quite sound. It would make life for Ignition users a lot simpler and offers a nice basis for integration.
I am not very opinionated about the proposed extensions of kubeadmConfigSpec.

@invidian
Member

Opened PR with changes proposed above: #4172.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 11, 2021
@invidian
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 11, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 9, 2021
@randomvariable
Member

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 9, 2021
@randomvariable
Member

/lifecycle active

@k8s-ci-robot k8s-ci-robot added lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. and removed lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. labels Aug 9, 2021
@randomvariable
Member

/assign @invidian

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. and removed lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. labels Nov 7, 2021
@invidian
Member

invidian commented Nov 7, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 7, 2021
@randomvariable
Member

/lifecycle active

@k8s-ci-robot k8s-ci-robot added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Nov 8, 2021
@omniproc

omniproc commented Nov 16, 2021

I just wanted to drop in that Jinja templating is not exclusive to cloud-init. You can use it to template whatever you like; all you need is a working Jinja install and a Jinja-formatted template.
So if you want to unify the bootstrap process but still allow two different bootstrap techs, cloud-init and Ignition, using a single templating engine unrelated to those bootstrappers might be a way to deal with that.
You could offer two default bootstrap templates, one based on cloud-init and one based on Ignition. Both need about the same variables passed from CAPI. So you could just have Jinja templates for both bootstrap flavours, run Jinja, and pass it the common variables. The output is the ready-to-use bootstrap YAML/JSON file.
And if the user wants more power to customize the bootstrap process: just replace the default Jinja template.
Even more: all the custom var generation that's currently done in Go in CAPV, e.g. GenerateName, could be offloaded as a Jinja filter, so a user may access that filter at any time or write their own filters if further customization is needed.
It's envsubst on steroids really.
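A minimal sketch of that idea (all template contents and variable names here are invented for illustration, not CAPI's actual templates): one Jinja environment renders both bootstrap flavours from the same variables.

```python
# Sketch: a single templating engine rendering two bootstrap "flavours"
# (cloud-init and Ignition) from one common set of variables.
from jinja2 import Template

common_vars = {
    "hostname": "node-0",
    "join_cmd": "kubeadm join 10.0.0.1:6443",
}

# Illustrative cloud-init template.
cloud_init_tpl = Template(
    "#cloud-config\n"
    "hostname: {{ hostname }}\n"
    "runcmd:\n"
    "  - {{ join_cmd }}\n"
)

# Illustrative Ignition template producing JSON.
ignition_tpl = Template(
    '{"ignition": {"version": "3.0.0"}, '
    '"storage": {"files": [{"path": "/etc/hostname", '
    '"contents": {"source": "data:,{{ hostname }}"}}]}}'
)

print(cloud_init_tpl.render(**common_vars))
print(ignition_tpl.render(**common_vars))
```

Swapping flavours is then just a matter of picking a different template; the variable plumbing on the CAPI side stays the same.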
