Secure bootstrapping for capz machines #2189

Closed
wants to merge 7 commits into from

Conversation

shysank
Contributor

@shysank shysank commented Mar 23, 2022

What type of PR is this?
/kind feature

What this PR does / why we need it:

This PR adds an option to secure bootstrap data for capz machines, similar to the way capa works right now. At a high level, if SecureBootstrapEnabled is set to true, it:

  1. Creates an Azure Key Vault during cluster reconciliation.
  2. During machine reconciliation, saves the bootstrap data as secrets in the vault created in (1). The data is split into chunks so that no single secret exceeds the Azure Key Vault size limit.
  3. Creates a cloud-init boothook script that fetches the secrets created in (2), writes them to a file, and restarts cloud-init with that file as its source.
  4. When the VM is provisioned, sets its custom data to the script created in (3).
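The chunking in step 2 can be sketched as follows. This is a minimal illustration, not the code from this PR: the function names, the secret naming scheme, and the 20 KiB chunk size are assumptions, chosen to stay under the 25k-byte limit that Azure Key Vault documents for a secret value.

```go
package main

import (
	"fmt"
	"strings"
)

// maxSecretChunkBytes is an assumed chunk size, kept below the 25k-byte
// limit Azure Key Vault places on a single secret value.
const maxSecretChunkBytes = 20 * 1024

// splitBootstrapData cuts data into consecutive chunks of at most
// chunkSize bytes. The last chunk may be shorter.
func splitBootstrapData(data []byte, chunkSize int) ([][]byte, error) {
	if chunkSize <= 0 {
		return nil, fmt.Errorf("chunk size must be positive, got %d", chunkSize)
	}
	var chunks [][]byte
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		chunks = append(chunks, data[start:end])
	}
	return chunks, nil
}

// secretName builds a hypothetical per-chunk secret name so the chunks
// can be fetched back in order.
func secretName(machineName string, index int) string {
	return fmt.Sprintf("%s-%d", machineName, index)
}

// joinChunks reverses splitBootstrapData. The boothook script on the VM
// would perform the equivalent step before handing the file to cloud-init.
func joinChunks(chunks [][]byte) []byte {
	var out []byte
	for _, c := range chunks {
		out = append(out, c...)
	}
	return out
}

func main() {
	data := []byte(strings.Repeat("x", 50*1024))
	chunks, err := splitBootstrapData(data, maxSecretChunkBytes)
	if err != nil {
		panic(err)
	}
	for i, c := range chunks {
		fmt.Printf("%s: %d bytes\n", secretName("machine-a", i), len(c))
	}
}
```

Indexing the secret names keeps reassembly on the VM a matter of reading the chunks in order and concatenating them.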

If SecureBootstrapEnabled is set to false (or not set), the current behaviour is unchanged.

Limitations:

  1. This works only with a UserAssigned identity, because the identity requires the Key Vault Administrator role to read and delete secrets. A system-assigned identity cannot be used, since a role can only be assigned to it after the VM is provisioned.
  2. The cluster identity also requires the Key Vault Administrator role to create the vault and secrets.
  3. SecureBootstrapEnabled applies to the whole cluster, not to individual machines (unlike capa). This is because a Key Vault is needed to manage the secrets, and creating one per machine would be too cumbersome. We could probably add a machine-level override to disable secure bootstrapping.

** Credits to @randomvariable and the CAPA team for the original implementation **

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #915 (partially)

Special notes for your reviewer:

I have split the PR into logical commits for easier review.

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

Secure bootstrapping for capz machines

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 23, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please ask for approval from shysank after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@shysank shysank force-pushed the secure_bootstrapping branch from 8c3c498 to 0e6debc Compare March 23, 2022 19:09
@k8s-ci-robot
Contributor

k8s-ci-robot commented Mar 23, 2022

@shysank: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-cluster-api-provider-azure-apidiff | 0e6debc | link | false | /test pull-cluster-api-provider-azure-apidiff |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@shysank
Contributor Author

shysank commented Mar 23, 2022

/test pull-cluster-api-provider-azure-e2e-exp

@shysank
Contributor Author

shysank commented Mar 24, 2022

cc @CecileRobertMichon

@CecileRobertMichon
Contributor

@shysank I haven't gone through the code yet but just from looking at the PR description, my initial questions are:

Perhaps it would make sense to start with a design proposal since this is such a big change/feature.

@shysank
Contributor Author

shysank commented Mar 24, 2022

The goal of this PR is to improve the security of the bootstrapping mechanism for machines that use cloud-init only, bringing capz on par with capa in terms of bootstrap data security, albeit with some limitations (as mentioned in the PR description). This is not a generic solution that supports other cloud initializers. Perhaps we could implement UserdDataResolver for each cloud initializer, but a more robust solution needs more foundational changes to our design, as mentioned in #5294. IMO, this is more of a stopgap solution until we are able to separate kubeadm and machine bootstrapping.
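One way to picture a resolver per cloud initializer is sketched below. It is purely illustrative: the interface, its method, the registry, and the placeholder script body are all assumptions, and only a cloud-init implementation is shown. The `#cloud-boothook` header is the cloud-init user-data marker for boothook scripts; "cloud-config" and "ignition" are used as format keys on the assumption that they match the bootstrap data formats in use.

```go
package main

import (
	"fmt"
	"strings"
)

// userDataResolver is a hypothetical interface: one implementation per
// cloud initializer turns a list of secret names into VM custom data.
type userDataResolver interface {
	Resolve(vaultName string, secretNames []string) (string, error)
}

// cloudInitResolver emits a cloud-init boothook. The script body is a
// placeholder made of comments, because the real fetch-and-restart logic
// is not shown in this conversation.
type cloudInitResolver struct{}

func (cloudInitResolver) Resolve(vaultName string, secretNames []string) (string, error) {
	if len(secretNames) == 0 {
		return "", fmt.Errorf("no secrets to resolve for vault %q", vaultName)
	}
	var b strings.Builder
	b.WriteString("#cloud-boothook\n#!/bin/bash\n")
	fmt.Fprintf(&b, "# vault: %s\n", vaultName)
	for _, name := range secretNames {
		fmt.Fprintf(&b, "# fetch secret: %s\n", name)
	}
	return b.String(), nil
}

// resolvers maps a bootstrap data format to its resolver. Only the
// cloud-init format is registered in this sketch.
var resolvers = map[string]userDataResolver{
	"cloud-config": cloudInitResolver{},
}

// resolveUserData picks the resolver for the given format and fails
// loudly for formats that have no implementation.
func resolveUserData(format, vaultName string, secretNames []string) (string, error) {
	r, ok := resolvers[format]
	if !ok {
		return "", fmt.Errorf("no user data resolver for format %q", format)
	}
	return r.Resolve(vaultName, secretNames)
}

func main() {
	out, err := resolveUserData("cloud-config", "vault-a", []string{"machine-a-0", "machine-a-1"})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)

	_, err = resolveUserData("ignition", "vault-a", []string{"machine-a-0"})
	fmt.Println(err)
}
```

Supporting another initializer would then mean registering one more implementation, without touching the machine reconciliation path.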

Perhaps it would make sense to start with a design proposal since this is such a big change/feature.

I'll work on creating a design proposal.

@k8s-ci-robot
Contributor

@shysank: PR needs rebase.


@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 25, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 23, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 23, 2022
@jackfrancis
Contributor

@CecileRobertMichon are you able to assess if we should try to salvage this PR?

@CecileRobertMichon
Contributor

I believe @sonasingh46 was looking into it https://kubernetes.slack.com/archives/CEX9HENG7/p1658756031736239

Regardless, we should start with a proposal as noted above and open a new PR once we're ready.

/close

@k8s-ci-robot
Contributor

@CecileRobertMichon: Closed this PR.


@sonasingh46
Contributor

I will work to raise a proposal PR for this.

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Secure sensitive bootstrap data
6 participants