Enabling InsecureSkipSecretsManager in AWSMachine CloudInit results in "invalid secret backend" errors #3394

Closed
dlmather opened this issue Apr 7, 2022 · 6 comments · Fixed by #3400
Labels: kind/bug, priority/important-soon, triage/accepted

@dlmather
Contributor

dlmather commented Apr 7, 2022

/kind bug

What steps did you take and what happened:
We set CloudInit.InsecureSkipSecretsManager to true in the AWSMachineTemplates for our control-plane machines. The machines booted correctly, but during control-plane rollouts the capa-controller-manager fails to delete the old machines, logging errors like the following:

E0407 22:35:27.676718       1 awsmachine_controller.go:552] controllers/AWSMachine "msg"="unable to delete secrets" "error"="invalid secret backend"
E0407 22:35:27.677549       1 controller.go:317] controller/awsmachine "msg"="Reconciler error" "error"="invalid secret backend" "name"="aws-us-west-2-vos1-control-plane-20220407-xfdzt" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="AWSMachine"

return nil, errors.New("invalid secret backend")

What did you expect to happen:
Preferably, deletion should succeed without errors. If this setting is in fact a problem, the configuration should be rejected as invalid instead.
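
To make the expectation concrete, here is a minimal, self-contained sketch of the kind of guard that would avoid the error on deletion. It is not the actual controller code: the types and function names (`CloudInit`, `backendFor`, `deleteSecretsNaive`, `deleteSecretsGuarded`) are hypothetical stand-ins, and it assumes that the secure backend field is left empty when InsecureSkipSecretsManager is true, so a backend lookup on delete has nothing valid to resolve.

```go
package main

import (
	"errors"
	"fmt"
)

// SecretBackend is a hypothetical stand-in for the provider's backend setting.
type SecretBackend string

const (
	BackendSecretsManager SecretBackend = "secrets-manager"
	BackendSSM            SecretBackend = "ssm-parameter-store"
)

// CloudInit models only the two settings discussed in this issue (assumed shape).
type CloudInit struct {
	InsecureSkipSecretsManager bool
	SecureSecretsBackend       SecretBackend
}

var errInvalidBackend = errors.New("invalid secret backend")

// secretDeleter is a hypothetical interface for removing stored bootstrap data.
type secretDeleter interface {
	Delete() error
}

// noopDeleter pretends to delete a secret and always succeeds.
type noopDeleter struct{}

func (noopDeleter) Delete() error { return nil }

// backendFor resolves a deleter for the known backends and rejects anything
// else, including the empty value.
func backendFor(b SecretBackend) (secretDeleter, error) {
	switch b {
	case BackendSecretsManager, BackendSSM:
		return noopDeleter{}, nil
	}
	return nil, errInvalidBackend
}

// deleteSecretsNaive always resolves a backend first, so it fails when no
// backend was ever configured.
func deleteSecretsNaive(c CloudInit) error {
	d, err := backendFor(c.SecureSecretsBackend)
	if err != nil {
		return err
	}
	return d.Delete()
}

// deleteSecretsGuarded returns early when secret storage was skipped, since
// there is nothing to delete in that case.
func deleteSecretsGuarded(c CloudInit) error {
	if c.InsecureSkipSecretsManager {
		return nil
	}
	return deleteSecretsNaive(c)
}

func main() {
	c := CloudInit{InsecureSkipSecretsManager: true}
	fmt.Println("naive:", deleteSecretsNaive(c))
	fmt.Println("guarded:", deleteSecretsGuarded(c))
}
```

The alternative mentioned above, rejecting the configuration up front, would move the same check into validation rather than the delete path.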

Anything else you would like to add:

Environment:

  • Cluster-api-provider-aws version: 1.4.0
  • Kubernetes version: (use kubectl version): 1.19.10
  • OS (e.g. from /etc/os-release): Fedora CoreOS 33-20210328
@k8s-ci-robot added the kind/bug, needs-priority, and needs-triage labels on Apr 7, 2022
@sedefsavas
Contributor

Thanks for reporting this.

This will need further debugging, but from the code path I can see that this is a valid issue.

Also, we are missing an e2e test with this flag enabled, which is why the tests did not catch this.

/triage accepted
/priority important-soon

@k8s-ci-robot added the triage/accepted and priority/important-soon labels and removed the needs-triage and needs-priority labels on Apr 7, 2022
@dlmather
Contributor Author

dlmather commented Apr 8, 2022

Okay, we might take a stab at raising a fix ourselves, since we'll likely want this working on our clusters in the short term. We'll reference this issue from any PR if we find success there :)

@sedefsavas
Contributor

Are you working on a fork?

Do you mind sharing the use case where using any of the secret manager backends is not an option?

@dlmather
Contributor Author

dlmather commented Apr 8, 2022

I can't remember the exact issue, but we ran into some cloud-init bootstrapping issues when spinning up other K8s 1.23.4 clusters and found the suggestion here: https://cluster-api-aws.sigs.k8s.io/topics/userdata-privacy.html#how-cluster-api-secures-tls-secrets. That appeared to resolve the issue and we moved on. Now I've come back to upgrade older clusters and ran into the deletion issue.

I'll spin off a fork starting tomorrow.

@sedefsavas
Contributor

If you are interested in working on this, we can make a patch release right after the fix is merged.

@nehatomar12

@dlmather With insecureSkipSecretsManager: false (the default behaviour) in cloudInit, are you getting the following error from cloud-init?

Exception: [Errno 2] No such file or directory: '/etc/secret-userdata.txt' for url: file:///etc/secret-userdata.txt

4 participants