
[BUG] Restarting the installation process can cause certificate problems if K8s was not fully configured #2669

Closed
romsok24 opened this issue Oct 6, 2021 · 5 comments

Comments

@romsok24
Contributor

romsok24 commented Oct 6, 2021

Describe the bug
When rerunning the epicli installation process, one can fail with this error:

failed to connect to {https://127.0.0.1:2379  <nil> 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"etcd-ca\")"
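The mismatch behind this handshake failure can be illustrated with plain openssl (a hedged sketch; the file names and CN values are illustrative, not epicli's actual PKI layout): a server certificate signed by the first run's `etcd-ca` fails verification against a regenerated CA, even though both CAs carry the same CN.

```shell
# Sketch: two CAs with the same CN ("etcd-ca"), as left behind by an
# interrupted installation that was then restarted.
workdir=$(mktemp -d)
cd "$workdir"

# CA from the first (interrupted) run
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout ca-old.key -out ca-old.crt -subj "/CN=etcd-ca"

# Server certificate signed by the old CA
openssl req -newkey rsa:2048 -nodes \
  -keyout server.key -out server.csr -subj "/CN=etcd-server"
openssl x509 -req -in server.csr -CA ca-old.crt -CAkey ca-old.key \
  -CAcreateserial -out server.crt -days 1

# CA regenerated by the second run: same CN, but a different key
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout ca-new.key -out ca-new.crt -subj "/CN=etcd-ca"

openssl verify -CAfile ca-old.crt server.crt          # prints "server.crt: OK"
openssl verify -CAfile ca-new.crt server.crt || true  # fails: not signed by this CA
```

This is the same situation etcd ends up in when some certs in `/etc/kubernetes/pki/` were signed by the previous run's CA and the rest by the new one.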

How to reproduce
Steps to reproduce the behavior:

  1. execute epicli init ... (with params)
  2. if the installation from step 1 fails partway through the Kubernetes component creation, execute step 1 again

Expected behavior
During the epicli preflight phase there should be a task that checks for the existence of the /etc/kubernetes/pki/ folder
and cleans it if it exists, to ensure that all the certs from a brand-new installation run are signed with the most current CA cert.
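The preflight cleanup described above could be sketched as an Ansible task (a hypothetical sketch; the task names and register variable are illustrative, not epicli's actual code, and it would need a guard so it only runs for a brand-new installation rather than on every apply):

```yaml
# Hypothetical preflight sketch: wipe stale PKI material so a fresh run
# re-signs every cert with a single CA.
- name: Check for a leftover Kubernetes PKI directory
  ansible.builtin.stat:
    path: /etc/kubernetes/pki
  register: k8s_pki_dir

- name: Remove PKI leftovers from an interrupted installation
  ansible.builtin.file:
    path: /etc/kubernetes/pki
    state: absent
  when: k8s_pki_dir.stat.exists
```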

Config files

Environment

  • OS: Ubuntu 18.04.4 LTS

  • epicli version: 1.0.1


DoD checklist

  • Changelog updated (if affected version was released)
  • COMPONENTS.md updated / doesn't need to be updated
  • Automated tests passed (QA pipelines)
    • apply
    • upgrade
  • Case covered by automated test (if possible)
  • Idempotency tested
  • Documentation updated / doesn't need to be updated
  • All conversations in PR resolved
  • Backport tasks created / doesn't need to be backported
@atsikham
Contributor

atsikham commented Oct 6, 2021

@romsok24 is this a consistent issue? On which step did it fail during the first run? An investigation may be needed when it occurs.

We have the ability to re-generate certificates. Someone could have a custom validity period, so I think the pki folder should not be cleaned up on every apply.

@romsok24
Contributor Author

romsok24 commented Oct 7, 2021

Failing ansible task is: TASK [kubernetes_common : Update in-cluster configuration]

IMO the failure is not related to the code but to the fact that, as I wrote, some of the certs in /etc/kubernetes/pki/ are signed with the CA cert from the previous run and the others with the current one.

Cleaning this folder would probably be a good solution for this.

@przemyslavic
Collaborator

Seems to be related to #1175.

@atsikham
Contributor

atsikham commented Jan 4, 2022

My proposal is to check this task after #2828 as it might be related.

@atsikham atsikham changed the title [BUG] Restarting the installation process can cause certificate problems if the k8s was not fully configured [BUG] Restarting the installation process can cause certificate problems if K8s was not fully configured Jan 24, 2022
@atsikham atsikham self-assigned this Jan 24, 2022
@przemyslavic
Collaborator

Tested the apply command multiple times after cancelling the build at different stages. It went smoothly on re-apply.
The task on which the first build failed may be relevant here. I would close this task and re-open it if the issue occurs again.

@przemyslavic przemyslavic self-assigned this Feb 4, 2022
@seriva seriva closed this as completed Feb 4, 2022