
Juju units get stuck in "Waiting for vault to be available" waiting state #69

Closed
gruyaume opened this issue Oct 26, 2023 · 1 comment · Fixed by #81
Labels: bug (Something isn't working)

Comments

@gruyaume (Collaborator)

Describe the bug

Juju units get stuck in the "Waiting for vault to be available" waiting state when Kubernetes pods get rescheduled.

To Reproduce

  1. Deploy Vault
juju deploy vault-k8s --channel=edge --trust -n 4
  2. Wait for vault units to be in active/idle state
  3. Delete one of the pods
kubectl delete pod vault-k8s-1 -n <your model>
  4. Run juju status
guillaume@potiron:~$ juju status
Model  Controller          Cloud/Region        Version  SLA          Timestamp
dem    microk8s-localhost  microk8s/localhost  3.1.6    unsupported  11:50:00-04:00

App        Version  Status   Scale  Charm      Channel  Rev  Address         Exposed  Message
vault-k8s           waiting      4  vault-k8s  edge      44  10.152.183.110  no       installing agent

Unit          Workload  Agent  Address      Ports  Message
vault-k8s/0*  active    idle   10.1.182.49         
vault-k8s/1   waiting   idle   10.1.182.42         Waiting for vault to be available
vault-k8s/2   active    idle   10.1.182.57         
vault-k8s/3   active    idle   10.1.182.38

Expected behavior

The expectation is that the affected units return to an active/idle state automatically.

Environment

  • Charm / library version (if relevant):
  • Juju version (output from juju --version): 3.1.6
  • Cloud Environment: MicroK8s v1.27.6
  • Kubernetes version: 1.27
@gruyaume added the bug label Oct 26, 2023
@gruyaume (Collaborator, Author) commented Nov 5, 2023

This issue may be due to the fact that we use IPs instead of hostnames. When pods go down and come back up, they return with a different IP. I'd recommend that we use K8s hostnames instead, so that unit identity is not affected by pods being rescheduled.
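
For illustration, a minimal sketch of what connecting over the stable hostname could look like. This is an assumption, not the charm's actual code: it assumes the hvac client library, Vault's default API port, and a hypothetical CA bundle path.

import socket

import hvac

# On a Juju Kubernetes charm, socket.getfqdn() returns the stable per-unit
# DNS name, e.g. "vault-k8s-1.vault-k8s-endpoints.<model>.svc.cluster.local".
# Unlike the pod IP, this name survives pod rescheduling.
fqdn = socket.getfqdn()

# Connect to Vault over the stable hostname instead of the pod IP.
# The CA bundle path below is hypothetical.
client = hvac.Client(url=f"https://{fqdn}:8200", verify="/path/to/ca.pem")
print("Vault initialized:", client.sys.is_initialized())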

DanielArndt added a commit that referenced this issue Nov 23, 2023
Fixes #69 by using the FQDN to connect to the vault instance instead of
using the IP address.

Using the IP address caused an issue because the IP address would change
after a crash or removal of a pod. In turn, this would cause the TLS
certificate to no longer be valid, as the TLS certificate is validated
against the new IP address (but was only issued for the old IP address).
We could re-issue the certificate, but the certificate is also valid for
anything that uses the same FQDN, which the new pod will share.

This change removes the IP address from the certificate, and relies on
the FQDN instead.
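
For reference, a minimal sketch of a CSR whose SANs carry only the DNS name, with no IP entry. It uses the cryptography package directly and is not necessarily the charm's actual TLS helper code.

import socket

from cryptography import x509
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
fqdn = socket.getfqdn()  # stable per-unit name on Juju Kubernetes charms

csr = (
    x509.CertificateSigningRequestBuilder()
    .subject_name(x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, fqdn)]))
    # DNS SAN only: no x509.IPAddress entry, so a rescheduled pod's new IP
    # does not invalidate certificates issued against this CSR.
    .add_extension(
        x509.SubjectAlternativeName([x509.DNSName(fqdn)]),
        critical=False,
    )
    .sign(private_key, hashes.SHA256())
)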