OpenStack etcd-manager authentication expires: no more backups after 1hr #9730
Comments
Perhaps @justinsb knows how to proceed (because of commits open kops: openstack and etcd-manager as well)? 👍
Unfortunately, I don't have Swift available, but I suspect it is a matter of adding Then etcd-manager needs to do an update on their part. But it is fairly trivial to push a custom etcd-manager image if you are able to make that change and build kops master + etcd-manager yourself.
/kind bug
Not exactly sure why my attempts at fixing this failed and yours works, but yours works great @olemarkus ;-) 👍 Backups keep coming in now. Btw, it's not exactly easy to run a custom etcd-manager: the kops cluster spec does not allow swapping the image, because the template is hardcoded in kops. This means you have to run a custom kops and roll the masters.
I merged the etcd-manager patch, but I realized we should have fixed the code here first, because it is vendored into etcd-manager. I think the problem arises when we configure authentication from env variables. I sent #9836
This has been fixed for the master branch
@olemarkus: Closing this issue.
Hi,
With reference to etcd-manager issue: kopeio/etcd-manager#337
1. What kops version are you running? The command kops version will display this information.
1.17.9
2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
1.17.1
3. What cloud provider are you using?
OpenStack
4. What commands did you run? What is the simplest way to reproduce this issue?
Create a Kubernetes cluster on OpenStack with kops, using the defaults ('the usual').
5. What happened after the commands executed?
Cluster created. Your kops Swift container will be receiving etcd-manager backups every 15 minutes. After 1 hour, backups are not coming in anymore. Tailing etcd-manager-* pods will show a large amount of "Authentication failed" responses on read calls to the kops Swift container.
6. What did you expect to happen?
Etcd-manager backups keep coming in after 1 hour as well.
7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.
Defaults, no changes.
8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.
N/A
9. Anything else do we need to know?
OpenStack authentication tokens are usually set to expire after a certain amount of time. This is when you should 'reauth'. For my OpenStack cloud provider this expiry value is set to 60 minutes. That's how I confirmed this as root cause.
I can see reauth logic in the code, but it does not seem to be working properly.
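To make the failure mode concrete, here is a minimal Python sketch of the behaviour described above. It is a simulation only, not etcd-manager or kops code: the classes FakeIdentityService and FakeSwiftClient, the allow_reauth switch, and the run_backups helper are all hypothetical names invented for illustration. The 60-minute token lifetime and 15-minute backup interval are the values reported in this issue, and treating the token as invalid exactly at the 60-minute mark is an assumption.

```python
from dataclasses import dataclass

TOKEN_LIFETIME_MIN = 60   # token expiry reported for this OpenStack provider
BACKUP_INTERVAL_MIN = 15  # backup cadence reported in this issue


class AuthenticationFailed(Exception):
    """Stands in for the 'Authentication failed' responses seen in the logs."""


@dataclass
class FakeIdentityService:
    """Hypothetical stand-in for the OpenStack identity service."""
    issued: int = 0

    def issue_token(self, now_min):
        self.issued += 1
        return {"id": f"token-{self.issued}",
                "expires_at": now_min + TOKEN_LIFETIME_MIN}


@dataclass
class FakeSwiftClient:
    """Hypothetical object-storage client that caches one token."""
    identity: FakeIdentityService
    allow_reauth: bool
    token: dict = None

    def _request(self, now_min):
        # Assumption: the service rejects calls made at or after expiry.
        if now_min >= self.token["expires_at"]:
            raise AuthenticationFailed(f"token expired at t={now_min}m")
        return "ok"

    def write_backup(self, now_min):
        if self.token is None:
            # Initial authentication on first use.
            self.token = self.identity.issue_token(now_min)
        try:
            return self._request(now_min)
        except AuthenticationFailed:
            if not self.allow_reauth:
                raise
            # Re-authenticate once, then retry the original call once.
            self.token = self.identity.issue_token(now_min)
            return self._request(now_min)


def run_backups(client, total_min=120):
    """Attempt a backup every BACKUP_INTERVAL_MIN; map minute -> success."""
    results = {}
    for t in range(0, total_min + 1, BACKUP_INTERVAL_MIN):
        try:
            client.write_backup(t)
            results[t] = True
        except AuthenticationFailed:
            results[t] = False
    return results
```

Running run_backups with allow_reauth=False reproduces the symptom in this sketch: the attempts in the first hour succeed and every later one fails. With allow_reauth=True the client fetches a fresh token on the first rejected call and the backups continue.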
etcd-manager commit 8faecdad725d05f9c7375461cbf4f3dbbec6e527 on Thu Oct 17 2019 "avoid reauthentication loops (#1746)" was pulled in by kops commit c67cdaa6e4a4cfc7eb0e7d1ae2c3920eea5daa97 on Sun May 31 2020 "Update vendored kops version to 1.17.0"
I've tried debugging and patching etcd-manager myself for a while now, for example by forcing NewSwiftClient upon every Swift call, or by forcing Reauth to happen all the time.
Not exactly sure where to file the bug report, so I'll update this issue with a reference to etcd-manager's issue and back as well. At least this kops issue can be used to track etcd-manager fixes so they can be pulled in.
Any thoughts?