OpenStack etcd-manager authentication expires: no more backups after 1hr #9730

Closed
kciredor opened this issue Aug 11, 2020 · 7 comments · Fixed by kopeio/etcd-manager#338
Labels
area/provider/openstack Issues or PRs related to openstack provider kind/bug Categorizes issue or PR as related to a bug.

Comments

@kciredor

kciredor commented Aug 11, 2020

Hi,

With reference to etcd-manager issue: kopeio/etcd-manager#337

1. What kops version are you running? The command kops version, will display
this information.

1.17.9

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

1.17.1

3. What cloud provider are you using?

OpenStack

4. What commands did you run? What is the simplest way to reproduce this issue?

Create a Kubernetes cluster on OpenStack with kops, using the defaults ('the usual').

5. What happened after the commands executed?

The cluster is created, and the kops Swift container receives etcd-manager backups every 15 minutes. After 1 hour, backups stop coming in. Tailing the etcd-manager-* pods shows a large number of "Authentication failed" responses on read calls to the kops Swift container.

6. What did you expect to happen?

etcd-manager backups keep coming in after 1 hour as well.

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

Defaults, no changes.

8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.

N/A

9. Anything else we need to know?

OpenStack authentication tokens are usually set to expire after a certain amount of time. This is when you should 'reauth'. For my OpenStack cloud provider this expiry value is set to 60 minutes. That's how I confirmed this as root cause.

I can see reauth logic in the code, but it does not seem to be working properly.

etcd-manager commit 8faecdad725d05f9c7375461cbf4f3dbbec6e527 on Thu Oct 17 2019 "avoid reauthentication loops (#1746)" was pulled in by kops commit c67cdaa6e4a4cfc7eb0e7d1ae2c3920eea5daa97 on Sun May 31 2020 "Update vendored kops version to 1.17.0"

I've tried debugging and patching etcd-manager myself for a while now, for example by forcing NewSwiftClient on every Swift call, or by forcing Reauth to happen all the time.

Not exactly sure where to file the bug report, so I'll update this issue with a reference to etcd-manager's issue and back as well. At least this kops issue can be used to track etcd-manager fixes so they can be pulled in.

Any thoughts?

@kciredor
Author

Perhaps @justinsb knows how to proceed (because of his commits on both the kops OpenStack code and etcd-manager)? 👍

@olemarkus
Member

Unfortunately, I don't have swift available, but I suspect it is a matter of adding authOption.AllowReauth = true after this line.
https://github.com/kubernetes/kops/blob/master/util/pkg/vfs/swiftfs.go#L49

Then etcd-manager needs to pull in that update on its side. But it is fairly trivial to push a custom etcd-manager image if you are able to make that change and build kops master + etcd-manager yourself.

@olemarkus
Member

/kind bug
/area provider/openstack

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. area/provider/openstack Issues or PRs related to openstack provider labels Aug 11, 2020
@kciredor
Author

Not exactly sure why my attempts at fixing this failed and yours works, but, yours works great @olemarkus ;-) 👍 Backups keep coming in now.

Btw.. It's not exactly easy to run a custom etcd-manager: the kops cluster spec does not allow swapping the image, because the template is hardcoded in kops. This means you have to run a custom kops and roll the masters.

@justinsb
Member

I merged the etcd-manager patch, but I realized we should have fixed the code here first, because it is vendored into etcd-manager. I think the problem arises when we configure authentication from env variables. I sent #9836

@olemarkus
Member

This has been fixed for the master branch
/close

@k8s-ci-robot
Contributor

@olemarkus: Closing this issue.

In response to this:

This has been fixed for the master branch
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
