Cannot use terraform and gossip-based cluster at the same time #2990
Comments
I can reproduce the problem using kops 1.7.5
If you run
I also had the same reported issue. I took @pastjean's advice and re-ran
Never mind. I have to create the cluster with
Closing!
The bug is still valid for me, and @pastjean's solution is not working for me. I'm using an S3 remote store; here are my versions:
To reproduce, I follow the same steps @simnalamburt reported. I then run
Checking the S3 store, in the folder
Here is the update command output:
As you can see, the log reports that the certificate is generated. I've tried doing a
@sybeck2k, we have also experienced this issue as of a few hours ago. You will need to run
It is still a workaround solution atm, but we have found it to be repeatably successful.
This should be fixed in master, if someone wants to test master or wait for the 1.8 beta release.
@jlaswell thanks a lot! I can confirm your workaround works for kops 1.7.1.
Not sure about what is used when. I would bet that looking through some of the source code is best for that, but I do know that you can look in the S3 bucket used for the state store if you are using AWS. We've perused that a few times to get an understanding.
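For anyone wanting to do the same, here is a minimal sketch of browsing the state store; the bucket and cluster names are taken from the reproduction steps at the bottom of this issue, and the pki/ prefix assumes the default state-store layout where kops keeps its issued certificates:

```sh
# List the PKI objects kops has stored for the cluster (names are the OP's
# examples; substitute your own bucket and cluster name).
aws s3 ls --recursive s3://kops-temp/kops-temp.k8s.local/pki/
```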
@chrislovecnm I can still reproduce this in
@shashanktomar I would assume the workflow is
What does the rolling update show? It would be a bug if the update does not create the same hash in the tf code as we do in the direct target code path.
@chrislovecnm I can reproduce this in
Here is the rolling update output:
I reproduced this in
I can confirm that running the following fixed it:
More detail please.
I am having the same problem here with (see version info below); the workaround does indeed work, but it takes way too long to complete. It would be great if this could be resolved.
kops version
As above, this is still broken in:
Generally, as I understand it, the workaround flow is:
The fix using rolling-update did not work for me.
@mosho1 did you export the config?
@mbolek yeah, I did, though I have already brought down that cluster and used
FYI: still broken in 1.9.0
In 1.9.1 too. I am running a gossip-based cluster ( and was able to work around this issue by following the comments above.

```sh
# assume that you have already applied terraform once and the ELB for the kube API has been generated on AWS
# make sure that you export kubecfg before applying terraform so that the LC is configured with the exported cfg
kops export kubecfg --name $NAME
kops update cluster $NAME --target=terraform --out=.
terraform plan
terraform apply
kops rolling-update cluster $NAME --cloudonly --force --yes
```

In case of continuous failures you might add
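Once the rolling update finishes, a quick sanity check that the new certificate is accepted might look like this (a sketch, assuming kubectl is using the config exported above):

```sh
# Both commands talk to the API through the ELB and will fail on a bad certificate.
kops validate cluster --name $NAME
kubectl get nodes
```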
Who wants to do a rolling-update straight after provisioning a cluster? kops should provision the correct server entries in the kubectl config file in the first place. Given that kops creates a DNS entry just fine with a sensible name, e.g. api.cluster.mydomain.net (as an alias record to the ELB/ALB), why isn't kops export kubecfg using the alias record in the server field instead of the ELB? This alias record is already in the certificate, as the OP says, and if kops generates a kubectl config entry using server: https://[alias record], then it works just fine, and no rolling updates or post-shenanigans are needed. This should work out of the box.
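For reference, the manual fix described in the comment above can be done with kubectl itself; this is only a sketch, and the cluster entry name and alias record are illustrative placeholders, not values kops guarantees:

```sh
# Point the kubeconfig cluster entry at the Route 53 alias record instead of
# the raw ELB hostname (placeholder names).
kubectl config set-cluster cluster.mydomain.net \
  --server=https://api.cluster.mydomain.net
```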
Ok... so I thought I had something, but it seems the issue persists. You need to export the config to fix the API server endpoint, and you need to roll the master to fix the SSL cert.
Another workaround that does not require waiting to roll the master(s) is to create the ELB, then update the cluster and then do the rest of the terraform apply. Steps are:
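The exact steps were not captured above, but a sketch of that flow might look like the following; the aws_elb resource name is an assumption, so check the kubernetes.tf that kops generated (or `terraform state list`) for the actual name:

```sh
# 1. Create only the API ELB so its DNS name exists before the certificate is issued.
terraform apply -target=aws_elb.api-kops-temp-k8s-local

# 2. Re-run the cluster update now that the ELB exists.
kops update cluster $NAME --target=terraform --out=.

# 3. Apply the rest of the plan.
terraform apply
```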
@mbolek the issue indeed persists
@fejta-bot: Closing this issue.
Still seeing this issue. kops version 1.18.2
Hi team, I am still getting the certificate issue. I am using kops version v1.19.
Currently experiencing this. Will try using
Tried
Still capturing the
/reopen
@kmichailg: You can't reopen an issue/PR unless you authored it or you are a collaborator.
/reopen
@olemarkus: Reopened this issue.
Rotten issues close after 30d of inactivity.
@fejta-bot: Closing this issue.
Tested with
Still capturing the .k8s.local name instead of the correct ELB address. The workarounds don't seem to work.
Tried re-exporting the ELB endpoint:
Doing this makes the master node seem to be stuck on the
Also tried creating the gateway and ELB first before using
I am using
Still experiencing this with gossip-based clusters. Abandoning the infrastructure-as-code approach (via terraform) for now; will just deploy via kops only. Hopefully you reopen this for tracking. Thank you!
Issue still persists. Great feature, but not usable at the moment.
/reopen
@alen-z: You can't reopen an issue/PR unless you authored it or you are a collaborator.
/remove-lifecycle rotten
Issues go stale after 90d of inactivity.
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
/close
@k8s-triage-robot: Closing this issue.
If you create a cluster with both the terraform and gossip options enabled, all kubectl commands will fail.

How to reproduce the error
My environment
Setting up the cluster
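The original commands did not survive the formatting here; roughly, the setup being described is a gossip cluster (name ending in .k8s.local) generated as terraform code instead of being applied directly. A sketch, with the zone chosen arbitrarily and the names matching the export command shown further down:

```sh
export KOPS_STATE_STORE=s3://kops-temp
export NAME=kops-temp.k8s.local

# Generate terraform code for a gossip cluster instead of applying directly.
kops create cluster --zones us-east-1a --target=terraform --out=. $NAME
terraform init
terraform plan
terraform apply
```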
Spoiler alert: creating the self-signed certificate before creating the actual Kubernetes cluster is the root cause of this issue. Please read on to see why.
Scenario 1. Looking up non-existent domain
This is basically because of an erroneous ~/.kube/config file. If you run kops create cluster with both the terraform and gossip options enabled, you'll get a wrong ~/.kube/config file.

Let's manually correct that file. Alternatively, you'll get a good config file if you explicitly export the configuration once again:
```sh
kops export kubecfg kops-temp.k8s.local --state s3://kops-temp
```
Then the non-existent domain will be replaced with the DNS name of the master nodes' ELB, and you'll end up in scenario 2 when you retry.
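A quick way to confirm which endpoint the kubeconfig now points at (plain kubectl, nothing kops-specific):

```sh
# Print the API server URL of the currently selected context.
kubectl config view --minify --output 'jsonpath={.clusters[0].cluster.server}'
```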
Scenario 2. Invalid certificate
This is simply because the DNS name of the ELB is not included in the certificate. This scenario occurs only when you create the cluster with the terraform option enabled. If you create the cluster with only the gossip option, not using the terraform target, the self-signed certificate will properly contain the DNS name of the ELB.
(Sorry for the Korean; this is the list of DNS alternative names in the certificate.)
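If you want to check the SANs yourself, the certificate the API server presents can be inspected with openssl; the ELB hostname below is a placeholder:

```sh
# Dump the Subject Alternative Names of the certificate served on port 443.
openssl s_client -connect <api-elb-dns-name>:443 </dev/null 2>/dev/null \
  | openssl x509 -noout -text \
  | grep -A1 'Subject Alternative Name'
```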
The only way to work around this problem is to force "kops-temp.k8s.local" to point to the proper IP address by manually editing /etc/hosts, which is undesirable for many people.

I'm not very familiar with kops internals, but I expect a huge change is needed to properly fix this issue. Maybe using AWS Certificate Manager could be a solution (#834). Any ideas?
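For completeness, the /etc/hosts hack mentioned above amounts to something like this; the IP and the exact hostname to pin are placeholders and depend on what your kubeconfig points at:

```sh
# Pin the gossip name to a master's public IP (placeholders only).
echo '203.0.113.10  api.kops-temp.k8s.local kops-temp.k8s.local' | sudo tee -a /etc/hosts
```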