rebase on upstream 1.4.0 or cherry-pick important fixes? #95

Open · jhoblitt opened this issue Jan 9, 2019 · 3 comments
jhoblitt commented Jan 9, 2019

The kubernetes_deployment resource is fairly painful to use without hashicorp#210, and requires running tf multiple times for a deployment to converge. The fix is part of the upstream 1.4.0 release. Are there plans to rebase this provider on upstream "soonish", or is it preferred to cherry-pick/back-port critical fixes from upstream?

sl1pm4t (Owner) commented Jan 9, 2019

Hi @jhoblitt, at this stage I don't intend to rebase this provider on upstream. This fork has diverged considerably from upstream, and I don't see the reconciliation effort as worthwhile right now. I'm optimistic upstream will catch up with the features in this provider and this fork can be abandoned.
So, for the time being, cherry-picking fixes from upstream will be the way to go.

Also, I'm curious: how does issue hashicorp#210 manifest? In the 1.5 years of using the kubernetes_deployment resource with this provider I've not seen the kind of issue you describe. Our deploys all work in a single apply.

jhoblitt (Author) commented Jan 9, 2019

@sl1pm4t I've also been hoping that upstream will pick up most of the additional resource types, but I'm in a bind: I need the ingress type and am trying to avoid maintaining an internal fork.

In fairness, I haven't yet tried to cherry-pick terraform-providers#210 to see if it resolves the problem I'm seeing, but the problem definitely isn't present with upstream 1.4.0 (note that switching between this fork and upstream also requires a minor change to the deployment syntax, which is a frustration).

An example of a failure is using a module to install tiller for the helm provider and then trying to use helm resources. This will fail on at least the first tf run, as the helm provider tries to talk to the tiller pod before the replica set/pods have finished provisioning. If the docker image pull is slow or the k8s cluster is busy, sometimes even a second tf run is too fast and fails again. This is on top of a strange error from the kubernetes_deployment resource itself, even though the deployment is properly created. E.g.:

https://github.com/lsst-sqre/terraform-gitlfs/blob/800eae562de6f698936f5d5498ee01dfd55bb822/tf/main.tf#L59-L78

module "tiller" {
  source = "git::https://github.com/lsst-sqre/terraform-tinfoil-tiller.git//?ref=sl1pm4t-1.3.0"

  namespace       = "kube-system"
  service_account = "tiller"
  tiller_image    = "gcr.io/kubernetes-helm/tiller:v2.11.0"
}

provider "helm" {
  version = "~> 0.7.0"

  service_account = "${module.tiller.service_account}"
  namespace       = "${module.tiller.namespace}"
  install_tiller  = false

  kubernetes {
    host                   = "${module.gke.host}"
    cluster_ca_certificate = "${base64decode(module.gke.cluster_ca_certificate)}"
  }
}

First run error on a 1.11.5-gke.5 cluster:

```
Error: Error applying plan:

2 error(s) occurred:

* module.tiller.kubernetes_deployment.tiller_deploy: 1 error(s) occurred:

* kubernetes_deployment.tiller_deploy: an error on the server ("service unavailable") has prevented the request from succeeding
* module.nginx_ingress.helm_release.nginx_ingress: 1 error(s) occurred:

* helm_release.nginx_ingress: error creating tunnel: "could not find tiller"

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.


[terragrunt] 2019/01/09 11:16:48 Detected 1 Hooks
[terragrunt] 2019/01/09 11:16:48 Hit multiple errors:
exit status 1
```
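
A partial mitigation (a sketch, not from this thread; it assumes kubectl is available wherever Terraform runs and that tiller's deployment is named tiller-deploy, the helm default) is an explicit rollout wait that downstream helm releases can depend on:

```hcl
# Sketch only: block until the tiller deployment finishes rolling out.
resource "null_resource" "wait_for_tiller" {
  # Interpolating a module output creates the dependency on the tiller
  # module (Terraform 0.11 has no depends_on for modules).
  triggers = {
    tiller_service_account = "${module.tiller.service_account}"
  }

  provisioner "local-exec" {
    command = "kubectl --namespace ${module.tiller.namespace} rollout status deployment/tiller-deploy"
  }
}
```

Each helm_release would then declare depends_on = ["null_resource.wait_for_tiller"]. Note this only sequences resources; it cannot delay the helm provider's own configuration, which is the limitation raised below.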

jimmiebtlr commented
Fairly sure that's a Terraform limitation: providers can't take data from resources and work on the first run.
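
Given that limitation, one common workaround (a sketch; assumes the module layout from the config above) is to split the run with a targeted apply, so tiller exists before anything configures the helm provider:

```sh
# First create tiller, then run the full apply against a live tiller.
terraform apply -target=module.tiller
terraform apply
```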
