
Unable to deploy a kubernetes manifest when relying on another resource #1380

Quintasan opened this issue Aug 23, 2021 · 31 comments

@Quintasan

Since https://github.com/hashicorp/terraform-provider-kubernetes-alpha was archived and we can no longer comment on hashicorp/terraform-provider-kubernetes-alpha#123, I'm cross-posting it here so it doesn't get forgotten.

Quintasan added the bug label Aug 23, 2021
@txomon

txomon commented Sep 14, 2021

Can you clarify whether you mean using a depends_on dependency, as opposed to just referencing another resource for values?

@dirien

dirien commented Sep 28, 2021

@Quintasan, same here with the cert-manager helm chart and the kubernetes_manifest ClusterIssuer.

@Raclaw-jl

Same thing with any CRD.

If you run your Terraform code from scratch (with no resources existing yet) and you want your ClusterIssuer to depend on the helm_release or null_resource that installs said CRD, you won't be able to plan or apply anything.

The point of using the kubernetes_manifest resource, to me, is to handle CRDs. It would be nice to have a mechanism that lets Terraform check the validity of the kubernetes_manifest but not the existence of the CRD when it relies on the application of another resource (cert-manager) to install the CRDs. Even nicer would be to throw a warning in that case.

Right now the other possibility is to handle the deployment of a stack in two steps (see the sketch below):

  1. install your CRDs
  2. install whatever runs on those CRDs

Not ideal, but it works.
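
For what it's worth, a minimal sketch of that two-step flow using targeted applies (the resource address is an assumption; adjust it to your configuration):

terraform apply -target=helm_release.cert_manager   # step 1: install the CRDs
terraform apply                                     # step 2: everything that uses them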

@dirien

dirien commented Sep 30, 2021

@Raclaw-jl, agreed! This should be fixed, but Terraform has always had problems with CRD support... 😄

@txomon

txomon commented Oct 1, 2021

This is a Terraform limitation, not specific to Kubernetes.

The limitation comes from not having all the required data at the planning stage. Another example of this limitation would be planning new namespaces in a still-to-be-created k8s cluster.

Edit: the scope of the previously shared discussion didn't fully match this known Terraform limitation.

@mo4islona

mo4islona commented Oct 4, 2021

Hope this helps someone.

A working hack is to use separate modules for the helm release and the CRD-based resources.

./modules/cert-manager/main.tf

resource "helm_release" "cert-manager" {
  name             = "cert-manager"
  repository       = "https://charts.jetstack.io"
  chart            = "cert-manager"
  namespace        = "ingress"
  create_namespace = true
  version          = "1.5.3"
  set {
    name  = "installCRDs"
    value = true
  }

  timeout = 150
}

./modules/certificates/main.tf

resource "kubernetes_manifest" "issuer" {
  manifest = {
    apiVersion = "cert-manager.io/v1"
    kind       = "ClusterIssuer"
   .... 
}    

main.tf

module "cert-manager" {
  source     = "./modules/cert-manager"
}

module "certificates" {
  depends_on = [module.cert-manager]
  source     = "./modules/certificates"
}
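
(Note: depends_on on module blocks requires Terraform 0.13 or later.)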

@Abhishekqwerty

Abhishekqwerty commented Oct 12, 2021

Same for me. Cannot use depends_on with a Kubernetes Terraform resource. Awaiting this feature.

@DaniJG

DaniJG commented Nov 5, 2021

Also facing a similar issue: installing external-secrets-operator and then trying to set up a secret store CRD as part of the cluster bootstrapping.

In case it helps anyone, I ended up using a different workaround. You can wrap the custom resource in a helm_release without creating your own chart. The idea is to leverage an existing chart like itscontained/raw, which lets you define arbitrary YAML as part of the chart values:

# Instead of this ...
resource "kubernetes_manifest" "external_secrets_cluster_store" {
  depends_on = [helm_release.external_secrets]
  manifest = { ... }
}

# ... you can try using this
resource "helm_release" "external_secrets_cluster_store" {
  depends_on = [helm_release.external_secrets]
  name       = "external-secrets-cluster-store"
  repository = "https://charts.itscontained.io"
  chart      = "raw"
  version    = "0.2.5"
  values = [
    <<-EOF
    resources:
      - apiVersion: external-secrets.io/v1alpha1
        kind: ClusterSecretStore
        metadata:
          name: cluster-store
        spec:
          ... further contents of the ClusterSecretStore omitted ...
    EOF
  ]
}

@DBarthe

DBarthe commented Nov 12, 2021

This issue needs to be fixed, but there is a workaround for those interested (mentioned here): use terraform-provider-kubectl, which allows you to apply a YAML file without checking that the type and apiVersion exist during the plan stage.

Example from the mentioned issue:

resource "helm_release" "cert_manager" {
  name       = "cert-manager"
  namespace  = "cert-manager"

  repository = "https://charts.jetstack.io"
  chart      = "cert-manager"
  version    = "v1.2.0"

  create_namespace = true

  values = [
    file("values/cert-manager.yaml")
  ]

  provisioner "local-exec" {
    command = "echo 'Waiting for cert-manager validating webhook to get its CA injected, so we can start to apply custom resources ...' && sleep 60"
  }
}

resource "kubectl_manifest" "cluster_issuer_letsencrypt_prod" {
  depends_on = [ helm_release.cert_manager ]
  yaml_body  = <<YAML
apiVersion: "cert-manager.io/v1"}
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    ...
YAML
}

tgeoghegan added a commit to divviup/prio-server that referenced this issue Jan 5, 2022
The `kubernetes_manifest` resource in provider `hashicorp/kubernetes`
has a known issue[1] where resources created in a manifest can't depend
on other resources that don't exist yet. To work around this, we instead
use `gavinbunney/kubectl`'s `kubectl_manifest` resource, which does not
have this problem because it uses a different mechanism for planning.

[1] hashicorp/terraform-provider-kubernetes#1380

Resolves #1088
@dc232

dc232 commented Jan 20, 2022

Hi all, I'm facing a similar issue when using the depends_on meta-argument with the stated kubernetes provider. My code is as follows:

provider "kubernetes" {
  host                   = var.kube_host
  client_certificate     = base64decode(var.kube_client_certificate)
  client_key             = base64decode(var.kube_client_key)
  cluster_ca_certificate = base64decode(var.kube_cluster_ca_cert)
}
resource "kubernetes_manifest" "letsencrypt_issuer_staging" {

  manifest = yamldecode(templatefile(
    "${path.module}/manifests/letsencrypt-issuer.tpl.yaml",
    {
      "name"                      = "letsencrypt-staging"
      "namespace"                 = kubernetes_namespace.cert.metadata[0].name
      "email"                     = var.cloudflareemail
      "server"                    = "https://acme-staging-v02.api.letsencrypt.org/directory"
      "api_token_secret_name"     = kubernetes_secret_v1.example.metadata[0].name #this will be gotten from theh azure vault
      "api_token_secret_data_key" = keys(kubernetes_secret_v1.example.data)[0]
    }
  ))

  depends_on = [helm_release.cert_manager]
}

resource "kubernetes_manifest" "letsencrypt_issuer_production" {

  manifest = yamldecode(templatefile(
    "${path.module}/manifests/letsencrypt-issuer.tpl.yaml",
    {
      "name"                      = "letsencrypt-prod"
      "namespace"                 = kubernetes_namespace.cert.metadata[0].name
      "email"                     = var.cloudflareemail
      "server"                    = "https://acme-v02.api.letsencrypt.org/directory"
      "api_token_secret_name"     = kubernetes_secret_v1.example.metadata[0].name #this will be gotten from theh azure vault
      "api_token_secret_data_key" = keys(kubernetes_secret_v1.example.data)[0]
    }
  ))

  depends_on = [helm_release.cert_manager]
}

This seems to result in:

cannot create REST client: no client config

using Terraform 1.1.3.

While the docs do say that a kubeconfig needs to be present to use kubernetes_manifest, I want to understand why this is, as other resources that get deployed to the cluster, such as a storage class or namespace, do not require a kubeconfig; instead the client configuration seems to be derived from the values

  host                   = var.kube_host
  client_certificate     = base64decode(var.kube_client_certificate)
  client_key             = base64decode(var.kube_client_key)
  cluster_ca_certificate = base64decode(var.kube_cluster_ca_cert)

As a result, I'm a little bit baffled by the dependency on the kubeconfig. Also, I don't think the provided config is compatible with the kubectl_manifest resource. Any help is greatly appreciated.

Doc links referenced:
https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/manifest

@eminwux

eminwux commented May 5, 2022

Hi, I bumped into the same issue while injecting an SSH key into a kubernetes_manifest resource:

resource "tls_private_key" "ssh_key" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

resource "kubernetes_manifest" "jenkins_agent_sts" {
  depends_on = [
    tls_private_key.ssh_key
  ]
  manifest = yamldecode(templatefile(
    "${path.module}/manifests/jenkins_agent_sts.tpl.yaml",
    {
      "namespace"           = var.namespace
      "image"               = var.agent_image
      "tag"                 = var.agent_image_tag
      "ssh_public_key"      = tls_private_key.ssh_key.public_key_openssh
    }
  ))
}

The error I get:

Error: Failed to determine GroupVersionResource for manifest

  with kubernetes_manifest.jenkins_agent_sts,
  on terraform.tf line 146, in resource "kubernetes_manifest" "jenkins_agent_sts":
 146: resource "kubernetes_manifest" "jenkins_agent_sts" {

unmarshaling unknown values is not supported

@dc232

dc232 commented May 5, 2022

@eminwux
It looks like the error comes from the template file being decoded; the manifest being applied needs to be in YAML format, so instead of

resource "tls_private_key" "ssh_key" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

resource "kubernetes_manifest" "jenkins_agent_sts" {
  depends_on = [
    tls_private_key.ssh_key
  ]
  manifest = yamldecode(templatefile(
    "${path.module}/manifests/jenkins_agent_sts.tpl.yaml",
    {
      "namespace"           = var.namespace
      "image"               = var.agent_image
      "tag"                 = var.agent_image_tag
      "ssh_public_key"      = tls_private_key.ssh_key.public_key_openssh
    }
  ))
}

try

resource "tls_private_key" "ssh_key" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

resource "kubernetes_manifest" "jenkins_agent_sts" {
  depends_on = [
    tls_private_key.ssh_key
  ]
  manifest = templatefile(
    "${path.module}/manifests/jenkins_agent_sts.tpl.yaml",
    {
      "namespace"           = var.namespace
      "image"               = var.agent_image
      "tag"                 = var.agent_image_tag
      "ssh_public_key"      = tls_private_key.ssh_key.public_key_openssh
    }
  )
}

The above should produce the result; if not, see below.

You could also use the templatefile function in Terraform; see
https://www.terraform.io/language/functions/templatefile
for more details on integrating it with the existing resource definition posed above.

An alternative can be found below, where the data is rendered in memory:

resource "tls_private_key" "ssh_key" {
  algorithm = "RSA"
  rsa_bits  = 4096
}
data "template_file" "jenkins_agent"{
  template = file("${path.module}/manifests/jenkins_agent_sts.tpl.yaml")
  vars = {
      "namespace"                      = "${var.namespace}"
      "image"                 = var.agent_image
      "tag"                     =  var.agent_image_tag
      "ssh_public_key"     = tls_private_key.ssh_key.public_key_openssh
    }
}
resource "kubernetes_manifest" "jenkins_agent_sts" {
  depends_on = [
    tls_private_key.ssh_key
  ]
  manifest = data.template_file.jenkins_agent.rendered
    }
  ))
}

@eminwux

eminwux commented May 5, 2022

@dc232
Thanks for your reply. I have tried both options, and they return the same error: unmarshaling unknown values is not supported.

Removing yamldecode():

 Error: Failed to determine GroupVersionResource for manifest
 
   with kubernetes_manifest.jenkins_agent_sts,
   on terraform.tf line 163, in resource "kubernetes_manifest" "jenkins_agent_sts":
  163: resource "kubernetes_manifest" "jenkins_agent_sts" {
 
 unmarshaling unknown values is not supported

Using template_file resource:

 Error: Failed to determine GroupVersionResource for manifest
 
   with kubernetes_manifest.jenkins_agent_sts,
   on terraform.tf line 156, in resource "kubernetes_manifest" "jenkins_agent_sts":
  156: resource "kubernetes_manifest" "jenkins_agent_sts" {
 
 unmarshaling unknown values is not supported

The only way I can get it to work is by defining the manifest in HCL instead of using the YAML template. There is a really useful tool to convert a YAML manifest to HCL, k2tf: https://github.com/sl1pm4t/k2tf
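
For illustration, a minimal sketch of the HCL form (the kind and field values are hypothetical placeholders, not my actual manifest); since apiVersion and kind are known at plan time, only the unknown leaf values remain unresolved:

resource "kubernetes_manifest" "jenkins_agent_sts" {
  manifest = {
    apiVersion = "apps/v1"
    kind       = "StatefulSet"
    metadata = {
      name      = "jenkins-agent"
      namespace = var.namespace
    }
    spec = {
      # ... pod template referencing var.agent_image, var.agent_image_tag
      # and tls_private_key.ssh_key.public_key_openssh goes here ...
    }
  }
}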

@patsevanton

Is this issue on the roadmap?

@winston0410

This issue still exists

@alexsomesan
Member

Can any of the recent reporters please provide an example that causes this issue?

@dm3ch

dm3ch commented Aug 25, 2022

Can any of the recent reporters please provide an example that causes this issue?

Unfortunately, I have no example in my saved snippets, but I think I still remember why it happens.

If I remember correctly, to reproduce this you need two resources in your Terraform project (see the sketch after this list):

  1. a CRD definition (could be a kubernetes_manifest or part of a helm release)
  2. a kubernetes_manifest creating a manifest of the CRD type defined in the previous point

If I understand correctly, the problem is that when Terraform refreshes state, it tries to query Kubernetes to check whether the CR (2nd point) exists. But k8s returns an error, because the CRD itself wasn't created yet (so the API for this CRD does not exist yet).

P.S. Sorry for not providing an actual reproduction snippet, but I have no cluster to reproduce this on right now.
P.P.S. I just checked the initial issue (it has a reproduction snippet), hashicorp/terraform-provider-kubernetes-alpha#123, and it seems that maybe I misunderstood this issue, or maybe my case is just one of the consequent problems.
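
A minimal sketch of that shape, using the well-known CronTab example CRD (illustrative names; not a confirmed reproduction):

resource "kubernetes_manifest" "crontab_crd" {
  manifest = yamldecode(file("${path.module}/crontab-crd.yaml"))
}

resource "kubernetes_manifest" "crontab" {
  depends_on = [kubernetes_manifest.crontab_crd]
  manifest = {
    apiVersion = "stable.example.com/v1"
    kind       = "CronTab"
    metadata = {
      name      = "example"
      namespace = "default"
    }
    spec = {
      cronSpec = "* * * * */5"
    }
  }
}

# Planning from scratch fails: the provider asks the cluster for the
# CronTab schema, which doesn't exist until the CRD has been applied.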

@winston0410

@alexsomesan

This module does not work if kubernetes_manifest is used instead of kubectl_manifest:

terraform {
  required_providers {
    kubernetes = {
      source = "registry.terraform.io/hashicorp/kubernetes"
      version = "2.12.1"
    }
    helm = {
      source = "hashicorp/helm"
      version = "2.6.0"
    }
    kubectl = {
      source  = "gavinbunney/kubectl"
      version = "1.14.0"
    }
  }
}

variable "namespace" {
  type = string
  description = "k8s namespace used in this module"
}

variable "email" {
  type        = string
  description = "Email address that Let's Encrypt will use to send notifications about expiring certificates and account-related issues to."
  sensitive   = true
}

variable "api_token" {
  type        = string
  description = "API Token for Cloudflare"
  sensitive   = true
}

resource "helm_release" "cert_manager" {
  name       = "cert-manager"
  namespace = var.namespace
  repository = "https://charts.jetstack.io"
  chart      = "cert-manager"
  version = "1.9.1"
  
  set {
    name  = "installCRDs"
    value = "true"
  }
}

# Make the API Token a secret available globally
resource "kubernetes_secret_v1" "letsencrypt_cloudflare_api_token_secret" {
  metadata {
    name      = "letsencrypt-cloudflare-api-token-secret"
    namespace = var.namespace
  }

  data = {
    "api-token" = var.api_token
  }
}

resource "kubectl_manifest" "letsencrypt_issuer_staging" {
  yaml_body = templatefile(
    "${path.module}/letsencrypt-issuer.tpl.yaml",
    {
      "name"                      = "letsencrypt-staging"
      "email"                     = var.email
      "server"                    = "https://acme-staging-v02.api.letsencrypt.org/directory"
      "api_token_secret_name"     = kubernetes_secret_v1.letsencrypt_cloudflare_api_token_secret.metadata.0.name
      "api_token_secret_data_key" = keys(kubernetes_secret_v1.letsencrypt_cloudflare_api_token_secret.data).0
    }
  )

  depends_on = [
    # Need to install the CRDs first
    helm_release.cert_manager
  ]
}

resource "kubectl_manifest" "letsencrypt_issuer_production" {
  yaml_body = templatefile(
    "${path.module}/letsencrypt-issuer.tpl.yaml",
    {
      "name"                      = "letsencrypt-production"
      "email"                     = var.email
      "server"                    = "https://acme-v02.api.letsencrypt.org/directory"
      "api_token_secret_name"     = kubernetes_secret_v1.letsencrypt_cloudflare_api_token_secret.metadata.0.name
      "api_token_secret_data_key" = keys(kubernetes_secret_v1.letsencrypt_cloudflare_api_token_secret.data).0
    }
  )

  depends_on = [
    # Need to install the CRDs first
    helm_release.cert_manager
  ]
}

@alexsomesan
Member

@winston0410 thanks a lot for sharing the example!
I'll have a go at running it on my side, but one obvious thing already pops up: if the "cert_manager" helm_release resource installs CRDs, then you cannot have CRs based on those CRDs managed as manifests in the same apply operation. This is a known limitation of the provider, and it has to do with needing access to those CRDs' schemas at planning time (when they would in fact not be present if applied at the same time as the CR manifests).
The workaround for that is to split the operation into two applies, where the first one installs the CRDs and anything else other than CRs, and the second apply deploys the CRs.
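
One way to lay out that split on disk (directory names are purely illustrative):

# operators/  -> helm_release resources that install the CRDs
# resources/  -> kubernetes_manifest CRs that need those CRDs
cd operators    && terraform init && terraform apply
cd ../resources && terraform init && terraform apply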

@Blunderchips

Are there any plans to resolve this issue?

I am running into the same issue as described by @alexsomesan but, unfortunately, don't have the option to run multiple apply operations. One of the main reasons for going with Terraform for our k8s setup was having a single tool for both cloud and cluster setup.

@txomon

txomon commented Feb 13, 2023

@Blunderchips not being able to do multi-stage applies is a problem you will run into in many cases, such as when you provision a cluster through Google Cloud and then want to install something on it. Terraform just can't compute the final state, and that's the main reason for multi-stage applies. I am running a setup like the one you mention and it works wonders.

The reason the kubectl provider works is that it doesn't run any server-side checks to make sure the plan is correct. A possible suggestion would be to optionally disable validation; however, I doubt this will be prioritized, because the main limitation is not the provider but rather the fact that a multi-stage apply is needed.
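
For what it's worth, the gavinbunney/kubectl provider exposes a per-resource knob in that spirit; a sketch (check the provider docs for the exact semantics):

resource "kubectl_manifest" "example" {
  yaml_body       = file("${path.module}/manifests/example-cr.yaml")
  validate_schema = false # mimics kubectl apply --validate=false
}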

@robertobado

Also facing a similar issue, installing external-secrets-operator and then trying to set up a secret store CRD as part of the cluster bootstrapping. [...] (quoting @DaniJG's itscontained/raw workaround above)

I tried to migrate to kubernetes_manifest after kubectl_manifest started behaving flakily and producing inconsistent results when provisioning a ClusterIssuer for cert-manager. This is the only workaround I could find that doesn't require a separate run context. The itscontained chart is no longer available; I replaced it with https://artifacthub.io/packages/helm/wikimedia/raw
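
A sketch of the swap, assuming the wikimedia raw chart accepts the same resources: values format as itscontained/raw (the repository URL is also an assumption; verify both against the chart's docs):

resource "helm_release" "external_secrets_cluster_store" {
  depends_on = [helm_release.external_secrets]
  name       = "external-secrets-cluster-store"
  repository = "https://helm-charts.wikimedia.org/stable" # assumed repo URL
  chart      = "raw"
  values = [
    <<-EOF
    resources:
      - apiVersion: external-secrets.io/v1alpha1
        kind: ClusterSecretStore
        metadata:
          name: cluster-store
        spec:
          ... contents omitted, as above ...
    EOF
  ]
}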

@mloskot

mloskot commented Nov 27, 2023

Regarding the suggestions in #1380 (comment) and #1380 (comment), @DaniJG posted a nice self-contained explanation on Medium at Avoid the Terraform kubernetes_manifest resource.

@sharebear

Unfortunately, the kubectl_manifest resource seems to be broken for Kubernetes 1.27+ (gavinbunney/terraform-provider-kubectl#270), leaving itscontained/raw as the only good solution right now.


@moss2k13

We're using this one, from dysnix:
https://artifacthub.io/packages/helm/dysnix/raw

@faust64

faust64 commented Jan 10, 2024

Unfortunately the kubectl_manifest resource seems to be broken for Kubernetes 1.27+ gavinbunney/terraform-provider-kubectl#270 leaving itscontained/raw as the only good solution right now.

kubectl_manifest doesn't look maintained, indeed.
And having tried out the helm_release approach, there are multiple caveats to that solution:

- When an apply fails for some reason, the object gets tainted. The next plan/apply re-creates objects: that is, deletes everything, then creates everything.
- Plan doesn't show anything unless you enable the manifest experiment at the provider level, which mostly shows you helm template output. A good start, but it doesn't seem to validate objects against the API, isn't aware of mutations, ...
- In general, even for a couple of files with a single changed input, plan is so slooow, and apply is worse: a single chart with 3 objects takes 5 minutes to apply. With kubernetes_manifest + tf templating, less than a second.
- Having applied a helm_release with Terraform, try this: edit the objects on the cluster you manage, outside of Terraform state, then run a tf plan. "No changes". Hooray ...

At which point, that helm provider is pretty much the worst thing I've ever used for managing Kubernetes. And my company did write their own ungodly ansible-playbooks-wrapped-in-go Terraform provider, ...

How come we don't have a single viable, feature-complete Terraform provider for managing Kubernetes?!

@tonybaltazar

My issue is not related to having a CRD dependency, but rather to a simple variable interpolation within the yamldecode()-decoded text in the manifest argument of kubernetes_manifest. It seems like the resource code for kubernetes_manifest needs safeguards to properly handle Terraform plans that rely on other resources that haven't been created yet.

@tonybaltazar

As mentioned in #1380 (comment), it seems that removing yamldecode() completely and using HCL instead of YAML for the Kubernetes manifest works perfectly fine in our case. This is a good workaround for the issue most people are having here.

@gagbo

gagbo commented Jun 25, 2024

Working hack is to use different modules for helm release and resources based on CRD

I just tried this (maybe incorrectly?), and it didn't work for me; I guess that's what the thumbs-down reactions meant, but I wasn't sure. It really feels bad to need an extra workspace/module just to get the CRD applied before being able to use it.

This one, though, seems to pass the planning phase (but I have other demons to fight before being sure it also applies).

@mloskot

mloskot commented Jun 25, 2024

@gagbo

This one though seems to pass the planning phase (but I have other demons to fight before being sure it also applies)

Yes, that one works; it works for me, as I confirmed in #1380 (comment). See also my #1380 (comment) on @DaniJG's solution.
