azurerm_container_app - Cannot deploy container with ingress enabled #20435

Closed
btbaetwork opened this issue Feb 13, 2023 · 37 comments · Fixed by #24042

Comments

@btbaetwork

Is there an existing issue for this?

  • I have searched the existing issues

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

1.3.8

AzureRM Provider Version

3.43.0

Affected Resource(s)/Data Source(s)

azurerm_container_app

Terraform Configuration Files

# this is just the example script taken from https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/container_app with the "ingress"-block added
terraform {
  required_providers {
  }
}
provider "azurerm" {
  features {}
}


resource "azurerm_resource_group" "example" {
  name     = "example-resources"
  location = "West Europe"
}

resource "azurerm_log_analytics_workspace" "example" {
  name                = "acctest-01"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  sku                 = "PerGB2018"
  retention_in_days   = 30
}

resource "azurerm_container_app_environment" "example" {
  name                       = "Example-Environment"
  location                   = azurerm_resource_group.example.location
  resource_group_name        = azurerm_resource_group.example.name
  log_analytics_workspace_id = azurerm_log_analytics_workspace.example.id
}


resource "azurerm_container_app" "example" {
  name                         = "example-app"
  container_app_environment_id = azurerm_container_app_environment.example.id
  resource_group_name          = azurerm_resource_group.example.name
  revision_mode                = "Single"

  template {
    container {
      name   = "examplecontainerapp"
      image  = "mcr.microsoft.com/azuredocs/containerapps-helloworld:latest"
      cpu    = 0.25
      memory = "0.5Gi"
    }
  }

  ingress {
    external_enabled = true
    target_port      = 80
    traffic_weight {
      percentage = 100
    }
  }
}

Debug Output/Panic Output

azurerm_container_app.example: Still creating... [9m50s elapsed]
azurerm_container_app.example: Still creating... [10m0s elapsed]
azurerm_container_app.example: Still creating... [10m10s elapsed]
azurerm_container_app.example: Still creating... [10m20s elapsed]
╷
│ Error: creating Container App (Subscription: "7eac1563-570f-4dc5-bcc0-2f057ad0cff0"
│ Resource Group Name: "example-resources"
│ Container App Name: "example-app"): polling after CreateOrUpdate: Code="ContainerAppOperationError" Message="Failed to provision revision for container app 'example-app'. Error details: Operation expired."
│
│   with azurerm_container_app.example,
│   on main.tf line 31, in resource "azurerm_container_app" "example":
│   31: resource "azurerm_container_app" "example" {
│
│ creating Container App (Subscription: "7eac1563-570f-4dc5-bcc0-2f057ad0cff0"
│ Resource Group Name: "example-resources"
│ Container App Name: "example-app"): polling after CreateOrUpdate: Code="ContainerAppOperationError" Message="Failed to provision revision for container app 'example-app'. Error details: Operation expired."

Expected Behaviour

Container App should be successfully provisioned and be publicly reachable using the FQDN

Actual Behaviour

  • Creation of the Container App times out after ~10 minutes.
  • Looking at the Portal, the Container App is created but no container is inside.
  • Terraform is left in a state where it doesn't know about the already created Container App and can no longer even destroy it (manual cleanup necessary)
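
As an alternative to manual cleanup, an `import` block might bring the orphaned app back under Terraform's control. This is only a sketch under several assumptions: `import` blocks need Terraform 1.5 or later (newer than the 1.3.8 used in this report), the ID is built from the subscription, resource group, and app name in the error output using the usual Microsoft.App/containerApps layout, and whether an app with a failed revision imports cleanly has not been verified here.

```hcl
# Assumes Terraform >= 1.5. On the next plan/apply, Terraform adopts the
# existing app into state instead of trying to create it again.
import {
  to = azurerm_container_app.example
  id = "/subscriptions/7eac1563-570f-4dc5-bcc0-2f057ad0cff0/resourceGroups/example-resources/providers/Microsoft.App/containerApps/example-app"
}
```

Once the resource is in state, `terraform destroy` should be able to remove it, and the `import` block can be deleted.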

Steps to Reproduce

just apply the config above and wait for it to timeout :-)

Important Factoids

No response

References

No response

@btbaetwork btbaetwork added the bug label Feb 13, 2023
@github-actions github-actions bot removed the bug label Feb 13, 2023
@xiaxyi
Contributor

xiaxyi commented Feb 14, 2023

Thanks @btbaetwork for raising this issue. I need to find out the minimum timeout for environment creation. Will update once confirmed.

@ggeorgovassilis

If I may add another data point: container app env is created after about 12min. Container app creation starts immediately after that. I'm creating one container app with the new resource provider:

resource "azurerm_container_app" "containerapp-helloworld" {
  name                         = "containerapp-helloworld-${var.sfx}"
  container_app_environment_id = azurerm_container_app_environment.containerapp-environment.id
  resource_group_name          = azurerm_resource_group.rgcontainers.name
  revision_mode                = "Single"

  template {
    container {
      name   = "simple-hello-world-container"
      image  = "mcr.microsoft.com/azuredocs/containerapps-helloworld:latest"
      cpu    = 0.25
      memory = "0.5Gi"
    }
    min_replicas = 1
    max_replicas = 1
  }
  ingress {
    external_enabled           = true
    allow_insecure_connections = true
    target_port                = 80
    traffic_weight {
      percentage = 100
    }
  }
}

and one the old azapi way:

resource "azapi_resource" "containerapp-apache" {
  type      = "Microsoft.App/containerapps@2022-03-01"
  name      = "containerapp-apache-${var.sfx}"
  parent_id = azurerm_resource_group.rgcontainers.id
  location  = azurerm_resource_group.rgcontainers.location

  body = jsonencode({
    properties = {
      managedEnvironmentId = azurerm_container_app_environment.containerapp-environment.id
      configuration = {
        ingress = {
          external : true,
          allowInsecure : true,
          targetPort : 80
        },

      }
      template = {
        containers = [
          {
            image = "registry.hub.docker.com/library/httpd:2.4"
            name  = "apache-container"
            resources = {
              cpu    = 0.25
              memory = "0.5Gi"
            }
          }
        ]
        scale = {
          minReplicas = 1,
          maxReplicas = 1
        }
      }
    }

  })
  #  depends_on = [azapi_resource.containerapp-environment]
}

The first fails with a timeout (although the resource is created fine), the second succeeds.

@jpinsolle-bc

Same here. To be more precise, in my case creation without the ingress part works, and updating the container app afterwards to add an ingress works too (no timeout). The timeout occurs when the very first definition you send already includes an ingress.
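
The two-step approach described here could be sketched as follows. The variable `enable_ingress` is a hypothetical toggle introduced only for illustration; the rest mirrors the configuration from the original report. Treat it as a workaround sketch, not a fix.

```hcl
# Hypothetical toggle: apply once with false, then again with true.
variable "enable_ingress" {
  type    = bool
  default = false
}

resource "azurerm_container_app" "example" {
  name                         = "example-app"
  container_app_environment_id = azurerm_container_app_environment.example.id
  resource_group_name          = azurerm_resource_group.example.name
  revision_mode                = "Single"

  template {
    container {
      name   = "examplecontainerapp"
      image  = "mcr.microsoft.com/azuredocs/containerapps-helloworld:latest"
      cpu    = 0.25
      memory = "0.5Gi"
    }
  }

  # The ingress block is only emitted when the toggle is true,
  # so the first apply creates the app without any ingress.
  dynamic "ingress" {
    for_each = var.enable_ingress ? [1] : []
    content {
      external_enabled = true
      target_port      = 80
      traffic_weight {
        percentage = 100
      }
    }
  }
}
```

Usage would be a plain `terraform apply` first, followed by `terraform apply -var="enable_ingress=true"`.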

@btbaetwork
Author

btbaetwork commented Feb 15, 2023

Can confirm what @jpinsolle-betclic wrote. Also these timings may be of interest:

  • creating the Container App Environment (without any app inside): ~ 8 minutes
  • trying to create the Container App inside of the pre-created Environment: times out after ~ 10 minutes (so no "cumulative" timeout with the Environment creation)
  • adding the Container App without an ingress: ~20 seconds
  • adding the ingress to the pre-created Container App: ~20 seconds (so the timeout is not because something really takes that much time)
  • adding a Container App, including the ingress, in one go to the environment (which had been created by the azurerm provider) using the AzAPI provider works and also takes only ~20 seconds

Apart from that I noticed the following (maybe related, maybe a different error):
When the Container App is up and running with the ingress activated (following @jpinsolle-betclic's procedure), every subsequent "terraform apply" updates the existing app even though no changes have been made to the Terraform code:

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # azurerm_container_app.example will be updated in-place
  ~ resource "azurerm_container_app" "example" {
        id                            = "/subscriptions/7eac1563-570f-4dc5-bcc0-2f057ad0cff0/resourceGroups/example-resources/providers/Microsoft.App/containerApps/example-app"
        name                          = "example-app"
        tags                          = {}
        # (8 unchanged attributes hidden)

      ~ ingress {
            # (5 unchanged attributes hidden)

          ~ traffic_weight {
              ~ percentage      = 0 -> 100
                # (1 unchanged attribute hidden)
            }
          - traffic_weight {
              - latest_revision = false -> null
              - percentage      = 100 -> null
              - revision_suffix = "nqhtr2u" -> null
            }
        }

        # (1 unchanged block hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

This is not the case when the AzApi-provider is used

@ggeorgovassilis: you wrote that "The first fails with a timeout (although the resource is created fine)" - is there really a running container inside your Container App that is reachable from "outside"? The resource itself also looks fine to me at first glance, but in reality it doesn't run anything...

@ggeorgovassilis

is there really a running container inside your Container App

@btbaetwork I spoke hastily and with little knowledge. In fact, the container is created but the ingress returns 404.

@dcoj

dcoj commented Feb 16, 2023

I can also confirm the same findings. Creation in two steps (without ingress, then adding it) is fast, ~20-30s in all cases. Creating with ingress enabled leads to a timeout after around 10-15 minutes and a deployment failure, usually with revision status 'unknown', and the whole Azure Container Apps UI is somewhat unresponsive until the app is deleted. On some occasions I had to delete the app via the CLI as the portal wouldn't load the app state.

This was on a new empty container apps env (created ~1hr previous) and the container apps env had VNET integration via the infrastructure_subnet_id property.

@ThijSlim

ThijSlim commented Feb 20, 2023

I got the same issue. With revision mode set to Multiple I do see the container deployed in the Azure Portal, but the revision status is still stuck on "In Progress" and Terraform times out:

resource "azurerm_container_app" "containerapp-helloworld" {
  revision_mode                = "Multiple"
}

@kopaka808

I can observe the same behaviour, thanks to this thread I can at least pinpoint it to the ingress block - I was desperately focusing on the registry block, assuming my ACR is not accessible to pull the image from. Thanks for providing this workaround with the 2-step deployment!

Once I include the ingress block during initial deployment of my container_app resource (as opposed to adding it to an existing container_app resource in a separate run), I run into the same timeout mentioned above (ignoring the settings of the timeout block btw), resulting in a Container App that has no container, is not included in my TF state file and therefore can only be deleted manually from the Azure Portal.

@trichling

trichling commented Feb 24, 2023

Hi all,

I can confirm the bug with the configuration stated above. However I was experimenting with some more properties on the ingress and I could manage to get it working with this ingress block:

  ingress {
    external_enabled           = true
    target_port                = 80
    traffic_weight {
      latest_revision = true
      percentage      = 100
    }
  }

Before this setup I only provided the required parameters, as in @ggeorgovassilis's configuration above. After adding the latest_revision parameter it suddenly started working. Creation was also very fast: 4:50 minutes for the environment and just 16 seconds for each container app.

@andmos

andmos commented Feb 24, 2023

Can also confirm. Creating container_app with ingress block times out, while creating without and then adding the block works.

@ggeorgovassilis

@trichling thanks, that works. @andmos does "... and then adding the block" mean two deployments? For me it worked with a single deployment with the simple change Tobias proposed.

@andmos

andmos commented Feb 24, 2023

@ggeorgovassilis ah did not try with more parameters in the block. Will give it a go, I got it working with a minimal block and two runs of terraform apply.

@dcoj

dcoj commented Feb 26, 2023

can also confirm this works with traffic_weight block set to:

traffic_weight {
  percentage = 100
  latest_revision = true
}

@stewartbeck

I can confirm that I'm seeing the same behavior. From the logs I see that when it tries to assign the traffic weight, it doesn't include the revision hash, so it fails trying to find the revision.

If you remove the ingress it'll succeed in creation. Then you can modify the terraform and put the ingress back and this time it succeeds, but updating the traffic percentage to 100 will still fail.

Here is a screen cap of the logs when it fails:
[image: screenshot of the failure logs, not preserved]

@stewartbeck

Adding a bit more context: Setting LatestRevision = true works and allows it to successfully set the traffic to 100%.

Looking at the code in helpers/container_apps.go, you can see:

if !v.LatestRevision {
	traffic.RevisionName = pointer.To(fmt.Sprintf("%s--%s", appName, v.RevisionSuffix))
}

That is why setting latest revision works. It seems the client isn't correctly populating the RevisionSuffix after it gets generated.

@franhoey

Thank you all, this has moved me on and validated that I'm not going crazy, but after adding latest_revision=false to the traffic_weight, I now get this error:

polling after CreateOrUpdate: Future#WaitForCompletion: context has been cancelled: StatusCode=0 -- Original Error: context deadline exceeded

@franhoey

Today when I've returned to this it's all working, I'm assuming the "context deadline exceeded" was a temporary issue and adding latest_revision = true has solved the issue

jarpsimoes added a commit to jarpsimoes/terraform-provider-azurerm that referenced this issue Mar 28, 2023
@ryantk

ryantk commented Mar 29, 2023

I can confirm setting latest_revision = true fixes the above timeout issue for me.

@jdubois

jdubois commented Mar 29, 2023

I also confirm. If you want a complete working example, you can have a look at this code I just finished: https://github.com/microsoft/NubesGen/blob/099ce8616a1b762ff9d0016fcdb13b7e0037ac47/terraform/modules/container-apps/main.tf#L97

@JeremyKeusters

Can also confirm that this issue is happening because of the ingress block. Big thanks to @trichling for sharing the fix with latest_revision = true.

@simonecoppini

Wow... I have been fighting with this problem for about 4 days now.

I think there are still other problems with this kind of resource. I can confirm everything that has been said, and I'd add a new problem that I thought was caused by VS 2022 but now think is caused by the Terraform module.

I cannot deploy my project to the container app from VS 2022. I get the error:

Failed to push the docker image to your azure container registry for use in your azure container app.

However, the image is correctly pushed to the registry; it is just not used in the container app.

In fact, I can deploy the image to the container registry alone without problems, and I can deploy to a container app I created manually. The only thing I cannot do is deploy to the container app created with Terraform.

Does anyone have the same problem? How did you solve it?

@dhilgarth

@simonecoppini This seems unrelated. Please create a new issue for that

@sbaia13

sbaia13 commented Jun 28, 2023

Hi, I had the same problem recently. This happens when revision_mode is set to Single. This Terraform resource worked for me:

resource "azurerm_container_app" "example" {
  count                        = length(var.environnements-name)
  name                         = "keycloak-${var.environnements-name[count.index]}"
  container_app_environment_id = azurerm_container_app_environment.example.id
  resource_group_name          = azurerm_resource_group.example.name
  # "Single" restricts the deployment to only the latest revision of the container;
  # "Multiple" allows splitting traffic between multiple revisions.
  revision_mode = "Multiple"

  template {
    container {
      # Container definition with environment variables to connect to the Cosmos DB for PostgreSQL
      name   = "keycloak-02"
      image  = "${azurerm_container_registry.example.login_server}/keycloak:latest"
      cpu    = 0.25
      memory = "0.5Gi"
    }

    # Maximum and minimum number of replicas
    max_replicas = 3
    min_replicas = 1
  }

  ingress {
    # Ingress definition with the port used to target the PostgreSQL; revision mode must be set to Multiple
    transport        = "auto"
    target_port      = 8080
    external_enabled = true # true for testing only
    traffic_weight {
      latest_revision = true # If true, traffic is routed to the new revision (this parameter is required to set an ingress)
      percentage      = 100
      # Labels can be used to split traffic between multiple revisions for testing purposes
    }
  }

  depends_on = [azurerm_cosmosdb_postgresql_cluster.example, null_resource.import-image]
}

Hope this will help !

@mpereira-ae

Another confirmation of latest_revision = true in the ingress block making it work, thanks @trichling!

@rcskosir
Contributor

Thanks for taking the time to submit this issue. It looks like this has been resolved by the suggestion of setting latest_revision = true in the ingress block. As such, I am going to mark this issue as closed.

@stewartbeck

This should definitely NOT be closed. Latest_revision should not be required to be true - this is a hack. There is a clear bug in the code where the revision suffix is not getting set correctly.

@rcskosir rcskosir reopened this Jul 19, 2023
@rcskosir
Contributor

Thank you for the clarification @stewartbeck. I have reopened this issue. I appreciate the quick response.

@dvdr00t

dvdr00t commented Jul 28, 2023

Wow! Spent the whole day trying to figure out why the apply took more than 10 minutes and then failed out of nowhere. Many thanks to @trichling for finding that hack! For me (West Europe), adding latest_revision = true as a workaround did the job. Apply time also decreased to ~1 min to create the environment and ~1 min to deploy two different container apps.

Hopefully this gets fixed soon, the documentation marking the parameter as Optional is definitely misleading.

@joeizy

joeizy commented Jul 30, 2023

FWIW - add me to the list of people who had it "working" then added ingress, thought it was fine b/c it worked, later it broke and took extensive time to figure it out. This bug is evasive and a bit gnarly b/c it's not obvious at the time you make the change that it will later break and when it does break, the error message gives no indication or direction on the issue or how to fix.

@darren-rose

setting latest_revision = true also resolved the issue for me

@js-jslog

I'll be one more person reporting exactly the same issue, exactly the same results with the workaround and exactly the same thoughts about what a time sink this problem is. The 10 minute wait time for testing of ideas combined with the non-specificity of the error is really painful. Even if the error message were updated to indicate that this workaround exists that would be a massive improvement. Thanks to the team for what they do.


@zioproto
Contributor

It is not clear to me why the traffic_weight block is mandatory in the Terraform provider when revision_mode = "Single".

However, when creating a Container App with the portal, inspecting the json object I see:

        "configuration": {
            "secrets": null,
            "activeRevisionsMode": "Single",
            "ingress": {
                "fqdn": "test2.xxxxxxxxx-xxxxxx.eastus.azurecontainerapps.io",
                "external": true,
                "targetPort": 8080,
                "exposedPort": 0,
                "transport": "Auto",
                "traffic": [
                    {
                        "weight": 100,
                        "latestRevision": true
                    }
                ],

It seems the traffic block with latestRevision: true is there by default in the object created by the portal

Cc: @lonegunmanb @jiaweitao001

@clemlesne

Relates to #21022, #22432, #21242, #23289.

@klemmchr

@clemlesne this issue has a regression or wasn't fixed properly in the first place. When omitting latestRevision in a single ingress block the container app is stuck during creation and the operation will be canceled after 10 minutes.

@alexdresko

@klemmchr I don't know if it has anything to do with azurerm_container_app. We're using azapi_resource with Microsoft.App/containerApps@2022-03-01 and recently started noticing a 10-minute timeout when creating container apps.

That being said, the JSON we use to create the container app using azapi_resource does not have a latestRevision section. Nothing in Terraform Cloud indicates that the problem is related to latestRevision, but I might try adding it to see if it fixes our problem.
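
For anyone wanting to try the same experiment, a sketch of an azapi_resource with an explicit traffic entry might look like the following. The `traffic` entry copies the shape of the portal-generated JSON quoted earlier in this thread (`weight`, `latestRevision`). The resource names are placeholders, and `body = jsonencode(...)` assumes an azapi 1.x provider, as in the azapi example above. Whether this actually resolves the azapi timeout is untested.

```hcl
# Sketch only: resource names are placeholders, and the traffic entry
# mirrors the JSON the portal generates for a Single-mode app.
resource "azapi_resource" "containerapp" {
  type      = "Microsoft.App/containerApps@2022-03-01"
  name      = "example-app"
  parent_id = azurerm_resource_group.example.id
  location  = azurerm_resource_group.example.location

  body = jsonencode({
    properties = {
      managedEnvironmentId = azurerm_container_app_environment.example.id
      configuration = {
        ingress = {
          external   = true
          targetPort = 80
          # Explicitly route all traffic to the latest revision.
          traffic = [
            {
              weight         = 100
              latestRevision = true
            }
          ]
        }
      }
      template = {
        containers = [
          {
            name  = "examplecontainerapp"
            image = "mcr.microsoft.com/azuredocs/containerapps-helloworld:latest"
            resources = {
              cpu    = 0.25
              memory = "0.5Gi"
            }
          }
        ]
      }
    }
  })
}
```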


I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 23, 2024
@magodo
Collaborator

magodo commented Sep 14, 2024

👏 Folks, I have an update after talking to the service team.

In principle, ingress.traffic (API model) expects either latestRevision to be true, or revisionName to be non-empty and conform to the format {containerAppName}--{revisionSuffix}.

However, there is an issue on the service side: in Single revision mode, if latestRevision is not set to true during creation, even a valid revisionName causes the API call to hang, resulting in a timeout error on the client side. This isn't an issue in Multiple mode; see #27221.

As for the implementation in this provider: if latest_revision is false, the provider sets revisionName in the API request by combining the appName with the revision_suffix:

if !v.LatestRevision {
	traffic.RevisionName = pointer.To(fmt.Sprintf("%s--%s", appName, v.RevisionSuffix))
}

This aligns with the expectation of this API.

I've updated the validation logic in #27396 to align with the above. I hope this clarifies the behavior and the known service-side issue a bit.
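
To make the second accepted form concrete, here is a sketch of a Multiple-mode app that pins traffic to a named revision instead of using latest_revision = true. The suffix "v1" is a hypothetical value chosen for illustration. This configuration has not been tested against the service and should be read as an illustration of how the names fit together, not as a verified recipe.

```hcl
# Sketch, assuming Multiple revision mode (the thread reports that
# Single mode hangs unless latest_revision = true on creation).
resource "azurerm_container_app" "pinned" {
  name                         = "example-app"
  container_app_environment_id = azurerm_container_app_environment.example.id
  resource_group_name          = azurerm_resource_group.example.name
  revision_mode                = "Multiple"

  template {
    # Hypothetical suffix; the resulting revision is named "example-app--v1".
    revision_suffix = "v1"

    container {
      name   = "examplecontainerapp"
      image  = "mcr.microsoft.com/azuredocs/containerapps-helloworld:latest"
      cpu    = 0.25
      memory = "0.5Gi"
    }
  }

  ingress {
    external_enabled = true
    target_port      = 80
    traffic_weight {
      # With latest_revision left false, the provider builds revisionName
      # as "<app name>--<revision_suffix>", here "example-app--v1".
      revision_suffix = "v1"
      percentage      = 100
    }
  }
}
```

In Single mode, the form that the thread reports as working is simply `latest_revision = true` with `percentage = 100`.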
