Cluster creation fails with NodesNotReady status if network policy is calico #905

katrinsharp · 2019-04-09T05:38:34Z

What happened:

A cluster was previously created successfully (Kubernetes 1.12.6) with the same template that fails now. If I remove the networkProfile portion of the template, the deployment finishes successfully. If I add it back, the deployment takes a long time (~40 min), stuck on Create or Update Managed Cluster, and then fails with the following message:

Deployment failed. Correlation ID: <xxx>. {
  "status": "Failed",
  "error": {
    "code": "ResourceDeploymentFailure",
    "message": "The resource operation completed with terminal provisioning state 'Failed'.",
    "details": [
      {
        "code": "NodesNotReady",
        "message": ""
      }
    ]
  }
}
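
When triaging a failure like this from a script, it can help to pull the nested codes out of the payload, because the top-level code (ResourceDeploymentFailure) is generic and the informative one (NodesNotReady) sits under details. The following is a minimal sketch in Python, assuming the JSON body above has been captured as a string. The helper name collect_error_codes is hypothetical and not part of any Azure SDK.

```python
import json

# Error body copied from the failed deployment shown above.
PAYLOAD = """
{
  "status": "Failed",
  "error": {
    "code": "ResourceDeploymentFailure",
    "message": "The resource operation completed with terminal provisioning state 'Failed'.",
    "details": [
      {
        "code": "NodesNotReady",
        "message": ""
      }
    ]
  }
}
"""


def collect_error_codes(error):
    """Return the error's code followed by the codes of any nested details."""
    codes = [error.get("code")]
    for detail in error.get("details", []):
        codes.extend(collect_error_codes(detail))
    return codes


codes = collect_error_codes(json.loads(PAYLOAD)["error"])
print(codes)
```

The recursion is there only because detail entries can themselves carry a details list. For the payload in this report a single level would be enough.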

MC_* resource group is created successfully.

Downgrading to 1.12.6 while still using calico doesn't help; it hits exactly the same issue. If the networkProfile part is omitted from the ARM template, the cluster is created successfully.

Region: US East.

What you expected to happen:

Cluster deployment is successful.

How to reproduce it (as minimally and precisely as possible):

Use the following as part of the cluster ARM template:

"networkProfile": {
  "networkPlugin": "azure",
  "networkPolicy": "calico"
}
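
To check whether a template carries this setting before deploying it, a small script can inspect the managed cluster resource. The following is a rough sketch in Python, assuming the template has already been loaded as a dictionary. The sample template below is a trimmed, hypothetical stand-in (not the reporter's actual file), and the helper name uses_calico is made up for illustration.

```python
# Trimmed, hypothetical ARM template used only for illustration.
SAMPLE_TEMPLATE = {
    "resources": [
        {
            "type": "Microsoft.ContainerService/managedClusters",
            "properties": {
                "kubernetesVersion": "1.12.7",
                "networkProfile": {
                    "networkPlugin": "azure",
                    "networkPolicy": "calico",
                },
            },
        }
    ]
}


def uses_calico(template):
    """Return True if any managed cluster resource sets networkPolicy to calico."""
    for resource in template.get("resources", []):
        if resource.get("type") != "Microsoft.ContainerService/managedClusters":
            continue
        profile = resource.get("properties", {}).get("networkProfile", {})
        if profile.get("networkPolicy") == "calico":
            return True
    return False


print(uses_calico(SAMPLE_TEMPLATE))
```

A template with the networkProfile block removed, which is the configuration reported as deploying successfully, would make the helper return False.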

Anything else we need to know?:

Environment:

Kubernetes version (use kubectl version): 1.12.7
Size of cluster (how many worker nodes are in the cluster?) 3 B2S
General description of workloads in the cluster (e.g. HTTP microservices, Java app, Ruby on Rails, machine learning, etc.) Not relevant
Others:


feiskyer · 2019-04-09T08:31:15Z

This is a bug affecting v1.12.x versions. Versions below that, or clusters using the azure network policy, still work. It should be fixed in the next AKS release.

jnoller · 2019-04-12T13:18:32Z

Pending 4-8-2019 release in queue.

borqosky · 2019-04-14T15:31:26Z

@jnoller does https://github.com/Azure/AKS/blob/master/CHANGELOG.md#release-2019-04-08-hotfix contain the fix for this? If yes, has it already been deployed to WestEurope? I still see this error.

jnoller · 2019-04-14T21:32:15Z

It does not; that should roll out beginning tomorrow.

thatInfrastructureGuy · 2019-04-15T16:40:39Z

I faced the same issue when upgrading a cluster from 1.12.6 to 1.12.7 with calico enabled. The cluster is now in a failed state. Will the patch fix existing clusters (calico + AKS 1.12), or only newly created ones?

jnoller · 2019-04-15T16:49:31Z

@thatInfrastructureGuy It will only patch new clusters, as this feature is in preview and should not be enabled on production systems.

feiskyer · 2019-04-30T08:04:42Z

The issue has been fixed now. @thatInfrastructureGuy @katrinsharp Could you try to upgrade the cluster, or create a new one?

After upgrading, remember to do a cleanup first (this is required, see discussion on aks-engine here):

kubectl delete -f https://github.com/Azure/aks-engine/raw/master/docs/topics/calico-3.3.1-cleanup-after-upgrade.yaml

If other Pods (e.g. dashboard) are in a crashing state, deleting them and letting Kubernetes create new ones could bring them back.

feiskyer · 2019-05-07T14:12:13Z

Already fixed now. Please upgrade or create a new cluster if you have a failed cluster with calico network policy.

jnoller added triage bug and removed triage labels Apr 11, 2019

feiskyer mentioned this issue Apr 17, 2019

Deployment failed MicrosoftDocs/azure-docs#28567

Closed

ams0 mentioned this issue Apr 24, 2019

Calico logging is set to INFO #925

Closed

sauryadas self-assigned this Apr 24, 2019

jakaruna-MSFT mentioned this issue Apr 30, 2019

Calico network policies not getting enforced MicrosoftDocs/azure-docs#30133

Closed

feiskyer closed this as completed May 7, 2019

ghost locked as resolved and limited conversation to collaborators Jul 24, 2020
