
Cluster creation fails with NodesNotReady status if network policy is calico #905

Closed
katrinsharp opened this issue Apr 9, 2019 · 8 comments

@katrinsharp

What happened:

A cluster was successfully created (Kubernetes 1.12.6) with the same template that fails now. If I remove the networkProfile portion of the template, the deployment finishes successfully; if I add it back, the deployment takes a long time (~40 min), gets stuck on Create or Update Managed Cluster, and fails with the following message:

Deployment failed. Correlation ID: <xxx>. {
  "status": "Failed",
  "error": {
    "code": "ResourceDeploymentFailure",
    "message": "The resource operation completed with terminal provisioning state 'Failed'.",
    "details": [
      {
        "code": "NodesNotReady",
        "message": ""
      }
    ]
  }
}

MC_* resource group is created successfully.

Downgrading to 1.12.6 and using calico doesn't help - it still hits exactly the same issue. If the networkProfile part is omitted from the ARM template, the cluster is created successfully.

Region: US East.

What you expected to happen:

Cluster deployment is successful.

How to reproduce it (as minimally and precisely as possible):

Use the following as part of the cluster arm template:

"networkProfile": {
          "networkPlugin": "azure",
          "networkPolicy": "calico"
        }
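
For reference, a roughly equivalent repro via the Azure CLI (a sketch only; the resource group and cluster names are placeholders, and the version/size mirror the environment section below):

# Placeholder names; node count and VM size match the environment described below
az aks create \
  --resource-group myResourceGroup \
  --name myAksCluster \
  --kubernetes-version 1.12.7 \
  --node-count 3 \
  --node-vm-size Standard_B2s \
  --network-plugin azure \
  --network-policy calico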

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.12.7
  • Size of cluster (how many worker nodes are in the cluster?) 3 B2S
  • General description of workloads in the cluster (e.g. HTTP microservices, Java app, Ruby on Rails, machine learning, etc.) Not relevant
  • Others:
@feiskyer
Member

feiskyer commented Apr 9, 2019

This is a bug in the v1.12.x versions; versions below that, or the azure network policy, still work. It should be fixed in the next AKS release.

@jnoller
Contributor

jnoller commented Apr 12, 2019

Pending 4-8-2019 release in queue.

@borqosky

@jnoller does this https://github.com/Azure/AKS/blob/master/CHANGELOG.md#release-2019-04-08-hotfix contain the fix for it? If yes, has it already been deployed to WestEurope (I still see this error)?

@jnoller
Contributor

jnoller commented Apr 14, 2019

It does not; that should roll out beginning tomorrow.

@thatInfrastructureGuy

I faced the same issue when upgrading the cluster from 1.12.6 to 1.12.7 with calico enabled. Now the cluster is in a failed state. I was wondering if the patch will fix existing clusters (calico + AKS 1.12) or only newly created clusters?

@jnoller
Contributor

jnoller commented Apr 15, 2019

@thatInfrastructureGuy It will only patch new clusters, as this feature is in preview and should not be enabled on production systems.

@feiskyer
Member

The issue has been fixed now. @thatInfrastructureGuy @katrinsharp Could you try to upgrade the cluster, or create a new one?

After upgrading, remember to do a cleanup first (this is required; see the discussion on aks-engine here):

kubectl delete -f https://github.com/Azure/aks-engine/raw/master/docs/topics/calico-3.3.1-cleanup-after-upgrade.yaml

If there are other Pods (e.g. the dashboard) in a crashing state, deleting the pod and letting Kubernetes create a new one should bring it back.
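
For reference, a rough sketch of the full recovery flow with the Azure CLI; the resource group and cluster names below are placeholders, and 1.12.7 is assumed as the patched target version:

# Upgrade the failed cluster to a patched release (placeholder names and version)
az aks upgrade --resource-group myResourceGroup --name myAksCluster --kubernetes-version 1.12.7

# ...then run the calico cleanup manifest from the comment above (kubectl delete -f <cleanup yaml>)

# Verify that the nodes report Ready and look for crashing pods
kubectl get nodes
kubectl get pods --all-namespaces

# If a pod (e.g. the dashboard) stays in a crashing state, delete it so it gets recreated
kubectl delete pod <pod-name> -n kube-system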

@feiskyer
Member

feiskyer commented May 7, 2019

Already fixed now. Please upgrade or create a new cluster if you have a failed cluster with calico network policy.

@feiskyer feiskyer closed this as completed May 7, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Jul 24, 2020