This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

Custom VNET support for Windows #1767

Closed
IvanCaro opened this issue Nov 14, 2017 · 15 comments

Comments

@IvanCaro

IvanCaro commented Nov 14, 2017

Is this a request for help?:

---NO

Is this an ISSUE or FEATURE REQUEST? (choose one):

---ISSUE

What version of acs-engine?:

---Version: canary
GitCommit: 8db990b
GitTreeState: clean

Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)
Kubernetes

What happened:
{
"error": {
"code": "InvalidTemplate",
"message": "Unable to process template language expressions for resource '/subscriptions/b21c32c2-4e59-433a-8c53-ebd4729d33ab/resourceGroups/Kubernetes-Prod-9997/providers/Microsoft.Compute/virtualMachines/11742k8s9000' at line '1' and column '46171'. 'The template variable 'subnet' is not found. Please see https://aka.ms/arm-template/#variables for usage details.'"
}
}

What you expected to happen:
{
"error": {
"code": "InvalidTemplate",
"message": "Unable to process template language expressions for resource '/subscriptions/b21c32c2-4e59-433a-8c53-ebd4729d33ab/resourceGroups/Kubernetes-Prod-9997/providers/Microsoft.Compute/virtualMachines/11742k8s9000' at line '1' and column '46171'. 'The template variable 'subnet' is not found. Please see https://aka.ms/arm-template/#variables for usage details.'"
}
}

How to reproduce it (as minimally and precisely as possible):
{
"error": {
"code": "InvalidTemplate",
"message": "Unable to process template language expressions for resource '/subscriptions/b21c32c2-4e59-433a-8c53-ebd4729d33ab/resourceGroups/Kubernetes-Prod-9997/providers/Microsoft.Compute/virtualMachines/11742k8s9000' at line '1' and column '46171'. 'The template variable 'subnet' is not found. Please see https://aka.ms/arm-template/#variables for usage details.'"
}
}

Anything else we need to know:
{
"error": {
"code": "InvalidTemplate",
"message": "Unable to process template language expressions for resource '/subscriptions/b21c32c2-4e59-433a-8c53-ebd4729d33ab/resourceGroups/Kubernetes-Prod-9997/providers/Microsoft.Compute/virtualMachines/11742k8s9000' at line '1' and column '46171'. 'The template variable 'subnet' is not found. Please see https://aka.ms/arm-template/#variables for usage details.'"
}
}

@jackfrancis
Member

Hi @IvanCaro, could you provide the api model that you used as input for template generation?

@chweidling

chweidling commented Nov 16, 2017

I have the same problem. My deployment file is this:

{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorRelease": "1.8"
    },
    "masterProfile": {
      "count": 1,
      "dnsPrefix": "myprefix",
      "vmSize": "Standard_D2s_v3",
      "vnetSubnetId": "/subscriptions/mySubscriptionId/resourceGroups/CHW-Test/providers/Microsoft.Network/virtualNetworks/kubernetes-application-gw-jf/subnets/kubsub",
      "firstConsecutiveStaticIP": "10.1.255.5"
    },
    "agentPoolProfiles": [
      {
        "name": "backend",
        "count": 3,
        "osType": "Windows",
        "vmSize": "Standard_D4s_v3",
        "vnetSubnetId": "/subscriptions/mySubscriptionId/resourceGroups/CHW-Test/providers/Microsoft.Network/virtualNetworks/kubernetes-application-gw-jf/subnets/kubsub",
        "availabilityProfile": "AvailabilitySet"
      }
    ],
    "windowsProfile": {
      "adminUsername": "",
      "adminPassword": ""
    },
    "linuxProfile": {
      "adminUsername": "",
      "ssh": {
        "publicKeys": [
          {
            "keyData": "ssh-rsa XXX"
          }
        ]
      }
    },
    "servicePrincipalProfile": {
      "clientId": "",
      "secret": ""
    }
  }
}

Remark: When I remove all custom VNET entries, the deployment works well.

@jackfrancis
Member

Nothing jumps out to me as being obviously wrong, there, but it's possible the vnet references don't match up with the firstConsecutiveStaticIP address.
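One way to rule out that kind of mismatch is to check that firstConsecutiveStaticIP falls inside the subnet's address range. The following is a minimal Python sketch, not part of acs-engine. The api model only carries the subnet's resource ID, so the CIDR prefixes below are made-up values for illustration, and the helper name static_ip_in_subnet is hypothetical.

```python
import ipaddress


def static_ip_in_subnet(first_static_ip: str, subnet_cidr: str) -> bool:
    """Return True if the master's first static IP lies inside the subnet."""
    ip = ipaddress.ip_address(first_static_ip)
    network = ipaddress.ip_network(subnet_cidr)
    return ip in network


if __name__ == "__main__":
    # Both prefixes are assumed for illustration; they are not from the issue.
    print(static_ip_in_subnet("10.1.255.5", "10.1.0.0/16"))
    print(static_ip_in_subnet("10.1.255.5", "10.240.0.0/16"))
```

The real address prefix would have to be read from the existing virtual network before running a check like this.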

I tested that the current code at HEAD works using the provided example vnet workflow. That consists of doing something like what this script suggests:

https://github.com/Azure/acs-engine/blob/master/examples/vnet/k8s-vnet-predeploy.sh

and then deploying with this api model:

https://github.com/Azure/acs-engine/blob/master/examples/vnet/kubernetesvnet.json

and finally, after cluster deployment, updating the vnet's route table like this script suggests:

https://github.com/Azure/acs-engine/blob/master/examples/vnet/k8s-vnet-postdeploy.sh

(Those links above are the source of our E2E tests, btw.)

I performed a manual E2E run using the above, with the only modification being "orchestratorRelease": "1.8" to match your deployment type.

Hopefully this information is helpful!

If we're confident there's actually a bug in the code, it would be valuable to prove that your api model and workflow are successful with a prior release of acs-engine. After that point we could triage and try to investigate the source of regression.

@chweidling

It looks like this is a problem for Windows agents only. Linux agents are deployed as expected.

When I look at the generated azuredeploy.json file, I find that the Windows agents have a customData property with the following content:

[base64(concat('<#\n    .SYNOPSIS\n        Provisions VM as a Kubernetes agent.\n\n    .DESCRIPTION\n        Provisions VM as a Kubernetes agent.\n#>\n[CmdletBinding(DefaultParameterSetName=\"Standard\")]\nparam(\n    [string]\n    [ValidateNotNullOrEmpty()]\n    $MasterIP,\n\n    [parameter()]\n    [ValidateNotNullOrEmpty()]\n    $KubeDnsServiceIp,\n\n    [parameter(Mandatory=$true)]\n    [ValidateNotNullOrEmpty()]\n    $MasterFQDNPrefix,\n\n    [parameter(Mandatory=$true)]\n    [ValidateNotNullOrEmpty()]\n    $Location,\n\n    [parameter(Mandatory=$true)]\n    [ValidateNotNullOrEmpty()]\n    $AgentKey,\n\n    [parameter(Mandatory=$true)]\n    [ValidateNotNullOrEmpty()]\n    $AzureHostname,\n\n    [parameter(Mandatory=$true)]\n    [ValidateNotNullOrEmpty()]\n    $AADClientId,\n\n    [parameter(Mandatory=$true)]\n    [ValidateNotNullOrEmpty()]\n    $AADClientSecret\n)\n\n$global:CACertificate = \"',variables('caCertificate'),'\"\n$global:AgentCertificate = \"',variables('clientCertificate'),'\"\n$global:DockerServiceName = \"Docker\"\n$global:KubeDir = \"c:\\k\"\n$global:KubeBinariesSASURL = \"',variables('kubeBinariesSASURL'),'\"\n$global:KubeBinariesVersion = \"',variables('kubeBinariesVersion'),'\"\n$global:WindowsTelemetryGUID = \"',variables('windowsTelemetryGUID'),'\"\n$global:KubeletStartFile = $global:KubeDir + \"\\kubeletstart.ps1\"\n$global:KubeProxyStartFile = $global:KubeDir + \"\\kubeproxystart.ps1\"\n\n$global:TenantId = \"',variables('tenantID'),'\"\n$global:SubscriptionId = \"',variables('subscriptionId'),'\"\n$global:ResourceGroup = \"',variables('resourceGroup'),'\"\n$global:SubnetName = \"',variables('subnetName'),'\"\n$global:MasterSubnet = \"',variables('subnet'),...))]

This expression references a variable named subnet (variables('subnet')), which is not defined anywhere in azuredeploy.json.
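A dangling reference like this can be found mechanically by comparing every variables('...') reference in the generated template against the keys of its variables section. The Python sketch below is only an illustration under stated assumptions: the function name undefined_variables is hypothetical, the name comparison is case-sensitive as a simplification, and the sample template is a tiny stand-in for a real azuredeploy.json.

```python
import json
import re

# Matches variables('name') inside ARM template expressions.
VARIABLE_REF = re.compile(r"variables\('([^']+)'\)")


def undefined_variables(template: dict) -> set:
    """Return variable names that are referenced but never defined."""
    defined = set(template.get("variables", {}))
    # Serialising the whole template lets one regex scan every nested string.
    referenced = set(VARIABLE_REF.findall(json.dumps(template)))
    return referenced - defined


if __name__ == "__main__":
    # Tiny stand-in for a generated azuredeploy.json (assumed structure).
    sample = {
        "variables": {"subnetName": "kubsub"},
        "resources": [
            {
                "properties": {
                    "customData": "[concat(variables('subnetName'), variables('subnet'))]"
                }
            }
        ],
    }
    print(undefined_variables(sample))
```

For a real template, the dictionary would come from json.load on the generated file instead of the inline sample.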

After I patched the generated azuredeploy.json file, so that it contains a variable

"subnet": "/subscriptions/<subscriptionId>/resourceGroups/CHW-Test/providers/Microsoft.Network/virtualNetworks/kubernetes-application-gw-jf/subnets/kubsub"

then the Windows agents are deployed as expected.
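The manual patch described above could be scripted roughly as follows. This is a sketch of the workaround only, not the eventual fix in acs-engine. The function name add_subnet_variable is hypothetical, and the subscription ID stays a placeholder exactly as in the comment above.

```python
import copy


def add_subnet_variable(template: dict, subnet_id: str) -> dict:
    """Return a copy of the template with variables.subnet set if it is missing."""
    patched = copy.deepcopy(template)
    # setdefault leaves an existing 'subnet' variable untouched.
    patched.setdefault("variables", {}).setdefault("subnet", subnet_id)
    return patched


if __name__ == "__main__":
    subnet_id = (
        "/subscriptions/<subscriptionId>/resourceGroups/CHW-Test"
        "/providers/Microsoft.Network/virtualNetworks"
        "/kubernetes-application-gw-jf/subnets/kubsub"
    )
    # Minimal stand-in for the generated template (assumed structure).
    generated = {"variables": {"subnetName": "kubsub"}, "resources": []}
    print(add_subnet_variable(generated, subnet_id)["variables"]["subnet"])
```

In practice the template would be loaded from azuredeploy.json with json.load and written back with json.dump before deploying.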

@jackfrancis
Member

This is a known limitation of the current custom VNET implementation. Our published example api models for this feature are Linux-only. See:

https://github.com/Azure/acs-engine/blob/master/examples/vnet/kubernetesvnet.json

@JiangtianLi has agreed to work on filling this gap. We'll submit a distinct PR that delivers custom VNET support for Windows agents.

Thanks for your patience!

@jackfrancis
Member

@JiangtianLi A basic acceptance criterion for this implementation is a new examples/vnet/kubernetesvnet-win.json api model that has a Windows agent profile instead of a Linux one. We can then add that to our E2E coverage to ensure that we continue to support this flavor of cluster deployment going forward.

jackfrancis changed the title from "bug with last version" to "Custom VNET support for Windows" on Nov 17, 2017

@JiangtianLi
Contributor

@jackfrancis Thanks for investigating this. I will fix this for Windows.

@chweidling

Just another remark: My deployment example above worked well with acs-engine 0.8.x. I successfully deployed a cluster with Windows agents into an existing VNET.

It looks like the feature is broken now.

@JiangtianLi
Contributor

I am working on the fix here: #1810

cc @madhanrm @dineshgovindasamy FYI

@jay-stillman

@JiangtianLi do you have an estimate of when this will be resolved / released? This is causing us real problems in pushing acs into production.

@patrick-motard

It would be really nice if this limitation was specified in the documentation for custom vnets. Would have saved me a lot of time.

@JiangtianLi
Contributor

@patrick-motard Sorry about that. I will add this to the documentation.

@lastcoolnameleft

I've encountered a similar issue. It would be helpful if acs-engine would fail prior to the template generation since Custom Vnet + Windows agent nodes is not an accepted configuration at this time.
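As a rough illustration of the kind of early check meant here, the sketch below rejects an api model before template generation when a Windows agent pool sets vnetSubnetId. It is written in Python purely for illustration and is not acs-engine's actual validation code. The function name validate_api_model and the error wording are hypothetical, while the field names (agentPoolProfiles, osType, vnetSubnetId) are taken from the api model posted earlier in this thread.

```python
def validate_api_model(api_model: dict) -> list:
    """Return error messages for Windows agent pools that use a custom VNET."""
    errors = []
    pools = api_model.get("properties", {}).get("agentPoolProfiles", [])
    for pool in pools:
        is_windows = pool.get("osType", "").lower() == "windows"
        if is_windows and pool.get("vnetSubnetId"):
            errors.append(
                "agent pool '%s': custom VNET is not supported for Windows agents"
                % pool.get("name", "<unnamed>")
            )
    return errors


if __name__ == "__main__":
    # Trimmed version of the api model from this thread; the subnet ID is shortened.
    model = {
        "properties": {
            "agentPoolProfiles": [
                {"name": "backend", "osType": "Windows", "vnetSubnetId": "/subnets/kubsub"}
            ]
        }
    }
    for message in validate_api_model(model):
        print(message)
```

Returning a list of messages rather than raising on the first problem lets a caller report every offending pool at once.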

@jackfrancis
Member

@lastcoolnameleft I agree that's a gap, I've filed this to fix:

#2168

FYI @JiangtianLi

@stale

stale bot commented Mar 9, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contribution. Note that acs-engine is deprecated--see https://github.com/Azure/aks-engine instead.
