This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

Enabling Azure CNI for Windows #2174

Merged
merged 12 commits into from
Feb 7, 2018

saiyan86
Contributor

What this PR does / why we need it:

This PR enables Azure CNI for Windows nodes, so Windows and hybrid Kubernetes clusters can take advantage of Azure CNI.

Which issue this PR fixes

fixes #1504

Special notes for your reviewer:

This PR requires Azure CNI v1.0.2.

@@ -310,7 +310,9 @@ func setOrchestratorDefaults(cs *api.ContainerService) {
     o.KubernetesConfig.EtcdVersion = DefaultEtcdVersion
   }
   if a.HasWindows() {
-    o.KubernetesConfig.NetworkPolicy = DefaultNetworkPolicyWindows
+    if o.KubernetesConfig.NetworkPolicy == "" {
+      o.KubernetesConfig.NetworkPolicy = DefaultNetworkPolicyWindows
Contributor

Why does Windows need to default to "none"? Originally it defaulted to "none" because Windows didn't support Azure CNI.

Contributor Author

Addressed in const.go by setting DefaultNetworkPolicyWindows to "azure".
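
The defaulting behaviour discussed in this thread can be sketched as follows. This is a minimal illustration, not the acs-engine code: `kubernetesConfig` and `applyWindowsNetworkPolicyDefault` are hypothetical stand-ins for `api.KubernetesConfig` and the relevant part of `setOrchestratorDefaults`, and the constant is set to "azure" only because the reply above says so.

```go
package main

import "fmt"

// defaultNetworkPolicyWindows stands in for DefaultNetworkPolicyWindows in
// const.go. The value "azure" is taken from the reply above (assumption).
const defaultNetworkPolicyWindows = "azure"

// kubernetesConfig is a simplified, hypothetical stand-in for
// api.KubernetesConfig; only the field used here is modelled.
type kubernetesConfig struct {
	NetworkPolicy string
}

// applyWindowsNetworkPolicyDefault fills in the Windows default only when the
// user has not chosen a network policy, so an explicit choice is preserved.
func applyWindowsNetworkPolicyDefault(cfg *kubernetesConfig, hasWindows bool) {
	if hasWindows && cfg.NetworkPolicy == "" {
		cfg.NetworkPolicy = defaultNetworkPolicyWindows
	}
}

func main() {
	unset := kubernetesConfig{}
	applyWindowsNetworkPolicyDefault(&unset, true)
	fmt.Println(unset.NetworkPolicy) // prints "azure"

	explicit := kubernetesConfig{NetworkPolicy: "none"}
	applyWindowsNetworkPolicyDefault(&explicit, true)
	fmt.Println(explicit.NetworkPolicy) // prints "none"
}
```

The point of the empty-string check is that the old unconditional assignment overwrote whatever the user had requested for a Windows cluster.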

@@ -699,7 +699,7 @@ func (a *Properties) validateNetworkPolicy() error {
   }

   // Temporary safety check, to be removed when Windows support is added.
-  if (networkPolicy == "calico" || networkPolicy == "azure") && a.HasWindows() {
+  if (networkPolicy == "calico") && a.HasWindows() {
Contributor

Nit: no need for parentheses here.

Contributor Author

done.

Contributor

Have the parentheses around (networkPolicy == "calico") been removed?
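
With the nit applied, the check reduces to the sketch below. It is a simplified illustration under assumptions: the real method hangs off `*Properties` and validates more than this, and the error text here is hypothetical rather than the message acs-engine emits.

```go
package main

import "fmt"

// validateNetworkPolicy is a simplified sketch of the Windows safety check.
// After this PR only "calico" is rejected for clusters with Windows agents;
// "azure" is allowed through. The error wording is hypothetical.
func validateNetworkPolicy(networkPolicy string, hasWindows bool) error {
	if networkPolicy == "calico" && hasWindows {
		return fmt.Errorf("networkPolicy %q is not supported with Windows agent pools", networkPolicy)
	}
	return nil
}

func main() {
	for _, policy := range []string{"azure", "calico", "none"} {
		fmt.Printf("%s on Windows: %v\n", policy, validateNetworkPolicy(policy, true))
	}
}
```

Since `==` binds tighter than `&&` in Go, dropping the parentheses does not change the condition.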

if ($global:AzureCNIEnabled) {
$KubeletCommandLine += $global:AzureCNIKubeletOptions
} else {
$KubeletCommandLine += " --network-plugin=cni --cni-bin-dir=`$global:CNIPath --cni-conf-dir `$global:CNIPath\config"
Contributor

Are Azure CNI and wincni mutually exclusive? $KubeletCommandLine already sets wincni a few lines above; should that be removed?

Contributor Author

Nice catch. I removed it.

@@ -208,6 +275,8 @@ c:\k\kubelet.exe --hostname-override=`$global:AzureHostname --pod-infra-containe
`$global:CNIConfig = "$global:CNIConfig"
`$global:HNSModule = "$global:HNSModule"
`$global:VolumePluginDir = "$global:VolumePluginDir"
`$global:MaxPods="$global:MaxPods"
Contributor

Where is MaxPods used?

Contributor Author

Not yet, but could be used in the future. Should I remove it for now?

Contributor

Yes, please remove it.


# Download Azure VNET CNI plugins.
# Mirror from https://github.com/Azure/azure-container-networking/releases
$zipfile = [Io.path]::Combine("$global:AzureCNIDir","azure-vnet.zip")
Contributor

Nit: extra space.

Contributor Author

done.

Set-AzureNetworkPolicy()
{
# Azure VNET network policy requires tunnel (hairpin) mode because policy is enforced in the host.
Set-VnetPluginMode "tunnel"
Contributor

Will this overwrite the bridge mode that was set earlier?

Contributor Author

Yes. I removed line 207.

@@ -280,6 +349,13 @@ Update-CNIConfig(`$podCIDR, `$masterSubnetGW)

try
{

if (`$global:NetworkPolicy -eq "azure") {
Contributor

Since starting kubelet is much simpler with Azure CNI, it seems to me we could write a much simpler kubeStartStr for the azure network policy case, instead of returning here and leaving the rest of the cmdlet unused.

@@ -71,6 +71,16 @@ $global:CNIConfig = [Io.path]::Combine($global:CNIPath, "config", "`$global:Netw
$global:HNSModule = [Io.path]::Combine("$global:KubeDir", "hns.psm1")

$global:VolumePluginDir = [Io.path]::Combine("$global:KubeDir", "volumeplugins")
#azure cni
$global:NetworkPolicy = "{{WrapAsVariable "networkPolicy"}}"
$global:VNetCNIPluginsURL = "{{WrapAsVariable "vnetCniWindowsPluginsURL"}}"
Contributor

Where is vnetCniWindowsPluginsURL defined?

Contributor Author

@saiyan86 saiyan86 Jan 31, 2018

In parts/k8s/kubernetesparams.t line 535

@@ -278,6 +343,16 @@ Update-CNIConfig(`$podCIDR, `$masterSubnetGW)
Add-Content -Path `$global:CNIConfig -Value (ConvertTo-Json `$configJson -Depth 20)
}

"@

if ($global:NetworkPolicy -eq "azure") {
Contributor

I think kubeStartStr for azure cni doesn't need most of the script above?

$global:AzureCNIDir = [Io.path]::Combine("$global:KubeDir", "azurecni")
$global:AzureCNIBinDir = [Io.path]::Combine("$global:AzureCNIDir", "bin")
$global:AzureCNIConfDir = [Io.path]::Combine("$global:AzureCNIDir", "netconf")
$global:AzureCNIKubeletOptions = " --network-plugin=cni --cni-bin-dir=$global:AzureCNIBinDir --cni-conf-dir=$global:AzureCNIConfDir"
Contributor

This is nice. Could you also add a WindowsCNIKubeletOptions variable for the Windows CNI options, to make this more readable?

Contributor Author

done.

@msorby

msorby commented Feb 5, 2018

I had an issue getting a mixed cluster (Linux and Windows agent pools) working: kube-dns would not start with networkPolicy:none, as described in Issue 2100.

I built acs-engine with this pull request and networkPolicy:azure, and the cluster came right up 👍

#azure cni
$global:NetworkPolicy = "{{WrapAsVariable "networkPolicy"}}"
$global:VNetCNIPluginsURL = "{{WrapAsVariable "vnetCniWindowsPluginsURL"}}"
$global:MaxPods = "{{WrapAsVariable "maxPods"}}"
Contributor

Remove maxPods and add it back when it is actually used?

Add-Content -Path `$global:CNIConfig -Value (ConvertTo-Json `$configJson -Depth 20)
}
if ($global:NetworkPolicy -eq "azure") {
$kubeStartStr += @"
Contributor

Use a try-catch block to catch exceptions.

$kubeStartStr += @"
Write-Host "NetworkPolicy azure, starting kubelet."
$KubeletCommandLine
return 0
Contributor

You don't need return.

JiangtianLi
JiangtianLi previously approved these changes Feb 6, 2018
Contributor

@JiangtianLi JiangtianLi left a comment

lgtm

@jackfrancis
Member

@saiyan86 permission to rebase/force push (so we can run this through E2E with recent improvements)?

@cypres

cypres commented Feb 6, 2018

Will this also fix #2027 ? @JiangtianLi

@jackfrancis
Member

@JiangtianLi I rebased this to test using recent E2E improvements. Wanted to clarify: do we want this PR to turn on Azure CNI by default for Windows?

@JiangtianLi
Contributor

@duedal That issue is unrelated.

@JiangtianLi
Contributor

@jackfrancis Yes, we want Azure CNI by default for Windows. We want to run deeper tests once I bring in the Windows patch. For now, we can make sure the E2E tests pass.
@saiyan86 You mentioned yesterday to hold for your testing?

@jackfrancis
Member

Just ran a successful round of Windows E2E tests backed by Azure CNI. I advocate we merge and start doing some manual testing.

@jackfrancis
Member

@saiyan86 Will wait for your go-ahead to merge.

@saiyan86
Contributor Author

saiyan86 commented Feb 7, 2018

Thanks @jackfrancis, @JiangtianLi. Can you sign off if there is no regression and Azure CNI is not the default?
In our testing of azure-cni, most scenarios work, but we think we are hitting an issue similar to the one where service-IP connectivity was broken: BUG 15422319 [RS3] Backport: Create remote endpoint on L2Bridge fails

Maybe we need to follow up with the Windows team to understand what is going on there for the service VIP.

@jackfrancis
Member

Azure CNI is the default for vlabs only. In practice this means acs-engine CLI users. @JiangtianLi for further thought on whether that is O.K. given the bug identified by @saiyan86

@jackfrancis
Member

As per @saiyan86 we're changing the default back to kubenet. We'll merge this PR as-is and fix bugs in follow-up PRs. This PR allows us to test Azure CNI in Windows clusters by building them explicitly with "networkPolicy": "azure".

Thanks again @saiyan86 !

@jackfrancis jackfrancis merged commit 7eb4b81 into Azure:master Feb 7, 2018
@ghost ghost removed the in progress label Feb 7, 2018
@ultimateboy ultimateboy mentioned this pull request Feb 8, 2018
@ofiliz
Contributor

ofiliz commented Feb 8, 2018

It is weird that you have rejected my PR #1505, copy/pasted my code to this PR, and then merged it and closed #1504 without even notifying me. I wrote the entire azure-vnet CNI plugin, all changes in acs-engine, including this one. A simple mention in the description would have been nice. CC @brendandburns @anhowe

@JiangtianLi
Contributor

/cc @sharmasushant, @tamilmani1989, @saiyan86, could you give some context to Onur about CNI status and update the description here?

/cc @jackfrancis

@sharmasushant
Contributor

@ofiliz This is not the same PR. This PR does not enable use of both calico (when available) and azure at the same time. So vnetIntegration related stuff is removed.

@magnock

magnock commented Feb 12, 2018

Hello guys, acs-engine 0.12.5 with Windows and Linux pools still generates nodes with kube-dns-v20 crashing! Is there any quick fix? Recompiling master is not an option, as it's not stable at all.
Is there any stable version where Linux and Windows pods can communicate without problems?
Thank you very much.

@jackfrancis
Member

@magnock are you able to reproduce the issue easily? If so, and if you are able to build acs-engine from master, you could re-try a cluster deployment using HEAD, which as of this morning includes Windows RS3 fixes that may address your symptoms.

@msorby

msorby commented Feb 12, 2018

@magnock I can build a K8s cluster with 0.12.5 where kube-dns-v20 is stable and up and running.
One thing you can check is that your ServicePrincipalProfile is correct. It threw me for a loop, since there is no validation and it's not used when provisioning the cluster, but it is used later on.
I'm not the only one bitten by this ;-)

However, Windows pods can't talk to other pods without a workaround.
But kube-dns should be stable, in my experience.

tesharp pushed a commit to tesharp/acs-engine that referenced this pull request Mar 16, 2018
* enabling azure cni

* delete overwrite

* address comments

* address comments

* fix kubeStartStr

* fix kubeStartStr

* remove misc files

* squash commits for kubeStartStr

* passed final test

* rebase cleanup

* setting Azure CNI for vlabs only

* default back to kubenet
wenwu449 pushed a commit to yangl900/acs-engine that referenced this pull request Mar 26, 2018
* update ubuntu image for German cloud (Azure#2036)

* Fix issue with apiserver when using AADProfile (Azure#2047) (Azure#2055)

* Fix issue with apiserver when using AADProfile (Azure#2047)

* fixing failed test

* missed another test

* clear containers (Azure#1945)

* clear-containers: add runtime to api and pass through parameters

Signed-off-by: Jess Frazelle <[email protected]>

* clear-containers: add scripts

Signed-off-by: Jess Frazelle <[email protected]>

* clear-containers: add example

Signed-off-by: Jess Frazelle <[email protected]>

* clear-containers: fix variables

Signed-off-by: Jess Frazelle <[email protected]>

* clear-containers: add docs

Signed-off-by: Jess Frazelle <[email protected]>

* clear-containers: update install script

Signed-off-by: Jess Frazelle <[email protected]>

* clear-containers: fix script

Signed-off-by: Jess Frazelle <[email protected]>

* clear-containers: update example

Signed-off-by: Jess Frazelle <[email protected]>

* clear-containers: update features docs

Signed-off-by: Jess Frazelle <[email protected]>

* clear-containers: make test linters happy

Signed-off-by: Jess Frazelle <[email protected]>

* setKubeletOpts to work better with kubeconfig

Signed-off-by: Jess Frazelle <[email protected]>

* whitespace cruft

* more whitespace fun

* Add --feature-gates handling for kubelet and api server (Azure#2032)

* Add --feature-gates handling

For kubeletConfig, preserve the existing behaviour of adding Accelerators=true for agent config (for kubernetes 1.6.0 and later)

* simplified default implementation and removed KUBELET_FEATURE_GATES

* unnecessary default assignment, and simple validation

* removed copyMap func

* enforce min version, "Accelerators=true" is only for agents

* Pass current version in to addDefaultFeatureGates

Generalise addDefaultFeatureGates by passing in current orchestrator version, and deferring the minimum required version to callers

* Add tests for --feature-gates behaviour

Test for only adding Accelerators=true for 1.6.0+
Test for correct application of KubeletConfig from top-level vs Master/AgentProfile

* Avoid ref sharing for kubeletConfig

Fix failing test by avoiding reference sharing of kubeletConfig properties
on master and agent profiles

* Remove outdated comments

* k8s/script: allow parallelizing custom script without clear-containers (Azure#2067)

Signed-off-by: Jess Frazelle <[email protected]>

* Improving IP address assignment for master nodes with Azure CNI. (Azure#1966)

* Azure cni static ip change (#1)

* VSTS#1828538 Modified master IPs from dynamic to static and made agent nic dependent on master nic

* Modified firstconsecutive static IP for default vnet

* updated documentation specific to Azure CNI

* Fixed styling test errors

* Handled no master scenario

* updated one of the examples cluster config to use firstConsecutiveStaticIp from the start of subnet

* removed azureCNI check so that agent nic will always depend on master nic

* Specified dependency of agent nic on master nic in windows agents template

* moving firstConsecutiveStaticIP away from the edge of usable address space

* Update documentation to help customers how to specify firstConsecutiveStaticIP and ipAddressCount for master nodes.

* Add support for Kubernetes v1.8.7 (Azure#2068)

* added support for Kubernetes v1.8.7

// TODO: build/publish Windows image

* fix unit test

* fixed dirty windows 1.8.6 build artifact

* Upgrade Azure CNI to 1.0.1 (Azure#2064)

* Azure CNI version bump

* s/conf/conflist

* Adopt CIS Kubernetes Benchmark, Part 2: Controller Manager. (Azure#2066)

* Jenkins soak tests (Azure#2028)

* added soak test

* added soak test name

* add location to name

* delete rg if re-provisioning cluster

* fix typo

* added wait flag

* fix no wait arg

* add err log

* remove ssh key

* remove unused vars

* Revert "remove ssh key"

This reverts commit 9041c1a.

* terminology

* delete files if deployment failed

* remove soak test spec

* Pass in soak cluster name

* only generate ssh key once

* add default for KeyGenerated

* fix typo

* Do not generate SSH for soak test

* fix typo in comment

* add dashboard test debug log

* Revert "add dashboard test debug log"

This reverts commit 4937282.

* Add Enable Pod Security Option (Azure#2048)

* Add PodSecurityPolicy

* use helpers.IsTrueBoolPointer, delete EnablePodSecurityPolicy function and update defaultAPIServerConfig

* latest dashboard for v1.8 and v1.9 clusters (Azure#2070)

* latest addon-resizer for v1.8 and v1.9 clusters (Azure#2071)

* update kube-dns for v1.8 and v1.9 k8s clusters (Azure#2073)

* latest kube-dns for v1.8 and v1.9 clusters

* k8s-dns-dnsmasq-nanny-amd64:1.14.5

* latest heapster for v1.8 and v1.9 clusters (Azure#2072)

* update pause image for v1.8 and v1.9 k8s clusters (Azure#2074)

* latest pause image for v1.8 and v1.9 clusters

* omitted v1.9.1

* changed wording of error (Azure#2089)

* update Ubuntu image (Azure#2079)

* Update docs for --feature-gates (Azure#2081)

* re-enable read-only port on kubelet (Azure#2091)

fixes heapster connection issues

* revert addon-resizer version update (Azure#2090)

* revert heapster version and re-enable kubelet read-only-port

* revert addon-resizer to 1.7

* isolated bug fix to addon-resizer version

* Add support for k8s 1.9.2 (Azure#2092)

* add support for k8s 1.9.2

* updated windows zip

* revert to 1.7 for addon resizer

* Extend windows os drive size when customized OSDiskSizeGB is used (Azure#2097)

* Adopt CIS Kubernetes Benchmark, Part 3: Kubelet (Azure#2098)

* Restore KubernetesConfig sans struct embedding (Azure#2108)

* restore properties to KubernetesConfig

* lint

* comment

* rebase errata

* Kind should be only EncryptionConfig for encyption-config.yaml (Azure#2104)

* Kind should be only Config

* removed the wrong Kind entry!

* remove redundant apiVersion

* running CSE provisioning script in foreground (Azure#2113)

* Add autoscale test to E2E (Azure#2096)

* initial attempt at autoscale test

* working autoscale test

* only add add’l options if passed in

* Adding 3 replicas for load tester deployment

* wait longer and linux only

* skip autoscale test for v1.9 clusters

Azure#2114

* Add member update after restarting etcd (Azure#2118)

* Remove SecurityContextDeny setting in API server admission control. (Azure#2125)

* Update custom vnet doc (Azure#2128)

* add Azure Active Directory Admin Group Object ID flag (Azure#2111)

* we don’t want to see stderr when checking for provision.complete (Azure#2126)

* only create cert files on master (Azure#2120)

* only create cert files on master

* master node provision script cleanup

* Enable iptables forward for kubernetes (Azure#2139)

* --authorization-mode=Node only if secure kubelet (Azure#2138)

* --authorization-mode=Node only if secure kubelet

* EnableSecureKubelet unit test errata, using defaults generally

* Validate k8s versions for PSP (Azure#2145)

* Update window binary build documentation (Azure#2147)

* Update window binary build documentation

* move background section

* Upgrade docker-engine to 1.13.* for all Kubernetes clusters >= v1.7 (Azure#2144)

* update 1.7, 1.8, 1.9 latest to docker 1.13.*

* docker-engine update to 1.13.* for >= 1.7 clusters

* remove add 3 hours from timestamp, add’l \n (Azure#2149)

* remove add 3 hours from timestamp, add’l \n

* more \n

* language

* Add regression tests for 3 masters & 5 masters (Azure#2154)

* added multi master configs

* fix fmt

* Mount in /var/lib/cni from the host. (Azure#2165)

* Kubernetes E2E: test addons if present (Azure#2156)

* conditional addon tests

* uses generated model to introspect cluster features

* I heart output

* deployment flows need expanded cluster definition

* reverting to ClusterDefinition for node counts

* standard stdout implementation for all commands

* typo

* disable broken chmod command

* stdout tweaks

* retrieving deployment error details during upgrade (Azure#1995)

* analyze and return deployment status during upgrade
* added unittests for DeployTemplateSync

* adding francecentral to azureconst generative script (Azure#2164)

* add francecentral (Azure#2167)

* validation error if custom VNET + Windows (Azure#2168)

* Allow 1 core master node VM sizes (Azure#2173)

* Allow DS1 Master VM sizes

* 1 core for K8s masters

* revert constants

* update azure consts (Azure#2179)

* update prometheus-grafana addon (Azure#2183)

* Update clusterdefinition.md (Azure#2171)

Fix apiserver options table markdown

* Adding ServiceNodeExclusion as a default flag for Controller Manager (Azure#2180)

* remove francecentral (Azure#2193)

* improve networkpolicy documentation (Azure#2170)

* Protect etcd tls from race conditions (Azure#2160)

* chown etcd for keys in custom script

* remove certs complete

* add longer retry for etcd

* remove second retry cmd

* fix retrycmd_if_failure

* add retries

* passwd -u “etcd”

* fix redirect output

* pulling down provision logs during e2e runs (Azure#2190)

* pulling down provision logs during e2e runs

* setting no deployment retries by default

* remove /opt/azure/containers/setup-etcd.sh logs

* logs = finding bugs!

* deleting deployments in resource groups (Azure#2195)

* added kubernetes version validation for managed clusters (Azure#2194)

* fix-api-server-bind-address-flag - Fixes a flag typo in the api serve… (Azure#2192)

* fix-api-server-bind-address-flag - Fixes a flag typo in the api server yaml

* fix-api-server-bind-address-flag - update docs with --bind-address fix

* Update doc entry for bind-address flag

* Network validation checks during provision (Azure#2196)

* Add DNS + HTTPS checks, capture DNS packets

* ARM doesn’t like ‘{‘

* standardizing retrycmd_if_failure usage patterns

* Adding DNS pre-check for aptdocker.azureedge.net

* tracking time for each retried provision event

* standardizing to 3 masters api model for e2e tests

* retain e2e resources for debugging

* getting metrics logs from all cluster hosts

* improved master/agent host retrieval

* lint

* lint

* Adding “agent” substring to e2e api model pools

* invalid agent pool name

* revert agent forwarding ssh config

* restore cleanup

* add agent dns validation

* 5 seconds between etcddisk mount retries

* Fix DC/OS release version (Azure#2197)

* update ApiServerConfig customization/override example (Azure#2201)

* idiomatic windows e2e definition (Azure#2206)

* don’t abort errors during log gathering (Azure#2207)

* Make sure --cluster-dns uses DNSServiceIp set in KubernetesConfig and not always default value (Azure#2078)

* Updated broken links (Azure#2208)

* E2E nginx outbound access test: simplify port test (Azure#2204)

* replace curl with nc

* what does output look like on mismatch?

* We are testing for outbound internet access

not web content matching

* testing curl for err

* real tests, and not installing curl a bunch of times

* don’t cleanup k8s e2e clusters (Azure#2210)

* Add additional v3 vm sizes permitted for dcos master and agent (Azure#2184)

* Cloud init improvements (Azure#2203)

* chown etcd for keys in custom script

* remove certs complete

* add longer retry for etcd

* remove second retry cmd

* fix retrycmd_if_failure

* add retries

* passwd -u “etcd”

* fix redirect output

* remove extra lines

* ignore warnings for etcd user changes

* Parametrize retry cmd

* removed unused data dir

* use etcd args

* Revert "use etcd args"

This reverts commit ccbff6d.

* parametrize sleep

* changed the retries to 120 for network stuff

* Remove Agent NICs dependency on Master NICs during upgrade. (Azure#2213)

* replaced apierror with armerror (Azure#2205)

* replaced apierror with armerror

* addressed comments

* addressed comments

* reverted change in pkg/api/types.go

* Kubernetes provision script: check for kubectl and docker files (Azure#2211)

* unnecessary add’l systemctl enable

* generalize ensureFilepath

* fail provision if etcd check fails

* rationalize azureconst (Azure#2215)

* e2e ssh cleanup (Azure#2216)

* return nil error on successful deployment (Azure#2218)

* explicitly check aptdocker.azureedge.net (Azure#2220)

* Add more etcd setup visibility (Azure#2214)

* add etcd setup log to artifacts

* remove hiding useradd output

* show output of user add

* add check for etcd user

* add default audit policy (Azure#2189)

* add default audit policy

* apiserver audit log rotation is user-configurable

* add nc checks to agent (Azure#2221)

* Enabling Azure CNI for Windows (Azure#2174)

* enabling azure cni

* delete overwrite

* address comments

* address comments

* fix kubeStartStr

* fix kubeStartStr

* remove misc files

* squash commits for kubeStartStr

* passed final test

* rebase cleanup

* setting Azure CNI for vlabs only

* default back to kubenet

* more set -x (Azure#2224)

* more set -x

* send ps to background

* timestamps

* adding certs dependency in cloud-init

* rationalize etcd certs dep

* extra ensure_etcd_ready

* fixed version checking for managed clusters (Azure#2226)

* Enabled preprovisioning on windows dcos agents (Azure#2228)

* retry get aptdocker gpg key many times (Azure#2229)

* Keyvault etcd certs (Azure#2155)

* Use single values for etcdpeer key params

* fixed param logic and added logic to vars

* remove unused code

* only add master certs/keys to params and vars if master is not hosted

* move apiserver cert

* add master profile != nil check

* undo move api server key

* Enable cloud controller manager support for 1.9

* Remove debug binary

* adding debug to gitignore

* minor doc fixes

* Fix azure cni service ip (Azure#2237)

* enabling azure cni

* fix Azure CNI service IP connectivity

* fix --auto-suffix when dnsPrefix is defined in apimodel json file (Azure#2239)

* E2E: don’t collect logs if soak test (Azure#2240)

* don’t collect logs if soak test

* this!

* Kubernetes 1.9.3 support (Azure#2242)

* Add version 1.9.3

* update win zip and re-fmt

* Kubernetes 1.8.8 support (Azure#2243)

* add k8s 1.8.8

* updated win zip

* rebase errata

* more rebase errata

* Update kuberneteswindowssetup.ps1 for azure cni to remove redundant code (Azure#2244)

* enabling azure cni

* remove redundant line

* Windows RS3 hot fix for k8s (Azure#2230)

* wait for certs to start etcd stuff in cloud init (Azure#2245)

* set addon enabled value if nil (Azure#2254)

* update generateproxycertscript.sh to use secure etcd endpoint/certs (Azure#2252)

* enforce apt-get update warnings/errors retries (Azure#2241)

* enforce apt-get update warnings/errors retries

* Add single quotes around sp secret (Azure#2255)

* --use-service-account-credentials=false if no rbac (Azure#2253)

* new ubuntu image (Azure#2259)

* Kubernetes Tiller Addon: configuration to set max-history (Azure#2217)

* Add max-history configuration to tiller addon.

* Test for max-history configuration for tiller addon.

* freshen go-dev image (Azure#2261)

* freshen go-dev image

* lint

* - keeping original DeploymentOperationsListResult in DeploymentError (Azure#2266)

* - keeping original DeploymentOperationsListResult in DeploymentError
- add DeploymentValidationError to distinguish validation errors

* addressed comments

* untangle —authorization-mode from enableSecureKubelet (Azure#2267)

* untangle —authorization-mode from “secure kubelet”

* fix typo

* fix monitoring extension and add support for prometheus v2 (Azure#2257)

This commit includes the following changes:
- fixes the broken monitoring (prometheus/grafana) extension
- makes this more resilient in the future, as the chart versions are now
static (future to-do item would be to have extensionParameters override
these versions)
- gives the user and contributor more flexibility by allowing them to
pass in a custom url for the prometheus chart values config (this is
primarily important for developing and testing away from the
Azure/acs-engine repo)

* enable AggregatedAPI's by default for k8s 1.9.0+ (Azure#2264)

* E2E test - 50 nodes (Azure#2260)

* E2E: cleanup legacy kubernetes (Azure#2275)

* add e2e hybrid definition

also remove tiller explicit config from windows api model

* removing windows + hybrid from legacy e2e

* removing tests from legacy e2e that are elsewhere

* add rescheduler, remove more from legacy e2e

* add debug for service URL content mismatch

* kubelet —cluster-domain is user-overridable (Azure#2276)

* api/vlabs: fix typos in tests (Azure#2280)

Signed-off-by: Jess Frazelle <[email protected]>

* add prerequisite to have permissions to create service principals in the subscription (Azure#2281)

acs-engine hangs with "WARN[0008] apimodel: ServicePrincipalProfile was empty, assigning role to application..."
if user does not have enough permissions to create and assign service principals and azure applications

* E2E: service LB validations and pod Ready/NotReady (Azure#2279)

* debug output if service URL validate error

* debugging num retries

* rearranging deck chairs

* service validate should guarantee service IP

* this actually works

* improve pod Ready/NotReady check

* general hpa foo (Azure#2291)

* more time and avoid nil panic (Azure#2289)

* New etcd versions and update default to v3.2.16 (Azure#2292)

* new etcd versions and set default to 3.3.1

* using 3.2.16 as default

* More e2e tests (Azure#2277)

* add features on and features off

* fix off model

* add separate tests for each feature disabled

* move features off dir

* rbac bool

* added clear containers

* added addons enabled test

* fix typo in apimodel

* remove aci-connector

* move addons to default

* Don't display "Error: <nil>" on successful deployment (Azure#2300)

* E2E Addons (Azure#2294)

* add features on and features off

* fix off model

* add separate tests for each feature disabled

* move features off dir

* rbac bool

* added clear containers

* added addons enabled test

* fix typo in apimodel

* remove aci-connector

* wip add mem/cpu limits/requests checks

* add resources to container spec

* fix resources type

* add checks to tiller

* remove extra err var

* add check for dashboard and aci connector

* update default definition

* fmt

* fix typo

* Refactor resources validation

* fix error string

* fix linter

* remove pointer

* fix ineffassign

* small fixes

* ensure docker installs before ensure docker runs (Azure#2305)

* Save apimodel after upgrade (Azure#2306)

* cmd/deploy: Handle error due to missing permissions during deploy (Azure#2297)

* Handle error due to missing permissions during deploy

* CreateRoleAssignmentSimple can already return an error. Return that error when a status 403 (not enough permissions) occurs.
  Status 404, by contrast, seems to be issued to signal work in progress during service principal generation (by ARM).
* autoFillApimodel: remove the duplicated retry logic around CreateRoleAssignmentSimple. This allows the deploy to fail properly if CreateRoleAssignmentSimple returns an error

* style fix: gofmt -s
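
The 403-versus-404 handling described in the notes for Azure#2297 can be sketched roughly as follows. This is an illustration, not the code from that PR: `statusError` and `assignRoleWithRetry` are hypothetical names, and the retry count and delay are arbitrary assumptions. The idea is to keep retrying while the call reports 404 (service principal still propagating) and to fail immediately on anything else, such as 403.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// statusError is a hypothetical error type that carries the HTTP status
// code returned by a role-assignment call.
type statusError struct{ Code int }

func (e *statusError) Error() string {
	return fmt.Sprintf("role assignment failed: HTTP %d", e.Code)
}

// assignRoleWithRetry calls assign until it succeeds. It retries only when
// the error is a 404 (assumed to mean "not ready yet") and returns any other
// error, such as a 403, right away so the caller can stop instead of hanging.
func assignRoleWithRetry(assign func() error, attempts int, delay time.Duration) error {
	var err error
	for i := 0; i < attempts; i++ {
		err = assign()
		if err == nil {
			return nil
		}
		var se *statusError
		if errors.As(err, &se) && se.Code == http.StatusNotFound {
			time.Sleep(delay)
			continue
		}
		return err
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}
```

A caller would pass a closure that wraps the real role-assignment call and translates its response into a `statusError`.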

* Clarify that only Calico supports K8s network policies (Azure#2270)

* using --cluster-domain for kube-dns domain (Azure#2303)

* set kubelet defaults for --cgroups-per-qos & --enforce-node-allocatable (Azure#2310)

* set kubelet defaults for --cgroups-per-qos & --enforce-node-allocatable

* update docs

* Updates NVIDIA drivers installation (Azure#2219)

* updated NVIDIA drivers installation

* linting engine.go

* update GPU doc

* Revert static IP allocation logic in Azure CNI, PR 1966. (Azure#2315)

* add restarts to nvidia drivers download in cloud-init (Azure#2316)

* add restarts to nvidia drivers download

and only create cloud-init string if necessary

* add tests

* add v1.8 gpu-enabled api model for e2e testing

* trying Standard_NC6

* e2e

* lint

* updated comment

* bad match string, less freq checks, - unused func

* more general success determination, typo

* more typo

* Support multiple AcsEngineClientIDs (Azure#2293)

* Support multiple AcsEngineClientIDs

* Fix acsEngineClientID assignment

* Fix formatting azureclient.go

* Fix2 formatting azureclient.go

* docs cruft (Azure#2321)

we are not actually setting --read-only-port=0 for kubelet

* Use FirstConsecutiveStaticIP in original API model instead of resetting it to default during upgrade.

* fix quotation in etcd daemon args (Azure#2325)

* remove Windows + custom VNET validation error (Azure#2322)

* Add isUpgrade flag.

* Apply same logic to other routes setting FirstConsecutiveStaticIP.

* Reboot etcd fix (Azure#2329)

* fix quotation in etcd daemon args

* Revert "fix quotation in etcd daemon args"

This reverts commit 606bab4.

* fix reboot by adding systemctl enable service

* Remove agent NICs if upgrade master nodes.

* Private clusters (Azure#2326)

* add isprivatecluster func

* wip remove load balancer for PC

* add enablePrivateCluster flag

* no public IPs for private cluster

* working private cluster for 3 masters

* remove duplicate iptables cmd

* remove useless function

* fmt

* revert dnsprefix docs change

* undo etcd change

Move change to a separate PR because it is unrelated

* remove masterPublicIpAddress

* fix typo

* handle DCOS + swarm nil case

* add docs and example

* replace host by jumpbox in the docs

* add instructions to create jumpbox

* indents

* missing import (Azure#2348)

* Allow "v" prefix in orchestrator version and release (Azure#2344)

* fix quotation in etcd daemon args

* Revert "fix quotation in etcd daemon args"

This reverts commit 606bab4.

* trim v in orch ver / rel

* add unit tests
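
As a minimal sketch of what "trim v in orch ver / rel" amounts to, assuming a single leading "v" is the only prefix that needs handling: `normalizeVersion` is a hypothetical helper name and not necessarily what the PR uses.

```go
package main

import "strings"

// normalizeVersion strips one leading "v" so that "v1.9.3" and "1.9.3"
// (or "v1.9" and "1.9" for a release) are treated the same way.
// Hypothetical helper for illustration only.
func normalizeVersion(v string) string {
	return strings.TrimPrefix(v, "v")
}
```

Applying this before any version validation lets both spellings pass through the same checks.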

* Improve the instructions for AAD. (Azure#2330)

* Improve the instructions for AAD.

* broken link and syntax

* typo (Azure#2342)

* Fix master resources merge conflict (Azure#2353)


* apply azure CNI static IP revert

* Improve info to get issuerurl (Azure#2356)

* circleci: compile as separate step (Azure#2350)

* Improve code blocks (Azure#2335)

* Update Azure Gov ACSEngineClientID (Azure#2352)

* add ClusterRole & ClusterRoleBinding for azure file (Azure#2238)

add as cluster-service for azure-cloud-provider

* mount /sbin/apparmor_parser if PodSecurityPolicy is enabled (Azure#2320)

* mount /sbin/apparmor_parser if PSP

* this is the correct kubelet service file

* this is the correct sed command

* /sbin/apparmor_parser already exists

* Allow a default k8s version for loading agentpool-only clusters (Azure#2357)

The defaultKubernetesVersion argument will be used if
Properties.KubernetesVersion was empty.
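
The fallback described above could look roughly like this. The `Properties` struct here is a pared-down stand-in with only the one field named in the commit message, and `applyDefaultKubernetesVersion` is a hypothetical function name; the real loader signature is not shown in this log.

```go
package main

// Properties is a simplified stand-in for the agentpool-only API model;
// only the field mentioned in the commit message is included.
type Properties struct {
	KubernetesVersion string
}

// applyDefaultKubernetesVersion fills in defaultKubernetesVersion only when
// the loaded model did not specify a version, and leaves an explicit value
// untouched. Hypothetical helper for illustration only.
func applyDefaultKubernetesVersion(p *Properties, defaultKubernetesVersion string) {
	if p != nil && p.KubernetesVersion == "" {
		p.KubernetesVersion = defaultKubernetesVersion
	}
}
```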

* Private clusters iteration 2: change the server for the cluster kubeconfig (Azure#2354)

* modify 2nd kubeconfig for private clusters

* typo

* fix customscript kubeconfig

* revert change in custom data kubeconfig

* update docs for private clusters (Azure#2363)

* Remove 1.6.x upgrade tests. (Azure#2364)

* Add k8s 1.7.13 support (Azure#2369)

* add version 1.7.13

* update win zip

* Resolve merge conflict in building 1.7.13 (Azure#2370)

* Resolve merge conflict in building 1.7.13

* Add comment

* Set custom UbuntuImageConfig for gov (Azure#2375)

* Fix guid validation (Azure#2373)

* metrics server addon (Azure#2339)

* metrics server addon

* use addonmanager mode EnsureExists

* fix labels on metrics APIService

* enable hpa autoscale test for 1.9 clusters

* Reuse GetCloudTargetEnv in FormatAzureProdFQDN (Azure#2376)

* reuse GetCloudTargetEnv in FormatAzureProdFQDN

* Fix FQDNFormat lint error

* minor fix in build script (Azure#2379)

* rationalized vendor/ (Azure#2390)

* remove unnecessary hyperkube reference (Azure#2391)

* Remove deprecated '--require-kubeconfig' for k8s (Azure#2365)

* Remove deprecated --require-kubeconfig

* adding --require-kubeconfig back to 1.7

* less than is what we want here

* Pass --location to containerService (Azure#2381)

* Notes on day-to-day operations on an acs-engine cluster (Azure#2351)

* Notes from my experiences..

.. over the last couple of days

* Rename day-two-operations.md to kubernetes-day2-operations.md

* add etcd certs to KV docs (Azure#2396)

* upgrade tiller to 2.8.1 (Azure#2397)

* remove k8s 1.5 related code/artifacts (Azure#2394)

* blocking cse on cluster nodes ready (Azure#2225)

* blocking cse on cluster nodes ready

* deal with agent-only clusters

* use kubectl var and ignore stderr

* increase node active check timeout to 30 mins

* test single master node Windows clusters (Azure#2402)

* a miss

* fix 2 more missed errors

* remove unnecessary

* remove more unnecessary
Development

Successfully merging this pull request may close these issues.

Integrate Azure-VNET CNI plugin to Kubernetes Windows nodes
8 participants