
Enable more optional configurations for AKS node pools #1706

Closed
wants to merge 1 commit into from

Conversation

meixingdb
Contributor

@meixingdb meixingdb commented Sep 20, 2021

Issue: #1701

/kind feature

Describe the solution you'd like
[A clear and concise description of what you want to happen.]

Add optional configurations that let users specify specs for AKS node pools. The list of options includes:

// AutoScaling config
type AutoScaling struct {
	// MaxCount - Maximum number of nodes for auto-scaling
	// +kubebuilder:validation:Required
	MaxCount *int32 `json:"maxCount"`

	// MinCount - Minimum number of nodes for auto-scaling
	// +kubebuilder:validation:Required
	MinCount *int32 `json:"minCount"`
}
// EnableNodePublicIP - Enable public IP for nodes
// +optional
EnableNodePublicIP *bool `json:"enableNodePublicIP,omitempty"`

// EnableFIPS - Whether to use FIPS enabled OS
// +optional
EnableFIPS *bool `json:"enableFIPS,omitempty"`

// OsDiskType - OS disk type to be used for machines in a given agent pool. Allowed values are 'Ephemeral' and 'Managed'. If unspecified, defaults to 'Ephemeral' when the VM supports ephemeral OS and has a cache disk larger than the requested OSDiskSizeGB. Otherwise, defaults to 'Managed'. May not be changed after creation. Possible values include: 'Managed', 'Ephemeral'
// +optional
OsDiskType *string `json:"osDiskType,omitempty"`

// NodeLabels - Agent pool node labels to be persisted across all nodes in agent pool.
// +optional
NodeLabels map[string]string `json:"nodeLabels,omitempty"`

// NodeTaints - Taints added to new nodes during node pool create and scale. For example, key=value:NoSchedule.
// +optional
NodeTaints []string `json:"nodeTaints,omitempty"`

// VnetSubnetID - VNet SubnetID specifies the VNet's subnet identifier for nodes and maybe pods
// +optional
VnetSubnetID *string `json:"vnetSubnetID,omitempty"`

// AvailabilityZones - Availability zones for nodes. Must use VirtualMachineScaleSets AgentPoolType.
// +optional
AvailabilityZones []string `json:"availabilityZones,omitempty"`

// ScaleSetPriority - ScaleSetPriority to be used to specify virtual machine scale set priority. Default to regular. Possible values include: 'Spot', 'Regular'
// +optional
ScaleSetPriority *string `json:"scaleSetPriority,omitempty"`

// MaxPods - Maximum number of pods that can run on a node.
// +optional
MaxPods *int32 `json:"maxPods,omitempty"`

// KubeletConfig - KubeletConfig specifies the configuration of kubelet on agent nodes.
// +optional
KubeletConfig *KubeletConfig `json:"kubeletConfig,omitempty"`

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1701

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:


Enable more optional configurations for AKS node pools.
Users can configure the following options on an AKS node pool:

AutoScaling:
type AutoScaling struct {
	MaxCount *int32 `json:"maxCount"`
	MinCount *int32 `json:"minCount"`
}

EnableNodePublicIP
EnableFIPS
OsDiskType
NodeLabels
NodeTaints
VnetSubnetID
AvailabilityZones
ScaleSetPriority
MaxPods
KubeletConfig

An example node pool config YAML:

kind: AzureManagedMachinePool
metadata:
  name: example
  namespace: default
spec:
  mode: User
  sku: Standard_D2s_v3
  osDiskSizeGB: 48
  autoScaling:
    minCount: 2
    maxCount: 10
  availabilityZones:
  - "1"
  - "2"
  - "3"
  enableFIPS: true
  enableNodePublicIP: false
  nodeLabels:
    label1: test
    label2: test2
  nodeTaints:
  - key1=value1:NoSchedule
  - key2=value2:NoSchedule
  maxPods: 40
  scaleSetPriority: Spot

Node pool options can be enabled (screenshot: Screen Shot 2021-11-16 at 7 54 35 AM).

KubeletConfig can be optionally set (screenshot: Screen Shot 2021-11-16 at 8 02 23 AM).

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Sep 20, 2021
@k8s-ci-robot
Contributor

Welcome @meixingdb!

It looks like this is your first PR to kubernetes-sigs/cluster-api-provider-azure 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/cluster-api-provider-azure has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 20, 2021
@k8s-ci-robot
Contributor

Hi @meixingdb. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added area/provider/azure Issues or PRs related to azure provider sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 20, 2021
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 20, 2021
@meixingdb meixingdb changed the title [WIP] AKS clusters should allow users to specify more optional configuration for node pools AKS clusters should allow users to specify more optional configuration for node pools Sep 20, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 20, 2021
@meixingdb meixingdb changed the title AKS clusters should allow users to specify more optional configuration for node pools Enable more optional configurations for AKS node pools Sep 21, 2021
@meixingdb
Contributor Author

/assign @CecileRobertMichon

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 22, 2021
@@ -67,6 +67,7 @@ func New(scope ManagedClusterScope) *Service {
}

// Reconcile idempotently creates or updates a managed cluster, if possible.
//gocyclo:ignore
Member

Suggested change
//gocyclo:ignore

Member

Can you please think of a more elegant way to do this?

Comment on lines 56 to 70
// MaxCount - Maximum number of nodes for auto-scaling
// +optional
MaxCount *int32 `json:"maxCount,omitempty"`

// MinCount - Minimum number of nodes for auto-scaling
// +optional
MinCount *int32 `json:"minCount,omitempty"`

// EnableAutoScaling - Whether to enable auto-scaler
// +optional
EnableAutoScaling *bool `json:"enableAutoScaling,omitempty"`
Member

Suggested change
// MaxCount - Maximum number of nodes for auto-scaling
// +optional
MaxCount *int32 `json:"maxCount,omitempty"`
// MinCount - Minimum number of nodes for auto-scaling
// +optional
MinCount *int32 `json:"minCount,omitempty"`
// EnableAutoScaling - Whether to enable auto-scaler
// +optional
EnableAutoScaling *bool `json:"enableAutoScaling,omitempty"`
// AutoScaling - Auto-scaling settings for AKS
// +optional
AutoScaling *AutoScaling `json:"autoScaling,omitempty"`
type AutoScaling struct {
// MaxCount - Maximum number of nodes for auto-scaling
// +kubebuilder:validation:Required
MaxCount *int32 `json:"maxCount"`
// MinCount - Minimum number of nodes for auto-scaling
// +kubebuilder:validation:Required
MinCount *int32 `json:"minCount"`
}

Member

this will help reduce the validation effort in the webhooks :)
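For illustration, here is a minimal sketch of what the reduced webhook check could look like once the required sub-fields are grouped. The helper name is hypothetical and it assumes the AutoScaling struct suggested above; field is k8s.io/apimachinery/pkg/util/validation/field.

// validateAutoScaling sketches the reduced check: the +kubebuilder:validation:Required
// markers already guarantee MinCount and MaxCount are set whenever autoScaling is
// present, so only the cross-field relationship is left for the webhook.
func validateAutoScaling(autoScaling *AutoScaling, fldPath *field.Path) *field.Error {
	if autoScaling == nil {
		// Auto-scaling was not requested; nothing to validate.
		return nil
	}
	if *autoScaling.MaxCount < *autoScaling.MinCount {
		return field.Invalid(fldPath.Child("maxCount"), *autoScaling.MaxCount,
			"maxCount must be greater than or equal to minCount")
	}
	return nil
}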

return nil
}

// ValidateUpdate implements webhook.Validator so a webhook will be registered for the type.
//gocyclo:ignore
Member

Suggested change
//gocyclo:ignore

Member

Please think of a more elegant way to do this :)

I think cyclomatic complexity is configured to:

  • reduce the complexity of the code
  • avoid cognitive burden
  • reduce the bugginess in code

If it is breaking, it means we should try to improve the code.

Contributor

@CecileRobertMichon CecileRobertMichon Oct 5, 2021

+1

also pasting my slack comment for reference:

the default gocyclo complexity that gets reported is 30, that’s very high. In general we should aim to be even lower than that (10-20). If the function is becoming too complex, the linter can be a good nudge to refactor things to make the code easier to read and easier to test. A function with > 30 cyclomatic complexity is hard to properly unit test and can lead to more logic / edge case bugs. I’m not saying we should blindly break out a big function into 5 smaller equally complex functions but the linter should be a signal that we need to rethink the code, maybe there are other refactors that can be applied to simplify the logic

https://kubernetes.slack.com/archives/CEX9HENG7/p1631747376132600?thread_ts=1631743825.132300&cid=CEX9HENG7
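One illustrative direction for such a refactor (the helper name below is hypothetical, not from this PR): fold the repeated immutability checks into a tiny helper, so ValidateUpdate stays below the complexity threshold without a //gocyclo:ignore directive.

import (
	"reflect"

	"k8s.io/apimachinery/pkg/util/validation/field"
)

// validateImmutable turns each hand-rolled if-block into a single call,
// which keeps the webhook's cyclomatic complexity low.
func validateImmutable(path *field.Path, oldVal, newVal interface{}) *field.Error {
	if !reflect.DeepEqual(oldVal, newVal) {
		return field.Invalid(path, newVal, "field is immutable")
	}
	return nil
}

// Usage inside ValidateUpdate (sketch):
//
//	if err := validateImmutable(field.NewPath("Spec", "NodeTaints"), old.NodeTaints, r.NodeTaints); err != nil {
//		updateErrs = append(updateErrs, err)
//	}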

Comment on lines 557 to 587
if s.InfraMachinePool.Spec.MaxCount != nil {
agentPoolSpec.MaxCount = s.InfraMachinePool.Spec.MaxCount
}

if s.InfraMachinePool.Spec.MinCount != nil {
agentPoolSpec.MinCount = s.InfraMachinePool.Spec.MinCount
}

if s.InfraMachinePool.Spec.EnableAutoScaling != nil {
agentPoolSpec.EnableAutoScaling = s.InfraMachinePool.Spec.EnableAutoScaling
}

if s.InfraMachinePool.Spec.EnableFIPS != nil {
agentPoolSpec.EnableFIPS = s.InfraMachinePool.Spec.EnableFIPS
}

if s.InfraMachinePool.Spec.EnableNodePublicIP != nil {
agentPoolSpec.EnableNodePublicIP = s.InfraMachinePool.Spec.EnableNodePublicIP
}

if s.InfraMachinePool.Spec.NodeLabels != nil {
agentPoolSpec.NodeLabels = s.InfraMachinePool.Spec.NodeLabels
}

if s.InfraMachinePool.Spec.NodeTaints != nil {
agentPoolSpec.NodeTaints = s.InfraMachinePool.Spec.NodeTaints
}

if s.InfraMachinePool.Spec.OsDiskType != nil {
agentPoolSpec.OsDiskType = s.InfraMachinePool.Spec.OsDiskType
}
Member

Do we need all of these checks?

Comment on lines 593 to 603
if s.InfraMachinePool.Spec.AvailabilityZones != nil {
agentPoolSpec.AvailabilityZones = s.InfraMachinePool.Spec.AvailabilityZones
}

if s.InfraMachinePool.Spec.ScaleSetPriority != nil {
agentPoolSpec.ScaleSetPriority = s.InfraMachinePool.Spec.ScaleSetPriority
}

if s.InfraMachinePool.Spec.MaxPods != nil {
agentPoolSpec.MaxPods = s.InfraMachinePool.Spec.MaxPods
}
Member

same as above

@LochanRn
Member

@meixingdb I think logically everything seems to be fine!

Can you please reduce the number of unwanted checks? It will make the code simpler!
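For reference, a sketch of the simplified assignment block (field names are the ones from the diff above; this assumes agentPoolSpec carries no earlier defaults for these fields). When the source field is nil, the assignment just writes nil, which is the zero value the destination already has, so the per-field nil checks add no behavior:

// Simplified copy of the optional node pool settings; nil sources leave the
// destination fields at their zero values, same as the guarded version.
agentPoolSpec.MaxCount = s.InfraMachinePool.Spec.MaxCount
agentPoolSpec.MinCount = s.InfraMachinePool.Spec.MinCount
agentPoolSpec.EnableAutoScaling = s.InfraMachinePool.Spec.EnableAutoScaling
agentPoolSpec.EnableFIPS = s.InfraMachinePool.Spec.EnableFIPS
agentPoolSpec.EnableNodePublicIP = s.InfraMachinePool.Spec.EnableNodePublicIP
agentPoolSpec.NodeLabels = s.InfraMachinePool.Spec.NodeLabels
agentPoolSpec.NodeTaints = s.InfraMachinePool.Spec.NodeTaints
agentPoolSpec.OsDiskType = s.InfraMachinePool.Spec.OsDiskType
agentPoolSpec.AvailabilityZones = s.InfraMachinePool.Spec.AvailabilityZones
agentPoolSpec.ScaleSetPriority = s.InfraMachinePool.Spec.ScaleSetPriority
agentPoolSpec.MaxPods = s.InfraMachinePool.Spec.MaxPods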

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 30, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please ask for approval from cecilerobertmichon after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@LochanRn
Member

LochanRn commented Oct 5, 2021

@meixingdb please amend your commit messages to be more descriptive, thanks :)

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 6, 2021
@meixingdb
Contributor Author

@meixingdb: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

@LochanRn Could you help me understand the test failures? For the apidiff, I think the newly added fields in AgentPoolSpec in azure/types.go are all optional, so why is it causing an issue?
Regarding the e2e test, what does the YAML template look like for a control plane with 1 worker node? Is there a system node pool defined or not? I've tested a control plane with one system node pool, and it worked.
I appreciate your help.

@LochanRn
Member

LochanRn commented Nov 8, 2021

It's not an issue; the job is meant to fail. If there are changes made to the API, the apidiff job fails, which serves as an indicator to the reviewers that changes were made to the API.

@LochanRn
Member

LochanRn commented Nov 8, 2021

@meixingdb will go through your PR by tomorrow, thanks for the changes made 😊

@meixingdb
Contributor Author

@meixingdb will go through your PR by tomorrow, thanks for the changes made 😊

Thanks a lot!

Contributor

@alexeldeib alexeldeib left a comment

Oops, apparently some of my comments were still pending. Still relevant to the alpha versions, since we now have beta as well.

// KubeletConfig kubelet configurations of agent nodes.
type KubeletConfig struct {
// CPUManagerPolicy - CPU Manager policy to use.
CPUManagerPolicy *string `json:"cpuManagerPolicy,omitempty"`
Contributor

Why is this duplicated from exp/api?

Contributor Author

Looks like we need this; it's also in api/v1beta1/types.go without my change.
I tried removing this, but deepcopy complained about it.

@@ -37,6 +37,87 @@ type AzureManagedMachinePoolSpec struct {
// ProviderIDList is the unique identifier as specified by the cloud provider.
// +optional
ProviderIDList []string `json:"providerIDList,omitempty"`

Contributor

Please remove this, do not change v1a3. Handle conversion in the webhook (there's an existing pattern; you will be able to restore the struct easily).

Contributor Author

@meixingdb meixingdb Nov 9, 2021

@alexeldeib could you refer me to the pattern? I think I tried one in the past, but it failed the test.

Contributor

I think that's what you're looking for?

Comment on lines 100 to 40
type KubeletConfig struct {
// CPUManagerPolicy - CPU Manager policy to use.
CPUManagerPolicy *string `json:"cpuManagerPolicy,omitempty"`
// CPUCfsQuota - Enable CPU CFS quota enforcement for containers that specify CPU limits.
CPUCfsQuota *bool `json:"cpuCfsQuota,omitempty"`
// CPUCfsQuotaPeriod - Sets CPU CFS quota period value.
CPUCfsQuotaPeriod *string `json:"cpuCfsQuotaPeriod,omitempty"`
// ImageGcHighThreshold - The percent of disk usage after which image garbage collection is always run.
ImageGcHighThreshold *int32 `json:"imageGcHighThreshold,omitempty"`
// ImageGcLowThreshold - The percent of disk usage before which image garbage collection is never run.
ImageGcLowThreshold *int32 `json:"imageGcLowThreshold,omitempty"`
// TopologyManagerPolicy - Topology Manager policy to use.
TopologyManagerPolicy *string `json:"topologyManagerPolicy,omitempty"`
// AllowedUnsafeSysctls - Allowlist of unsafe sysctls or unsafe sysctl patterns (ending in `*`).
// TODO: consider using []string instead of *[]string
AllowedUnsafeSysctls *[]string `json:"allowedUnsafeSysctls,omitempty"`
// FailSwapOn - If set to true it will make the Kubelet fail to start if swap is enabled on the node.
FailSwapOn *bool `json:"failSwapOn,omitempty"`
// ContainerLogMaxSizeMB - The maximum size (e.g. 10Mi) of container log file before it is rotated.
ContainerLogMaxSizeMB *int32 `json:"containerLogMaxSizeMB,omitempty"`
// ContainerLogMaxFiles - The maximum number of container log files that can be present for a container. The number must be ≥ 2.
ContainerLogMaxFiles *int32 `json:"containerLogMaxFiles,omitempty"`
// PodMaxPids - The maximum number of processes per pod.
PodMaxPids *int32 `json:"podMaxPids,omitempty"`
}
Member

Suggested change
type KubeletConfig struct {
// CPUManagerPolicy - CPU Manager policy to use.
CPUManagerPolicy *string `json:"cpuManagerPolicy,omitempty"`
// CPUCfsQuota - Enable CPU CFS quota enforcement for containers that specify CPU limits.
CPUCfsQuota *bool `json:"cpuCfsQuota,omitempty"`
// CPUCfsQuotaPeriod - Sets CPU CFS quota period value.
CPUCfsQuotaPeriod *string `json:"cpuCfsQuotaPeriod,omitempty"`
// ImageGcHighThreshold - The percent of disk usage after which image garbage collection is always run.
ImageGcHighThreshold *int32 `json:"imageGcHighThreshold,omitempty"`
// ImageGcLowThreshold - The percent of disk usage before which image garbage collection is never run.
ImageGcLowThreshold *int32 `json:"imageGcLowThreshold,omitempty"`
// TopologyManagerPolicy - Topology Manager policy to use.
TopologyManagerPolicy *string `json:"topologyManagerPolicy,omitempty"`
// AllowedUnsafeSysctls - Allowlist of unsafe sysctls or unsafe sysctl patterns (ending in `*`).
// TODO: consider using []string instead of *[]string
AllowedUnsafeSysctls *[]string `json:"allowedUnsafeSysctls,omitempty"`
// FailSwapOn - If set to true it will make the Kubelet fail to start if swap is enabled on the node.
FailSwapOn *bool `json:"failSwapOn,omitempty"`
// ContainerLogMaxSizeMB - The maximum size (e.g. 10Mi) of container log file before it is rotated.
ContainerLogMaxSizeMB *int32 `json:"containerLogMaxSizeMB,omitempty"`
// ContainerLogMaxFiles - The maximum number of container log files that can be present for a container. The number must be ≥ 2.
ContainerLogMaxFiles *int32 `json:"containerLogMaxFiles,omitempty"`
// PodMaxPids - The maximum number of processes per pod.
PodMaxPids *int32 `json:"podMaxPids,omitempty"`
}
// KubeletConfig kubelet configurations of agent nodes.
type KubeletConfig struct {
// CPUManagerPolicy - CPU Manager policy to use.
// +kubebuilder:validation:Enum=none;static
// +optional
CPUManagerPolicy *string `json:"cpuManagerPolicy,omitempty"`
// CPUCfsQuota - Enable CPU CFS quota enforcement for containers that specify CPU limits.
// +optional
CPUCfsQuota *bool `json:"cpuCfsQuota,omitempty"`
// CPUCfsQuotaPeriod - Sets CPU CFS quota period value.
// +optional
CPUCfsQuotaPeriod *string `json:"cpuCfsQuotaPeriod,omitempty"`
// ImageGcHighThreshold - The percent of disk usage after which image garbage collection is always run.
// +optional
// +kubebuilder:validation:Maximum=100
// +kubebuilder:validation:Minimum=0
ImageGcHighThreshold *int32 `json:"imageGcHighThreshold,omitempty"`
// ImageGcLowThreshold - The percent of disk usage before which image garbage collection is never run.
// +optional
// +kubebuilder:validation:Maximum=100
// +kubebuilder:validation:Minimum=0
ImageGcLowThreshold *int32 `json:"imageGcLowThreshold,omitempty"`
// TopologyManagerPolicy - Topology Manager policy to use.
// +kubebuilder:validation:Enum=none;best-effort;restricted;single-numa-node
// +optional
TopologyManagerPolicy *string `json:"topologyManagerPolicy,omitempty"`
// AllowedUnsafeSysctls - Allowlist of unsafe sysctls or unsafe sysctl patterns (ending in `*`).
// +optional
// +kubebuilder:validation:Enum=None;kernel.shm*;kernel.msg*;kernel.sem;fs.mqueue.*;net.*
AllowedUnsafeSysctls *[]string `json:"allowedUnsafeSysctls,omitempty"`
// FailSwapOn - If set to true it will make the Kubelet fail to start if swap is enabled on the node.
// +optional
FailSwapOn *bool `json:"failSwapOn,omitempty"`
// ContainerLogMaxSizeMB - The maximum size (e.g. 10Mi) of container log file before it is rotated.
// +optional
ContainerLogMaxSizeMB *int32 `json:"containerLogMaxSizeMB,omitempty"`
// ContainerLogMaxFiles - The maximum number of container log files that can be present for a container. The number must be ≥ 2.
// +optional
ContainerLogMaxFiles *int32 `json:"containerLogMaxFiles,omitempty"`
// PodMaxPids - The maximum number of processes per pod.
// +optional
PodMaxPids *int32 `json:"podMaxPids,omitempty"`
}

Member

These changes are to be made in v1beta1 😅

azure/types.go Outdated
AvailabilityZones []string `json:"availabilityZones,omitempty"`

// ScaleSetPriority - ScaleSetPriority to be used to specify virtual machine scale set priority. Default to regular. Possible values include: 'Spot', 'Regular'
// +optional
Member

Suggested change
// +optional

azure/types.go Outdated
ScaleSetPriority *string `json:"scaleSetPriority,omitempty"`

// MaxPods - Maximum number of pods that can run on a node.
// +optional
Member

Suggested change
// +optional

azure/types.go Outdated
MaxPods *int32 `json:"maxPods,omitempty"`

// KubeletConfig - KubeletConfig specifies the configuration of kubelet on agent nodes.
// +optional
Member

Suggested change
// +optional

Comment on lines 61 to 64
// If AutoScaling is enabled, both MinCount and MaxCount should be set
if r.Spec.AutoScaling != nil && (r.Spec.AutoScaling.MaxCount == nil || r.Spec.AutoScaling.MinCount == nil || (*r.Spec.AutoScaling.MaxCount < *r.Spec.AutoScaling.MinCount)) {
allErrs = append(allErrs,
Member

Why are we validating again for MinCount and MaxCount?
Kubebuilder already does that, right?

	// MaxCount - Maximum number of nodes for auto-scaling
	// +kubebuilder:validation:Required
	MaxCount *int32 `json:"maxCount"`

	// MinCount - Minimum number of nodes for auto-scaling
	// +kubebuilder:validation:Required
	MinCount *int32 `json:"minCount"`

Contributor Author

fixed

Member

@LochanRn LochanRn left a comment

Why are most of the fields immutable? Is there any documentation you have followed?

Comment on lines +81 to +83
// NodeTaints - Taints added to new nodes during node pool create and scale. For example, key=value:NoSchedule.
// +optional
NodeTaints []string `json:"nodeTaints,omitempty"`
Member

For this field, can we add a regex to validate the strings?
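A possible webhook-side version of that check, as a sketch only: the exact regular expression was left open in this thread, so the pattern below is an assumption matching the key=value:Effect shape mentioned in the field's doc comment.

import (
	"regexp"

	"k8s.io/apimachinery/pkg/util/validation/field"
)

// taintRegex is an illustrative pattern for "key=value:Effect" taint strings.
var taintRegex = regexp.MustCompile(`^[a-zA-Z0-9][-a-zA-Z0-9_./]*=[-a-zA-Z0-9_.]*:(NoSchedule|PreferNoSchedule|NoExecute)$`)

// validateNodeTaints checks every taint string in the spec against the pattern.
func validateNodeTaints(taints []string, fldPath *field.Path) field.ErrorList {
	var allErrs field.ErrorList
	for i, taint := range taints {
		if !taintRegex.MatchString(taint) {
			allErrs = append(allErrs, field.Invalid(fldPath.Index(i), taint,
				"node taint must be of the form key=value:NoSchedule|PreferNoSchedule|NoExecute"))
		}
	}
	return allErrs
}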

Comment on lines +105 to +141
if !reflect.DeepEqual(r.NodeTaints, old.NodeTaints) {
updateErrs = append(updateErrs,
field.Invalid(
field.NewPath("Spec", "NodeTaints"),
r.NodeTaints,
"field is immutable"))
}
Member

AFAIK taints on a Kubernetes node are mutable, please correct me if I am wrong!

Contributor

AKS limitation today: https://docs.microsoft.com/en-us/azure/aks/use-multiple-node-pools#setting-nodepool-taints (see the box a bit below that link)

Member

Thanks 😊

@meixingdb meixingdb force-pushed the upstream-nodepool branch 3 times, most recently from e5773fd to 56180d2 (November 16, 2021 07:18)
@linux-foundation-easycla

linux-foundation-easycla bot commented Nov 16, 2021

CLA Not Signed

@k8s-ci-robot
Contributor

@meixingdb: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cluster-api-provider-azure-apidiff f8181c4 link false /test pull-cluster-api-provider-azure-apidiff
pull-cluster-api-provider-azure-coverage f8181c4 link false /test pull-cluster-api-provider-azure-coverage
pull-cluster-api-provider-azure-test f8181c4 link true /test pull-cluster-api-provider-azure-test

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@CecileRobertMichon CecileRobertMichon added the area/managedclusters Issues related to managed AKS clusters created through the CAPZ ManagedCluster Type label Nov 18, 2021
@k8s-ci-robot
Contributor

@meixingdb: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 20, 2021
//dst.Spec.NodeTaints = []string{}
//dst.Spec.AvailabilityZones = []string{}

if restored.Spec.NodeLabels != nil {
Contributor Author

@alexeldeib @CecileRobertMichon
Can I get some help with the conversions for Annotations/NodeLabels/NodeTaints/AvailabilityZones/KubeletConfig?
I've tried many combinations, and none of them worked. I always got errors like this:

        - 		Annotations:                nil,
        + 		Annotations:                map[string]string{},
          		OwnerReferences:            nil,
          		Finalizers:                 nil,
          		... // 2 identical fields
          	},
          	Spec: v1beta1.AzureManagedMachinePoolSpec{
          		... // 10 identical fields
          		NodeTaints:        {"Lȃ$ʩʧȧɰʉ透酠v"},
          		VnetSubnetID:      nil,
        - 		AvailabilityZones: []string{},
        + 		AvailabilityZones: nil,
          		ScaleSetPriority:  &"{",
          		MaxPods:           nil,
          		KubeletConfig: &v1beta1.KubeletConfig{
          			... // 4 identical fields
          			ImageGcLowThreshold:   &-622828595,
          			TopologyManagerPolicy: &"lJÑĚʤp",
        - 			AllowedUnsafeSysctls:  &nil,
        + 			AllowedUnsafeSysctls:  nil,

          &v1beta1.AzureManagedMachinePool{
          	TypeMeta:   {},
          	ObjectMeta: {Name: "鍥唕ʡ", GenerateName: "CȡN阗綬é", Namespace: "|Zt(Ƌ濦¾UQ僷轘v刢#", SelfLink: ">藓Bƕ", ...},
          	Spec: v1beta1.AzureManagedMachinePoolSpec{
          		... // 10 identical fields
          		NodeTaints:        nil,
          		VnetSubnetID:      &"疒¯蟸酈ƫ搳ćDƤ株茀]喋",
        - 		AvailabilityZones: []string{},
        + 		AvailabilityZones: nil,
          		ScaleSetPriority:  &"{X慿é渇霓渉ƶ´墜2",
          		MaxPods:           &599523910,
          		KubeletConfig: &v1beta1.KubeletConfig{
          			... // 4 identical fields
          			ImageGcLowThreshold:   nil,
          			TopologyManagerPolicy: nil,
        - 			AllowedUnsafeSysctls:  &nil,
        + 			AllowedUnsafeSysctls:  nil,
          			FailSwapOn:            nil,
          			ContainerLogMaxSizeMB: &145186574,
          			... // 2 identical fields

Contributor

First you need to remove the new types from v1alpha4. Then conversions for both alpha3 <-> beta1 and alpha4 <-> beta1 need to be added. For new fields that were added, we want to restore v1beta1 types if they exist (this saves information in the case of a round-trip conversion). If the field is a pointer, we only want to set it if it was previously set, for example:

if restored.Spec.KubeletConfig != nil {
    dst.Spec.KubeletConfig = restored.Spec.KubeletConfig
}
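Expanded into a fuller sketch of that pattern, assuming the MarshalData/UnmarshalData helpers from sigs.k8s.io/cluster-api/util/conversion that the existing CAPZ converters use; the generated Convert_* function names and the infrav1exp package alias below are illustrative, not taken from this PR.

// ConvertTo converts the v1alpha4 AzureManagedMachinePool to the v1beta1 hub version.
func (src *AzureManagedMachinePool) ConvertTo(dstRaw conversion.Hub) error {
	dst := dstRaw.(*infrav1exp.AzureManagedMachinePool)
	if err := Convert_v1alpha4_AzureManagedMachinePool_To_v1beta1_AzureManagedMachinePool(src, dst, nil); err != nil {
		return err
	}

	// Restore the v1beta1-only fields stashed in the conversion annotation, if any.
	restored := &infrav1exp.AzureManagedMachinePool{}
	if ok, err := utilconversion.UnmarshalData(src, restored); err != nil || !ok {
		return err
	}

	// Pointer fields are only set when they were previously set.
	if restored.Spec.KubeletConfig != nil {
		dst.Spec.KubeletConfig = restored.Spec.KubeletConfig
	}
	if restored.Spec.NodeLabels != nil {
		dst.Spec.NodeLabels = restored.Spec.NodeLabels
	}
	dst.Spec.NodeTaints = restored.Spec.NodeTaints
	dst.Spec.AvailabilityZones = restored.Spec.AvailabilityZones

	return nil
}

// ConvertFrom converts the v1beta1 hub version back to v1alpha4 and stashes the
// fields v1alpha4 cannot represent, so a round trip does not lose data.
func (dst *AzureManagedMachinePool) ConvertFrom(srcRaw conversion.Hub) error {
	src := srcRaw.(*infrav1exp.AzureManagedMachinePool)
	if err := Convert_v1beta1_AzureManagedMachinePool_To_v1alpha4_AzureManagedMachinePool(src, dst, nil); err != nil {
		return err
	}
	return utilconversion.MarshalData(src, dst)
}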

@@ -698,3 +698,29 @@ type AzureBastion struct {
func IsTerminalProvisioningState(state ProvisioningState) bool {
return state == Failed || state == Succeeded
}

// KubeletConfig kubelet configurations of agent nodes.
type KubeletConfig struct {
Contributor

this looks like it's a duplicate of the type in exp/? If not, where is it used?

Contributor

Rather than define our own KubeletConfig type spec, can we simply add a new property to the AgentPoolSpec:

https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/v1.0.1/azure/types.go#L435

Something like this:

$ git diff
diff --git a/azure/types.go b/azure/types.go
index 45bb67f1..e0422643 100644
--- a/azure/types.go
+++ b/azure/types.go
@@ -19,6 +19,7 @@ package azure
 import (
        "reflect"
 
+       "github.com/Azure/azure-sdk-for-go/services/containerservice/mgmt/2021-05-01/containerservice"
        "github.com/google/go-cmp/cmp"
 
        infrav1 "sigs.k8s.io/cluster-api-provider-azure/api/v1beta1"
@@ -441,4 +442,7 @@ type AgentPoolSpec struct {
 
        // AvailabilityZones represents the Availability zones for nodes in the AgentPool.
        AvailabilityZones []string
+
+       // KubeletConfig kubelet configurations of agent nodes.
+       KubeletConfig containerservice.KubeletConfig
 }

Contributor

Discussed this with @jackfrancis offline; we don't actually want to do this, because it would make us rely on the SDK not making breaking changes, prevents us from using conversion for back-compat, and also ties our API to a specific SDK version.
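Presumably the CAPZ-owned KubeletConfig type would then be mapped onto the SDK type only where the agent pool parameters are built. Roughly, as a sketch: the mapping function and its location are assumptions, and the field names mirror the KubeletConfig struct quoted earlier in this thread.

// convertKubeletConfig maps the CAPZ-owned KubeletConfig onto the Azure SDK type
// at reconcile time, keeping the API types decoupled from any particular SDK version.
func convertKubeletConfig(in *infrav1.KubeletConfig) *containerservice.KubeletConfig {
	if in == nil {
		return nil
	}
	return &containerservice.KubeletConfig{
		CPUManagerPolicy:      in.CPUManagerPolicy,
		CPUCfsQuota:           in.CPUCfsQuota,
		CPUCfsQuotaPeriod:     in.CPUCfsQuotaPeriod,
		ImageGcHighThreshold:  in.ImageGcHighThreshold,
		ImageGcLowThreshold:   in.ImageGcLowThreshold,
		TopologyManagerPolicy: in.TopologyManagerPolicy,
		AllowedUnsafeSysctls:  in.AllowedUnsafeSysctls,
		FailSwapOn:            in.FailSwapOn,
		ContainerLogMaxSizeMB: in.ContainerLogMaxSizeMB,
		ContainerLogMaxFiles:  in.ContainerLogMaxFiles,
		PodMaxPids:            in.PodMaxPids,
	}
}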

@@ -56,6 +56,116 @@ type AzureManagedMachinePoolSpec struct {
// ProviderIDList is the unique identifier as specified by the cloud provider.
// +optional
ProviderIDList []string `json:"providerIDList,omitempty"`

Contributor

We should not be modifying v1alpha4 types. Only v1beta1 types can be updated in the main branch.

@CecileRobertMichon
Contributor

@meixingdb what do you think about breaking up this big PR into small PRs, one for each feature that is getting added? This one is very big, which makes it difficult to merge quickly. There are also several fields in here that have already been added in the main branch with separate PRs, which is causing merge conflicts. Examples of smaller PRs to add AKS types: #1669, #1564, #1910

@LochanRn
Member

LochanRn commented Dec 9, 2021

@meixingdb what do you think about breaking up this big PR into small PRs, one for each feature that is getting added? This one is very big, which makes it difficult to merge quickly. There are also several fields in here that have already been added in the main branch with separate PRs, which is causing merge conflicts. Examples of smaller PRs to add AKS types: #1669, #1564, #1910

I agree with this, and it makes more sense!

@devigned
Contributor

The comments in thread have requested the work in this PR be broken into multiple smaller feature additions. I believe the amount of feedback and the push for smaller feature additions will make it difficult to continue reviewing this PR. I'm going to close this PR, not due to lack of interest or appreciation for these additions, but for the aforementioned feedback. Thank you, @meixingdb.

/close

@k8s-ci-robot
Contributor

@devigned: Closed this PR.

In response to this:

The comments in thread have requested the work in this PR be broken into multiple smaller feature additions. I believe the amount of feedback and the push for smaller feature additions will make it difficult to continue reviewing this PR. I'm going to close this PR, not due to lack of interest or appreciation for these additions, but for the aforementioned feedback. Thank you, @meixingdb.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
area/managedclusters Issues related to managed AKS clusters created through the CAPZ ManagedCluster Type
area/provider/azure Issues or PRs related to azure provider
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.
kind/feature Categorizes issue or PR as related to a new feature.
needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
ok-to-test Indicates a non-member PR verified by an org member that is safe to test.
release-note Denotes a PR that will be considered when it comes time to generate release notes.
sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.
size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.