refactor: upstream most of Azure managed CAS changes in cloudprovider/azure for 1.29 #6991
Hi @comtalyst. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
b38947b to 6c53a9d
(some annotations)
k8s.io/utils v0.0.0-20230726121419-3b25d923346b
sigs.k8s.io/cloud-provider-azure v1.28.0
k8s.io/utils v0.0.0-20231127182322-b307cd553661
sigs.k8s.io/cloud-provider-azure v1.29.0
Updated the cloud-provider-azure version. This resulted in other module changes as well as the switch to go.uber.org/mock/gomock.
@@ -163,7 +163,7 @@ func (as *AgentPool) GetVMIndexes() ([]int, map[int]string, error) {
}

indexes = append(indexes, index)
resourceID, err := convertResourceGroupNameToLower("azure://" + *instance.ID)
Only trivial changes in this file.
@@ -185,7 +185,8 @@ func TestGetVMsFromCache(t *testing.T) {
mockVMClient := mockvmclient.NewMockInterface(ctrl)
testAS.manager.azClient.virtualMachinesClient = mockVMClient
mockVMClient.EXPECT().List(gomock.Any(), testAS.manager.config.ResourceGroup).Return(expectedVMs, nil)
ac, err := newAzureCache(testAS.manager.azClient, refreshInterval, testAS.manager.config.ResourceGroup, vmTypeStandard, false, "")
testAS.manager.config.VMType = vmTypeStandard
Only trivial changes in this file.
@@ -35,8 +35,8 @@ import (
"github.com/Azure/azure-sdk-for-go/services/storage/mgmt/2021-09-01/storage"
"github.com/Azure/go-autorest/autorest/date"
"github.com/Azure/go-autorest/autorest/to"
"github.com/golang/mock/gomock"
Resulted from cloud-provider-azure bump.
@@ -18,8 +18,9 @@ package azure

import (
"fmt"
"k8s.io/autoscaler/cluster-autoscaler/cloudprovider"
Only trivial changes in this file.
@@ -47,8 +52,19 @@ type AzureManager struct {
azClient *azClient
env azure.Environment

azureCache *azureCache
lastRefresh time.Time
// azureCache is used for caching Azure resources.
Part of instance cache refactor.
@@ -106,8 +122,22 @@ func createAzureManagerInternal(configReader io.Reader, discoveryOpts cloudprovi
return nil, err
}

retryBackoff := wait.Backoff{
"added retry for creatingAzureManager in case of throttled requests"
Not required in master because of #6550.
This change is therefore equivalent to a cherry-pick.
@@ -0,0 +1,260 @@
/*
Part of instance cache refactor.
lastSizeRefresh time.Time
// Current Size (Number of VMs)
Part of instance cache refactor.
@@ -347,7 +379,9 @@ func (m *AzureManager) getFilteredScaleSets(filter []labelAutoDiscoveryConfig) (
curSize = *scaleSet.Sku.Capacity
}

vmss, err := NewScaleSet(spec, m, curSize)
dedicatedHost := scaleSet.VirtualMachineScaleSetProperties != nil && scaleSet.VirtualMachineScaleSetProperties.HostGroup != nil
Part of a managed feature.
"strings" | ||
Not required in master because of #6863.
is this a reference to the fact that you'll have separate PRs for 1.28, 1.29, 1.30 (not yet), and master?
and another PR has already added this line for master
so it will be missing from that particular PR? (aka #7003)
if that's the correct understanding, let's link #7003 here as well + update the comment to "this was already added in master by #6863. will not include this change in #7003"
// (DEPRECATED, DO NOT USE) EnableDetailedCSEMessage defines whether to emit error messages in the CSE error body info
EnableDetailedCSEMessage bool `json:"enableDetailedCSEMessage,omitempty" yaml:"enableDetailedCSEMessage,omitempty"`
// (DEPRECATED, DO NOT USE) EnableForceDelete defines whether to enable force deletion on the APIs
EnableForceDelete bool `json:"enableForceDelete,omitempty" yaml:"enableForceDelete,omitempty"`
(For EnableForceDelete) Not required in master because of #6447.
Still marked as deprecated because its future is uncertain; it might be removed later.
@@ -26,13 +26,18 @@ import (
"time"

"github.com/Azure/go-autorest/autorest/azure"
"k8s.io/apimachinery/pkg/util/wait"
Not required in master because of #6550.
instancesRefreshJitter: az.config.VmssVmsCacheJitter,
},

enableForceDelete: az.config.EnableForceDelete,
Not required in master because of #6447.
@@ -181,30 +247,25 @@ func (scaleSet *ScaleSet) getCurSize() (int64, error) {
return scaleSet.curSize, nil
}

// GetScaleSetSize gets Scale Set size.
func (scaleSet *ScaleSet) GetScaleSetSize() (int64, error) {
Not required in master, apparently because the 1.30 fork stopped cherry-picking this difference (??).
defer scaleSet.sizeMutex.Unlock()

// setScaleSetSize sets ScaleSet size.
func (scaleSet *ScaleSet) setScaleSetSize(size int64, delta int) error {
Not required in master, apparently because the 1.30 fork stopped cherry-picking this difference (??). It is only a method visibility change, so it should not cause problems.
@@ -490,7 +601,18 @@ func (scaleSet *ScaleSet) DeleteNodes(nodes []*apiv1.Node) error {
ref := &azureRef{
Name: node.Spec.ProviderID,
}
refs = append(refs, ref)

if node.Annotations[cloudprovider.FakeNodeReasonAnnotation] == cloudprovider.FakeNodeUnregistered {
Not required in master, apparently because the 1.30 fork stopped cherry-picking this difference (??). This one might be worth looking into later.
node := apiv1.Node{}
nodeName := fmt.Sprintf("%s-asg-%d", scaleSetName, rand.Int63())
Trivial changes early in this function
@@ -122,17 +138,14 @@ func buildNodeFromTemplate(scaleSetName string, template compute.VirtualMachineS
node.Status.Capacity[apiv1.ResourceCPU] = *resource.NewQuantity(vcpu, resource.DecimalSI)
// isNPSeries returns if a SKU is an NP-series SKU
// SKU API reports GPUs for NP-series but it's actually FPGAs
if !isNPSeries(*template.Sku.Name) {
if isNPSeries(*template.Sku.Name) {
Seems to add/fix support for FPGA?
Part of "Realign azure template labels with AKS changes" (as is the rest of this file).
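To make the NP-series branch concrete, here is a minimal sketch of the idea: the SKU API reports an accelerator count, and the node template records it as an FPGA resource for NP-series SKUs and as a GPU resource otherwise. The resource names and the prefix-based `isNPSeries` check below are assumptions for illustration, not the implementation in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical resource names, used only for this example.
const (
	gpuResourceName  = "example.com/gpu"
	fpgaResourceName = "example.com/fpga"
)

// isNPSeries reports whether a SKU name looks like an NP-series SKU.
// The case-insensitive prefix match is an assumption made for this sketch.
func isNPSeries(sku string) bool {
	return strings.HasPrefix(strings.ToLower(sku), "standard_np")
}

// acceleratorCapacity maps the accelerator count reported for a SKU to the
// resource it should be advertised as: FPGA for NP-series, GPU otherwise.
func acceleratorCapacity(sku string, count int64) map[string]int64 {
	capacity := map[string]int64{}
	if count == 0 {
		return capacity
	}
	if isNPSeries(sku) {
		capacity[fpgaResourceName] = count
	} else {
		capacity[gpuResourceName] = count
	}
	return capacity
}

func main() {
	fmt.Println(acceleratorCapacity("Standard_NP10s", 1)) // prints map[example.com/fpga:1]
	fmt.Println(acceleratorCapacity("Standard_NC6", 1))   // prints map[example.com/gpu:1]
}
```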
)

func buildInstanceOS(template compute.VirtualMachineScaleSet) string {
Not really changed, just moved to line 133.
before I approve:
f67a5d7 to d0cfa6c
Don't think I can run some tests until
/ok-to-test
@comtalyst: The following test failed, say
@comtalyst would you open a separate PR for the master branch?
Yes. In fact, there is one for each version, including master: #7003, #7076, #7067, #7075.
/lgtm
LGTM
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: comtalyst, feiskyer, tallaxes The full list of commands accepted by this bot can be found here. The pull request process is described here
Could you fix the test failures of pull-cluster-autoscaler-e2e-azure in a separate PR?
Merged commit 7f4b7fa into kubernetes:cluster-autoscaler-release-1.29
That test doesn't support release branches yet. Running it here was a mistake.
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
"Refactor" as a part of fork-upstream (managed-selfhosted) realignment. Should not have any breaking changes.
This codebase realignment will simplify the logistics between the two, cutting a significant portion of the maintenance cost.
There will be a separate effort focusing on improving code quality, rather than realigning the codebases.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
All are in cloudprovider/azure except go.mod, go.sum, and vendor.
See comments (as annotations) for an explanation of each change.
But, overall:
EnableForceDelete (already in master), EnableDetailedCSEMessage, GetVmssSizeRefreshPeriod
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: