Add long running operation types, conditions, and helpers #1610

CecileRobertMichon · 2021-08-16T17:06:30Z

What type of PR is this?
/kind feature

What this PR does / why we need it: This is the first PR to implement #1541. It implements LongRunningOperationStates, additional conditions, and async resource creator and deleter interfaces. It enables the async reconcile and delete for 3 services as a POC: resource groups, vnets, and security groups. Those 3 were chosen because they show how this would work across services with different levels of complexity (groups is very simple, vnets cares about managed vs. unmanaged, and NSG needs to merge with the existing state).
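To make the idea of storing long-running operation state concrete, here is a minimal sketch of how an in-flight operation could be recorded on a resource's status and looked up on the next reconcile. All names in it (`Future`, `Status`, `GetFuture`, `SetFuture`, `DeleteFuture`) are assumptions made for illustration and are not taken from this PR's code.

```go
package main

// Future is a hypothetical record of one in-flight cloud operation.
// The field names are illustrative, not the PR's actual type.
type Future struct {
	Type        string // kind of operation, e.g. "PUT" or "DELETE"
	ServiceName string // service that started the operation, e.g. "groups"
	Name        string // name of the resource being created or deleted
	Data        string // opaque token used to resume polling later
}

// Status stands in for a resource status that carries the saved states.
type Status struct {
	LongRunningOperationStates []Future
}

// GetFuture returns the saved operation for a resource, or nil if none exists.
func (s *Status) GetFuture(name, service string) *Future {
	for i := range s.LongRunningOperationStates {
		f := &s.LongRunningOperationStates[i]
		if f.Name == name && f.ServiceName == service {
			return f
		}
	}
	return nil
}

// SetFuture stores an operation, replacing any earlier one for the same resource.
func (s *Status) SetFuture(f Future) {
	if existing := s.GetFuture(f.Name, f.ServiceName); existing != nil {
		*existing = f
		return
	}
	s.LongRunningOperationStates = append(s.LongRunningOperationStates, f)
}

// DeleteFuture removes the saved operation once it has completed.
func (s *Status) DeleteFuture(name, service string) {
	kept := s.LongRunningOperationStates[:0]
	for _, f := range s.LongRunningOperationStates {
		if f.Name != name || f.ServiceName != service {
			kept = append(kept, f)
		}
	}
	s.LongRunningOperationStates = kept
}
```

Keying the state by resource name plus service name is one possible design choice; it lets several services track operations on the same status object without colliding.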

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

- unit tests
- update spec with Result()
- fix BYO vnet scenarios

Release note:

Add long-running operation types, conditions, and helpers

api/v1alpha4/types.go

util/reconciler/defaults.go

shysank

I did one pass over the pr, and I am tired 🙂 Overall the approach looks good to me. The abstractions are well thought out, and the final api in async/helper.go looks neat 👏 👏
Just a couple of nits, and one suggestion for handling clusterCache.

Moving forward, I'd suggest splitting the PR into smaller chunks. Ideally, we'd want to get all the async framework changes in first, with the existing machine pool implementation, and then add services one by one. That would be easier to review, and we could run some e2e tests and see how the timeouts are working.

azure/scope/cluster.go

azure/scope/machinepool.go

azure/services/groups/client.go

CecileRobertMichon · 2021-08-31T14:54:56Z

@shysank thanks a lot for reviewing, I know it's a lot of change to look at! I tried to centralize the logic as much as possible so that we can reduce duplication and unit test the main logic in one place.

What I was thinking in terms of splitting the PRs was:

1. This PR, which adds the async helper with logic and tests, new types with conversion, and future handling utils and tests, and applies async to 3 services to demonstrate the functionality. (Why 3? Because it shows how the logic can be abstracted for the different types of services. Why these 3? Because they are some of the "simple" ones but still show some of the edge cases, like BYO vnet/NSG and dealing with existing NSG rules.)
2. A PR that switches scalesets to the new abstractions (I didn't do this right away because it might require a bit of refactoring of scalesets and machine pool scope).
3. Multiple PRs that enable this service by service (the changes would be pretty small for each service), with tests for each service.
4. Once all the services are done, a PR that changes the overall reconcile loop timeout.

WDYT?

azure/services/virtualnetworks/virtualnetworks.go

shysank · 2021-08-31T18:41:04Z

> @shysank thanks a lot for reviewing, I know it's a lot of change to look at! I tried to centralize the logic as much as possible so that we can reduce duplication and unit test the main logic in one place.
>
> What I was thinking in terms of splitting the PRs was:
>
> This PR, which adds the async helper with logic and tests, new types with conversion, and future handling utils and tests, and applies async to 3 services to demonstrate the functionality. (Why 3? Because it shows how the logic can be abstracted for the different types of services. Why these 3? Because they are some of the "simple" ones but still show some of the edge cases, like BYO vnet/NSG and dealing with existing NSG rules.)

Yeah, I understand the motivation for choosing the above 3 services: each is unique in its own way. But for me, that is the same reason I found it difficult to review, and I'm afraid I might have missed small details. Since this PR proves (or will prove, after more approvals) that the approach works well for those scenarios, I thought maybe we could split them up. Having said that, I'd leave it to other community folks for more thoughts on this. Perhaps more 👀 would increase confidence. cc @devigned

> A PR that switches scalesets to the new abstractions (I didn't do this right away because it might require a bit of refactoring of scalesets and machine pool scope)

+1

> Multiple PRs that enable this service by service (the changes would be pretty small for each service), with tests for each service

+1

> Once all the services are done, a PR that changes the overall reconcile loop timeout

+1

nader-ziada · 2021-08-31T20:43:49Z

> @shysank thanks a lot for reviewing, I know it's a lot of change to look at! I tried to centralize the logic as much as possible so that we can reduce duplication and unit test the main logic in one place.
>
> What I was thinking in terms of splitting the PRs was:
>
> This PR, which adds the async helper with logic and tests, new types with conversion, and future handling utils and tests, and applies async to 3 services to demonstrate the functionality. (Why 3? Because it shows how the logic can be abstracted for the different types of services. Why these 3? Because they are some of the "simple" ones but still show some of the edge cases, like BYO vnet/NSG and dealing with existing NSG rules.)
>
> A PR that switches scalesets to the new abstractions (I didn't do this right away because it might require a bit of refactoring of scalesets and machine pool scope)
>
> Multiple PRs that enable this service by service (the changes would be pretty small for each service), with tests for each service
>
> Once all the services are done, a PR that changes the overall reconcile loop timeout
>
> WDYT?

One thought I have: it seems you are going to have to do some fixing/refactoring of the scaleset tests to make them compile anyway, so is it worth including those changes in this PR as well?

CecileRobertMichon · 2021-08-31T21:06:46Z

@nader-ziada @shysank I'm also happy to split this PR into a PR for just types and helpers, and then move each service to a separate PR (including a separate one for scalesets), if you think that'd be better.

nader-ziada · 2021-08-31T21:25:24Z

@CecileRobertMichon if you don't think getting the non-compiling services to work in this PR is too much work, then let's keep the original plan. I can see the types and helpers are in separate commits. I was just wondering whether getting these to work might be easier, since you have to refactor anyway.

CecileRobertMichon · 2021-08-31T23:10:10Z

OK, so I actually ended up taking out the service changes for now and just keeping the base interfaces, types, and helpers (with unit tests). I think this will make it easier to review and easier for me to work on more tests in the background while this first PR gets reviewed. I will open another PR soon with groups to demonstrate the changes needed to switch a service to async.

There are still changes to scalesets in this PR, but only the strict minimum to make this work with the changed types and interfaces. I still need to update the scaleset tests to fix references to those new functions, but other than that the PR should be in a good place.

CecileRobertMichon · 2021-09-13T17:46:00Z

/assign @devigned @nader-ziada

CecileRobertMichon · 2021-09-13T18:01:55Z

/retest

k8s-ci-robot · 2021-09-13T18:57:32Z

@CecileRobertMichon: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-cluster-api-provider-azure-apidiff | `787dd19` | link | false | `/test pull-cluster-api-provider-azure-apidiff` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

shysank · 2021-09-13T20:05:04Z

/test pull-cluster-api-provider-azure-e2e-windows

devigned

/lgtm
/approve

k8s-ci-robot · 2021-09-13T20:38:56Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: devigned

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [devigned]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot requested review from juan-lee and nader-ziada August 16, 2021 17:06

k8s-ci-robot added the area/provider/azure (Issues or PRs related to azure provider), sig/cluster-lifecycle (Categorizes an issue or PR as relevant to SIG Cluster Lifecycle), and size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files) labels Aug 16, 2021

CecileRobertMichon force-pushed the async-machines branch from eb12d4f to 04c34fb Compare August 21, 2021 02:54

shysank mentioned this pull request Aug 24, 2021

POC: Parallel reconciliation of azure machine services #1369

Closed

3 tasks

k8s-ci-robot added the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD) Aug 24, 2021

CecileRobertMichon force-pushed the async-machines branch from 04c34fb to 0ce8a02 Compare August 25, 2021 02:30

k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 25, 2021

CecileRobertMichon mentioned this pull request Aug 27, 2021

Parallel reconciliation of azure resources #1181

Closed

shysank reviewed Aug 30, 2021

shysank reviewed Aug 31, 2021

CecileRobertMichon force-pushed the async-machines branch from 8849376 to 590e79d Compare August 31, 2021 18:30

k8s-ci-robot removed the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD) Aug 31, 2021

CecileRobertMichon force-pushed the async-machines branch from 342be40 to 34f3399 Compare August 31, 2021 19:25

CecileRobertMichon force-pushed the async-machines branch from 38dab78 to 1d48c0b Compare August 31, 2021 23:06

k8s-ci-robot assigned nader-ziada Sep 13, 2021

devigned approved these changes Sep 13, 2021

k8s-ci-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files) Sep 13, 2021

k8s-ci-robot merged commit 946668d into kubernetes-sigs:main Sep 13, 2021

k8s-ci-robot added this to the v0.5 milestone Sep 13, 2021

This was referenced Sep 14, 2021

Make VNet and NSGs reconcile/delete async #1684

Closed

Make route tables reconcile/delete async #1686

Merged

shysank mentioned this pull request Sep 20, 2021

Make azure services async #1702

Closed

23 tasks

This was referenced Nov 5, 2021

Make virtual network peerings reconcile/delete async #1838

Merged

Make disks delete async #1844

Merged

This was referenced Nov 12, 2021

Make availability set reconcile/delete async #1861

Merged

Make NAT Gateway reconcile/delete async #1865

Merged

Make inbound NAT rules reconcile/delete async #1870

Merged

Jont828 mentioned this pull request Nov 23, 2021

Make load balancer reconcile/delete async #1886

Merged

3 tasks

Jont828 mentioned this pull request Dec 10, 2021

Make subnets reconcile/delete async #1914

Merged

3 tasks

CecileRobertMichon mentioned this pull request Dec 14, 2021

Make vnets reconcile/delete async #1921

Merged

3 tasks

This was referenced Dec 18, 2021

Make network interface reconcile/delete async #1939

Merged

Make bastion hosts reconcile/delete async #1941

Merged

This was referenced Mar 17, 2022

Make VM extension reconcile async and move VMSS extension into scaleset service #2177

Merged

[WIP] Make tags reconcile async #2181

Closed

Jont828 mentioned this pull request May 20, 2022

Make public IPs reconcile/delete async #2317

Merged

3 tasks

Jont828 mentioned this pull request Jul 13, 2022

Make agent pools reconcile/delete async #2479

Merged

3 tasks

Jont828 mentioned this pull request Jan 27, 2023

Make scaleset reconcile/delete async #3111

Merged

3 tasks

CecileRobertMichon deleted the async-machines branch February 17, 2023 23:24

Jont828 mentioned this pull request Aug 4, 2023

Make scalesetvms delete async #3799

Merged

4 tasks
