Add long running operation types, conditions, and helpers #1610

Merged

Contributor

@CecileRobertMichon CecileRobertMichon commented Aug 16, 2021

What type of PR is this?
/kind feature

What this PR does / why we need it: This is the first PR to implement #1541. It implements LongRunningOperationStates, additional conditions, and async resource creator and deleter interfaces. It enables the async reconcile and delete for 3 services as a POC: resource groups, vnets, and security groups. Those 3 were chosen because they show how this would work across services with different levels of complexity (groups is very simple, vnets cares about managed vs. unmanaged, and NSG needs to merge with the existing state).

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • unit tests
  • update spec with Result()
  • fix BYO vnet scenarios

Release note:

Add long-running operation types, conditions, and helpers

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 16, 2021
@k8s-ci-robot k8s-ci-robot added area/provider/azure Issues or PRs related to azure provider sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 16, 2021
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 24, 2021
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 25, 2021
Contributor

@shysank shysank left a comment

I did one pass over the PR, and I am tired 🙂 Overall the approach looks good to me. The abstractions are well thought out, and the final API in async/helper.go looks neat 👏 👏
Just a couple of nits, and one suggestion for handling clusterCache.

Going forward, I'd suggest splitting the PR into smaller chunks. Ideally, we'd want to get all the async framework changes in first with the existing machine pool implementation, and then add services one by one. This will be easier to review, and we can run some e2e tests and see how the timeouts are working.

azure/scope/cluster.go (outdated)
azure/scope/machinepool.go (outdated)
azure/services/groups/client.go (outdated)
@CecileRobertMichon
Contributor Author

@shysank thanks a lot for reviewing, I know it's a lot of change to look at! I tried to centralize the logic as much as possible so that we can reduce duplication and unit test the main logic in one place.

What I was thinking in terms of splitting the PRs was:

  1. this PR which adds: async helper with logic and tests, new types with conversion, future handling utils and tests, and applies async to 3 services to demonstrate the functionality
    (why 3? because it shows how the logic can be abstracted for the different types of services. why these 3? because they are some of the "simple" ones but still show some of the edge cases like BYO vnet/nsg & dealing with existing NSG rules)
  2. A PR that switches scalesets to the new abstractions (I didn't do this right away because it might require a bit of refactoring of scalesets and machine pool scope)
  3. multiple PRs that enable this service per service (the changes would be pretty small for each service), with tests for each service
  4. Once all the services are done, a PR that changes the overall reconcile loop timeout

WDYT?

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 31, 2021
@shysank
Contributor

shysank commented Aug 31, 2021

> @shysank thanks a lot for reviewing, I know it's a lot of change to look at! I tried to centralize the logic as much as possible so that we can reduce duplication and unit test the main logic in one place.
>
> What I was thinking in terms of splitting the PRs was:
>
>   1. this PR which adds: async helper with logic and tests, new types with conversion, future handling utils and tests, and applies async to 3 services to demonstrate the functionality
>     (why 3? because it shows how the logic can be abstracted for the different types of services. why these 3? because they are some of the "simple" ones but still show some of the edge cases like BYO vnet/nsg & dealing with existing NSG rules)

yeah, I understand the motivation for choosing the above 3 services: each is unique in its own way. But for me, that is the same reason I found it difficult to review the small details, and I'm afraid I might have missed some. I guess since this PR proves/will prove (after more approvals) that it works well with those scenarios, I thought maybe we could split them up. Having said that, I'd leave it to other community folks for more thoughts on this. Perhaps more 👀 would increase confidence. cc @devigned

>   2. A PR that switches scalesets to the new abstractions (I didn't do this right away because it might require a bit of refactoring of scalesets and machine pool scope)

+1

>   3. multiple PRs that enable this service per service (the changes would be pretty small for each service), with tests for each service

+1

>   4. Once all the services are done, a PR that changes the overall reconcile loop timeout

+1

@nader-ziada
Contributor

> @shysank thanks a lot for reviewing, I know it's a lot of change to look at! I tried to centralize the logic as much as possible so that we can reduce duplication and unit test the main logic in one place.
>
> What I was thinking in terms of splitting the PRs was:
>
>   1. this PR which adds: async helper with logic and tests, new types with conversion, future handling utils and tests, and applies async to 3 services to demonstrate the functionality
>     (why 3? because it shows how the logic can be abstracted for the different types of services. why these 3? because they are some of the "simple" ones but still show some of the edge cases like BYO vnet/nsg & dealing with existing NSG rules)
>   2. A PR that switches scalesets to the new abstractions (I didn't do this right away because it might require a bit of refactoring of scalesets and machine pool scope)
>   3. multiple PRs that enable this service per service (the changes would be pretty small for each service), with tests for each service
>   4. Once all the services are done, a PR that changes the overall reconcile loop timeout
>
> WDYT?

one thought I have is that it seems you are going to have to do some fixing/refactoring of the scaleset tests to make them compile anyway, so is it worth it to include these changes as well?

@CecileRobertMichon
Contributor Author

@nader-ziada @shysank I'm also happy to split this PR into a PR for just types and helpers, and then moving each service to a separate PR (including a separate one for scalesets) if you think that'd be better

@nader-ziada
Contributor

@CecileRobertMichon if you don't think getting the services that don't compile to work in this PR is too much work, then let's keep the original plan. I can see the types and helpers are in separate commits. I was just wondering if getting these to work might be easier if you have to refactor anyway.

@CecileRobertMichon
Contributor Author

OK, so I actually ended up taking out the service changes for now and just keeping the base interfaces, types and helpers (with unit tests). I think this will make it easier to review and easier for me to work on more tests in the background while this first PR gets reviewed. I will open another PR soon with groups to demonstrate the changes needed to change a service to async.

There are still changes to scalesets in this PR, but only the strict minimum to make this work with the changed types and interfaces. I still need to update the scaleset tests to fix references to those new functions, but other than that the PR should be in a good place.

@CecileRobertMichon
Contributor Author

/assign @devigned @nader-ziada

@CecileRobertMichon
Contributor Author

/retest

@k8s-ci-robot
Contributor

k8s-ci-robot commented Sep 13, 2021

@CecileRobertMichon: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-cluster-api-provider-azure-apidiff | 787dd19 | link | false | /test pull-cluster-api-provider-azure-apidiff |


@shysank
Contributor

shysank commented Sep 13, 2021

/test pull-cluster-api-provider-azure-e2e-windows

Contributor

@devigned devigned left a comment

/lgtm
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: devigned

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 13, 2021
@k8s-ci-robot k8s-ci-robot merged commit 946668d into kubernetes-sigs:main Sep 13, 2021
@k8s-ci-robot k8s-ci-robot added this to the v0.5 milestone Sep 13, 2021
@shysank shysank mentioned this pull request Sep 20, 2021
23 tasks
@Jont828 Jont828 mentioned this pull request Dec 10, 2021
3 tasks
@CecileRobertMichon CecileRobertMichon deleted the async-machines branch February 17, 2023 23:24
@Jont828 Jont828 mentioned this pull request Aug 4, 2023
4 tasks
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/azure Issues or PRs related to azure provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
5 participants