Parallel reconciliation of azure resources #1181

Closed

shysank opened this issue Feb 17, 2021 · 16 comments
Labels

  • kind/feature Categorizes issue or PR as related to a new feature.
  • lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@shysank (Contributor) commented Feb 17, 2021:

/kind feature
/kind explore

What steps did you take and what happened:

We reconcile a number of resources serially in the azurecluster and azuremachine reconcilers. Some of these resources have no dependency on each other and could be reconciled simultaneously. For example, after a VM is reconciled, vmextensions, tags, and roleassignments could all be reconciled at the same time, potentially saving some time.

Describe the solution you'd like

Reconcile mutually independent services in parallel using goroutines and channels.
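
For illustration only, a minimal sketch of what this could look like; the service interface and reconcileInParallel helper below are hypothetical, not actual CAPZ code:

package example

import "context"

// service stands in for an azure service reconciler (vmextensions, tags,
// roleassignments, ...).
type service interface {
    Reconcile(ctx context.Context) error
}

// reconcileInParallel fans out one goroutine per independent service and
// collects the results over a channel, returning the first error seen.
func reconcileInParallel(ctx context.Context, services ...service) error {
    errs := make(chan error, len(services))
    for _, svc := range services {
        svc := svc // capture loop variable
        go func() {
            errs <- svc.Reconcile(ctx)
        }()
    }
    var firstErr error
    for range services {
        if err := <-errs; err != nil && firstErr == nil {
            firstErr = err
        }
    }
    return firstErr
}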

Anything else you would like to add:

This is more of an exploration than an actual requirement.

Environment:

  • cluster-api-provider-azure version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):
@k8s-ci-robot (Contributor):

@shysank: The label(s) kind/explore cannot be applied, because the repository doesn't have them

In response to this:

/kind feature
/kind explore

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added the kind/feature label on Feb 17, 2021
shysank changed the title from "Parallel reconciliation of individual azure resources" to "Parallel reconciliation of azure resources" on Feb 17, 2021
@CecileRobertMichon (Contributor):

Another thing we should consider doing more of is #1067 for other resources. A lot of these create operations are long running, and we don't need to block everything else while the resource is being created. We can start the create, exit the controller loop right away, and come back later to check whether the operation is done. This also lets us deal with a lot more objects concurrently: imagine you're trying to create 500 nodes; having the controller kickstart each create and then circle back is a lot more efficient than having to wait for each one to finish before exiting the controller loop.
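
A rough sketch of that flow (the asyncClient and futureStore types are made up for illustration; the real approach in #1067 uses the Azure SDK's long-running-operation futures):

package example

import (
    "context"
    "time"

    ctrl "sigs.k8s.io/controller-runtime"
)

// futureHandle represents a saved reference to an in-flight Azure operation.
type futureHandle string

// asyncClient stands in for a client that can start a long-running create and
// later check whether it has completed.
type asyncClient interface {
    BeginCreate(ctx context.Context, spec string) (futureHandle, error)
    IsDone(ctx context.Context, h futureHandle) (bool, error)
}

// futureStore stands in for wherever in-flight operation state is persisted
// (e.g. on the object's status) between reconciles.
type futureStore interface {
    Get(name string) (futureHandle, bool)
    Set(name string, h futureHandle)
    Delete(name string)
}

func reconcileResource(ctx context.Context, c asyncClient, s futureStore, name, spec string) (ctrl.Result, error) {
    h, inFlight := s.Get(name)
    if !inFlight {
        // Kick off the long-running create and return right away instead of
        // blocking the controller loop until Azure finishes.
        newHandle, err := c.BeginCreate(ctx, spec)
        if err != nil {
            return ctrl.Result{}, err
        }
        s.Set(name, newHandle)
        return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
    }
    done, err := c.IsDone(ctx, h)
    if err != nil {
        return ctrl.Result{}, err
    }
    if !done {
        // Still creating; circle back on a later reconcile.
        return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
    }
    // The operation finished; clear the saved state and continue as normal.
    s.Delete(name)
    return ctrl.Result{}, nil
}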

@shysank (Contributor, Author) commented Feb 17, 2021:

Yeah, the ideal solution is probably a combination of both. We could use #1067 for long-running operations (e.g. VM create) and goroutines for relatively shorter ones.

Another option is to time out any operation that runs longer than, say, 5 seconds, update the future data, and return control. That would mean only the reconciliation during create is async; subsequent reconciliations would mostly be synchronous.
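
A sketch of that cutoff idea, assuming a made-up runWithCutoff helper and leaving out how the future data is persisted:

package example

import (
    "context"
    "errors"
    "time"
)

// errStillRunning signals that the operation outlived the cutoff and should be
// checked again on a later reconcile.
var errStillRunning = errors.New("operation still in progress")

// runWithCutoff waits for op for at most timeout. If op finishes in time its
// result is returned; otherwise we stop waiting so the reconciler can save the
// operation's future data and return control.
func runWithCutoff(ctx context.Context, timeout time.Duration, op func(context.Context) error) error {
    done := make(chan error, 1)
    go func() { done <- op(ctx) }()

    select {
    case err := <-done:
        return err
    case <-time.After(timeout):
        // The Azure-side operation keeps going; we just stop blocking on it.
        return errStillRunning
    }
}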

@fejta-bot:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Jun 14, 2021
@fejta-bot:

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Jul 14, 2021
@shysank (Contributor, Author) commented Jul 14, 2021:

/remove-lifecycle rotten

k8s-ci-robot removed the lifecycle/rotten label on Jul 14, 2021
nader-ziada modified the milestones: v0.5.x, v0.5 on Aug 26, 2021
@arschles (Contributor):

My $0.02 is that it would be helpful to use goroutines for all operations. If that were done:

  • all ongoing work could be tracked and reconciled uniformly using tools like errgroup (see the sketch below)
  • a generic workflow engine could be built that knows how to parallelize or serialize an arbitrary group of operations requested by the user

This kind of thing would build on #1067. Async operations can still happen, but the synchronous boundary would be the goroutine: if you run N goroutines, their termination point is when all operations are done, including synchronous ones.
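
As a concrete illustration of the errgroup point above (the reconcile function signature is hypothetical; this is not the workflow-engine API, which is sketched next):

package example

import (
    "context"

    "golang.org/x/sync/errgroup"
)

// reconcileIndependent runs a set of independent reconcile functions in
// parallel and waits for all of them, returning the first error encountered.
func reconcileIndependent(ctx context.Context, reconcilers ...func(context.Context) error) error {
    g, ctx := errgroup.WithContext(ctx)
    for _, reconcile := range reconcilers {
        reconcile := reconcile // capture loop variable
        g.Go(func() error {
            return reconcile(ctx)
        })
    }
    // Wait blocks until every goroutine has finished; it returns the first
    // non-nil error and cancels ctx for the remaining ones.
    return g.Wait()
}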

Here's what such an API could look like:

package workflow

import (
    "context"

    "github.com/google/uuid" // assumed UUID package
)

type operation struct {
    // the action to take
    action func() error
    // the unique ID of this operation
    id uuid.UUID
    // if set, run this operation after the one identified by this ID
    after uuid.UUID
}

func newAzureMachineOperation(after operation) operation {
    //...
}

// other constructors for operations go here...

type workflowEngine interface {
    // submitOperation queues an operation to run as part of the given workflow.
    submitOperation(ctx context.Context, workflowID uuid.UUID, op operation)
    // finalize waits for all submitted operations in the workflow to finish
    // and returns any errors they produced.
    finalize(ctx context.Context, workflowID uuid.UUID) []error
}

If a single instance of workflowEngine were passed to all reconcilers, each would use a single UUID (e.g. its correlation ID) to submit all of its operations and would call finalize when it's time to record the final result of the reconcile.
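
For example, a reconciler might use this roughly as follows (continuing the same hypothetical package; reconcileVM, reconcileTags, and reconcileExtensions are placeholder functions, not real CAPZ helpers):

// reconcileMachine submits its operations under one correlation ID and
// finalizes at the end to collect the results.
func reconcileMachine(ctx context.Context, engine workflowEngine, correlationID uuid.UUID,
    reconcileVM, reconcileTags, reconcileExtensions func() error) []error {

    vmOp := operation{id: uuid.New(), action: reconcileVM}
    engine.submitOperation(ctx, correlationID, vmOp)

    // Tags and extensions only depend on the VM, not on each other, so the
    // engine is free to run them in parallel once vmOp completes.
    engine.submitOperation(ctx, correlationID, operation{
        id: uuid.New(), after: vmOp.id, action: reconcileTags,
    })
    engine.submitOperation(ctx, correlationID, operation{
        id: uuid.New(), after: vmOp.id, action: reconcileExtensions,
    })

    // Record the final result of the reconcile once everything has finished.
    return engine.finalize(ctx, correlationID)
}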

@arschles (Contributor):

Thanks @CecileRobertMichon - that was very illuminating! Is it too late to comment on the proposal?

@CecileRobertMichon (Contributor):

@arschles not too late (PR #1610 has not merged yet), but the proposal has already merged, so any amendments to the design would have to go through a new proposal PR.

@arschles (Contributor) commented Sep 7, 2021:

@CecileRobertMichon great, thanks for letting me know! I'm gonna work to get #1575 done and then I'll drop in and see where this effort is, and how I can help 👍

CecileRobertMichon modified the milestones: v0.5, next on Oct 28, 2021
@k8s-triage-robot:

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Jan 26, 2022
@shysank (Contributor, Author) commented Jan 26, 2022:

@CecileRobertMichon now that we are following the async pattern for reconcile/delete of services, can we close this one?

@k8s-triage-robot:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Feb 25, 2022
@k8s-triage-robot:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot (Contributor):

@k8s-triage-robot: Closing this issue.

