Refactor infra creation to improve the overall infra setup time #1869

Merged

Conversation

dharaneeshvrd
Contributor

@dharaneeshvrd dharaneeshvrd commented Jul 3, 2024

What this PR does / why we need it:

Which issue(s) this PR fixes:
Fixes #1837

Special notes for your reviewer:

With these changes, infra creation takes around 8 mins.
The most time-consuming resource is the DHCP server VM.

/area provider/ibmcloud

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Release note:


@k8s-ci-robot k8s-ci-robot added area/provider/ibmcloud Issues or PRs related to ibmcloud provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 3, 2024
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 3, 2024

netlify bot commented Jul 3, 2024

Deploy Preview for kubernetes-sigs-cluster-api-ibmcloud ready!

Name Link
🔨 Latest commit fe38345
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-cluster-api-ibmcloud/deploys/66c8229d701b310008c1b317
😎 Deploy Preview https://deploy-preview-1869--kubernetes-sigs-cluster-api-ibmcloud.netlify.app

@dharaneeshvrd dharaneeshvrd force-pushed the rearrange-infra-creation branch from b724976 to 21ca9b0 Compare July 3, 2024 04:19
Contributor

@Karthik-K-N Karthik-K-N left a comment

Thank you for this PR. The overall approach looks good to me. What's the time difference between old vs new approach?


powerVSCluster := clusterScope.IBMPowerVSCluster
func (r *IBMPowerVSClusterReconciler) reconcilePowerVSResources(clusterScope *scope.PowerVSClusterScope, updatePowerVSCluster *updatePowerVSCluster, ch chan reconcileResult, wg *sync.WaitGroup) {
defer wg.Done()
// reconcile PowerVS service instance
clusterScope.Info("Reconciling PowerVS service instance")
Contributor

Since PowerVS and VPC resource creation will happen concurrently, we can prefix the logs with specific keys so that we can trace the log of each individual flow, something like:
vpcLog := clusterScope.WithName("vpc")
vpcLog.Info("Creating vpc")
I hope it will work.
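To illustrate the suggestion, here is a minimal, self-contained sketch of running two reconcile flows concurrently while tagging each flow's log lines. It uses the standard library's log prefix as a stand-in for a named logger, so it is not the controller's actual logging setup; flowResult, reconcileFlow, and runFlows are hypothetical names, not code from this PR.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"sort"
	"sync"
)

// flowResult is a hypothetical stand-in for the PR's reconcileResult.
type flowResult struct {
	flow string
	err  error
}

// reconcileFlow pretends to reconcile one group of resources and
// reports its outcome on ch. Every log line carries the flow name.
func reconcileFlow(flow string, ch chan<- flowResult, wg *sync.WaitGroup) {
	defer wg.Done()
	// The prefix plays the role a named logger would play in the controller.
	logger := log.New(os.Stderr, "["+flow+"] ", 0)
	logger.Printf("reconciling %s resources", flow)
	ch <- flowResult{flow: flow}
}

// runFlows starts all flows concurrently, waits for them, and returns
// the sorted names of the flows that finished without error.
func runFlows(flows ...string) []string {
	// Buffered so that no sender blocks before the results are drained.
	ch := make(chan flowResult, len(flows))
	var wg sync.WaitGroup
	for _, f := range flows {
		wg.Add(1)
		go reconcileFlow(f, ch, &wg)
	}
	wg.Wait()
	close(ch)

	var done []string
	for r := range ch {
		if r.err == nil {
			done = append(done, r.flow)
		}
	}
	sort.Strings(done)
	return done
}

func main() {
	fmt.Println(runFlows("vpc", "powervs"))
}
```

With a prefix per flow, interleaved lines from the two goroutines can still be separated by filtering on `[vpc]` or `[powervs]`.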

}

var networkReady, loadBalancerReady bool
for _, cond := range clusterScope.IBMPowerVSCluster.Status.Conditions {
Contributor

By this time the network should be ready, right? When do you think it might not be ready? Also, if it's not ready, we cannot even attach it to the TransitGateway, right?

Contributor

Should we consider moving this check before calling ReconcileTransitGateway to avoid failure?

Contributor Author

With the transit gateway, we attach PowerVS directly using its CRN, right? It is not dependent on the network. I'm not requeuing when the network and load balancer are not ready, since that would block the TG creation even though the TG does not depend on them. But before proceeding to retrieve the VPC LB details and set the overall Ready status to true, I wanted to validate that their status has reached Ready; that's why I added the condition to validate the Ready status of both resources.
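The readiness gate described here can be sketched roughly as follows. This is a simplified illustration, not the PR's code: the condition struct and the string type names "NetworkReady" and "LoadBalancerReady" are assumptions standing in for the Cluster API condition types.

```go
package main

import "fmt"

// condition is a simplified, hypothetical stand-in for a Cluster API condition.
type condition struct {
	Type   string
	Status bool
}

// infraReady reports whether both the network and the load balancer
// conditions are true. A missing condition counts as not ready.
func infraReady(conds []condition) bool {
	var networkReady, loadBalancerReady bool
	for _, c := range conds {
		switch c.Type {
		case "NetworkReady":
			networkReady = c.Status
		case "LoadBalancerReady":
			loadBalancerReady = c.Status
		}
	}
	return networkReady && loadBalancerReady
}

func main() {
	conds := []condition{
		{Type: "NetworkReady", Status: true},
		{Type: "LoadBalancerReady", Status: false},
	}
	// The overall Ready status is set only once both conditions are true.
	fmt.Println(infraReady(conds))
}
```

Checking the conditions only at the end keeps the transit gateway step independent of them while still guarding the final Ready status.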

@dharaneeshvrd
Contributor Author

dharaneeshvrd commented Jul 22, 2024

@Karthik-K-N

Whats the time difference between old vs new approach?

Currently it's taking around 15 mins and with these changes it is taking around 8 to 9 mins.

@mkumatag mkumatag added this to the v0.9.0 milestone Jul 23, 2024
@dharaneeshvrd dharaneeshvrd force-pushed the rearrange-infra-creation branch from 21ca9b0 to f70454b Compare July 24, 2024 03:29
@mkumatag
Member

mkumatag commented Aug 6, 2024

@dharaneeshvrd as per the discussion, let's get early testing done with this PR to ensure it is working fine

@mkumatag
Member

@dharaneeshvrd as per the discussion, let's get early testing done with this PR to ensure it is working fine

Any update on this?

/cc @Karthik-K-N

Please take a look and give an lgtm if there are no more comments.

Contributor

@Karthik-K-N Karthik-K-N left a comment

One last question; otherwise, overall LGTM.

return
}

conditions.MarkFalse(update.cluster, conditionArgs[0].(capiv1beta1.ConditionType), conditionArgs[1].(string), conditionArgs[2].(capiv1beta1.ConditionSeverity), conditionArgs[3].(string), conditionArgs[4:]...)
Contributor

Isn't there a better way than using interface{} types for conditionArgs? Also, since we are indexing it, should we add a check to verify the minimum length to avoid an index out of range?

Contributor Author

If I pass them as concrete types, we would get a golint error for exceeding the maximum number of args, which is 5. I would also need to create separate funcs for the true and false conditions. It would complicate things, IMO.

Should we add a check for verifying the min length to avoid index out of range?

Again, I would need to add a check in the many places where we update the condition, and we wouldn't be able to take any major action if it errored. For example, here I want to return the main error instead of the error I'm getting from this, so it doesn't look good to me. The scope of this is within the reconcile func, with only two possible flows. I think the current way is OK.
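For reference, the two ideas raised in this thread (a length guard and typed fields instead of indexed interface{} values) could look roughly like the sketch below. It is only an illustration of the trade-off, not what the PR does: conditionUpdate and parseConditionArgs are hypothetical names, and plain strings stand in for the capiv1beta1 condition type and severity.

```go
package main

import (
	"errors"
	"fmt"
)

// conditionUpdate is a hypothetical struct bundling the values that the
// PR passes positionally as interface{} arguments.
type conditionUpdate struct {
	Type     string
	Reason   string
	Severity string
	Message  string
}

// parseConditionArgs guards the indexed access with a length check and
// uses checked type assertions, so bad input yields an error, not a panic.
func parseConditionArgs(args ...interface{}) (conditionUpdate, error) {
	if len(args) < 4 {
		return conditionUpdate{}, errors.New("need at least 4 condition args")
	}
	fields := make([]string, 4)
	for i := range fields {
		s, ok := args[i].(string)
		if !ok {
			return conditionUpdate{}, fmt.Errorf("condition arg %d is %T, want string", i, args[i])
		}
		fields[i] = s
	}
	return conditionUpdate{
		Type:     fields[0],
		Reason:   fields[1],
		Severity: fields[2],
		Message:  fields[3],
	}, nil
}

func main() {
	u, err := parseConditionArgs("NetworkReady", "NetworkReconciliationFailed", "Error", "boom")
	fmt.Println(u.Reason, err)

	_, err = parseConditionArgs("NetworkReady")
	fmt.Println(err)
}
```

Passing a single struct would also stay under the argument-count lint limit, at the cost of the extra error handling at each call site that the author mentions.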

@dharaneeshvrd dharaneeshvrd force-pushed the rearrange-infra-creation branch from f70454b to 5b9d799 Compare August 16, 2024 13:28
@dharaneeshvrd
Contributor Author

@mkumatag

Any update on this?

I have tested it personally, and Ashwin Hendre was also able to test this and reported a 12% improvement in overall cluster setup time.

Contributor

@Karthik-K-N Karthik-K-N left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 19, 2024
Member

@mkumatag mkumatag left a comment

A few minor nits; otherwise lgtm.

powerVSCluster.updateCondition(false, infrav1beta2.NetworkReadyCondition, infrav1beta2.NetworkReconciliationFailedReason, capiv1beta1.ConditionSeverityError, err.Error())
ch <- reconcileResult{reconcile.Result{}, err}
return
} else if requeue {
Member

We aren't doing any requeue here, so it would be good to rename or rewrite the logic to have a meaningful name.

Member

Also, please add a note on why there is no requeue here.

Contributor Author

I have refactored the code, please take a look!

@dharaneeshvrd dharaneeshvrd force-pushed the rearrange-infra-creation branch from 5b9d799 to fe38345 Compare August 23, 2024 05:48
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 23, 2024
Member

@mkumatag mkumatag left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 28, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dharaneeshvrd, mkumatag

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 28, 2024
@k8s-ci-robot k8s-ci-robot merged commit 3eb2b52 into kubernetes-sigs:main Aug 28, 2024
13 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/ibmcloud Issues or PRs related to ibmcloud provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
5 participants