VPC: Add subnet reconciliation #1970
Hi @cjschaef. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test.
✅ Deploy Preview for kubernetes-sigs-cluster-api-ibmcloud ready!
Overall LGTM
// Reconcile Control Plane subnets.
requeue := false
for _, subnet := range subnets {
If there is no interdependency, then in the future we could consider reconciling the data plane and control plane subnets concurrently as a performance optimization.
That is likely possible. If subnet creation turns out to take longer than desired, I can revisit it. I know the Custom Image and LBs are the larger time sinks (in terms of waiting for them to be Ready), and SG/SGR reconciliation is highly complex and can take time as well.
return nil
}

// findOrCreatePublicGateway will attempt to find if there is an existing Public Gateway for a specific zone, for the cluster (in cluster's Resource Group and VPC), or create a new one. Only one Public Gateway is required in each zone, for any subnets in that zone.
For my better understanding: do you plan to support multiple subnets in the same zone? Do you have a use case in mind where we would need this?
Yes. The simple case is using different subnets for the Control Plane and the Data Plane, distributed across the same zones (one CP subnet and one DP subnet per zone).
Otherwise, if the user wants to include edge subnets, or to distribute nodes across subnets with smaller IP CIDRs, multiple subnets could be used per zone as well.
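To illustrate the one-gateway-per-zone rule when a zone holds several subnets, here is a sketch in which a control plane subnet and a data plane subnet in the same zone resolve to the same gateway, while another zone gets its own. The gatewayCache type is a hypothetical in-memory stand-in for the real lookups in the cluster's Resource Group and VPC.

```go
package main

import "fmt"

// PublicGateway is a hypothetical, simplified public gateway record.
type PublicGateway struct {
	ID   string
	Zone string
}

// gatewayCache is a hypothetical index of gateways by zone, used here in
// place of real VPC API lookups.
type gatewayCache struct {
	byZone  map[string]*PublicGateway
	created int
}

// findOrCreatePublicGateway returns the zone's existing gateway, or creates
// one, so that every subnet in a zone shares a single gateway.
func (c *gatewayCache) findOrCreatePublicGateway(zone string) *PublicGateway {
	if gw, ok := c.byZone[zone]; ok {
		return gw
	}
	c.created++
	gw := &PublicGateway{ID: fmt.Sprintf("pgw-%d", c.created), Zone: zone}
	c.byZone[zone] = gw
	return gw
}
```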
/ok-to-test

// If no Worker subnets were supplied, attempt to create one in each zone.
if s.IBMVPCCluster.Spec.Network.WorkerSubnets == nil || len(s.IBMVPCCluster.Spec.Network.WorkerSubnets) == 0 {
	// If neither Control Plane nor Worker subnets were supplied, we rely on both Planes using the same subnet per zone, and we will re-reconcile those subnets below, for IBMVPCCluster Status updates.
	if len(s.IBMVPCCluster.Spec.Network.ControlPlaneSubnets) != 0 {
You mentioned that when neither is supplied, both rely on the same subnets.
Don't you need to check that the length equals 0 here, instead of not equal to 0?
I could reword the comment, but no: in this case, if some Control Plane subnets were supplied, we will auto-generate Worker subnets.
If neither was provided, then the Worker subnets will use the Control Plane subnets that were auto-generated.
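The defaulting rules described in this thread can be summarized in a short sketch. The Subnet type, the defaultSubnets helper, and its naming scheme (modeled loosely on the resource names in the tagging output in this PR) are hypothetical; the real logic lives in the scope and works with infrav1beta2 types.

```go
package main

import "fmt"

// Subnet is a hypothetical, simplified subnet spec.
type Subnet struct {
	Name string
	Zone string
}

// defaultSubnets is a hypothetical helper that builds one subnet per zone.
func defaultSubnets(cluster, role string, zones []string) []Subnet {
	subnets := make([]Subnet, 0, len(zones))
	for _, zone := range zones {
		subnets = append(subnets, Subnet{Name: fmt.Sprintf("%s-subnet-%s-%s", cluster, role, zone), Zone: zone})
	}
	return subnets
}

// resolveWorkerSubnets applies the rules described above:
//   - supplied Worker subnets are used as-is;
//   - if only Control Plane subnets were supplied, Worker subnets are generated per zone;
//   - if neither was supplied, Workers reuse the generated Control Plane subnets.
func resolveWorkerSubnets(cluster string, cpSpec, workerSpec []Subnet, zones []string) []Subnet {
	if len(workerSpec) != 0 {
		return workerSpec
	}
	if len(cpSpec) != 0 {
		return defaultSubnets(cluster, "compute", zones)
	}
	return defaultSubnets(cluster, "control-plane", zones)
}
```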
Ok got it.
cloud/scope/vpc_cluster.go
// If we have the subnet name, we can easily check Network Status on the subnet's status.
if isControlPlane {
	// If we find the subnet in Control Plane subnet status and that it is ready, we can return, with no requeue required.
	if subnet, ok := s.NetworkStatus().ControlPlaneSubnets[*subnet.Name]; ok && subnet.Ready {
Isn't an s.NetworkStatus() != nil check required here?
Yeah, that is a good idea to be safe.
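A sketch of the nil-safe lookup agreed on above, using hypothetical, simplified status types in place of the provider's. Only the pointer returned by NetworkStatus() needs the guard: indexing a nil map in Go returns the zero value and does not panic.

```go
package main

// ResourceStatus and NetworkStatus are hypothetical, simplified versions of
// the provider's status types.
type ResourceStatus struct {
	ID    string
	Ready bool
}

type NetworkStatus struct {
	ControlPlaneSubnets map[string]*ResourceStatus
	WorkerSubnets       map[string]*ResourceStatus
}

// subnetReadyInStatus reports whether the named subnet is recorded as ready.
// The nil check on status avoids a nil-pointer panic before any status has
// been populated; a nil map inside a non-nil status is already safe to index.
func subnetReadyInStatus(status *NetworkStatus, name string, isControlPlane bool) bool {
	if status == nil {
		return false
	}
	subnets := status.WorkerSubnets
	if isControlPlane {
		subnets = status.ControlPlaneSubnets
	}
	subnet, ok := subnets[name]
	return ok && subnet != nil && subnet.Ready
}
```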
cloud/scope/vpc_cluster.go
	}
} else {
	// If we find the subnet in Worker subnet status and that it is ready, we can return, with no requeue required.
	if subnet, ok := s.NetworkStatus().WorkerSubnets[*subnet.Name]; ok && subnet.Ready {
Same as the previous comment.
will update.
cloud/scope/vpc_cluster.go
// If we have the subnet name, we can easily check Network Status on the subnet's status.
if isControlPlane {
	// If we find the subnet in Control Plane subnet status and that it is ready, we can return, with no requeue required.
	if subnet, ok := s.NetworkStatus().ControlPlaneSubnets[*subnet.Name]; ok && subnet.Ready {
Even if we find it in Status, don't we need to check in the cloud that it is still in the desired state?
Please take care of this in all the similar blocks.
The discussion I had previously was that, for certain VPC resources, if the details are already in Status, the real-time status of the resource would not need to be checked. For instance, if CAPI created a resource and someone or something deleted it, having CAPI attempt to re-reconcile that case, even though Status holds a name and ID (which are no longer valid, in the case of the ID), would not be desirable.
Because VPCs and Subnets do not typically change status, unlike Load Balancers, the expectation is that once these resources are Ready (or the equivalent VPC status value/constant), they stay that way until deletion (whether requested or malicious). Load Balancers, since updates to them are frequent, likely could not rely on previous status and would instead check status in real time.
If this expectation has changed, and CAPI is expected to recreate all VPC resources (if they disappear, the API fails to find them, etc.), I can change all of the resource logic to not rely on Status details (such as the ID), in case the resources change in real time.
@Karthik-K-N @mkumatag could you please share your opinion here?
We usually follow the pattern of fetching the ID from Status and checking the resource's actual status in the cloud, to make sure the observed resource status is correct and up to date with the real world.
I think it's better to follow this pattern, as it helps with early detection of potential problems.
Should reconciliation stop completely (panic) if a resource cannot be found, then?
Or should resources be recreated, with Status updated with the new IDs, etc.?
As of now, the reconciler will return an error. For reference, it is up to the user to resolve the error.
I will attempt to refactor, with the expectation that anything in Status that cannot be found in IBM Cloud, etc. results in a recurring error. Other resources will then be updated in the same fashion in other PRs.
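The pattern agreed on here (take the ID from Status, confirm the resource in the cloud, and keep returning an error if it has disappeared) might look like the following sketch. The subnetGetter interface, its GetSubnetState method, the fakeVPC type, and the assumption that an available subnet reports the state "available" are all illustrative, not the IBM VPC SDK's API.

```go
package main

import "fmt"

// subnetAvailable is assumed to match vpcv1.SubnetStatusAvailableConst.
const subnetAvailable = "available"

// subnetGetter is a hypothetical, narrow view of a VPC client.
type subnetGetter interface {
	// GetSubnetState returns the subnet's lifecycle state, or an error if the
	// lookup fails or the subnet no longer exists.
	GetSubnetState(id string) (string, error)
}

// verifySubnetFromStatus takes the ID recorded in Status, confirms the subnet
// in the cloud, and reports whether it is available. A missing subnet is
// returned as an error on every reconcile, leaving resolution to the user,
// rather than being recreated.
func verifySubnetFromStatus(client subnetGetter, statusID string) (bool, error) {
	state, err := client.GetSubnetState(statusID)
	if err != nil {
		return false, fmt.Errorf("failed to verify subnet %s from status: %w", statusID, err)
	}
	return state == subnetAvailable, nil
}

// fakeVPC is an in-memory subnetGetter used only for illustration.
type fakeVPC map[string]string

func (f fakeVPC) GetSubnetState(id string) (string, error) {
	state, ok := f[id]
	if !ok {
		return "", fmt.Errorf("subnet %s not found", id)
	}
	return state, nil
}
```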
cloud/scope/vpc_cluster.go
}

// checkSubnetStatus will check the status of an IBM Cloud Subnet and update the Network Status.
func (s *VPCClusterScope) checkSubnetStatus(subnetDetails *vpcv1.Subnet, isControlPlane bool) (bool, error) {
I think we need to rename this if you are also going to update the status as part of this function.
I will change it to update.
}

// Add a tag to the subnet for the cluster.
err = s.TagResource(s.IBMVPCCluster.Name, *subnetDetails.CRN)
I hope tagging subnets has been tested? We haven't tried it before.
Yes, it works. I don't perform exhaustive testing, due to time constraints, but I test all of the reconciliation logic on the standard path after each set of changes. Tagging works on subnets (based on the API, I don't think GlobalTagging cares what type of resource it is, as long as it has a CRN), even though the UI leaves a lot to be desired in terms of tagging support.
# ibmcloud resource search "tags:\"us-east-capi-subnet-1-v2vzf\"" | grep "Name\|Resource Type"
Name: us-east-capi-subnet-1-v2vzf-pgateway-us-east-1
Resource Type: public-gateway
Name: us-east-capi-subnet-1-v2vzf-subnet-compute-us-east-1
Resource Type: subnet
Name: us-east-capi-subnet-1-v2vzf-subnet-control-plane-us-east-1
Resource Type: subnet
Name: us-east-capi-subnet-1-v2vzf-pgateway-us-east-2
Resource Type: public-gateway
Name: us-east-capi-subnet-1-v2vzf-subnet-control-plane-us-east-2
Resource Type: subnet
Name: us-east-capi-subnet-1-v2vzf-subnet-compute-us-east-2
Resource Type: subnet
Name: us-east-capi-subnet-1-v2vzf-pgateway-us-east-3
Resource Type: public-gateway
Name: us-east-capi-subnet-1-v2vzf-subnet-compute-us-east-3
Resource Type: subnet
Name: us-east-capi-subnet-1-v2vzf-subnet-control-plane-us-east-3
Resource Type: subnet
Name: us-east-capi-subnet-1-v2vzf-rhcos
Resource Type: image
Name: us-east-capi-subnet-1-v2vzf-vpc
Resource Type: vpc
43a3472 to 5828b2a
I don't see this error in local linting, and other instances don't appear to fail.
Going to assume this is a test flake and get the test rebuilt.
8fb2ad3 to f0f141c
Ah, I think I found the error and have fixed it now.
}

options := &vpcv1.CreateSubnetOptions{}
options.SetSubnetPrototype(&vpcv1.SubnetPrototype{
Currently, in the existing VPC flow, we create the subnet with these properties: https://github.com/Karthik-K-N/cluster-api-provider-ibmcloud/blob/83e571fe6e85a6edd94d46b762a1844f9f7dd195/cloud/scope/cluster.go#L219-L236. Do you think setting the CIDR block is not required at all, or is it just not required when TotalIpv4AddressCount is set?
It seems like the two are mutually exclusive (though I'm not completely sure), so you can define one or the other (total IPs or a CIDR).
Given the history we have, I opted to keep using the total IPs for now, but once we consider making this configurable, we could provide both as options (total IPs or CIDRs).
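Assuming the total address count and the CIDR block are indeed mutually exclusive (the thread notes this is not confirmed), a future configurable option could validate them as in this sketch. The subnetSizing type and addressCount helper are hypothetical; the constants mirror the 256-address, IPv4-only defaults used in this PR. A /24 block holds 2^(32-24) = 256 addresses, which matches the default count.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// Hypothetical constants mirroring this PR's defaults (256 addresses, IPv4 only).
const (
	defaultIPv4AddressCount int64 = 256
	ipVersionIPv4                 = "ipv4"
)

// subnetSizing is a hypothetical input where a user could set either a total
// IPv4 address count or a CIDR block, but not both.
type subnetSizing struct {
	TotalIPv4AddressCount *int64
	IPv4CIDRBlock         *string
}

// addressCount validates that at most one sizing option is set and returns
// the number of IPv4 addresses the subnet would contain.
func addressCount(s subnetSizing) (int64, error) {
	switch {
	case s.TotalIPv4AddressCount != nil && s.IPv4CIDRBlock != nil:
		return 0, errors.New("set either a total IPv4 address count or a CIDR block, not both")
	case s.IPv4CIDRBlock != nil:
		prefix, err := netip.ParsePrefix(*s.IPv4CIDRBlock)
		if err != nil {
			return 0, fmt.Errorf("invalid CIDR block: %w", err)
		}
		if !prefix.Addr().Is4() {
			return 0, fmt.Errorf("only %s CIDR blocks are supported", ipVersionIPv4)
		}
		// A /n IPv4 prefix contains 2^(32-n) addresses.
		return int64(1) << (32 - prefix.Bits()), nil
	case s.TotalIPv4AddressCount != nil:
		return *s.TotalIPv4AddressCount, nil
	default:
		return defaultIPv4AddressCount, nil
	}
}
```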
f0f141c to baa81c8
cloud/scope/vpc_cluster.go
	if _, ok := subnetMap[*subnet.Name]; ok {
		subnetName = subnet.Name
	}
} else if subnetID != nil && subnet.ID != nil {
Why are you checking subnetID != nil here?
Aren't you defining it a few lines above?
Yes, I'll clean that up.
Done
if subnetDetails == nil || subnetDetails.ID == nil || subnetDetails.CRN == nil {
	return fmt.Errorf("error failed creating subnet: %s", *subnet.Name)
}

// Initially populate subnet's status.
resourceStatus := &infrav1beta2.ResourceStatus{
	ID:    *subnetDetails.ID,
	Name:  subnetDetails.Name,
	Ready: false,
}
if isControlPlane {
	s.SetResourceStatus(infrav1beta2.ResourceTypeControlPlaneSubnet, resourceStatus)
} else {
	s.SetResourceStatus(infrav1beta2.ResourceTypeWorkerSubnet, resourceStatus)
}
Can we use updateSubnetStatus() here?
I think it's better to default during the initial population and update on followup loops.
And to be honest, it doesn't matter anymore: there will be a follow-up loop after subnet creation, so updateSubnetStatus will be called on the next iteration.
No, what I meant is that I don't see a code difference between lines 986 to 1000 here and the updateSubnetStatus() function.
So why can't we just call that function instead of duplicating these lines?
The code difference is that Ready: false is hardcoded here, whereas the function looks it up.
Won't that function set Ready to false if you pass the subnet details from here, since the subnet won't be available immediately after creation?
IMHO we can just use that function here instead of building and setting the status here. It looks like duplication to me.
@Karthik-K-N can you please check whether my opinion makes sense here?
Suggested change:
- if subnetDetails == nil || subnetDetails.ID == nil || subnetDetails.CRN == nil {
- 	return fmt.Errorf("error failed creating subnet: %s", *subnet.Name)
- }
- // Initially populate subnet's status.
- resourceStatus := &infrav1beta2.ResourceStatus{
- 	ID:    *subnetDetails.ID,
- 	Name:  subnetDetails.Name,
- 	Ready: false,
- }
- if isControlPlane {
- 	s.SetResourceStatus(infrav1beta2.ResourceTypeControlPlaneSubnet, resourceStatus)
- } else {
- 	s.SetResourceStatus(infrav1beta2.ResourceTypeWorkerSubnet, resourceStatus)
- }
+ if subnetDetails == nil || subnetDetails.ID == nil || subnetDetails.CRN == nil {
+ 	return fmt.Errorf("error failed creating subnet: %s", *subnet.Name)
+ }
+ return s.updateSubnetStatus(subnetDetails, isControlPlane)
I am not sure what the value of subnetDetails will be for a newly created subnet, but if subnetDetails.Status is nil or is not vpcv1.SubnetStatusAvailableConst, we set Ready to false anyway, so I believe we can reuse the function:
subnetDetails.Status != nil && *subnetDetails.Status == string(vpcv1.SubnetStatusAvailableConst)
If the create call succeeds, it should return fully populated subnetDetails; otherwise, it returns an error.
I'm not sure in what case it would return both a nil err and a nil subnetDetails.
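A sketch of the reuse being proposed, with hypothetical simplified types standing in for vpcv1.Subnet, infrav1beta2.ResourceStatus, and the scope. It assumes vpcv1.SubnetStatusAvailableConst corresponds to the string "available". Because Ready is derived from the lifecycle status, a freshly created subnet whose status is nil or pending is recorded as not ready, which is what the hardcoded Ready: false achieves.

```go
package main

import "errors"

// subnetStatusAvailable is assumed to match vpcv1.SubnetStatusAvailableConst.
const subnetStatusAvailable = "available"

// subnetDetails is a hypothetical subset of vpcv1.Subnet.
type subnetDetails struct {
	ID     *string
	Name   *string
	Status *string
}

// resourceStatus is a hypothetical subset of infrav1beta2.ResourceStatus.
type resourceStatus struct {
	ID    string
	Name  *string
	Ready bool
}

// scope holds hypothetical per-plane subnet status maps keyed by name.
type scope struct {
	controlPlaneSubnets map[string]resourceStatus
	workerSubnets       map[string]resourceStatus
}

// updateSubnetStatus records the subnet's status, deriving Ready from its
// lifecycle state, and reports whether a requeue is needed.
func (s *scope) updateSubnetStatus(d *subnetDetails, isControlPlane bool) (bool, error) {
	if d == nil || d.ID == nil || d.Name == nil {
		return false, errors.New("subnet details are missing an ID or name")
	}
	ready := d.Status != nil && *d.Status == subnetStatusAvailable
	status := resourceStatus{ID: *d.ID, Name: d.Name, Ready: ready}
	if isControlPlane {
		s.controlPlaneSubnets[*d.Name] = status
	} else {
		s.workerSubnets[*d.Name] = status
	}
	// Requeue until the subnet reports that it is available.
	return !ready, nil
}
```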
}

// Otherwise, if there is an ID or name, attempt to look up the subnet and update status as necessary.
if subnet.ID != nil {
If you find an existing subnet by ID or name, how are you planning to differentiate it during the delete phase? Just asking for my understanding!
If it is via tags, I hope all the resources involved have a CRN, as you mentioned, so the tags can be attached.
At the moment, I don't have solid plans for the deletion logic. Tagging will likely be involved (or required) to properly handle all of the edge cases of bring-your-own resources (VPC, subnet, image, etc.).
Yes, I have already tested the tagging required for the resources CAPI creates, but deletion has not been a focus, given time constraints.
baa81c8 to f670ee5
Overall LGTM
	return nil, err
}
if subnet == nil {
	return nil, nil
I'm wondering if we can return a custom error here, so it is easier for callers to consume?
updated
cloud/scope/vpc_cluster.go
var subnets []infrav1beta2.Subnet
var err error
// If no ControlPlane Subnets were supplied, we default to create one in each availability zone of the region.
if s.IBMVPCCluster.Spec.Network.ControlPlaneSubnets == nil || len(s.IBMVPCCluster.Spec.Network.ControlPlaneSubnets) == 0 {
A nil check is not required for ControlPlaneSubnets, since it is a slice and len() already handles nil.
updated
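The reason the nil check is redundant: len of a nil slice is 0 in Go, so a single length check covers both an unset and an explicitly empty field. A minimal sketch with a hypothetical Subnet type:

```go
package main

// Subnet is a hypothetical stand-in for infrav1beta2.Subnet.
type Subnet struct{ Name string }

// needsDefaultSubnets is true for both a nil and an empty slice, because
// len of a nil slice is 0; a separate nil check adds nothing.
func needsDefaultSubnets(subnets []Subnet) bool {
	return len(subnets) == 0
}
```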
// For Subnets, we collect all of the required subnets, for each Plane, and reconcile them individually. We requeue if one is missing or was just created. Reconciliation is attempted on all subnets each loop, to prevent single subnet creation per reconciliation loop.
func (s *VPCClusterScope) ReconcileSubnets() (bool, error) {
	var subnets []infrav1beta2.Subnet
	var err error
I don't see the point in declaring this err variable here; it can be safely removed and declared wherever it is needed.
I believe it needs to be declared here, so that subnets is not redeclared when it is assigned below. At least the linter requires the declarations first, before assignment.
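The concern here is Go's := shadowing. A minimal sketch, with a hypothetical generateSubnets helper, of why declaring subnets and err up front lets the branch assign to the outer variables with = instead of declaring new ones:

```go
package main

import "errors"

// generateSubnets is a hypothetical generator standing in for the code that
// builds default subnets.
func generateSubnets() ([]string, error) {
	return []string{"cp-us-east-1"}, nil
}

// resolveSubnets declares subnets and err up front so the assignment inside
// the if block uses "=" and updates the outer variables. Writing
// "subnets, err := generateSubnets()" there would instead declare new
// variables scoped to that block, shadowing the outer ones.
func resolveSubnets(supplied []string) ([]string, error) {
	var subnets []string
	var err error
	if len(supplied) == 0 {
		subnets, err = generateSubnets()
		if err != nil {
			return nil, err
		}
	} else {
		subnets = supplied
	}
	if len(subnets) == 0 {
		return nil, errors.New("no subnets resolved")
	}
	return subnets, nil
}
```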
cloud/scope/vpc_cluster.go
}

// If no Worker subnets were supplied, attempt to create one in each zone.
if s.IBMVPCCluster.Spec.Network.WorkerSubnets == nil || len(s.IBMVPCCluster.Spec.Network.WorkerSubnets) == 0 {
Same as above: remove the nil check.
updated
cloud/scope/vpc_cluster.go
} else {
	// If no ID or name was provided, that is an error to be raised. One or the other must be specified when subnets are supplied.
	return false, fmt.Errorf("error subnet has no defined id or name, one is required")
}
This can be checked at the beginning, returning an early failure.
Okay, I think that is a safe expectation. Updated.
} else if subnetDetails != nil {
	// Update status if subnet was found.
	return s.updateSubnetStatus(subnetDetails, isControlPlane)
updateSubnetStatus is called multiple times; let's optimize this later.
A slight improvement might be possible, but because the lookup calls depend on which fields/values are defined, reducing the duplication of updateSubnetStatus may make the logic more complex.
I could review it once the VPC support changes are completed.
cloud/scope/vpc_cluster.go
}

// TODO(cjschaef): Move to webhook validation.
if subnet.Zone == nil {
check and return early
updated
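The early returns suggested in these comments might be grouped into a single validation step, as in this sketch. The Subnet type, the validateSubnet name, the creating flag, and the zone error message are hypothetical; the ID/name message is copied from the PR. It also assumes a zone is only required when the subnet has to be created. Per the TODO, such checks could later move to webhook validation.

```go
package main

import "errors"

// Subnet is a hypothetical, simplified subnet spec with optional fields.
type Subnet struct {
	ID   *string
	Name *string
	Zone *string
}

// validateSubnet performs the up-front checks, returning early before any
// API calls are made. The zone is assumed to be required only when creating.
func validateSubnet(s Subnet, creating bool) error {
	if s.ID == nil && s.Name == nil {
		return errors.New("error subnet has no defined id or name, one is required")
	}
	if creating && s.Zone == nil {
		return errors.New("error subnet has no defined zone, which is required for creation")
	}
	return nil
}
```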
cloud/scope/vpc_cluster.go
var ipCount int64 = 256
// We currently only support IPv4
ipVersion := "ipv4"
Can this be moved to the constants section at the beginning of the file?
Sure, I moved to consts.
Add support to reconcile VPC subnets for the new v2 VPC Infrastructure reconcile logic. Related: kubernetes-sigs#1896
f670ee5 to 6689ccf
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: cjschaef, mkumatag
Add support to reconcile VPC subnets for the new v2 VPC Infrastructure reconcile logic.
Related: #1896
What this PR does / why we need it:
Add new support for reconciling VPC subnets for VPC Infra
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #
Special notes for your reviewer:
/area provider/ibmcloud
Release note: