[occm] Multi region openstack cluster #2595
base: master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Hi @sergelogvinov. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@mdbooth can you take a look at this PR? Thanks.
I haven't taken a deep look into this, but I much prefer this in principle: it only tells cloud-provider things we know to be true. This makes me much more confident that this will continue to work correctly as cloud-provider evolves.
Would you still run multiple CCMs, or switch to a single active CCM?
for _, region := range os.regions {
	opt := os.epOpts
	opt.Region = region

	compute[region], err = client.NewComputeV2(os.provider, opt)
	if err != nil {
		klog.Errorf("unable to access compute v2 API : %v", err)
		return nil, false
	}

	network[region], err = client.NewNetworkV2(os.provider, opt)
	if err != nil {
		klog.Errorf("unable to access network v2 API : %v", err)
		return nil, false
	}
@pierreprinetti how much work is performed when initialising a new service client? Is it local-only, or do we have to go back to keystone?
I might be inclined to initialise this lazily anyway, tbh.
Similar thought: maybe don't init them until real usage?
I've had one issue in Proxmox with lazy initialization: a configured region might not exist, and during the rollout of OCCM it starts without errors, so the Kubernetes administrator will think that all configuration is correct.
So we can check all regions here and crash if needed. What do you think?
Apologies for the late response. Building a ProviderClient requires a Keystone roundtrip; building ServiceClients is cheap.
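For reference, a lazy variant along the lines discussed above might look roughly like the sketch below. It assumes gophercloud's openstack.NewComputeV2 and uses hypothetical names (regionClients, computeFor); it is not the code in this PR.

package lazyclients

import (
	"fmt"
	"sync"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack"
)

// regionClients caches per-region compute clients and creates them on first use.
type regionClients struct {
	mu      sync.Mutex
	compute map[string]*gophercloud.ServiceClient
}

func (rc *regionClients) computeFor(provider *gophercloud.ProviderClient, epOpts gophercloud.EndpointOpts, region string) (*gophercloud.ServiceClient, error) {
	rc.mu.Lock()
	defer rc.mu.Unlock()

	if c, ok := rc.compute[region]; ok {
		return c, nil
	}

	// epOpts is passed by value, so overriding the region here does not affect the caller.
	epOpts.Region = region
	c, err := openstack.NewComputeV2(provider, epOpts)
	if err != nil {
		return nil, fmt.Errorf("unable to access compute v2 API in region %q: %w", region, err)
	}

	if rc.compute == nil {
		rc.compute = map[string]*gophercloud.ServiceClient{}
	}
	rc.compute[region] = c
	return c, nil
}

Note this only defers endpoint lookup; since building ServiceClients is cheap and does not require another Keystone roundtrip, the eager per-region initialization in this PR also has the benefit of failing fast on misconfigured regions.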
pkg/openstack/instancesv2.go
Outdated
if node.Spec.ProviderID == "" {
	return i.getInstanceByName(node)
}

instanceID, instanceRegion, err := instanceIDFromProviderID(node.Spec.ProviderID)
if err != nil {
	return nil, "", err
}

if instanceRegion == "" {
	return i.getInstanceByID(instanceID, node.Labels[v1.LabelTopologyRegion])
I'd probably be a bit more explicit with where we're looking for stuff here. IIUC there are 2 possible places we can get a specific region from:
- providerID
- LabelTopologyRegion
Both may be unset because the node has not yet been adopted by the node-controller.
providerID may not contain a region either because it was set before we became multi-region, or because it was set by kubelet without a region and it's immutable.
But the end result is that either we know the region or we don't. If we know the region we should look only in that region. If we don't know the region we should look everywhere.
How about logic something like:
instanceID, instanceRegion, err := instanceIDFromProviderID(node.Spec.ProviderID)
..err omitted...

if instanceRegion == "" {
	instanceRegion = node.Labels[v1.LabelTopologyRegion]
}

var searchRegions []string
if instanceRegion != "" {
	if !slices.Contains(i.regions, instanceRegion) {
		return ...bad region error...
	}
	searchRegions = []string{instanceRegion}
} else {
	searchRegions = ..all the regions, preferred first...
}

for _, region := range searchRegions {
	mc := ...
	if instanceID != "" {
		getInstanceByID()
	} else {
		getInstanceByName()
	}
	mc.ObserveRequest()
}
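(As an aside, a rough, self-contained sketch of what the instanceIDFromProviderID parsing could look like for the two provider-id forms, openstack:///<instance-id> and openstack://<region>/<instance-id>; parseProviderID is a stand-in name, not the actual helper in this PR.)

package main

import (
	"fmt"
	"strings"
)

// parseProviderID splits an OpenStack provider ID into instance ID and optional region.
func parseProviderID(providerID string) (instanceID, region string, err error) {
	rest, ok := strings.CutPrefix(providerID, "openstack://")
	if !ok {
		return "", "", fmt.Errorf("unexpected providerID format %q", providerID)
	}
	// "openstack:///<id>" leaves an empty region before the first slash.
	region, instanceID, ok = strings.Cut(rest, "/")
	if !ok {
		return "", "", fmt.Errorf("unexpected providerID format %q", providerID)
	}
	return instanceID, region, nil
}

func main() {
	fmt.Println(parseProviderID("openstack:///abc-123"))        // abc-123  <nil>
	fmt.Println(parseProviderID("openstack://REGION1/abc-123")) // abc-123 REGION1 <nil>
}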
Thank you, this is a very good idea.
I've changed the implementation. But one thought: I cannot trust LabelTopologyRegion; something can change it, and node-lifecycle would then remove the node (for instance on a reboot/upgrade event)...
So I can use LabelTopologyRegion only as a preferred region, and check that region first.
Thanks.
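A minimal sketch of that "preferred region first" ordering could look like the following; searchOrder is an illustrative name, not code from this PR.

package main

import (
	"fmt"
	"slices"
)

// searchOrder returns the configured regions with the preferred region (if set
// and known) moved to the front; an unknown preferred region is ignored.
func searchOrder(regions []string, preferred string) []string {
	if preferred == "" || !slices.Contains(regions, preferred) {
		return regions
	}
	ordered := []string{preferred}
	for _, r := range regions {
		if r != preferred {
			ordered = append(ordered, r)
		}
	}
	return ordered
}

func main() {
	fmt.Println(searchOrder([]string{"REGION1", "REGION2", "REGION3"}, "REGION2"))
	// Output: [REGION2 REGION1 REGION3]
}

Treating the label only as an ordering hint keeps lookups correct even if the label is wrong: the node is still found by searching the remaining regions.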
Hi @sergelogvinov
Thank you for this PR, it is very interesting. Can we have a call/chat in slack #provider-openstack (Serge Logvinov)?
/ok-to-test
docs/openstack-cloud-controller-manager/using-openstack-cloud-controller-manager.md
Outdated
Is there anything else we can do here? @jichenjc @mdbooth @kayrus

We had a conversation about how we should initialize the OpenStack clients:

for _, region := range os.regions {
	opt := os.epOpts
	opt.Region = region

	compute[region], err = client.NewComputeV2(os.provider, opt)
	if err != nil {
		klog.Errorf("unable to access compute v2 API : %v", err)
		return nil, false
	}

	network[region], err = client.NewNetworkV2(os.provider, opt)
	if err != nil {
		klog.Errorf("unable to access network v2 API : %v", err)
		return nil, false
	}

It seems to be a similar process to the one we followed in:

[Global]
auth-url="https://auth.cloud.openstackcluster.region-default.local/v3"
username="region-default-username"
password="region-default-password"
region="default"
tenant-id="region-default-tenant-id"
tenant-name="region-default-tenant-name"
domain-name="Default"

[Global "region-one"]
auth-url="https://auth.cloud.openstackcluster.region-one.local/v3"
username="region-one-username"
password="region-one-password"
region="one"
tenant-id="region-one-tenant-id"
tenant-name="region-one-tenant-name"
domain-name="Default"

Thanks.
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
What this PR does / why we need it:
Adds multi-region support to the OpenStack CCM, for deployments where all regions share a single Identity provider.
Which issue this PR fixes (if applicable):
fixes #1924
Special notes for reviewers:
CCM config changes:
[Global]
auth-url=https://auth.openstack.example.com/v3/
region=REGION1

# new param 'regions' can be specified multiple times
regions=REGION1
regions=REGION2
regions=REGION3
Optionally can be set in cloud.conf
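A rough sketch of how such a repeated regions key could be read with gcfg (the library OCCM uses for cloud.conf) is below; the struct and field names here are illustrative and may not match the PR's actual config types.

package main

import (
	"fmt"

	gcfg "gopkg.in/gcfg.v1"
)

type config struct {
	Global struct {
		AuthURL string   `gcfg:"auth-url"`
		Region  string   `gcfg:"region"`
		Regions []string `gcfg:"regions"` // slice field: repeated keys accumulate
	}
}

func main() {
	raw := `[Global]
auth-url=https://auth.openstack.example.com/v3/
region=REGION1
regions=REGION1
regions=REGION2
regions=REGION3
`
	var cfg config
	if err := gcfg.ReadStringInto(&cfg, raw); err != nil {
		panic(err)
	}
	fmt.Println(cfg.Global.Region, cfg.Global.Regions)
	// REGION1 [REGION1 REGION2 REGION3]
}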
During the initialization process, OCCM checks for the existence of providerID. If providerID does not exist, it defaults to using node.name, as it did previously. Additionally, if the node has the label topology.kubernetes.io/region, OCCM will prioritize this region as the first one to check. This approach ensures that in the event of a region outage, OCCM can continue to function.

In addition, we can help the CCM locate the node by providing kubelet parameters:
- --provider-id=openstack:///$InstanceID - the InstanceID exists in metadata
- --provider-id=openstack://$REGION/$InstanceID - if you can determine the region (by default the metadata server does not have this information)
- --node-labels=topology.kubernetes.io/region=$REGION - sets the preferred REGION as a label; OCCM will then prioritize searching for the node in this region

Release note: