ProviderID set by capi infra providers should match the one set by the controller manager cloud-provider #4526

enxebre · 2021-04-26T09:54:37Z

What steps did you take and what happened:
The existing providerID equality logic compares only the cloudProvider and the ID, skipping the other segments of the providerID. See https://github.com/kubernetes-sigs/cluster-api/blob/master/controllers/noderefutil/providerid.go#L86

This makes the assumption - not necessarily true - that IDs in different regions/zones won't be reused by the cloud provider.

What did you expect to happen:
I'd be in favour of changing the expectation for the cluster-api-providers to set exactly the same providerID the controller manager cloud-provider sets.

We actually ensured this in AWS a while ago in kubernetes-sigs/cluster-api-provider-aws#1693,
though there might be other provider-specific reasons I'm not aware of why this is not possible.

/kind bug

enxebre · 2021-04-26T09:55:33Z

/assign

sbueringer · 2021-04-26T15:52:48Z

Seems reasonable to me, but I don't know about other providers.

I'm also not sure if this would lead to problems with CAPO providerIDs?

They currently look like this in CAPO and in the cloud provider openstack: openstack:///e85a1e5e-0340-423a-be12-23d3c52c9e10

(EDIT: for completeness, in OpenStack it's just openstack:/// + server id in OpenStack (which is a UUID))
I'm wondering if we're missing a slash there :)
xref:
https://github.com/kubernetes-sigs/cluster-api/blob/master/controllers/noderefutil/providerid.go#L27

CecileRobertMichon · 2021-04-26T16:24:03Z

What are you proposing the change be in cluster-api itself?

The providerIDs definitely need to be consistent with the ones expected by cloud-provider, we actually ran into cluster-autoscaler issues with the Azure provider recently because of an extra slash in the ID (kubernetes-sigs/cluster-api-provider-azure#1293). kubernetes-sigs/cluster-api-provider-azure#655 was merged a long time ago to make it "consistent" but it actually wasn't right because it assumed the format was the same as AWS, which isn't true because Azure has an extra leading slash in the ID.

This is what a providerID looks like in Azure right now: azure:///subscriptions/85d99e6d-f6d6-408f-a9f1-b7a97237d5c4/resourceGroups/default-template/providers/Microsoft.Compute/virtualMachines/default-template-control-plane-fhrvh. Note that we obtain this by doing azure:// + resource ID, unlike AWS which does aws:/// + resourceID (Azure resource ID starts with /).
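
The prefix arithmetic described above can be sketched in a few lines of Go. This is only an illustration: the function names and the placeholder resource ID are made up and are not taken from either provider's code.

```go
package main

import (
	"fmt"
	"strings"
)

// azureProviderID prepends "azure://" because the resource ID already starts with "/".
func azureProviderID(resourceID string) string {
	return "azure://" + resourceID
}

// naiveProviderID copies the "scheme:///" prefix and so yields one slash too many.
func naiveProviderID(resourceID string) string {
	return "azure:///" + resourceID
}

// slashesAfterScheme counts the consecutive "/" characters that follow the first ":".
func slashesAfterScheme(providerID string) int {
	rest := providerID[strings.Index(providerID, ":")+1:]
	return len(rest) - len(strings.TrimLeft(rest, "/"))
}

func main() {
	// Placeholder resource ID (hypothetical values); note the leading slash.
	resourceID := "/subscriptions/SUB/resourceGroups/RG/providers/Microsoft.Compute/virtualMachines/VM"

	good := azureProviderID(resourceID)
	bad := naiveProviderID(resourceID)

	fmt.Println(good, slashesAfterScheme(good))
	fmt.Println(bad, slashesAfterScheme(bad))
}
```

The two results differ by a single slash, which is easy to miss by eye but enough to break any consumer that compares the strings exactly.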

enxebre · 2021-04-26T17:43:28Z

Thanks for that context @CecileRobertMichon.

What are you proposing the change be in cluster-api itself?

If we agree on "The providerIDs definitely need to be consistent with the ones expected by cloud-provider"
what I'm proposing is for capi to treat that as a contract in the equality check, i.e. to change this method to compare the whole string:

cluster-api/controllers/noderefutil/providerid.go

Line 87 in bfc6f80

return p.CloudProvider() == o.CloudProvider() && p.ID() == o.ID()
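
To make the difference concrete, here is a minimal standalone sketch of the two comparison strategies. It assumes the cloud provider is the scheme before "://" and the ID is the last "/"-separated segment; the helper names and the `example://` IDs are hypothetical and are not the noderefutil API.

```go
package main

import (
	"fmt"
	"strings"
)

// cloudProvider returns the scheme before "://", or "" if there is none.
func cloudProvider(providerID string) string {
	parts := strings.SplitN(providerID, "://", 2)
	if len(parts) != 2 {
		return ""
	}
	return parts[0]
}

// lastSegment returns everything after the final "/".
func lastSegment(providerID string) string {
	return providerID[strings.LastIndex(providerID, "/")+1:]
}

// looseEqual mirrors the check quoted above: provider and trailing ID only.
func looseEqual(a, b string) bool {
	return cloudProvider(a) == cloudProvider(b) && lastSegment(a) == lastSegment(b)
}

// strictEqual compares the whole string, as proposed in this issue.
func strictEqual(a, b string) bool {
	return a == b
}

func main() {
	// Made-up IDs: same provider and trailing ID, different zone segment.
	a := "example://zone-a/1234"
	b := "example://zone-b/1234"

	fmt.Println("loose:", looseEqual(a, b))
	fmt.Println("strict:", strictEqual(a, b))
}
```

With these inputs the loose check reports a match while the strict check does not, which is the false positive this issue is concerned about.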

vincepri · 2021-04-26T17:50:08Z

IIRC the equality check was checking the entire string in the past, but we had to change it to match only on CloudProvider and the identifier given that some of the ProviderID information might be missing when the infrastructure provider provisions the machine and the cloud provider assigns the identifier later on.

By contract, the ProviderID's ID part (last chunk after /) should be unique, is the uniqueness not being guaranteed across multiple deployments? That sounds like an infrastructure provider issue that should be tackled separately.

The comparison method change proposed is also a breaking change, and I'm quite sure most infrastructure providers would break.

enxebre · 2021-04-27T09:46:53Z

IIRC the equality check was checking the entire string in the past, but we had to change it to match only on CloudProvider and the identifier given that some of the ProviderID information might be missing when the infrastructure provider provisions the machine and the cloud provider assigns the identifier later on.

This is a good point, we should probably revisit and verify this is still the case.

By contract, the ProviderID's ID part (last chunk after /) should be unique, is the uniqueness not being guaranteed across multiple deployments? That sounds like an infrastructure provider issue that should be tackled separately.

Isn't this contract something we just assumed to be true for convenience because of the limitation you describe above but there's actually no reason nor guarantees for Cloud providers to necessarily satisfy this?

My concern is that uniqueness is not necessarily guaranteed by all cloud providers. E.g., it seems not to be the case in GCP, where you can get the same ID in different zones: https://cloud.google.com/compute/docs/instances/verifying-instance-identity
Why would this be an "infrastructure provider issue"? Our equality check would be the one wrongly returning true for instances with legit ProviderIDs.

Happy to close this if this concern proves to not be justified.

vincepri · 2021-07-06T17:26:22Z

/milestone Next

vincepri · 2021-07-06T17:27:25Z

/lifecycle backlog

LochanRn · 2021-07-30T09:53:07Z

@enxebre @vincepri any progress on this issue?

vincepri · 2021-08-11T17:52:05Z

/assign @alexeldeib @randomvariable

k8s-triage-robot · 2021-11-09T18:21:27Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · 2021-12-09T19:05:11Z

/lifecycle rotten

k8s-triage-robot · 2022-01-08T19:27:10Z

/close

k8s-ci-robot · 2022-01-08T19:27:23Z

@k8s-triage-robot: Closing this issue.

alexeldeib · 2022-03-29T21:24:52Z

/reopen
/unassign
/remove-lifecycle rotten

still valid I think, never tackled this unfortunately

k8s-ci-robot · 2022-03-29T21:25:04Z

@alexeldeib: Reopened this issue.

alexeldeib · 2022-04-04T22:33:30Z

I think comparing the full string as suggested in #4526 (comment) is correct

By contract, the ProviderID's ID part (last chunk after /) should be unique, is the uniqueness not being guaranteed across multiple deployments? That sounds like an infrastructure provider issue that should be tackled separately.

Isn't this contract something we just assumed to be true for convenience because of the limitation you describe above but there's actually no reason nor guarantees for Cloud providers to necessarily satisfy this?

My concern is that uniqueness is not necessarily guaranteed by all cloud providers. E.g., it seems not to be the case in GCP, where you can get the same ID in different zones: https://cloud.google.com/compute/docs/instances/verifying-instance-identity
Why would this be an "infrastructure provider issue"? Our equality check would be the one wrongly returning true for instances with legit ProviderIDs.

+1, as the assumptions made here are totally false for Azure: these IDs are not even unique across multiple VMSS in the same region. Every VMSS has instances identified by integers unique to that scale set only (i.e., reused for every VMSS), starting at 0.
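
A rough sketch of what that reuse does to a lookup keyed on the trailing ID. The providerID strings below are illustrative placeholders for two scale sets (`pool-a` and `pool-b` are made-up names), and `indexBy` is a hypothetical helper, not CAPI code.

```go
package main

import (
	"fmt"
	"strings"
)

// indexBy groups provider IDs under the key computed by the key function.
func indexBy(ids []string, key func(string) string) map[string][]string {
	out := make(map[string][]string)
	for _, id := range ids {
		k := key(id)
		out[k] = append(out[k], id)
	}
	return out
}

// lastSegment returns everything after the final "/".
func lastSegment(id string) string {
	return id[strings.LastIndex(id, "/")+1:]
}

// whole uses the full provider ID as the key.
func whole(id string) string {
	return id
}

func main() {
	// Illustrative IDs: instance 0 of two different scale sets.
	prefix := "azure:///subscriptions/SUB/resourceGroups/RG/providers/Microsoft.Compute/virtualMachineScaleSets/"
	ids := []string{
		prefix + "pool-a/virtualMachines/0",
		prefix + "pool-b/virtualMachines/0",
	}

	byLast := indexBy(ids, lastSegment)
	byFull := indexBy(ids, whole)

	fmt.Println("instances sharing key \"0\":", len(byLast["0"]))
	fmt.Println("distinct full-string keys:", len(byFull))
}
```

Keyed on the last segment, both instances land under the same key; keyed on the full string, each keeps its own entry.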

/help

k8s-ci-robot · 2022-04-04T22:33:31Z

@alexeldeib:
This request has been marked as needing help from a contributor.

jackfrancis · 2022-04-04T22:46:13Z

/assign

k8s-triage-robot · 2022-07-03T22:54:11Z

/lifecycle stale

fabriziopandini · 2022-07-04T18:11:28Z

/lifecycle frozen

CecileRobertMichon · 2022-07-25T17:01:20Z

/reopen

#6971 didn't actually fix this, we'll want #6412 to fully fix this issue

k8s-ci-robot · 2022-07-25T17:01:30Z

@CecileRobertMichon: Reopened this issue.

fabriziopandini · 2022-09-30T19:54:52Z

/triage accepted

k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Apr 26, 2021

enxebre mentioned this issue Apr 26, 2021

🌱 Add providerID index to get nodes #4521

Merged

k8s-ci-robot assigned enxebre Apr 26, 2021

k8s-ci-robot added this to the Next milestone Jul 6, 2021

enxebre mentioned this issue Jul 12, 2021

fix: provider ID should use full ID for comparison #4913

Closed

alexeldeib mentioned this issue Jul 23, 2021

Azure ManagedMachinePool provider id scheme is incompatible with capi kubernetes-sigs/cluster-api-provider-azure#1503

Closed

k8s-ci-robot assigned alexeldeib and randomvariable Aug 11, 2021

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 9, 2021

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 9, 2021

k8s-ci-robot closed this as completed Jan 8, 2022

alexeldeib mentioned this issue Mar 29, 2022

Graduate AzureManagedCluster out of experimental kubernetes-sigs/cluster-api-provider-azure#2204

Closed

k8s-ci-robot unassigned alexeldeib Mar 29, 2022

k8s-ci-robot reopened this Mar 29, 2022

k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Mar 29, 2022

k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Apr 4, 2022

k8s-ci-robot assigned jackfrancis Apr 4, 2022

jackfrancis mentioned this issue Apr 12, 2022

⚠️ Machine ProviderID equality is now strictly enforced #6412

Merged

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 3, 2022

k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 4, 2022

CecileRobertMichon mentioned this issue Jul 22, 2022

🐛 Fix machinepool instance id bug #6971

Merged

k8s-ci-robot closed this as completed in #6971 Jul 25, 2022

k8s-ci-robot reopened this Jul 25, 2022

fabriziopandini added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 29, 2022

fabriziopandini removed this from the Next milestone Jul 29, 2022

fabriziopandini removed the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 29, 2022

jackfrancis mentioned this issue Aug 9, 2022

providerId in lowerCase does not match the providerId on the nodes kubernetes-sigs/cluster-api-provider-azure#2533

Closed

k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Sep 30, 2022

k8s-ci-robot closed this as completed in #6412 Oct 18, 2022

enxebre mentioned this issue Apr 10, 2023

Allow provider id regex matching to support more provider formats #8485

Closed
