
FailureDomain selection order possibly incorrect #679

Closed
richardcase opened this issue Jun 4, 2020 · 16 comments
Labels
kind/cleanup: Categorizes issue or PR as related to cleaning up code, process, or technical debt.
kind/documentation: Categorizes issue or PR as related to documentation.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@richardcase
Member

/kind bug

What steps did you take and what happened:
With #439 we added support for reporting failure domains (i.e. Azure AZs) back to CAPI so that it could then try to spread machines across failure domains (by populating Machine.Spec.FailureDomain). Logic was added to the machine scope to get the failure domain for use during machine reconciliation, based on the following precedence:

  1. Machine.Spec.FailureDomain
  2. AzureMachine.Spec.FailureDomain
  3. AzureMachine.Spec.AvailabilityZone.ID (This is DEPRECATED)
  4. No AZ

If a region has availability zones, these will be reported back to CAPI, which will then choose one of the AZs and set Machine.Spec.FailureDomain (if that field isn't already set). This means that if AZs are explicitly set in AzureMachine.Spec.FailureDomain or AzureMachine.Spec.AvailabilityZone.ID, they will be ignored.

This behavior doesn't feel correct.
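
To make the current order concrete, here is a minimal runnable sketch of the selection logic. The types and names are stand-ins for illustration only, not the real clusterv1.Machine / infrav1.AzureMachine types or the actual machine scope code:

package main

import "fmt"

// Stand-in types; the real CAPI/CAPZ objects carry many more fields.
type machine struct{ FailureDomain *string }

type availabilityZone struct{ ID *string }

type azureMachine struct {
	FailureDomain    *string
	AvailabilityZone availabilityZone // deprecated
}

// pickFailureDomain mirrors the precedence listed above.
func pickFailureDomain(m machine, am azureMachine) string {
	if m.FailureDomain != nil { // 1. Machine.Spec.FailureDomain
		return *m.FailureDomain
	}
	if am.FailureDomain != nil { // 2. AzureMachine.Spec.FailureDomain
		return *am.FailureDomain
	}
	if am.AvailabilityZone.ID != nil { // 3. AzureMachine.Spec.AvailabilityZone.ID (deprecated)
		return *am.AvailabilityZone.ID
	}
	return "" // 4. no AZ
}

func main() {
	capiChoice, userChoice := "1", "2"
	// CAPI has already populated Machine.Spec.FailureDomain, so the AZ the
	// user set explicitly on the AzureMachine is ignored: this prints "1".
	fmt.Println(pickFailureDomain(
		machine{FailureDomain: &capiChoice},
		azureMachine{AvailabilityZone: availabilityZone{ID: &userChoice}},
	))
}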

What did you expect to happen:
I would expect the user to be able to explicitly set an AZ for a machine and for that to be honored.

We could change the order to:

  1. AzureMachine.Spec.AvailabilityZone.ID (This is DEPRECATED)
  2. AzureMachine.Spec.FailureDomain
  3. Machine.Spec.FailureDomain

This would allow us to support old definitions and also take advantage of automatic AZ selection if no AZ is explicitly set.

I also think that AzureMachine.Spec.FailureDomain feels a little redundant and we could change the precedence to:

  1. AzureMachine.Spec.AvailabilityZone.ID (This is DEPRECATED)
  2. Machine.Spec.FailureDomain

So if a user wants to explicitly set the AZ for a machine, they would use Machine.Spec.FailureDomain.

Environment:

  • cluster-api-provider-azure version: 0.43
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 4, 2020
@richardcase
Member Author

/assign

@richardcase
Member Author

@CecileRobertMichon @devigned @detiber - be good to get your thoughts on this.

@detiber
Member

detiber commented Jun 4, 2020

@richardcase both AzureMachine.Spec.AvailabilityZone.ID and AzureMachine.Spec.FailureDomain should be considered deprecated in favor of Machine.Spec.FailureDomain.

The addition of AzureMachine.Spec.FailureDomain was required to allow for migrating existing values that were previously set in AzureMachine.Spec.AvailabilityZone.ID to Machine.Spec.FailureDomain automatically for existing users.

The Machine controller will pull the value of AzureMachine.Spec.FailureDomain to populate Machine.Spec.FailureDomain (if it is not already set).

The reason for this was to make failure domains first class across the common Cluster API resources. Previously they were only available as infrastructure-provider-specific fields, with no real commonality across providers to standardize on.
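
A minimal sketch of that CAPI-side step, using stand-in types (the real logic lives in the upstream Machine controller and differs in detail):

package main

import "fmt"

// Illustration only: how the Machine controller conceptually adopts the
// failure domain reported via the infra machine.
type spec struct{ FailureDomain *string }

func adoptFailureDomain(machine, azureMachine *spec) {
	// Only populate Machine.Spec.FailureDomain if it is not already set.
	if machine.FailureDomain == nil {
		machine.FailureDomain = azureMachine.FailureDomain
	}
}

func main() {
	fd := "3"
	m, am := &spec{}, &spec{FailureDomain: &fd}
	adoptFailureDomain(m, am)
	fmt.Println(*m.FailureDomain) // "3": Machine.Spec.FailureDomain now carries the value
}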

@richardcase
Member Author

Ah yes, thanks @detiber. I forgot we discussed and added this to the machine controller:

if machineScope.AzureMachine.Spec.FailureDomain == nil {
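
My reading of what that check leads into, as a hedged sketch with stand-in types (the actual controller body may differ): if the newer FailureDomain field is unset, the deprecated AvailabilityZone.ID is carried forward so the CAPI Machine controller can then adopt it.

package main

import "fmt"

// Stand-ins for the relevant AzureMachine spec fields.
type availabilityZone struct{ ID *string }

type azureMachineSpec struct {
	FailureDomain    *string
	AvailabilityZone availabilityZone // deprecated
}

// migrateFailureDomain carries the deprecated AvailabilityZone.ID forward
// into FailureDomain when the latter is unset.
func migrateFailureDomain(s *azureMachineSpec) {
	if s.FailureDomain == nil {
		s.FailureDomain = s.AvailabilityZone.ID
	}
}

func main() {
	id := "1"
	s := &azureMachineSpec{AvailabilityZone: availabilityZone{ID: &id}}
	migrateFailureDomain(s)
	fmt.Println(*s.FailureDomain) // "1"
}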

@CecileRobertMichon
Contributor

@richardcase does that mean there is nothing to do here for now? Maybe just make this clearer in the docs? And then eventually remove the AzureMachine fields when we move to v1alpha4?

@richardcase
Member Author

@CecileRobertMichon - yes, I think a small docs update for now and then removal of the field in v1alpha4 would work.

I'll do the docs update and reference this issue. Should I then raise an issue for removal in v1alpha4, or will it be covered by #618?

@CecileRobertMichon
Contributor

I commented on #618 (comment); I think that should be enough. Thanks!

@alexeldeib
Contributor

/kind documentation
/kind cleanup
/remove-kind bug

@k8s-ci-robot k8s-ci-robot added kind/documentation Categorizes issue or PR as related to documentation. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. and removed kind/bug Categorizes issue or PR as related to a bug. labels Jul 22, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 20, 2020
@CecileRobertMichon
Contributor

/remove-lifecycle stale

@richardcase did you still want to update the docs for this one?

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 3, 2020
@richardcase
Member Author

Sorry @CecileRobertMichon, I completely forgot about this. Yes, I will update the docs for this.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 3, 2021
@richardcase
Member Author

I've still not got around to this, sorry. With v1alpha4 not far off, is there any benefit in doing this?

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 2, 2021
@CecileRobertMichon
Contributor

/close

availabilityZones were removed in #1233

@k8s-ci-robot
Contributor

@CecileRobertMichon: Closing this issue.

In response to this:

/close

availabilityZones were removed in #1233

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@CecileRobertMichon CecileRobertMichon removed this from the next milestone May 4, 2023