
KEP for Azure Availability Zones #2364

Merged 1 commit into kubernetes:master from the kep-azure-az branch on Jul 25, 2018

Conversation

feiskyer (Member)

The KEP adds Azure Availability Zones support to Kubernetes, which includes:

  • Detect availability zones automatically when registering new nodes, and set the node's failure-domain.beta.kubernetes.io/zone label to the AZ instead of the fault domain (see the sketch below)
  • Provision LoadBalancer and PublicIP resources as zone redundant
  • Implement the GetLabelsForVolume interface for Azure managed disks, and add AzureDisk support to the PersistentVolumeLabel admission controller so as to support DynamicProvisioningScheduling
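
For illustration only (not part of the original PR description), a zoned Azure node would end up with labels of roughly this shape; the node name is hypothetical:

```yaml
# Illustrative sketch: the zone label format proposed by the KEP.
# The node name is made up; the zone value uses the <region>-<AZ> format rather than the fault domain.
apiVersion: v1
kind: Node
metadata:
  name: k8s-agent-centralus-0          # hypothetical node name
  labels:
    failure-domain.beta.kubernetes.io/region: centralus
    failure-domain.beta.kubernetes.io/zone: centralus-1   # AZ, not the fault domain
```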

/sig azure
/sig storage

@k8s-ci-robot k8s-ci-robot added area/provider/azure Issues or PRs related to azure provider sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. labels Jul 11, 2018
@feiskyer (Member Author)

/assign @brendandburns @khenidak @colemickens

/cc @justaugustus @andyzhangx @kubernetes/sig-azure-misc @kubernetes/sig-storage-feature-requests

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 11, 2018

### PersistentVolumeLabel

Besides the PVLabeler interface, the [PersistentVolumeLabel](https://github.com/kubernetes/kubernetes/blob/master/plugin/pkg/admission/storage/persistentvolume/label/admission.go) admission controller should also be updated with AzureDisk support, so that the above labels are applied to new PVs automatically.
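
As a rough illustration (not from the KEP text itself), a dynamically provisioned zoned AzureDisk PV would end up carrying labels like the following once the admission controller (or the PV label controller) processes it; the PV name is hypothetical and only the relevant metadata is shown:

```yaml
# Illustrative fragment: zone labels applied to a zoned AzureDisk PV.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-azuredisk-zoned-example     # hypothetical PV name
  labels:
    failure-domain.beta.kubernetes.io/region: southeastasia
    failure-domain.beta.kubernetes.io/zone: southeastasia-1
```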
Member:

This admission controller is deprecated now and I believe its replacement is in the cloud controller manager

@feiskyer (Member Author), Jul 11, 2018:

Sorry, didn't notice this has been deprecated. Cloud controller manager will GA in v1.13 or v1.14. What's the replacement before its GA?

Member:

I'm not sure if there is a replacement in the meantime, but both methods should be implemented

Member Author:

@msau42 you're right:

  • if cloud-controller-manager is used, the PVLabeler interface is used by its persistent volume label controller
  • otherwise, this admission controller is used.

Will add this info to both this part and the PVLabeler part.

parameters:
  kind: Managed
  storageaccounttype: Premium_LRS
  zone: "centralus-1"
Member:

The new topology-aware dynamic provisioning feature adds a new allowedTopologies field to StorageClass, which is intended to replace the zone/zones parameters.

Member Author:

The topology-aware dynamic provisioning feature is still alpha in v1.12 if I'm not wrong, so these two parameters are still required until it goes GA, right?

And thanks for the reminder. The allowedTopologies feature should also be added here.

Member:

Correct, just wanted to make sure you know that this area is changing; as part of beta, we want all three in-tree cloud provider plugins to start moving to the new method as well.
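
For context, a hedged sketch of what the allowedTopologies approach discussed in this thread could look like for AzureDisk (the class name and zone values are illustrative, and the field shape follows the topology-aware provisioning design):

```yaml
# Illustrative sketch: allowedTopologies replacing the zone/zones parameters.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-disk-zoned-example      # hypothetical class name
provisioner: kubernetes.io/azure-disk
parameters:
  kind: Managed
  storageaccounttype: Premium_LRS
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - centralus-1
    - centralus-2
```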


## AzureDisk

When Azure managed disks are created, the `PersistentVolumeLabel` admission controller automatically adds zone labels to them. The scheduler (via `VolumeZonePredicate`) will then ensure that pods that claim a given volume are only placed into the same zone as that volume, as volumes cannot be attached across zones.
Member:

Eventually the VolumeZone predicate will be deprecated and replaced by PV.NodeAffinity


The proposal includes required changes to support availability zones for various functions in Azure cloud provider and AzureDisk volumes:

- Detect availability zones automatically when registering new nodes and node's label `failure-domain.beta.kubernetes.io/zone` will be replaced with AZ instead of fault domain
Member:

rather than having the kubelet self-register this, can the node controller pull this information and add it into the node object?


## Node registration

When nodes are started, kubelet automatically adds region and zone labels to them (`failure-domain.beta.kubernetes.io/region` and `failure-domain.beta.kubernetes.io/zone`).
Member:

Allowing nodes to self-label has been problematic (enumerated in #911). Is there a reason we wouldn't have the node controller pull this info from the cloud provider and manage the labels on the node objects?


@feiskyer (Member Author), Jul 11, 2018:

There are actually two ways supported today:

  • Without cloud-controller-manager (kubelet --cloud-provider=azure), kubelet registers itself with zone labels
  • With cloud-controller-manager (kubelet --cloud-provider=external), the node controller adds zone labels

Until cloud-controller-manager goes GA (probably v1.13 or v1.14), both ways should be supported. I should probably state this more clearly.

Member:

> Until cloud-controller-manager goes GA (probably v1.13 or v1.14), both ways should be supported

I agree we need to support the old way until we can transition away from it, but I don't think we should expand it to new labels if possible

Member Author:

> I agree we need to support the old way until we can transition away from it, but I don't think we should expand it to new labels if possible

Agreed. This is only for existing zone labels.

Member:

> Agreed. This is only for existing zone labels.

great, thanks for confirming

@thockin (Member) left a comment:

It's not clear to me - is this proposing changes to Kubernetes concepts or adding new labels, or is it just changing the Azure implementation with no impact to the rest of the system?

I'm not sure that changes that stay within the Azure implementation need a KEP?


The proposal includes required changes to support availability zones for various functions in Azure cloud provider and AzureDisk volumes:

- Detect availability zones automatically when registering new nodes and node's label `failure-domain.beta.kubernetes.io/zone` will be replaced with AZ instead of fault domain
Member:

I don't understand the statement. You want to replace a standard label with ... what?

Member Author:

Azure sets the fault domain in the failure-domain.beta.kubernetes.io/zone label today. It will be replaced with the AZ. Fault domain and AZ are different things on Azure.


Currently, Azure nodes are registered with label `failure-domain.beta.kubernetes.io/zone=faultDomain`.

Fault domains are formatted as numbers (e.g. `1` or `2`), which is the same format as AZs (e.g. `1` or `3`). If AZs used the same format as fault domains, there would be scheduler issues for clusters with both AZ and non-AZ nodes. So AZs will use a different format in Kubernetes: `<region>-<AZ>`, e.g. `centralus-1`.
Member:

Maybe you should lead in with some details about what AZ means and what faultDomain means? Why are they not interchangeable terms (they are both pretty commonly used to mean the same thing)

Member Author:

Good catch. This should definitely be addressed.

@feiskyer feiskyer force-pushed the kep-azure-az branch 2 times, most recently from 7a9eb71 to f03f856 on July 12, 2018 03:34
@feiskyer (Member Author)

Added a new commit with topology-aware provisioning and the difference between fault domains and availability zones.

@brendandburns (Contributor)

brendandburns commented Jul 12, 2018

Hey @thockin, good question. The KEP process is somewhat vague on what constitutes a KEP:

https://kubernetes.io/docs/imported/community/keps/#what-type-of-work-should-be-tracked-by-a-kep

I think that it is clearly within the scope of a SIG to track something as a KEP if they want to:

"As the local bodies of governance, SIGs should have broad latitude in describing what constitutes an enhancement which should be tracked through the KEP process."

But I also don't think that this meets the guidelines for things that are required to be a KEP:

"any technical effort (refactoring, major architectural change) that will impact a large section of the development community should also be communicated widely. The KEP process is suited for this even if it will have zero impact on the typical user or operator."

...

"Enhancements that have major impacts on multiple SIGs should use the KEP process."

So I guess given that this KEP has been started, we may as well finish it, but going forward SIG-Azure will develop guidelines for what should and should not be a KEP

"SIGs may find that helpful to enumerate what does not require a KEP rather than what does."

Sound good?

@justaugustus (Member)

@brendandburns / @thockin --

So from a Features Maintainer perspective, Feature Owners are required to submit design proposals for units of work that qualify as Features, i.e., those tracked within k/features. The feature submission for Azure AZs can be found here: kubernetes/enhancements#586
I'd expect to see an uptick in KEPs across releases if contributors are following the Features submission process.

From a SIG Azure Chair perspective, anything that moves us along the line of transparency w.r.t. the SIG roadmap is a step in the right direction.
One thing that I appreciate about what's currently defined in the KEP metadata is that we consider scope: whether a KEP is Kubernetes-wide, multi-SIG, or single-SIG, which lends itself to considering KEPs for what might be considered a smaller feature.

That said, I'm happy to firm up guidelines on expectations for KEP submission in SIG Azure! :)

@@ -139,13 +146,13 @@ Note that zonal PublicIPs are not supported. We may add this easily if there’r

## AzureDisk

When Azure managed disks are created, the `PersistentVolumeLabel` admission controller automatically adds zone labels to them. The scheduler (via `VolumeZonePredicate`) will then ensure that pods that claim a given volume are only placed into the same zone as that volume, as volumes cannot be attached across zones.
When Azure managed disks are created, the `PersistentVolumeLabel` admission controller or PV controller automatically adds zone labels to them. The scheduler (via `VolumeZonePredicate` or `PV.NodeAffinity` in the future) will then ensure that pods that claim a given volume are only placed into the same zone as that volume, as volumes cannot be attached across zones.
Member:

PV controller is actually the "PV label controller in cloud-controller-manager" (not to be confused with the PV controller in kube-controller-manager).

The admission controller needs to be updated to also add PV.NodeAffinity in addition to the zone labels.

Member Author:

Yep, exactly. Updated with that information.
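
For illustration, a minimal sketch of the PV.NodeAffinity mentioned above, expressing the same zone constraint as the labels for a zoned AzureDisk PV (the PV name and zone value are hypothetical, and only the node-affinity field is shown):

```yaml
# Illustrative fragment: PV node affinity equivalent of the zone labels.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-azuredisk-zoned-example     # hypothetical PV name
spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: failure-domain.beta.kubernetes.io/zone
          operator: In
          values:
          - southeastasia-1
```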

@feiskyer (Member Author)

@thockin @brendandburns @justaugustus Thanks for the clarification on the KEP process. Per Brendan's suggestion, could we continue with the KEP and develop guidelines in sig-azure?

@justaugustus (Member)

Absolutely @feiskyer. Let's keep this going!

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 14, 2018
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 19, 2018
@feiskyer (Member Author)

@thockin @brendandburns @justaugustus @ddebroy Addressed comments and squashed commits. PTAL

@ddebroy (Member) left a comment:

Thank you @feiskyer. /lgtm

@justaugustus (Member)

Overall LGTM, but I'd like to hold pending a review from Kal or Cole as well.
/lgtm
/hold
/cc @khenidak @colemickens

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 20, 2018
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 20, 2018
@khenidak (Contributor)

/LGTM

@justaugustus (Member)

Awesome, thanks for the review, Kal!
@feiskyer -- thanks for putting this together! :)

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 20, 2018
@justaugustus (Member)

Ooooooo, I'm going to need to add an OWNERS for the sig-azure folder.
Will handle next week.

@khenidak (Contributor)

khenidak commented Jul 21, 2018 via email

@justaugustus (Member)

> Ooooooo, I'm going to need to add an OWNERS for the sig-azure folder.
> Will handle next week.

Fixed in #2399

/approve

@justaugustus (Member)

/cc @jdumars @calebamiles @idvoretskyi

@justaugustus (Member)

@jdumars @calebamiles @idvoretskyi -- could you /approve this?
It's approved from the SIG Azure side.

@idvoretskyi (Member) left a comment:

/approve

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: idvoretskyi, justaugustus

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 25, 2018
@k8s-ci-robot k8s-ci-robot merged commit eb02547 into kubernetes:master Jul 25, 2018
@justaugustus
Copy link
Member

Thanks @idvoretskyi 🎉

@feiskyer feiskyer deleted the kep-azure-az branch July 26, 2018 01:44
@feiskyer
Copy link
Member Author

Thanks @idvoretskyi and @justaugustus

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this pull request Aug 6, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Add availability zones support to Azure managed disks

**What this PR does / why we need it**:

Continuation of the [Azure Availability Zone feature](kubernetes/enhancements#586).

This PR adds availability zone support for Azure managed disks and their storage class. Zoned managed disks are enabled by default if there are zoned nodes in the cluster.

The zone can also be customized via the `zone` or `zones` parameter, e.g.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
  name: managed-disk-zone-1
parameters:
  zone: "southeastasia-1"
  # zones: "southeastasia-1,southeastasia-2"
  cachingmode: None
  kind: Managed
  storageaccounttype: Standard_LRS
provisioner: kubernetes.io/azure-disk
reclaimPolicy: Delete
volumeBindingMode: Immediate
```

All zoned AzureDisk PVs will also be labeled with their availability zone, e.g.

```sh
$ kubectl get pvc pvc-azuredisk-az-1
NAME                 STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS          AGE
pvc-azuredisk-az-1   Bound     pvc-5ad0c7b8-8f0b-11e8-94f2-000d3a07de8c   5Gi        RWO            managed-disk-zone-1   2h

$ kubectl get pv pvc-5ad0c7b8-8f0b-11e8-94f2-000d3a07de8c --show-labels
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM                        STORAGECLASS          REASON    AGE       LABELS
pvc-5ad0c7b8-8f0b-11e8-94f2-000d3a07de8c   5Gi        RWO            Delete           Bound     default/pvc-azuredisk-az-1   managed-disk-zone-1             2h        failure-domain.beta.kubernetes.io/region=southeastasia,failure-domain.beta.kubernetes.io/zone=southeastasia-1
```

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

See also the [KEP](kubernetes/community#2364).

The DynamicProvisioningScheduling feature will be added in a follow-up PR.

**Release note**:

```release-note
Azure managed disks now support availability zones and new parameters `zoned`, `zone` and `zones` are added for AzureDisk storage class.
```

/kind feature
/sig azure
/assign @brendandburns @khenidak @andyzhangx
calebamiles pushed a commit to calebamiles/community that referenced this pull request Sep 5, 2018
> - Availability Zones are unique physical locations within an Azure region. Each zone is made up of one or more data centers equipped with independent power, cooling, and networking.
>
> An Availability Zone in an Azure region is a combination of a fault domain and an update domain (same as an FD, but for updates: when upgrading a deployment, it is carried out one update domain at a time). For example, if you create three or more VMs across three zones in an Azure region, your VMs are effectively distributed across three fault domains and three update domains.


Is there further classification, e.g. does an update domain have multiple fault domains?

Member Author:

No, each VM is assigned a single update domain.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/azure Issues or PRs related to azure provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.