
Is it possible to use the standard k8s topology labels? #729

Closed
TBBle opened this issue Feb 6, 2021 · 37 comments
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

@TBBle

TBBle commented Feb 6, 2021

Is your feature request related to a problem?/Why is this needed

Looking into the interactions of this driver with the Cluster Autoscaler, i.e., kubernetes/autoscaler#3230 (comment), I have been unable to identify a good way to support the scale-from-zero case without needing to either change Cluster Autoscaler to always add this driver's topology label, or have each node group's ASG manually specify the topology label topology.ebs.csi.aws.com/zone.

/feature

Describe the solution you'd like in detail

A parameter to specify the topology label used for PV dynamic allocation, which, in the EKS case, can be set to topology.kubernetes.io/zone (or failure-domain.beta.kubernetes.io/zone on older clusters) and behave correctly.

Describe alternatives you've considered

  • Hard-coding the topology.ebs.csi.aws.com/zone label into Cluster Autoscaler's AWS support, the same way the standard zone label is hard-coded.
  • Manually including the tag k8s.io/cluster-autoscaler/node-template/label/topology.ebs.csi.aws.com/zone on each eksctl node group definition to which the EBS CSI DaemonSet will be deployed, i.e. manually matching the local taint/toleration setup (see the sketch below).

See kubernetes/autoscaler#3230 for more details on these alternatives.
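
For illustration, here's a minimal sketch of that second workaround as an eksctl ClusterConfig fragment. The cluster name, region, and node group name are hypothetical; the tag key follows Cluster Autoscaler's node-template convention:

```yaml
# Hypothetical eksctl config fragment: tag the node group so Cluster
# Autoscaler's node template knows that nodes scaled up from zero will
# carry the driver's zone label.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: example-cluster   # hypothetical cluster name
  region: us-west-2       # hypothetical region
nodeGroups:
  - name: ng-us-west-2a   # hypothetical node group
    availabilityZones: ["us-west-2a"]
    tags:
      k8s.io/cluster-autoscaler/node-template/label/topology.ebs.csi.aws.com/zone: us-west-2a
```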

Additional context

I'm not sure what the original reason was for using a new label, which is why I suggest making it an option rather than changing the default. I'm guessing that in non-EKS deployments on EC2, the standard zone label may not reliably be the AWS zone if the AWS Cloud Controller is not in use?

That said, if the reason is that the label must be owned by the driver for spec or behaviour reasons, then this wouldn't be feasible at all, and we'll probably be stuck manually ensuring that the node groups with the tag correctly match the tolerations of the DaemonSet.

@ayberk
Contributor

ayberk commented Feb 16, 2021

I really don't know why we didn't use the well-known label. @wongma7 @leakingtapan maybe?

I agree we should just use them instead. Initially we should support both labels though so we can provide a deprecation warning.

@leakingtapan
Contributor

leakingtapan commented Feb 16, 2021

It is because the well-known label wasn't well-known yet by the time this driver's topology key was set. There used to be a separate label failure-domain.beta.kubernetes.io/zone and it got renamed to topology.kubernetes.io/zone as part of the work from sig-cloudprovider. See here. And see here for when this driver's label was set.

Given that the driver is not GA yet, I think it makes sense to fix the label before GA if there are no other concerns. But also keep in mind to update the CSI translation lib for CSI migration here, to avoid breaking CSI migration.

Hope this helps

@leakingtapan
Contributor

related to: #675

@ayberk
Contributor

ayberk commented Feb 17, 2021

That's really helpful, thanks a lot @leakingtapan!

Since this requires k/k changes, I'll try to come up with a plan.

/assign

@ayberk
Contributor

ayberk commented Feb 26, 2021

We'll need to gracefully retire the current label (topology key), so we need to start with writing both.

  1. Driver starts writing both keys: topology.ebs.csi.aws.com/zone and topology.kubernetes.io/zone. For topology constraints it should use the new one if available, or fall back to the old one.
  2. Translation lib similarly has to read both when converting to in-tree. However, we have two options while converting from in-tree to CSI:
    1. Only write the new one. This would break backward compatibility, and older versions of the driver wouldn't be able to read the topology key.
    2. Write both keys so we don't break older versions. Since we're already beta, this is preferred.
  3. At this point all e2e tests should still pass.
  4. Update the e2e tests to use the new key. Unit tests should suffice for supporting the old key.
  5. Remove the old key. Eventually.

Skipping the beta labels should be safe since they're only used by the translation lib.

cc @wongma7 for sanity check

@wongma7
Contributor

wongma7 commented Feb 27, 2021

sounds good to me. I understand this concretely to mean:

NodeGetInfo must respond such that CSINode.topologyKeys has both oldKey and newKey.

CreateVolume, if provided a topology requirement in the request, must extract the zone from newKey if it exists, or oldKey if not

CreateVolume must respond such that PVs have nodeAffinity.nodeSelector oldKey==A || newKey==A (by returning an AccessibleTopology array containing both)
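
As a rough sketch (not the driver's exact output; names and the zone value are illustrative), the resulting API objects would look something like:

```yaml
# Sketch: CSINode after NodeGetInfo reports both topology keys
apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  name: ip-10-0-0-1.example.internal   # hypothetical node name
spec:
  drivers:
    - name: ebs.csi.aws.com
      nodeID: i-0123456789abcdef0      # hypothetical EC2 instance ID
      topologyKeys:
        - topology.ebs.csi.aws.com/zone
        - topology.kubernetes.io/zone
---
# Sketch: PV produced from an AccessibleTopology array carrying both keys.
# nodeSelectorTerms are ORed, which gives oldKey==A || newKey==A.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-example                     # hypothetical PV name
spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.ebs.csi.aws.com/zone
              operator: In
              values: ["us-west-2a"]
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values: ["us-west-2a"]
```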

@TBBle
Author

TBBle commented Mar 28, 2021

If I'm following correctly, landing #773 would be sufficient for Cluster Autoscaler to operate correctly with --balancing-ignore-label=topology.ebs.csi.aws.com/zone, because #773 is ensuring that it has the same value as topology.kubernetes.io/zone. So CA can rely on the well-known label for "will this Pod be schedulable on this node-template", even though csi-translation-lib is hinging off the deprecated-but-identical value when it actually interacts with the scheduler.

It looks like #773 missed the 0.10 release branch, so I guess it'll be part of 0.11?
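
For reference, a sketch of how that --balancing-ignore-label flag might be wired into a Cluster Autoscaler Deployment; the image tag and companion flags here are illustrative:

```yaml
# Fragment of a cluster-autoscaler Deployment pod spec (illustrative)
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.26.0  # hypothetical tag
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      - --balance-similar-node-groups=true
      - --balancing-ignore-label=topology.ebs.csi.aws.com/zone
```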

@ayberk
Contributor

ayberk commented Mar 29, 2021

You're right, #773 is enough. The remaining work is irrelevant to this use case.

I can see it in the release PR. We haven't released it yet, but 0.10 will include it. See #811

@denniswebb

We are running 0.10 and hit an issue where it complains about not having a topology key on the node when provisioning a new volume from a PVC. Does #773 remove the need for the key topology.ebs.csi.aws.com/zone, or is it still required for now?

@ayberk
Contributor

ayberk commented Apr 6, 2021

For now you should still use both. The driver itself prioritizes topology.kubernetes.io/zone, but if it's not present it falls back to topology.ebs.csi.aws.com/zone. We haven't updated csi-translation-lib upstream yet though, so we're still writing and reading both keys.

Do you mind opening an issue, ideally with more details?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 5, 2021
@wongma7
Contributor

wongma7 commented Jul 6, 2021

/remove-lifecycle stale

Label best practices need discussion with Kubernetes upstream. It's not clear at this time whether we should be putting the standard label on nodes, or if the standard label should be reserved for use by in-tree drivers (which simplifies migration, because migration translates from our custom label to the standard label).

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 6, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 4, 2021
@k1rk

k1rk commented Oct 5, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 5, 2021
elmiko added a commit to elmiko/cluster-autoscaler-operator that referenced this issue Nov 4, 2021
when balancing similar nodes, the autoscaler will look for differences
in labels as one of its checks for similarity. by default it will ignore
the well known kubernetes zone label (topology.kubernetes.io/zone), but
the aws-ebs-csi-driver will add a unique label for its zone,
`topology.ebs.csi.aws.com/zone`. this label will eventually be
deprecated in favor of the common well known topology label. for more
information about this see the references.

ref: kubernetes/autoscaler#3230
ref: kubernetes-sigs/aws-ebs-csi-driver#729
elmiko added a commit to elmiko/kubernetes-autoscaler that referenced this issue Nov 10, 2021
This change adds the aforementioned label to the list of ignored labels
in the AWS nodegroupset processor. This change is being made in response
to the addition of this label by the aws-ebs-csi-driver. This label will
eventually be deprecated by the driver, but its use will prevent AWS
users from properly balancing similar nodes. Also adds unit test for the
AWS processor.

ref: kubernetes#3230
ref: kubernetes-sigs/aws-ebs-csi-driver#729
@ConnorJC3
Contributor

/close

fixed in #1360 - available in the driver since version v1.12.0

@k8s-ci-robot
Contributor

@ConnorJC3: Closing this issue.

In response to this:

/close

fixed in #1360 - available in the driver since version v1.12.0

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jbg

jbg commented Mar 6, 2023

@ConnorJC3 I think this should be re-opened. From what I can see, the fix was reverted because it was causing duplicate topology selector terms in the CSI migration case.

So it's still the case, even with the latest driver version, that provisioned PVs select only on the non-standardised topology.ebs.csi.aws.com/zone label, and don't work out of the box with cluster-autoscaler when scaling node groups up from zero. (You have to manually label the ASGs with a node template so that cluster-autoscaler knows that the first node will have the topology.ebs.csi.aws.com/zone label once it starts up.)

One way to solve this without harming compatibility with existing clusters would be to provide an option, disabled by default, to use the standardised topology.kubernetes.io/zone label. This seems to be what is discussed in #962, which was closed due to inactivity.

@ConnorJC3
Contributor

Ah, that is correct, I will reopen this, sorry. I will look into fixing this completely in the near-ish future.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 4, 2023
@z0rc

z0rc commented Jun 5, 2023

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 5, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 21, 2024
@z0rc

z0rc commented Jan 22, 2024

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 22, 2024
@ConnorJC3
Contributor

ConnorJC3 commented Feb 13, 2024

/lifecycle frozen

Hi, sorry about the delay on this, the team has decided on a plan:

We do want to move over to the standard labels, but don't want to immediately do so as it breaks downgrading and is a minor breaking change. Thus, the plan is:

  1. Immediately change the node to report both our non-standard label as well as the standard label. This enables forward compatibility for a future release when the topology on the volume is switched over (being worked on in #1931: Report zone via well-known topology key in NodeGetInfo).
  2. Announce, in the next release of the EBS CSI Driver (currently planned for mid/late February), the intent to change the topology label for created volumes to the standard label in a future release.
  3. In some future release (exact timeline TBD, but likely 2-3 monthly releases after the announcement), switch the driver over to report the standard label for all created volumes instead of the non-standard label. After upgrading to this version, users will only be able to downgrade to the February release or later (or else their volumes will not be able to schedule). See the sketch after this list.
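
A rough sketch of what step 3 changes on a newly created volume's PV (zone value illustrative). Nodes keep serving both labels, so volumes created before the switch still schedule:

```yaml
# Before step 3: PVs created by the driver select on the non-standard key
nodeAffinity:
  required:
    nodeSelectorTerms:
      - matchExpressions:
          - key: topology.ebs.csi.aws.com/zone
            operator: In
            values: ["us-east-1a"]
---
# After step 3: new PVs select on the standard key instead
nodeAffinity:
  required:
    nodeSelectorTerms:
      - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values: ["us-east-1a"]
```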

@k8s-ci-robot k8s-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Feb 13, 2024
@lgfa29

lgfa29 commented Mar 7, 2024

Hi @ConnorJC3 👋

I am an engineer on the Nomad team and I think #1931 may have created an irreconcilable topology.

With v1.28.0, the created volume only has the topology segment "topology.ebs.csi.aws.com/zone": "eu-west-2a", while the node has both domains in the same segment. Since segments are compared with AND, volumes are never able to be scheduled, as sketched below.
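
A pseudo-structure sketch of the mismatch (not actual Nomad syntax; values from the example above):

```yaml
# Volume created by driver v1.28.0: a single segment with only the driver key.
volume_topology:
  segments:
    topology.ebs.csi.aws.com/zone: eu-west-2a

# Node: one segment carrying both keys. A scheduler that requires every key
# in the node segment to match (AND) can never satisfy it from the volume's
# single-key segment.
node_topology:
  segments:
    topology.ebs.csi.aws.com/zone: eu-west-2a
    topology.kubernetes.io/zone: eu-west-2a
```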

On step 3, if the volumes only receive the new label, would the non-standard label be removed from node as well? Otherwise the irreconcilable topology would still be a problem.

Apologies for my ignorance on this, but where, currently, is the standard label applied to the volume on creation? Otherwise I would imagine the same problem would be happening on Kubernetes clusters?

Thank you!

@ConnorJC3
Contributor

Hi @lgfa29 - My understanding of the CSI spec is that a volume with key topology.ebs.csi.aws.com/zone should match a node with keys topology.ebs.csi.aws.com/zone and topology.kubernetes.io/zone (as long as the value for topology.ebs.csi.aws.com/zone matches). In other words, essentially extra key(s) on the node should not be considered a mismatch.

This is also how it is implemented in Kubernetes: https://github.com/kubernetes/kubernetes/blob/909faa3a9b33161992af0684c5582353b5be6d9b/pkg/scheduler/framework/plugins/volumezone/volume_zone.go#L187-L193

// ... The requirement is that any
// volume zone-labels must match the equivalent zone-labels on the node.  It is OK for
// the node to have more zone-label constraints (for example, a hypothetical replicated
// volume might allow region-wide access)

On step 3, if the volumes only receive the new label, would the non-standard label be removed from node as well? Otherwise the irreconcilable topology would still be a problem.

The intent is to serve both labels from the node indefinitely, to support volumes created with the old topology key to continue to work on new versions of the EBS CSI Driver.

Apologies for my ignorance on this, but where, currently, is the standard label applied to the volume on creation?

Currently volumes receive topology.ebs.csi.aws.com/zone upon creation. We intend at some future date to switch this to the now-standard label topology.kubernetes.io/zone.

@lgfa29

lgfa29 commented Mar 7, 2024

Thank you for the extra information @ConnorJC3.

I've opened hashicorp/nomad#20094 in the Nomad repo for us to track this.

graveland added a commit to timescale/aws-ebs-csi-driver that referenced this issue Jul 30, 2024
This is the latest version where the topology key is still
topology.ebs.csi.aws.com/zone. Version 1.33.0 switched to
the well-known topology.kubernetes.io/zone, and is a breaking
change that we'll have to plan for.

kubernetes-sigs#729 (comment)
@ConnorJC3
Contributor

ConnorJC3 commented Aug 6, 2024

/close

As of v1.33.0 of the EBS CSI Driver, the driver creates volumes with the AZ topology label topology.kubernetes.io/zone. As a result, the three-step plan from #729 (comment) is complete. See the CHANGELOG for more info.

@k8s-ci-robot
Contributor

@ConnorJC3: Closing this issue.

In response to this:

/close

As of v1.33.0 of the EBS CSI Driver, the driver creates volumes with the AZ topology label topology.kubernetes.io/zone. As a result, the three-step plan from #729 (comment) is complete.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
