WIP: Add a POC of an alternate partitioning scheme #35
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: klueska. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
The branch was force-pushed from e2d0d09 to dea2041.
I think I understand.
- Create common attribute groups
- Create common sharedCapacityTemplates that are used to represent what a "namespace" should contain for shared capacities
- Create deviceTemplates that describe the shape of the main GPU as well as each partition
- Create device instances that reference those templates and overlay a name, attributes, and capacities specific to that instance, as well as specify the "namespace" (physical card in this example) from which the capacities are drawn
I think this can work. It is sort of like Option 4, except it flattens the "DeviceShape contains partitions" into "one shape per partition", then lists all the devices and partitions explicitly, referencing the shape (template) to reduce repetition.
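A minimal sketch of that structure as I understand it (purely illustrative -- apart from the mig-capable attribute and the sharedCapacityTemplateName reference visible in the diff excerpt further down, the field names, values, and layout here are my own reconstruction, not copied from the actual poc.yaml):

```yaml
# Illustrative skeleton of the four building blocks described above.
attributeGroups:
- name: common-gpu-attributes
  attributes:
  - name: mig-capable
    bool: true
sharedCapacityTemplates:
- name: gpu-shared-resources        # what one physical card ("namespace") provides
  sharedCapacities:
  - name: memory
    quantity: 40Gi
deviceTemplates:
- name: a100-1g-5gb                 # the shape of one partition type
  attributeGroups:
  - common-gpu-attributes
  sharedCapacitiesConsumed:
  - sharedCapacityTemplateName: gpu-shared-resources
    capacities:
    - name: memory
      quantity: 5Gi
devices:
- name: gpu-0-mig-1g-5gb-0
  deviceTemplateName: a100-1g-5gb
  sharedCapacityInstances:
    gpu-shared-resources: gpu-0     # the physical card the capacities are drawn from
```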
I don't like the word "namespace"; it already has too strong a meaning and is pretty confusing to see here.
Since you are explicitly listing every device, I think this will not achieve an order-of-magnitude reduction in size. Quick calculation: each partition takes ~13 lines of YAML, so the size of the YAML grows as X + 13dp, where d = number of physical devices, p = partitions per device, and X is some constant for the template. Since p is 28(?), we are looking at X + 364d total lines.
Option 4 encodes the shape in roughly the same X, but the device list grows not as O(dp) but as O(d), and with a smaller constant - something like X + 7d.
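For a rough sense of scale (my numbers, just to make the comparison concrete): with d = 8 GPUs on a node and p = 28, option 6 lands around X + 13·28·8 = X + 2912 lines for the node's slices, while option 4 stays around X + 7·8 = X + 56.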
But we can see once you finish the prototype.
dra-evolution/pkg/api/poc.yaml (Outdated)
- name: mig-capable
  bool: true
sharedCapacitiesConsumed:
- sharedCapacityTemplateName: gpu-shared-resources
I don't understand what the sharedCapacityTemplate is for. What information does the template provide that you are not repeating below? How do we use the named template, and what is its relationship to the capacities below?
Yes, it is repeated here, but only because a full GPU happens to consume all of the shared capacity that this example contains. It doesn't need to be true in general though.
In fact, I haven't brought it up much, but on A100, we will not actually be advertising full GPUs at the same time as MIG devices. If an A100 has MIG enabled we will only advertise its MIG devices; if it has MIG disabled we will only advertise it as a full GPU.
This is because putting a GPU into and out of MIG mode on Ampere is very difficult (it requires all GPU workloads on all GPUs to be drained, and a GPU reset to be performed).
However, on Hopper+ we will advertise both the full GPU and its MIG devices (because a GPU reset is no longer required to flip in and out of MIG mode on these newer-generation GPUs).
I have updated the example to match reality better.
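To make the "full GPU consumes everything" point concrete, here is a hedged fragment in the same illustrative style as the sketch above (names are mine, not from the PR): the full-GPU template lists the entire shared pool under sharedCapacitiesConsumed, which is why its capacities look repeated next to the sharedCapacityTemplate it references, whereas a partition template lists only its slice.

```yaml
# Illustrative only: a full-GPU template that happens to consume the whole
# shared pool defined by gpu-shared-resources.
deviceTemplates:
- name: a100-40gb-full
  sharedCapacitiesConsumed:
  - sharedCapacityTemplateName: gpu-shared-resources
    capacities:
    - name: memory
      quantity: 40Gi              # == everything the shared capacity template defines
```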
The branch was force-pushed from 0408229 to 51cef6d.
Signed-off-by: Kevin Klues <[email protected]>
In putting this together, it's become obvious that a lot of what is being "templated" would be repeated in each and every slice. Would it be possible to create a separate API server object to hold the "template" objects for a given driver that can then be referenced by its resource slices? Possibly even leveraging a ConfigMap to do it instead of defining a new type.
Certainly it's possible; the question is whether it is worth the complexity, since you then have another independent object that can change or be missing, etc. This would actually help for any of options 2 and 4+. One thing we may want to think about is which factors drive scale and which are likely to grow fastest over time:
We can characterize each suggestion then based on which of these scaling factors are relevant:
The suggestion above would change these (on a per node basis):
Setting that aside, going back to the options without that suggestion, it would be possible to merge options 4 and 6 (option "10"...no, better stick with 7), such that: 1) We capture each partition shape once like in option 6; 2) Implicitly generate partitions like in option 4. If we did that, we would have:
which seems like the best we can do while keeping the repeated items in the slice.
Thinking more, I really do think that the things that will likely increase the most in the next 3-5 years are:
This means that factoring out things that are duplicated per slice is a good idea, since the number of slices will increase with those factors. In other words, let's try to keep growth from being multiplicative in those factors. This makes me think our best bet is going to be:
I hadn't put the numbers together, but your conclusion at the end is where my head was when suggesting this. There will still need to be some per-slice "template" data (e.g. the …).
I picture one "front matter" object per GPU type which defines everything that is non-node-specific. And then each device in a resource slice has fields that point to a specific "front matter" object and then pull bits and pieces from it as appropriate.
Simple devices can still be just a named list of attributes, but if you want anything more sophisticated you have to start using this more complex structure.
Yes, that's what I am thinking too. Basically push the invariant stuff across nodes into a separate object, and then refer to it. Those "front matter" pieces are probably constant for a given combination of hardware, firmware and driver versions.
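A rough sketch of what that could look like (the kind and the reference field below are hypothetical, invented here for illustration; nothing in this PR or in the resource.k8s.io API defines them):

```yaml
# Hypothetical "front matter" object, one per GPU type: everything that is
# invariant across nodes for a given hardware/firmware/driver combination.
kind: DeviceFrontMatter             # made-up kind name
metadata:
  name: nvidia-a100-sxm4-40gb
spec:
  attributeGroups: []               # elided; same kind of content as the sketches above
  sharedCapacityTemplates: []       # elided
  deviceTemplates: []               # elided
---
# A device in a ResourceSlice then points at the front matter object and only
# overlays the node/instance-specific bits.
devices:
- name: gpu-0
  frontMatterName: nvidia-a100-sxm4-40gb   # made-up reference field
  deviceTemplateName: a100-40gb-full
  sharedCapacityInstances:
    gpu-shared-resources: gpu-0
```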
FYI I added this as "Option 6" as well as "Option 7" here: #20 (comment)
In relation to what came up in the call tonight ... Instead of having a single centralized object with all of the "front matter", we could have one "front matter" object per node that all of the slices for that node refer to. It would likely have redundant information compared to most other nodes, but then we at least keep the front matter separate from the resource slices that consume it (and if a driver does want to go through the headache of centralizing it, they still can).
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close
@k8s-triage-robot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
I haven't yet written this up properly (or added any code for it), but I wanted to push something out there with my thoughts around how to support partitioning in a more compact way.
Below is the (incomplete) YAML for what one A100 GPU with MIG disabled, one A100 GPU with MIG enabled, and one H100 GPU (regardless of MIG mode) would look like. I am currently only showing the full GPUs and the 1g.*gb devices (because I wrote this by hand), but you can imagine how it would be expanded with the rest.
Most of it is self-explanatory, except for one thing -- what the new sharedCapacityInstances field on a device implies. It is a way to define a "boundary" for any shared capacity referenced in a device template, meaning that all devices that provide the same mappings for a given sharedCapacityInstance will pull from the same SharedCapacity.
I will add more details soon (as well as a full prototype), but I wanted to get this out for initial comments before then.
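To illustrate the "boundary" idea (again a hedged sketch in the style of the earlier ones, not the actual poc.yaml): devices that map gpu-shared-resources to the same instance draw down one common pool, while a device on another card maps it to a different instance and is accounted separately.

```yaml
# Illustrative only. Both 1g devices on card 0 map the shared capacity to the
# same instance ("gpu-0"), so they deduct from the same SharedCapacity; the
# device on card 1 maps it to "gpu-1" and has its own pool.
devices:
- name: gpu-0-mig-1g-5gb-0
  deviceTemplateName: a100-1g-5gb
  sharedCapacityInstances:
    gpu-shared-resources: gpu-0
- name: gpu-0-mig-1g-5gb-1
  deviceTemplateName: a100-1g-5gb
  sharedCapacityInstances:
    gpu-shared-resources: gpu-0
- name: gpu-1-mig-1g-5gb-0
  deviceTemplateName: a100-1g-5gb
  sharedCapacityInstances:
    gpu-shared-resources: gpu-1
```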