Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRA: Resource Claim Status with possible standardized network interface data #4817

Open
1 of 4 tasks
klueska opened this issue Aug 30, 2024 · 16 comments
Open
1 of 4 tasks
Assignees
Labels
lead-opted-in Denotes that an issue has been opted in to a release sig/node Categorizes an issue or PR as relevant to SIG Node.
Milestone

Comments

@klueska
Copy link
Contributor

klueska commented Aug 30, 2024

Enhancement Description

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Aug 30, 2024
@kannon92
Copy link
Contributor

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 30, 2024
@squeed
Copy link

squeed commented Sep 9, 2024

FYI, here is the CNI result type.

A typical result looks something like:

{
    "ips": [
        {
          "address": "10.1.0.5/16",
          "gateway": "10.1.0.1",
          "interface": 2
        },
        {
          "address": "2001:db8::42/64",
          "gateway": "2001:db8::1",
          "interface": 2
        }
    ],
    "routes": [
      {
        "dst": "0.0.0.0/0"
      },
      {
        "dst": "::/0"
      }
    ],
    "interfaces": [
        {
            "name": "cni0",
            "mac": "00:11:22:33:44:55"
        },
        {
            "name": "veth3243",
            "mac": "55:44:33:22:11:11"
        },
        {
            "name": "eth0",
            "mac": "99:88:77:66:55:44",
            "sandbox": "/var/run/netns/blue"
        }
    ]
}

The CNI result type is mostly not exposed to k8s end-users. That said, IP addresses of the PodSandbox would be the bare minimum.

Note that IP addresses are, of course, mutable; if a node reboots, the PodSandbox will be recreated and new IPs allocated.

@aojea
Copy link
Member

aojea commented Sep 10, 2024

I was more leaning towards standardize by convention and not via Kubernetes API fields, think that it will be very complex on reach an agreement and support all the use cases, and also very difficult to iterate if everytime we need to go through API review. Adding a new field to routes or interfaces per example, will require to go through all the process, and there will be always niche use cases for some protocols

I was thinking using the CNI result types as foundation, but there are fields we don't want on the API for users, sandbox in the example.

I'm proposing to add a new field unstructured to Claim Status (right now the Configuration is also unstructured) where DRA drivers can write the results of each allocation, and use the CNI types to create a convention object we can share across networking DRA drivers

// DeviceConfiguration must have exactly one field set. It gets embedded
// inline in some other structs which have other fields, so field names must
// not conflict with those.
type DeviceConfiguration struct {
	// Opaque provides driver-specific configuration parameters.
	//
	// +optional
	// +oneOf=ConfigurationType
	Opaque *OpaqueDeviceConfiguration `json:"opaque,omitempty" protobuf:"bytes,1,opt,name=opaque"`
}

cc: @danwinship

@LionelJouin
Copy link
Contributor

I made a PoC about the ResourceClaim status for networking: https://gist.github.com/LionelJouin/5cfc11eecf73663b5657ed3be1eb6c00

The AllocatedDeviceStatus has been extended to contain two new fields:

  • DeviceInfo: A field accepting any kind of data like the opaque parameters (.spec.devices.config.opaque.parameters).
  • NetworkDeviceInfo: A field only for the network devices.

For reference for the NetworkDeviceInfo field, Multus works with annotations and reports the status in a special annotation. Here is the API it uses for it: https://github.com/k8snetworkplumbingwg/network-attachment-definition-client/blob/v1.7.1/pkg/apis/k8s.cni.cncf.io/v1/types.go#L103

This PoC uses CNI, the result is parsed and the IPs, Interface Name and Mac address is added to the NetworkDeviceInfo field and the full CNI result is added to the DeviceInfo field.

Here are the changes to the API:

// AllocatedDeviceStatus contains the status of an allocated device, if the
// driver chooses to report it. This may include driver-specific information.
type AllocatedDeviceStatus struct {
    ...
	// DeviceInfo contains Arbitrary driver-specific data.
	//
	// +optional
	DeviceInfo runtime.RawExtension `json:"deviceInfo,omitempty" protobuf:"bytes,6,rep,name=deviceInfo"`

	// NetworkDeviceInfo contains network-related information specific to the device.
	//
	// +optional
	NetworkDeviceInfo NetworkDeviceInfo `json:"networkDeviceInfo,omitempty" protobuf:"bytes,7,rep,name=networkDeviceInfo"`
}

// NetworkDeviceInfo provides network-related details for the allocated device.
// This information may be filled by drivers or other components to configure
// or identify the device within a network context.
type NetworkDeviceInfo struct {
	// Interface specifies the name of the network interface associated with
	// the allocated device. This might be the name of a physical or virtual
	// network interface.
	//
	// +optional
	Interface string `json:"interface,omitempty" protobuf:"bytes,1,rep,name=interface"`

	// IPs lists the IP addresses assigned to the device's network interface.
	// This can include both IPv4 and IPv6 addresses.
	//
	// +optional
	IPs []string `json:"ips,omitempty" protobuf:"bytes,2,rep,name=ips"`

	// Mac represents the MAC address of the device's network interface.
	//
	// +optional
	Mac string `json:"mac,omitempty" protobuf:"bytes,3,rep,name=mac"`
}

Here is an example of the final ResourceClaim after running the demo of this PoC:

apiVersion: resource.k8s.io/v1alpha3
kind: ResourceClaim
metadata:
  name: macvlan-eth0-attachment
spec:
  devices:
    config:
    - opaque:
        driver: poc.dra.networking
        parameters:
          config: '{ "cniVersion": "1.0.0", "name": "macvlan-eth0", "plugins": [ {
            "type": "macvlan", "master": "eth0", "mode": "bridge", "ipam": { "type":
            "host-local", "ranges": [ [ { "subnet": "10.10.1.0/24" } ] ] } } ] }'
          interface: net1
      requests:
      - macvlan-eth0
    requests:
    - allocationMode: ExactCount
      count: 1
      deviceClassName: cni-v1
      name: macvlan-eth0
status:
  allocation:
    devices:
      config:
      - opaque:
          driver: poc.dra.networking
          parameters:
            config: '{ "cniVersion": "1.0.0", "name": "macvlan-eth0", "plugins": [
              { "type": "macvlan", "master": "eth0", "mode": "bridge", "ipam": { "type":
              "host-local", "ranges": [ [ { "subnet": "10.10.1.0/24" } ] ] } } ] }'
            interface: net1
        requests:
        - macvlan-eth0
        source: FromClaim
      results:
      - device: cni
        driver: poc.dra.networking
        pool: kind-worker
        request: macvlan-eth0
    nodeSelector:
      nodeSelectorTerms:
      - matchFields:
        - key: metadata.name
          operator: In
          values:
          - kind-worker
  deviceStatuses:
  - conditions: null
    device: cni
    deviceInfo:
      cniVersion: 1.0.0
      interfaces:
      - mac: 1e:32:6c:b7:c9:66
        name: net1
        sandbox: /var/run/netns/cni-5b7c0846-7995-9450-f441-a177399d08d5
      ips:
      - address: 10.10.1.2/24
        gateway: 10.10.1.1
        interface: 0
    driver: poc.dra.networking
    networkDeviceInfo:
      interface: net1
      ips:
      - 10.10.1.2/24
      mac: 1e:32:6c:b7:c9:66
    pool: kind-worker
    request: macvlan-eth0
  reservedFor:
  - name: demo-a
    resource: pods
    uid: 2bd46adf-b478-4e25-9e37-828539799169

@pohly
Copy link
Contributor

pohly commented Sep 11, 2024

The AllocatedDeviceStatus has been extended to contain two new fields:

Not just that, the entire claim.status.deviceStatuses field is new. Who writes that field? I see in https://github.com/LionelJouin/kubernetes/commits/dra-device-status/ that you grant kubelet permission to write the claim status, but not the actual code.

The direction from SIG Auth is to move away from having kubelet publish information on behalf of node agents. It's better to authorize those agents and let them do the writes themselves. That enables version skew (kubelet from 1.31, control plane from 1.32 with a different API) and keeps kubelet out of the authorization path.

I don't have a strong opinion about what information should be published. API reviewers will want a strong justification for using a RawExtension. But a strongly typed NetworkDeviceInfo struct isn't an easy sell either unless there's consensus about which information should be standardized like that.

      - opaque:
          driver: poc.dra.networking
          parameters:
            config: '{ "cniVersion": "1.0.0", "name": "macvlan-eth0", "plugins": [
              { "type": "macvlan", "master": "eth0", "mode": "bridge", "ipam": { "type":
              "host-local", "ranges": [ [ { "subnet": "10.10.1.0/24" } ] ] } } ] }'
            interface: net1

Why is config a string? Wouldn't this be better:

      - opaque:
          driver: poc.dra.networking
          parameters:
            config:
              cniVersion: "1.0.0"
              name: "macvlan-eth0
              ...
            interface: net1

There's no kind and version in the parameters content. You will have to think about versioning those parameters when going forward with standardizing them.

@LionelJouin
Copy link
Contributor

Oops, that's correct, my bad, sorry, here it the API change:

// ResourceClaimStatus tracks whether the resource has been allocated and what
// the result of that was.
type ResourceClaimStatus struct {
	...

	// DeviceStatuses contains the status of each device allocated for this
	// claim, as reported by the driver. This can include driver-specific
	// information. Entries are owned by their respective drivers.
	//
	// +optional
	// +listType=map
	// +listMapKey=devicePoolName
	// +listMapKey=deviceName
	DeviceStatuses []AllocatedDeviceStatus `json:"deviceStatuses,omitempty" protobuf:"bytes,4,opt,name=deviceStatuses"`
}


// AllocatedDeviceStatus contains the status of an allocated device, if the
// driver chooses to report it. This may include driver-specific information.
type AllocatedDeviceStatus struct {
	// Request is the name of the request in the claim which caused this
	// device to be allocated. Multiple devices may have been allocated
	// per request.
	//
	// +required
	Request string `json:"request" protobuf:"bytes,1,rep,name=request"`

	// Driver specifies the name of the DRA driver whose kubelet
	// plugin should be invoked to process the allocation once the claim is
	// needed on a node.
	//
	// Must be a DNS subdomain and should end with a DNS domain owned by the
	// vendor of the driver.
	//
	// +required
	Driver string `json:"driver" protobuf:"bytes,2,rep,name=driver"`

	// This name together with the driver name and the device name field
	// identify which device was allocated (`<driver name>/<pool name>/<device name>`).
	//
	// Must not be longer than 253 characters and may contain one or more
	// DNS sub-domains separated by slashes.
	//
	// +required
	Pool string `json:"pool" protobuf:"bytes,3,rep,name=pool"`

	// Device references one device instance via its name in the driver's
	// resource pool. It must be a DNS label.
	//
	// +required
	Device string `json:"device" protobuf:"bytes,4,rep,name=device"`

	// Conditions contains the latest observation of the device's state.
	// If the device has been configured according to the class and claim
	// config references, the `Ready` condition should be True.
	//
	// +optional
	// +listType=atomic
	Conditions []metav1.Condition `json:"conditions" protobuf:"bytes,5,rep,name=conditions"`

	// DeviceInfo contains Arbitrary driver-specific data.
	//
	// +optional
	DeviceInfo runtime.RawExtension `json:"deviceInfo,omitempty" protobuf:"bytes,6,rep,name=deviceInfo"`

	// NetworkDeviceInfo contains network-related information specific to the device.
	//
	// +optional
	NetworkDeviceInfo NetworkDeviceInfo `json:"networkDeviceInfo,omitempty" protobuf:"bytes,7,rep,name=networkDeviceInfo"`
}

// NetworkDeviceInfo provides network-related details for the allocated device.
// This information may be filled by drivers or other components to configure
// or identify the device within a network context.
type NetworkDeviceInfo struct {
	// Interface specifies the name of the network interface associated with
	// the allocated device. This might be the name of a physical or virtual
	// network interface.
	//
	// +optional
	Interface string `json:"interface,omitempty" protobuf:"bytes,1,rep,name=interface"`

	// IPs lists the IP addresses assigned to the device's network interface.
	// This can include both IPv4 and IPv6 addresses.
	//
	// +optional
	IPs []string `json:"ips,omitempty" protobuf:"bytes,2,rep,name=ips"`

	// Mac represents the MAC address of the device's network interface.
	//
	// +optional
	Mac string `json:"mac,omitempty" protobuf:"bytes,3,rep,name=mac"`
}

The ResourceClaim Status is set via the Kubernetes API by the component calling the CNI add. The flow is the following:

  1. NodePrepareResources is called (Server is Containerd)
    1. The ResourceClaim is retrieved via the Kubernetes API
  2. RunPodSandbox is called (Server is Containerd)
    1. The CNIs are called
    2. The ResourceClaim status is updated via the Kubernetes API

In this PoC, The container runtime acts as DRA-Driver (I mixed several PoCs together), so it calls the CNIs and sets the status via the Kubernetes API. I used the kubelet kubeconfig to access the Kubernetes API (as for simplicity since Containerd was not using the Kubernetes API before).

Yes, that's right, the config could be in better format, for this PoC I mainly focused on the ResourceClaim Status and DRA-Driver in Containerd.

@pohly
Copy link
Contributor

pohly commented Sep 11, 2024

I used the kubelet kubeconfig to access the Kubernetes API (as for simplicity since Containerd was not using the Kubernetes API before).

So it wasn't kubelet itself, thus avoiding the version skew problem there at the cost moving it into containerd. How to secure that write will be tricky.

@johnbelamaric
Copy link
Member

@LionelJouin this is great to see. Antonio, Tim, and I put together some thoughts on the API options (open to [email protected]):

https://docs.google.com/document/d/1smjEm7WxFm8btQwywdr_F5BZkiifatNqDvrQUaesu1M/edit?usp=sharing

Personally, I like Option 5 - real, built-in API.

If code snippets help, I can add those. Once we decide on the rough API, I can start the KEP. We'll need to figure out the authorization bit: this needs to be writeable by the drivers, but only for claims on the same node as the driver instance. @pohly I know you have looked a lot at the node authorizer, do you think we can use that here?

@johnbelamaric johnbelamaric self-assigned this Sep 13, 2024
@johnbelamaric
Copy link
Member

/assign @aojea

@pohly
Copy link
Contributor

pohly commented Sep 16, 2024

@pohly I know you have looked a lot at the node authorizer, do you think we can use that here?

Better ask SIG Auth. The way I remember it, the answer is currently "no, node authorizer is only for kubelet, not agents running on a node". But I think there were plans to change that, which might be sufficient for the this issue (start as alpha without locking down access, make it a requirement for beta that it gets locked down through some SIG Auth KEP).

@haircommander
Copy link
Contributor

/milestone v1.32
/label lead-opted-in

@k8s-ci-robot k8s-ci-robot added this to the v1.32 milestone Sep 17, 2024
@k8s-ci-robot k8s-ci-robot added the lead-opted-in Denotes that an issue has been opted in to a release label Sep 17, 2024
@johnbelamaric
Copy link
Member

/assign @LionelJouin

Lionel is getting a PR ready.

@jenshu
Copy link

jenshu commented Sep 29, 2024

Hello @klueska @pohly @johnbelamaric @thockin @aojea 👋, Enhancements team here.

Just checking in as we approach enhancements freeze on 02:00 UTC Friday 11th October 2024 / 19:00 PDT Thursday 10th October 2024.

This enhancement is targeting for stage alpha for v1.32 (correct me, if otherwise)

Here's where this enhancement currently stands:

  • KEP readme using the latest template has been merged into the k/enhancements repo.
  • KEP status is marked as implementable for latest-milestone: v1.32.
  • KEP readme has up-to-date graduation criteria
  • KEP has a production readiness review that has been completed and merged into k/enhancements. (For more information on the PRR process, check here). If your production readiness review is not completed yet, please make sure to fill the production readiness questionnaire in your KEP by the PRR Freeze deadline on Thursday 3rd October 2024 so that the PRR team has enough time to review your KEP.

For this KEP, we would just need to update the following:

  • The KEP readme looks complete, but the PR hasn't been merged yet.
  • In kep.yaml, status must be set to implementable and latest-milestone must be set to v1.32
  • A PRR needs to be created.

The status of this enhancement is marked as at risk for enhancement freeze. Please keep the issue description up-to-date with appropriate stages as well.

If you anticipate missing enhancements freeze, you can file an exception request in advance. Thank you!

@jenshu
Copy link

jenshu commented Oct 11, 2024

1.32 Enhancements team here. I see that the KEP and PRR have been merged, so I've updated the status to tracked for enhancements freeze!

@chanieljdan
Copy link

Hi @klueska @pohly @johnbelamaric @thockin @aojea 👋, 1.32 Release Docs Lead here.

Does this enhancement work planned for 1.32 require any new docs or modification to existing docs?

If so, please follows the steps here to open a PR against dev-1.32 branch in the k/website repo. This PR can be just a placeholder at this time and must be created before Thursday October 24th 2024 18:00 PDT.

Also, take a look at Documenting for a release to get yourself familiarize with the docs requirement for the release.

Thank you!

@mbianchidev
Copy link
Member

Hey hey @LionelJouin @aojea @johnbelamaric 👋 from the v1.32 Communications Team!

We'd love for you to consider writing a feature blog about your enhancement.
Some reasons why you might want to write a blog for this feature include (but are not limited to) if this introduces breaking changes, is important to our users, or has been in progress for a long time and it is graduating.

To opt-in, let us know by opening a Feature Blog placeholder PR against the website repository by 30th Oct 2024. For more information about writing a blog see the blog contribution guidelines.

Note: In your placeholder PR, use XX characters for the blog date in the front matter and file name. We will work with you on updating the PR with the publication date once we finalize the blog schedule.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lead-opted-in Denotes that an issue has been opted in to a release sig/node Categorizes an issue or PR as relevant to SIG Node.
Projects
Status: Tracked for enhancements freeze
Status: Pre-Alpha
Status: Tracked
Development

No branches or pull requests