
updates to device failures for alpha2 #4862

Merged: 1 commit merged into kubernetes:master on Oct 10, 2024

Conversation

SergeyKanzhelev (Member):

/sig node
/cc @johnbelamaric

KEP: #4680

@k8s-ci-robot added labels: sig/node, cncf-cla: yes, kind/kep, size/L (Sep 19, 2024)
@SergeyKanzhelev force-pushed the device-failures branch 3 times, most recently from 9f1a3f1 to 3f71be9 (October 8, 2024 23:46)
@johnbelamaric (Member) left a comment:

I think that's better; it's simpler, too.

@SergeyKanzhelev force-pushed the device-failures branch 2 times, most recently from 9ca9314 to aebf199 (October 10, 2024 15:14)
@johnbelamaric (Member):

@pohly @klueska if you have a chance, would appreciate your review, especially on the kubelet/driver interface.

/lgtm

@k8s-ci-robot added the lgtm label (Oct 10, 2024)
@johnbelamaric (Member):

/cc @mrunalp

since he has been handling DRA-related changes, and this PR adds DRA support to this feature.

@k8s-ci-robot k8s-ci-robot requested a review from mrunalp October 10, 2024 16:05
Review thread on the KEP text: "In alpha2, in order to support pod-level DRA resources, the following field will be added to the PodStatus:"

Contributor:

How are "pod level DRA resources" defined?

Is the goal to list health of all allocated devices here and then also for each container that references them?

Not a blocker for the KEP, but this needs to be clarified before implementing this API.

SergeyKanzhelev (Member, author):

I would think container-associated devices will be in container statuses. And this is for network DRA for now.

Member:

Yes. I would expect devices associated with requests that are attached to containers get sorted into those. Resource claims not tied to any container, but tied to this pod, would show up here. In the normal case, that's just network devices. The one question I have: if someone does tie a request to a container, but the actual content of the request is pod level, which slice should it show up in? I think probably pod level?

Contributor:

That's why I was asking for a definition. The code needs to decide where to put things and users must know where to expect what.

The only definition that I can think of is "each container reports everything it has access to", and then this field is "whatever is not accessed by any container" - so basically everything that is left.

We don't have an explicit "pod level" flag.

Member:

But a container doesn't have "access" to a network device. The network namespace is pod level. So, that driver would know to put it at the pod level.

Also, who is doing the sorting? Based on the gRPC call, it's actually the kubelet that sorts the devices into these buckets, since the response only includes the device ID and health. The kubelet wouldn't know how to sort these except by how the requests are defined.

Looking a little deeper: if the kubelet is going to sort individual device IDs coming back from the gRPC call into the right container buckets, it needs the mapping from device ID -> container. That would come from here. So the kubelet will need to load the resource claim status, create a map from allocated device ID to claim and request, and then look up in the pod spec and container specs which container and/or pod that goes to.

So, if a user lists a pod-level request in a container, then it will show up in that container. Likewise, if two containers share a device via a claim, it will show up in BOTH containers' statuses!

I guess that's OK? Or should we just have one list of all the claims and their device statuses, and make the user map those back to containers? That seems unfriendly.

Member:

Yeah, I think it's ok.

Contributor:

The gRPC API isn't important enough to block the KEP, but this here looks like something that hasn't been thought through yet.

However, it's also "only the API", so perhaps we can waive it through and do the real API review during the implementation review? I'll defer to @johnbelamaric here.

Member:

I'd like to see the discussion above captured in the KEP - basically grab the comment above and add it to design details (a sketch of the resulting mapping follows the list below). Key points being:

  • driver only needs to report deviceID/health pairs
  • kubelet needs to use the resource claims status and pod/container specs to map deviceID -> claim/request names -> correct status field here
  • for DRA, the resource name for container resources is something like the string "claim" along with the Name and Request from the container's ResourceRequirements ResourceClaim
  • for DRA, any device not mapped to a container will show up in the pod resource list, and will use "claim" along with the name from PodResourceClaim Name. I don't think we can get a request name at that point; without associating the claim to a specific container, we just actuate all the requests in the claim?

Contributor:

> I think without associating the claim to a specific container, we just actuate all the requests in the claim?

We always prepare all claims referenced by a pod, whether they are referenced by a container or not. Based on the information returned by the driver, the kubelet then knows how to associate CDI device IDs with containers.

There is no such thing as "actuate a request".

Member:

> There is no such thing as "actuate a request".

Got it, thanks. So the above should work.

Review thread on the KEP text:

```
...

// AllocatedResourcesStatus represents the status of various resources
// allocated for this Pod (Pod level resources).
```

Contributor:

See question about "pod level DRA resources" above.

SergeyKanzhelev (Member, author):

Added a small clarification here.

Review thread on the KEP's gRPC snippet:

```
// returns the new list.
// This method is optional and may not be implemented.
rpc WatchDevicesStatus(Empty) returns (stream DevicesStatusResponse) {}
}
```
Contributor:

Can you put this into a new gRPC interface? Then it can be versioned as part of this KEP and drivers are not forced to implement this method - they would be if it became part of the NodeServer interface.

At a practical level it'll help avoid code conflicts. I need to promote that dra/v1alpha4 API to dra/v1beta1 in 1.32.

SergeyKanzhelev (Member, author):

Are there any implications of having two interfaces running? Would it mean we need an extra port for the new interface?

Contributor:

You can serve different gRPC interfaces over the same connection. They are like different endpoints in an HTTP server.

That we version them is a bit odd. There's no difference between v1alpha4 and v1beta1, but because we put the version into the name, they are completely different as far as gRPC is concerned. But that's just an aside.

Contributor:

I'm okay with treating this as an implementation detail that'll get addressed later - no need to update the KEP for it now.

SergeyKanzhelev (Member, author):

Added a note about it.

@k8s-ci-robot removed the lgtm label (Oct 10, 2024)
@johnbelamaric (Member):

/lgtm

Thanks @SergeyKanzhelev, will be great to have this.

@mrunalp or @dchen1107 can you PTAL and approve?

@k8s-ci-robot added the lgtm label (Oct 10, 2024)
@k8s-ci-robot:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mrunalp, SergeyKanzhelev

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label (Oct 10, 2024)
@k8s-ci-robot k8s-ci-robot merged commit 8b30549 into kubernetes:master Oct 10, 2024
4 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.32 milestone Oct 10, 2024
@SergeyKanzhelev SergeyKanzhelev deleted the device-failures branch October 10, 2024 19:59