Device plugin failures: KEP docs #47029
✅ Pull request preview available for checking. Built without sensitive environment variables.
Hello @SergeyKanzhelev 👋 please take a look at Documenting for a release - PR Ready for Review to get your PR ready for review before Tuesday July 16th 2024 18:00 PST. Thank you!
Hi @SergeyKanzhelev, a gentle reminder that tomorrow is the deadline for having your Docs PR ready for review. Please take a look at Documenting for a release - PR Ready for Review to get your PR ready for review before tomorrow, Tuesday, July 16th, 2024 18:00 PST.
Hello @SergeyKanzhelev 👋! I'm reaching out from the Docs team. Just checking in as we approach Docs Freeze on Tuesday, July 30th 18:00 PDT. This documentation appears to still be under review. To meet the Docs Freeze, this PR must have a technical review as well as lgtm and approve labels applied, without any unaddressed comments or concerns from SIG Docs. The status of this enhancement is marked as at risk for docs freeze. Thank you!
/sig node
Hi, @SergeyKanzhelev. This PR is very, very far beyond its deadlines. Release Docs has tried to reach you repeatedly. The Docs Ready for Review deadline was last week, and the Docs Freeze is in 5 days. If you do not have documentation in this PR today and begin reviews, we will be forced to consider pulling this enhancement from the release. Please acknowledge. @kubernetes/sig-node-leads
6e0ba11 to 6b542f0
👷 Deploy Preview for kubernetes-io-vnext-staging processing.
content/en/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins.md
/lgtm
from sig-node. Content looks correct and reflects the KEP and implementation; minor nonblocking comment inside.
content/en/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins.md
content/en/docs/reference/command-line-tools-reference/feature-gates/resource-health-status.md
Hello @SergeyKanzhelev, v1.31 Doc Lead here. This PR has been marked as at risk for Doc Freeze. The Doc Reviewers have given feedback on it; however, an update based on that feedback is yet to be made. Please note that the Doc Freeze deadline is tomorrow, and if the PR is not merged by tomorrow, you will need to file an exception or this enhancement will be removed from this release. Thank you!
/lgtm
LGTM label has been added. Git tree hash: 18451a0b4ae73242e5b211e8f16bc700cac8bc84
LGTM content-wise from sig-node perspective (I see @mrunalp already added the tag)
Thanks
Docs feedback; this is close to being OK for alpha, but I do recommend some edits.
content/en/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins.md
This status can be used to understand whether pod failure or misbehavior was associated
with the device failure.
I'm guessing here:
- This status can be used to understand whether pod failure or misbehavior was associated
- with the device failure.
+ For a failed Pod, or where you suspect a fault, you can use this status to understand whether
+ the Pod behavior may be associated with device failure. For example, if an accelerator is reporting
+ an over-temperature event, the `allocatedResourcesStatus` field may be able to report this.
Does that example look correct?
(would like confirmation I guessed right)
The device plugin API has no provision for reporting the reason a device is unhealthy. The device plugin will thus report just healthy/unhealthy to the kubelet. The current user-facing API (in the pod object) also reports just healthy/unhealthy. So the example is realistic, but in 1.31 the user won't actually find "unhealthy because overheating" in the pod status, just "unhealthy". @SergeyKanzhelev to keep me honest here.
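To make the point above concrete, here is a minimal sketch of how a user might pull unhealthy devices out of a Pod's status. It assumes the Pod is available as parsed JSON (for example from `kubectl get pod -o json`) and that each container status carries an `allocatedResourcesStatus` list whose entries have `name` and `resources`, with each resource holding `resourceID` and `health`. Those field names and the `Healthy`/`Unhealthy` values are assumptions based on the KEP, not confirmed in this thread; the helper `unhealthy_devices` and the sample Pod are hypothetical.

```python
from typing import Any


def unhealthy_devices(pod: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Return (container, resource name, device ID) for each device marked Unhealthy.

    Field names below are assumed from the KEP; adjust them if the
    published API differs.
    """
    found = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        for res in cs.get("allocatedResourcesStatus", []):
            for dev in res.get("resources", []):
                # Only a health value is available; no failure reason is reported.
                if dev.get("health") == "Unhealthy":
                    found.append(
                        (cs.get("name", ""), res.get("name", ""), dev.get("resourceID", ""))
                    )
    return found


# Hypothetical Pod status, trimmed to the fields this sketch reads.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "trainer",
                "allocatedResourcesStatus": [
                    {
                        "name": "example.com/gpu",
                        "resources": [
                            {"resourceID": "gpu-0", "health": "Healthy"},
                            {"resourceID": "gpu-1", "health": "Unhealthy"},
                        ],
                    }
                ],
            }
        ]
    }
}

if __name__ == "__main__":
    for container, resource, device in unhealthy_devices(sample_pod):
        print(f"{container}: {resource} device {device} is Unhealthy")
```

Note that the result only says which device is unhealthy, not why, which matches the limitation described in the comment above.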
OK cool. Polishing this up post-merge is fine.
How do we feel about a separate controller annotating the Pod with more detail vs. reporting an Event vs. neither of those?
@sftim IMO this is great input for the next stages of this work!
content/en/docs/reference/command-line-tools-reference/feature-gates/resource-health-status.md
Hello @SergeyKanzhelev, v1.31 Doc Lead here. This PR is still at risk for Doc Freeze because it is yet to be merged. Kindly note that if the PR is not merged by 18:00 PDT today, you will need to file an exception for the associated enhancement to be a part of this milestone. Thank you!
Quick pass for grammar
content/en/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins.md
/lgtm
contentwise from sig-node perspective
LGTM label has been added. Git tree hash: 84c0677cdd1dc57d57c92fe8714a9f85824d779c
/label tide/merge-method-squash
If you squash this to 1 commit keeping the same merge base and final tree, Prow keeps the LGTM too. And it's tidier in the commit history.
/hold
/approve
I recommend a local squash, then a forced push (`git push --force-with-lease`), then unhold to merge the now-squashed commits.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: mrunalp, sftim
The full list of commands accepted by this bot can be found here. The pull request process is described here.
OK to unhold if this has been squashed, or if we decide to skip that because of deadlines.
d8d2d87 to bb5d11b
/unhold
KEP: kubernetes/enhancements#4680
/sig node