-
Notifications
You must be signed in to change notification settings - Fork 248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add inter process communication API for external features #858
Comments
FIFOs in non-blocking mode could also be used for this instead (or in addition to) sockets. This would allow clients to be shell scripts in addition to real programs. |
Performance impactCompared to hooks and feature files, dummy implementation of this could increase resource usage somewhat, but with additional effort, it will be lower. Wakeups / latency One difference of socket based API compared to hooks is that client decides how often it needs to check system for updates. My recommendation would be for NFD to send clients a default update interval (given on NFD command line) when it connects them. Clients can choose to ignore it, if their command line specifies an override, but by default they should heed it. This way it's easier for k8s admins to centrally lower idle node wakeup frequency, and to do per-client overrides when lower latency is needed. Memory and CPU usage If label value update functionality is in a separate/dedicated NFD client process, that will use more memory than hook programs that are invoked just at some interval by NFD, and not all at the same time / in parallel. Hook invocations can use more CPU, than idling NFD clients though. E.g. in device plugins case, I would recommend folding the NFD hook functionality to the plugin itself, as device plugins are always running, and need to occasionally scan sysfs by anyway. This way memory usage would be lower than with separate hook programs. @mythi Any comments from the device plugins side on this proposal? |
not from device plugins perspective but a generic comment: worker API for this sounds unnecessary since the external features could also be pushed to master directly, right? |
This ticket could be pivoted from local NFD worker API to master remote service API instead, assuming one is suitable for updating features directly. Looking at NFD master code:
The impact of using current master remote API directly, seems to be following... Clients:
Beside that security impact, this would also complicate duplicate detection for life-cycle management proposed in #857 (duplicate feature detection is intended to catch out obsolete clients, and their feature files). If some labels would go directly to master, that would need to add origin ID to all labels (= remote API change), and do per-node label origin ID checks to detect conflicts, instead of worker. IMHO doing that in NFD worker would be easier and more robust (when worker creates the origin ID, faulty client can't fake it). Moving work from workers to master, would also load that more, which might be a problem in larger clouds, if NFD does not support master replication yet. |
Nope, it can't atm. Sending a labeling request removes all other labels from the node, i.e. in practice each node can only have one client. One inconvenience from the deployment pov would also be setting up TLS authentication. Moreover, there is a plan to ditch the gRPC communication completely and replace it with a CRD API (#828). Maybe that could provide a meaningful way for extensions to create features/labels. Finalizers could be used to delete features when an extension is undeployed. |
If one wants to get rid of a host volume too, k8s has a (v1.23) Beta feature (enabled by default) which can help... When Pod I.e. one would be using network socket instead of unix one, but tell k8s to limit it within node. This would imply following changes to life-cycle management I originally suggested:
=> I'll change the original proposal so that it works both for Unix and network sockets. |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/remove-lifecycle stale |
TGH, this sounds overly complicated and non-cloud-native-style to me I suggest to try to use the new NodeFeature CRD API |
That kind of communication is only few lines of code, but I don't really care how it's implemented. The main point is features life-cycle management (see #857).
This doesn't even mention feature life-cycle. |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close not-planned |
@k8s-triage-robot: Closing this issue, marking it as "Not Planned". In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
What would you like to be added:
Local socket API to specify and update NFD external feature labels. This is extension of #857.
Why is this needed:
Hooks are going away for reasons explained in #855 / #856, whereas life-cycle management (#857) and updating [1] feature files is somewhat annoying.
[1] because one needs use atomic file write to avoid races between feature file write and NFD reading it:
Proposed solution:
spec.internalTrafficPolicy: Local
settingOptimizations:
(Feature updates are expected to be very rare, so spending a bit more mem + CPU on client & worker side to check for updates, is worthwhile.)
Life-time management (#857) for socket-based external features is implement with:
The text was updated successfully, but these errors were encountered: