Ability to debug nodes with running debug container #8720
Comments
Idea from planning:
This allows the command to work even if the Talos machine can't pull any image from the registry at the moment, and any custom image can be pushed. Example: Talos APIs:
Maintenance mode:
|
To prevent any changes, probably mount host fs as read-only (?). Add a kernel arg to completely disable the feature (?). |
I think we should hold off on this for now. IMHO not having something like this is the point of Talos, really. I completely understand the urge to have a quick win but something like this almost immediately breaks our whole stance with Talos Linux. Could we better understand the use case? If we are going to be an API Linux, let’s be an API Linux and figure out what we are missing for the use case. The dashboard could be a place to put more local debug tooling. This feature will absolutely be abused, and it makes us look weak in our stance. @kvaps What is/are the scenario(s) in which you think this would be used? |
My story began with the missed opportunity to run standard debugging tools such as Also, I have some scenarios for debugging specific CRI containers, for example, entering various Linux kernel namespaces and running these tools there. See the approach suggested by my kubectl node-shell plugin: https://github.com/kvaps/kubectl-node-shell/?tab=readme-ov-file#x-mode |
I have a debug daemonset running for the same reason. But if the kubelet does not run you cannot do anything to fix it. In my case things like zfs might be wrongly configured and need maintenance by an admin. |
I think there's nothing wrong with the APIs to run containers on Talos, as all Talos & Kubernetes do is run containers.
|
It 100% breaks the design goals I had in mind when creating Talos. One goal was to do APIs and over time add the APIs we need to replace as much of the user space as we could. Adding a debug container is going to open the door to possible attacks, be abused and set a precedent for lazy practices, and make us less motivated to add APIs. Everything can just be dumped off into a debug container. A debug container makes you ask if an API is even needed in the first place. The Talos “API” could literally just be I want to be pragmatic here and I completely understand the use case but philosophically this isn’t Talos. We should rather be asking what APIs we can add, what operational knowledge we can build in, what information we can expose, and/or whether we can automatically resolve the issue. There will be edge cases that become painful without a debug container, I completely understand that and I don’t want to tell anyone to just deal with it, but if we don’t have those pains we will never grow the Talos API and the automation goals it has, and worst of all it tears apart the whole argument for having an API in the first place. |
One option today could be a system extension that runs a "debug container" with SSH enabled. Run it all the time if you really want these tools. The new ability to configure an extension could allow for adding allowed keys. This would address the corner cases, work today, and not break down the Talos Linux arguments we make day in and day out as it (a debug container) wouldn't be something we support first class. It is essentially the same but it isn't endorsed nor encouraged. |
Running containers is a basic feature of Talos, and I don't think adding this to the API breaks any promise, or blocks the development of the APIs going forward. The proposal here is to add APIs to run containers, and connect to their stdin/stdout/stderr ( A running container is still sandboxed with some set of permissions governing what the container can actually do. I can understand the emotional reaction, but it's more about the way things are being used vs. having or not having some feature. A "regular" Linux distro offers tools to do tons of things, but if |
Systemd offers a whole lot more than we do today, yet, here we are with very large companies using us and loving us and a community growing daily with zero marketing. The philosophy and stance of Talos Linux is just as important as the technical implementation. Talos Linux is a statement: we need to do infrastructure better and with APIs for everything. I agree I would be more than happy to talk more and would invite a deeper discussion around this but as of now I don't see this coming to Talos Linux. I don't think there is a right and wrong in this situation so this subject makes it very easy to take a strong stance on either side and feel like the other is wrong. To be clear, I don't think this is wrong from a purely technical PoV but there are bigger things at play here. In fact I was excited about this idea when I first thought about it but over the course of a day other things began to break down around it. |
@andrewrynhard, we already have the cat command; how about adding another one called socat? Kubernetes uses it to implement proxy and port-forward in their API. We could do the same, which would enable us to debug CRI. For example:
|
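To illustrate the general idea of relaying a Unix socket, which is what a socat-style proxy for the CRI socket would do, here is a minimal Python sketch. It is not the commenter's original example and not anything Talos provides: the function names `pump`, `handle`, and `serve` and both socket paths are hypothetical, and the relay only forwards between two sockets on the same machine. A real implementation would carry the byte stream over the Talos API instead.

```python
import os
import socket
import threading


def pump(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst until src reaches EOF, then half-close dst."""
    try:
        while True:
            data = src.recv(65536)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass  # peer went away; fall through to the half-close
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def handle(client: socket.socket, upstream_path: str) -> None:
    """Relay one client connection to the upstream Unix socket."""
    upstream = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        upstream.connect(upstream_path)
    except OSError:
        client.close()
        upstream.close()
        return
    # One thread per direction: client -> upstream here, upstream -> client below.
    forward = threading.Thread(target=pump, args=(client, upstream), daemon=True)
    forward.start()
    pump(upstream, client)
    forward.join()
    client.close()
    upstream.close()


def serve(listen_path: str, upstream_path: str) -> socket.socket:
    """Listen on listen_path (hypothetical) and relay each connection upstream."""
    if os.path.exists(listen_path):
        os.unlink(listen_path)
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(listen_path)
    server.listen()

    def accept_loop() -> None:
        while True:
            try:
                client, _ = server.accept()
            except OSError:
                return  # listening socket was closed
            threading.Thread(
                target=handle, args=(client, upstream_path), daemon=True
            ).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return server
```

The sketch also shows the concern raised in the replies: the relay passes bytes through without inspecting them, so whoever can reach the listening socket gets everything the upstream socket allows.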
I'm not Andrew, but I feel this might be powerful; however, it is too much unconstrained access on which we can't impose any limits. API-level access has limits which we can enforce; a raw socket is all or nothing, plus as @rothgar pointed out it requires the user to have |
I would agree. Debug containers are powerful too. And I see the need and desire for both of these ideas. I really do. As Andrey points out, a reason we can do the level of automation we do and offer the level of security we do is because Talos imposes limits. I don't want those limits to be so restrictive that we start to lose adoption but I also want to stick to our principles. This is a tough situation. What we have done in the past is wait for the right idea to come about, and we were always happy we didn't rush to fix problem X if we didn't think any existing solutions would stay within the Talos ethos. @kvaps Maybe we can start with what you needed specifically within the containerd API. |
I just want to point out one of the main goals of Talos: keep humans off a machine and from breaking things. That is literally why the API exists. At one point I had the kernel running the kubelet as PID1 and it was impractical. I was faced with a decision. Drop in a shell and lose my goal of removing humans from the machine or find another way. That is when the API idea came about. We should strive to push Talos towards this goal IMHO. |
I totally understand, but in this case we have to cover everything with the API. In particular, I need commands like |
Debug is a special case, or a special mode; it is an option to give users. Everything looks like a nail to someone with a hammer. If the hammer can handle everything, that is fine, but the truth is that it can't. For special cases, I think it is okay to use special/smart tools. |
For now we're not going to implement an easy way to do this via talosctl. There are still some other ideas we're thinking about that could provide similar debug access but nothing is planned right now. I'm going to close this issue because we'll need to think about other ways we can implement this without giving raw/open access that bypasses the API. |
Feature request
Description
It would be nice to implement an API interface and a talosctl command for debugging.
It might be done the same way as kubectl debug:
Or proxy the CRI socket the same way ssh-agent works:
which outputs:
then use crictl to run debug CRI:
Or run a container: