Support "antctl query endpoint" command out-of-cluster #1137
Comments
@tnqn I would appreciate your feedback on this |
@antoninbas the proposal LGTM overall, just have a question on the API endpoint reachability.
|
@tnqn, yes my bad, the last bullet point is incorrect:
We can use the Node IP to access the Antrea Controller, which is what "antctl supportbundle" is doing now and that's what I tested. However, how can I enable TLS for the connection? The certificate used by the K8s apiserver includes the Node IP in the SAN list, but that's not the case for the Antrea apiserver. We could potentially make it work for the self-signed case, since we regenerate a certificate every time the controller starts, but not for the user-provided case. I feel like option 3 is best here. I don't think using a NodePort Service would solve anything. What do you think of antctl exec'ing into the antrea-controller Pod directly, instead of using a different Pod? We could use that for "antctl supportbundle" as well (exec into antrea-controller / exec into antrea-agent), and avoid using insecure connections. |
This is resolvable: when authenticating the server certificate, a server name can be provided for the certificate to be checked against. The hostname (Node IP) is only checked if the server name is empty. Even when antrea-agent authenticates antrea-controller, it uses the server name (antrea.kube-system.svc) instead of the Service IP or Node IP.
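To illustrate the point about server names, here is a minimal sketch in Python (antctl itself is written in Go, so this is not the actual implementation). It dials a Node IP but asks the TLS library to verify the certificate against antrea.kube-system.svc. The function names, the `ca_file` argument (assumed to be the Antrea CA certificate exported to a PEM file) and the port are hypothetical and supplied by the caller.

```python
import socket
import ssl


def make_tls_context(ca_file=None):
    """Build a client-side TLS context that verifies the server certificate.

    ca_file is assumed to hold the Antrea CA certificate in PEM form
    (hypothetical path); when it is None, the system trust store is used.
    """
    ctx = ssl.create_default_context(cafile=ca_file)
    # create_default_context already enables both settings; they are
    # repeated here only to make the intent explicit.
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def connect_by_node_ip(node_ip, port, ctx, server_name="antrea.kube-system.svc"):
    """Dial the Node IP, but verify the certificate against server_name."""
    raw = socket.create_connection((node_ip, port), timeout=5)
    try:
        # server_hostname is the name matched against the certificate's SAN
        # list, so the Node IP itself does not need to appear in it.
        return ctx.wrap_socket(raw, server_hostname=server_name)
    except Exception:
        raw.close()
        raise
```

With this shape, the address used for the TCP connection and the name used for certificate verification are decoupled, which is the property described above.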
Yes, I agree this will work well and can address the security issue and may even get rid of APIServices. I don't quite remember why we didn't go with this approach at the very beginning. @jianjuns @weiqiangt do you have opinions on this? |
Thanks @tnqn for the pointer. The |
Right. I would keep the possibility of exposing a public API, which can enable potential integrations, even though we do not have many use cases now. It is also a more direct and efficient way than Pod exec. And I do not see big issues with APIService. I do not like to support many ways to access the API, but if you really believe remote execution and Pod exec are required for "antctl query", I would not be against it either. It sounds to me like APIService or CRD are better solutions though, if we want to consider integration with an external manager or UI later. |
@tnqn so I was going to resolve this issue by connecting directly to the Controller and enabling secure mode by using |
@antoninbas I think using a standard K8s versioned API makes more sense now if we want to improve the feature later and keep compatibility. Although it's a "query", it's similar to the TokenReview and SubjectAccessReview APIs in K8s, which use resources to represent actions. These APIs have only the "create" verb: the query is in the Spec of the Review resource, the response is in the Status, and no resources are persisted anywhere. Examples as below: Do you think it could fall into ops as it's kind of a troubleshooting action as well? |
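For readers unfamiliar with the Review pattern, the following Python sketch shows its request/response shape using the real SubjectAccessReview kind from authorization.k8s.io/v1. The `create_review` helper is a hypothetical in-memory stand-in for the API server and exists only to show that the answer comes back in `status` and nothing is stored.

```python
import copy


def subject_access_review(user, namespace, verb, resource):
    """Build the body of a SubjectAccessReview "create" request."""
    return {
        "apiVersion": "authorization.k8s.io/v1",
        "kind": "SubjectAccessReview",
        # The question being asked lives entirely in spec.
        "spec": {
            "user": user,
            "resourceAttributes": {
                "namespace": namespace,
                "verb": verb,
                "resource": resource,
            },
        },
    }


def create_review(review, evaluate):
    """Hypothetical stand-in for the server side of a "create" call.

    The server evaluates the spec, fills in status, and returns the object.
    The request is left unmodified and nothing is persisted.
    """
    answered = copy.deepcopy(review)
    answered["status"] = evaluate(review["spec"])
    return answered


request = subject_access_review("jane", "default", "get", "pods")
response = create_review(
    request,
    # Toy policy for illustration only: allow "get", deny everything else.
    lambda spec: {"allowed": spec["resourceAttributes"]["verb"] == "get"},
)
print(response["status"])
```

An endpoint query could take the same shape, with the endpoint to look up in the spec and the query result in the status. The API group, kind, and field names for that were still under discussion in this thread, so none are assumed here.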
That makes sense. I assume that the fact that the |
@tnqn "ops" is also used for CRDs so it doesn't seem that I can use "ops". Do you think I should use "system" (which does not seem like a great fit) or define a new group? |
While I was thinking about whether to use a new group or an existing one, I realized the operation is essentially a series of queries against controlplane resources, and most of the logic would be the same even if it were executed on the client side, unless it's kept on the server side for other API consumers too. |
Thanks for commenting @tnqn. I do agree that handling everything client-side in antctl is an option, but don't you think we need a public API for these queries? I know that right now we are only looking at antctl as a client, but as part of possible integrations, there could be value in having this API be consumed by other tools, e.g. a UI. In that case, it would be better to have a public API so that: 1) clients don't have to implement the functionality locally using multiple dependent queries and 2) clients don't have to consume our internal "controlplane" API. I am not in a rush to implement this (can be pushed to v0.11), so I'm happy to bring this up at the Community Meeting next week. |
Sure, I agree: if we plan to have other clients use the feature, a server-side implementation is better. We can discuss this in the community meeting. |
At the September 28th Community Meeting (https://github.com/vmware-tanzu/antrea/wiki/Community-Meetings#september-28-2020), we decided that introducing this new API in the |
This issue is stale because it has been open 180 days with no activity. Remove stale label or comment, or this will be closed in 180 days |
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days |
Describe what you are trying to solve
Currently "antctl query endpoint" can only be run from inside the antrea-controller Pod.
Describe the solution you have in mind
We should be able to run "antctl query endpoint" from out-of-cluster, by providing the appropriate Kubeconfig file.
Describe how your solution impacts user flows
It makes the life of users easier by letting them use the command without exec'ing into the antrea-controller Pod first.
Describe the main design/architecture of your solution
/query-endpoint URL).
antrea-ca ConfigMap. It seems that this would be useful for the "antctl supportbundle" command, which currently uses an insecure connection (see "Antctl support-bundle command doesn't verify server certificate" #758), so we should be able to share code.
AntreaControllerInfo CRD and the Node IP directly, but I am not sure why it chose this option instead.
Alternative solutions that you considered
See above for discussion.
Test plan