🌱 Use all available endpoints for etcd #2888
@@ -107,7 +107,7 @@ func (m *Management) GetWorkloadCluster(ctx context.Context, clusterKey client.O
RootCAs: caPool,
Certificates: []tls.Certificate{clientCert},
}
+ cfg.InsecureSkipVerify = true
This is needed because the endpoint used to be statically set to "127.0.0.1", which was in the SAN list for the etcd certs. With these changes, TLS verification was failing silently, since we now use a dynamic set of endpoints.
Would it be possible to get an example where the endpoint advertised by the cluster isn't included in the SAN list for the etcd certs? kubeadm should be generating the certificates to include both localhost and the host IP in the listed SANs.
kubeadm is doing the right thing, but because we have our own dialer, we don't use endpoints the way they were meant to be used. Right now, we pass the etcd pod names as endpoints, which will of course not be included in the SAN list. Our dialer then uses the pod name to figure out which pod to set up port forwarding to.
Yup, what @gab-satchi said. The address being verified is pod.namespace. The API server itself provides some guarantee of routing us to the correct place, and mutual TLS authentication means we don't really need to verify the identity further.
/milestone v0.3.x
Setting the milestone to after v0.3.4 for now, until we have better e2e tests in place for this change to go in.
/assign @detiber @randomvariable for reviews
}
dialer, err := proxy.NewDialer(p)
if err != nil {
return nil, err
}
- etcdclient, err := etcd.NewEtcdClient("127.0.0.1", dialer.DialContextWithAddr, c.tlsConfig)
+ etcdclient, err := etcd.NewEtcdClient(endpoints, dialer.DialContextWithAddr, c.tlsConfig)
In kubeadm, after connecting to etcd, we call client.Sync so we can get rid of discrepancies between the list of etcd endpoints (pods, in this case) and the list of etcd members actually running.
Should we do the same here as well?
Probably not, because the real IPs aren't reachable from the management cluster. We always have to go through the API server to reach the pods.
/retitle 🏃 Use all available endpoints for etcd
/unhold
@gab-satchi I think you need to rebase.
@gab-satchi: The following test failed, say
"github.com/pkg/errors"
kerrors "k8s.io/apimachinery/pkg/util/errors"

Nit: extra space.
etcdClient, err := clientv3.New(clientv3.Config{
- Endpoints: []string{endpoint},
+ Endpoints: endpoints,
DialTimeout: etcdTimeout,
DialOptions: []grpc.DialOption{
grpc.WithBlock(), // block until the underlying connection is up
A couple of lines below, do we need the etcdClient.Endpoints() call?
If we keep it, maybe we want to check whether any endpoints are returned and error if the list is empty?
clientv3.New will return an error if it's given an empty slice, so that case should already be covered. I'll remove the etcdClient.Endpoints() call.
Putting it back on hold: found a bug in leadership forwarding.
/hold
/retitle 🌱 Use all available endpoints for etcd
- exclude node being removed from nodelist for etcd client
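A minimal sketch of building the endpoint list while leaving out the node being removed, assuming the kubeadm static pod convention that the etcd pod on node N is named etcd-N. etcdEndpointsForNodes is a hypothetical helper, not code from this PR. It returns an error when nothing is left, which mirrors the point above that clientv3.New rejects an empty endpoint slice, but surfaces the problem earlier with a clearer message.

```go
package main

import "errors"

// etcdEndpointsForNodes (hypothetical) maps control plane node names to etcd
// pod names, skipping the node that is being removed. It assumes kubeadm's
// static pod naming, where the etcd pod on node N is called "etcd-N".
func etcdEndpointsForNodes(nodeNames []string, excludeNode string) ([]string, error) {
	endpoints := make([]string, 0, len(nodeNames))
	for _, name := range nodeNames {
		if name == excludeNode {
			continue // never hand the departing member to the etcd client
		}
		endpoints = append(endpoints, "etcd-"+name)
	}
	if len(endpoints) == 0 {
		return nil, errors.New("no etcd endpoints left after excluding the node being removed")
	}
	return endpoints, nil
}
```

Excluding the departing node up front means the client never picks the member that is about to disappear, for example when forwarding leadership before removal.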
/unhold
lgtm
/approve
/assign @detiber
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: gab-satchi, vincepri
/lgtm
What this PR does / why we need it:
This PR is an experiment to use all available etcd endpoints when creating an etcd client. KCP currently has 3 instances where it needs to create a client:
forLeader is used in this instance. #2844
/hold