Add etcd-quorum-guard manifests and doc #613
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: RobertKrawitz
Force-pushed from 6d09ab4 to 20df4fc.
I suspect pkg/operator/sync.go needs an update to include the deployment. Perhaps @kikisdeliveryservice or @runcom can confirm.
  name: etcd-quorum-guard
  namespace: kube-system
spec:
  replicas: 3
I think it should be possible to teach the MCO to scale up, or to decide the replica count based on the number of master nodes.
Agreed beyond 4.1; for 4.1, it has been decided to only support 3 masters.
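As a rough illustration of the idea discussed above (not something this PR implements), the replica count could be derived from the number of master nodes. The helper name `count_masters` is hypothetical; the `oc` commands in the comments are only a sketch and assume the masters carry the `node-role.kubernetes.io/master` label.

```shell
#!/usr/bin/env bash
# Hypothetical helper: counts the non-empty lines on stdin, i.e. one line
# per node name as printed by `oc get nodes -o name`.
count_masters() {
  # grep -c prints 0 and exits non-zero when nothing matches; ignore that status.
  grep -c . || true
}

# Sketch of how it might be used against a live cluster (assumption, untested here):
#   masters="$(oc get nodes -l node-role.kubernetes.io/master -o name | count_masters)"
#   oc -n kube-system scale deployment/etcd-quorum-guard --replicas="$masters"
```

Note that an operator that owns the deployment would likely revert a manual scale, so in practice the count would have to be fed into the rendered manifest instead.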
  effect: NoExecute
  operator: Exists
containers:
- image: registry.svc.ci.openshift.org/openshift/origin-v4.0:base
This must be plumbed through the release image.
"{{.Images.etcdQuorumGuardImage}}"
Done.
declare -r key="${cert%.crt}.key"
declare -r cacert="$croot/ca.crt"
[[ -z $cert || -z $key ]] && exit 1
curl --max-time 2 --silent --cert "${cert//:/\:}" --key "$key" --cacert "$cacert" "$health_endpoint" | grep '{ *"health" *: *"true" *}'
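The health test in the script above hinges on the grep pattern. As a minimal sketch (the function name `is_healthy` is hypothetical and not part of this PR), the match can be isolated so it can be exercised without a live etcd endpoint:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads an etcd /health response on stdin and succeeds
# only if it contains {"health": "true"}, using the same pattern as the
# script in this PR (optional spaces around the key, colon and value).
is_healthy() {
  grep -q '{ *"health" *: *"true" *}'
}
```

With this in place, the last line of the script could pipe the curl output into `is_healthy` instead of calling grep directly; the exit status is the same.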
/cc @hexfusion
Please use the metrics client certs that were created to connect to etcd.
Where are those certs located?
You could get the crt/key with something like:
oc -n openshift-config get secrets etcd-metric-client -o yaml
and the CA with:
oc get configmap -n openshift-config etcd-metric-serving-ca -o yaml
You can use the etcd proxy for /health with these certs: port 9979 vs. 2379.
I can do that inside the pod?
I agree with the rationale; my question is how to get the appropriate cert.
Working on this now.
Note that the etcd-quorum-guard proper does not have any Go code in it; it's simply (right now) a static deployment and disruption budget, with the lone pod being a trivial script.
With #623 you should be able to mount the resources and then consume them in your bash script as local files. Something like:
volumeMounts:
- mountPath: "/etc/ssl/certs/etcd"
  name: etcd-metric-client
  readOnly: true
volumes:
- name: etcd-metric-client
  secret:
    secretName: etcd-metric-client
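Once the secret is mounted as above, the script can treat the cert and key as ordinary files. The sketch below only assembles the curl arguments; it assumes the secret exposes `tls.crt` and `tls.key` (the names used later in this PR's script), and the helper name `health_curl_args`, the CA path and the endpoint are placeholders supplied by the caller rather than values confirmed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints, one per line, the curl arguments for a health
# check that uses client certs mounted from the etcd-metric-client secret.
#   $1: mount directory of the secret (e.g. /etc/ssl/certs/etcd)
#   $2: path to the CA certificate (assumption: provided by the caller)
#   $3: health endpoint URL (assumption: provided by the caller)
health_curl_args() {
  local -r croot="$1" cacert="$2" endpoint="$3"
  printf '%s\n' --max-time 2 --silent \
    --cert "$croot/tls.crt" --key "$croot/tls.key" \
    --cacert "$cacert" "$endpoint"
}
```

A caller could read the output into an array with `mapfile -t args < <(health_curl_args ...)` and then run `curl "${args[@]}"`, keeping the grep on the response unchanged.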
Yeah, if we're now watching this manifest, we need to sync it up as well.
test/e2e/etcdquorumguard_test.go (outdated)
	kclient, err := k8sclient.NewForConfig(config)
	if err != nil {
		return nil, fmt.Errorf("Error creating client: %s\n", err.Error())
	}
We already have this initialization in e2e; you can reuse it.
Yup.
@RobertKrawitz I added some comments. Also, could you ensure that your final commits have a brief one-sentence "why" summary in the body? Thank you for adding the doc!!
Yup.
After talking to @hexfusion: #623 needs to merge before this one, so I'm adding a hold label to make sure they go in in the correct order. /hold
Force-pushed from 8beb6b9 to a331155.
@@ -171,6 +172,28 @@ func (optr *Operator) syncMachineConfigController(config renderConfig) error {
	return nil
}

func (optr *Operator) syncEtcdQuorumGuard(config renderConfig) error {
	eqgBytes, err := renderAsset(config, "manifests/etcdquorumguard/deployment.yaml")
Should this sync also wait for the deployment to roll out correctly? (See waitForDeploymentRollout.)
Yes. Done.
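For readers following along from the command line, waiting for a rollout can be approximated in shell. This is only an analogue of the idea, not the operator's Go helper: `wait_until` and `deployment_available` are hypothetical names, and treating "available replicas equals desired replicas" as "rolled out" is a simplifying assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retries the given command about once per second until
# it succeeds, or returns 1 once the timeout (in seconds) has elapsed.
wait_until() {
  local -r timeout_s="$1"; shift
  local -r deadline=$(( SECONDS + timeout_s ))
  until "$@"; do
    (( SECONDS >= deadline )) && return 1
    sleep 1
  done
}

# Hypothetical check (needs a live cluster): succeeds when the number of
# available replicas matches the desired replica count of the deployment.
deployment_available() {
  local want have
  want="$(oc -n kube-system get deployment etcd-quorum-guard -o jsonpath='{.spec.replicas}')"
  have="$(oc -n kube-system get deployment etcd-quorum-guard -o jsonpath='{.status.availableReplicas}')"
  [[ -n $want && $want == "$have" ]]
}
```

For example, `wait_until 300 deployment_available` would poll for up to five minutes and fail if the deployment never becomes fully available.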
OK, this has the usual chicken-and-egg issue when adding something to bootkube (and a new image). Abhinav summarized what generally needs to happen when adding a new image here: #538 (comment). You can follow that (it should be pretty straightforward, and you already have the installer PR up). Other than that, looking at past PRs (e.g. the one adding infraImage), the flow has been like this:
I hope the above is clear enough (let me know otherwise).
@runcom so to be clear, I need a mini PR against the MCO, the installer PR, @hexfusion's PR so I can get the correct cert, this PR, and finally an additional installer PR for cleanup (five PRs all told)?
declare -r cert="$croot/tls.crt"
declare -r key="$croot/tls.key"
declare -r cacert="/var/run/secrets/kubernetes.io/serviceaccount/etcd-metric-serving-ca.crt"
ls -lR "$croot"
etcd-metric-serving-ca is a ConfigMap, not a Secret.
Force-pushed from ccf5879 to d24d38a.
/retest
Force-pushed from dafd571 to 5f2cb56.
We also have the MCD which is privileged and mounts the host in the
etcd is just
Force-pushed from 7df7f01 to 9ff5246.
Force-pushed from 9ff5246 to ce3acd6.
/retest
/retest
/retest
This times out in e2e-aws-op, meaning that (for now) you just need to raise the test timeout to, let's say, 70 minutes. You can do that here: https://github.com/openshift/machine-config-operator/blob/master/Makefile#L101
The first experiment will be to see whether it behaves differently when done the original way.
Force-pushed from 5bf8bc6 to 8598f3d.
/retest
Force-pushed from 8598f3d to 901e42a.
Switching back to using the etcd-quorum-guard standalone, so this is now moot.
Wut
- What I did
Add etcd-quorum-guard manifests and documentation describing it.
- How to verify it
oc get pods -n kube-system | grep etcd-quorum-guard
- Description for the changelog
Add etcd-quorum-guard