This repository has been archived by the owner on Feb 5, 2020. It is now read-only.

*: enable debug logging for etcd-operator by default #1425

Merged
config.tf (2 changes: 1 addition & 1 deletion)

@@ -55,7 +55,7 @@ variable "tectonic_container_images" {
     flannel = "quay.io/coreos/flannel:v0.7.1-amd64"
     flannel_cni = "quay.io/coreos/flannel-cni:0.1.0"
     etcd = "quay.io/coreos/etcd:v3.1.8"
-    etcd_operator = "quay.io/coreos/etcd-operator:v0.4.0"
+    etcd_operator = "quay.io/coreos/etcd-operator:v0.4.2"
     kenc = "quay.io/coreos/kenc:8f6e2e885f790030fbbb0496ea2a2d8830e58b8f"
     calico = "quay.io/calico/node:v1.3.0"
     calico_cni = "quay.io/calico/cni:v1.9.1-4-g23fcd5f"
@@ -16,7 +16,11 @@ spec:
     metadata:
       labels:
         k8s-app: etcd-operator
     spec:
+      volumes:
+      - name: debug-volume
+        hostPath:
+          path: /var/tmp
@s-urbaniak (Contributor) commented on Jul 18, 2017:

I am worried that we are hardcoding a host path here. The etcd-operator is a Deployment and is therefore subject to being rescheduled by k8s at any time, so this /var/tmp/etcd-operator/debug/debug.log file will eventually be sprinkled across all master nodes. Judging from https://github.com/coreos/etcd-operator/blob/c946e30490947dc8b171fc4439a98356c7a85078/pkg/debug/debug_logger.go#L51, I see that this at least opens the file using O_APPEND, but those logs would still be pretty inconsistent in the face of rescheduling.

Can't the debug output simply go to stdout, so that it is captured by the standard k8s logging facilities?

Reply:

If we could force every Tectonic user to use a logging system like Splunk, that would be a great help. But most of the users we interact with today have no logging system set up, which makes debugging self-hosted etcd a huge problem: when k8s is down, we have no easy way to get the logs.

With this hacky approach, we can at least get the logging we want by downloading files from a well-known path on all master nodes. We are not really worried about the logs being spread around too much: the operator is leader-elected, and time skew should not be a real problem.

And something is better than nothing.
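
As an illustration of the mechanism discussed above, and not the actual etcd-operator code: a minimal Go sketch of an append-only debug logger at the well-known host path. Only the O_APPEND behaviour and the --debug-logfile-path flag come from this thread; the openDebugLog helper and the flag default are assumptions.

package main

import (
	"flag"
	"log"
	"os"
	"path/filepath"
)

// openDebugLog is an illustrative helper (not taken from etcd-operator): it
// creates the directory on the mounted hostPath volume and opens the log file
// with O_APPEND, so a pod rescheduled onto the same node keeps appending to
// the same file instead of truncating it.
func openDebugLog(path string) (*os.File, error) {
	if err := os.MkdirAll(filepath.Dir(path), 0755); err != nil {
		return nil, err
	}
	return os.OpenFile(path, os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0644)
}

func main() {
	// Mirrors the --debug-logfile-path flag set in the manifest below.
	logfile := flag.String("debug-logfile-path",
		"/var/tmp/etcd-operator/debug/debug.log", "file to append debug logs to")
	flag.Parse()

	f, err := openDebugLog(*logfile)
	if err != nil {
		log.Fatalf("cannot open debug log: %v", err)
	}
	defer f.Close()

	debug := log.New(f, "debug: ", log.LstdFlags)
	debug.Println("etcd-operator debug logging enabled")
}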

       containers:
       - env:
         - name: MY_POD_NAMESPACE
@@ -31,6 +35,12 @@ spec:
           value: /tmp
         image: ${etcd_operator_image}
         name: etcd-operator
+        command:
+        - /usr/local/bin/etcd-operator
+        - --debug-logfile-path=/var/tmp/etcd-operator/debug/debug.log
@s-urbaniak (Contributor) commented on Jul 18, 2017:

Question: why do these messages have to go into a log file? Will they also be visible in kubectl logs your-etcd-operator?

Reply:

For self-hosted etcd, when etcd is down, k8s is down; and when k8s is down, kubectl is unusable. The whole point of this is to ensure we log to disk for debugging purposes.
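
One possible answer to the kubectl logs question, sketched here as an assumption rather than what the operator actually does: tee the debug output with io.MultiWriter so it reaches both stderr, which the standard k8s logging path captures, and the on-disk file that survives an API-server outage.

package main

import (
	"io"
	"log"
	"os"
)

func main() {
	// Append to the file on the mounted hostPath volume (path from the manifest).
	f, err := os.OpenFile("/var/tmp/etcd-operator/debug/debug.log",
		os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0644)
	if err != nil {
		log.Fatalf("cannot open debug log: %v", err)
	}
	defer f.Close()

	// Lines written through this logger show up in `kubectl logs` (via stderr)
	// while k8s is healthy, and remain on the master node's disk when it is not.
	debug := log.New(io.MultiWriter(os.Stderr, f), "debug: ", log.LstdFlags)
	debug.Println("reconciling etcd cluster")
}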

+        volumeMounts:
+        - mountPath: /var/tmp/etcd-operator/debug
+          name: debug-volume
       nodeSelector:
         node-role.kubernetes.io/master: ""
       securityContext: