[chore] add debug statements for e2e #26530

Closed

atoulme
Contributor

@atoulme atoulme commented Sep 8, 2023

No description provided.

@crobert-1
Member

(For future reference, debugging statements are meant to help address #24223)

@atoulme
Contributor Author

atoulme commented Sep 8, 2023

I have what I need from the debug logs:

looking at list items: [{map[apiVersion:v1 kind:Pod metadata:map[annotations:map[workload:job] creationTimestamp:2023-09-08T16:57:11Z generateName:telemetrygen-814e39f3-logs-job- labels:map[app:telemetrygen-814e39f3-logs-job controller-uid:90577953-ab60-4056-a24b-572805fbc39f job-name:telemetrygen-814e39f3-logs-job] managedFields:[map[apiVersion:v1 fieldsType:FieldsV1 fieldsV1:map[f:metadata:map[f:annotations:map[.:map[] f:workload:map[]] f:generateName:map[] f:labels:map[.:map[] f:app:map[] f:controller-uid:map[] f:job-name:map[]] f:ownerReferences:map[.:map[] k:{"uid":"90577953-ab60-4056-a24b-572805fbc39f"}:map[]]] f:spec:map[f:containers:map[k:{"name":"telemetrygen"}:map[.:map[] f:command:map[] f:image:map[] f:imagePullPolicy:map[] f:name:map[] f:resources:map[] f:terminationMessagePath:map[] f:terminationMessagePolicy:map[]]] f:dnsPolicy:map[] f:enableServiceLinks:map[] f:restartPolicy:map[] f:schedulerName:map[] f:securityContext:map[] f:terminationGracePeriodSeconds:map[]]] manager:kube-controller-manager operation:Update time:2023-09-08T16:57:11Z] map[apiVersion:v1 fieldsType:FieldsV1 fieldsV1:map[f:status:map[f:conditions:map[k:{"type":"ContainersReady"}:map[.:map[] f:lastProbeTime:map[] f:lastTransitionTime:map[] f:reason:map[] f:status:map[] f:type:map[]] k:{"type":"Initialized"}:map[.:map[] f:lastProbeTime:map[] f:lastTransitionTime:map[] f:status:map[] f:type:map[]] k:{"type":"Ready"}:map[.:map[] f:lastProbeTime:map[] f:lastTransitionTime:map[] f:reason:map[] f:status:map[] f:type:map[]]] f:containerStatuses:map[] f:hostIP:map[] f:phase:map[] f:podIP:map[] f:podIPs:map[.:map[] k:{"ip":"10.244.0.14"}:map[.:map[] f:ip:map[]]] f:startTime:map[]]] manager:kubelet operation:Update subresource:status time:2023-09-08T16:57:19Z]] name:telemetrygen-814e39f3-logs-job-45nb6 namespace:default ownerReferences:[map[apiVersion:batch/v1 blockOwnerDeletion:true controller:true kind:Job name:telemetrygen-814e39f3-logs-job uid:90577953-ab60-4056-a24b-572805fbc39f]] 
resourceVersion:1142 uid:8accaa80-b317-4989-85d4-987604ad6383] spec:map[containers:[map[command:[/telemetrygen logs --otlp-insecure --otlp-endpoint=otelcol-814e39f3:4317 --rate=1 --duration=36000s --otlp-attributes=service.name="test-logs-job" --otlp-attributes=k8s.container.name="telemetrygen"] image:ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:latest imagePullPolicy:Always name:telemetrygen resources:map[] terminationMessagePath:/dev/termination-log terminationMessagePolicy:File volumeMounts:[map[mountPath:/var/run/secrets/kubernetes.io/serviceaccount name:kube-api-access-7c6t4 readOnly:true]]]] dnsPolicy:ClusterFirst enableServiceLinks:true nodeName:kind-control-plane preemptionPolicy:PreemptLowerPriority priority:0 restartPolicy:Never schedulerName:default-scheduler securityContext:map[] serviceAccount:default serviceAccountName:default terminationGracePeriodSeconds:30 tolerations:[map[effect:NoExecute key:node.kubernetes.io/not-ready operator:Exists tolerationSeconds:300] map[effect:NoExecute key:node.kubernetes.io/unreachable operator:Exists tolerationSeconds:300]] volumes:[map[name:kube-api-access-7c6t4 projected:map[defaultMode:420 sources:[map[serviceAccountToken:map[expirationSeconds:3607 path:token]] map[configMap:map[items:[map[key:ca.crt path:ca.crt]] name:kube-root-ca.crt]] map[downwardAPI:map[items:[map[fieldRef:map[apiVersion:v1 fieldPath:metadata.namespace] path:namespace]]]]]]]]] status:map[conditions:[map[lastProbeTime:<nil> lastTransitionTime:2023-09-08T16:57:11Z status:True type:Initialized] map[lastProbeTime:<nil> lastTransitionTime:2023-09-08T16:57:11Z reason:PodFailed status:False type:Ready] map[lastProbeTime:<nil> lastTransitionTime:2023-09-08T16:57:11Z reason:PodFailed status:False type:ContainersReady] map[lastProbeTime:<nil> lastTransitionTime:2023-09-08T16:57:11Z status:True type:PodScheduled]] containerStatuses:[map[containerID:containerd://a18d78b2918cb4ed03175f3903c1d0a49665ddaccbf055ef25a3ee39c2bd3adc 
image:ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:latest imageID:ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen@sha256:0746fd2a25d443d42e7f3f301d9f4011160a816df849d8a51c2e2bf115db509d lastState:map[] name:telemetrygen ready:false restartCount:0 started:false state:map[terminated:map[containerID:containerd://a18d78b2918cb4ed03175f3903c1d0a49665ddaccbf055ef25a3ee39c2bd3adc exitCode:1 finishedAt:2023-09-08T16:57:15Z reason:Error startedAt:2023-09-08T16:57:15Z]]]] hostIP:172.18.0.2 phase:Failed podIP:10.244.0.14 podIPs:[map[ip:10.244.0.14]] qosClass:BestEffort startTime:2023-09-08T16:57:11Z]]} {map[apiVersion:v1 kind:Pod metadata:map[annotations:map[workload:job] creationTimestamp:2023-09-08T16:57:19Z generateName:telemetrygen-814e39f3-logs-job- labels:map[app:telemetrygen-814e39f3-logs-job controller-uid:90577953-ab60-4056-a24b-572805fbc39f job-name:telemetrygen-814e39f3-logs-job] managedFields:[map[apiVersion:v1 fieldsType:FieldsV1 fieldsV1:map[f:metadata:map[f:annotations:map[.:map[] f:workload:map[]] f:generateName:map[] f:labels:map[.:map[] f:app:map[] f:controller-uid:map[] f:job-name:map[]] f:ownerReferences:map[.:map[] k:{"uid":"90577953-ab60-4056-a24b-572805fbc39f"}:map[]]] f:spec:map[f:containers:map[k:{"name":"telemetrygen"}:map[.:map[] f:command:map[] f:image:map[] f:imagePullPolicy:map[] f:name:map[] f:resources:map[] f:terminationMessagePath:map[] f:terminationMessagePolicy:map[]]] f:dnsPolicy:map[] f:enableServiceLinks:map[] f:restartPolicy:map[] f:schedulerName:map[] f:securityContext:map[] f:terminationGracePeriodSeconds:map[]]] manager:kube-controller-manager operation:Update time:2023-09-08T16:57:19Z] map[apiVersion:v1 fieldsType:FieldsV1 fieldsV1:map[f:status:map[f:conditions:map[k:{"type":"ContainersReady"}:map[.:map[] f:lastProbeTime:map[] f:lastTransitionTime:map[] f:reason:map[] f:status:map[] f:type:map[]] k:{"type":"Initialized"}:map[.:map[] f:lastProbeTime:map[] f:lastTransitionTime:map[] 
f:status:map[] f:type:map[]] k:{"type":"Ready"}:map[.:map[] f:lastProbeTime:map[] f:lastTransitionTime:map[] f:reason:map[] f:status:map[] f:type:map[]]] f:containerStatuses:map[] f:hostIP:map[] f:phase:map[] f:podIP:map[] f:podIPs:map[.:map[] k:{"ip":"10.244.0.19"}:map[.:map[] f:ip:map[]]] f:startTime:map[]]] manager:kubelet operation:Update subresource:status time:2023-09-08T16:57:23Z]] name:telemetrygen-814e39f3-logs-job-jhldr namespace:default ownerReferences:[map[apiVersion:batch/v1 blockOwnerDeletion:true controller:true kind:Job name:telemetrygen-814e39f3-logs-job uid:90577953-ab60-4056-a24b-572805fbc39f]] resourceVersion:1191 uid:38cf4c56-b32b-40a8-a21b-3ce87eeaa626] spec:map[containers:[map[command:[/telemetrygen logs --otlp-insecure --otlp-endpoint=otelcol-814e39f3:4317 --rate=1 --duration=36000s --otlp-attributes=service.name="test-logs-job" --otlp-attributes=k8s.container.name="telemetrygen"] image:ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:latest imagePullPolicy:Always name:telemetrygen resources:map[] terminationMessagePath:/dev/termination-log terminationMessagePolicy:File volumeMounts:[map[mountPath:/var/run/secrets/kubernetes.io/serviceaccount name:kube-api-access-9vfb9 readOnly:true]]]] dnsPolicy:ClusterFirst enableServiceLinks:true nodeName:kind-control-plane preemptionPolicy:PreemptLowerPriority priority:0 restartPolicy:Never schedulerName:default-scheduler securityContext:map[] serviceAccount:default serviceAccountName:default terminationGracePeriodSeconds:30 tolerations:[map[effect:NoExecute key:node.kubernetes.io/not-ready operator:Exists tolerationSeconds:300] map[effect:NoExecute key:node.kubernetes.io/unreachable operator:Exists tolerationSeconds:300]] volumes:[map[name:kube-api-access-9vfb9 projected:map[defaultMode:420 sources:[map[serviceAccountToken:map[expirationSeconds:3607 path:token]] map[configMap:map[items:[map[key:ca.crt path:ca.crt]] name:kube-root-ca.crt]] 
map[downwardAPI:map[items:[map[fieldRef:map[apiVersion:v1 fieldPath:metadata.namespace] path:namespace]]]]]]]]] status:map[conditions:[map[lastProbeTime:<nil> lastTransitionTime:2023-09-08T16:57:19Z status:True type:Initialized] map[lastProbeTime:<nil> lastTransitionTime:2023-09-08T16:57:19Z reason:PodFailed status:False type:Ready] map[lastProbeTime:<nil> lastTransitionTime:2023-09-08T16:57:19Z reason:PodFailed status:False type:ContainersReady] map[lastProbeTime:<nil> lastTransitionTime:2023-09-08T16:57:19Z status:True type:PodScheduled]] containerStatuses:[map[containerID:containerd://b04579a15f65ce2995e270bb99b2791742fb4eea4ceca1fb4463ecc4a9e80ac1 image:ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:latest imageID:ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen@sha256:0746fd2a25d443d42e7f3f301d9f4011160a816df849d8a51c2e2bf115db509d lastState:map[] name:telemetrygen ready:false restartCount:0 started:false state:map[terminated:map[containerID:containerd://b04579a15f65ce2995e270bb99b2791742fb4eea4ceca1fb4463ecc4a9e80ac1 exitCode:1 finishedAt:2023-09-08T16:57:21Z reason:Error startedAt:2023-09-08T16:57:21Z]]]] hostIP:172.18.0.2 phase:Failed podIP:10.244.0.19 podIPs:[map[ip:10.244.0.19]] qosClass:BestEffort startTime:2023-09-08T16:57:19Z]]} {map[apiVersion:v1 kind:Pod metadata:map[annotations:map[workload:job] creationTimestamp:2023-09-08T16:57:23Z finalizers:[batch.kubernetes.io/job-tracking] generateName:telemetrygen-814e39f3-logs-job- labels:map[app:telemetrygen-814e39f3-logs-job controller-uid:90577953-ab60-4056-a24b-572805fbc39f job-name:telemetrygen-814e39f3-logs-job] managedFields:[map[apiVersion:v1 fieldsType:FieldsV1 fieldsV1:map[f:metadata:map[f:annotations:map[.:map[] f:workload:map[]] f:finalizers:map[.:map[] v:"batch.kubernetes.io/job-tracking":map[]] f:generateName:map[] f:labels:map[.:map[] f:app:map[] f:controller-uid:map[] f:job-name:map[]] f:ownerReferences:map[.:map[] 
k:{"uid":"90577953-ab60-4056-a24b-572805fbc39f"}:map[]]] f:spec:map[f:containers:map[k:{"name":"telemetrygen"}:map[.:map[] f:command:map[] f:image:map[] f:imagePullPolicy:map[] f:name:map[] f:resources:map[] f:terminationMessagePath:map[] f:terminationMessagePolicy:map[]]] f:dnsPolicy:map[] f:enableServiceLinks:map[] f:restartPolicy:map[] f:schedulerName:map[] f:securityContext:map[] f:terminationGracePeriodSeconds:map[]]] manager:kube-controller-manager operation:Update time:2023-09-08T16:57:23Z] map[apiVersion:v1 fieldsType:FieldsV1 fieldsV1:map[f:status:map[f:conditions:map[k:{"type":"ContainersReady"}:map[.:map[] f:lastProbeTime:map[] f:lastTransitionTime:map[] f:status:map[] f:type:map[]] k:{"type":"Initialized"}:map[.:map[] f:lastProbeTime:map[] f:lastTransitionTime:map[] f:status:map[] f:type:map[]] k:{"type":"Ready"}:map[.:map[] f:lastProbeTime:map[] f:lastTransitionTime:map[] f:status:map[] f:type:map[]]] f:containerStatuses:map[] f:hostIP:map[] f:phase:map[] f:podIP:map[] f:podIPs:map[.:map[] k:{"ip":"10.244.0.20"}:map[.:map[] f:ip:map[]]] f:startTime:map[]]] manager:kubelet operation:Update subresource:status time:2023-09-08T16:57:25Z]] name:telemetrygen-814e39f3-logs-job-q82tf namespace:default ownerReferences:[map[apiVersion:batch/v1 blockOwnerDeletion:true controller:true kind:Job name:telemetrygen-814e39f3-logs-job uid:90577953-ab60-4056-a24b-572805fbc39f]] resourceVersion:1202 uid:31b2283c-23b7-47fe-8100-1ea2a9d96d18] spec:map[containers:[map[command:[/telemetrygen logs --otlp-insecure --otlp-endpoint=otelcol-814e39f3:4317 --rate=1 --duration=36000s --otlp-attributes=service.name="test-logs-job" --otlp-attributes=k8s.container.name="telemetrygen"] image:ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:latest imagePullPolicy:Always name:telemetrygen resources:map[] terminationMessagePath:/dev/termination-log terminationMessagePolicy:File volumeMounts:[map[mountPath:/var/run/secrets/kubernetes.io/serviceaccount 
name:kube-api-access-h9vcz readOnly:true]]]] dnsPolicy:ClusterFirst enableServiceLinks:true nodeName:kind-control-plane preemptionPolicy:PreemptLowerPriority priority:0 restartPolicy:Never schedulerName:default-scheduler securityContext:map[] serviceAccount:default serviceAccountName:default terminationGracePeriodSeconds:30 tolerations:[map[effect:NoExecute key:node.kubernetes.io/not-ready operator:Exists tolerationSeconds:300] map[effect:NoExecute key:node.kubernetes.io/unreachable operator:Exists tolerationSeconds:300]] volumes:[map[name:kube-api-access-h9vcz projected:map[defaultMode:420 sources:[map[serviceAccountToken:map[expirationSeconds:3607 path:token]] map[configMap:map[items:[map[key:ca.crt path:ca.crt]] name:kube-root-ca.crt]] map[downwardAPI:map[items:[map[fieldRef:map[apiVersion:v1 fieldPath:metadata.namespace] path:namespace]]]]]]]]] status:map[conditions:[map[lastProbeTime:<nil> lastTransitionTime:2023-09-08T16:57:23Z status:True type:Initialized] map[lastProbeTime:<nil> lastTransitionTime:2023-09-08T16:57:25Z status:True type:Ready] map[lastProbeTime:<nil> lastTransitionTime:2023-09-08T16:57:25Z status:True type:ContainersReady] map[lastProbeTime:<nil> lastTransitionTime:2023-09-08T16:57:23Z status:True type:PodScheduled]] containerStatuses:[map[containerID:containerd://45b9617ab1384d9bbdadba2f25b9a4b0851e1b042b78f5c0189a2676c421c272 image:ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:latest imageID:ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen@sha256:0746fd2a25d443d42e7f3f301d9f4011160a816df849d8a51c2e2bf115db509d lastState:map[] name:telemetrygen ready:true restartCount:0 started:true state:map[running:map[startedAt:2023-09-08T16:57:25Z]]]] hostIP:172.18.0.2 phase:Running podIP:10.244.0.20 podIPs:[map[ip:10.244.0.20]] qosClass:BestEffort startTime:2023-09-08T16:57:23Z]]}]
looking at pod phase: Failed
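For readers skimming the dump: it lists three pods owned by the same Job. `telemetrygen-814e39f3-logs-job-45nb6` and `telemetrygen-814e39f3-logs-job-jhldr` are in phase `Failed`, and `telemetrygen-814e39f3-logs-job-q82tf` is in phase `Running`, while the last debug line reports only `Failed`. A minimal sketch of how the phases in such a list could be tallied is below. It assumes the pods are already decoded into plain Go maps shaped like the dump; `podPhase` and `countPhases` are hypothetical helper names and are not taken from the e2e test code.

```go
package main

import "fmt"

// podPhase extracts status.phase from one decoded pod object.
// It returns "" when the field is missing or has an unexpected type.
func podPhase(pod map[string]interface{}) string {
	status, ok := pod["status"].(map[string]interface{})
	if !ok {
		return ""
	}
	phase, _ := status["phase"].(string)
	return phase
}

// countPhases tallies how many pods in the list are in each phase.
func countPhases(pods []map[string]interface{}) map[string]int {
	counts := make(map[string]int)
	for _, pod := range pods {
		counts[podPhase(pod)]++
	}
	return counts
}

func main() {
	// Hand-written stand-in for the three list items in the debug output;
	// only the phase field is reproduced here.
	pods := []map[string]interface{}{
		{"status": map[string]interface{}{"phase": "Failed"}},
		{"status": map[string]interface{}{"phase": "Failed"}},
		{"status": map[string]interface{}{"phase": "Running"}},
	}
	fmt.Println(countPhases(pods)) // prints "map[Failed:2 Running:1]"
}
```

A tally like this makes it visible when a Job has left failed pods behind alongside a healthy one, which is the situation the log above shows.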

@atoulme atoulme closed this Sep 8, 2023
dmitryax pushed a commit that referenced this pull request Sep 11, 2023
**Description:**
Set up the telemetrygen job to restart on failure

**Link to tracking Issue:**
Fixes #24223


**Testing:**
See #26530 (comment).
This investigation shows that the job sometimes fails to run. All other
deployments have a policy to restart on failure.