Complete logs not shown for failed TaskRun #587

Closed
bobcatfish opened this issue Jan 9, 2020 · 5 comments
Labels: kind/bug, kind/question

@bobcatfish
Contributor

Version and Operating System

tkn Version:
dev - I just installed from HEAD which was 32662c8 from earlier today

tekton pipelines Version:

(Thought this might be useful! maybe worth adding to the issue template?)
v0.9.2
(gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/controller:v0.9.2@sha256:cc5e186131c9141f512786e3e55aca432e4dae841cad55fbb57d51b17b79371a)

Operating System:
linux

Expected Behavior

When a TaskRun fails, if I run tkn tr logs for that TaskRun, I should see all of the logs.

Actual Behavior

In this case I saw no logs at all. Instead I saw what seems to be the status of a step that came after the first step that failed, which looked like:

(⎈ |tekton:default)➜  cli git:(master) tkn --context dogfood  tr logs pipeline-release-nightly-4zc78-dv7rz-publish-images-tkjj2
Error: task publish-images has failed: "step-artifact-copy-to-tekton-bucket-nightly-4zc78-vchbw" exited with code 1 (image: "docker-pullable://google/cloud-sdk@sha256:4ef6b0e969fa96f10acfd893644d100469e979f4384e5e70f58be5cb80593a8a"); for logs run: kubectl -n default logs pipeline-release-nightly-4zc78-dv7rz-publish-images-tkjj2-pod-10c8fd -c step-artifact-copy-to-tekton-bucket-nightly-4zc78-vchbw

(more details in next section)

Steps to Reproduce the Problem

I'm not 100% sure how to reproduce but I'll give you as much detail as I can about what I saw!

Yesterday the tekton nightly release pipeline failed (tektoncd/plumbing#178).

I found the pipelinerun that failed:

(⎈ |tekton:default)➜  cli git:(master) tkn --context dogfood pr list
NAME                                           STARTED        DURATION     STATUS                                    
pipeline-release-nightly-4zc78-dv7rz           18 hours ago   16 minutes   Failed    

I tried to get logs for it with tkn --context dogfood pr logs pipeline-release-nightly-4zc78-dv7rz; this is the tail end:

[build : build] github.com/tektoncd/pipeline/pkg/pullrequest
[build : build] github.com/tektoncd/pipeline/cmd/pullrequest-init
[build : build] github.com/tektoncd/pipeline/vendor/github.com/gobuffalo/envy
[build : build] github.com/tektoncd/pipeline/vendor/github.com/markbates/inflect
[build : build] github.com/tektoncd/pipeline/vendor/k8s.io/api/admission/v1beta1
[build : build] github.com/tektoncd/pipeline/vendor/knative.dev/pkg/webhook
[build : build] github.com/tektoncd/pipeline/cmd/webhook

task publish-images has failed: "step-artifact-copy-to-tekton-bucket-nightly-4zc78-vchbw" exited with code 1 (image: "docker-pullable://google/cloud-sdk@sha256:4ef6b0e969fa96f10acfd893644d100469e979f4384e5e70f58be5cb80593a8a"); for logs run: kubectl -n default logs pipeline-release-nightly-4zc78-dv7rz-publish-images-tkjj2-pod-10c8fd -c step-artifact-copy-to-tekton-bucket-nightly-4zc78-vchbw
TaskRun pipeline-release-nightly-4zc78-dv7rz-publish-images-tkjj2 has failed

I wanted more detail on the taskrun that failed, so I ran tkn --context dogfood tr logs pipeline-release-nightly-4zc78-dv7rz-publish-images-tkjj2:

Error: task publish-images has failed: "step-artifact-copy-to-tekton-bucket-nightly-4zc78-vchbw" exited with code 1 (image: "docker-pullable://google/cloud-sdk@sha256:4ef6b0e969fa96f10acfd893644d100469e979f4384e5e70f58be5cb80593a8a"); for logs run: kubectl -n default logs pipeline-release-nightly-4zc78-dv7rz-publish-images-tkjj2-pod-10c8fd -c step-artifact-copy-to-tekton-bucket-nightly-4zc78-vchbw

This is weird thing number 1 cuz I'm asking for the logs and being told to run kubectl to get the logs (I think that's actually the status and not the logs?)
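
To make the distinction concrete: the message above is a status string that ends with a pointer to the logs, not the logs themselves. Here is a rough sketch of pulling the namespace, pod, and container out of that pointer. It assumes the message always ends with the "for logs run: kubectl -n ... logs ... -c ..." form seen here; the helper name parse_log_hint is hypothetical and not part of tkn.

```python
import re

# Matches the "for logs run: kubectl -n <ns> logs <pod> -c <container>" hint
# at the end of the failure message quoted above. The pattern is an assumption
# based on that single message, not a documented tkn output format.
HINT = re.compile(r"for logs run: kubectl -n (\S+) logs (\S+) -c (\S+)")


def parse_log_hint(message):
    """Return (namespace, pod, container) from a failure message, or None."""
    match = HINT.search(message)
    return match.groups() if match else None


# The failure message from this report, split across lines for readability.
message = (
    'task publish-images has failed: '
    '"step-artifact-copy-to-tekton-bucket-nightly-4zc78-vchbw" exited with code 1 '
    '(image: "docker-pullable://google/cloud-sdk@sha256:'
    '4ef6b0e969fa96f10acfd893644d100469e979f4384e5e70f58be5cb80593a8a"); '
    'for logs run: kubectl -n default logs '
    'pipeline-release-nightly-4zc78-dv7rz-publish-images-tkjj2-pod-10c8fd '
    '-c step-artifact-copy-to-tekton-bucket-nightly-4zc78-vchbw'
)

hint = parse_log_hint(message)
print(hint)
```

The container it points at is only the step named in the status, which is not necessarily the step that actually caused the failure.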

So I run kubectl --context dogfood -n default logs pipeline-release-nightly-4zc78-dv7rz-publish-images-tkjj2-pod-10c8fd -c step-artifact-copy-to-tekton-bucket-nightly-4zc78-vchbw and I get:

2020/01/09 02:16:20 Skipping step because a previous step failed

wut

So I describe the task run (tkn --context dogfood tr describe pipeline-release-nightly-4zc78-dv7rz-publish-images-tkjj2) and there is a lot going on. It looks like technically one step failed, and then the subsequent steps were also marked as failed:

Steps
NAME                                                 STATUS
build-push-base-images                               Completed
artifact-copy-to-tekton-bucket-nightly-4zc78-vchbw   Error
link-input-bucket-to-output                          Completed
ensure-release-dirs-exist                            Completed
create-dir-builtbaseimage-67kff                      Completed
create-dir-builtcontrollerimage-zcwqk                Completed
tag-images                                           Error
create-dir-builtdigestexporterimage-q8tf2            Completed
create-dir-builtentrypointimage-hkrp5                Completed
create-dir-builtgcsfetcherimage-mrsr6                Completed
create-dir-builtgitinitimage-7l4jw                   Completed
create-dir-builtkubeconfigwriterimage-g57zb          Completed
create-dir-builtpullrequestinitimage-nq9jk           Completed
create-dir-builtwebhookimage-rd7z9                   Completed
create-dir-notification-tnr95                        Completed
create-dir-tekton-bucket-nightly-4zc78-fvbpw         Completed
create-ko-yaml                                       Completed
create-dir-bucket-hzrq5                              Completed
fetch-tekton-bucket-nightly-4zc78-4wm2m              Completed
git-source-git-source-4zc78-ncnrc                    Completed
image-digest-exporter-qtkxh                          Error
copy-to-tagged-bucket                                Completed
run-ko                                               Completed
create-dir-builtcredsinitimage-r5zpt                 Completed
upload-tekton-bucket-nightly-4zc78-rrj5r             Error

The first meaningful looking step that failed was tag-images (it's hard to tell b/c the order these appeared in isn't actually the order they were executed in), and when I used kubectl logs to get the logs of that specific container I finally found the actual error (tektoncd/plumbing#178).
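
The manual hunt described above can be sketched as a filter: ignore failed steps whose log is only the "Skipping step because a previous step failed" line, and keep the rest. This is a minimal illustration under stated assumptions. The (name, status, log_text) tuples are a hypothetical structure, not something tkn or kubectl returns, and the tag-images log text is a placeholder because the real output lives in tektoncd/plumbing#178.

```python
# Marker taken from the kubectl output quoted earlier in this report.
SKIP_MARKER = "Skipping step because a previous step failed"


def root_failures(steps):
    """Return names of failed steps whose logs are not just the skip marker.

    `steps` is a hypothetical list of (name, status, log_text) tuples that
    the caller has assembled by hand, e.g. from `tkn tr describe` output and
    one `kubectl logs` call per container.
    """
    return [
        name
        for name, status, log_text in steps
        if status == "Error" and SKIP_MARKER not in log_text
    ]


steps = [
    ("build-push-base-images", "Completed", ""),
    (
        "artifact-copy-to-tekton-bucket-nightly-4zc78-vchbw",
        "Error",
        "2020/01/09 02:16:20 Skipping step because a previous step failed",
    ),
    # Placeholder log text: the real error is in tektoncd/plumbing#178.
    ("tag-images", "Error", "<placeholder for the actual error output>"),
]

candidates = root_failures(steps)
print(candidates)
```

Because it filters on log content rather than list position, it does not depend on the order in which tkn tr describe happens to print the steps.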

Additional Info

Totally willing to be told that my cli install is just in a bad state or something XD

Maybe this is because of a mismatch in the tekton pipelines version and the CLI version? (thought: maybe the issue template would benefit from including the tekton pipelines version also?)

@danielhelfand
Member

@bobcatfish Thanks so much for this detailed write up! I think you have pointed out a couple of things here that are known issues, but this provides much more insight than what we have documented in the issues that are currently open.

#460 is a known issue with regard to steps not appearing in order of execution for tkn tr desc. I can place a priority on that before our next release.

I know that @chmouel recently opened #573 that has a similar outcome where the message is to run a kubectl command. Wondering if he can provide some insight on this.

@chmouel
Member

chmouel commented Jan 9, 2020

It's the same issue as #573 indeed. I am not really sure why we are not showing the logs for failed pods, as this seems pretty straightforward...

@vdemeester
Member

/kind bug
/kind question

tekton-robot added the kind/bug and kind/question labels Jan 10, 2020
danielhelfand added this to the 0.8.0 🐯 milestone Feb 4, 2020
16yuki0702 added a commit to 16yuki0702/cli that referenced this issue Feb 7, 2020
Related issue: tektoncd#587

When the pod status is failed, the hasTaskRunFailed method returns an error immediately,
so the pod's logs can't be shown.

This fix stops returning the error immediately; if the pod status is failed, it just logs the error to standard error.
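
The behaviour change described in the commit message above can be sketched roughly as follows. This is a minimal Python illustration, not the actual Go code in tkn. The function names, parameters, and placeholder log line are all hypothetical; only the before/after control flow is taken from the commit message.

```python
import io
import sys


def read_logs_before_fix(pod_failed, failure_message, step_logs):
    """Old flow per the commit message: a failed pod aborts before any logs."""
    if pod_failed:
        raise RuntimeError(failure_message)
    return list(step_logs)


def read_logs_after_fix(pod_failed, failure_message, step_logs, err=sys.stderr):
    """New flow: report the failure on standard error, then still return logs."""
    if pod_failed:
        print(failure_message, file=err)
    return list(step_logs)


# Hypothetical inputs: one placeholder log line and a shortened failure message.
step_logs = ["[tag-images] <placeholder log line>"]
failure_message = "task publish-images has failed"

# Capture what would normally go to stderr so the demo is self-contained.
captured_err = io.StringIO()
lines = read_logs_after_fix(True, failure_message, step_logs, err=captured_err)
print(lines)
```

With the old flow the caller only ever sees the failure message; with the new flow the same message goes to standard error and the step logs are still returned.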
tekton-robot pushed a commit that referenced this issue Feb 10, 2020
@vdemeester
Member

I think this got fixed by #691 too 👼
/close

@tekton-robot
Contributor

@vdemeester: Closing this issue.


vdemeester added a commit that referenced this issue Feb 28, 2020
#674 | [Piyush Garg] Update readme for 0.7.1 release | 2020/02/04-09:25
#678 | [Vincent Demeester] Bump gotest.tools/v3 dep to 3.0.1 | 2020/02/05-02:08
#609 | [Chmouel Boudjnah] Start doing testing against nightly | 2020/02/05-09:50
#676 | [Daniel Helfand] --all option for tkn taskrun delete | 2020/02/05-12:10
#682 | [Chmouel Boudjnah] Add flag --use-pipelinerun to start to rerun with a target pr | 2020/02/06-03:07
#677 | [Piyush Garg] Improve release docs | 2020/02/06-08:25
#681 | [Chmouel Boudjnah] Add --output by name for TaskRuns | 2020/02/10-02:32
#681 | [Chmouel Boudjnah] Add --output by name for TriggerBinding | 2020/02/10-02:32
#692 | [Daniel Helfand] add tkn cluster delete --all | 2020/02/10-04:22
#694 | [Piyush Garg] Add --step flag for taskrun logs | 2020/02/10-09:30
null | [Daniel Helfand] --all option for tkn el delete | 2020/02/10-15:01
null | [Daniel Helfand] --all option for tkn resource delete | 2020/02/10-15:24
null | [Daniel Helfand] -all option for tkn condition delete | 2020/02/10-15:25
null | [16yuki0702] Fix bug about Complete logs not shown for failed TaskRun related issue is #587 | 2020/02/10-15:38
null | [Daniel Helfand] --all option for tkn pipelinerun delete | 2020/02/10-16:37
null | [Chmouel Boudjnah] Shows --- for empty values on "DEFAULT VALUES" column | 2020/02/11-02:38
null | [Chmouel Boudjnah] Makefile fixes | 2020/02/11-05:20
null | [Chmouel Boudjnah] Update github pull request templates | 2020/02/11-05:20
null | [Chmouel Boudjnah] Add pipeline version information | 2020/02/12-01:36
null | [Daniel Helfand] --all option for tkn triggertemplate delete | 2020/02/12-01:49
null | [Chmouel Boudjnah] Fix help command | 2020/02/12-03:22
null | [Chmouel Boudjnah] Fix yaml errors | 2020/02/12-08:00
null | [Chmouel Boudjnah] Update presubmit jobs to use make lint | 2020/02/12-08:00
null | [Chmouel Boudjnah] Add support for filtering by label on pipelinerun | 2020/02/12-08:19
null | [Chmouel Boudjnah] Add support for filtering by label on taskruns | 2020/02/12-08:19
null | [Daniel Helfand] --all option for tkn triggerbinding delete | 2020/02/12-16:03
null | [Chmouel Boudjnah] Add --last support to `tkn pr logs` | 2020/02/13-01:51
null | [Daniel Helfand] nil check for lr.Stream for taskrun logs | 2020/02/13-02:08
null | [Pradeep Kumar] Fix typo | 2020/02/14-06:44
null | [Chmouel Boudjnah] Modify test names as they are pipelinerun not pipeline tests | 2020/02/14-12:30
null | [Chmouel Boudjnah] Add --last flag to tkn taskrun logs | 2020/02/14-12:30
null | [Daniel Helfand] --prefix-name option for tkn pipeline start | 2020/02/17-03:32
null | [Daniel Helfand] add warning for --all flags for pipeline and task delete | 2020/02/17-03:33
null | [Vincent Demeester] Update golangci-lint target 🍵 | 2020/02/18-03:38
null | [Chmouel Boudjnah] Generate manpages at the same time as make docs | 2020/02/18-04:42
null | [Chmouel Boudjnah] Make sure golden files have been generated | 2020/02/18-04:42
null | [Pradeep Kumar] Adds timeout to pipeline start | 2020/02/18-07:50
null | [Chmouel Boudjnah] Add fuzzyfinder library | 2020/02/18-09:08
null | [Chmouel Boudjnah] Add fuzzy finder selection for pipelinerun and taskrun | 2020/02/18-09:08
null | [Chmouel Boudjnah] Increase default limit when using with fzf | 2020/02/18-09:08
null | [16yuki0702] Refactoring log writer. related issue is #708 | 2020/02/19-02:24
null | [Vincent Demeester] Use -mod=vendor for golangci-lint too 👼 | 2020/02/19-03:36
null | [Vincent Demeester] Fix linting on pkg/log 🚈 | 2020/02/19-03:36
null | [Chmouel Boudjnah] Add --use-taskrun for taskrun | 2020/02/19-07:26
null | [Daniel Helfand] show sidecar names with tkn tr desc | 2020/02/24-06:08
null | [Daniel Helfand] add warning message for tkn task start --timeout | 2020/02/26-02:18
null | [16yuki0702] Refactoring log reader. related issue is #748 | 2020/02/26-08:14
null | [Daniel Helfand] --prefix-name option for tkn task start | 2020/02/27-02:03
null | [Daniel Helfand] add error message for deletion when args=0 and no --all flag | 2020/02/27-02:15
null | [Vincent Demeester] Remove trigger alias for start subcommands 🚉 | 2020/02/27-07:37
null | [Chmouel Boudjnah] Add --keep to --all, to keep the last N pipelineruns | 2020/02/27-09:32
null | [Vincent Demeester] Add --keep to --all, to keep the last N taskruns | 2020/02/27-09:32
null | [Vincent Demeester] Enhance the error message for --keep ⌨ | 2020/02/27-09:32
null | [Vincent Demeester] Pin the go version in the release pipeline 🖊 | 2020/02/27-11:03
null | [Vincent Demeester] Use -mod=vendor during the release 👼 | 2020/02/27-11:59

Signed-off-by: Vincent Demeester <[email protected]>