Support launching shell on a build step for debugging #2813

Closed
wants to merge 1 commit into from

ktock
Collaborator

@ktock ktock commented Apr 19, 2022

It would be great if BuildKit supported interactive debugging of each build step by launching a shell via buildctl. This patch is an initial PoC toward that feature. Any feedback on the current design and implementation is very welcome.

  • The digest of each vertex is printed in the progress output (only plain is supported as of now).
  • The --debug-build option of buildctl build defers the cleanup of resources related to the build job until the user explicitly runs buildctl debug close <JOB-ID>.
  • buildctl debug shell lets the user launch a shell on an arbitrary vertex as long as its op supports process execution (only execOp is supported as of now).

The following is an example of debugging a failed build.

# mkdir -p /tmp/ctx && cat <<EOF > /tmp/ctx/Dockerfile
FROM registry2-buildkit:5000/ubuntu:20.04-org
RUN echo hello > hello
RUN cat /non-existing-file
EOF
# buildctl build --debug-build --progress=plain --frontend=dockerfile.v0 \
               --local context=/tmp/ctx --local dockerfile=/tmp/ctx \
               --output=type=oci,dest=/tmp/img.tar
#1 [sha256:bd4dbb498154ebdadcaa164282398cd817706c880f9f846ec88591256b0a69f3] [internal] load build definition from Dockerfile
#1 transferring dockerfile: 133B done
#1 DONE 0.1s

#2 [sha256:f4f1cdac588c5f18e7a2b5540b711cad8f609692762dacf34af5853059383022] [internal] load .dockerignore
#2 transferring context: 2B done
#2 DONE 0.1s

#3 [sha256:ad2e650bcbd0897a4f4c4f3f50466acd5cd1d004cfc8988e84338a842cd310d3] [internal] load metadata for registry2-buildkit:5000/ubuntu:20.04-org
#3 DONE 0.1s

#4 [sha256:87a9334cb54492620b000088bc83271b24305352c7448757ed02924cf1a280a7] [auth] sharing credentials for registry2-buildkit:5000
#4 DONE 0.0s

#5 [sha256:43d046edee2ed6c17447726ae6b565b9b9cafc413467f31c5f1208287771fdc3] [1/3] FROM registry2-buildkit:5000/ubuntu:20.04-org@sha256:adf73ca014822ad8237623d388cedf4d5346aa72c270c5acc01431cc93e18e2d
#5 resolve registry2-buildkit:5000/ubuntu:20.04-org@sha256:adf73ca014822ad8237623d388cedf4d5346aa72c270c5acc01431cc93e18e2d 0.0s done
#5 DONE 0.0s

#5 [sha256:43d046edee2ed6c17447726ae6b565b9b9cafc413467f31c5f1208287771fdc3] [1/3] FROM registry2-buildkit:5000/ubuntu:20.04-org@sha256:adf73ca014822ad8237623d388cedf4d5346aa72c270c5acc01431cc93e18e2d
#5 sha256:345e3491a907bb7c6f1bdddcf4a94284b8b6ddd77eb7d93f09432b17b20f2bbe 25.17MB / 28.54MB 0.2s
#5 extracting sha256:345e3491a907bb7c6f1bdddcf4a94284b8b6ddd77eb7d93f09432b17b20f2bbe
#5 extracting sha256:345e3491a907bb7c6f1bdddcf4a94284b8b6ddd77eb7d93f09432b17b20f2bbe 0.8s done
#5 extracting sha256:57671312ef6fdbecf340e5fed0fb0863350cd806c92b1fdd7978adbd02afc5c3 0.0s done
#5 extracting sha256:5e9250ddb7d0fa6d13302c7c3e6a0aa40390e42424caed1e5289077ee4054709 0.0s done
#5 DONE 1.1s

#6 [sha256:089a986c1594039a21d6699c07a3cd969ea2d63f73f44893e312fb71cc1fc21a] [2/3] RUN echo hello > hello
#6 DONE 0.2s

#7 [sha256:9c219f0b87585cb92cb1a4a8727c412af26ce896f2d5bb05a2110e0602f0bc36] [3/3] RUN cat /non-existing-file
#0 0.137 cat: /non-existing-file: No such file or directory
#7 ERROR: process "/bin/sh -c cat /non-existing-file" did not complete successfully: exit code: 1
INFO[2022-04-19T15:35:16Z] canceled watching status                      error="rpc error: code = Canceled desc = context canceled" spanID=e4f710ddec511afa traceID=fee098b9089fb78e9e760c4813c79552
INFO[2022-04-19T15:35:16Z] debug for build "oywfw5gz2f2avxvtmk4b18mru" is enabled  spanID=e4f710ddec511afa traceID=fee098b9089fb78e9e760c4813c79552
------
 > [3/3] RUN cat /non-existing-file:
#0 0.137 cat: /non-existing-file: No such file or directory
------
Dockerfile:3
--------------------
   1 |     FROM registry2-buildkit:5000/ubuntu:20.04-org
   2 |     RUN echo hello > hello
   3 | >>> RUN cat /non-existing-file
   4 |     
--------------------
error: failed to solve: process "/bin/sh -c cat /non-existing-file" did not complete successfully: exit code: 1

This build can be debugged using the ID printed in the above log (oywfw5gz2f2avxvtmk4b18mru).

INFO[2022-04-19T15:35:16Z] debug for build "oywfw5gz2f2avxvtmk4b18mru" is enabled  spanID=e4f710ddec511afa traceID=fee098b9089fb78e9e760c4813c79552
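
As a rough illustration (not part of this patch), a wrapper script or tool could pick the job ID out of that log line instead of copying it by hand. The sketch below assumes the message format stays exactly as shown above; jobIDFromLog is a hypothetical helper, not a BuildKit function.

```go
package main

import (
	"fmt"
	"regexp"
)

// debugEnabledRe matches the message printed by this PoC:
//
//	debug for build "<JOB-ID>" is enabled
//
// The format is taken from the log above and is an assumption; it may change.
var debugEnabledRe = regexp.MustCompile(`debug for build "([^"]+)" is enabled`)

// jobIDFromLog (hypothetical helper) returns the job ID found in a log line,
// or false if the line does not contain the message.
func jobIDFromLog(line string) (string, bool) {
	m := debugEnabledRe.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	line := `INFO[2022-04-19T15:35:16Z] debug for build "oywfw5gz2f2avxvtmk4b18mru" is enabled  spanID=e4f710ddec511afa`
	if id, ok := jobIDFromLog(line); ok {
		fmt.Println(id) // prints oywfw5gz2f2avxvtmk4b18mru
	}
}
```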

For example, let's debug RUN cat /non-existing-file.
In the above log, this RUN execution is printed as follows:

#7 [sha256:9c219f0b87585cb92cb1a4a8727c412af26ce896f2d5bb05a2110e0602f0bc36] [3/3] RUN cat /non-existing-file

This line includes the digest of the vertex where the RUN is executed. We can pass it to the buildctl debug shell command to launch a shell on that vertex.
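
For tooling around this, the digest can be read from the plain progress output with a small parser. This is only a sketch under the assumption that vertex header lines keep the `#<n> [sha256:...] <name>` shape shown in the log above; parseVertexHeader and vertexHeader are hypothetical names, not part of buildctl.

```go
package main

import (
	"fmt"
	"regexp"
)

// vertexLineRe matches a vertex header line in the plain progress output
// of this PoC: "#<n> [sha256:<64 hex chars>] <name>".
// Status lines such as "#7 DONE 0.2s" carry no bracketed digest and do not match.
var vertexLineRe = regexp.MustCompile(`^#(\d+) \[(sha256:[0-9a-f]{64})\] (.*)$`)

// vertexHeader (hypothetical type) holds the fields of one header line.
type vertexHeader struct {
	Index  string // step number as printed, e.g. "7"
	Digest string // vertex digest, e.g. "sha256:9c21..."
	Name   string // human-readable step name
}

// parseVertexHeader (hypothetical helper) parses one progress line.
func parseVertexHeader(line string) (vertexHeader, bool) {
	m := vertexLineRe.FindStringSubmatch(line)
	if m == nil {
		return vertexHeader{}, false
	}
	return vertexHeader{Index: m[1], Digest: m[2], Name: m[3]}, true
}

func main() {
	line := "#7 [sha256:9c219f0b87585cb92cb1a4a8727c412af26ce896f2d5bb05a2110e0602f0bc36] [3/3] RUN cat /non-existing-file"
	if v, ok := parseVertexHeader(line); ok {
		fmt.Println(v.Digest)
		// prints sha256:9c219f0b87585cb92cb1a4a8727c412af26ce896f2d5bb05a2110e0602f0bc36
	}
}
```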

buildctl debug shell oywfw5gz2f2avxvtmk4b18mru sha256:9c219f0b87585cb92cb1a4a8727c412af26ce896f2d5bb05a2110e0602f0bc36
# ls /
bin   dev  hello  lib	 lib64	 media	opt   root  sbin  sys  usr
boot  etc  home   lib32  libx32  mnt	proc  run   srv   tmp  var
# cat /hello
hello
# cat /non-existing-file
cat: /non-existing-file: No such file or directory

Finally, job-related resources need to be released using buildctl debug close oywfw5gz2f2avxvtmk4b18mru.

  • Some considerations

    • Should we limit the shell's access to some resources in the container?
    • The control service now relies heavily on the github.com/moby/buildkit/frontend/gateway package for launching a shell process. Maybe we should move the process-execution logic into a separate common package.
  • TODOs

    • Add more tests and comments
    • Better UI
    • Display vertex digests in non-plain progress modes as well
    • Jobs that remain for a very long time (e.g. a day) should be forcibly cleaned up
  • Future work

    • Add source-related information to each progress entry to enable further debugging features (e.g. breakpoints)

@tonistiigi
Member

PTAL #1472

I think we should start with an interactive container on the build result, on the error point, and on the last snapshot before the error point. The LLB digests are not stable, so they are not very useful for this. To debug a specific point of the build, I think some contract with the frontend is needed.
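
One way to read the stability concern: a vertex digest is derived from the op's content together with the digests of its inputs, so a change anywhere upstream changes the digest of every later vertex. The toy model below illustrates only that property. vertexDigest is a hypothetical stand-in, not how BuildKit actually marshals or hashes LLB, and it does not cover other reasons a digest might change.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// vertexDigest is a toy content-addressed ID (an assumption for illustration):
// it hashes the op's own description together with the digests of its inputs.
func vertexDigest(op string, inputs ...string) string {
	h := sha256.New()
	h.Write([]byte(op))
	for _, in := range inputs {
		h.Write([]byte{0}) // separator between fields
		h.Write([]byte(in))
	}
	return fmt.Sprintf("sha256:%x", h.Sum(nil))
}

func main() {
	base := vertexDigest("FROM ubuntu:20.04")

	// Original build: two RUN steps chained on the base image.
	run1 := vertexDigest("RUN echo hello > hello", base)
	cat1 := vertexDigest("RUN cat /non-existing-file", run1)

	// Edit only the first RUN; the second RUN is textually unchanged.
	run2 := vertexDigest("RUN echo hi > hello", base)
	cat2 := vertexDigest("RUN cat /non-existing-file", run2)

	// The unchanged step still gets a different digest.
	fmt.Println(cat1 == cat2) // prints false
}
```

A digest noted from one build therefore cannot be relied on to identify "the same" step in the next one, which is why a frontend-level contract (for example, source locations) may be a better handle.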

Why are there so many solver/gateway changes in this PR? My understanding was (based also on my own similar PoC attempts) that we have already added all the features required for debugging support in current releases and that no daemon update is needed. If some internal updates or refactoring are needed, they should be done separately. If there are new features, then they need new caps etc.

Also note that buildctl is considered an API test client. Some basic debugging capabilities are OK here, but it will soon get very opinionated, which is a better fit for docker buildx. Apart from possible "debugger image" support, I'm not sure how reusable the client-side code would be, and I don't want to maintain two versions if they both grow pretty big.

@ktock
Collaborator Author

ktock commented Apr 20, 2022

@tonistiigi

> PTAL #1472
> I think we should start with an interactive container on the build result, on the error point, and on the last snapshot before the error point. The LLB digests are not stable, so they are not very useful for this. To debug a specific point of the build, I think some contract with the frontend is needed.

Thank you for the pointer. I'll take a look at that. Are there any ongoing PRs?

> Why are there so many solver/gateway changes in this PR?

The changes are mainly for enabling the client to execute a container with the same configuration and mounts as an arbitrary execOp. This commit also adds a change deferring solver.Job.Discard() to allow the client to access the job until the debugging is done.
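
To make the deferral idea concrete, here is a minimal sketch of the pattern, assuming a simple in-memory registry keyed by job ID. debugJobs, finish, and closeJob are hypothetical names used only for illustration; this is not the code in this PR and not BuildKit's solver API.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// debugJobs (hypothetical) parks the cleanup of finished jobs that were
// started with debugging enabled, until the client explicitly closes them.
type debugJobs struct {
	mu   sync.Mutex
	jobs map[string]func() // job ID -> deferred cleanup
}

func newDebugJobs() *debugJobs {
	return &debugJobs{jobs: make(map[string]func())}
}

// finish is called when a build ends. Without debugging the cleanup runs
// immediately; with debugging it is kept so that the job stays accessible.
func (d *debugJobs) finish(id string, debug bool, discard func()) {
	if !debug {
		discard()
		return
	}
	d.mu.Lock()
	defer d.mu.Unlock()
	d.jobs[id] = discard
}

// closeJob runs the parked cleanup, mirroring "buildctl debug close <JOB-ID>".
func (d *debugJobs) closeJob(id string) error {
	d.mu.Lock()
	discard, ok := d.jobs[id]
	delete(d.jobs, id)
	d.mu.Unlock()
	if !ok {
		return errors.New("no such debug job: " + id)
	}
	discard()
	return nil
}

func main() {
	d := newDebugJobs()
	d.finish("oywfw5gz2f2avxvtmk4b18mru", true, func() { fmt.Println("job discarded") })
	fmt.Println("build finished, job kept for debugging")
	if err := d.closeJob("oywfw5gz2f2avxvtmk4b18mru"); err != nil {
		fmt.Println("error:", err)
	}
}
```

A registry like this has no expiry, so jobs that are never closed would leak; that is the forced-cleanup item in the TODO list.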

> Also note that buildctl is considered an API test client. Some basic debugging capabilities are OK here, but it will soon get very opinionated, which is a better fit for docker buildx.

Thank you for the suggestion. I'll take a look at buildx as well.

@tonistiigi
Member

> This commit also adds a change deferring solver.Job.Discard() to allow the client to access the job until the debugging is done.

This lifecycle should be covered by the client.Build() call. As long as Build() is active, you can debug its internal Solve calls and the results and errors of these solves. I think it is OK if client.Solve() is not debuggable. We might want to deprecate and remove it as well. At the moment it serves as a shortcut for a specific version of the client.Build call.

@ktock
Collaborator Author

ktock commented May 9, 2022

#2835 (comment)
