Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

monitor: Enable to exec into the container #1626

Merged
merged 1 commit into from
Mar 7, 2023

Conversation

ktock
Copy link
Collaborator

@ktock ktock commented Feb 14, 2023

Related to #1104

PR6
Add exec command to execute new processes in the same debug container. All processes are attachable as described in the previous section.

exec execs a new command in the container

$ BUILDX_EXPERIMENTAL=1 /tmp/out/buildx build --invoke sh /tmp/ctx
...
(buildx) exec ps
Process "6zqrfs121s8fqo097mjf7j1t4" started. Press Ctrl-a-c to switch to that process.
(buildx) Switched IO


PID   USER     TIME  COMMAND
    1 root      0:00 sh
    7 root      0:00 ps
Switched IO
(buildx) exec ls
Process "57c0vhpkpo621nla15b2pj8oh" started. Press Ctrl-a-c to switch to that process.
(buildx) Switched IO


bin     etc     hello2  home    root    tmp     var
dev     hello   hello3  proc    sys     usr
Switched IO
(buildx) exec sh
Process "njnu3unnfb5zf5omot93yh6ap" started. Press Ctrl-a-c to switch to that process.
(buildx) Switched IO


/ # 
/ # ps
PID   USER     TIME  COMMAND
    1 root      0:00 sh
    8 root      0:00 sh
   14 root      0:00 ps

attach attaches IO to the specified process.

$ BUILDX_EXPERIMENTAL=1 /tmp/out/buildx build --invoke sh /tmp/ctx
...
Interactive container was restarted with process "tmb96qnbc2q7xfjhochczysga". Press Ctrl-a-c to switch to the new container
(buildx) exec sh
Process "ooybngbcn0nnz584i8xvgbnii" started. Press Ctrl-a-c to switch to that process.
<C-a-c>
(buildx) exec sh
Process "xy51w8j7lzll5cduwfwdm7w50" started. Press Ctrl-a-c to switch to that process.
<C-a-c>
(buildx) exec sh
Process "yx8of20854f0jkd1ga3d9odan" started. Press Ctrl-a-c to switch to that process.
<C-a-c>
(buildx) ps
PID				CURRENT_SESSION	COMMAND
ooybngbcn0nnz584i8xvgbnii	false		[sh]
tmb96qnbc2q7xfjhochczysga	false		[sh]
xy51w8j7lzll5cduwfwdm7w50	false		[sh]
yx8of20854f0jkd1ga3d9odan	true		[sh]
(buildx) attach ooybngbcn0nnz584i8xvgbnii
Switched IO
Attached to process "ooybngbcn0nnz584i8xvgbnii". Press Ctrl-a-c to switch to the new container

/ # ps
PID   USER     TIME  COMMAND
    1 root      0:00 sh
    7 root      0:00 sh
   13 root      0:00 sh
   19 root      0:00 sh
   25 root      0:00 ps
/ # echo $$
7
/ # Switched IO
(buildx) attach tmb96qnbc2q7xfjhochczysga
Switched IO
Attached to process "tmb96qnbc2q7xfjhochczysga". Press Ctrl-a-c to switch to the new container

/ # echo $$
1
/ # Switched IO
(buildx) attach xy51w8j7lzll5cduwfwdm7w50
Switched IO
Attached to process "xy51w8j7lzll5cduwfwdm7w50". Press Ctrl-a-c to switch to the new container

/ # echo $$
13
/ # Switched IO
(buildx) attach yx8of20854f0jkd1ga3d9odan
Switched IO
Attached to process "yx8of20854f0jkd1ga3d9odan". Press Ctrl-a-c to switch to the new container

/ # echo $$
19

note

  • This commit includes a change to docker-container driver to always include init process in the driver container. This is for making sure to clean up exited processes created by BuildKit's container API.

Copy link
Collaborator

@jedevc jedevc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shared some initial thoughts - but I think generally, it works! 🎉 I'll definitely make sure that we don't stall on this PR as we did on #1296 😢😢

Aside from the technical things, we definitely need to make sure we revisit the DX here (probably after all of #1104 is completed) - the combination of commands we have some weird confusions, like kill kills the server instead of processes spawned with exec and seen with ps. I gave some suggestions in a comment on session lifetimes, so we may actually want to get rid of the explicit kill command in the server at some point (or maybe just rename it if we really want it).

controller/local/controller.go Outdated Show resolved Hide resolved
controller/local/controller.go Outdated Show resolved Hide resolved
controller/remote/server.go Outdated Show resolved Hide resolved
build/build.go Outdated Show resolved Hide resolved
build/build.go Outdated Show resolved Hide resolved
build/build.go Outdated Show resolved Hide resolved
build/build.go Outdated
}
close(doneCh)
}()
func (c *Container) IsUnavailable() bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to have a separate method to mark this as unavailable? Atm, I think what we'll get is a separate container every time all the processes in that container terminate? e.g.:

Launching interactive container. Press Ctrl-a-c to switch to monitor console
Interactive container was restarted with process "7vs69sixlm1osw0jywealy8j3". Press Ctrl-a-c to switch to the new container
Switched IO
Switched IO


/ # 
/ # touch x
/ # ls
bin    etc    lib    mnt    proc   run    srv    tmp    var    x
dev    home   media  opt    root   sbin   sys    usr    work
/ # 
Switched IO
(buildx) exec sh
Process "fa8khvxhu0xkre6lqvpr4pz8i" started. Press Ctrl-a-c to switch to that process.
(buildx) Switched IO


/ # 
/ # ls
bin    etc    lib    mnt    proc   run    srv    tmp    var
dev    home   media  opt    root   sbin   sys    usr    work
/ # 

I think that we could simplify this by making sure that the lifetime of the container corresponds to the lifetime of the debugging session, or until the user calls rollback (so this might actually play weirdly with 56b9e78 from #1620 and leak resources forever).

I think we need to work out exactly what the lifetime semantics of the debugging session should be, but generally I think we should follow:

  • Processes are terminated up 1. when they exit, or 2. when their parent session terminates
  • A session should terminate 1. when it is explicitly terminated by the user, 2. when its parent controller is terminated (for local, when the calling process ends, for remote, when the server terminates), or 3. after a deadline during which no user interaction has occurred
    • The deadline is useful, so that a user can run a build, watch it fail, and then connect to the session to investigate and invoke, without needing to re-trigger the build.
  • The remote server should terminate itself when 1. the user requests it to using kill, or 2. it has no more sessions (so that after the deadlines of all the sessions have expired, we can cleanly exit).

We can refine these over time, but TL;DR I think the right call here is to not clean up the container when all the processes are terminated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(also we seem to get a strange newline from stdin in the code example above, I'm not sure where it comes from 🤔)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall SGTM. Fixed not to mark it as unavailable except 2 cases:

  • 1: When PID1 exits. Currently PID1 is the first command executed by the user in the container (e.g. command specified by --invoke). In the future, maybe we can introduce a special PID1 process (e.g. tini?) for keeping the container available even when all user-specified commands exit.
  • 2: When one of the processes returns non-zero exit code. In this case, BuildKit's container API kills the container. To fix this behaviour, we need to fix BuildKit first so we should do this as a long-term following up work.

curRef = ref
}
fmt.Fprintf(stdout, "Attached to process %q. Press Ctrl-a-c to switch to the new container\n", id)
case "exec":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably something for a follow-up, but I've noticed that this switch statement keeps growing and getting more and more indented 😢

We should probably try and work out how to get all of this functionality into separate methods on the monitor struct, to try and make it a little easier to have modular functionality.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the suggestion. It can be a large refactoring so let's work in a following-up PR.

controller/pb/controller.proto Show resolved Hide resolved
build/build.go Outdated Show resolved Hide resolved
@ktock ktock force-pushed the monitor-exec branch 2 times, most recently from d5b6705 to 6e45206 Compare February 16, 2023 09:54
@ktock
Copy link
Collaborator Author

ktock commented Feb 16, 2023

Thanks for the comments. Fixed the patch based on the comments.
Now processes are managed on the server side so monitor can switch IO among processes in different sessions without killing processes.

exec a process in a session:

(buildx) exec ps
Process "lg4atdbxa3al3l7l42lgo5sq3" started. Press Ctrl-a-c to switch to that process.
(buildx) Switched IO
PID   USER     TIME  COMMAND
    1 root      0:00 sh
    6 root      0:00 ps
Switched IO
(buildx) exec sh
Process "nwkw4tpg1zngwu1mkz600yrfe" started. Press Ctrl-a-c to switch to that process.
(buildx) Switched IO
/ # ps
PID   USER     TIME  COMMAND
    1 root      0:00 sh
   11 root      0:00 sh
   17 root      0:00 ps
/ # echo $$
11

attach to another process in the same session:

(buildx) ps
PID				CURRENT_SESSION	COMMAND
zncafsibzg3i6zriaviwsmzxw	false		[sh]
nwkw4tpg1zngwu1mkz600yrfe	true		[sh]
(buildx) Switched IO

/ # Switched IO
(buildx) attach zncafsibzg3i6zriaviwsmzxw
Attached to process "zncafsibzg3i6zriaviwsmzxw". Press Ctrl-a-c to switch to the new container
(buildx) Switched IO

/ # echo $$
1

attach to another process in another session:
The following attaches to process 3mqxm55ww9bcguidmkwmh96u2 in another session 1vo8nbia21t6rprjjtrmph3rv.

(buildx) list
ID				CURRENT_SESSION
1vo8nbia21t6rprjjtrmph3rv	false
y0vvv1ta2bkpgt8osm0yhpzbd	true
(buildx) attach 1vo8nbia21t6rprjjtrmph3rv
Attached to process "". Press Ctrl-a-c to switch to the new container
(buildx) ps
PID				CURRENT_SESSION	COMMAND
3mqxm55ww9bcguidmkwmh96u2	false		[sh]
(buildx) attach 3mqxm55ww9bcguidmkwmh96u2
Attached to process "3mqxm55ww9bcguidmkwmh96u2". Press Ctrl-a-c to switch to the new container
(buildx) Switched IO

/ # echo $$
1

If other session already attached to this process, this will be automatically detached.

attach back to the original process in the original session (process nwkw4tpg1zngwu1mkz600yrfe in y0vvv1ta2bkpgt8osm0yhpzbd):

(buildx) list
ID				CURRENT_SESSION
1vo8nbia21t6rprjjtrmph3rv	true
y0vvv1ta2bkpgt8osm0yhpzbd	false
(buildx) attach y0vvv1ta2bkpgt8osm0yhpzbd
Attached to process "". Press Ctrl-a-c to switch to the new container
(buildx) ps
PID				CURRENT_SESSION	COMMAND
zncafsibzg3i6zriaviwsmzxw	false		[sh]
nwkw4tpg1zngwu1mkz600yrfe	false		[sh]
(buildx) attach nwkw4tpg1zngwu1mkz600yrfe
Attached to process "nwkw4tpg1zngwu1mkz600yrfe". Press Ctrl-a-c to switch to the new container
(buildx) Switched IO

/ # echo $$
11

Copy link
Collaborator

@jedevc jedevc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ktock ❤️ This already looks a lot cleaner to me, just had a few more minor comments.

I noticed you also opened #1640, which seems to complete #1104? I'll take a look at that one asap as well, but ideally before moving forward past that into the next steps, we can dive into the TODOs left in the code as well as tidy up the lifetimes stuff, and clean up the UX before going too much further? I don't want to block merging this or any of the other debug PRs in, but I do want to make sure we have a strong foundation before trying to add too much more functionality (like breakpoints, etc).

build/invoke.go Outdated Show resolved Hide resolved
build/invoke.go Outdated
}

func (c *Container) Exec(ctx context.Context, cfg *InvokeConfig) error {
killCh := make(chan struct{})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed this here, but also in some other places: I'd prefer to use contexts instead of channels for the purpose of sending cancel signals.

When ctx passed in here is done, then we'd send the kill signal, but I'm not sure why we can't just listen for ctx.Done in exec.

In general, I think it makes sense to use channels for cases where we explicitly need low-level concurrent primitives, or where we need to send messages between routines, but we should use contexts wherever possible to handle cancelling, for consistency with the rest of the codebase (in buildx and in buildkit).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

build/invoke.go Outdated Show resolved Hide resolved
// Invoke starts an IO session into the specified process.
// If pid doesn't matche to any running processes, it starts a new process with the specified config.
// If there is no container running or InvokeConfig.Rollback is speicfied, the process will start in a newly created container.
// NOTE: If needed, in the future, we can split this API into three APIs (NewContainer, NewProcess and Attach).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I prefer the suggested semantics over the current Invoke call - at the moment, it's a bit like Invoke does a bit of everything.

That said, I think it's good for now, but it would good to see a follow-up for this 😄

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. Let's work on the refactor on this API in the following-up PRs.

controller/processes/processes.go Outdated Show resolved Hide resolved
statusChan chan *client.SolveStatus
result *build.ResultContext
inputPipe *io.PipeWriter
cancelBuild func()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is running multiple builds in a single session a supported scenario? If not (I'm fairly sure this is the case), we need to explicitly handle this and error out, if yes, then we need to make sure that subsequent builds don't lose the previous cancel handle.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it isn't. Fixed to return an error if a build happens during another ongoing.

monitor/monitor.go Outdated Show resolved Hide resolved
Comment on lines +233 to +238
plist, err := c.ListProcesses(ctx, curRef)
if err != nil {
fmt.Println("cannot list process:", err)
continue
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point, as a follow-up it would be nice if we could use the actual PIDs in the container instead of the random identity.NewIDs - at the moment BuildKit doesn't expose that information though, so we'd need to make changes (probably in StartedMessage).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. SGTM.

monitor/monitor.go Outdated Show resolved Hide resolved
@ktock
Copy link
Collaborator Author

ktock commented Feb 22, 2023

@jedevc Thank you for the review. Fixed the patch based on the comments,

@ktock
Copy link
Collaborator Author

ktock commented Feb 22, 2023

I noticed you also opened #1640, which seems to complete #1104? I'll take a look at that one asap as well, but ideally before moving forward past that into the next steps, we can dive into the TODOs left in the code as well as tidy up the lifetimes stuff, and clean up the UX before going too much further? I don't want to block merging this or any of the other debug PRs in, but I do want to make sure we have a strong foundation before trying to add too much more functionality (like breakpoints, etc).

SGTM 👍

build/invoke.go Outdated
)

// InvokeConfig is configuration for a process to run in the container.
type InvokeConfig struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should remove this struct, and instead use controllerapi.InvokeConfig, it would be good to avoid having multiple duplicate struct. The main difference seems to be that we obviously can't attach io objects to it, we can pass those as separate args (maybe using ioset.In to stick them together).

Note on having the duplicate User and NoUser fields to represent an optional value (which if we were using a modern version of protobuf, we could use the optional field in https://github.com/protocolbuffers/protobuf/blob/main/docs/field_presence.md). I guess there's not reason we have to use gogo proto for buildx like we do for buildkit (we don't care about version compatibility), we could just switch to using upstream? https://pkg.go.dev/google.golang.org/protobuf. Definitely follow-up material 🎉 cc @crazy-max since you've done protobuf investigations.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the comment. Removed InvokeConfig.

controller/local/controller.go Outdated Show resolved Hide resolved
controller/local/controller.go Show resolved Hide resolved
@jedevc
Copy link
Collaborator

jedevc commented Mar 2, 2023

These should just be nits now, any major changes we can do in follow-ups 👍

Copy link
Collaborator

@jedevc jedevc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, happy to address the unresolved comments as follow-ups later.

PTAL @tonistiigi 😄

@ktock
Copy link
Collaborator Author

ktock commented Mar 7, 2023

@tonistiigi PTAL 🙏

@tonistiigi tonistiigi merged commit c5ce08b into docker:master Mar 7, 2023
@ktock ktock deleted the monitor-exec branch March 8, 2023 01:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants