-
Notifications
You must be signed in to change notification settings - Fork 503
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
monitor: Enable to exec into the container #1626
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shared some initial thoughts - but I think generally, it works! 🎉 I'll definitely make sure that we don't stall on this PR as we did on #1296 😢😢
Aside from the technical things, we definitely need to make sure we revisit the DX here (probably after all of #1104 is completed) - the combination of commands we have some weird confusions, like kill
kills the server instead of processes spawned with exec
and seen with ps
. I gave some suggestions in a comment on session lifetimes, so we may actually want to get rid of the explicit kill command in the server at some point (or maybe just rename it if we really want it).
build/build.go
Outdated
} | ||
close(doneCh) | ||
}() | ||
func (c *Container) IsUnavailable() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to have a separate method to mark this as unavailable? Atm, I think what we'll get is a separate container every time all the processes in that container terminate? e.g.:
Launching interactive container. Press Ctrl-a-c to switch to monitor console
Interactive container was restarted with process "7vs69sixlm1osw0jywealy8j3". Press Ctrl-a-c to switch to the new container
Switched IO
Switched IO
/ #
/ # touch x
/ # ls
bin etc lib mnt proc run srv tmp var x
dev home media opt root sbin sys usr work
/ #
Switched IO
(buildx) exec sh
Process "fa8khvxhu0xkre6lqvpr4pz8i" started. Press Ctrl-a-c to switch to that process.
(buildx) Switched IO
/ #
/ # ls
bin etc lib mnt proc run srv tmp var
dev home media opt root sbin sys usr work
/ #
I think that we could simplify this by making sure that the lifetime of the container corresponds to the lifetime of the debugging session, or until the user calls rollback
(so this might actually play weirdly with 56b9e78 from #1620 and leak resources forever).
I think we need to work out exactly what the lifetime semantics of the debugging session should be, but generally I think we should follow:
- Processes are terminated up 1. when they exit, or 2. when their parent session terminates
- A session should terminate 1. when it is explicitly terminated by the user, 2. when its parent controller is terminated (for local, when the calling process ends, for remote, when the server terminates), or 3. after a deadline during which no user interaction has occurred
- The deadline is useful, so that a user can run a build, watch it fail, and then connect to the session to investigate and invoke, without needing to re-trigger the build.
- The remote server should terminate itself when 1. the user requests it to using
kill
, or 2. it has no more sessions (so that after the deadlines of all the sessions have expired, we can cleanly exit).
We can refine these over time, but TL;DR I think the right call here is to not clean up the container when all the processes are terminated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(also we seem to get a strange newline from stdin in the code example above, I'm not sure where it comes from 🤔)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall SGTM. Fixed not to mark it as unavailable except 2 cases:
- 1: When PID1 exits. Currently PID1 is the first command executed by the user in the container (e.g. command specified by
--invoke
). In the future, maybe we can introduce a special PID1 process (e.g. tini?) for keeping the container available even when all user-specified commands exit. - 2: When one of the processes returns non-zero exit code. In this case, BuildKit's container API kills the container. To fix this behaviour, we need to fix BuildKit first so we should do this as a long-term following up work.
curRef = ref | ||
} | ||
fmt.Fprintf(stdout, "Attached to process %q. Press Ctrl-a-c to switch to the new container\n", id) | ||
case "exec": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably something for a follow-up, but I've noticed that this switch statement keeps growing and getting more and more indented 😢
We should probably try and work out how to get all of this functionality into separate methods on the monitor struct, to try and make it a little easier to have modular functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the suggestion. It can be a large refactoring so let's work in a following-up PR.
d5b6705
to
6e45206
Compare
Thanks for the comments. Fixed the patch based on the comments.
If other session already attached to this process, this will be automatically detached.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ktock ❤️ This already looks a lot cleaner to me, just had a few more minor comments.
I noticed you also opened #1640, which seems to complete #1104? I'll take a look at that one asap as well, but ideally before moving forward past that into the next steps, we can dive into the TODO
s left in the code as well as tidy up the lifetimes stuff, and clean up the UX before going too much further? I don't want to block merging this or any of the other debug PRs in, but I do want to make sure we have a strong foundation before trying to add too much more functionality (like breakpoints, etc).
build/invoke.go
Outdated
} | ||
|
||
func (c *Container) Exec(ctx context.Context, cfg *InvokeConfig) error { | ||
killCh := make(chan struct{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noticed this here, but also in some other places: I'd prefer to use contexts instead of channels for the purpose of sending cancel signals.
When ctx
passed in here is done, then we'd send the kill signal, but I'm not sure why we can't just listen for ctx.Done
in exec
.
In general, I think it makes sense to use channels for cases where we explicitly need low-level concurrent primitives, or where we need to send messages between routines, but we should use contexts wherever possible to handle cancelling, for consistency with the rest of the codebase (in buildx and in buildkit).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
// Invoke starts an IO session into the specified process. | ||
// If pid doesn't matche to any running processes, it starts a new process with the specified config. | ||
// If there is no container running or InvokeConfig.Rollback is speicfied, the process will start in a newly created container. | ||
// NOTE: If needed, in the future, we can split this API into three APIs (NewContainer, NewProcess and Attach). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I prefer the suggested semantics over the current Invoke
call - at the moment, it's a bit like Invoke
does a bit of everything.
That said, I think it's good for now, but it would good to see a follow-up for this 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. Let's work on the refactor on this API in the following-up PRs.
controller/remote/server.go
Outdated
statusChan chan *client.SolveStatus | ||
result *build.ResultContext | ||
inputPipe *io.PipeWriter | ||
cancelBuild func() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is running multiple builds in a single session a supported scenario? If not (I'm fairly sure this is the case), we need to explicitly handle this and error out, if yes, then we need to make sure that subsequent builds don't lose the previous cancel handle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it isn't. Fixed to return an error if a build happens during another ongoing.
plist, err := c.ListProcesses(ctx, curRef) | ||
if err != nil { | ||
fmt.Println("cannot list process:", err) | ||
continue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point, as a follow-up it would be nice if we could use the actual PIDs in the container instead of the random identity.NewID
s - at the moment BuildKit doesn't expose that information though, so we'd need to make changes (probably in StartedMessage
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. SGTM.
@jedevc Thank you for the review. Fixed the patch based on the comments, |
SGTM 👍 |
build/invoke.go
Outdated
) | ||
|
||
// InvokeConfig is configuration for a process to run in the container. | ||
type InvokeConfig struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should remove this struct, and instead use controllerapi.InvokeConfig
, it would be good to avoid having multiple duplicate struct. The main difference seems to be that we obviously can't attach io
objects to it, we can pass those as separate args (maybe using ioset.In
to stick them together).
Note on having the duplicate User
and NoUser
fields to represent an optional value (which if we were using a modern version of protobuf, we could use the optional
field in https://github.com/protocolbuffers/protobuf/blob/main/docs/field_presence.md). I guess there's not reason we have to use gogo proto for buildx like we do for buildkit (we don't care about version compatibility), we could just switch to using upstream? https://pkg.go.dev/google.golang.org/protobuf. Definitely follow-up material 🎉 cc @crazy-max since you've done protobuf investigations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the comment. Removed InvokeConfig
.
These should just be nits now, any major changes we can do in follow-ups 👍 |
Signed-off-by: Kohei Tokunaga <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, happy to address the unresolved comments as follow-ups later.
PTAL @tonistiigi 😄
@tonistiigi PTAL 🙏 |
Related to #1104
exec
execs a new command in the containerattach
attaches IO to the specified process.note
docker-container
driver to always include init process in the driver container. This is for making sure to clean up exited processes created by BuildKit's container API.