Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shim Connect method returns VM-internal information #210

Open
sipsma opened this issue Jun 13, 2019 · 4 comments
Open

Shim Connect method returns VM-internal information #210

sipsma opened this issue Jun 13, 2019 · 4 comments

Comments

@sipsma
Copy link
Contributor

sipsma commented Jun 13, 2019

The Connect method served by a v2 runtime service is supposed to provide the Shim's PID and a given Task's PID.

Right now, our host-side runtime shim just forwards the request to the VM Agent, so the results returned are specific to the VM, which is a bit surprising for a caller outside the VM to receive (they can't really do much with a PID that exists inside the VM):

func (s *service) Connect(requestCtx context.Context, req *taskAPI.ConnectRequest) (*taskAPI.ConnectResponse, error) {
defer logPanicAndDie(log.G(requestCtx))
log.G(requestCtx).WithField("id", req.ID).Debug("connect")
task, err := s.taskManager.Task(req.ID)
if err != nil {
return nil, err
}
resp, err := task.Connect(requestCtx, req)
if err != nil {
return nil, err
}

At the time of this writing, the Connect API doesn't appear to be exposed to containerd clients; it does however appear to be used by containerd internally in order to reconnect to a shim after the containerd service restarts.

There needs to be more investigation to determine the proper fix here. I suspect it would make sense to at least return the host-side Shim PID instead of the VM Agent PID, however it's less immediately clear what we should return as the Task PID since there is not a host-side PID for the task. Depending on how containerd ends up actually using that Task PID it may or may not make sense to just return the PID of the VM as the Task PID.

@samuelkarp samuelkarp added this to the "pod"/"task" support milestone Jun 17, 2019
@sipsma sipsma removed this from the "pod"/"task" support milestone Jun 18, 2019
@kzys
Copy link
Contributor

kzys commented Jul 17, 2019

Yeah. Not so sure what people want to do with the PIDs. The .proto file doesn't explain the semantics much.

https://github.com/containerd/containerd/blob/faec567304bbdf6864b1663d4f813641b5880a4a/runtime/v2/task/shim.proto#L168

@sipsma
Copy link
Contributor Author

sipsma commented Jul 17, 2019

Yeah. Not so sure what people want to do with the PIDs. The .proto file doesn't explain the semantics much.

https://github.com/containerd/containerd/blob/faec567304bbdf6864b1663d4f813641b5880a4a/runtime/v2/task/shim.proto#L168

Yeah, just going by its use in their code, containerd internally appears to use it to check if a shim is still alive after the containerd daemon restarts.

If they exposed it to clients in the future (I don't think it is currently) I'd suspect it could be used for monitoring the process so they can know directly if something goes wrong and the shim exits unexpectedly or something like that.

I'm now also wondering if in the short term we should replace the implementation of this method with one that returns a ErrNotImplemented, which the containerd runtimev2 docs suggest is the right thing to do. Once we figure out an actual solution to this problem, we can replace it with a different implementation if needed.

@kzys
Copy link
Contributor

kzys commented Jul 26, 2019

Now we return ErrNotImplemented instead.

I'd like keep this issue open until we figure out a long-term solution (or decide that returning the error is the long-term solution).

@henry118
Copy link
Member

It's probably the time to formally address this issue now for containerd v1.6.0 release.

The task creation logic here needs a PID from shim. And the new logic in shim layer will call Connect API to fetch the PID, as opposed to the old way.

This essentially breaks the entire task creation process in fc-cd.

The error is:

# docker run --rm -it --privileged --ipc=host --volume /dev:/dev --volume /run/udev/control:/run/udev/control --volume /home/ec2-user/devel/my-fccd/examples/etc/containerd/firecracker-runtime.json:/etc/containerd/firecracker-runtime.json --volume /home/ec2-user/devel/my-fccd/examples/logs:/var/log/firecracker-containerd-test --volume /home/ec2-user/devel/my-fccd/examples/..:/src --env FICD_DM_VOLUME_GROUP= --env FICD_DM_POOL=testpool --env EXTRAGOARGS="" --workdir="/src/examples" localhost/firecracker-containerd-test:latest "make examples && make testtap && sleep 3 && ./taskworkflow -ip 172.16.0.2/24 -gw 172.16.0.1 -ss devmapper"
make: Nothing to be done for 'examples'.
ip link add br0 type bridge
ip tuntap add tap0 mode tap
ip link set tap0 master br0
ip link set dev tap0 up
ip link set dev br0 up
ip addr add dev br0 172.16.0.1/24
2022/02/25 04:23:15.189142 Creating containerd client
2022/02/25 04:23:15.189855 Created containerd client
2022/02/25 04:23:19.634736 Successfully pulled docker.io/library/nginx:1.17-alpine image with devmapper
2022/02/25 04:23:30.035466 failed to stop VM, err: rpc error: code = Internal desc = the VMM was killed forcibly: 1 error occurred:
	* signal: terminated


2022/02/25 04:23:30.035517 creating task: failed to get task pid: not implemented: unknown

fangn2 pushed a commit to fangn2/firecracker-containerd that referenced this issue Mar 23, 2023
…kips

Move soft_fail back to master tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants