Avoid race when opening exec fifo #1698

Merged

craigfurman

When starting a container with `runc start` or `runc run`, the stub
process (runc[2:INIT]) opens a fifo for writing. Its parent runc process
will open the same fifo for reading. In this way, they synchronize.
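This handshake relies on standard FIFO semantics: `open(2)` on a FIFO blocks until the other end has also been opened. The Go program below is a minimal sketch of that rendezvous, not runc's code. The `rendezvous` helper, the temporary directory, and the single byte that gets written are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"syscall"
)

// rendezvous (hypothetical) creates a FIFO in a fresh temporary directory and
// opens it from both ends. A goroutine plays the stub process (writer) and the
// caller plays the parent runc process (reader). It returns the bytes read.
func rendezvous() ([]byte, error) {
	dir, err := os.MkdirTemp("", "fifo-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "exec.fifo")
	if err := syscall.Mkfifo(path, 0o600); err != nil {
		return nil, err
	}

	writeErr := make(chan error, 1)
	go func() {
		// Opening a FIFO write-only blocks until someone opens it for reading.
		w, err := os.OpenFile(path, os.O_WRONLY, 0)
		if err != nil {
			writeErr <- err
			return
		}
		_, err = w.Write([]byte("0"))
		w.Close()
		writeErr <- err
	}()

	// Opening read-only likewise blocks until a writer shows up. If the writer
	// never arrives (for example, because it was killed first), this call
	// never returns, which is the hang described in this PR.
	r, err := os.OpenFile(path, os.O_RDONLY, 0)
	if err != nil {
		return nil, err
	}
	defer r.Close()

	data, err := io.ReadAll(r) // returns at EOF, once the writer closes its end
	if err != nil {
		return nil, err
	}
	if err := <-writeErr; err != nil {
		return nil, err
	}
	return data, nil
}

func main() {
	data, err := rendezvous()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("reader received %q\n", data)
}
```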

If the stub process exits at the wrong time, the parent runc process
will block forever.

This can happen when racing two runc operations against each other: `runc run`/`runc start` and `runc delete`. It could also happen for other reasons,
e.g. the kernel's OOM killer may select the stub process.

This commit resolves this race by racing the opening of the exec fifo
from the runc parent process against the stub process exiting. If the
stub process exits before we open the fifo, we return an error.
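A minimal sketch of this race is a single `select` over two channels, assuming the fifo open and the liveness check have already been turned into channels. `waitForFifo` and its error text are hypothetical. `openResult` borrows the type name used in this PR's diff, but its `file` field is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// openResult carries the outcome of opening the exec fifo for reading.
// The field names are assumptions for this sketch.
type openResult struct {
	file *os.File
	err  error
}

// waitForFifo (hypothetical) blocks until either the fifo has been opened or
// the stub process is known to be dead, whichever happens first.
func waitForFifo(fifoOpened <-chan openResult, isDead <-chan struct{}) (*os.File, error) {
	select {
	case <-isDead:
		// The writer can no longer open its end, so a blocking read-open
		// would hang forever; report an error instead.
		return nil, errors.New("stub process exited before opening the exec fifo")
	case res := <-fifoOpened:
		return res.file, res.err
	}
}

func main() {
	// Simulate a stub process that has already died.
	isDead := make(chan struct{})
	close(isDead)
	if _, err := waitForFifo(make(chan openResult), isDead); err != nil {
		fmt.Println("error:", err)
	}
}
```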

Another solution is to wait on the stub process. However, it seems it
would require more refactoring to avoid calling wait multiple times on
the same process, which is an error.
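Go's `os/exec` package shows why waiting twice is a problem. The first `Wait` reaps the child and records its status. A second call on the same `*exec.Cmd` returns an error instead of a status. This is a small sketch that assumes a `true` binary is on `PATH`; `waitTwice` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os/exec"
)

// waitTwice (hypothetical) starts a short-lived child and calls Wait on it
// twice, returning both errors. Assumes a `true` binary is on PATH.
func waitTwice() (first, second error) {
	cmd := exec.Command("true")
	if err := cmd.Start(); err != nil {
		return err, nil
	}
	first = cmd.Wait()  // reaps the child and records its exit status
	second = cmd.Wait() // the child is already reaped; os/exec rejects this call
	return first, second
}

func main() {
	first, second := waitTwice()
	fmt.Println("first Wait: ", first)
	fmt.Println("second Wait:", second)
}
```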

Note: We aren't really sure how to integration test this in a sane way. In Garden, we wrote a test but it involves patching in:

```diff
diff --git a/libcontainer/standard_init_linux.go b/libcontainer/standard_init_linux.go
index 8a544ed5..84cd8765 100644
--- a/libcontainer/standard_init_linux.go
+++ b/libcontainer/standard_init_linux.go
@@ -6,7 +6,9 @@ import (
 	"fmt"
 	"os"
 	"os/exec"
+	"strings"
 	"syscall" //only for Exec
+	"time"
 
 	"github.com/opencontainers/runc/libcontainer/apparmor"
 	"github.com/opencontainers/runc/libcontainer/configs"
@@ -169,6 +171,9 @@ func (l *linuxStandardInit) Init() error {
 	// user process. We open it through /proc/self/fd/$fd, because the fd that
 	// was given to us was an O_PATH fd to the fifo itself. Linux allows us to
 	// re-open an O_PATH fd through /proc.
+	if !strings.Contains(name, "init") {
+		time.Sleep(time.Hour)
+	}
 	fd, err := unix.Open(fmt.Sprintf("/proc/self/fd/%d", l.fifoFd), unix.O_WRONLY|unix.O_CLOEXEC, 0)
 	if err != nil {
 		return newSystemErrorWithCause(err, "open exec fifo")
```

to expose the race condition, and then performing runc run and runc delete operations. Hopefully someone has a better idea of how to get a more sensible test into runc.

[Fixes: #1697]

Cheers,
@williammartin & Craig

```go
if err := readFromExecFifo(f); err != nil {
	return err
}
return os.Remove(path)
```
Member

should the file be closed first or does it not matter?

Not sure why this should matter. I suppose the fd will point to an inaccessible location on the filesystem for some amount of time, but in terms of code cleanliness, the defer seems better?
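On Linux, `os.Remove` only unlinks the directory entry. A descriptor that is already open keeps working until it is closed, so deferring `Close` until after `Remove` does not change what the reader sees. The sketch below demonstrates this with a regular temporary file rather than the exec fifo; `readAfterUnlink` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// readAfterUnlink (hypothetical) writes data to a temp file, removes the path
// while the descriptor is still open, then reads back through that descriptor.
func readAfterUnlink(data string) (string, error) {
	f, err := os.CreateTemp("", "unlink-demo")
	if err != nil {
		return "", err
	}
	defer f.Close() // closed last, after the name is already gone

	if _, err := f.WriteString(data); err != nil {
		return "", err
	}
	// Unlink the name; the inode stays alive while f is open.
	if err := os.Remove(f.Name()); err != nil {
		return "", err
	}
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		return "", err
	}
	b, err := io.ReadAll(f)
	return string(b), err
}

func main() {
	s, err := readAfterUnlink("hello")
	fmt.Println(s, err)
}
```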


```go
func awaitProcessExit(pid int) <-chan struct{} {
	isDead := make(chan struct{})
	go func() {
```
Member

What happens when the exec fifo is opened successfully? Will this goroutine live forever?

Good spot, this is definitely a problem in the attach case. Think we've fixed it.
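A common way to keep a polling goroutine from outliving its caller is to pass it a stop channel, which the caller closes once the fifo has been opened. This is a minimal sketch under stated assumptions. The signature is hypothetical and not necessarily what the PR ended up with. The liveness check is passed in as `isAlive` because an exited but not yet reaped child still answers `kill(pid, 0)`, so detecting its exit needs platform-specific logic.

```go
package main

import (
	"fmt"
	"time"
)

// awaitProcessExit (hypothetical signature) polls isAlive every interval and
// closes the returned channel once it reports false. Closing stop makes the
// goroutine return, so it does not leak after a successful fifo open.
func awaitProcessExit(isAlive func() bool, interval time.Duration, stop <-chan struct{}) <-chan struct{} {
	isDead := make(chan struct{})
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-stop:
				return // caller no longer cares; exit without signalling
			case <-ticker.C:
				if !isAlive() {
					close(isDead)
					return
				}
			}
		}
	}()
	return isDead
}

func main() {
	stop := make(chan struct{})
	defer close(stop) // release the poller on every return path

	// Pretend the process dies after 50ms.
	deadline := time.Now().Add(50 * time.Millisecond)
	isDead := awaitProcessExit(func() bool { return time.Now().Before(deadline) }, 10*time.Millisecond, stop)
	<-isDead
	fmt.Println("process reported dead")
}
```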

```go
func awaitFifoOpen(path string) <-chan openResult {
	fifoOpened := make(chan openResult)
	go func() {
		f, err := os.OpenFile(path, os.O_RDONLY, 0)
```
Member

same issue here, if the process dies, how do we unblock this goroutine?

Not sure this is an issue like the other one: if the process dies, we error out (https://github.com/cloudfoundry-incubator/runc/blob/exec-fifo-race/libcontainer/container_linux.go#L275), and then cleanup happens as it would in any other case.

When starting a container with `runc start` or `runc run`, the stub
process (runc[2:INIT]) opens a fifo for writing. Its parent runc process
will open the same fifo for reading. In this way, they synchronize.

If the stub process exits at the wrong time, the parent runc process
will block forever.

This can happen when racing 2 runc operations against each other: `runc
run/start`, and `runc delete`. It could also happen for other reasons,
e.g. the kernel's OOM killer may select the stub process.

This commit resolves this race by racing the opening of the exec fifo
from the runc parent process against the stub process exiting. If the
stub process exits before we open the fifo, we return an error.

Another solution is to wait on the stub process. However, it seems it
would require more refactoring to avoid calling wait multiple times on
the same process, which is an error.

Signed-off-by: Craig Furman <[email protected]>
@crosbymichael
Member

crosbymichael commented Jan 22, 2018

LGTM

Approved with PullApprove

@crosbymichael
Member

ping @mrunalp @hqhq

@mrunalp
Contributor

mrunalp commented Jan 22, 2018

Looking

@mrunalp
Contributor

mrunalp commented Jan 22, 2018

LGTM

Approved with PullApprove

```go
go func() {
	f, err := os.OpenFile(path, os.O_RDONLY, 0)
	if err != nil {
		fifoOpened <- openResult{err: newSystemErrorWithCause(err, "open exec fifo for reading")}
```
Contributor

It might not affect cleanup, but I think we had better return here, because the consumer only reads the channel once.

Author

We agree from a code cleanliness perspective, although you're right that it doesn't affect cleanup. We've added a return after that line.
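Following the suggestion to return right after sending the error, the open goroutine might look like the sketch below. Giving the channel a buffer of one is an additional assumption, not something the PR states. It lets the goroutine deliver its single result and exit even if the caller has already returned, but it cannot help while `os.OpenFile` itself is still blocked waiting for a writer. `fmt.Errorf` stands in for runc's `newSystemErrorWithCause`, and the `file` field of `openResult` is assumed.

```go
package main

import (
	"fmt"
	"os"
)

// openResult carries the outcome of the open; field names are assumptions.
type openResult struct {
	file *os.File
	err  error
}

// awaitFifoOpen opens path for reading in a goroutine and delivers exactly one
// result. The buffer lets that single send complete without a receiver.
func awaitFifoOpen(path string) <-chan openResult {
	fifoOpened := make(chan openResult, 1)
	go func() {
		f, err := os.OpenFile(path, os.O_RDONLY, 0)
		if err != nil {
			fifoOpened <- openResult{err: fmt.Errorf("open exec fifo for reading: %w", err)}
			return // one send only: the consumer reads the channel once
		}
		fifoOpened <- openResult{file: f}
	}()
	return fifoOpened
}

func main() {
	res := <-awaitFifoOpen("/nonexistent/exec.fifo")
	fmt.Println(res.err)
}
```

With a path that does not exist, the open fails immediately, so the single error result is delivered and the goroutine exits.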

@hqhq
Contributor

hqhq commented Jan 23, 2018

One minor tip; otherwise, LGTM.

@phsiao

phsiao commented Jan 23, 2018

I was able to test this patch, and have updated moby/moby#36010 with my finding so far. In short, it does appear to resolve the issue I was having.

@craigfurman
Author

We added another commit to address a comment. If you'd like us to rebase and squash let us know.

@hqhq
Contributor

hqhq commented Jan 23, 2018

LGTM, thanks.

Approved with PullApprove

@crosbymichael
Member

crosbymichael commented Jan 23, 2018

LGTM

Approved with PullApprove

mstemm added a commit to falcosecurity/falco that referenced this pull request Jan 31, 2020
Sample Falco alert:

```
File below / or /root opened for writing (user=<NA>
command=runc:[1:CHILD] init parent=docker-runc-cur file=/exec.fifo
program=runc:[1:CHILD] CID1 image=<NA>)
```

This github issue provides some context:
opencontainers/runc#1698

Signed-off-by: Mark Stemm <[email protected]>
mstemm added a commit to falcosecurity/falco that referenced this pull request Feb 1, 2020
poiana pushed a commit to falcosecurity/falco that referenced this pull request Feb 3, 2020
leogr pushed a commit to falcosecurity/rules that referenced this pull request Dec 21, 2022
leogr pushed a commit to falcosecurity/rules that referenced this pull request Dec 21, 2022
Development

Successfully merging this pull request may close these issues.

runc run can hang indefinitely
8 participants