-
Notifications
You must be signed in to change notification settings - Fork 949
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] refactor: redesign container io in pouch #2375
[RFC] refactor: redesign container io in pouch #2375
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2375 +/- ##
==========================================
+ Coverage 68.31% 68.58% +0.26%
==========================================
Files 275 272 -3
Lines 18331 18206 -125
==========================================
- Hits 12523 12486 -37
+ Misses 4372 4302 -70
+ Partials 1436 1418 -18
|
ping @starnop and @YaoZengzeng to help me review the cri part. Thanks |
cli/run.go
Outdated
@@ -71,6 +72,7 @@ func (rc *RunCommand) runRun(args []string) error { | |||
} | |||
containerName := rc.name | |||
config.ContainerConfig.OpenStdin = rc.stdin | |||
config.ContainerConfig.StdinOnce = rc.stdin // close stdin after attach connection closed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if pouch run with -i -d flag enable and when the terminal detach, this container can't be attached again, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a another flag for end-users to control is a good choice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it has been blocked by
if (rc.attach || rc.stdin) && rc.detach {
return fmt.Errorf("Conflicting options: -a (or -i) and -d")
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a another flag for end-users to control is a good choice?
basically, I don't think so. The cli is just to help user to use pouch easier. This configuration is too complicated. I don't see any requirement for this.
daemon/containerio/io.go
Outdated
"github.com/alibaba/pouch/daemon/logger" | ||
"github.com/alibaba/pouch/daemon/logger/crilog" | ||
"github.com/alibaba/pouch/pkg/streams" | ||
"github.com/sirupsen/logrus" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add blank line before this line?
In this pr desc, the |
Yes. it can be shared by more than one attacher. I was struggled from naming this. |
I have tested the function |
if !stdin { | ||
fifos.In = "" | ||
if !withTerminal { | ||
cfg.Stderr = filepath.Join(fifoDir, processID+"-stderr") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think stderr fifo has related with terminal is open
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I checked that containerd-shim. The shim will open socket as pty-slave. There is only one output from pty-slave so that we don't need stderr fifo if the withTerminal
is true
} | ||
if g.dir != "" { | ||
return os.RemoveAll(g.dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before, we remove fifo dir here, now it moved to closeFn, does it matter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. Before we need the dir to cleanup. I move it the closeFn
, like [email protected]. All the cio.go
's changes are for upgrading containerd.
func (g *wgCloser) Cancel() { | ||
g.cancel() | ||
// TODO(fuweid): pipes will be removed when update vendor to [email protected]. | ||
type pipes struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we use pipe, direct IO before ? I will do more check
} | ||
|
||
func copyIO(fifos *containerdio.FIFOSet, ioset *ioSet, tty bool) (_ *wgCloser, err error) { | ||
func openPipes(ctx context.Context, fifos *CioFIFOSet) (_ pipes, err0 error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know, you implement pipe for here use.
any update @Ace-Tang @YaoZengzeng |
ctrd/container.go
Outdated
|
||
// Both the caller and container/exec process holds write side pipe | ||
// for the stdin. When the caller closes the write pipe, the process doesn't | ||
// exit until the caller. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we missed some comment here? 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. Wait() to make sure pouchd receives all the data from shim. And Close()
is used to make sure the Attach() should exit because there is no data. Thank @YaoZengzeng. In the cleanup, we should Wait() before Close(). I miss that. will update. 👍
@YaoZengzeng I have updated that the |
mw.writers = append(mw.writers[:i-n], mw.writers[i-n+1:]...) | ||
} | ||
mw.Unlock() | ||
return len(p), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if all of the writer writing bytes fails here, still return len(p)
. Is it reasonable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we don't return error here, we need to return the len(p) here. The writer hub uses io.Pipe as io.WriteCloser
. All data are handled in memory, not the real file or network IO. If it run into a error, there is some wrong with pouch and we should restart pouchd.
I think I can add the warning message here if it run into error during writing. How do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Print the warning message is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused about the lastErr pattern. You will lose some error messages. What about using multierror or print a log.
mw.writers = append(mw.writers[:i-n], mw.writers[i-n+1:]...) | ||
} | ||
mw.Unlock() | ||
return len(p), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Print the warning message is enough.
func (s *Stream) CopyPipes(p Pipes) { | ||
copyfn := func(styp string, w io.WriteCloser, r io.ReadCloser) { | ||
s.Add(1) | ||
go func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Controlled by waitGroup to promise all of the data has finished copying(abnormal exit is also regarded as finish). Just comment to record this.
} | ||
|
||
if s.stdin != nil && p.Stdin != nil { | ||
go func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not controlled by waitGroup. Because it may be suddenly closed by user. Just comment to record this.
apis/server/container_bridge.go
Outdated
} | ||
|
||
attach.UseStdin = req.FormValue("stdin") == "1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using httputils.BoolValue(req, "stdin")
instead?
LGTM, more test later. |
ping @zhuangqh I update with your recommendation. please take a look. |
return fifos, nil | ||
closeFn := func() error { | ||
if err := os.RemoveAll(fifoDir); err != nil { | ||
logrus.WithError(err).Warnf("failed to remove process(id=%v) fifo dir", processID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not return error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it.
// NewFifos returns a new set of fifos for the task | ||
func NewFifos(id string, stdin bool) (*containerdio.FIFOSet, error) { | ||
// NewCioFIFOSet prepares fifo files. | ||
func NewCioFIFOSet(processID string, withStdin bool, withTerminal bool) (*CioFIFOSet, error) { | ||
root := "/run/containerd/fifo" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can config for io root directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. But I just keep it unchanged. If we have this requirement, we can file other PR to handle this.
|
||
err = c.ContainerMgr.StartExec(ctx, execid, attachConfig) | ||
if err != nil { | ||
if err := c.ContainerMgr.StartExec(ctx, execid, attachCfg); err != nil { | ||
return nil, fmt.Errorf("failed to start exec for container %q: %v", id, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change fmt.Errorf to errors.Wrapf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just keep it unchanged and file other PR to handle this.
@@ -27,6 +26,7 @@ import ( | |||
"github.com/alibaba/pouch/pkg/errtypes" | |||
"github.com/alibaba/pouch/pkg/meta" | |||
"github.com/alibaba/pouch/pkg/reference" | |||
pkgstreams "github.com/alibaba/pouch/pkg/streams" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't use steams?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because cri has a package named by streams
. Just use pkgstreams to distinguish two packages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emmmmm, cri package is steam
, your package is steams
, he he he he
}, nil | ||
execConfig, err := c.ContainerMgr.GetExecConfig(ctx, execid) | ||
if err != nil { | ||
return nil, fmt.Errorf("failed to inspect exec for container %q: %v", id, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change all fmt.Errorf("failed to xxxx %q: %v", id, err)
to errors.Wrapf()
@@ -791,3 +831,107 @@ func withCheckpointOpt(checkpoint *containerdtypes.Descriptor) containerd.NewTas | |||
return nil | |||
} | |||
} | |||
|
|||
// InitStdio allows caller to handle any initialize job. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/job/jobs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think job is ok 😂
daemon/logger/crilog/log.go
Outdated
type streamType string | ||
|
||
const ( | ||
// Stdout stream type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/streamStdout/Stdout
daemon/logger/crilog/log.go
Outdated
const ( | ||
// Stdout stream type. | ||
streamStdout streamType = "stdout" | ||
// Stderr stream type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/streamStderr/Stderr
options = append(options, containerio.WithStdin(attach.Stdin)) | ||
} else { | ||
options = append(options, containerio.WithDiscard()) | ||
if cntrio = mgr.IOs.Get(c.ID); cntrio == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
containerIO's abbreviation is cntrio, is it ok? some is container -> ctr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
container -> cntr I think it's ok 😅
LGTM |
Signed-off-by: Wei Fu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: Wei Fu [email protected]
Ⅰ. Describe what this PR did
Redesign container io in pouch. but Why?
The original design always uses
ringbuffer
to cache the data which causes streams lost data sometime. It makes the regression test unstable.It also makes caller hard to touch data from containerd-shim. That is why we still miss the ReopenLog in CRI. I think we should fix that.
design overview
Each container/process will hold streams struct to handle the io stream. The struct is like:
In the stream struct, both the
stdout
andstderr
are the multiple writer's hub. It means that the data from stdout/stderr of process will have more copies. For example, when user userun
to start container, the container's output data will have two copies. One is for the terminal side, the other one is for container's log, such as json-file by default.But for the
stdin
, we don't need the hub likestdout/stderr
, because the input of process is only one. Any attach requests can send the data to the process in container. So we exports thestdinPipe
to allow different attach requests to touch the same input channel.Based on this, the stream acts like bridge to connect attach request and stream of process in container. If the user wants to attach running container, just adds hijack/stream into the output hub and attach it to the
stdinPipe
. The stream can provides more flexible than before.Before this change, the CRI struggles to integrating with Container Mgr. Why? because CRI component doesn't know the stream has been finished. Always use dead loop to check the
Exec
status. It's not good patten.why add logdriver/crilog into the writer hub?
Both logdriver and crilog don't act like
io.Writer
. It always add meta data for the raw data. Adding the partial label is one of examples for the acting way of logdriver. In order to avoid to add duplicate code for each driver, we useLogCopier
to handle this.There is a little bit different between
crilog
andlogdriver
. Thecrilog
hasReopenLog
request. It means that it will add new crilog and close previous one.In current design, we might have duplicate data in the crilog. Because we close previous, the writer-hub might write something into that. I think it's ok right now.
What is the CloseIO and
StdinOnce
?The contained-shim will open fifo twice for reading and writing. For the writing mode, the shim doesn't close stdin fifo until the client calls
CloseIO
. In some case, the pouch daemon might be crash before finishing the input. If shim doesn't hold writing mode fifo, the process in container will consider that it is EOF signal and exit.Based on this case, if the client sends EOF signal to input channel, the pouchd should send the CloseIO to shim to let the process exit.
StdinOnce
in container's config is used by attach request. If theStdinOnce
is true, when one attach request finishes stream copy, the pouchd will closes the input of process. So other attach requests to the same container will be stopped.If the user wants
StdinOnce
, it should set it to true during creating container.Ⅱ. Does this pull request fix one issue?
NONE
Ⅲ. Why don't you add test cases (unit test/integration test)? (你真的觉得不需要加测试吗?)
Added it.
Ⅳ. Describe how to verify it
Wait for CI.
Ⅴ. Special notes for reviews
I remove the
non-blocking
feature in this change and will add it in next patch.