Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: add timeout handler for execSync in cri part #1280

Closed
wants to merge 1 commit into from

Conversation

ZouRui89
Copy link
Contributor

@ZouRui89 ZouRui89 commented May 8, 2018

Signed-off-by: Zou Rui [email protected]

Ⅰ. Describe what this PR did

This pr adds timeout handler for execSync in cri part.
I/O pipe mechanism is used to enhance the degree of simplicity, based on which the data type of memBuffer is changed to allow I/O operations.
Thus, instead of checking if the execution is finished every 100 millisecond, the error io.copy returns is used to control the data flow.

Ⅱ. Does this pull request fix one issue?

fixes #1274

Ⅲ. Describe how you did it

Ⅳ. Describe how to verify it

the cri-test has passed.

Ⅴ. Special notes for reviews

@@ -23,6 +24,7 @@ import (
// NOTE: "golang.org/x/net/context" is compatible with standard "context" in golang1.7+.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the useless line :)

var output bytes.Buffer
startConfig := &apitypes.ExecStartConfig{}
_, err = writer.Write(output.Bytes())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pipe doesn't work here.

@ZouRui89 ZouRui89 force-pushed the cri_timeout branch 13 times, most recently from f67060d to 16ecda0 Compare May 11, 2018 03:49
var execConfig *ContainerExecConfig
for {
execConfig, err = c.ContainerMgr.GetExecConfig(ctx, execid)
var recv bytes.Buffer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should create goroutine here to make the timeout work.

readWaitCh := make(chan error, 1)
go func() {
    defer reader.Close()
    _ , err = io.Copy(&recv, reader)
    readWaitCh <- err
}()

....

select {
case <-ctx.Done():
...
case <-readWaitCh:
...
}

@ZouRui89 ZouRui89 force-pushed the cri_timeout branch 9 times, most recently from 9f3b771 to 263b28a Compare May 11, 2018 09:35
@pouchrobot pouchrobot added size/L and removed size/M labels May 11, 2018
@ZouRui89 ZouRui89 force-pushed the cri_timeout branch 2 times, most recently from 97312bb to 5432020 Compare May 11, 2018 11:01
@ZouRui89 ZouRui89 force-pushed the cri_timeout branch 10 times, most recently from b88a93f to b665e79 Compare May 13, 2018 10:18
readWaitCh <- err
}()

select {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the ctx.Done() selector here because the request maybe canceled without waiting the result.

Copy link
Contributor Author

@ZouRui89 ZouRui89 May 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timeoutCh is doing the same thing as ctx.Done( ), since I have tried ctx.Done( ) and some unknown errors will be brought about by that.
And this is the exact way that cri-containerd used to check the timeout moment, therefore I am following the way it is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the ctx.Done() has been handled by the interceptor. It has the functionality like the timeoutCh, but not the same thing. Therefore, could you mind to provide information about what unknown error you mentioned?


select {
case <-timeoutCh:
return nil, fmt.Errorf("timeout %v exceeded", timeout)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the timeout, should we stop the exec process? If not, the timeout doesn't make senses here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you be more specific about the stop execution? I have thought about that but have no idea about how to implement it. :)

@@ -12,7 +11,7 @@ func init() {
}

type memBuffer struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have changed the backend from memory buffer to pipe, the name memBuffer is no longer properly.

@ZouRui89 Please change to a more appropriate name.

@@ -102,7 +102,7 @@ func WithStdinHijack() func(*Option) {
}

// WithMemBuffer specified the memory buffer backend.
func WithMemBuffer(memBuffer *bytes.Buffer) func(*Option) {
func WithMemBuffer(memBuffer *io.PipeWriter) func(*Option) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as the issue mentioned above.

select {
case <-ctx.Done():
return nil, fmt.Errorf("timeout %v exceeded", timeout)
case <-readWaitCh:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If what we get from readWaitCh is an error other then nil that means something wrong with the exec.

So if we get an error, maybe report it directly to k8s is more properly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the error is not nil, I have used logrus.Infof("failed to read data from the pipe") to report the situation.

go func() {
defer reader.Close()
_, err = io.Copy(&recv, reader)
if err != nil {
Copy link
Contributor

@fuweid fuweid May 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need to check the status here. Just put the error into the readWaitCh and check it on the receiver's side. As @YaoZengzeng said, we should return the error to the k8s. WDYT

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. That sounds more reasonable. :)

@ZouRui89 ZouRui89 force-pushed the cri_timeout branch 2 times, most recently from 774f70a to c95951b Compare May 14, 2018 06:34
@pouchrobot
Copy link
Collaborator

ping @ZouRui89

CI fails according integration system.
Please refer to the CI failure Details button to corresponding test, and update your PR to pass CI.

If this is flaky test, welcome to track this with profiling an issue.

build url: https://travis-ci.org/alibaba/pouch/builds/378578974
build duration: 1259s

@pouchrobot
Copy link
Collaborator

ping @ZouRui89
Conflict happens after merging a previous commit.
Please rebase the branch against master and push it back again. Thanks a lot.

@ZouRui89 ZouRui89 closed this May 15, 2018
@ZouRui89 ZouRui89 deleted the cri_timeout branch May 16, 2018 01:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[feature request] add timeout in ExecSync for CRI
4 participants