daemon.ContainerStop(): fix for a negative timeout #36874

Merged
merged 2 commits into from
May 30, 2018

Contributor

@kolyshkin kolyshkin commented Apr 17, 2018

As daemon.ContainerStop() documentation says,

If a negative number of seconds is given, ContainerStop
will wait for a graceful termination.

but since commit cfdf84d (PR #32237) this is no longer the case.
This happens because context.WithTimeout(ctx, timeout) is implemented
as WithDeadline(ctx, time.Now().Add(timeout)), resulting in a deadline
which is in the past.

To fix, don't use WithDeadline() if the timeout is negative.

A test case is added to validate the correct behavior and as a means
to prevent a similar regression in the future.

This is a replacement of #35418, and should fix #35311.

@kolyshkin
Contributor Author

Created for the sake of CI, will add more later.

@kolyshkin kolyshkin requested a review from dnephin as a code owner April 17, 2018 03:07
@kolyshkin kolyshkin changed the title [WIP] StopContainer(): fix for a negative timeout [WIP] daemon.ContainerStop(): fix for a negative timeout Apr 17, 2018
@codecov

codecov bot commented Apr 17, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@185ae7e).
The diff coverage is 0%.

@@            Coverage Diff            @@
##             master   #36874   +/-   ##
=========================================
  Coverage          ?   34.97%           
=========================================
  Files             ?      614           
  Lines             ?    45641           
  Branches          ?        0           
=========================================
  Hits              ?    15965           
  Misses            ?    27587           
  Partials          ?     2089

@kolyshkin
Contributor Author

CI failure on janky (logs here) is a flaky TestAPIServiceUpdatePort (#36501).

CI failure on z (logs here) is a flaky test TestAPISwarmNodeDrainPause (#23516).

CI failure on Windows is weird: the build just timed out without running any tests (logs here). Restarted.

// a timeout works as documented, i.e. in case of negative timeout
// waiting is not limited (issue #35311).
func TestStopContainerWithTimeout(t *testing.T) {
    t.Parallel()
Member

This makes for a flaky test when used with the main integration daemon.

Contributor Author

Can you please elaborate why?

Member

Because it is running in the main daemon.
The test setup takes a snapshot of the current state and then, at the end of the test, cleans up anything that was not in the original state. But if multiple tests do this cleanup at the same time, they will conflict with each other, clean up things they shouldn't, etc.
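The conflict described above can be simulated without a real daemon. This is only an illustrative sketch: the `daemon` type and its `snapshot` and `cleanup` methods are hypothetical stand-ins for the integration test environment, not its actual API, and the two "parallel" tests are interleaved by hand to keep the result deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// daemon is a hypothetical stand-in for the shared integration daemon:
// just a set of container names.
type daemon map[string]bool

// snapshot records which containers exist right now.
func (d daemon) snapshot() map[string]bool {
	s := make(map[string]bool, len(d))
	for name := range d {
		s[name] = true
	}
	return s
}

// cleanup removes every container that is not in the snapshot and
// returns the removed names in sorted order.
func (d daemon) cleanup(snap map[string]bool) []string {
	var removed []string
	for name := range d {
		if !snap[name] {
			removed = append(removed, name)
			delete(d, name)
		}
	}
	sort.Strings(removed)
	return removed
}

func main() {
	d := daemon{"preexisting": true}

	// Two tests start together: both take their snapshot before either
	// one has created its container.
	snapA := d.snapshot()
	snapB := d.snapshot()
	d["container-A"] = true
	d["container-B"] = true

	// Test A finishes first and also removes test B's container,
	// because that container is not in A's snapshot.
	fmt.Println("A removed:", d.cleanup(snapA)) // [container-A container-B]
	fmt.Println("B removed:", d.cleanup(snapB)) // []
}
```

If test B were still using `container-B` when test A cleaned up, B would fail for reasons unrelated to what it checks, which is the flakiness the comment warns about.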

}

for _, d := range testData {
    name := container.WithName(t.Name() + "_" + strconv.Itoa(d.timeout))
Member

Use "t.Run" so the different cases run as separate tests. This makes it easier to know which case failed (if there is a failure).

Contributor Author

Done, thanks for the suggestion.

The unit test is checking that setting of non-default StopTimeout
works, but it checked the value of StopSignal instead.

Amazingly, the test was working since the default StopSignal is SIGTERM,
which has the numeric value of 15.

Fixes: commit e66d210 ("Add config parameter to change ...")
Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin
Contributor Author

OK this is no longer WIP. CI failure in experimental is flaky test DockerSwarmSuite.TestAPIServiceUpdatePort (#36501).

@kolyshkin
Contributor Author

@cpuguy83 can you PTAL? All your comments were acted upon

d := d
name := t.Name() + "_" + strconv.Itoa(d.timeout)
t.Run(name, func(t *testing.T) {
    t.Parallel()
Member

I think each one of these is treated as a separate test, so they all get grouped together at the end of the test run with the rest of the parallel tests. So I'm not sure this is any safer than before.

Contributor Author

@kolyshkin kolyshkin May 1, 2018

I don't think so. I might be wrong, of course, but according to the Go testing package documentation, such tests only run in parallel with each other:

Subtests can also be used to control parallelism. A parent test will only complete once all of its subtests complete. In this example, all tests are run in parallel with each other, and only with each other, regardless of other top-level tests that may be defined:
func TestGroupedParallel(t *testing.T) {
    for _, tc := range tests {
        tc := tc // capture range variable
        t.Run(tc.Name, func(t *testing.T) {
            t.Parallel()
            ...
        })
    }
}

Also, I got the idea from similar tests like this one:

for _, tc := range testCases {
    tc := tc
    t.Run(tc.doc, func(t *testing.T) {
        t.Parallel()
        _, err := client.ContainerCreate(context.Background(),
            &container.Config{Image: tc.image},

Contributor Author

@cpuguy83 PTAL ^^^

Member

Nice, this does seem to work.

1. As daemon.ContainerStop() documentation says,

> If a negative number of seconds is given, ContainerStop
> will wait for a graceful termination.

but since commit cfdf84d (PR moby#32237) this is no longer the case.

This happens because `context.WithTimeout(ctx, timeout)` is implemented
as `WithDeadline(ctx, time.Now().Add(timeout))`, resulting in a deadline
which is in the past.

To fix, don't use WithDeadline() if the timeout is negative.

2. Add a test case to validate the correct behavior and
as a means to prevent a similar regression in the future.

3. Fix/improve daemon.ContainerStop() and client.ContainerStop()
description for clarity and completeness.

4. Fix/improve DefaultStopTimeout description.

Fixes: cfdf84d ("Update Container Wait")
Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin
Contributor Author

Minor change:

  • don't add test name to subtest name as it's redundant
  • don't set container name
-               name := t.Name() + "_" + strconv.Itoa(d.timeout)
-               t.Run(name, func(t *testing.T) {
+               t.Run(strconv.Itoa(d.timeout), func(t *testing.T) {
                        t.Parallel()
-                       id := container.Run(t, ctx, client, testCmd, container.WithName(name))
+                       id := container.Run(t, ctx, client, testCmd)

Member

@cpuguy83 cpuguy83 left a comment

LGTM

@kolyshkin
Contributor Author

CI failure in TestAPISwarmLeaderElection is #32673

@kolyshkin kolyshkin changed the title [WIP] daemon.ContainerStop(): fix for a negative timeout daemon.ContainerStop(): fix for a negative timeout May 10, 2018
@kolyshkin
Contributor Author

not sure why I had it as [WIP] -- it's definitely not )

Member

@vdemeester vdemeester left a comment

LGTM 🐯

@cpuguy83
Member

16 days since last CI run... but this is probably fine.
YOLO merge.

@cpuguy83 cpuguy83 merged commit b85799b into moby:master May 30, 2018
Successfully merging this pull request may close these issues.

docker restart -t with negative input time return different results after change the backend
5 participants