Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flaky tests #2808

Merged
merged 1 commit into from
Jun 20, 2019
Merged

Fix flaky tests #2808

merged 1 commit into from
Jun 20, 2019

Conversation

dperny
Copy link
Collaborator

@dperny dperny commented Jan 16, 2019

It is likely that a large portion of test flakiness, especially in CI, comes from the fact that swarmkit components under test are started in goroutines, but those goroutines never have an opportunity to run. This adds code ensuring those goroutines are scheduled and run, which should hopefully solve many inexplicably flaky tests.

@dperny dperny force-pushed the fix-flaky-tests branch 2 times, most recently from ecd71e3 to 30e7367 Compare January 16, 2019 19:37
@codecov
Copy link

codecov bot commented Jan 16, 2019

Codecov Report

Merging #2808 into master will decrease coverage by 0.07%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master    #2808      +/-   ##
==========================================
- Coverage   62.24%   62.16%   -0.08%     
==========================================
  Files         139      139              
  Lines       22339    22342       +3     
==========================================
- Hits        13905    13889      -16     
- Misses       6955     6977      +22     
+ Partials     1479     1476       -3

@dperny
Copy link
Collaborator Author

dperny commented Jan 16, 2019

CI came up green on the first run after I fixed the obvious issues. Running it a few more times, then we can merge this.

@dperny dperny changed the title [WIP] Fix flaky tests Fix flaky tests Jan 16, 2019
@dperny
Copy link
Collaborator Author

dperny commented Jan 16, 2019

Update tests are still failing. I believe this is due to the same underlying issue of goroutines not running, but it's difficult to say

@dperny
Copy link
Collaborator Author

dperny commented Jan 17, 2019

I increased some timeout, and now I'm just hitting "rebuild" until this shows no sign of failing.

@dperny
Copy link
Collaborator Author

dperny commented Jan 17, 2019

I'm on 3 consecutive test runs with no failures. I think increasing the timeout has completed the fix.

@dperny
Copy link
Collaborator Author

dperny commented Jan 17, 2019

Nope, there it goes. Failed again.

@@ -22,7 +38,7 @@ func WatchTaskCreate(t *testing.T, watch chan events.Event) *api.Task {
if _, ok := event.(api.EventUpdateTask); ok {
assert.FailNow(t, "got EventUpdateTask when expecting EventCreateTask", fmt.Sprint(event))
}
case <-time.After(2 * time.Second):
case <-time.After(3 * time.Second):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whats the rational behind increasing this by 50% here and 100% below?

}()

<-started
return stopped
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whats the use of the returned channel? It doesn't seem to be used in any of the calls

It is likely that a large portion of test flakiness, especially in CI,
comes from the fact that swarmkit components under test are started in
goroutines, but those goroutines never have an opportunity to run. This
adds code ensuring those goroutines are scheduled and run, which should
hopefully solve many inexplicably flaky tests.

Additionally, increased test timeouts, to hopefully cover a few more
flaky cases.

Finally, removed direct use of the atomic package, in favor of less
efficient but higher-level mutexes.

Signed-off-by: Drew Erny <[email protected]>
@dperny
Copy link
Collaborator Author

dperny commented Jun 20, 2019

Tests now passing several runs in a row. Gonna merge this.

@dperny dperny merged commit 3df33bc into moby:master Jun 20, 2019
@thaJeztah
Copy link
Member

should we cherry pick this into the release branches?

@dperny
Copy link
Collaborator Author

dperny commented Jun 20, 2019

@thaJeztah maybe, but it's no rush to do so.

thaJeztah added a commit to thaJeztah/docker that referenced this pull request Sep 12, 2019
…3 branch)

full diff: moby/swarmkit@4fb9e96...bbe3418

changes included:

- moby/swarmkit#2889 [19.03 backport] Fix update out of sequence and increase max recv gRPC message size for nodes and secrets

Which relates to

- moby#39531 integration-cli: fix swarm tests flakiness
- docker-archive#345 [19.03 backport] integration-cli: fix swarm tests flakiness

And includes backports of

- moby/swarmkit#2808 Fix flaky tests
- moby/swarmkit#2866 Swap gometalinter for golangci-lint
- moby/swarmkit#2869 Increase max recv gRPC message size to initialize connection broker
 - related / similar to moby#38103 / docker-archive#102 cluster: set bigger grpc limit for array requests
 - related / similar to moby#39306 Increase max recv gRPC message size for nodes and secrets
 - fixes moby/swarmkit#2733 Error generated when messages size is too big
- moby/swarmkit#2870 Fix update out of sequence

Signed-off-by: Sebastiaan van Stijn <[email protected]>
docker-jenkins pushed a commit to docker-archive/docker-ce that referenced this pull request Sep 12, 2019
…3 branch)

full diff: moby/swarmkit@4fb9e96...bbe3418

changes included:

- moby/swarmkit#2889 [19.03 backport] Fix update out of sequence and increase max recv gRPC message size for nodes and secrets

Which relates to

- moby/moby#39531 integration-cli: fix swarm tests flakiness
- docker-archive/engine#345 [19.03 backport] integration-cli: fix swarm tests flakiness

And includes backports of

- moby/swarmkit#2808 Fix flaky tests
- moby/swarmkit#2866 Swap gometalinter for golangci-lint
- moby/swarmkit#2869 Increase max recv gRPC message size to initialize connection broker
 - related / similar to moby/moby#38103 / docker-archive/engine#102 cluster: set bigger grpc limit for array requests
 - related / similar to moby/moby#39306 Increase max recv gRPC message size for nodes and secrets
 - fixes moby/swarmkit#2733 Error generated when messages size is too big
- moby/swarmkit#2870 Fix update out of sequence

Signed-off-by: Sebastiaan van Stijn <[email protected]>
Upstream-commit: f7dbee3eeaa1dd218116f85b8f60361acbd5b214
Component: engine
thaJeztah added a commit to thaJeztah/docker that referenced this pull request Oct 9, 2019
…v18.09)

full diff: moby/swarmkit@142a737...5c86095

- moby/swarmkit#2892 [18.09 backport] Remove hardcoded IPAM config subnet value for ingress network
    - backport of moby/swarmkit#2890 Remove hardcoded IPAM config subnet value for ingress network
    - fixes [ENGORC-2651](https://docker.atlassian.net/browse/ENGORC-2651)
- moby/swarmkit#2836 [18.09 backport] Switch to go 1.11
    - backport of moby/swarmkit#2752 Switch to go 1.11
- moby/swarmkit#2901 [18.09 backport] Bump to golang 1.12.9
    - backport of moby/swarmkit#2880 Bump to golang 1.12.9
- moby/swarmkit#2900 [18.09 backport] Fix update out of sequence and increase max recv gRPC message size for nodes and secrets
    - backport of moby/swarmkit#2762 Increased wait time on test utils WaitForCluster and WatchTaskCreate
    - backport of moby/swarmkit#2771 Allow using Configs as CredentialSpecs
        - **second commit only** (attempt to fix weirdly broken tests)
    - backport of moby/swarmkit#2808 Fix flaky tests
    - backport of moby/swarmkit#2866 Swap gometalinter for golangci-lint
    - backport of moby/swarmkit#2869 Increase max recv gRPC message size to initialize connection broker
        - related / similar to moby#38103 / docker-archive#102 cluster: set bigger grpc limit for array requests
        - related / similar to moby#39306 Increase max recv gRPC message size for nodes and secrets
        - fixes moby/swarmkit#2733 Error generated when messages size is too big
    - backport of moby/swarmkit#2870 Fix update out of sequence

Signed-off-by: Sebastiaan van Stijn <[email protected]>
docker-jenkins pushed a commit to docker-archive/docker-ce that referenced this pull request Oct 23, 2019
…v18.09)

full diff: moby/swarmkit@142a737...5c86095

- moby/swarmkit#2892 [18.09 backport] Remove hardcoded IPAM config subnet value for ingress network
    - backport of moby/swarmkit#2890 Remove hardcoded IPAM config subnet value for ingress network
    - fixes [ENGORC-2651](https://docker.atlassian.net/browse/ENGORC-2651)
- moby/swarmkit#2836 [18.09 backport] Switch to go 1.11
    - backport of moby/swarmkit#2752 Switch to go 1.11
- moby/swarmkit#2901 [18.09 backport] Bump to golang 1.12.9
    - backport of moby/swarmkit#2880 Bump to golang 1.12.9
- moby/swarmkit#2900 [18.09 backport] Fix update out of sequence and increase max recv gRPC message size for nodes and secrets
    - backport of moby/swarmkit#2762 Increased wait time on test utils WaitForCluster and WatchTaskCreate
    - backport of moby/swarmkit#2771 Allow using Configs as CredentialSpecs
        - **second commit only** (attempt to fix weirdly broken tests)
    - backport of moby/swarmkit#2808 Fix flaky tests
    - backport of moby/swarmkit#2866 Swap gometalinter for golangci-lint
    - backport of moby/swarmkit#2869 Increase max recv gRPC message size to initialize connection broker
        - related / similar to moby/moby#38103 / docker-archive/engine#102 cluster: set bigger grpc limit for array requests
        - related / similar to moby/moby#39306 Increase max recv gRPC message size for nodes and secrets
        - fixes moby/swarmkit#2733 Error generated when messages size is too big
    - backport of moby/swarmkit#2870 Fix update out of sequence

Signed-off-by: Sebastiaan van Stijn <[email protected]>
Upstream-commit: e06f07ef337ab890f211397d6b408b75a2512dc5
Component: engine
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants