
Can't run tests multiple times using "go test -count N" when finding flaky tests #4183

Closed
invidian opened this issue Feb 11, 2021 · 13 comments
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. priority/backlog Higher priority than priority/awaiting-more-evidence.

Comments

@invidian
Member

What steps did you take and what happened:
It seems make test TEST_ARGS='-count 2' always fails for me when running at commit bfc1498

What did you expect to happen:
make test TEST_ARGS='-count 2' should always pass.

Anything else you would like to add:
I was getting at least 3 different errors when trying that, so I'm not sure if it makes sense to post them all here.

Environment:

  • Cluster-api version: bfc1498
  • Minikube/KIND version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Feb 11, 2021
@vincepri
Member

Tried this locally, and I get a single error:

• Failure in Spec Setup (BeforeEach) [0.000 seconds]
Kubeadm Control Plane Controller
/Users/vince/go/src/sigs.k8s.io/cluster-api/controlplane/kubeadm/controllers/controller_test.go:55
  Reconcile a KubeadmControlPlane [BeforeEach]
  /Users/vince/go/src/sigs.k8s.io/cluster-api/controlplane/kubeadm/controllers/controller_test.go:59
    should return error if owner cluster is missing
    /Users/vince/go/src/sigs.k8s.io/cluster-api/controlplane/kubeadm/controllers/controller_test.go:60

    You may only call BeforeEach from within a Describe, Context or When

    /Users/vince/go/src/sigs.k8s.io/cluster-api/controlplane/kubeadm/controllers/controller_test.go:56
------------------------------
E0211 10:02:12.149519   25684 controller.go:123]  "msg"="Failed to retrieve owner Cluster from the API Server" "error"="Cluster.cluster.x-k8s.io \"kcp-foo-x6v3kj\" not found"  
•


Summarizing 1 Failure:

[Fail] Kubeadm Control Plane Controller [BeforeEach] Reconcile a KubeadmControlPlane should return error if owner cluster is missing 
/Users/vince/go/src/sigs.k8s.io/cluster-api/controlplane/kubeadm/controllers/controller_test.go:56

Ran 2 of 2 Specs in 0.146 seconds
FAIL! -- 1 Passed | 1 Failed | 0 Pending | 0 Skipped

Did you get anything else?

@invidian
Member Author

If I scroll up, I can also find:

ok      sigs.k8s.io/cluster-api/cmd/clusterctl/client/cluster   4.018s
--- FAIL: Test_viperReader_Get (0.00s)
    --- FAIL: Test_viperReader_Get/Read_from_env (0.00s)
        reader_viper_test.go:183:
            Expected
                <string>: bar
            to equal
                <string>: foo
FAIL
FAIL    sigs.k8s.io/cluster-api/cmd/clusterctl/client/config    0.048s

And:


goroutine 5909 [chan receive]:
sigs.k8s.io/controller-runtime/pkg/cache/internal.(*specificInformersMap).Start(0xc001094820, 0x1d59860, 0xc000d3ea80)
        /home/invidian/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/cache/internal/informers_map.go:144 +0x6a
created by sigs.k8s.io/controller-runtime/pkg/cache/internal.(*InformersMap).Start
        /home/invidian/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/cache/internal/deleg_map.go:67 +0xce

goroutine 5602 [select]:
k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch.func2(0xc000d548c0, 0xc000d39bc0, 0xc0010804e0, 0xc00077f980)
        /home/invidian/go/pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:373 +0x16f
created by k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch
        /home/invidian/go/pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:367 +0x2c5

goroutine 5603 [select]:
net/http.setRequestCancel.func4(0x0, 0xc000888ed0, 0xc000e1aeb0, 0xc000bb27ac, 0xc0010805a0)
        /usr/lib/go/src/net/http/client.go:398 +0xe5
created by net/http.setRequestCancel
        /usr/lib/go/src/net/http/client.go:397 +0x337
FAIL    sigs.k8s.io/cluster-api/controllers     20.533s
ok      sigs.k8s.io/cluster-api/controllers/external    0.057s
ok      sigs.k8s.io/cluster-api/controllers/mdutil      0.059s
ok      sigs.k8s.io/cluster-api/controllers/noderefutil 0.012s
Running Suite: Remote Controller Suite
======================================
Random Seed: 1613066750
Will run 5 of 5 specs

Perhaps running it per package would be easier to trace them down.

@vincepri
Member

Found some of them (didn't get the stack trace / panic). The main issue is that these are not only unit tests, but also integration tests which spin up a local Kubernetes api-server and etcd. There are parts of the codebase that aren't using random names, so there are going to be conflicts. Given that controllers are eventually consistent (backed by a cache), this might happen more often than not. Not sure if we can nail down all of the cases, but we can try to make things a little bit more resilient.
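The name-collision failure mode described here can be mitigated by giving every test object a unique name per iteration. A minimal sketch of the random-suffix pattern, loosely modeled on how Kubernetes generates names for generateName; the helper, charset, and suffix length are illustrative, not the actual CAPI code:

```go
package main

import (
	"fmt"
	"math/rand"
)

// suffixLen and charset are illustrative values, in the style of
// apimachinery's name generator.
const (
	suffixLen = 6
	charset   = "abcdefghijklmnopqrstuvwxyz0123456789"
)

// randomSuffixedName appends a random lowercase alphanumeric suffix
// to base so that two test iterations never reuse the same object name.
func randomSuffixedName(base string) string {
	b := make([]byte, suffixLen)
	for i := range b {
		b[i] = charset[rand.Intn(len(charset))]
	}
	return fmt.Sprintf("%s-%s", base, b)
}

func main() {
	// Two calls produce distinct names, so a repeated test run
	// cannot collide with objects left over from the first run.
	fmt.Println(randomSuffixedName("kcp-foo"))
	fmt.Println(randomSuffixedName("kcp-foo"))
}
```

With names generated this way, a second `go test -count` iteration cannot race a leftover object created by the first, even when the controller's cache has not yet observed a deletion.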

@fabriziopandini
Member

/milestone v0.4.0
/help
Let's keep this on the radar for now; we can punt later if there is no bandwidth

@k8s-ci-robot
Contributor

@fabriziopandini:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/milestone v0.4.0
/help
Let's keep this on the radar for now; we can punt later if there is no bandwidth

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added this to the v0.4.0 milestone Mar 5, 2021
@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Mar 5, 2021
@fabriziopandini
Member

/priority backlog

@k8s-ci-robot k8s-ci-robot added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Mar 10, 2021
@vincepri
Member

/milestone Next

@k8s-ci-robot k8s-ci-robot modified the milestones: v0.4.0, Next Mar 13, 2021
@mboersma
Contributor

mboersma commented May 6, 2021

Supporting go test -count <n> seems infeasible given that so many of the tests are based on Ginkgo, and -count isn't supported by Ginkgo.

I have a branch where I've tried to hack around this by ensuring namespace and cluster names don't collide, but that's not sufficient. AFAICT each It clause is taken as a unit to run n times, but BeforeEach and friends are not called as expected for each iteration. This leads to on-again, off-again bugs like "You may only call It from within a Describe, Context or When", where the framework thinks a test run isn't nested correctly.

While it's suggested that the ginkgo author might implement -count, the proposal was to do so within the ginkgo CLI, similar to how -untilItFails works. But neither of those helps when running tests via go test as CAPI does now.

It seems unrealistic either to abandon Ginkgo entirely or to expect -count to work in future versions (including the imminent Ginkgo v2.0). Instead, Ginkgo plans to error out with an informative message when -count is provided. I think we should document that -count doesn't work and close this issue.
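The breakage described above comes from Ginkgo building its spec tree once at process start, while -count re-invokes the generated Test function, so setup nodes like BeforeEach fire outside the expected tree. Plain `testing`-style tests avoid this because setup runs inside the test body, giving each iteration fresh state. A minimal sketch of that per-iteration-setup pattern; the testEnv type and names are illustrative, not CAPI code:

```go
package main

import "fmt"

// testEnv stands in for per-test fixtures (e.g. an envtest control
// plane and the objects created against it). Illustrative only.
type testEnv struct {
	objects map[string]string
}

// newTestEnv plays the role of a Ginkgo BeforeEach, but is called
// from inside the test body, so every iteration starts clean.
func newTestEnv() *testEnv {
	return &testEnv{objects: map[string]string{}}
}

// runIteration simulates one execution of a spec that creates a
// fixed-name object; it fails if the name already exists.
func runIteration() error {
	env := newTestEnv() // fresh setup, once per iteration
	const name = "kcp-foo"
	if _, exists := env.objects[name]; exists {
		return fmt.Errorf("object %q already exists", name)
	}
	env.objects[name] = "ready"
	return nil
}

func main() {
	// Running the same spec twice succeeds because setup is fresh
	// each time: the behavior that `go test -count 2` expects.
	for i := 0; i < 2; i++ {
		if err := runIteration(); err != nil {
			fmt.Println("FAIL:", err)
			return
		}
	}
	fmt.Println("ok")
}
```

Because setup runs inside each iteration rather than once at init time, re-running the same spec never sees leftover objects, which is exactly the property `go test -count N` relies on.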

@vincepri
Member

vincepri commented May 6, 2021

@mboersma We're actually in the process (almost there!) of moving away from Ginkgo in favor of using just plain go tests + envtest

@mboersma
Contributor

mboersma commented May 6, 2021

move away from Ginkgo in favor of using just plain go tests + envtest

Awesome--I think this will be a non-problem once we have moved away from Ginkgo.

@fabriziopandini
Member

/close
There is agreement that using go test instead of Ginkgo will address this issue.
Opened #4588 to track this effort.

@k8s-ci-robot
Contributor

@fabriziopandini: Closing this issue.

In response to this:

/close
There is agreement that using go test instead of Ginkgo will address this issue.
Opened #4588 to track this effort.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jonyhy96

@mboersma We're actually in the process (almost there!) of moving away from Ginkgo in favor of using just plain go tests + envtest

Hi! Is there any plan to move away from Ginkgo for k/k?
