Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix inventory controller tests after adding additional ping #49663

Merged
merged 2 commits into from
Dec 3, 2024

Conversation

vapopov
Copy link
Contributor

@vapopov vapopov commented Dec 2, 2024

Fix for the inventory controller tests identified by flaky tests detector https://github.com/gravitational/teleport.e/actions/runs/12116931575/job/33778324175

After adding additional ping for time reconciliation, many tests which don't have downstream consumer, or consumers expecting only one ping request start failing

Tested with

inventory % go test . -v -count 500 -race -timeout 0 -failfast
...
ok  	github.com/gravitational/teleport/lib/inventory	1534.162s

@vapopov vapopov added the no-changelog Indicates that a PR does not require a changelog entry label Dec 2, 2024
Copy link

This pull request is automatically being deployed by Amplify Hosting (learn more).

Access this pull request here: https://pr-49663.d3pp5qlev8mo18.amplifyapp.com

@rosstimothy
Copy link
Contributor

Seems like the panic reported in the nightly run is still present on this branch.

$ go test -c ./lib/inventory
$ docker run --cpus="0.15" -v.:/teleport golang:1.22.4-bookworm /teleport/inventory.test -test.count=1000 -test.timeout=60m -test.failfast -test.v -test.run "TestTimeReconciliation"
--- PASS: TestTimeReconciliation (0.18s)
=== RUN   TestTimeReconciliation
--- PASS: TestTimeReconciliation (0.07s)
=== RUN   TestTimeReconciliation
--- PASS: TestTimeReconciliation (0.17s)
=== RUN   TestTimeReconciliation
--- PASS: TestTimeReconciliation (0.12s)
=== RUN   TestTimeReconciliation
--- PASS: TestTimeReconciliation (0.15s)
=== RUN   TestTimeReconciliation
--- PASS: TestTimeReconciliation (0.02s)
=== RUN   TestTimeReconciliation
--- PASS: TestTimeReconciliation (0.12s)
=== RUN   TestTimeReconciliation
--- PASS: TestTimeReconciliation (0.05s)
=== RUN   TestTimeReconciliation
--- PASS: TestTimeReconciliation (0.08s)
=== RUN   TestTimeReconciliation
--- PASS: TestTimeReconciliation (0.02s)
=== RUN   TestTimeReconciliation
--- PASS: TestTimeReconciliation (0.04s)
=== RUN   TestTimeReconciliation
--- FAIL: TestTimeReconciliation (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1f34f24]

goroutine 242 [running]:
testing.tRunner.func1.2({0x2137280, 0x3ef1720})
        /home/tim.linux/go/pkg/mod/golang.org/[email protected]/src/testing/testing.go:1632 +0x1bc
testing.tRunner.func1()
        /home/tim.linux/go/pkg/mod/golang.org/[email protected]/src/testing/testing.go:1635 +0x334
panic({0x2137280?, 0x3ef1720?})
        /home/tim.linux/go/pkg/mod/golang.org/[email protected]/src/runtime/panic.go:785 +0x124
github.com/gravitational/teleport/lib/inventory.TestTimeReconciliation(0x4000bc61a0)
        /Users/tim/src/teleport/lib/inventory/controller_test.go:1543 +0x564
testing.tRunner(0x4000bc61a0, 0x267e218)
        /home/tim.linux/go/pkg/mod/golang.org/[email protected]/src/testing/testing.go:1690 +0xe4
created by testing.(*T).Run in goroutine 1
        /home/tim.linux/go/pkg/mod/golang.org/[email protected]/src/testing/testing.go:1743 +0x314

Remove unused code
@vapopov
Copy link
Contributor Author

vapopov commented Dec 3, 2024

@rosstimothy fixed, interesting that I wasn't able to reproduce in darwin platform

root@3e255c82fc04:/go/teleport/lib/inventory# go test . -v -count 1000 -race -timeout 0 -failfast -run "TestTimeReconciliation"
PASS
ok  	github.com/gravitational/teleport/lib/inventory	110.439s

Took a while to debug, there were two issues, one was related to test only in this line: https://github.com/gravitational/teleport/pull/49663/files#diff-0559b6dbb369be5e2e76ed5a7248e6d882b7229d9a6eb4128b350ddddc6d207eL1491

Since we have variable timer, and when value of the first duration is very small like 2ms or less might happen that heartbeat executed before ping request and this return actually stops receiving messages by downstream, as result next ping request blocks inventory controller stream.

Second one test event signals about time reconciliation is done, but we only send ping request and didn't wait response

@rosstimothy
Copy link
Contributor

@rosstimothy fixed, interesting that I wasn't able to reproduce in darwin platform

root@3e255c82fc04:/go/teleport/lib/inventory# go test . -v -count 1000 -race -timeout 0 -failfast -run "TestTimeReconciliation"
PASS
ok  	github.com/gravitational/teleport/lib/inventory	110.439s

Your machine is likely too fast/reliable to reproduce by invoking go test directly. The reason I tested with docker run --cpus="0.15" is to try to replicate our very constrained CI environment and it has proven to be a very reliable way to reproduce flaky tests locally. I'll pull the branch and test it out again.

@rosstimothy
Copy link
Contributor

rosstimothy commented Dec 3, 2024

It took a few more iterations, but the panic is still present on the latest commit.

$ git rev-parse HEAD
4a723251432abb3ee66e68d5c878353e4620cf32

$ docker run --cpus="0.15" -v.:/teleport golang:1.22.4-bookworm /teleport/inventory.test -test.count=1000 -test.timeout=60m -test.failfast -test.v -test.run "TestTimeReconciliation"
...
--- PASS: TestTimeReconciliation (0.17s)
=== RUN   TestTimeReconciliation
--- PASS: TestTimeReconciliation (0.14s)
=== RUN   TestTimeReconciliation
--- PASS: TestTimeReconciliation (0.19s)
=== RUN   TestTimeReconciliation
--- PASS: TestTimeReconciliation (0.03s)
=== RUN   TestTimeReconciliation
--- FAIL: TestTimeReconciliation (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1f3bf34]

goroutine 1189 [running]:
testing.tRunner.func1.2({0x213e960, 0x3f01720})
        /home/tim.linux/go/pkg/mod/golang.org/[email protected]/src/testing/testing.go:1632 +0x1bc
testing.tRunner.func1()
        /home/tim.linux/go/pkg/mod/golang.org/[email protected]/src/testing/testing.go:1635 +0x334
panic({0x213e960?, 0x3f01720?})
        /home/tim.linux/go/pkg/mod/golang.org/[email protected]/src/runtime/panic.go:785 +0x124
github.com/gravitational/teleport/lib/inventory.TestTimeReconciliation(0x4000d11a00)
        /Users/tim/src/teleport/lib/inventory/controller_test.go:1543 +0x564
testing.tRunner(0x4000d11a00, 0x26865d8)
        /home/tim.linux/go/pkg/mod/golang.org/[email protected]/src/testing/testing.go:1690 +0xe4
created by testing.(*T).Run in goroutine 1
        /home/tim.linux/go/pkg/mod/golang.org/[email protected]/src/testing/testing.go:1743 +0x314

@vapopov
Copy link
Contributor Author

vapopov commented Dec 3, 2024

@rosstimothy could you please ensure that it was last commit, because it more looks like error of previous version, current line 1543

@vapopov
Copy link
Contributor Author

vapopov commented Dec 3, 2024

Results from linux container for all inventory controller tests

root@3e255c82fc04:/go/teleport/lib/inventory# go test . -v -count 1000 -race -timeout 0 -failfast
ok  	github.com/gravitational/teleport/lib/inventory	3078.046s

@rosstimothy
Copy link
Contributor

@rosstimothy could you please ensure that it was last commit, because it more looks like error of previous version, current line 1543

Hrm I did verify I was on the latest commit, see the output from git rev-parse HEAD. Perhaps the go test invocation failed for some reason and I didn't notice. Let me try again.

Copy link
Contributor

@rosstimothy rosstimothy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After rebuilding the test binary I've not seen a failure in over an hour.

@vapopov vapopov added this pull request to the merge queue Dec 3, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 3, 2024
@vapopov vapopov added this pull request to the merge queue Dec 3, 2024
Merged via the queue into master with commit bbfeabf Dec 3, 2024
41 checks passed
@vapopov vapopov deleted the vapopov/fix-invetory-controller-tests branch December 3, 2024 22:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-changelog Indicates that a PR does not require a changelog entry size/sm
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants