Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unit tests flaking in CI #1559

Closed
punya opened this issue Feb 19, 2021 · 6 comments · Fixed by #1551 or #1569
Closed

Unit tests flaking in CI #1559

punya opened this issue Feb 19, 2021 · 6 comments · Fixed by #1551 or #1569
Milestone

Comments

@punya
Copy link
Member

punya commented Feb 19, 2021

While working on #1551, I ran into several unit tests that flaked in CI:

https://github.com/open-telemetry/opentelemetry-go/runs/1930834290
(Go 1.14 only)

--- FAIL: TestExactMerge (0.02s)
    --- FAIL: TestExactMerge/Odd=true (0.01s)
        --- FAIL: TestExactMerge/Odd=true/Absolute=false (0.01s)
            --- FAIL: TestExactMerge/Odd=true/Absolute=false/Float64Kind (0.00s)
                exact_test.go:162: 
                    	Error Trace:	exact_test.go:162
                    	            				test.go:75
                    	Error:      	Should be true
                    	Test:       	TestExactMerge/Odd=true/Absolute=false/Float64Kind
        --- FAIL: TestExactMerge/Odd=true/Absolute=true (0.00s)
            --- FAIL: TestExactMerge/Odd=true/Absolute=true/Float64Kind (0.00s)
                exact_test.go:162: 
                    	Error Trace:	exact_test.go:162
                    	            				test.go:75
                    	Error:      	Should be true
                    	Test:       	TestExactMerge/Odd=true/Absolute=true/Float64Kind
    --- FAIL: TestExactMerge/Odd=false (0.01s)
        --- FAIL: TestExactMerge/Odd=false/Absolute=false (0.01s)
            --- FAIL: TestExactMerge/Odd=false/Absolute=false/Float64Kind (0.00s)
                exact_test.go:162: 
                    	Error Trace:	exact_test.go:162
                    	            				test.go:75
                    	Error:      	Should be true
                    	Test:       	TestExactMerge/Odd=false/Absolute=false/Float64Kind
        --- FAIL: TestExactMerge/Odd=false/Absolute=true (0.00s)
            --- FAIL: TestExactMerge/Odd=false/Absolute=true/Float64Kind (0.00s)
                exact_test.go:162: 
                    	Error Trace:	exact_test.go:162
                    	            				test.go:75
                    	Error:      	Should be true
                    	Test:       	TestExactMerge/Odd=false/Absolute=true/Float64Kind
--- FAIL: TestExactFloat64 (0.00s)
    exact_test.go:285: 
        	Error Trace:	exact_test.go:285
        	Error:      	Should be true
        	Test:       	TestExactFloat64
FAIL
FAIL	go.opentelemetry.io/otel/sdk/metric/aggregator/exact	0.035s

https://github.com/open-telemetry/opentelemetry-go/runs/1931724468
(Go 1.15 only)

--- FAIL: TestExactMerge (0.03s)
    --- FAIL: TestExactMerge/Odd=false (0.01s)
        --- FAIL: TestExactMerge/Odd=false/Absolute=false (0.01s)
            --- FAIL: TestExactMerge/Odd=false/Absolute=false/Float64Kind (0.00s)
                exact_test.go:162: 
                    	Error Trace:	exact_test.go:162
                    	            				test.go:75
                    	Error:      	Should be true
                    	Test:       	TestExactMerge/Odd=false/Absolute=false/Float64Kind
--- FAIL: TestExactFloat64 (0.00s)
    exact_test.go:285: 
        	Error Trace:	exact_test.go:285
        	Error:      	Should be true
        	Test:       	TestExactFloat64
FAIL
FAIL	go.opentelemetry.io/otel/sdk/metric/aggregator/exact	0.052s

https://github.com/open-telemetry/opentelemetry-go/runs/1931857269
(Go 1.14 only)

--- FAIL: TestExportTimeout (0.01s)
    controller_test.go:313: 
        	Error Trace:	controller_test.go:313
        	Error:      	An error is expected but got nil.
        	Test:       	TestExportTimeout
--- FAIL: TestPushExportError (0.01s)
    --- FAIL: TestPushExportError/errNone (0.00s)
        push_test.go:212: 
            	Error Trace:	push_test.go:212
            	Error:      	Expected nil, but got: context.deadlineExceededError{}
            	Test:       	TestPushExportError/errNone
FAIL
coverage: 50.9% of statements in ./...
FAIL	go.opentelemetry.io/otel/sdk/metric/controller/basic	0.093s

I wasn't able to reproduce any of these locally, even when I ran the tests 10000 times.

@punya
Copy link
Member Author

punya commented Feb 19, 2021

Another one: https://github.com/open-telemetry/opentelemetry-go/pull/1551/checks?check_run_id=1936404855 (Go 1.15 only)

--- FAIL: TestExactMerge (0.01s)
    --- FAIL: TestExactMerge/Odd=true (0.00s)
        --- FAIL: TestExactMerge/Odd=true/Absolute=false (0.00s)
            --- FAIL: TestExactMerge/Odd=true/Absolute=false/Int64Kind (0.00s)
                exact_test.go:162: 
                    	Error Trace:	exact_test.go:162
                    	            				test.go:75
                    	Error:      	Should be true
                    	Test:       	TestExactMerge/Odd=true/Absolute=false/Int64Kind
            --- FAIL: TestExactMerge/Odd=true/Absolute=false/Float64Kind (0.00s)
                exact_test.go:162: 
                    	Error Trace:	exact_test.go:162
                    	            				test.go:75
                    	Error:      	Should be true
                    	Test:       	TestExactMerge/Odd=true/Absolute=false/Float64Kind
        --- FAIL: TestExactMerge/Odd=true/Absolute=true (0.00s)
            --- FAIL: TestExactMerge/Odd=true/Absolute=true/Int64Kind (0.00s)
                exact_test.go:162: 
                    	Error Trace:	exact_test.go:162
                    	            				test.go:75
                    	Error:      	Should be true
                    	Test:       	TestExactMerge/Odd=true/Absolute=true/Int64Kind
            --- FAIL: TestExactMerge/Odd=true/Absolute=true/Float64Kind (0.00s)
                exact_test.go:162: 
                    	Error Trace:	exact_test.go:162
                    	            				test.go:75
                    	Error:      	Should be true
                    	Test:       	TestExactMerge/Odd=true/Absolute=true/Float64Kind
    --- FAIL: TestExactMerge/Odd=false (0.00s)
        --- FAIL: TestExactMerge/Odd=false/Absolute=false (0.00s)
            --- FAIL: TestExactMerge/Odd=false/Absolute=false/Int64Kind (0.00s)
                exact_test.go:162: 
                    	Error Trace:	exact_test.go:162
                    	            				test.go:75
                    	Error:      	Should be true
                    	Test:       	TestExactMerge/Odd=false/Absolute=false/Int64Kind
            --- FAIL: TestExactMerge/Odd=false/Absolute=false/Float64Kind (0.00s)
                exact_test.go:162: 
                    	Error Trace:	exact_test.go:162
                    	            				test.go:75
                    	Error:      	Should be true
                    	Test:       	TestExactMerge/Odd=false/Absolute=false/Float64Kind
        --- FAIL: TestExactMerge/Odd=false/Absolute=true (0.00s)
            --- FAIL: TestExactMerge/Odd=false/Absolute=true/Int64Kind (0.00s)
                exact_test.go:162: 
                    	Error Trace:	exact_test.go:162
                    	            				test.go:75
                    	Error:      	Should be true
                    	Test:       	TestExactMerge/Odd=false/Absolute=true/Int64Kind
            --- FAIL: TestExactMerge/Odd=false/Absolute=true/Float64Kind (0.00s)
                exact_test.go:162: 
                    	Error Trace:	exact_test.go:162
                    	            				test.go:75
                    	Error:      	Should be true
                    	Test:       	TestExactMerge/Odd=false/Absolute=true/Float64Kind
--- FAIL: TestExactFloat64 (0.00s)
    exact_test.go:285: 
        	Error Trace:	exact_test.go:285
        	Error:      	Should be true
        	Test:       	TestExactFloat64
--- FAIL: TestMergeBehavior (0.00s)
    --- FAIL: TestMergeBehavior/Float64Kind (0.00s)
        --- FAIL: TestMergeBehavior/Float64Kind/Forward=false (0.00s)
            exact_test.go:339: 
                	Error Trace:	exact_test.go:339
                	Error:      	Should be true
                	Test:       	TestMergeBehavior/Float64Kind/Forward=false
        --- FAIL: TestMergeBehavior/Float64Kind/Forward=true (0.00s)
            exact_test.go:339: 
                	Error Trace:	exact_test.go:339
                	Error:      	Should be true
                	Test:       	TestMergeBehavior/Float64Kind/Forward=true
FAIL
FAIL	go.opentelemetry.io/otel/sdk/metric/aggregator/exact	0.014s
--- FAIL: TestBasicTimestamps (0.00s)
    basic_test.go:338: 
        	Error Trace:	basic_test.go:338
        	Error:      	Should be true
        	Test:       	TestBasicTimestamps
FAIL
FAIL	go.opentelemetry.io/otel/sdk/metric/processor/basic	0.158s

@punya
Copy link
Member Author

punya commented Feb 19, 2021

It's worth noting that CI runs each test multiple times (386 vs x64, with/without coverage), and there's no pattern to which of these fail in a given run.

punya added a commit to punya/opentelemetry-go that referenced this issue Feb 19, 2021
Fixes open-telemetry#1559.

Some unit tests were flaking in CI because they expected the timestamp
to advance during a test, when it's possible for it to be unchanged
instead. This change switches the offending "<" comparisons to "<=".
This was referenced Feb 19, 2021
Aneurysm9 pushed a commit that referenced this issue Feb 19, 2021
* Vendor Thrift dependency

* Fix build

* Changelog entry

* Ignore third_party for coverage purposes

* Re-run tests

* Re-run tests

* Re-run tests

* Re-run tests

* Relax time comparisons

Fixes #1559.

Some unit tests were flaking in CI because they expected the timestamp
to advance during a test, when it's possible for it to be unchanged
instead. This change switches the offending "<" comparisons to "<=".

* Re-run tests

* Relax more time comparisons

* Re-run tests

Co-authored-by: Tyler Yahn <[email protected]>
@MrAlias
Copy link
Contributor

MrAlias commented Feb 22, 2021

I'm consistently seeing this failure for testing ubuntu-latest for Go 1.14 and 1.15 on both 386 and amd64 architecture. E.g.

https://github.com/open-telemetry/opentelemetry-go/runs/1956584735?check_suite_focus=true

@MrAlias MrAlias reopened this Feb 22, 2021
@MrAlias
Copy link
Contributor

MrAlias commented Feb 22, 2021

I'm testing this locally with:

docker run --rm -v "$PWD":/usr/src/go.opentelemetry.io/otel -w /usr/src/go.opentelemetry.io/otel/sdk/metric/aggregator/ -m 7g --cpus="0.1" golang:1.15-buster go test -c -race ./exact && while ./sdk/metric/aggregator/exact.test -test.run=TestExactMerge -test.count=100 -test.failfast; do date; done; echo -e '\a'

but have yet to reproduce the failure.

@Aneurysm9
Copy link
Member

I am able to reliably reproduce the error locally while not running the race detector. When I add the -race flag the tests no longer fail. :headscratch:

@MrAlias
Copy link
Contributor

MrAlias commented Feb 23, 2021

@Aneurysm9 that is interesting for sure! What system are you running on and what command do you run to reproduce?

Aneurysm9 pushed a commit that referenced this issue Feb 23, 2021
* Stagger timestamps in exact aggregator tests

Fixes #1559.

* Missed one

* Yield while you wait

* Just sleep for a teeny tiny bit

* Oops, wrong PR in Changelog

* Make sure that *some* time passes

* Keep time comparisons relaxed
ldelossa pushed a commit to ldelossa/opentelemetry-go that referenced this issue Mar 5, 2021
* Vendor Thrift dependency

* Fix build

* Changelog entry

* Ignore third_party for coverage purposes

* Re-run tests

* Re-run tests

* Re-run tests

* Re-run tests

* Relax time comparisons

Fixes open-telemetry#1559.

Some unit tests were flaking in CI because they expected the timestamp
to advance during a test, when it's possible for it to be unchanged
instead. This change switches the offending "<" comparisons to "<=".

* Re-run tests

* Relax more time comparisons

* Re-run tests

Co-authored-by: Tyler Yahn <[email protected]>
ldelossa pushed a commit to ldelossa/opentelemetry-go that referenced this issue Mar 5, 2021
* Stagger timestamps in exact aggregator tests

Fixes open-telemetry#1559.

* Missed one

* Yield while you wait

* Just sleep for a teeny tiny bit

* Oops, wrong PR in Changelog

* Make sure that *some* time passes

* Keep time comparisons relaxed
@pellared pellared added this to the untracked milestone Nov 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment