StatsThreadLocalStoreTestNoFixture.MemoryWithoutTlsFakeSymbolTable flaky? #8282

Closed · alyssawilk opened this issue Sep 18, 2019 · 7 comments · Fixed by #8293

@alyssawilk (Contributor)

Josh, could you take a look? I don't think this failure is related to the PR it failed on (#8281), so I think it's a random flake.

https://circleci.com/gh/envoyproxy/envoy/278531?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

[ RUN      ] StatsThreadLocalStoreTestNoFixture.MemoryWithoutTlsFakeSymbolTable
test/common/stats/thread_local_store_test.cc:898: Failure
Expected equality of these values:
  memory_test.consumedBytes()
    Which is: 15268288
  15268336
Stack trace:
  0x528d66: Envoy::Stats::StatsThreadLocalStoreTestNoFixture_MemoryWithoutTlsFakeSymbolTable_Test::TestBody()
  0xa94ea4: testing::internal::HandleExceptionsInMethodIfSupported<>()
  0xa94dcc: testing::Test::Run()
  0xa95e1f: testing::TestInfo::Run()
... Google Test internal frames ...

[  FAILED  ] StatsThreadLocalStoreTestNoFixture.MemoryWithoutTlsFakeSymbolTable (787 ms)
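
For context, the failing assertion compares the exact number of bytes allocated during the test body against a hard-coded golden value. Here is a sketch of that shape, using only the names visible in the log above; the include path, the Stats::TestUtil::MemoryTest helper location, and the setup comments are assumptions, not the real test body:

#include "gtest/gtest.h"
#include "test/common/stats/stat_test_utility.h" // assumed location of MemoryTest

// Illustrative only; the real test is at test/common/stats/thread_local_store_test.cc:898.
TEST(StatsThreadLocalStoreTestNoFixture, MemoryWithoutTlsFakeSymbolTable) {
  Envoy::Stats::TestUtil::MemoryTest memory_test; // snapshots heap bytes at construction
  // ... create a fixed set of stats backed by a fake symbol table ...
  // consumedBytes() counts every allocation since construction, including
  // allocations made by unrelated background threads in the same window.
  EXPECT_EQ(memory_test.consumedBytes(), 15268336); // exact golden value from the log
}

An assertion like this is only deterministic if the test process allocates nothing else while the measurement window is open, which is what the discussion below turns on.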

@zuercher (Member)

Saw this on a docs-only PR as well (#8285).

@jmarantz (Contributor)

It had been stable since June, I think. I'd love to know what possible sources of non-determinism there might be that could affect this fairly narrow test.

I'm imagining it might be something else recent that got checked in.

@jmarantz (Contributor)

And in fact the unexpected behavior is that the test consumed 15268336 - 15268288 = 48 fewer bytes than the golden value.

@jmarantz (Contributor)

I have a theory: there are actually real threads running during this test, although they have nothing to do with the test itself.

TestRunner::RunTests() instantiates ProcessWide, which runs grpc_init(), which ultimately calls pthread_create(). In fact it creates 3 threads. I suspect there's some allocation going on in those threads in parallel with the test body, adding some (rare) flakiness.
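
A minimal sketch of the chain being described, with simplified signatures (the real source/exe/process_wide.cc presumably does more, such as reference-counting its initialization):

#include <grpc/grpc.h>

// Simplified illustration, not the actual Envoy source: a process-wide
// RAII guard constructed once by the test runner. Its constructor runs
// grpc_init(), which spawns background gRPC threads whose allocations
// can race with a memory-sensitive test body.
class ProcessWide {
public:
  ProcessWide() { grpc_init(); }       // internally ends up in pthread_create()
  ~ProcessWide() { grpc_shutdown(); }  // tears the gRPC state back down
};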

@jmarantz (Contributor)

Possible cause: #8196 -- maybe that new version of gRPC made threads active in the background, which is obviously harmful to memory tests.

@jmarantz (Contributor) commented Sep 19, 2019

Running

bazel test --cache_test_results=no --compilation_mode=opt \
  --test_env=ENVOY_MEMORY_TEST_EXACT=true \
  //test/common/stats:thread_local_store_test \
  --runs_per_test=100

fails 7/100 times. If I comment out the calls to grpc_init() and grpc_shutdown() in source/exe/process_wide.cc, I see 0 failures, even with --runs_per_test=1000.

@jmarantz (Contributor) commented Sep 19, 2019

@htuch WDYT about calling grpc_init/grpc_shutdown from the ctor/dtor of GoogleAsyncClientImpl?

That would keep it more constrained and avoid having it run threads over the top of unrelated tests.
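
A rough sketch of that proposal, with a hypothetical simplified constructor (the real GoogleAsyncClientImpl takes dispatcher and config arguments, among others). grpc_init() and grpc_shutdown() are reference-counted in gRPC core, so nesting per-client init/shutdown calls is safe:

#include <grpc/grpc.h>

// Hypothetical shape of the fix: tie the gRPC runtime (and its background
// threads) to the lifetime of the async client rather than the whole
// process, so tests that never create a client never see gRPC threads.
class GoogleAsyncClientImpl {
public:
  GoogleAsyncClientImpl(/* dispatcher, config, ... */) { grpc_init(); }
  ~GoogleAsyncClientImpl() { grpc_shutdown(); }
};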
