geoip provider of maxmind flaky #35829

Closed
wbpcode opened this issue Aug 23, 2024 · 16 comments · Fixed by #35862, #36043 or #36118

Comments

@wbpcode changed the title from "geoip provider" to "geoip provider of maxmind flaky" on Aug 23, 2024
@wbpcode
Member Author

wbpcode commented Aug 23, 2024

@nezdolik

@nezdolik nezdolik self-assigned this Aug 23, 2024
@wbpcode
Member Author

wbpcode commented Aug 27, 2024

If fixing this requires some time, could we disable this test first? @nezdolik

@nezdolik
Member

@wbpcode raised a patch here: #35862

@nezdolik
Member

There is another type of flake in asan that needs fixing:

[ RUN      ] GeoipProviderTest.DbReloadError
/opt/llvm/bin/../include/c++/v1/vector:1441:12: runtime error: reference binding to null pointer of type 'std::function<absl::Status (unsigned int)>'
    #0 0x3302ae1 in Envoy::Extensions::GeoipProviders::Maxmind::GeoipProviderTest_DbReloadError_Test::TestBody() /opt/llvm/bin/../include/c++/v1/vector:1441:5
    #1 0xa552494 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2580:10
    #2 0xa519902 in testing::Test::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2655:5
    #3 0xa51b477 in testing::TestInfo::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2832:11
    #4 0xa51d814 in testing::TestSuite::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2986:28
    #5 0xa53fbda in testing::internal::UnitTestImpl::RunAllTests() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:5697:44
    #6 0xa5567a3 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2580:10
    #7 0xa53e8e6 in testing::UnitTest::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:5280:10
    #8 0x7026b4f in Envoy::TestRunner::runTests(int, char**) /proc/self/cwd/external/com_google_googletest/googletest/include/gtest/gtest.h:2485:46
    #9 0x702262d in main /proc/self/cwd/test/main.cc:34:10
    #10 0x7fdb08231082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId: eebe5d5f4b608b8a53ec446b63981bba373ca0ca)
    #11 0x323872d in _start (/b/f/w/bazel-out/k8-dbg/bin/test/extensions/geoip_providers/maxmind/geoip_provider_test.runfiles/envoy/test/extensions/geoip_providers/maxmind/geoip_provider_test+0x323872d)

@phlax phlax reopened this Sep 5, 2024
@phlax
Member

phlax commented Sep 5, 2024

@nezdolik not sure if it's the same issue, but I'm seeing this fail asan quite a bit (in another testing repo), and it has failed postsubmit here:

https://dev.azure.com/cncf/envoy/_build/results?buildId=179259&view=logs&j=1439b9f7-a348-5b50-b5fe-ea612ea91241&t=1002ac43-da84-5fae-70b2-98833b702d09&l=282

possibly unrelated, but I've also seen this fail tsan a few times

@nezdolik
Member

nezdolik commented Sep 5, 2024

@phlax yup, asan still needs addressing. I did not notice that the issue got closed. Did you see tsan failures after 27th August?

@phlax
Member

phlax commented Sep 5, 2024

yeah, saw tsan issues yesterday, although they seemed to go away after rerunning. Probably relevant: the workers it was running on are pretty resource-constrained

@phlax
Member

phlax commented Sep 7, 2024

so, I'm doing a load of non-cached test runs, and I'm seeing this quite a bit in compile-time-options as well:

envoy_reloadable_features_mmdb_files_reload_enabled to: true
[       OK ] TestName/MmdbReloadImplTest.MmdbNotReloadedRuntimeFeatureDisabled/2 (398 ms)
[----------] 9 tests from TestName/MmdbReloadImplTest (3787 ms total)
[----------] Global test environment tear-down
[==========] 23 tests from 4 test suites ran. (14742 ms total)
[  PASSED  ] 23 tests.
Have memory regions w/o callers: might report false leaks
Leak check _main_ detected leaks of 144 bytes in 3 objects
The 2 largest leaks:
*** WARNING: Cannot convert addresses to symbols in output below.
*** Reason: Cannot find 'pprof' (is PPROF_PATH set correctly?)
*** If you cannot fix this, try running pprof directly.
Leak of 96 bytes in 2 objects allocated from:
	@ 1ecfe15 
	@ 1ecfd7d 

@phlax
Member

phlax commented Sep 7, 2024

which I think doesn't fail anything, so it may be expected or related to the test, but I'm also wondering if it's related to issues seen elsewhere

@phlax
Member

phlax commented Sep 9, 2024

contrary to what I said above, it does fail the CI

also, on uncached runs it seems ~50/50 whether this passes

@nezdolik I think this issue is pretty high priority, given how frequently it is failing things and the nature of what is failing

@phlax
Member

phlax commented Sep 9, 2024

this is the tail of the tsan error:

 Thread T10 'mmdb_reload_rou' (tid=14608, running) created by main thread at:
    #0 pthread_create ??:? (geoip_provider_test+0x246892d)
    #1 Envoy::Thread::PosixThreadFactory::createPthread(Envoy::Thread::ThreadHandle*) ??:? (geoip_provider_test+0x6cb4752)
    #2 Envoy::Thread::PosixThreadFactory::createThread(std::__1::function<void ()>, std::__1::optional<Envoy::Thread::Options> const&, bool) ??:? (geoip_provider_test+0x6cb48c7)
    #3 Envoy::Thread::PosixThreadFactory::createThread(std::__1::function<void ()>, std::__1::optional<Envoy::Thread::Options> const&) ??:? (geoip_provider_test+0x6cb4631)
    #4 Envoy::Extensions::GeoipProviders::Maxmind::GeoipProvider::GeoipProvider(Envoy::Event::Dispatcher&, Envoy::Api::Api&, std::__1::shared_ptr<Envoy::Singleton::Instance>, std::__1::shared_ptr<Envoy::Extensions::GeoipProviders::Maxmind::GeoipProviderConfig>) ??:? (geoip_provider_test+0x268781d)
    #5 Envoy::Extensions::GeoipProviders::Maxmind::GeoipProvider* std::__1::construct_at<Envoy::Extensions::GeoipProviders::Maxmind::GeoipProvider, Envoy::Event::Dispatcher&, Envoy::Api::Api&, std::__1::shared_ptr<Envoy::Extensions::GeoipProviders::Maxmind::DriverSingleton>&, std::__1::shared_ptr<Envoy::Extensions::GeoipProviders::Maxmind::GeoipProviderConfig> const&, Envoy::Extensions::GeoipProviders::Maxmind::GeoipProvider*>(Envoy::Extensions::GeoipProviders::Maxmind::GeoipProvider*, Envoy::Event::Dispatcher&, Envoy::Api::Api&, std::__1::shared_ptr<Envoy::Extensions::GeoipProviders::Maxmind::DriverSingleton>&, std::__1::shared_ptr<Envoy::Extensions::GeoipProviders::Maxmind::GeoipProviderConfig> const&) config.cc:? (geoip_provider_test+0x26580c4)
    #6 void std::__1::allocator_traits<std::__1::allocator<Envoy::Extensions::GeoipProviders::Maxmind::GeoipProvider> >::construct<Envoy::Extensions::GeoipProviders::Maxmind::GeoipProvider, Envoy::Event::Dispatcher&, Envoy::Api::Api&, std::__1::shared_ptr<Envoy::Extensions::GeoipProviders::Maxmind::DriverSingleton>&, std::__1::shared_ptr<Envoy::Extensions::GeoipProviders::Maxmind::GeoipProviderConfig> const&, void, void>(std::__1::allocator<Envoy::Extensions::GeoipProviders::Maxmind::GeoipProvider>&, Envoy::Extensions::GeoipProviders::Maxmind::GeoipProvider*, Envoy::Event::Dispatcher&, Envoy::Api::Api&, std::__1::shared_ptr<Envoy::Extensions::GeoipProviders::Maxmind::DriverSingleton>&, std::__1::shared_ptr<Envoy::Extensions::GeoipProviders::Maxmind::GeoipProviderConfig> const&) ??:? (geoip_provider_test+0x2657d5b)
    #7 std::__1::__shared_ptr_emplace<Envoy::Extensions::GeoipProviders::Maxmind::GeoipProvider, std::__1::allocator<Envoy::Extensions::GeoipProviders::Maxmind::GeoipProvider> >::__shared_ptr_emplace<Envoy::Event::Dispatcher&, Envoy::Api::Api&, std::__1::shared_ptr<Envoy::Extensions::GeoipProviders::Maxmind::DriverSingleton>&, std::__1::shared_ptr<Envoy::Extensions::GeoipProviders::Maxmind::GeoipProviderConfig> const&>(std::__1::allocator<Envoy::Extensions::GeoipProviders::Maxmind::GeoipProvider>, Envoy::Event::Dispatcher&, Envoy::Api::Api&, std::__1::shared_ptr<Envoy::Extensions::GeoipProviders::Maxmind::DriverSingleton>&, std::__1::shared_ptr<Envoy::Extensions::GeoipProviders::Maxmind::GeoipProviderConfig> const&) ??:? (geoip_provider_test+0x265781f)
    #8 std::__1::shared_ptr<Envoy::Extensions::GeoipProviders::Maxmind::GeoipProvider> std::__1::allocate_shared<Envoy::Extensions::GeoipProviders::Maxmind::GeoipProvider, std::__1::allocator<Envoy::Extensions::GeoipProviders::Maxmind::GeoipProvider>, Envoy::Event::Dispatcher&, Envoy::Api::Api&, std::__1::shared_ptr<Envoy::Extensions::GeoipProviders::Maxmind::DriverSingleton>&, std::__1::shared_ptr<Envoy::Extensions::GeoipProviders::Maxmind::GeoipProviderConfig> const&, void>(std::__1::allocator<Envoy::Extensions::GeoipProviders::Maxmind::GeoipProvider> const&, Envoy::Event::Dispatcher&, Envoy::Api::Api&, std::__1::shared_ptr<Envoy::Extensions::GeoipProviders::Maxmind::DriverSingleton>&, std::__1::shared_ptr<Envoy::Extensions::GeoipProviders::Maxmind::GeoipProviderConfig> const&) config.cc:? (geoip_provider_test+0x26574a2)
    #9 std::__1::shared_ptr<Envoy::Extensions::GeoipProviders::Maxmind::GeoipProvider> std::__1::make_shared<Envoy::Extensions::GeoipProviders::Maxmind::GeoipProvider, Envoy::Event::Dispatcher&, Envoy::Api::Api&, std::__1::shared_ptr<Envoy::Extensions::GeoipProviders::Maxmind::DriverSingleton>&, std::__1::shared_ptr<Envoy::Extensions::GeoipProviders::Maxmind::GeoipProviderConfig> const&, void>(Envoy::Event::Dispatcher&, Envoy::Api::Api&, std::__1::shared_ptr<Envoy::Extensions::GeoipProviders::Maxmind::DriverSingleton>&, std::__1::shared_ptr<Envoy::Extensions::GeoipProviders::Maxmind::GeoipProviderConfig> const&) config.cc:? (geoip_provider_test+0x26532fc)
    #10 Envoy::Extensions::GeoipProviders::Maxmind::DriverSingleton::get(std::__1::shared_ptr<Envoy::Extensions::GeoipProviders::Maxmind::DriverSingleton>, envoy::extensions::geoip_providers::maxmind::v3::MaxMindConfig const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, Envoy::Server::Configuration::FactoryContext&) ??:? (geoip_provider_test+0x2651a72)
    #11 Envoy::Extensions::GeoipProviders::Maxmind::MaxmindProviderFactory::createGeoipProviderDriverTyped(envoy::extensions::geoip_providers::maxmind::v3::MaxMindConfig const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, Envoy::Server::Configuration::FactoryContext&) ??:? (geoip_provider_test+0x264eb02)
    #12 Envoy::Extensions::GeoipProviders::Common::FactoryBase<envoy::extensions::geoip_providers::maxmind::v3::MaxMindConfig>::createGeoipProviderDriver(google::protobuf::Message const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, Envoy::Server::Configuration::FactoryContext&) ??:? (geoip_provider_test+0x2652c79)
    #13 Envoy::Extensions::GeoipProviders::Maxmind::GeoipProviderTestBase::initializeProvider(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::optional<Envoy::ConditionalInitializer>&) ??:? (geoip_provider_test+0x2503f2f)
    #14 Envoy::Extensions::GeoipProviders::Maxmind::GeoipProviderTest_ValidConfigEmptyLookupResult_Test::TestBody() ??:? (geoip_provider_test+0x24eb627)
    #15 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ??:? (geoip_provider_test+0x7aea187)
    #16 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ??:? (geoip_provider_test+0x7aceb3a)
    #17 testing::Test::Run() ??:? (geoip_provider_test+0x7ab1d18)
    #18 testing::TestInfo::Run() ??:? (geoip_provider_test+0x7ab2b51)
    #19 testing::TestSuite::Run() ??:? (geoip_provider_test+0x7ab36f0)
    #20 testing::internal::UnitTestImpl::RunAllTests() ??:? (geoip_provider_test+0x7ac372a)
    #21 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ??:? (geoip_provider_test+0x7af0127)
    #22 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ??:? (geoip_provider_test+0x7ad252a)
    #23 testing::UnitTest::Run() ??:? (geoip_provider_test+0x7ac2fed)
    #24 RUN_ALL_TESTS() ??:? (geoip_provider_test+0x5186af7)
    #25 Envoy::TestRunner::runTests(int, char**) ??:? (geoip_provider_test+0x5185890)
    #26 main ??:? (geoip_provider_test+0x5182ac6)

SUMMARY: ThreadSanitizer: data race geoip_provider_test.cc:? in _ZNSt3__14swapIPNS_8functionIFN4absl12lts_202308026StatusEjEEEEENS_9enable_ifIXaasr21is_move_constructibleIT_EE5valuesr18is_move_assignableIS9_EE5valueEvE4typeERS9_SC_
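
For readers following the traces: both the asan report above (a null reference at `vector:1441` inside the test body) and this tsan summary (a race in `swap` on a pointer to `std::function<absl::Status (unsigned int)>`) would be consistent with a vector of callbacks being read on one thread while another thread is still filling it. That reading is an assumption, not something confirmed in this thread. Below is a minimal sketch of one way to avoid that pattern, guarding the vector with a mutex and waiting until a callback is registered before invoking it. `CallbackRegistry`, `invokeFirst` and `runRegistryDemo` are hypothetical names, and `int` stands in for `absl::Status`; this is not the fix that landed in the linked PRs.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a test fixture's list of file-watch callbacks.
// `int` replaces absl::Status to keep the sketch dependency-free.
class CallbackRegistry {
public:
  using Callback = std::function<int(unsigned int)>;

  // Called from the thread that sets up the watcher.
  void add(Callback cb) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      callbacks_.push_back(std::move(cb));
    }
    cv_.notify_all();
  }

  // Waits until at least one callback exists, then invokes the first one.
  // Returns false on timeout instead of indexing an empty vector.
  bool invokeFirst(unsigned int events, int& result,
                   std::chrono::milliseconds timeout) {
    Callback cb;
    {
      std::unique_lock<std::mutex> lock(mu_);
      if (!cv_.wait_for(lock, timeout,
                        [this] { return !callbacks_.empty(); })) {
        return false;
      }
      cb = callbacks_.front(); // Copy so the call runs outside the lock.
    }
    result = cb(events);
    return true;
  }

private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::vector<Callback> callbacks_;
};

// Registers a callback from a second thread and invokes it from the caller.
int runRegistryDemo() {
  CallbackRegistry registry;
  std::thread setup([&registry] {
    registry.add(
        [](unsigned int events) { return static_cast<int>(events) + 1; });
  });
  int result = -1;
  const bool ok = registry.invokeFirst(41, result, std::chrono::seconds(5));
  setup.join();
  return ok ? result : -1;
}
```

The timeout makes a missing registration show up as a clean test failure rather than undefined behaviour, which matters on resource-constrained workers where the background thread may start late.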

@nezdolik
Member

nezdolik commented Sep 9, 2024

@phlax currently looking into the asan failure

@antoniovleonti
Contributor

@nezdolik nezdolik reopened this Sep 11, 2024
@nezdolik
Member

I should probably have created 3 separate issues, one per CI failure. So far the tsan and asan failures have been fixed. Now there is one more, on release tests (not related to tsan or asan):

[ RUN      ] GeoipProviderTest.ValidConfigCityLookupError
[2024-09-11 21:26:45.073][4110][critical][backtrace] [./source/server/backtrace.h:127] Caught Segmentation fault, suspect faulting address 0x7566206b636f65
[2024-09-11 21:26:45.073][4110][critical][backtrace] [./source/server/backtrace.h:111] Backtrace (use tools/stack_decode.py to get line numbers):
[2024-09-11 21:26:45.073][4110][critical][backtrace] [./source/server/backtrace.h:112] Envoy version: 0/1.32.0-dev/test/RELEASE/BoringSSL
[2024-09-11 21:26:45.073][4110][critical][backtrace] [./source/server/backtrace.h:119] #0: __kernel_rt_sigreturn [0xfa4461d838f8]
[2024-09-11 21:26:45.077][4110][critical][backtrace] [./source/server/backtrace.h:119] #1: Envoy::Stats::IsolatedStoreImpl::~IsolatedStoreImpl() [0xe0e870]
[2024-09-11 21:26:45.080][4110][critical][backtrace] [./source/server/backtrace.h:119] #2: Envoy::Stats::TestUtil::TestStore::~TestStore() [0x7dd9a4]
[2024-09-11 21:26:45.083][4110][critical][backtrace] [./source/server/backtrace.h:119] #3: Envoy::Upstream::MockClusterInfo::~MockClusterInfo() [0xa5ecf0]
[2024-09-11 21:26:45.087][4110][critical][backtrace] [./source/server/backtrace.h:119] #4: Envoy::Upstream::MockHost::~MockHost() [0xa1e480]
[2024-09-11 21:26:45.090][4110][critical][backtrace] [./source/server/backtrace.h:119] #5: Envoy::Upstream::MockHost::~MockHost() [0xa1e4f0]
[2024-09-11 21:26:45.094][4110][critical][backtrace] [./source/server/backtrace.h:119] #6: testing::internal::ReturnAction<>::Impl<>::~Impl() [0xa17760]

@nezdolik
Member

I believe the flakiness has been addressed. Closing this one.
