Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[native] Fix crash in httpclient when accessing null eventbase #23258

Merged
merged 1 commit into from
Jul 19, 2024

Conversation

amitkdutta
Copy link
Contributor

@amitkdutta amitkdutta commented Jul 19, 2024

Description

Fixing a crash when eventbase is null and accessed in httpclient

Motivation and Context

We are observing following crash sporadically when a worker is shutting down:

(unknown) at ../signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:-1
folly::EventBase::runInEventBaseThread(folly::Function<void ()>) at fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/atomic_base.h:636

facebook::presto::http::HttpClient::sendRequest(proxygen::HTTPMessage const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long) at ./fbcode/github/presto-trunk/presto-native-execution/presto_cpp/main/http/HttpClient.cpp:536

facebook::presto::http::RequestBuilder::send(facebook::presto::http::HttpClient*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long) at fbcode/github/presto-trunk/presto-native-execution/presto_cpp/main/http/HttpClient.h:248

facebook::presto::PrestoExchangeSource::doAbortResults(long) at ./fbcode/github/presto-trunk/presto-native-execution/presto_cpp/main/PrestoExchangeSource.cpp:481
facebook::presto::PrestoExchangeSource::handleAbortResponse(folly::Try<std::unique_ptr<facebook::presto::http::HttpResponse, std::default_delete<facebook::presto::http::HttpResponse> > >) at ./fbcode/github/presto-trunk/presto-native-execution/presto_cpp/main/PrestoExchangeSource.cpp:509

_ZN5folly6detail8function5call_IZNS_7futures6detail4CoreISt10unique_ptrIN8facebook6presto4http12HttpResponseESt14default_deleteISA_EEE11setCallbackIZNS4_10FutureBaseISD_E18thenImplementationIZNOS_6FutureISD_E7thenTryIZNS8_20PrestoExchangeSource14doAbortResultsElE3$_5EENSJ_INS4_17tryCallableResultISD_T_E10value_typeEEEOSP_EUlONS_8Executor9KeepAliveISU_EEONS_3TryISD_EEE_NS4_25tryExecutorCallableResultISD_S11_EEEENSt9enable_ifIXntsrNT0_13ReturnsFutureE5valueENSJ_INS15_10value_typeEEEE4typeEST_S15_NS4_18InlineContinuationEEUlSX_S10_E_EEvST_OSt10shared_ptrINS_14RequestContextEES1A_EUlRNS4_8CoreBaseESX_PNS_17exception_wrapperEE_Lb1ELb0EvJS1H_SX_S1J_EEET2_DpT3_RNS1_4DataE at ./fbcode/github/presto-trunk/presto-native-execution/presto_cpp/main/PrestoExchangeSource.cpp:485
void folly::detail::function::call_<folly::Executor::KeepAlive<folly::Executor>::add<folly::futures::detail::CoreBase::doCallback(folly::Executor::KeepAlive<folly::Executor>&&, folly::futures::detail::State)::$_0>(folly::futures::detail::CoreBase::doCallback(folly::Executor::KeepAlive<folly::Executor>&&, folly::futures::detail::State)::$_0&&) &&::{lambda()#1}, true, false, void>(, folly::detail::function::Data&) at fbcode/folly/Function.h:370
folly::ThreadPoolExecutor::runTask(std::shared_ptr<folly::ThreadPoolExecutor::Thread> const&, folly::ThreadPoolExecutor::Task&&) at fbcode/folly/Function.h:370
folly::CPUThreadPoolExecutor::threadRun(std::shared_ptr<folly::ThreadPoolExecutor::Thread>) at ./fbcode/folly/executors/CPUThreadPoolExecutor.cpp:350
void folly::detail::function::call_<std::_Bind<void (folly::ThreadPoolExecutor::*(folly::ThreadPoolExecutor*, std::shared_ptr<folly::ThreadPoolExecutor::Thread>))(std::shared_ptr<folly::ThreadPoolExecutor::Thread>)>, true, false, void>(, folly::detail::function::Data&) at fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/invoke.h:74
execute_native_thread_routine at /home/engshare/third-party2/libgcc/11.x/src/gcc-11.x/x86_64-facebook-linux/libstdc++-v3/src/c++11/../../../.././libstdc++-v3/src/c++11/thread.cc:82
start_thread at /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/nptl/pthread_create.c:434
__clone3 at /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

The reason is eventbase_ is a raw pointer and it can be null in this scenario. Adding null check before accessing it.

Impact

Shutdown crash is not observed anymore.

Test Plan

Existing tests and ran in a cluster setup to see worker shutdown is going fine.

@amitkdutta amitkdutta requested a review from a team as a code owner July 19, 2024 01:56
Copy link
Contributor

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @amitkdutta

@amitkdutta amitkdutta merged commit 2fdbc69 into prestodb:master Jul 19, 2024
59 checks passed
@@ -531,7 +531,7 @@ folly::SemiFuture<std::unique_ptr<HttpResponse>> HttpClient::sendRequest(
sendRequest(std::move(responseHandler));
};

if (delayMs > 0) {
if (delayMs > 0 && eventBase_ != nullptr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else condition uses eventBase_ as well. We should make sure life time of event base be equal or longer than client.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you are spot on! Let me fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants