-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix race condition when the redis filter is destroyed. #11466
Fix race condition when the redis filter is destroyed. #11466
Conversation
Signed-off-by: Henry Yang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks this makes sense but needs more tests including the failure to lock cases, thank you!
/wait
Signed-off-by: Henry Yang <[email protected]>
for future reference, this is related to #9940 |
Signed-off-by: Henry Yang <[email protected]>
Signed-off-by: Henry Yang <[email protected]>
Signed-off-by: Henry Yang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks looks like this is on the right track. I don't think you need all the locks() that you have, so would prefer to simplify if possible. Thank you!
/wait
source/extensions/filters/network/redis_proxy/conn_pool_impl.cc
Outdated
Show resolved
Hide resolved
}); | ||
} | ||
|
||
// this method is always called from a InstanceSharedPtr we don't have to worry about tls_->getTyped |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/this/This, same below
@@ -99,6 +118,11 @@ void InstanceImpl::ThreadLocalPool::onClusterAddOrUpdateNonVirtual( | |||
if (cluster.info()->name() != cluster_name_) { | |||
return; | |||
} | |||
// ensure the filter is still exists |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: start with capital, end with period, etc. Same elsewhere.
@@ -265,6 +296,12 @@ Common::Redis::Client::PoolRequest* InstanceImpl::ThreadLocalPool::makeRequestTo | |||
return nullptr; | |||
} | |||
|
|||
// ensure filter is not removed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced you need all of these locks everywhere. How can we be making a request with a removed filter? I'm pretty sure you only need it in the construction path for the original set call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that when the connection pool instance is destroyed the TLS slot is not immediately destroyed. This leads to the code that cancel the pending requests and closing connections to be delayed as well. During this delay, a redirection from a pending request would cause a new request to be sent to the new host.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Please add more comments around this and also fix format.
@@ -379,7 +419,9 @@ void InstanceImpl::PendingRequest::onResponse(Common::Redis::RespValuePtr&& resp | |||
void InstanceImpl::PendingRequest::onFailure() { | |||
request_handler_ = nullptr; | |||
pool_callbacks_.onFailure(); | |||
parent_.parent_.onFailure(); | |||
if (auto shared_conn_pool = parent_.parent_.lock()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here, if the filter goes away we should be cancelling things. I don't think these are all needed.
Co-authored-by: Matt Klein <[email protected]> Signed-off-by: Henry Yang <[email protected]> Signed-off-by: Henry Yang <[email protected]>
72d3b0d
to
f8aa4c7
Compare
Signed-off-by: Henry Yang <[email protected]>
Signed-off-by: Henry Yang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable with one nit, but per offline convo did you plan on reworking this?
/wait
Signed-off-by: Henry Yang <[email protected]>
Signed-off-by: Henry Yang <[email protected]>
# Conflicts: # source/extensions/filters/network/redis_proxy/conn_pool_impl.cc Signed-off-by: Henry Yang <[email protected]>
This is already the reworked code. We're no longer doing a lock on each request, the lock is checked prior to construction of ThreadLocalPool and during the cluster update callback. |
Signed-off-by: Henry Yang <[email protected]>
Signed-off-by: Henry Yang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/wait
if (resolve_timer_) { | ||
resolve_timer_->disableTimer(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed? If so comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a comment. Sorry I missed this one.
Signed-off-by: Henry Yang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Signed-off-by: Henry Yang [email protected]
For an explanation of how to fill out the fields, please see the relevant section
in PULL_REQUESTS.md
Commit Message: Fix race condition when the redis filter is destroyed.
Additional Description: This fix the bugs with the following stack traces.
Risk Level: Low
Testing:
Docs Changes: N/A
Release Notes: N/A
Backtrace:
#0 0x000055ff2c9cd3d6 in Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl::ThreadLocalPool::ThreadLocalPool (this=0x55ff2ef432d8, parent=..., dispatcher=..., cluster_name=...) at external/envoy/source/extensions/filters/network/redis_proxy/conn_pool_impl.cc:74
#0 0x000055ff2c9cd3d6 in Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl::ThreadLocalPool::ThreadLocalPool (this=0x55ff2ef432d8, parent=..., dispatcher=..., cluster_name=...) at external/envoy/source/extensions/filters/network/redis_proxy/conn_pool_impl.cc:74
#1 0x000055ff2c9d01ea in __compressed_pair_elem<Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl&, Envoy::Event::Dispatcher&, std::__1::basic_string const&, 0, 1, 2> (this=0x55ff2ef432d8, __args=...) at /opt/llvm/bin/../include/c++/v1/memory:2155
#2 __compressed_pair<std::__1::allocatorEnvoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl::ThreadLocalPool&, Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl&, Envoy::Event::Dispatcher&, std::__1::basic_string const&> (this=0x55ff2ef432d8, __first_args=..., __second_args=..., __pc=...) at /opt/llvm/bin/../include/c++/v1/memory:2258
#3 std::__1::__shared_ptr_emplace<Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl::ThreadLocalPool, std::__1::allocatorEnvoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl::ThreadLocalPool >::__shared_ptr_emplace<Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl&, Envoy::Event::Dispatcher&, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&> (this=0x55ff2ef432c0, __a=..., __args=..., __args=..., __args=...) at /opt/llvm/bin/../include/c++/v1/memory:3543
#4 0x000055ff2c9d013a in make_shared<Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl&, Envoy::Event::Dispatcher&, std::__1::basic_string const&> (__args=..., __args=..., __args=...) at /opt/llvm/bin/../include/c++/v1/memory:4202
#5 make_shared<Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl::ThreadLocalPool, Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl&, Envoy::Event::Dispatcher&, std::__1::basic_string const&> (__args=..., __args=..., __args=...) at /opt/llvm/bin/../include/c++/v1/memory:4581
#6 operator() (this=, dispatcher=...) at external/envoy/source/extensions/filters/network/redis_proxy/conn_pool_impl.cc:53
#7 __invoke<(lambda at external/envoy/source/extensions/filters/network/redis_proxy/conn_pool_impl.cc:51:13) &, Envoy::Event::Dispatcher &> (__f=..., __args=...) at /opt/llvm/bin/../include/c++/v1/type_traits:3530
#8 __call<(lambda at external/envoy/source/extensions/filters/network/redis_proxy/conn_pool_impl.cc:51:13) &, Envoy::Event::Dispatcher &> (__args=..., __args=...) at /opt/llvm/bin/../include/c++/v1/__functional_base:317
#9 operator() (this=, __arg=...) at /opt/llvm/bin/../include/c++/v1/functional:1533
#10 std::__1::__function::__func<Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl::InstanceImpl(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, Envoy::Upstream::ClusterManager&, Envoy::Extensions::NetworkFilters::Common::Redis::Client::ClientFactory&, Envoy::ThreadLocal::SlotAllocator&, envoy::extensions::filters::network::redis_proxy::v3::RedisProxy_ConnPoolSettings const&, Envoy::Api::Api&, std::__1::unique_ptr<Envoy::Stats::Scope, std::__1::default_deleteEnvoy::Stats::Scope >&&, std::__1::shared_ptrEnvoy::Extensions::NetworkFilters::Common::Redis::RedisCommandStats const&, std::__1::shared_ptrEnvoy::Extensions::Common::Redis::ClusterRefreshManager)::$_0, std::__1::allocator<Envoy::Extensions::NetworkFilters::RedisProxy::ConnPool::InstanceImpl::InstanceImpl(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, Envoy::Upstream::ClusterManager&, Envoy::Extensions::NetworkF
ilters::Common::Redis::Client::ClientFactory&, Envoy::ThreadLocal::SlotAllocator&, envoy::extensions::filters::network::redis_proxy::v3::RedisProxy_ConnPoolSettings const&, Envoy::Api::Api&, std::__1::unique_ptr<Envoy::Stats::Scope, std::__1::default_deleteEnvoy::Stats::Scope >&&, std::__1::shared_ptrEnvoy::Extensions::NetworkFilters::Common::Redis::RedisCommandStats const&, std::__1::shared_ptrEnvoy::Extensions::Common::Redis::ClusterRefreshManager)::$_0>, std::__1::shared_ptrEnvoy::ThreadLocal::ThreadLocalObject (Envoy::Event::Dispatcher&)>::operator()(Envoy::Event::Dispatcher&) (this=, __arg=...) at /opt/llvm/bin/../include/c++/v1/functional:1707
#11 0x000055ff2ca8bd58 in operator() (this=, __args=...) at /opt/llvm/bin/../include/c++/v1/functional:1860
#12 operator() (this=, __arg=...) at /opt/llvm/bin/../include/c++/v1/functional:2419
#13 operator() (this=, dispatcher=...) at external/envoy/source/common/thread_local/thread_local_impl.cc:101
#14 __invoke<(lambda at external/envoy/source/common/thread_local/thread_local_impl.cc:100:14) &, Envoy::Event::Dispatcher &> (__f=..., __args=...) at /opt/llvm/bin/../include/c++/v1/type_traits:3530
#15 __call<(lambda at external/envoy/source/common/thread_local/thread_local_impl.cc:100:14) &, Envoy::Event::Dispatcher &> (__args=..., __args=...) at /opt/llvm/bin/../include/c++/v1/__functional_base:317
#16 operator() (this=, __arg=...) at /opt/llvm/bin/../include/c++/v1/functional:1533
#17 std::__1::__function::__func<Envoy::ThreadLocal::InstanceImpl::Bookkeeper::set(std::__1::function<std::__1::shared_ptrEnvoy::ThreadLocal::ThreadLocalObject (Envoy::Event::Dispatcher&)>)::$_7, std::__1::allocator<Envoy::ThreadLocal::InstanceImpl::Bookkeeper::set(std::__1::function<std::__1::shared_ptrEnvoy::ThreadLocal::ThreadLocalObject (Envoy::Event::Dispatcher&)>)::$_7>, std::__1::shared_ptrEnvoy::ThreadLocal::ThreadLocalObject (Envoy::Event::Dispatcher&)>::operator()(Envoy::Event::Dispatcher&) (this=, __arg=...) at /opt/llvm/bin/../include/c++/v1/functional:1707
#18 0x000055ff2ca8cf88 in operator() (this=, __args=...) at /opt/llvm/bin/../include/c++/v1/functional:1860
#19 operator() (this=, __arg=...) at /opt/llvm/bin/../include/c++/v1/functional:2419
#20 operator() (this=) at external/envoy/source/common/thread_local/thread_local_impl.cc:214
#21 __invoke<(lambda at external/envoy/source/common/thread_local/thread_local_impl.cc:214:21) &> (__f=...) at /opt/llvm/bin/../include/c++/v1/type_traits:3530
#22 __call<(lambda at external/envoy/source/common/thread_local/thread_local_impl.cc:214:21) &> (__args=...) at /opt/llvm/bin/../include/c++/v1/__functional_base:348
#23 operator() (this=) at /opt/llvm/bin/../include/c++/v1/functional:1533
#24 std::__1::__function::__func<Envoy::ThreadLocal::InstanceImpl::SlotImpl::set(std::__1::function<std::__1::shared_ptrEnvoy::ThreadLocal::ThreadLocalObject (Envoy::Event::Dispatcher&)>)::$_13, std::__1::allocator<Envoy::ThreadLocal::InstanceImpl::SlotImpl::set(std::__1::function<std::__1::shared_ptrEnvoy::ThreadLocal::ThreadLocalObject (Envoy::Event::Dispatcher&)>)::$_13>, void ()>::operator()() (this=) at /opt/llvm/bin/../include/c++/v1/functional:1707
#25 0x000055ff2cb0ef76 in operator() (this=0x7f3532c25ad0) at /opt/llvm/bin/../include/c++/v1/functional:1860
#26 operator() (this=0x7f3532c25ad0) at /opt/llvm/bin/../include/c++/v1/functional:2419
#27 Envoy::Event::DispatcherImpl::runPostCallbacks (this=0x55ff2e669080) at external/envoy/source/common/event/dispatcher_impl.cc:233
#28 0x000055ff2ce880a6 in event_process_active_single_queue (base=0x55ff2e2dcdc0, activeq=0x55ff2e4fd030, max_to_process=2147483647, endtime=0x0) at /root/.cache/bazel/_bazel_root/e500e327e6981ab742dbacbb6723d21d/sandbox/processwrapper-sandbox/586/execroot/envoy_lyft/external/com_github_libevent_libevent/event.c:1713
#29 0x000055ff2ce86c2e in event_process_active (base=0x55ff2e2dcdc0) at /root/.cache/bazel/_bazel_root/e500e327e6981ab742dbacbb6723d21d/sandbox/processwrapper-sandbox/586/execroot/envoy_lyft/external/com_github_libevent_libevent/event.c:1808
#30 event_base_loop (base=0x55ff2e2dcdc0, flags=) at /root/.cache/bazel/_bazel_root/e500e327e6981ab742dbacbb6723d21d/sandbox/processwrapper-sandbox/586/execroot/envoy_lyft/external/com_github_libevent_libevent/event.c:2047
#31 0x000055ff2cb035b1 in Envoy::Server::WorkerImpl::threadRoutine (this=0x55ff2e537bd0, guard_dog=...) at external/envoy/source/server/worker_impl.cc:119
#32 0x000055ff2cf397e3 in operator() (this=) at /opt/llvm/bin/../include/c++/v1/functional:1860
#33 operator() (this=) at /opt/llvm/bin/../include/c++/v1/functional:2419
#34 operator() (this=, arg=) at external/envoy/source/common/common/posix/thread_impl.cc:33
#35 Envoy::Thread::ThreadImplPosix::ThreadImplPosix(std::__1::function<void ()>)::$_0::__invoke(void*) (arg=) at external/envoy/source/common/common/posix/thread_impl.cc:32
#36 0x00007f353380a184 in start_thread (arg=0x7f3532c37700) at pthread_create.c:312
#37 0x00007f35335370ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111