Skip to content

Commit

Permalink
Pending connections - protect against race condtion
Browse files Browse the repository at this point in the history
Summary:
# Issue
Ucache tests in staging environment discovered that value of `pending_connections` counter can sometimes become negative.

We also have single recent case in `tsp_prn/admarket/adranker.prn.gold-jungle-bus.0/17` task:

https://fburl.com/canvas/wo4tp5c4

My theory is that since the enqueue event is reported after the connection is enqueued, it's possible for the IO worker to dequeue this connection and decrement the counter before the enqueue event is processed. This could result in a negative number.

However, the value of the counter eventually becomes correct, so the following values are legitimate.

# Fix
1. Increment and decrement counters before logging events to scuba, which may introduce unnecessary delay. This should reduce probability of this race condition.
2. Do not report negative nubers to the observer. Log info about the issue instead.

Reviewed By: stuclar

Differential Revision: D66126565

fbshipit-source-id: a29e83daf6b8f58ff729b3544399ec48e7c8cd3e
  • Loading branch information
kaczmarekmhl authored and facebook-github-bot committed Nov 19, 2024
1 parent 45cc234 commit 3e94a1b
Show file tree
Hide file tree
Showing 2 changed files with 19 additions and 5 deletions.
14 changes: 10 additions & 4 deletions thrift/lib/cpp2/server/ThriftServer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -550,21 +550,27 @@ class ThriftServer::ConnectionEventCallback
void onConnectionEnqueuedForAcceptorCallback(
const folly::NetworkSocket,
const folly::SocketAddress& clientAddr) noexcept override {
THRIFT_CONNECTION_EVENT(connection_enqueued_acceptor)
.log(thriftServer_, clientAddr);
// Increment counters prior to logging Scuba events to mitigate the risk of
// race conditions between increment and decrement operations.
if (pendingConnectionsMetrics_) {
pendingConnectionsMetrics_->onConnectionEnqueuedToIoWorker();
}

THRIFT_CONNECTION_EVENT(connection_enqueued_acceptor)
.log(thriftServer_, clientAddr);
}

void onConnectionDequeuedByAcceptorCallback(
const folly::NetworkSocket,
const folly::SocketAddress& clientAddr) noexcept override {
THRIFT_CONNECTION_EVENT(connection_dequeued_acceptor)
.log(thriftServer_, clientAddr);
// Increment counters prior to logging Scuba events to mitigate the risk of
// race conditions between increment and decrement operations.
if (pendingConnectionsMetrics_) {
pendingConnectionsMetrics_->onConnectionDequedByIoWorker();
}

THRIFT_CONNECTION_EVENT(connection_dequeued_acceptor)
.log(thriftServer_, clientAddr);
}

void onBackoffStarted() noexcept override {}
Expand Down
10 changes: 9 additions & 1 deletion thrift/lib/cpp2/server/metrics/PendingConnectionsMetrics.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
*/

#include <thrift/lib/cpp2/server/metrics/PendingConnectionsMetrics.h>
#include "folly/GLog.h"

THRIFT_FLAG_DEFINE_bool(enable_pending_connections_metrics, true);

Expand Down Expand Up @@ -61,7 +62,14 @@ void PendingConnectionsMetrics::onConnectionDequedByIoWorker() {
}

void PendingConnectionsMetrics::onPendingConnectionsChange() {
observer_->pendingConnections(pendingConnections_.load());
auto pendingConnections = pendingConnections_.load();
if (pendingConnections >= 0) {
observer_->pendingConnections(pendingConnections);
} else {
FB_LOG_EVERY_MS(INFO, 10000)
<< "Pending connections < 0 due to possible race condition. Current value="
<< pendingConnections;
}
}

} // namespace apache::thrift

0 comments on commit 3e94a1b

Please sign in to comment.