[#21096] DocDB: Avoid fatal due to narrow_cast during ListTabletServers
Summary:
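During ListTabletServers, the master can hit a FATAL with the following stack when a tablet server's last heartbeat was more than about 24.8 days ago. At that point the elapsed time in milliseconds exceeds the int32 maximum (2147483647 ms), so `narrow_cast<int>` fails. This is unlikely, but it can happen.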

To fix this, ListTabletServers now clamps millis_since_heartbeat to the int32 maximum when the elapsed time exceeds it, and logs the condition with `LOG(DFATAL)` (in glog, DFATAL is fatal only in debug builds).

```
F20240207 16:26:17 ../../src/yb/gutil/casts.cc:21] Bad narrow cast: 2205994749 > 2147483647
@ 0x55e4d34d1257 google::LogMessage::SendToLog()
@ 0x55e4d34d219d google::LogMessage::Flush()
@ 0x55e4d34d2819 google::LogMessageFatal::~LogMessageFatal()
@ 0x55e4d3cd4c69 yb::BadNarrowCast()
@ 0x55e4d3741898 yb::narrow_cast<>()
@ 0x55e4d3f163aa yb::master::(anonymous namespace)::MasterClusterServiceImpl::ListTabletServers()
@ 0x55e4d414a455 std::__1::__function::__func<>::operator()()
@ 0x55e4d414b33f yb::master::MasterClusterIf::Handle()
@ 0x55e4d44aaeda yb::rpc::ServicePoolImpl::Handle()
@ 0x55e4d43ea97f yb::rpc::InboundCall::InboundCallTask::Run()
@ 0x55e4d44b9a73 yb::rpc::(anonymous namespace)::Worker::Execute()
@ 0x55e4d4b8ab02 yb::Thread::SuperviseThread()
@ 0x7f8ebad27694 start_thread
@ 0x7f8ebb22941d __clone
```
Jira: DB-10056

Test Plan: MasterTest.TestRegisterAndHeartbeat

Reviewers: bkolagani, arybochkin

Reviewed By: bkolagani

Subscribers: ybase, bogdan

Differential Revision: https://phorge.dev.yugabyte.com/D32496
lingamsandeep committed Feb 22, 2024
1 parent d330d06 commit aa2efd7
Showing 1 changed file with 8 additions and 2 deletions: src/yb/master/master_cluster_service.cc
```diff
@@ -138,8 +138,14 @@ class MasterClusterServiceImpl : public MasterServiceBase, public MasterClusterI
   *entry->mutable_registration() = std::move(*ts_info.mutable_registration());
   auto last_heartbeat = desc->LastHeartbeatTime();
   if (last_heartbeat) {
-    entry->set_millis_since_heartbeat(narrow_cast<int>(
-        MonoTime::Now().GetDeltaSince(last_heartbeat).ToMilliseconds()));
+    auto ms_since_heartbeat = MonoTime::Now().GetDeltaSince(last_heartbeat).ToMilliseconds();
+    if (ms_since_heartbeat > std::numeric_limits<int32_t>::max()) {
+      LOG(DFATAL) << entry->instance_id().permanent_uuid()
+                  << " has not heartbeated since "
+                  << ms_since_heartbeat;
+      ms_since_heartbeat = std::numeric_limits<int32_t>::max();
+    }
+    entry->set_millis_since_heartbeat(narrow_cast<int>(ms_since_heartbeat));
   }
   entry->set_alive(desc->IsLive());
   desc->GetMetrics(entry->mutable_metrics());
```
