rocksdb ingest causes the leader lease to be invalid #5265

tangyuanzhang · 2023-01-16T12:39:19Z

A nebula heartbeat consists of two steps:
Step 1: If replicatingLogs_ is false, the leader replicates an empty log to the followers and then writes it to RocksDB.
Step 2: The leader sends an RPC to synchronize its current state; this step does not write to RocksDB.

Both steps update lastMsgAcceptedCostMs_ and lastMsgAcceptedTime_. The leader considers its lease valid when:
time::WallClock::fastNowInMilliSec() - lastMsgAcceptedTime_ <
FLAGS_raft_heartbeat_interval_secs * 1000 - lastMsgAcceptedCostMs_;
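
The check above can be sketched as a small standalone function. This is only an illustration: `LeaseState` and `isLeaseValid` are hypothetical names, not the actual nebula classes, and the current time and heartbeat interval are passed in as parameters instead of being read from `time::WallClock` and the gflag.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the leader's lease bookkeeping. The fields mirror
// lastMsgAcceptedTime_ and lastMsgAcceptedCostMs_ from the issue, but this is
// not the real raft part class.
struct LeaseState {
  int64_t lastMsgAcceptedTime;    // ms, time at which the last message was accepted
  int64_t lastMsgAcceptedCostMs;  // ms, how long that message took to be accepted
};

// Returns true while the lease is still considered valid at nowMs.
// heartbeatIntervalSecs plays the role of FLAGS_raft_heartbeat_interval_secs.
inline bool isLeaseValid(const LeaseState& s, int64_t nowMs, int64_t heartbeatIntervalSecs) {
  assert(heartbeatIntervalSecs > 0);
  return nowMs - s.lastMsgAcceptedTime <
         heartbeatIntervalSecs * 1000 - s.lastMsgAcceptedCostMs;
}
```

Note that a large cost shrinks the right-hand side; once the cost exceeds the interval, the right-hand side is negative and the check fails no matter how recent lastMsgAcceptedTime is.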

While RocksDB is ingesting, it can trigger a write stall, which blocks writes. During the stall, the empty log from step 1 cannot be written to RocksDB, so replicatingLogs_ is never reset to false.
From then on, each call to sendHeartbeat() executes only step 2. When the write stall ends, the blocked write completes and updates lastMsgAcceptedTime_ and lastMsgAcceptedCostMs_ together (the cost is very large, since the write was blocked for the duration of the stall). Queries to this part's leader then fail with E_LEADER_LEASE_FAILED.
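
To make the failure concrete, here is a rough replay of that timeline with made-up numbers: a 10 s heartbeat interval, heartbeat RPCs costing 5 ms, and an append that was blocked for 60 s. `NaiveLease` and `replayStall` are hypothetical names for illustration; the only thing taken from the issue is that every accepted message overwrites both fields unconditionally.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical model of the behaviour described in the issue: whichever
// callback finishes last overwrites both fields.
struct NaiveLease {
  int64_t acceptedTimeMs = 0;
  int64_t acceptedCostMs = 0;

  void onAccepted(int64_t nowMs, int64_t costMs) {
    assert(costMs >= 0);
    acceptedTimeMs = nowMs;
    acceptedCostMs = costMs;
  }

  bool valid(int64_t nowMs, int64_t intervalMs) const {
    return nowMs - acceptedTimeMs < intervalMs - acceptedCostMs;
  }
};

// Returns {lease valid before the stalled write completes,
//          lease valid right after it completes}.
inline std::pair<bool, bool> replayStall() {
  const int64_t intervalMs = 10 * 1000;  // illustrative value, not a nebula default
  NaiveLease lease;

  // A step-2 heartbeat RPC is accepted at t = 55 s and took 5 ms.
  lease.onAccepted(55000, 5);
  const bool before = lease.valid(56000, intervalMs);  // 1000 < 9995

  // The append issued at t = 0 was blocked by the write stall and only
  // completes at t = 60 s, so its cost is 60000 ms.
  lease.onAccepted(60000, 60000);
  const bool after = lease.valid(60001, intervalMs);   // 1 < 10000 - 60000 fails

  return {before, after};
}
```

In this replay the lease is valid before the stalled write completes and invalid immediately afterwards, even though the followers acknowledged a heartbeat only a few seconds earlier.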

I think the update needs to be conditional: only the most recent AppendLog should be allowed to update these values.

critical27 · 2023-01-17T09:40:42Z

Good catch~
I suppose we should compare nowTime and lastMsgAcceptedTime_ first:

if nowTime is bigger, compare nowTime - nowCostMs with lastMsgAcceptedTime_ - lastMsgAcceptedCostMs_, as in the code you posted above
if lastMsgAcceptedTime_ is bigger, just do nothing
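
One possible reading of this suggestion, as a minimal sketch: accept an update only if it completed later than the stored one and was also sent later (completion time minus cost). `GuardedLease` is a hypothetical name, and the exact tie-breaking on equal values is an assumption that the comment does not specify.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical guarded update; not the code that was eventually merged.
struct GuardedLease {
  int64_t acceptedTimeMs = 0;
  int64_t acceptedCostMs = 0;

  // Returns true if the stored values were updated.
  bool onAccepted(int64_t nowMs, int64_t costMs) {
    assert(costMs >= 0);
    // The stored completion time is not older than this one: do nothing.
    if (nowMs <= acceptedTimeMs) {
      return false;
    }
    // This message was sent no later than the one already recorded
    // (send time = completion time - cost): keep the existing values.
    if (nowMs - costMs <= acceptedTimeMs - acceptedCostMs) {
      return false;
    }
    acceptedTimeMs = nowMs;
    acceptedCostMs = costMs;
    return true;
  }
};
```

With the illustrative numbers of a heartbeat accepted at 55000 ms with a 5 ms cost, a stalled append completing at 60000 ms with a 60000 ms cost is ignored, because it was sent at 0 ms, well before the heartbeat.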

Would you like to contribute a pull request?

BTW, what is the size of the SST files that you ingest?

tangyuanzhang · 2023-01-17T10:04:11Z

I think there is no need to compare nowTime and lastMsgAcceptedTime_. The leader updates lastMsgAcceptedTime_ while holding raftLock_, so nowTime > lastMsgAcceptedTime_ is always true here.

tangyuanzhang · 2023-01-17T10:07:02Z

I will contribute a pull request.
The ingested SST file is 200G, and the stored data is 1T.

wey-gu · 2023-01-17T10:11:22Z

Dear @ShiXiangZ ,

We would like to send you a gift for the "good catch" NebulaGraph community award (and for the contributor award when the PR is merged). Would you mind sending a mail to [email protected] with your address so that you can receive the gift shipment?

Thanks and welcome to the community!

cc @QingZ11 @lisahui

qiwei9743 · 2023-01-17T10:12:11Z

What we should do is maintain the maximum point-in-time lease expiry = lastMsgAcceptedTime_ - lastMsgAcceptedCostMs_ + FLAGS_raft_heartbeat_interval_secs * 1000. Since FLAGS_raft_heartbeat_interval_secs is a constant, the comparison depends only on lastMsgAcceptedTime_ - lastMsgAcceptedCostMs_.
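
As a rough sketch of this idea, the leader could store a single value, the latest send time seen so far (accepted time minus cost), and only ever move it forward. `MonotonicLease` and its members are hypothetical names, and the interval is passed in milliseconds for simplicity; this is not necessarily how the merged fix is written.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical monotonic lease: the lease start only ever moves forward.
struct MonotonicLease {
  // Maximum over all accepted messages of (accepted time - cost), in ms.
  int64_t leaseStartMs = 0;

  void onAccepted(int64_t nowMs, int64_t costMs) {
    assert(costMs >= 0);
    leaseStartMs = std::max(leaseStartMs, nowMs - costMs);
  }

  // Equivalent to: now - acceptedTime < interval - cost,
  // rearranged as:  now < (acceptedTime - cost) + interval.
  bool valid(int64_t nowMs, int64_t intervalMs) const {
    return nowMs < leaseStartMs + intervalMs;
  }
};
```

Because a stalled append has an early send time, it can never lower leaseStartMs, so a late completion cannot shorten a lease that a more recent heartbeat already extended.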

tangyuanzhang added the type/bug Type: something is unexpected label Jan 16, 2023

github-actions bot added affects/none PR/issue: this bug affects none version. severity/none Severity of bug labels Jan 16, 2023

Sophie-Xie assigned critical27 Jan 17, 2023

This was referenced Jan 17, 2023

fix heartbeat handling update old time #5270

Closed

fix heartbeat handling update old time #5271

Merged

wey-gu mentioned this issue Jan 21, 2023

Weekly Report 2023-01-20 vesoft-inc/nebula-community#208

Closed

Sophie-Xie assigned tangyuanzhang and unassigned critical27 Jan 29, 2023

Sophie-Xie closed this as completed Feb 7, 2023

github-actions bot added the process/fixed Process of bug label Feb 7, 2023

wey-gu mentioned this issue Feb 11, 2023

Weekly Report 2023-02-10 vesoft-inc/nebula-community#319

Closed

tangyuanzhang mentioned this issue May 5, 2023

Prevent the heartbeat time from going back and causing leader lease invalid #5534

Merged

11 tasks
