RocksDB ingest causes the leader lease to become invalid #5265
Good catch~
Would you like to contribute a pull request? BTW, what is the size of the SST file you ingested?
I will contribute a pull request.
Dear @ShiXiangZ, we would like to send you a gift for the "good catch" NebulaGraph community award (and for the contributor award once the PR is merged). Would you mind sending an email to [email protected] with your address so that you can receive the gift shipment? Thanks, and welcome to the community!
What we should do is maintain a maximum point-in-time lease deadline, updating it to lastMsgAcceptedTime_ - lastMsgAcceptedCostMs_ + FLAGS_raft_heartbeat_interval_secs * 1000 only when that value is larger than the current one. Since FLAGS_raft_heartbeat_interval_secs is a constant, the comparison depends only on lastMsgAcceptedTime_ - lastMsgAcceptedCostMs_.
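The proposal above can be sketched as a lease deadline that only ever moves forward. This is a minimal sketch, not NebulaGraph's implementation: the class `LeaseTracker` and its methods are hypothetical, timestamps are plain `int64_t` milliseconds, and the heartbeat interval is passed in rather than read from `FLAGS_raft_heartbeat_interval_secs`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: keeps the furthest lease deadline seen so far.
class LeaseTracker {
 public:
  explicit LeaseTracker(int64_t heartbeatIntervalSecs)
      : intervalMs_(heartbeatIntervalSecs * 1000) {}

  // Called when a follower accepts a message.
  // acceptedTimeMs - costMs approximates the time the message was sent.
  void onAccepted(int64_t acceptedTimeMs, int64_t costMs) {
    assert(costMs >= 0);
    int64_t candidate = acceptedTimeMs - costMs + intervalMs_;
    // A late response to an old request yields a smaller candidate,
    // so it cannot shrink the lease.
    leaseDeadlineMs_ = std::max(leaseDeadlineMs_, candidate);
  }

  bool leaseValid(int64_t nowMs) const { return nowMs < leaseDeadlineMs_; }

  int64_t deadlineMs() const { return leaseDeadlineMs_; }

 private:
  int64_t intervalMs_;
  int64_t leaseDeadlineMs_ = 0;
};
```

With a 5 s interval, a heartbeat accepted at 10010 ms with a 10 ms cost sets the deadline to 15000 ms; a stalled append accepted later at 12000 ms with a 10000 ms cost only produces a candidate of 7000 ms, so the deadline stays at 15000 ms.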
The NebulaGraph heartbeat is divided into two steps:
Step 1: If replicatingLogs_ is false, the leader replicates an empty log to the follower and then writes it to RocksDB.
Step 2: The leader initiates an RPC to synchronize the current state; this step does not write to RocksDB.
Both steps update lastMsgAcceptedCostMs_ and lastMsgAcceptedTime_. The leader considers its lease valid when:
time::WallClock::fastNowInMilliSec() - lastMsgAcceptedTime_ <
FLAGS_raft_heartbeat_interval_secs * 1000 - lastMsgAcceptedCostMs_;
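To make the condition easier to reason about, here it is as a free function. The name `leaseValidOriginal` is hypothetical, and the member fields and gflag are passed in as parameters; the comparison itself is copied from the condition above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical free-function form of the leader's lease check.
bool leaseValidOriginal(int64_t nowMs,
                        int64_t lastMsgAcceptedTimeMs,
                        int64_t lastMsgAcceptedCostMs,
                        int64_t heartbeatIntervalSecs) {
  assert(lastMsgAcceptedCostMs >= 0);
  // Time elapsed since the last accepted message must be smaller than the
  // heartbeat interval minus the round-trip cost of that message.
  return nowMs - lastMsgAcceptedTimeMs <
         heartbeatIntervalSecs * 1000 - lastMsgAcceptedCostMs;
}
```

Rearranged, this is `nowMs < (lastMsgAcceptedTimeMs - lastMsgAcceptedCostMs) + heartbeatIntervalSecs * 1000`. So once the recorded cost exceeds the interval, the lease is invalid immediately, however recently the message was accepted.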
When RocksDB is ingesting, it can trigger a write stall, which blocks writes. During the stall, the empty log from step 1 cannot be written to RocksDB, so replicatingLogs_ is never reset to false.
As a result, each call to sendHeartbeat() executes only step 2. When the write stall clears, the blocked write completes and updates lastMsgAcceptedTime_ and lastMsgAcceptedCostMs_ together, with lastMsgAcceptedCostMs_ now very large. Queries against this partition's leader then fail with E_LEADER_LEASE_FAILED.
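The failure can be reproduced with a small timeline model. This sketch assumes, without confirmation from the source code, that lastMsgAcceptedTime_ is the time a response is accepted and lastMsgAcceptedCostMs_ is the round trip since the request was sent; `LeaseState` is a hypothetical stand-in in which every accepted response overwrites both fields.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: every accepted response overwrites both fields,
// regardless of which request it answers.
struct LeaseState {
  int64_t lastMsgAcceptedTimeMs = 0;
  int64_t lastMsgAcceptedCostMs = 0;

  void onAccepted(int64_t sentTimeMs, int64_t acceptedTimeMs) {
    assert(acceptedTimeMs >= sentTimeMs);
    lastMsgAcceptedTimeMs = acceptedTimeMs;
    lastMsgAcceptedCostMs = acceptedTimeMs - sentTimeMs;
  }

  // Same comparison as the leader's lease check.
  bool leaseValid(int64_t nowMs, int64_t heartbeatIntervalSecs) const {
    return nowMs - lastMsgAcceptedTimeMs <
           heartbeatIntervalSecs * 1000 - lastMsgAcceptedCostMs;
  }
};
```

In the stall scenario, a step-2 heartbeat sent at 11000 ms and accepted at 11005 ms keeps the lease valid. When the stall clears at 12000 ms, the step-1 append that was sent at 1000 ms finally completes, the cost becomes 11000 ms, and the check fails even though the follower just responded.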
I think we need to check when these fields should be updated: only the response to the latest AppendLog request should update them.
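One way to express "only the latest AppendLog can update the data" is to remember the send time of the request behind the current values and ignore responses to older requests. This is a hypothetical sketch: `GuardedLeaseState` and the `sentTimeMs` parameter are illustrative names, not NebulaGraph APIs, and a per-request sequence number would work just as well as a send time.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical variant: a response updates the lease fields only if its
// request was sent no earlier than the request behind the current values.
struct GuardedLeaseState {
  int64_t lastSentTimeMs = -1;  // send time of the request behind the current values
  int64_t lastMsgAcceptedTimeMs = 0;
  int64_t lastMsgAcceptedCostMs = 0;

  // Returns true if the fields were updated.
  bool onAccepted(int64_t sentTimeMs, int64_t acceptedTimeMs) {
    assert(acceptedTimeMs >= sentTimeMs);
    if (sentTimeMs < lastSentTimeMs) {
      return false;  // stale response to an older request: ignore it
    }
    lastSentTimeMs = sentTimeMs;
    lastMsgAcceptedTimeMs = acceptedTimeMs;
    lastMsgAcceptedCostMs = acceptedTimeMs - sentTimeMs;
    return true;
  }

  // Same comparison as the leader's lease check.
  bool leaseValid(int64_t nowMs, int64_t heartbeatIntervalSecs) const {
    return nowMs - lastMsgAcceptedTimeMs <
           heartbeatIntervalSecs * 1000 - lastMsgAcceptedCostMs;
  }
};
```

Replaying the stall timeline, the late completion of the append sent at 1000 ms is discarded, so the 5 ms cost from the most recent heartbeat is kept and the lease stays valid.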