Signed-off-by: Liu Cong [email protected]
What problem does this PR solve?
Improve TiKV performance on non-NVMe disk.
Problem Summary:
`fio -fdatasync` or `pg_test_sync` can be used to profile the 'sync' performance of a disk. The 'sync' performance can be one of the bottlenecks of TiKV performance, especially on some types of disk.
This PR provides a config entry to control raftstore's 'sync' frequency.
What is changed and how it works?
When peers receive data, the 'write' operation proceeds as usual, but the 'sync' is held back for a while.
(Not holding the 'write' operation is the key: IO is smoother than with simple batching, which holds both 'write' and 'sync'.)
Before 'sync' is called, the related raft follower's `AppendResp` messages are blocked (held and cached). When the follower performs 'sync', the raft log is considered persisted, and the messages are then released and delivered to the leader.
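The hold-and-release behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `AppendResp` here is a stand-in struct carrying only a log index, and `DelayedSync`, `on_written` and `on_synced` are hypothetical names chosen for the example.

```rust
use std::collections::VecDeque;

/// Stand-in for a raft AppendResp message (hypothetical, index only).
#[derive(Debug, Clone, PartialEq)]
struct AppendResp {
    index: u64,
}

/// Hypothetical follower-side buffer: responses are cached after 'write'
/// and only released once a 'sync' covers their log index.
struct DelayedSync {
    synced_index: u64,
    held: VecDeque<AppendResp>,
}

impl DelayedSync {
    fn new() -> Self {
        DelayedSync { synced_index: 0, held: VecDeque::new() }
    }

    /// Entries up to `index` were written but not yet synced:
    /// hold the response instead of sending it.
    fn on_written(&mut self, index: u64) {
        self.held.push_back(AppendResp { index });
    }

    /// A 'sync' covering everything up to `index` finished:
    /// return the responses that may now be delivered to the leader.
    fn on_synced(&mut self, index: u64) -> Vec<AppendResp> {
        self.synced_index = self.synced_index.max(index);
        let synced = self.synced_index;
        let mut released = Vec::new();
        while self.held.front().map_or(false, |r| r.index <= synced) {
            if let Some(resp) = self.held.pop_front() {
                released.push(resp);
            }
        }
        released
    }
}

fn main() {
    let mut follower = DelayedSync::new();
    follower.on_written(1);
    follower.on_written(2);
    follower.on_written(3);

    // Only responses covered by the sync are released; index 3 stays held.
    let released = follower.on_synced(2);
    println!("released: {:?}, still held: {}", released, follower.held.len());
}
```

Writes are never delayed in this sketch; only the acknowledgements wait for the sync, which is the property the description relies on.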
In the current implementation of `raft-rs`, the committed index of a raft group advances immediately, even though the log is not persisted yet, because it currently assumes that the log will (should, must) be persisted before the next round of handling. This assumption does not fit our purpose, so we changed `raft-rs` a bit:
- Add `raft_group.on_synced(index)` to deliver the 'sync' event to the raft group; the committed index will not advance until `on_synced` is called.
- Change `raft_group.ready_since(index)` to `raft_group.ready_from_range(index_low, index_high)`, to fetch a ready that only belongs to synced logs.
Related changes
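As a sketch of the `raft-rs` change described under "What is changed and how it works?", the model below caps the committed index at the locally synced index and returns only entries inside an index range. It is a simplified illustration under stated assumptions, not the `raft-rs` API: `SyncGatedGroup`, `on_quorum_ack` and `entries_in_range` are hypothetical names, and the range is assumed to be exclusive at the low end and inclusive at the high end.

```rust
/// Hypothetical, simplified model of one raft group's local state.
/// NOT the raft-rs API; names only mirror the PR description.
struct SyncGatedGroup {
    entries: Vec<u64>, // indexes of entries written locally, ascending
    synced_index: u64, // highest index known to be synced to disk
    quorum_index: u64, // highest index acknowledged by a quorum (input to the model)
    committed: u64,
}

impl SyncGatedGroup {
    fn new() -> Self {
        SyncGatedGroup { entries: Vec::new(), synced_index: 0, quorum_index: 0, committed: 0 }
    }

    /// 'write' happens immediately; nothing is committed yet.
    fn append(&mut self, index: u64) {
        self.entries.push(index);
    }

    fn on_quorum_ack(&mut self, index: u64) {
        self.quorum_index = self.quorum_index.max(index);
        self.advance_commit();
    }

    /// Delivers the 'sync' event; only now may the commit index move.
    fn on_synced(&mut self, index: u64) {
        self.synced_index = self.synced_index.max(index);
        self.advance_commit();
    }

    /// Assumption of this sketch: commit is capped by the synced index.
    fn advance_commit(&mut self) {
        let candidate = self.quorum_index.min(self.synced_index);
        if candidate > self.committed {
            self.committed = candidate;
        }
    }

    /// Entries with index in (low, high], e.g. only those already synced.
    fn entries_in_range(&self, low: u64, high: u64) -> Vec<u64> {
        self.entries.iter().copied().filter(|&i| i > low && i <= high).collect()
    }
}

fn main() {
    let mut group = SyncGatedGroup::new();
    for i in 1..=3 {
        group.append(i);
    }
    group.on_quorum_ack(3);
    println!("committed before sync: {}", group.committed);
    group.on_synced(2);
    println!("committed after sync: {}", group.committed);
    println!("synced entries: {:?}", group.entries_in_range(0, group.synced_index));
}
```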
Tests
Side effects
Release note