
region_cache: check epoch before insert #1079

Merged (10 commits) · Dec 20, 2023
Conversation

CabinfeverB (Member) commented Dec 15, 2023

There are at least two situations in which the region we get may be stale. The first is that region fetches run concurrently, so a stale region may be returned later due to network delays. The second is that, in the future, the region may be obtained from a PD follower, and there is a synchronization lag between the PD follower and the leader.
So we should check the epoch to avoid inserting a stale region.

The details of insertRegionToCache are as follows.

Firstly, mu.latestVersions[cachedRegion.VerID().id] is used to check whether an older version with the same ID already exists in the cache. If it does, the code compares the region version numbers (newVer.GetVer() vs. oldVer.GetVer()) as well as the conf version numbers (newVer.GetConfVer() vs. oldVer.GetConfVer()). If either of the cached version numbers is higher, the newly obtained Region is considered stale.
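
As a rough, self-contained sketch of this check (types here are simplified stand-ins; the actual condition matches the diff excerpt quoted later in this thread):

```go
package sketch

// RegionVerID is a simplified stand-in for client-go's type; the real
// check lives in internal/locate/region_cache.go.
type RegionVerID struct {
	id      uint64
	confVer uint64
	ver     uint64
}

func (v RegionVerID) GetVer() uint64     { return v.ver }
func (v RegionVerID) GetConfVer() uint64 { return v.confVer }

// isStale reports whether newVer is older than the latest version cached
// for the same region ID, comparing both the region version and the conf
// version, as insertRegionToCache does.
func isStale(latestVersions map[uint64]RegionVerID, newVer RegionVerID) bool {
	latest, ok := latestVersions[newVer.id]
	return ok && (latest.GetVer() > newVer.GetVer() || latest.GetConfVer() > newVer.GetConfVer())
}
```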

Next, mu.sorted.removeIntersecting is called to check and remove any old Regions that overlap with the new Region, and to determine if the new Region is stale.
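
A hypothetical, simplified sketch of what this step does (the real implementation walks a B-tree in internal/locate/sorted_btree.go; a sorted slice and string keys keep the sketch short):

```go
package sketch

// region is a simplified stand-in for client-go's cached region.
type region struct {
	startKey, endKey string // endKey == "" means unbounded to the right
	ver, confVer     uint64
}

// overlaps reports whether two half-open key ranges [start, end) intersect.
func overlaps(a, b region) bool {
	aLeftOfB := a.endKey != "" && a.endKey <= b.startKey
	bLeftOfA := b.endKey != "" && b.endKey <= a.startKey
	return !aLeftOfB && !bLeftOfA
}

// removeIntersecting drops every cached region whose range overlaps newR and
// reports newR as stale if an overlapping region carries a newer epoch; the
// caller aborts the insert when stale is true.
func removeIntersecting(cache []region, newR region) (kept []region, stale bool) {
	for _, r := range cache {
		if !overlaps(r, newR) {
			kept = append(kept, r)
			continue
		}
		if r.ver > newR.ver || r.confVer > newR.confVer {
			stale = true
		}
	}
	return kept, stale
}
```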

Afterward, the new Region is inserted into the cache using the mu.sorted.ReplaceOrInsert(cachedRegion) method.

If there are existing old Regions that overlap with the new Region, it inherits some information from the first overlapping Region, such as working indexes for TiKV and TiFlash, as well as bucket information.

Finally, it iterates over the overlapping old Regions, removes them from the cache, and marks them as expired. If the invalidateOldRegion parameter is true, the old Regions are marked as invalid to prevent certain requests from using expired Region information.

The relevant bookkeeping is then updated: the new Region and its corresponding version are added to mu.regions and mu.latestVersions.

Throughout this process, comparing region version numbers and conf version numbers, as well as removing overlapping old Regions, ensures that the Region information in the cache stays up to date and valid.
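
Putting the insert, inheritance, and invalidation steps together, here is a hypothetical, self-contained sketch; all names are simplified stand-ins for client-go's internals, not its actual API:

```go
package sketch

// cachedRegion is a simplified stand-in for client-go's cached region.
type cachedRegion struct {
	id, ver     uint64
	workTiKVIdx int  // peer index currently used for TiKV requests
	invalid     bool // set to stop requests from using this entry
}

// insertAndEvict inserts newR, inherits routing state from the first
// overlapping old region, removes the old ones, and updates bookkeeping.
func insertAndEvict(regions map[uint64]*cachedRegion, latestVersions map[uint64]uint64,
	newR *cachedRegion, overlapped []*cachedRegion, invalidateOldRegion bool) {
	if len(overlapped) > 0 {
		// Inherit the working peer index (and, in the real code, the
		// TiFlash index and bucket info) so routing stays warm.
		newR.workTiKVIdx = overlapped[0].workTiKVIdx
	}
	for _, old := range overlapped {
		delete(regions, old.id)
		if invalidateOldRegion {
			old.invalid = true // prevent requests from using the stale entry
		}
	}
	regions[newR.id] = newR
	latestVersions[newR.id] = newR.ver
}
```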


Benchmark results (count = 20):
InsertRegionToCache tests the case in which the key range and epoch change and a delay occurs; InsertRegionToCache2 tests the case in which the key range does not change.
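
The real benchmark code is not shown here; as a hedged sketch of its shape, a plain map stands in for the region cache. Tables like the ones below come from running with -count=20 and comparing via benchstat:

```go
package locate_test

import "testing"

// BenchmarkInsertSketch is a made-up stand-in for the real benchmarks; it
// only illustrates the measurement setup, not the actual cache code.
func BenchmarkInsertSketch(b *testing.B) {
	cache := make(map[uint64]uint64) // region ID -> epoch version
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		id, ver := uint64(i%1024), uint64(i)
		if old, ok := cache[id]; !ok || old <= ver {
			cache[id] = ver // skip inserts that would go backwards in epoch
		}
	}
}
```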
master vs. this PR:

name                    old time/op    new time/op    delta
InsertRegionToCache-8     1.44µs ± 1%    1.34µs ± 1%   -7.24%  (p=0.000 n=17+18)
InsertRegionToCache2-8    1.13µs ±11%    1.12µs ± 0%     ~     (p=0.567 n=18+18)

name                    old alloc/op   new alloc/op   delta
InsertRegionToCache-8       653B ± 1%      765B ± 0%  +17.21%  (p=0.000 n=19+20)
InsertRegionToCache2-8      464B ± 0%      472B ± 0%   +1.72%  (p=0.000 n=20+20)

name                    old allocs/op  new allocs/op  delta
InsertRegionToCache-8       26.0 ± 0%      26.0 ± 0%     ~     (all equal)
InsertRegionToCache2-8      23.0 ± 0%      24.0 ± 0%   +4.35%  (p=0.000 n=20+20)

We can see that the results of the two tests differ significantly. Speed is not a problem; the difference is mainly in memory allocation.
To find out why this PR allocates more bytes, I removed the code that checks the intersected regions (in sorted_btree.go).

master vs. that modified version:

name                    old time/op    new time/op    delta
InsertRegionToCache-8     1.44µs ± 1%    1.38µs ± 0%   -4.10%  (p=0.000 n=17+16)
InsertRegionToCache2-8    1.13µs ±11%    1.12µs ± 2%     ~     (p=0.625 n=18+19)

name                    old alloc/op   new alloc/op   delta
InsertRegionToCache-8       653B ± 1%      563B ± 0%  -13.74%  (p=0.000 n=19+20)
InsertRegionToCache2-8      464B ± 0%      472B ± 0%   +1.72%  (p=0.000 n=20+20)

name                    old allocs/op  new allocs/op  delta
InsertRegionToCache-8       26.0 ± 0%      26.0 ± 0%     ~     (all equal)
InsertRegionToCache2-8      23.0 ± 0%      24.0 ± 0%   +4.35%  (p=0.000 n=20+20)

From this, we can see that the additional overhead is mainly due to checking intersecting regions. IMO, this is reasonable.

Signed-off-by: Cabinfever_B <[email protected]>
// The second case is that the region may be obtained from the PD follower,
// and there is the synchronization time between the pd follower and the leader.
// So we should check the epoch.
if ok && (latest.GetVer() > newVer.GetVer() || latest.GetConfVer() > newVer.GetConfVer()) {
Member:
The naming is confusing; which one is newer?

@@ -55,6 +55,7 @@ import (
"github.com/tikv/client-go/v2/internal/mockstore/mocktikv"
"github.com/tikv/client-go/v2/kv"
pd "github.com/tikv/pd/client"
uatomic "go.uber.org/atomic"
Member:
uatomic?

Member Author:
Because sync/atomic is also imported.

Signed-off-by: Cabinfever_B <[email protected]>
return true
}
// Also check the intersecting regions.
intersectedRegions, stale := mu.sorted.removeIntersecting(cachedRegion, &newVer)
Member:
If the region is not stale, will the intersecting regions be removed first?

Member Author:
Yes, but we hold the lock first, so it seems to have no effect.

Signed-off-by: Cabinfever_B <[email protected]>
Member Author (@CabinfeverB):
cc @disksing

address comment

Signed-off-by: Cabinfever_B <[email protected]>