PatrolKeyspaceAssignment got stuck in the middle of process for hours #6628
Comments
The problem seems to be that it got stuck while trying to load keyspaces from the range [4008, 4127].
[2023/06/16 22:52:11.305 +00:00] [INFO] [keyspace.go:704] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment after LoadRangeKeyspace !!!!!!!!!!!!!!!!"] [current-start-id=3116] [load-keyspaces-len=120] [keyspace-num=120] [current-start-id=3116] [next-start-id=3236] [more-to-patrol=true]
goroutine 214735 [semacquire]: It's blocked on manager.metaLock.Lock(ks.Id) below.
The keyspace IDs aren't consecutive:
/pd/7187976276065784319/keyspaces/meta/00000517
/pd/7187976276065784319/keyspaces/meta/00000601
/pd/7187976276065784319/keyspaces/meta/00000602
There are more holes in the keyspace range:
/pd/7187976276065784319/keyspaces/meta/00000863
/pd/7187976276065784319/keyspaces/meta/00000864
/pd/7187976276065784319/keyspaces/meta/00000880
/pd/7187976276065784319/keyspaces/meta/00000899
/pd/7187976276065784319/keyspaces/meta/00000920
/pd/7187976276065784319/keyspaces/meta/00000921
The process got stuck at MetaLock.Lock(ks-id=4520), which has the same last byte 0xA8 as the first keyspace ID in this batch, 4008, because MetaLock.Lock(ks-id=4008) had already been acquired.
[2023/06/16 23:54:08.240 +00:00] [INFO] [keyspace.go:704] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment after LoadRangeKeyspace !!!!!!!!!!!!!!!!"] [current-start-id=4008] [load-keyspaces-len=120] [keyspace-num=120] [current-start-id=4008] [next-start-id=4836] [more-to-patrol=true]
close #6628 Enhance LockGroup with RemoveEntryOnUnlock. Remove the lock of the given key from the lock group when unlock to keep minimal working set, which is suited for low qps, non-time-critical and non-consecutive large key space scenarios. One example of the last use case is that keyspace group split loads non-consecutive keyspace meta in batches and lock all loaded keyspace meta within a batch at the same time. Signed-off-by: Bin Shi <[email protected]>
…#6629) close tikv#6628 Enhance LockGroup with RemoveEntryOnUnlock. Remove the lock of the given key from the lock group when unlock to keep minimal working set, which is suited for low qps, non-time-critical and non-consecutive large key space scenarios. One example of the last use case is that keyspace group split loads non-consecutive keyspace meta in batches and lock all loaded keyspace meta within a batch at the same time. Signed-off-by: Bin Shi <[email protected]>
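The remove-on-unlock idea described in the commit message can be sketched roughly as follows. This is only an illustration, not the actual syncutil.LockGroup code: the names keyedLocks, entry, and size are hypothetical, and keying the map by the full keyspace ID (rather than a masked hash) is an assumption about how the collision is avoided. Each entry counts its holders and waiters, and the entry is deleted once the count drops to zero, so the map only ever contains keys that are currently in use.

```go
package main

import (
	"fmt"
	"sync"
)

// entry is a hypothetical per-key lock with a reference count.
type entry struct {
	mu   sync.Mutex
	refs int // number of goroutines holding or waiting for mu
}

// keyedLocks is a hypothetical lock group keyed by the full ID.
type keyedLocks struct {
	mu      sync.Mutex // guards entries and every entry.refs
	entries map[uint32]*entry
}

func newKeyedLocks() *keyedLocks {
	return &keyedLocks{entries: make(map[uint32]*entry)}
}

// Lock acquires the lock for id, creating its entry on demand.
func (k *keyedLocks) Lock(id uint32) {
	k.mu.Lock()
	e, ok := k.entries[id]
	if !ok {
		e = &entry{}
		k.entries[id] = e
	}
	e.refs++
	k.mu.Unlock()
	e.mu.Lock() // may block, but only on another holder of the same id
}

// Unlock releases the lock for id and removes the entry if nobody else uses it.
func (k *keyedLocks) Unlock(id uint32) {
	k.mu.Lock()
	e, ok := k.entries[id]
	if !ok {
		k.mu.Unlock()
		panic("unlock of a key that is not locked")
	}
	e.refs--
	if e.refs == 0 {
		delete(k.entries, id) // keep the working set minimal
	}
	k.mu.Unlock()
	e.mu.Unlock()
}

// size reports how many entries are currently tracked.
func (k *keyedLocks) size() int {
	k.mu.Lock()
	defer k.mu.Unlock()
	return len(k.entries)
}

func main() {
	locks := newKeyedLocks()
	// Both IDs share the last byte 0xA8, yet they use separate entries here.
	locks.Lock(4008)
	locks.Lock(4520)
	fmt.Println(locks.size()) // 2
	locks.Unlock(4520)
	locks.Unlock(4008)
	fmt.Println(locks.size()) // 0
}
```

The trade-off matches the commit message: every Lock and Unlock goes through the group mutex and may allocate, which is acceptable for low-QPS, non-time-critical callers.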
Enhancement Task
[2023/06/16 20:44:48.545 +00:00] [INFO] [tso_keyspace_group.go:172] ["enter into SplitKeyspaceGroupByID !!!!!!!!!!!!!!!!"]
[2023/06/16 20:44:48.545 +00:00] [INFO] [tso_keyspace_group.go:192] ["SplitKeyspaceGroupByID splitParams.Keyspaces !!!!!!!!!!!!!!!!"] [splitParams.Keyspaces="[4000]"]
[2023/06/16 20:44:48.545 +00:00] [INFO] [keyspace.go:673] ["SplitKeyspaceGroupByID after PatrolKeyspaceAssignment !!!!!!!!!!!!!!!!"]
[2023/06/16 20:44:48.545 +00:00] [INFO] [keyspace.go:676] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment !!!!!!!!!!!!!!!!"] [next-start-id=0]
[2023/06/16 20:44:48.561 +00:00] [INFO] [keyspace.go:676] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment !!!!!!!!!!!!!!!!"] [next-start-id=120]
[2023/06/16 20:44:48.572 +00:00] [INFO] [keyspace.go:676] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment !!!!!!!!!!!!!!!!"] [next-start-id=240]
[2023/06/16 20:44:48.599 +00:00] [INFO] [keyspace.go:676] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment !!!!!!!!!!!!!!!!"] [next-start-id=360]
...
[2023/06/16 20:44:48.843 +00:00] [INFO] [keyspace.go:676] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment !!!!!!!!!!!!!!!!"] [next-start-id=3803]
[2023/06/16 20:44:48.853 +00:00] [INFO] [keyspace.go:676] ["SplitKeyspaceGroupByID in PatrolKeyspaceAssignment !!!!!!!!!!!!!!!!"] [next-start-id=4008]
There are more keyspaces to go, but it neither proceeds nor exits from the assignment loop.
This might be related: the keyspace IDs aren't consecutive.
Environment
dev env (dev-us-east-1-f01), namespace(tidb-serverless)
Root Cause Analysis
The goroutine is observed to be blocked on manager.metaLock.Lock(keyspace.Id).
metaLock is created with syncutil.NewLockGroup(syncutil.WithHash(MaskKeyspaceID)):
// MaskKeyspaceID is used to hash the spaceID inside the lockGroup.
// A simple mask is applied to spaceID to use its last byte as map key,
// limiting the maximum map length to 256.
// Since keyspaceID is sequentially allocated, this can also reduce the chance
// of collision when comparing with random hashes.
func MaskKeyspaceID(id uint32) uint32 {
	return id & 0xFF
}
This means that if two keyspace IDs in the loaded batch share the same last byte, the patrol deadlocks: all keyspaces in a batch are locked at the same time, so the second Lock call waits on a lock that is already held for the first ID. In our case, the keyspace IDs loaded from keyspace meta storage aren't consecutive. Although keyspace IDs are allocated sequentially, stored keyspace meta may expire or be deleted over time.
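To make the collision condition concrete, here is a small sketch that checks a batch for two IDs landing on the same masked slot. firstMaskCollision and consecutiveBatch are hypothetical helpers, not part of PD; 4008 and 4520 are the IDs reported in this issue, while 4100 and 4300 are made up for illustration.

```go
package main

import "fmt"

// MaskKeyspaceID mirrors the hash quoted above: it keeps only the last byte.
func MaskKeyspaceID(id uint32) uint32 {
	return id & 0xFF
}

// firstMaskCollision (hypothetical helper) returns the first pair of IDs in
// the batch that map to the same slot under MaskKeyspaceID.
func firstMaskCollision(batch []uint32) (held, blocked uint32, found bool) {
	seen := make(map[uint32]uint32) // masked slot -> first ID that used it
	for _, id := range batch {
		slot := MaskKeyspaceID(id)
		if prev, ok := seen[slot]; ok {
			return prev, id, true
		}
		seen[slot] = id
	}
	return 0, 0, false
}

// consecutiveBatch (hypothetical helper) builds n consecutive IDs from start.
func consecutiveBatch(start uint32, n int) []uint32 {
	batch := make([]uint32, 0, n)
	for i := 0; i < n; i++ {
		batch = append(batch, start+uint32(i))
	}
	return batch
}

func main() {
	// 120 consecutive IDs span fewer than 256 values, so no slot repeats.
	_, _, found := firstMaskCollision(consecutiveBatch(4008, 120))
	fmt.Println(found) // false

	// A sparse batch can span more than 256 IDs and wrap around the mask.
	held, blocked, found := firstMaskCollision([]uint32{4008, 4100, 4300, 4520})
	fmt.Println(held, blocked, found) // 4008 4520 true
}
```

With a batch size of 120, the mask is safe only as long as the loaded IDs stay within a window of 256; holes in the ID range break that assumption.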
There are holes in the keyspace ID range in the keyspace meta storage:
...
/pd/7187976276065784319/keyspaces/meta/00000862
/pd/7187976276065784319/keyspaces/meta/00000863
/pd/7187976276065784319/keyspaces/meta/00000864
/pd/7187976276065784319/keyspaces/meta/00000880
/pd/7187976276065784319/keyspaces/meta/00000899
/pd/7187976276065784319/keyspaces/meta/00000920
/pd/7187976276065784319/keyspaces/meta/00000921
...
In our case, the process got stuck at MetaLock.Lock(ks-id=4520), which has the same last byte 0xA8 as the first keyspace ID in this batch, 4008, because MetaLock.Lock(ks-id=4008) had already been acquired.
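The blocking behavior can be reproduced in isolation without actually hanging. The sketch below uses a simplified stand-in for a lock group hashed by MaskKeyspaceID; maskedLocks is a hypothetical type, not the real syncutil.LockGroup, and sync.Mutex.TryLock (available since Go 1.18) is used only so that the demo reports the conflict instead of deadlocking.

```go
package main

import (
	"fmt"
	"sync"
)

// MaskKeyspaceID mirrors the hash quoted in this issue.
func MaskKeyspaceID(id uint32) uint32 {
	return id & 0xFF
}

// maskedLocks is a hypothetical stand-in: one mutex per masked slot.
type maskedLocks struct {
	slots [256]sync.Mutex
}

func (m *maskedLocks) Lock(id uint32) {
	m.slots[MaskKeyspaceID(id)].Lock()
}

// TryLock reports whether the slot for id could be acquired without blocking.
func (m *maskedLocks) TryLock(id uint32) bool {
	return m.slots[MaskKeyspaceID(id)].TryLock()
}

func (m *maskedLocks) Unlock(id uint32) {
	m.slots[MaskKeyspaceID(id)].Unlock()
}

func main() {
	var locks maskedLocks

	fmt.Printf("%#x %#x\n", MaskKeyspaceID(4008), MaskKeyspaceID(4520)) // 0xa8 0xa8

	// The batch locks its first keyspace and keeps holding it.
	locks.Lock(4008)

	// A plain Lock(4520) would now block forever in the same goroutine.
	fmt.Println(locks.TryLock(4520)) // false

	// Once 4008 is released, the slot becomes available again.
	locks.Unlock(4008)
	fmt.Println(locks.TryLock(4520)) // true
	locks.Unlock(4520)
}
```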