fix: add lock timeout to reduce lock failure cases #2287
Codecov Report

| Coverage Diff | master | #2287 | +/- |
|---|---|---|---|
| Coverage | 66.97% | 66.93% | -0.05% |
| Files | 211 | 210 | -1 |
| Lines | 17306 | 17298 | -8 |
| Hits | 11591 | 11578 | -13 |
| Misses | 4312 | 4313 | +1 |
| Partials | 1403 | 1407 | +4 |
1e822b7 to f8d49a0
	if ok && len(v.c) == 0 {
		// filled up the chan to identify caller has released the lock.
		v.c <- struct{}{}
	}
}
Would it be better to free unused lockedKeys in this Unlock method rather than using an extra goroutine?
https://github.com/alibaba/pouch/blob/f56461421b3a2148db94d825a5e715fb156e2972/pkg/kmutex/kmutex.go#L41-L52
Actually, this is a legacy function designed by @skoo87.
Could you help explain a little bit more about the goroutine usage here?
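To make the reviewer's suggestion concrete, here is a minimal sketch of a keyed mutex that frees an entry inside Unlock instead of relying on a background goroutine. It is not the pouch `kmutex.KMutex` implementation: the names `RefKMutex`, `entry`, `refs` and `Len` are hypothetical, and the channel is used as a one-slot semaphore, which may differ from how the pouch code uses `v.c`.

```go
package main

import (
	"fmt"
	"sync"
)

// entry is a hypothetical per-key record: a one-slot channel used as the
// lock, plus a count of goroutines that hold or are waiting for it.
type entry struct {
	c    chan struct{}
	refs int
}

// RefKMutex is an illustrative keyed mutex, not pouch's kmutex.KMutex.
type RefKMutex struct {
	mu   sync.Mutex
	keys map[string]*entry
}

func NewRefKMutex() *RefKMutex {
	return &RefKMutex{keys: make(map[string]*entry)}
}

// Lock blocks until the lock for key k is acquired.
func (m *RefKMutex) Lock(k string) {
	m.mu.Lock()
	e, ok := m.keys[k]
	if !ok {
		e = &entry{c: make(chan struct{}, 1)}
		m.keys[k] = e
	}
	e.refs++ // registered before waiting, so Unlock cannot delete e under us
	m.mu.Unlock()

	e.c <- struct{}{} // blocks while another goroutine occupies the slot
}

// Unlock releases the lock for k and removes the entry once nobody holds
// or waits for it.
func (m *RefKMutex) Unlock(k string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	e, ok := m.keys[k]
	if !ok {
		return
	}
	select {
	case <-e.c:
	default:
		return // the slot is empty: nothing to release
	}
	e.refs--
	if e.refs == 0 {
		delete(m.keys, k)
	}
}

// Len reports how many keys are currently tracked.
func (m *RefKMutex) Len() int {
	m.mu.Lock()
	defer m.mu.Unlock()
	return len(m.keys)
}

func main() {
	m := NewRefKMutex()
	m.Lock("c1")
	fmt.Println(m.Len()) // the key is tracked while the lock is held
	m.Unlock("c1")
	fmt.Println(m.Len()) // the entry was removed by Unlock itself
}
```

Because the reference count is incremented under the map mutex before a goroutine starts waiting, an entry is only deleted when no holder or waiter remains. Whether this trade-off suits the pouch code base is a question for the maintainers.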
ctrd/utils_test.go
Outdated
@@ -5,7 +5,6 @@ import (
	"testing"

	"github.com/alibaba/pouch/pkg/errtypes"
Please add a blank line here.
Signed-off-by: Allen Sun <[email protected]>
		return false
	}
}

-// LockWithTimeout trys to lock, if can't acquire the lock, will block util timeout.
+// LockWithTimeout tries to lock.
+// It can't acquire the lock, will block util timeout.
 func (m *KMutex) LockWithTimeout(k string, to time.Duration) bool {
s/LockWithTimeout/TrylockWithTimeout
	select {
	case <-v.c:
		// the locker has released the lock.
		return true
We think there is a bug on this line.
If this goroutine gets the lock and returns true, there is a very small chance that the GC function deletes the corresponding locked key.
Agreed. These lock functions, such as `Trylock`, `LockWithTimeout` and `Lock`, all have the same problem: they did not actually get the lock after `<-v.c`.
This pull request is blocked for now. For issue #2281, we will try to change the code:
to be:
ping @allencloud
This issue has been fixed by #2319.
Signed-off-by: Allen Sun [email protected]
Ⅰ. Describe what this PR did
`failed to xxx container xxxx: lock failed` is a common error when pouchd receives many concurrent requests for the same container object, such as container.

In the original implementation, ctrd tries to acquire the lock for a container before continuing the request's operation. If the lock is held by another request, the request is not blocked; it returns a lock failure immediately. This behavior has both a benefit and a drawback. The benefit is that the request gets a fast response even when it fails. The drawback is that it depends on client callers to retry the request.

To avoid this, we could take other approaches, such as:
This pull request takes the second approach.
Besides the timeout logic (the default is 5 seconds), I also changed the following:
`container xxx is accessed by other request and please try again: lock failed`;
Ⅱ. Does this pull request fix one issue?
fix #2281
Ⅲ. Why don't you add test cases (unit test/integration test)? (Do you really think no tests are needed?)
No need for this; I added more test cases for the common package kmutex.
Ⅳ. Describe how to verify it
none
Ⅴ. Special notes for reviews
none