-
Notifications
You must be signed in to change notification settings - Fork 949
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enhance: add retry helper method for Trylock #2319
enhance: add retry helper method for Trylock #2319
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2319 +/- ##
==========================================
+ Coverage 67.16% 67.45% +0.28%
==========================================
Files 215 217 +2
Lines 17574 17663 +89
==========================================
+ Hits 11804 11915 +111
+ Misses 4360 4350 -10
+ Partials 1410 1398 -12
|
@@ -55,7 +56,7 @@ func (c *Client) ContainerStats(ctx context.Context, id string) (*containerdtype | |||
|
|||
// containerStats returns stats of the container. | |||
func (c *Client) containerStats(ctx context.Context, id string) (*containerdtypes.Metric, error) { | |||
if !c.lock.Trylock(id) { | |||
if !c.lock.TrylockWithTimeout(id, defaultTrylockTimeout) { | |||
return nil, errtypes.ErrLockfailed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part would return an error of
failed: rpc error: code = Unknown desc =
failed to stop container "c69994fce9611e75ca826c644edb8f4dbc32a2592afb66dd34e1d4a2d9a76c0e":
failed to destroy container c69994fce9611e75ca826c644edb8f4dbc32a2592afb66dd34e1d4a2d9a76c0e:
lock failed"
While I think some users of PouchContainer feel a little bit unclear about the error message, could we add code like
// LockFailedError is constructing a much more readable lock failed error.
func LockFailedError(containerID string) error {
return errors.Wrapf(errtypes.ErrLockfailed, "container %s is accessed by other request and please try again", containerID)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the error will be disappeared after this PR 😢
LGTM until some minor request updated. |
ctrd/container_lock.go
Outdated
func (l *containerLock) TrylockWithTimeout(id string, t time.Duration) bool { | ||
var retry = 32 | ||
|
||
lctx, cancel := context.WithTimeout(context.TODO(), t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a brand new context? rather than an inherited context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about using two timer?
func (l *containerLock) TrylockWithTimeout(id string, t time.Duration) bool {
var retry = 32
timeoutTimer := time.NewTimer(t)
for {
ok := l.Trylock(id)
if ok {
timeoutTimer.Stop()
return true
}
// sleep random duration by retry
select {
case <-time.After(time.Millisecond * time.Duration(rand.Intn(retry))):
if retry < 2048 {
retry = retry << 1
}
continue
case <-timeoutTimer.C:
return false
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think both are the same.
defer l.Unlock(id) | ||
|
||
var ( | ||
releaseCh = make(chan struct{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the usage of this variable releaseCh
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The releaseCh
variable try to promise that TrylockWithRetry
will execute before the sleep and unlock function in the main go routine. Otherwise, unlock is done, TrylockWithRetry
will success immediately.
The ctrd maybe hold the lock for a while. In the concurrent request, it might fail to serve request frequently. For this case, pouchcontainer should retry in the specific duration to reduce the number of failed request. Signed-off-by: Wei Fu <[email protected]>
@allencloud @allencloud I used context as timeout. If the request doesn't set the timeout, it will retry like spin-lock. |
next PR I will remove the TryLock.... |
LGTM |
Signed-off-by: Wei Fu [email protected]
Ⅰ. Describe what this PR did
The ctrd maybe hold the lock for a while. In the concurrent request, it
might fail to serve request frequently. For this case, pouchcontainer
should retry in the specific duration to reduce the number of failed
request.
Ⅱ. Does this pull request fix one issue?
fix #2281
Ⅲ. Why don't you add test cases (unit test/integration test)? (你真的觉得不需要加测试吗?)
Added
Ⅳ. Describe how to verify it
Check case
Ⅴ. Special notes for reviews
5 seconds is enough? or use request context?