Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The test named TestHostClientMaxConnWaitTimeoutError fails under certain conditions. #1830

Closed
newacorn opened this issue Aug 12, 2024 · 1 comment

Comments

@newacorn
Copy link
Contributor

func TestHostClientMaxConnWaitTimeoutSuccess(t *testing.T) {

Reproduce

❯ go test -run=TestHostClientMaxConnWaitTimeoutError -count=2000  -failfast  .
--- FAIL: TestHostClientMaxConnWaitTimeoutError (0.21s)
    client_test.go:3021: connsWait has 1 items remaining
FAIL
FAIL	github.com/valyala/fasthttp	24.859s
FAIL

Cause

fasthttp/client.go

Lines 1692 to 1704 in a1db411

if q := c.connsWait; q != nil && q.len() > 0 {
for q.len() > 0 {
w := q.popFront()
if w.waiting() {
go c.dialConnFor(w)
dialed = true
break
}
}
}
if !dialed {
c.connsCount--
}

fasthttp/client.go

Lines 1747 to 1758 in a1db411

if q := c.connsWait; q != nil && q.len() > 0 {
for q.len() > 0 {
w := q.popFront()
if w.waiting() {
delivered = w.tryDeliver(cc, nil)
break
}
}
}
if !delivered {
c.conns = append(c.conns, cc)
}

These two pieces of code assume that when w.waiting() returns true, we can definitely wake up a valid waiter, and then have that waiter continue to wake up the next one upon exiting.

However, there is a window of time where w.waiting() may return true, but w could time out and set the value of wantConn.err afterward, causing the wake-up chain to be interrupted.

The more serious issue is not whether c.connsWait.len() returns zero after all client calls return, but rather that there are errors in the length of c.conns and changes in the c.connsCount count. This can lead to a situation where, during a connsCount semaphore release, there are still valid waiters in c.connsWait.

Issue 1: The semaphore released for connsCount does not increase the length of c.conns nor does it decrement the c.connsCount count.
Issue 2: Even if the release is reflected in the length of c.conns or the calculation of c.connsCount, the problem lies in that it should have passed the semaphore but instead chose to release it.
Let's first address the connsCount semaphore issue.
Next, resolve the issue with the TestHostClientMaxConnWaitTimeoutError test case.

@erikdubbelboer
Copy link
Collaborator

Thanks for diving into this, and writing two pull requests to fix both issues!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants