Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: TestTransportMaxPerHostIdleConns failures #57476

Closed
gopherbot opened this issue Dec 27, 2022 · 20 comments
Closed

net/http: TestTransportMaxPerHostIdleConns failures #57476

gopherbot opened this issue Dec 27, 2022 · 20 comments
Labels
NeedsFix The path to resolution is known, but the work has not been done. Testing An issue that has been verified to require only test changes, not just a test failure.
Milestone

Comments

@gopherbot
Copy link
Contributor

#!watchflakes
post <- pkg == "net/http" && test == "TestTransportMaxPerHostIdleConns"

Issue created automatically to collect these failures.

Example (log):

--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
    --- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.52s)
        transport_test.go:552: after second response, idle conns = 1; want 2
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false

watchflakes

@gopherbot gopherbot added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Dec 27, 2022
@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "net/http" && test == "TestTransportMaxPerHostIdleConns"
2022-12-21 20:18 freebsd-arm-paulzhol go@fadd77c0 net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
    --- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.52s)
        transport_test.go:552: after second response, idle conns = 1; want 2
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false

watchflakes

@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "net/http" && test == "TestTransportMaxPerHostIdleConns"
2022-12-01 22:04 freebsd-arm-paulzhol go@6a70292d net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
    --- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.81s)
        transport_test.go:552: after second response, idle conns = 1; want 2
2022-12-02 16:29 freebsd-arm-paulzhol go@1711f953 net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
    --- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.08s)
        transport_test.go:552: after second response, idle conns = 1; want 2

watchflakes

@bcmills
Copy link
Contributor

bcmills commented Jan 6, 2023

Only observed on freebsd-arm-paulzhol so far, and three times.

@paulzhol, any ideas?

(CC @neild)

@paulzhol
Copy link
Member

paulzhol commented Jan 8, 2023

I think it is related to the TestTransportMaxIdleConns failures. I tried looking at some of the log outputs here: #56960 (comment).
Couldn't find anything concrete, but I had a hunch that concurrent tests (maybe scheduled like they do on the slow builder) could be re-using ports (maybe even properly closed sockets but which are still in TIME_WAIT state in the kernel), so connecting to them with ends up with an RST reply.

@bcmills
Copy link
Contributor

bcmills commented Jan 10, 2023

In these failures, the observed number of idle connections always seems to undershoot the expected number.

That leads me to suspect that the idle connections are being returned asynchronously, and for some reason they're a little too slow on this builder to be visible in the pool.

@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "net/http" && test == "TestTransportMaxPerHostIdleConns"
2023-01-10 17:59 freebsd-arm-paulzhol go@82f09b75 net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
    --- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.32s)
        transport_test.go:538: after first response, expected 1 idle conn cache keys; got 0
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false

watchflakes

@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "net/http" && test == "TestTransportMaxPerHostIdleConns"
2023-01-19 20:46 darwin-amd64-13 go@24a9d7bc net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
    --- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.08s)
        transport_test.go:538: after first response, expected 1 idle conn cache keys; got 0
2023-01-23 15:51 freebsd-arm-paulzhol go@e22bd234 net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
    --- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.06s)
        transport_test.go:552: after second response, idle conns = 1; want 2

watchflakes

@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "net/http" && test == "TestTransportMaxPerHostIdleConns"
2023-01-27 21:59 freebsd-arm-paulzhol go@b15297fc net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
    --- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.18s)
        transport_test.go:552: after second response, idle conns = 1; want 2
1 second passes in backend, proxygone= false
2023-01-31 13:55 freebsd-arm-paulzhol go@da0c375c net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
    --- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.22s)
        transport_test.go:538: after first response, expected 1 idle conn cache keys; got 0
2023-02-01 19:47 linux-amd64-race go@ab0f0459 net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)

watchflakes

@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "net/http" && test == "TestTransportMaxPerHostIdleConns"
2023-02-06 20:39 darwin-amd64-11_0 go@103f3749 net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
    --- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.08s)
        transport_test.go:552: after second response, idle conns = 1; want 2
2023-02-10 18:59 darwin-amd64-13 go@ae2f1201 net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
    --- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.07s)
        transport_test.go:538: after first response, expected 1 idle conn cache keys; got 0

watchflakes

@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "net/http" && test == "TestTransportMaxPerHostIdleConns"
2023-03-09 20:32 freebsd-arm-paulzhol go@7042ea62 net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
    --- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.11s)
        transport_test.go:538: after first response, expected 1 idle conn cache keys; got 0
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false

watchflakes

@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "net/http" && test == "TestTransportMaxPerHostIdleConns"
2023-03-14 03:09 darwin-amd64-13 go@8b3dc539 net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
    --- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.08s)
        transport_test.go:538: after first response, expected 1 idle conn cache keys; got 0

watchflakes

@bcmills
Copy link
Contributor

bcmills commented Mar 14, 2023

I had a hunch that concurrent tests (maybe scheduled like they do on the slow builder) could be re-using ports (maybe even properly closed sockets but which are still in TIME_WAIT state in the kernel)

If so, that's arguably a kernel bug: since these are loopback connections, the kernel shouldn't allow the port to be reused until it is done sending RST packets to that port. (And it has enough information to know that!)

@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "net/http" && test == "TestTransportMaxPerHostIdleConns"
2023-03-17 04:44 freebsd-arm-paulzhol go@3360be4a net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
    --- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.68s)
        transport_test.go:538: after first response, expected 1 idle conn cache keys; got 0

watchflakes

@paulzhol
Copy link
Member

I had a hunch that concurrent tests (maybe scheduled like they do on the slow builder) could be re-using ports (maybe even properly closed sockets but which are still in TIME_WAIT state in the kernel)

If so, that's arguably a kernel bug: since these are loopback connections, the kernel shouldn't allow the port to be reused until it is done sending RST packets to that port. (And it has enough information to know that!)

But we're always setting SO_REUSEADDR in setDefaultListenerSockopts() for TCP Listeners. This allows them to rebind to a still TIME_WAIT port.
The test will then try to Dial a new connection to that port, which the kernel will drop with RST if the 5 tuple matches.
The source pairs don't look like they are coming from the randomized ephemeral range at all in these logs:
#56960 (comment)

@bcmills
Copy link
Contributor

bcmills commented Mar 20, 2023

The test will then try to Dial a new connection to that port, which the kernel will drop with RST if the 5 tuple matches.

The test may be assuming RFC 6191 semantics, which should cause the kernel to accept all incoming self-connections regardless of TIME-WAIT (since the kernel can always arrange for its own timestamps and/or sequence numbers to satisfy the requirements for accepting a SYN segment on a connection in the TIME-WAIT state).

Does the FreeBSD network stack implement RFC 6191 semantics?

@paulzhol
Copy link
Member

The test may be assuming RFC 6191 semantics, which should cause the kernel to accept all incoming self-connections regardless of TIME-WAIT (since the kernel can always arrange for its own timestamps and/or sequence numbers to satisfy the requirements for accepting a SYN segment on a connection in the TIME-WAIT state).

Does the FreeBSD network stack implement RFC 6191 semantics?

I can't tell for sure, it looks disabled?

If we are looking for something along the lines of tcp_tw_reuse and tcp_tw_recycle (disabled after kernel v4.11) sysctls in Linux, I think we have nolocaltimewait in FreeBSD:

Suppress creating of compressed TCP TIME_WAIT states for connections in which both endpoints are local.

Alternatively I can try to disable syncookies which are supposed to interfere with the basic PAWS timestamps mechanism:

Syncookies have a certain number of disadvantages that a paranoid admin-
istrator may wish to take note of. Since the TCP options from the ini-
tial SYN are not saved, they are not applied to the connection, preclud-
ing use of features like window scale, timestamps, or exact MSS sizing.

@paulzhol
Copy link
Member

paulzhol commented Apr 1, 2023

I've started running the builder with net.inet.tcp.nolocaltimewait=1 since 2023-04-01 16:53:54.926870786 +0000 UTC.

@gopherbot
Copy link
Contributor Author

Change https://go.dev/cl/483116 mentions this issue: net/http: wait forever for write results in tests

@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "net/http" && test == "TestTransportMaxPerHostIdleConns"
2023-04-07 11:34 freebsd-arm-paulzhol go@949fdd9f net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
    --- FAIL: TestTransportMaxPerHostIdleConns/h1 (0.48s)
        transport_test.go:554: after second response, idle conns = 1; want 2
1 second passes in backend, proxygone= false
1 second passes in backend, proxygone= false

watchflakes

@dmitshur dmitshur added this to the Go1.21 milestone May 24, 2023
@dmitshur dmitshur removed the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label May 24, 2023
@dmitshur dmitshur added Testing An issue that has been verified to require only test changes, not just a test failure. NeedsFix The path to resolution is known, but the work has not been done. labels May 24, 2023
@gopherbot gopherbot reopened this Mar 25, 2024
@gopherbot
Copy link
Contributor Author

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "net/http" && test == "TestTransportMaxPerHostIdleConns"
2024-03-21 22:14 linux-s390x-ibm-race go@4f0408a3 net/http.TestTransportMaxPerHostIdleConns (log)
--- FAIL: TestTransportMaxPerHostIdleConns (0.00s)
    testing.go:1413: race detected during execution of test

watchflakes

@dmitshur dmitshur removed this from Test Flakes May 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix The path to resolution is known, but the work has not been done. Testing An issue that has been verified to require only test changes, not just a test failure.
Projects
None yet
Development

No branches or pull requests

4 participants