-
Notifications
You must be signed in to change notification settings - Fork 297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Close cannot be called in the middle of a frame read by Read #355
Comments
@kylecarbs reached out to me on email about this and I responded there. I think you just need to go over the docs again. Can you point to the code where you think the race condition is? As far as I can tell, everything is properly synchronized and there are no race conditions. I've tested the library extensively under the race detector. |
The race condition works like this: One goroutine calls
The way the code is structured, when you call A second goroutine calls A. Write the close header to the socket. Both Step 2 from The golang race detector won't help with this sort of problem because it only looks for unsynchronized memory access. That's not the problem here --- mutexes ensure only one goroutine is reading. The problem here is that |
As far as I can tell, The code is at https://github.com/nhooyr/websocket/blob/14fb98eba64eeb5e9d06a88b98c47ae924ac82b4/close_notjs.go#L91 I do understand what you're getting at but you wouldn't get a data corruption error. You'll just get a Once waitCloseHandshake has the read mutex locked, it doesn't let go until the corresponding close frame is received. It discards all further read payload bytes until it gets the close frame. That's what this loop here is doing: https://github.com/nhooyr/websocket/blob/14fb98eba64eeb5e9d06a88b98c47ae924ac82b4/close_notjs.go#L107 After all, once you've called |
Then it starts checking the data it read for reserved bits And checking the opcode: If the next data on the buffer was not a header, then you get these protocol errors returned to |
It sounds like you think that calling |
Ah fair enough I see the problem. Yes this is definitely a bug in need of fixing. Thank you for identifying it and sorry for dismissing you earlier. |
I do feel the need to defend why I was confused. This isn't a race. You can produce it by simply calling Reader, reading a single byte of a frame. And then calling Close(). That's supposed to be completely legal e.g. if you read malformed payload bytes and want to close the connection now but will produce the same issue with Close trying to read the payload bytes as a WebSocket frame. |
We've recently switch to this library as well in go-libp2p. Thank you for your work on this, @nhooyr! We're currently running into the same error, Is there any way we can help get this fixed soon, or do you have a timeline for a fix? |
I don't think @marten-seemann |
I think your issue is unrelated like @spikecurtis noted. If you were encountering this bug, you wouldn't see Close blocking, it would error out quickly. If it's taking 5s, then the connection is either broken somehow, the peer hasn't yet read the close frame because you sent a really large message first and that message is taking longer than 5s to read or lastly you're reading a really large frame from the peer and it's taking longer than 5s and so you're not getting to the close frame. All of which are really only possible on a slow connection. If you're able to reproduce on a local setup, please provide it in a new issue and I can debug. |
@nhooyr I managed to condense it down to a minimal test case that reliably reproduces the bug. There's no read and write calls at all, just the server closing the connection right after accepting it. func TestWebsocketImmediateClose(t *testing.T) {
closed := make(chan struct{})
mux := http.NewServeMux()
mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
defer close(closed)
c, err := ws.Accept(w, r, &ws.AcceptOptions{InsecureSkipVerify: true})
if err != nil {
return // The upgrader writes a response for us.
}
start := time.Now()
c.Close(ws.StatusUnsupportedData, "bye")
if took := time.Since(start); took > 2*time.Second {
t.Fatalf("closing took too long: %s", took)
}
})
server := &http.Server{Handler: mux, Addr: "localhost:8080"}
done := make(chan struct{})
go func() {
defer close(done)
server.ListenAndServe()
}()
time.Sleep(25 * time.Millisecond) // give the server some time to boot
if _, _, err := ws.Dial(context.Background(), "ws://localhost:8080", nil); err != nil {
t.Fatal(err)
}
<-closed // wait until the HTTP handler has returned
// test shutdown
server.Close()
<-done
} I'm not very familiar with the code base (nor with the WebSocket protocol itself), but here's where the EDIT: feel free to move to a new issue if you think this is a different bug. |
The WebSocket RFC mandates it as part of the close handshake. Though I'm now thinking I'll remove it in a future version. Anyway it's not a bug here, you're not reading or closing the dialed connection and so the close handshake doesn't occur. |
See docs on |
I’m not sure I understand. The server needs to be able to close the connection without hanging, no matter what the client does. Otherwise, this is an easy attack vector. As a user, after calling Close, I don’t really care if the client acknowledges it or not. All I care is that I can move on with other work (and don’t have to wait for 5s). |
Can you clarify how this is an attack vector/performance issue for you? It's a 5s wait and obeys the set read limit. I can't change this behaviour in v1 as it's possible others rely on But you can fork the library and comment out this line: https://github.com/nhooyr/websocket/blob/14fb98eba64eeb5e9d06a88b98c47ae924ac82b4/close_notjs.go#L38 |
And please stop responding here, open a new issue. |
Sorry for spamming this issue, I opened #384 and will reply there. |
Probably the cause of #391 |
Fixed by 6cec2ca |
Thanks for reporting @spikecurtis |
Problem: If I have an application that waits for data over the websocket, how do I cleanly shut down the websocket? Here, cleanly means:
Our first attempt at this calls
(*Conn).Read(ctx)
in a loop on one goroutine, and then when it was time to shut down, calls(*Conn).Close()
from a different goroutine.However, this would often result in close errors reading the response packet. Investigating, I found that calling
Close()
concurrently withRead()
from different goroutines exposed a race condition whereRead()
would read the frame header, and thenClose()
would read the packet payload as if it were a header, and throw an error.Would you consider this a bug? That is to say, are you meant to be able to call
Read()
andClose()
from different goroutines concurrently?My next attempt cancels the
Context
passed toRead()
, and then callsClose()
from the same goroutine afterRead()
returns. There are a couple problems with this approach. Firstly, canceling theRead()
context seems to trigger an opClose with status PolicyViolation. Ideally I'd be sending a normal status on disconnect. Secondly,Close()
returns a couple different error messages that seem to depend on some races in the library, including "already wrote close" and "WebSocket closed", so it's really hard to verify that the close handshake completed correctly.The text was updated successfully, but these errors were encountered: