-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Library unsafe to use without manual locking #83
Comments
The documentation states that concurrent calls to WriteMessage are not supported. Because the websocket protocol does not support interleaved messages, concurrent write support will require message buffering, blocking or forced finalization of messages written by other goroutines. Any of these choices will result in situations that are difficult to debug. A benefit of the current design is that application errors are found easily with the race detector. |
Never seen that bit, yes its there but we where working off the API docs, which while it is mentioned in NextWriter, WriteMessage doesn't which is where all our issues stemmed from. If all relevant methods cross referenced that section then it would be much clearer e.g. Its very easy to get caught out when doing even simple tasks like single goroutine writer + pinger to keep everything alive. While debugging our issue I found situations in the internals where the write mutex is totally defeated resulting in a very nasty mess that wasn't found by the race detector :( |
For reference our issue was that the internal code was corrupting seemingly individual and none concurrent WriteMessage calls. The reason for this is that reads from buffered channels (channel.mu) don't guarantee FIFO and hence messages where not getting fully processed in order of send. This resulted in the code trying to send messages from the buffer which had been overwritten by subsequent messages. This was due to the WriteMessage concurrency but it was very hard to spot and none obvious due to WriteMessage being a single call and on our case only triggering when a "ping" happened to correspond with data being sent. |
The channel c.mu is used to coordinate concurrent control frame writers and a single message writer. The channel is not designed to coordinate concurrent message writers. Are you saying that that c.mu failed to protect a concurrent call to It's interesting to hear that the race detector was not helpful. I plan to to test the race detector myself. If the race detector is not helpful, then I'll consider the blocking option. |
That's unclear ATM as we where using WriteMessage for websocket.PingMessage as per the examples. |
One thing that is clear however is that the use of c.mu as a gate into c.conn.Write does create potential starvation due to the fact c.mu is buffered and buffered channels are not fair to consumers meaning you could have a control message block until timeout when it could easily have been sent. We saw this unfair behaviour with competing WriteMessage calls where pending messages where ignored with new messages sent in preference, reordering messages by as much as 20 messages. Unbuffered channels in contrast are fair. |
Starvation can occur in cases where the application is close to saturating the network. Please submit a separate issue if this is a concern for your application. An immediate workaround is to coordinate writes through a single goroutine. Concurrent calls to WirteMessage will corrupt connection state. You may be observing the result of this corruption and not starvation.
Unbuffered channels cannot be used to implement a mutex. |
The documentation in this package follows the common Go practice of documenting where concurrency is allowed. The default assumption in Go is that concurrency is not allowed. I wrote a couple of tests and found that the race detector does detect unsynchronized calls to the write methods. I was concerned that internal synchronization for fames might mask races, but it does not. I have no plans to add locking to the methods used for writing messages (SetWriteDeadline, NextWriter, WriteMessage, WriteJSON). Applications typically call more than one of these methods to write a message:
Because the application should already be doing something to synchronize the sequence of method calls, there's not much value in adding locking to the individual methods. The additional locking has the drawback that it will mask errors that are currently found by the race detector. |
While NextReader and NextWriter state "The Next(Reader|Writer) method and the (readers|writers) returned from the method cannot be accessed by more than one goroutine at a time." other more commonly used methods such a WriteMessage etc don't so its easy to fall into the trap of using this library in a way which randomly breaks itself.
Ideally it should be updated to not have this issue, which from reading through the code all steams from the internal buffering.
In the mean time enhancing the documentation for each write / read method to be very explicit that they aren't safe would prevent others from going through the same pain we did, debugging an issue with our code for many hours only to find that WriteMessage is unsafe without external serialisation locking.
The text was updated successfully, but these errors were encountered: