Wasted layer of buffering and inefficiency #149
I think it's possible to avoid this buffer allocation and this copy by receiving pbufs in a loop, one by one, in this manner. Dechaining a pbuf from a chain seems quite easy; there is a `pbuf_dechain` function provided by lwIP.
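For illustration, here is a minimal sketch of that loop in Go, using a toy `pbuf` struct in place of lwIP's `struct pbuf` and a hypothetical receive handler; the real code would walk the cgo pbuf chain (for example with `pbuf_dechain`) instead:

```go
package main

import "fmt"

// Toy stand-in for lwIP's struct pbuf: one segment of payload plus a link to
// the next segment in the chain (the real type comes from lwIP via cgo).
type pbuf struct {
	payload []byte
	next    *pbuf
}

// receiveChain hands each pbuf's payload to the handler one at a time, with
// no intermediate coalescing buffer. In the real callback this is where the
// chain would be walked, e.g. by repeatedly calling pbuf_dechain.
func receiveChain(p *pbuf, receive func(data []byte) error) error {
	for seg := p; seg != nil; seg = seg.next {
		if err := receive(seg.payload); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	chain := &pbuf{payload: []byte("first "), next: &pbuf{payload: []byte("second")}}
	_ = receiveChain(chain, func(data []byte) error {
		fmt.Printf("received %d bytes\n", len(data))
		return nil
	})
}
```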
That would help remove the extra 2KiB per connection. However, each Receive call still blocks on a Read call, so we would be limiting the Reads to one pbuf, when we could have the application read as much as it can in one go if we switch from push to pull. One thing I don't understand is how the stream or packet is split into pbufs. Is there a max pbuf size? Is it a configuration option?
To give you some more context about the problem we are having: the Outline client uses go-tun2socks to send traffic to a Shadowsocks server. Shadowsocks puts data in encrypted frames. We can accumulate as much as 16KiB in a frame, but because we don't know whether the next Read will block, we are forced to emit a frame for every Read. Ideally, I'd give tun2socks my 16KiB buffer and it would fill it with as much available data as possible, possibly from multiple packets. The Shadowsocks overhead is not a lot: 34 bytes per frame. But assuming a 1500 MTU minus TCP/IP overhead = 1420 bytes of payload, we are talking about sending roughly 2.5% more bytes. Not bad, but that's the minimum overhead, and it can be an issue if we have small pbufs. For example, that's 25% if we are reading 1.4KiB pbufs.
This PR would solve the blocking issue, and if tcpRecvFn is called fast enough (cgo is slow), the application would get as much data as possible. But a buffer is introduced. It's not clear to me how we could solve both problems at the same time.
FWIW, it depends on how the pbuf is allocated and on the input IP packet size; the tcpRecvFn callback function is mainly driven by packet input.
I see three issues here:
I implemented your idea above (if I'm understanding it right) in 425fb2e, and it should solve 1 and 2 without intermediate buffering. I don't see any way to solve 3.
tcpRecvFn only reads one packet at a time from the pbuf (p.tot_len), passing it to Receive, which blocks on the application's call to conn.Read. It gets a buffer from the pool if the packet spans multiple pbuf entries, allocating a new buffer if the packet is larger than 2KiB. (A simplified sketch of this flow follows the two problems below.)
Problem 1) The buffer is held for the duration of the Read, which in practice means all the time. That buffering layer is actually unnecessary.
Problem 2) Only one packet is read, even though there may be a lot more data available. That's a missed opportunity for higher efficiency.
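For reference, a simplified, hypothetical sketch of this push-based flow, using toy types and an io.Pipe standing in for the real go-tun2socks internals (it is not the actual source):

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"sync"
)

// Toy stand-ins for the real types; the identifier names follow this thread,
// not the actual go-tun2socks source.
type pbuf struct {
	payload []byte
	next    *pbuf
}

// totLen is analogous to lwIP's p.tot_len: the total payload of the chain.
func (p *pbuf) totLen() int {
	n := 0
	for s := p; s != nil; s = s.next {
		n += len(s.payload)
	}
	return n
}

// A pool of 2KiB buffers, mirroring the pooled buffers described above.
var bufPool = sync.Pool{New: func() interface{} { return make([]byte, 2048) }}

// tcpRecvFn models the current push-based flow: coalesce the whole packet
// into an intermediate buffer (pooled if it fits in 2KiB, freshly allocated
// otherwise), then push it to the connection, blocking until the app Reads.
func tcpRecvFn(p *pbuf, w io.Writer) error {
	var buf []byte
	if p.totLen() <= 2048 {
		b := bufPool.Get().([]byte)
		defer bufPool.Put(b) // held until the Write (i.e. the app Read) completes: problem 1
		buf = b[:0]
	} else {
		buf = make([]byte, 0, p.totLen())
	}
	for s := p; s != nil; s = s.next {
		buf = append(buf, s.payload...)
	}
	// Only this one packet is delivered per call, however much more data
	// may already be queued: problem 2.
	_, err := w.Write(buf)
	return err
}

func main() {
	pr, pw := io.Pipe()
	done := make(chan struct{})
	go func() {
		defer close(done)
		appBuf := make([]byte, 16*1024)
		n, _ := pr.Read(appBuf) // the application-side conn.Read
		fmt.Printf("app read %d bytes\n", n)
	}()
	chain := &pbuf{
		payload: bytes.Repeat([]byte{'x'}, 1000),
		next:    &pbuf{payload: bytes.Repeat([]byte{'y'}, 500)},
	}
	if err := tcpRecvFn(chain, pw); err != nil {
		fmt.Println("recv error:", err)
	}
	<-done
}
```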
You can fix the problem by switching the data flow from push to pull and writing directly to the application-provided buffer:

- Replace `tcpConn.sndPipeWriter` and `tcpConn.sndPipeReader` with a `readCh chan []byte`.
- In `tcpRecvFn`, wait for a read buffer: `readBuf := <-conn.readCh`.
- In `conn.Read(buf)`, pass the buffer to the channel: `conn.readCh <- buf`.
- Copy the `pbuf` into the `readBuf` and call `tcp_recved`.

There are possibly some concurrency nuances to handle closing and errors. The implementation of io.Pipe has some insights.
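A minimal sketch of that pull design, assuming toy types and simplified error handling (the real tcpRecvFn would copy out of the lwIP pbuf via cgo and pass the consumed byte count to tcp_recved):

```go
package main

import (
	"errors"
	"fmt"
)

// tcpConn sketches the pull-based connection: Read lends the application's
// buffer to the lwIP side through readCh and waits on doneCh for the number
// of bytes that were filled in.
type tcpConn struct {
	readCh chan []byte   // buffers lent by Read, filled by tcpRecvFn
	doneCh chan int      // bytes copied into the lent buffer
	closed chan struct{} // closed when the connection is torn down
}

func newTCPConn() *tcpConn {
	return &tcpConn{
		readCh: make(chan []byte),
		doneCh: make(chan int),
		closed: make(chan struct{}),
	}
}

// Read passes the application-provided buffer to the receive callback and
// waits for it to be filled. No intermediate buffer is allocated or copied.
func (c *tcpConn) Read(buf []byte) (int, error) {
	select {
	case c.readCh <- buf:
		return <-c.doneCh, nil
	case <-c.closed:
		return 0, errors.New("connection closed")
	}
}

// tcpRecvFn models the lwIP receive callback: wait for a lent buffer, copy as
// much of the pbuf payload as fits, and return how much was consumed (the
// real callback would pass that count to tcp_recved).
func tcpRecvFn(c *tcpConn, payload []byte) int {
	readBuf := <-c.readCh
	n := copy(readBuf, payload)
	c.doneCh <- n
	return n
}

func main() {
	c := newTCPConn()
	go tcpRecvFn(c, []byte("hello from lwip")) // a packet arriving on the lwIP side
	buf := make([]byte, 16*1024)               // e.g. a 16KiB Shadowsocks frame buffer
	n, _ := c.Read(buf)
	fmt.Printf("app read %d bytes: %q\n", n, buf[:n])
}
```

If the pbuf holds more data than the lent buffer, the callback could keep the remainder queued and only acknowledge the consumed bytes via `tcp_recved`, so lwIP's receive window reflects what the application has actually read.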
/cc: @bemasc @alalamav