Wasted layer of buffering and inefficiency #149
I think it's possible to avoid this buffer allocation and this copy by receiving pbufs in a loop, one by one, in this manner. Dechaining a pbuf from a chain seems quite easy; there is a `pbuf_dechain` function provided by lwIP.
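For illustration, here is a minimal sketch of that loop in Go, using a toy `pbuf` struct in place of lwIP's `struct pbuf` and a hypothetical receive handler; the real code would walk the cgo pbuf chain (for example with `pbuf_dechain`) instead:

```go
package main

import "fmt"

// Toy stand-in for lwIP's struct pbuf: one segment of payload plus a link to
// the next segment in the chain (the real type comes from lwIP via cgo).
type pbuf struct {
	payload []byte
	next    *pbuf
}

// receiveChain hands each pbuf's payload to the handler one at a time, with
// no intermediate coalescing buffer. In the real callback this is where the
// chain would be walked, e.g. by repeatedly calling pbuf_dechain.
func receiveChain(p *pbuf, receive func(data []byte) error) error {
	for seg := p; seg != nil; seg = seg.next {
		if err := receive(seg.payload); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	chain := &pbuf{payload: []byte("first "), next: &pbuf{payload: []byte("second")}}
	_ = receiveChain(chain, func(data []byte) error {
		fmt.Printf("received %d bytes\n", len(data))
		return nil
	})
}
```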
That would help remove the extra 2KiB per connection. However, each Receive call still blocks on a Read call, so we would be limiting the Reads to one pbuf, when we could have the application read as much as it can in one go if we switch from push to pull. One thing I don't understand is how the stream or packet is split into pbufs. Is there a max pbuf size? Is it a configuration option?
To give you some more context about the problem we are having: the Outline client uses go-tun2socks to send traffic to a Shadowsocks server. Shadowsocks puts data in encrypted frames. We can accumulate as much as 16KiB in a frame, but because we don't know whether the next Read will block, we are forced to emit a frame for every Read. Ideally, I'd give tun2socks my 16KiB buffer and it would fill it with as much available data as possible, possibly from multiple packets. The Shadowsocks overhead is not a lot: 34 bytes per frame. But assuming a 1500 MTU minus TCP/IP overhead = 1420 bytes of payload, we are talking about sending roughly 2.5% more bytes. Not bad, but that's the minimum overhead, and it can be an issue if we have small pbufs. For example, that's 25% if we are reading 1.4KiB pbufs.
This PR would solve the blocking issue, and if tcpRecvFn is called fast enough (cgo is slow), the application would get as much data as possible. But a buffer is introduced. It's not clear to me how we could solve both problems at the same time.
FWIW, it depends on how the pbuf is allocated and on the input IP packet size; the tcpRecvFn callback function is mainly driven by packet input.
I see three issues here:
I implemented your idea above (if I'm understanding it right) in 425fb2e, and it should solve 1 and 2 without intermediate buffering. I don't see any way to solve 3.
tcpRecvFn only reads one packet at a time from the pbuf (p.tot_len), passing it to Receive, which blocks on the application's call to conn.Read. It gets a buffer from the pool if the packet spans multiple pbuf entries, allocating a new buffer if the packet is larger than 2KiB. (A simplified sketch of this flow follows the two problems below.)
Problem 1) The buffer is held for the duration of the Read, which in practice means all the time. That buffering layer is actually unnecessary.
Problem 2) Only one packet is read, even though there may be a lot more data available. That's a missed opportunity for higher efficiency.
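For reference, a simplified, hypothetical sketch of this push-based flow, using toy types and an io.Pipe standing in for the real go-tun2socks internals (it is not the actual source):

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"sync"
)

// Toy stand-ins for the real types; the identifier names follow this thread,
// not the actual go-tun2socks source.
type pbuf struct {
	payload []byte
	next    *pbuf
}

// totLen is analogous to lwIP's p.tot_len: the total payload of the chain.
func (p *pbuf) totLen() int {
	n := 0
	for s := p; s != nil; s = s.next {
		n += len(s.payload)
	}
	return n
}

// A pool of 2KiB buffers, mirroring the pooled buffers described above.
var bufPool = sync.Pool{New: func() interface{} { return make([]byte, 2048) }}

// tcpRecvFn models the current push-based flow: coalesce the whole packet
// into an intermediate buffer (pooled if it fits in 2KiB, freshly allocated
// otherwise), then push it to the connection, blocking until the app Reads.
func tcpRecvFn(p *pbuf, w io.Writer) error {
	var buf []byte
	if p.totLen() <= 2048 {
		b := bufPool.Get().([]byte)
		defer bufPool.Put(b) // held until the Write (i.e. the app Read) completes: problem 1
		buf = b[:0]
	} else {
		buf = make([]byte, 0, p.totLen())
	}
	for s := p; s != nil; s = s.next {
		buf = append(buf, s.payload...)
	}
	// Only this one packet is delivered per call, however much more data
	// may already be queued: problem 2.
	_, err := w.Write(buf)
	return err
}

func main() {
	pr, pw := io.Pipe()
	done := make(chan struct{})
	go func() {
		defer close(done)
		appBuf := make([]byte, 16*1024)
		n, _ := pr.Read(appBuf) // the application-side conn.Read
		fmt.Printf("app read %d bytes\n", n)
	}()
	chain := &pbuf{
		payload: bytes.Repeat([]byte{'x'}, 1000),
		next:    &pbuf{payload: bytes.Repeat([]byte{'y'}, 500)},
	}
	if err := tcpRecvFn(chain, pw); err != nil {
		fmt.Println("recv error:", err)
	}
	<-done
}
```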
You can fix the problem by switching the data flow from push to pull and writing directly to the application-provided buffer:

- Replace `tcpConn.sndPipeWriter` and `tcpConn.sndPipeReader` with a `readCh chan []byte`.
- In `tcpRecvFn`, wait for a read buffer: `readBuf := <-conn.readCh`.
- In `conn.Read(buf)`, pass the buffer to the channel: `conn.readCh <- buf`.
- Copy the `pbuf` into the `readBuf` and call `tcp_recved`.

There are possibly some concurrency nuances to handle closing and errors. The implementation of io.Pipe has some insights.
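A minimal sketch of that pull design, assuming toy types and simplified error handling (the real tcpRecvFn would copy out of the lwIP pbuf via cgo and pass the consumed byte count to tcp_recved):

```go
package main

import (
	"errors"
	"fmt"
)

// tcpConn sketches the pull-based connection: Read lends the application's
// buffer to the lwIP side through readCh and waits on doneCh for the number
// of bytes that were filled in.
type tcpConn struct {
	readCh chan []byte   // buffers lent by Read, filled by tcpRecvFn
	doneCh chan int      // bytes copied into the lent buffer
	closed chan struct{} // closed when the connection is torn down
}

func newTCPConn() *tcpConn {
	return &tcpConn{
		readCh: make(chan []byte),
		doneCh: make(chan int),
		closed: make(chan struct{}),
	}
}

// Read passes the application-provided buffer to the receive callback and
// waits for it to be filled. No intermediate buffer is allocated or copied.
func (c *tcpConn) Read(buf []byte) (int, error) {
	select {
	case c.readCh <- buf:
		return <-c.doneCh, nil
	case <-c.closed:
		return 0, errors.New("connection closed")
	}
}

// tcpRecvFn models the lwIP receive callback: wait for a lent buffer, copy as
// much of the pbuf payload as fits, and return how much was consumed (the
// real callback would pass that count to tcp_recved).
func tcpRecvFn(c *tcpConn, payload []byte) int {
	readBuf := <-c.readCh
	n := copy(readBuf, payload)
	c.doneCh <- n
	return n
}

func main() {
	c := newTCPConn()
	go tcpRecvFn(c, []byte("hello from lwip")) // a packet arriving on the lwIP side
	buf := make([]byte, 16*1024)               // e.g. a 16KiB Shadowsocks frame buffer
	n, _ := c.Read(buf)
	fmt.Printf("app read %d bytes: %q\n", n, buf[:n])
}
```

If the pbuf holds more data than the lent buffer, the callback could keep the remainder queued and only acknowledge the consumed bytes via `tcp_recved`, so lwIP's receive window reflects what the application has actually read.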
/cc: @bemasc @alalamav