Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support passing flags to c_recv #243

Closed
farnoy opened this issue Apr 25, 2017 · 5 comments
Closed

Support passing flags to c_recv #243

farnoy opened this issue Apr 25, 2017 · 5 comments

Comments

@farnoy
Copy link

farnoy commented Apr 25, 2017

Hi,

I have a usecase where I want to read 4 bytes that define the message length that comes after it. If I read 4 bytes and then 16000 for example, I need to concat both bytestrings for my parse library, which makes a costly copy. I believe that a clean solution would be to recv the first 4 bytes with a flag MSG_PEEK and then read 16000+4 bytes again as one buffer, which would avoid the expensive append later. I noticed that flags passed to c_recv is always 0.

Could we provide an API for this? I can implement it, but I wanted to ask first about how this should be handled (portability could be an issue I believe).

@lpeterse
Copy link

Hi,

I have one thing to note here. I assume you're reading from a TCP socket. Even if you allocate and pass a buffer of 16k, the `c_recv' call is potentially allowed to return very small chunks down 1 Byte each (although it usually does not).
I don't know which parsing library you use (attoparsec, binary, cereal?), but most of them support feeding input incrementally for this use case.

@farnoy
Copy link
Author

farnoy commented Apr 25, 2017

So far I am prototyping and what worked in my application was this:

msgPeek :: CInt
msgPeek = 2

foreign import ccall unsafe "recv"
  -- c_recv returns -1 in the case of errors.
  {-@ assume c_recv :: CInt -> Ptr CChar -> size: {v: CSize | v >= 0} -> flags: CInt -> IO {read: CInt | read >= -1 && size >= read} @-}
  c_recv :: CInt -> Ptr CChar -> CSize -> CInt -> IO CInt

recvUntil :: Socket -> Int -> IO ByteString
recvUntil conn@(MkSocket fd _ _ _ _) n = do
  let go :: Int -> Ptr Word8 -> IO ()
      go readSoFar ptr = do
        nRead <- throwErrnoIfMinus1RetryMayBlock "recvUntil" (c_recv fd (ptr `plusPtr` readSoFar) (fromIntegral $ n - readSoFar) 0) (threadWaitRead (Fd fd))
        when (fromIntegral nRead < (n - readSoFar)) $ go (readSoFar + fromIntegral nRead) ptr
  fp <- mallocForeignPtrBytes n
  withForeignPtr fp (go 0)
  return (BS.PS fp 0 n)

recvUntilPeek :: Socket -> Int -> IO ByteString
recvUntilPeek conn@(MkSocket fd _ _ _ _) n = do
  let go :: Ptr Word8 -> IO ()
      go ptr = do
        threadWaitRead (Fd fd)
        nRead <- throwErrnoIfMinus1RetryMayBlock "recvUntilPeek" (c_recv fd (castPtr ptr) (fromIntegral n) msgPeek) (threadWaitRead (Fd fd))
        when (fromIntegral nRead < n) $ go ptr
  fp <- mallocForeignPtrBytes n
  withForeignPtr fp go
  return (BS.PS fp 0 n)

According to what you're saying, this may not be fully compatible with the standard, but it is working so far in practice. recvUntil retries as many times as needed to fill the buffer (incrementing the offset passed to recv(2) each time), while recvUntilPeek retries as many times as needed to fill in the buffer with one call (and MSG_PEEK set).

I'm only doing a 4 byte MSG_PEEK recv in my app, so I'm not running into the 4KB or whatever the limit a TCP implementation can have.

I use the store library and even though it's not suited for incremental parsing, the protocol I'm implementing sends the length of the message upfront, so I can cheat around this limitation in the library.

@lpeterse
Copy link

Hi,

just some notes on your code. We might also consider moving this discussion to another place if it gets too off-topic for the network library issue tracker.

Have you considered what happens if I initially sent you less than 4 bytes and you try to peek 4 bytes?

I would assume that threadWaitRead won't block as there is input available, the c_recv will return the same bytes again and again and you'll continue to run through the loop very fast.

Your protocol might require well formed messages, but an attacker will not adhere to this. I'm not saying that the approach with MSG_PEEK is a bad idea, but there are some caveats ;-)

@farnoy
Copy link
Author

farnoy commented Apr 25, 2017

I see what you mean, I haven't thought of that. One solution would be to implement some kind of exponential backoff with limited retries but doing all that just to avoid one concatenation of bytestring feels wrong. I will probably just hack around making two separate recv call sites reuse one buffer for simplicity and do this without MSG_PEEK. Thanks for the feedback.

I don't know how you feel about such addition to the library, sorry if this got off topic too much.

@kazu-yamamoto
Copy link
Collaborator

Close this in favor of #284.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants