Efficient binary serialization #6

Open
abooij opened this issue Apr 3, 2016 · 5 comments
Owner

abooij commented Apr 3, 2016

Currently we are serializing and deserializing using bytestring builders and attoparsec, respectively. But that seems like overkill, as our grammars are much simpler than what those libraries support.
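
For illustration, the encoding half of such a setup can be sketched with the bytestring builder API alone. This is a minimal sketch, not the project's actual code: the `Package` type, its field names, and the field order (sender, opcode, size, payload) are assumptions made for the example, and decoding with attoparsec is not shown.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Builder.Extra as BBE
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word16, Word32)

-- Hypothetical stand-in for the project's wire package type.
data Package = Package
  { pkgSender  :: Word32
  , pkgOpcode  :: Word16
  , pkgPayload :: B.ByteString
  }

-- Total encoded size in bytes: an assumed 8-byte header plus the payload.
packageSize :: Package -> Word16
packageSize p = fromIntegral (8 + B.length (pkgPayload p))

-- Serialize header fields in host byte order, followed by the raw payload.
encodePackage :: Package -> B.ByteString
encodePackage p = BL.toStrict . BB.toLazyByteString $
     BBE.word32Host (pkgSender p)
  <> BBE.word16Host (pkgOpcode p)
  <> BBE.word16Host (packageSize p)
  <> BB.byteString (pkgPayload p)
```

The builder goes through a lazy ByteString before being made strict, which is part of the overhead a more direct serializer could avoid.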

Michael Snoyman has written an article and library for serialization in Haskell that might be relevant to our use case.

This was referenced Apr 3, 2016
abooij self-assigned this Apr 6, 2016
abooij changed the title from "Efficient binary serialization?" to "Efficient binary serialization" Apr 6, 2016
Owner Author

abooij commented Jul 20, 2016

The library mentioned above has since been released as store (see also the release announcement).

Collaborator

tclv commented Dec 11, 2016

So I took a stab at implementing this. In the following screenshot you can see the performance differences. The benchmark encodes and decodes 10000 random wire packages, with payload sizes evenly distributed between 20 and 200 bytes; times are in microseconds. I put map id in there as well to give an idea of the cost of traversing the entire structure and forcing it to normal form.

[Screenshot of benchmark results, taken 2016-12-10]

Collaborator

tclv commented Dec 11, 2016

The Store instance was implemented like this for the benchmark:

toInt :: Integral a => a -> Int
toInt = fromInteger . toInteger
{-# INLINE toInt #-}

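instance Store WirePackage where
  -- Variable size: the package's own size field gives the total encoded length.
  size = VarSize $ toInt . wirePackageSize
  poke (WirePackage sen size op pl) = do
    -- Write the header fields first.
    poke sen
    poke op
    poke size
    -- Then copy the payload bytes directly out of the ByteString's buffer.
    let (sourceFp, sourceOffset, sourceLength) = B.toForeignPtr pl
    pokeFromForeignPtr sourceFp sourceOffset sourceLength
  peek = do
    sen <- peek
    op <- peek
    size <- peek
    -- The size field includes the header (8 bytes here), so subtract it
    -- to get the payload length.
    let payloadSize = toInt size - 8
    pl <- peekToPlainForeignPtr "Data.ByteString.ByteString" payloadSize
    return $ WirePackage sen size op (B.PS pl 0 payloadSize)
  {-# INLINE size #-}
  {-# INLINE peek #-}
  {-# INLINE poke #-}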

This implementation is largely derived from the ByteString instance in store itself.
Before I can write a PR implementing this in Sudbury itself, a couple of things need to be discussed:

  • It might be nice to derive Generic on WirePackage and other applicable data structures, as that makes a variety of useful classes available (like NFData). This is probably more relevant for things that are actually exposed to users.

  • The implementation uses internals from Data.ByteString.Internal. This module is "unsafe", and thus breaks the Safe pragmas. Is there an obvious solution to this? I'm not too familiar with the Safe pragmas.

  • What does the Wayland protocol require with regard to endianness? The people from store were talking about implementing it in such a way that it always writes a little-endian ByteString format (from both LE and BE machines). Is this desirable?

  • Any ideas for more interesting tests than decodeEx . encode == id?

  • Is there a more idiomatic approach to toInt? 😞
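
On the endianness question above, one way to see what "host byte order" means in practice is to compare fixed-endian and host-endian encodings of the same word. This is a minimal sketch using only the bytestring builder API; it says nothing about what store or the Wayland protocol actually do, and the names `render`, `sample`, and `hostIsLittleEndian` are made up for the example.

```haskell
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Builder.Extra as BBE
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word32, Word8)

-- Run a builder and return the bytes it produced.
render :: BB.Builder -> [Word8]
render = BL.unpack . BB.toLazyByteString

sample :: Word32
sample = 0x01020304

-- Fixed byte orders: the result is the same on every machine.
littleEndianBytes, bigEndianBytes :: [Word8]
littleEndianBytes = render (BB.word32LE sample)
bigEndianBytes    = render (BB.word32BE sample)

-- Host byte order: the result depends on the machine running the code.
hostBytes :: [Word8]
hostBytes = render (BBE.word32Host sample)

hostIsLittleEndian :: Bool
hostIsLittleEndian = hostBytes == littleEndianBytes
```

A serializer that always writes little-endian would behave like `word32LE` everywhere, whereas one that follows the host would behave like `word32Host`.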

tclv assigned tclv and unassigned abooij Dec 11, 2016
Owner Author

abooij commented Dec 11, 2016

This is really nice.

I'll respond quickly now and more thoroughly later, so that I'm not holding you up:

  • I'd like to err on the side of caution wrt Generic and NFData. When you need to do a full forcing of your objects explicitly, you're probably solving the wrong problem (I'd prefer an approach like the one taken here). But of course such instances are acceptable for benchmarks and tests. Note that you can add a "deriving instance" after the fact using the StandaloneDeriving language extension.
  • The obvious solution is to stop using the safety pragmas as these are sort of deprecated anyway. Feel free to remove any safety pragma that gets in your way.
  • The Wayland wire protocol specifies that values are encoded in the host's byte order. I believe this is what store does anyway (but it'd be worth checking+documenting).
  • Well, perhaps we could hard-code some sample packages and package streams. I think the easiest way to record them is to use sudbury itself.
  • fromIntegral is perhaps what you are looking for.
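
Two of the points above, the after-the-fact "deriving instance" and fromIntegral, can be sketched together. This is only an illustration: the `Header` type and its fields are hypothetical stand-ins rather than Sudbury's actual definitions, and the 8-byte header length is taken from the instance posted earlier in the thread.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE StandaloneDeriving #-}

import Data.Word (Word16, Word32)
import GHC.Generics (Generic)

-- Hypothetical stand-in for a library type defined without a Generic instance.
data Header = Header
  { headerSender :: Word32
  , headerSize   :: Word16
  , headerOpcode :: Word16
  } deriving (Eq, Show)

-- Added separately from the data declaration, e.g. in a benchmark or test module.
deriving instance Generic Header

-- fromIntegral covers the conversion directly; no detour through Integer is needed.
toInt :: Integral a => a -> Int
toInt = fromIntegral
{-# INLINE toInt #-}

-- Payload length, assuming the size field counts an 8-byte header.
payloadLength :: Header -> Int
payloadLength h = toInt (headerSize h) - 8
```

Keeping the Generic instance out of the library proper means it only exists where benchmarks or tests import it.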

Collaborator

tclv commented Dec 11, 2016

I'll get started on the PR.
