Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Buffer reuse #97

Merged
merged 2 commits into from
Dec 20, 2020
Merged

Buffer reuse #97

merged 2 commits into from
Dec 20, 2020

Conversation

fegies
Copy link
Contributor

@fegies fegies commented Dec 13, 2020

Avoid reallocating the Protocol message buffer by introducing a Trait that abstracts over the concrete stream type and accepts a mutable buffer.

Furthermore, the ByteConsumer Iterator impl is now based on the FramedRead.

Will update the PR with profiling data once I am back at work with access to the test dataset

There are some open questions:

  • Does it make sense to update the Converter struct to use the new Trait also?
  • Should the minor version of the stream-delimit subcrate be bumped? (there should be strictly new features)

EDIT:

Time after this PR:
$ time ./pq --msgtype simmo.TranshipmentCandidate --fdsetdir ./fdset --stream i32be < candidates.proto.ld > /dev/null
real 1m12.450s
user 1m11.133s
sys 0m1.047s

For comparison: achieved Result from previous PR:
$ time ./pq --msgtype simmo.TranshipmentCandidate --fdsetdir ./fdset --stream i32be < candidates.proto.ld > /dev/null
real 1m19.370s
user 1m14.457s
sys 0m4.568s

match e {
StreamDelimitError::VarintDecodeError(i) => i,
e => io::Error::new(io::ErrorKind::InvalidData, format!("{}", e)),
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if its fine to just shunt all errors into the InvalidData kind

@fegies
Copy link
Contributor Author

fegies commented Dec 14, 2020

Here is a flamegraph after:
flame_bufferreuse

And here before (with the saved part crudely marked)
prebuffer

Looking at the graphs, it would seem there are still significant savings to be had in the serde_protobuf library, though I am not sure if this is feasible with justifiable effort

@sevagh
Copy link
Owner

sevagh commented Dec 15, 2020

Does it make sense to update the Converter struct to use the new Trait also?

I'll have to review this PR more closely so I can have an answer for this. It could take more time.

Should the minor version of the stream-delimit subcrate be bumped? (there should be strictly new features)

Unless you really want to, you don't need to worry about version bumps - after merging all the PRs, I can take care of the version bumps for the next release.

@fegies
Copy link
Contributor Author

fegies commented Dec 16, 2020

So, I had a thorough look at serde-protobuf, and found some unfortunate things:

  • because of the protobuf spec, the reader should only keep the last entry for a given key. This requires at least buffering the entire message or deserializing and overwriting duplicate subobjects as the parser descends the structure.
  • serde-protobuf chooses the latter, leading to a tree of keys being constructed internally for every message and submessage
  • because the parser structure does not have any lifetime references to the parsed data, its not possible to buffer the message and descend lazily, or reuse sets or maps without major effort.
  • this is complicated because it uses the protobuf crate, with the CodedInputStream as an abstration over a BufferedStream -> no zero-copy here possible either.

On the pq-side, I did not find any area where perf could be easily improved. (aside from changing the protobuf library).

So, aside from addressing comments from your side, I'd consider this PR series finished.

Cheers!

@sevagh
Copy link
Owner

sevagh commented Dec 20, 2020

Verified on my end with pmap and the new PR has much less memory used. Tests passing. Thanks for another great contribution.

@sevagh sevagh merged commit 034bf64 into sevagh:master Dec 20, 2020
@fegies fegies deleted the buffer-reuse branch January 24, 2021 15:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants