This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Compress the PoV block before sending it over the network #2288

Merged
10 commits merged into master on Jan 21, 2021

Conversation

@bkchr (Member) commented Jan 18, 2021

This PR changes the way we send PoV blocks over the network: the PoV block is now compressed before it is sent. This should significantly reduce the size of PoVs that contain the runtime WASM, for example.

@bkchr bkchr added A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. labels Jan 18, 2021
@ordian (Member) commented Jan 19, 2021

I wonder whether we should evaluate other compression algorithms like zstd and lz4 before committing to a network protocol change.

@bkchr (Member, Author) commented Jan 19, 2021

> I wonder whether we should evaluate other compression algorithms like zstd and lz4 before committing to a network protocol change.

Would you like to do this evaluation?


```rust
impl CompressedPoV {
	/// Create from the given [`PoV`].
	pub fn from_pov(pov: &PoV) -> Result<Self, CompressedPoVError> {
```
A Contributor commented:

Nit (feel free to ignore): I can't help but suggest naming those `compress` and `uncompress`.

@burdges (Contributor) commented Jan 19, 2021

If I understand correctly, there are actually several different compression algorithms compatible with roughly the same gzip-like decompression algorithm (although flate2 does not present this picture, so this may not be exactly true). If so, parachains that wish to spend more time compressing could do so by modifying their own code.

@ordian (Member) commented Jan 19, 2021

> Would you like to do this evaluation?

Sure, I can do it tomorrow. Please send me an example of a typical large PoV. Do we scale encode it before compressing?

@bkchr (Member, Author) commented Jan 19, 2021

> Sure, I can do it tomorrow. Please send me an example of a typical large PoV. Do we scale encode it before compressing?

I don't have one. Yes, they are SCALE encoded.

I compared gzip and lz4 compression of a runtime wasm; gzip clearly wins here.

@ordian (Member) commented Jan 20, 2021

https://github.com/ordian/bench-compression-algorithms

zstd seems better than gzip for our use-case.

@pepyakin (Contributor) commented

To have a full picture, I think these benches should also show decompression performance.

@ordian (Member) commented Jan 20, 2021

> To have a full picture, I think these benches should also show decompression performance.

They do; I just omitted them in the readme, because decoding is much faster than encoding for all of them.

However, now that you brought it up, I'd like to raise a concern about zip bombs [1]: a carefully crafted encoded PoV can be used for a DDoS attack. I don't know if any of these are resistant to them.

@coriolinus (Contributor) commented

Does anyone know of a compression format designed to be resistant to zip bombs? I'm not read up on the subject, but if there's a mitigation as simple as "prohibit DEFLATE", then that might be the way to go.

@pepyakin (Contributor) commented

> However, now that you brought it up, I'd like to raise a concern about zip bombs [1]: a carefully crafted encoded PoV can be used for a DDoS attack. I don't know if any of these are resistant to them.

That's a good point.

Since we are using a streaming decoder, we can set up a shut-off condition that stops whenever the decoded data exceeds the raw PoV limit (4 MiB as of now?).

As long as the internal implementation is not susceptible to excessive allocation, we should be fine.
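The shut-off condition described here could look roughly like the following std-only sketch. The constant value and the `bounded_decompress` name are illustrative; in the PR the decoder is a `zstd::Decoder` rather than a plain reader:

```rust
use std::io::{Cursor, Read};

// Illustrative limit; the discussion mentions 4 MiB as the current raw PoV cap.
const MAX_POV_BLOCK_SIZE: u64 = 4 * 1024 * 1024;

/// Drain a streaming decoder, aborting as soon as the decoded output
/// would exceed the raw PoV limit (the zip-bomb guard discussed above).
fn bounded_decompress<R: Read>(decoder: R) -> Result<Vec<u8>, &'static str> {
    let mut out = Vec::new();
    // Read at most MAX + 1 bytes: seeing the extra byte proves the limit was crossed.
    decoder
        .take(MAX_POV_BLOCK_SIZE + 1)
        .read_to_end(&mut out)
        .map_err(|_| "io error")?;
    if out.len() as u64 > MAX_POV_BLOCK_SIZE {
        return Err("decompressed data exceeds the PoV limit");
    }
    Ok(out)
}

fn main() {
    // A `Cursor` stands in for the decompression stream; any `Read` works for the sketch.
    let small = Cursor::new(vec![0u8; 1024]);
    assert!(bounded_decompress(small).is_ok());

    let bomb = Cursor::new(vec![0u8; (MAX_POV_BLOCK_SIZE as usize) + 1]);
    assert!(bounded_decompress(bomb).is_err());
}
```

The key point is that the bound is enforced on the *decoded* byte count, so a tiny compressed payload cannot force an arbitrarily large allocation.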

@ordian (Member) commented Jan 20, 2021

> Since we are using a streaming decoder, we can set up a shut-off condition that stops whenever the decoded data exceeds the raw PoV limit (4 MiB as of now?).

Agreed, this might be the simplest way to mitigate the issue, but I don't know if 4 MiB is big enough.

@burdges (Contributor) commented Jan 20, 2021

Just an aside: zstd has a cool feature of trained dictionaries. It's supposedly only for small files, but people discuss it for log files. We could disallow untrusted dictionaries, but conceivably someone might one day build a canonical WASM dictionary, which we would just fix until WASM changes radically. This is definitely future work and not relevant right now.

@bkchr (Member, Author) commented Jan 21, 2021

I created an issue to switch to the actual maximum PoV block size: #2298

@bkchr (Member, Author) commented Jan 21, 2021

I also made decompression/compression fail for browser nodes. I did not find any suitable zstd implementation that currently compiles for WASM. As all this stuff is going to be replaced anyway, I think this is the best solution for now.

Review thread on roadmap/implementers-guide/src/types/network.md (outdated, resolved)
```rust
struct InputDecoder<'a, T: std::io::BufRead>(&'a mut zstd::Decoder<T>, usize);

impl<'a, T: std::io::BufRead> parity_scale_codec::Input for InputDecoder<'a, T> {
	fn read(&mut self, into: &mut [u8]) -> Result<(), parity_scale_codec::Error> {
		if self.1.saturating_add(into.len()) > MAX_POV_BLOCK_SIZE {
```
@ordian (Member) commented:

Where do we modify `self.1`? Can we have a test for this, please?

@bkchr (Member, Author) replied:

We modify `self.1` directly here? We call `saturating_add` in this line.

@bkchr (Member, Author) replied:

I'm an idiot 🤦
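For context, the bug the review caught is that `saturating_add` returns a new value without mutating `self.1`, so the running total never advanced. A std-only analogue of the counting wrapper with the fix applied (hypothetical names; the real type wraps `zstd::Decoder` and implements `parity_scale_codec::Input`) might look like:

```rust
use std::io::{Cursor, Read};

// Illustrative cap matching the 4 MiB figure from the discussion.
const MAX_POV_BLOCK_SIZE: usize = 4 * 1024 * 1024;

/// Reader wrapper that tracks how many bytes it has handed out (field `.1`)
/// and refuses to cross the PoV limit.
struct BoundedReader<R: Read>(R, usize);

impl<R: Read> BoundedReader<R> {
    fn read(&mut self, into: &mut [u8]) -> Result<(), &'static str> {
        if self.1.saturating_add(into.len()) > MAX_POV_BLOCK_SIZE {
            return Err("PoV limit exceeded");
        }
        self.0.read_exact(into).map_err(|_| "io error")?;
        // The line the review found missing: persist the new running total.
        self.1 = self.1.saturating_add(into.len());
        Ok(())
    }
}

fn main() {
    let mut reader = BoundedReader(Cursor::new(vec![0u8; 16]), 0);
    let mut buf = [0u8; 8];
    assert!(reader.read(&mut buf).is_ok());
    // The counter now actually advances after each read.
    assert_eq!(reader.1, 8);
}
```

Without the assignment, every call would compare `0 + into.len()` against the limit and the bound would never trigger for repeated small reads.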

@ordian (Member) commented Jan 21, 2021

> I also made decompression/compression fail for browser nodes. I did not find any suitable zstd implementation that currently compiles for WASM. As all this stuff is going to be replaced anyway, I think this is the best solution for now.

Should we ditch the browser feature and the web-wasm check? AFAIK the light client will use https://github.com/paritytech/smoldot? cc @tomaka

It's strange that zstd doesn't compile to WASM, though; I found a relevant issue: gyscos/zstd-rs#48.
EDIT: https://github.com/gyscos/zstd-rs/blob/7f19ce66b6dd542d2c7ede9571f2648ba65ba6f0/Cargo.toml#L38

@tomaka (Contributor) commented Jan 21, 2021

> Should we ditch the browser feature and the web-wasm check? AFAIK the light client will use https://github.com/paritytech/smoldot?

If we can keep the Wasm check alive without too much effort, then we should keep it alive.
However, since parachains are a much higher priority, I don't think we should block this PR on the Wasm check.

@bkchr (Member, Author) commented Jan 21, 2021

> Should we ditch the browser feature and the web-wasm check? AFAIK the light client will use https://github.com/paritytech/smoldot?

> If we can keep the Wasm check alive without too much effort, then we should keep it alive. However, since parachains are a much higher priority, I don't think we should block this PR on the Wasm check.

The Wasm check is happy. It would just fail to decompress/compress a PoVBlock on a browser node; IMHO a browser node should not do this anyway.

When the day comes that we want this, we can surely get it working. However, I have already spent too much time on this.
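The target-gating described in this comment can be sketched with `cfg` attributes. This is a minimal illustration with stand-in types, and the compression itself is stubbed as a plain copy, so it is not the PR's actual code:

```rust
/// Stand-in error type for the sketch.
#[derive(Debug)]
#[allow(dead_code)]
enum CompressedPoVError {
    NotSupported,
}

/// Stand-in compressed container.
#[allow(dead_code)]
struct CompressedPoV(Vec<u8>);

impl CompressedPoV {
    /// On native targets the data is compressed (stubbed here as a copy).
    #[cfg(not(target_os = "unknown"))]
    fn compress(data: &[u8]) -> Result<Self, CompressedPoVError> {
        Ok(Self(data.to_vec()))
    }

    /// On `wasm32-unknown-unknown` (browser nodes) no suitable zstd backend
    /// compiles, so the operation simply fails, as the comment describes.
    #[cfg(target_os = "unknown")]
    fn compress(_data: &[u8]) -> Result<Self, CompressedPoVError> {
        Err(CompressedPoVError::NotSupported)
    }
}

fn main() {
    // On a native host this succeeds; on a browser/Wasm target it would error.
    assert!(CompressedPoV::compress(b"pov bytes").is_ok());
}
```

Gating on `target_os = "unknown"` is how the browser target is distinguished, since `wasm32-unknown-unknown` has no OS component.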

@pepyakin (Contributor) left a review:

Apart from the pending comments, looks good.

@rphmeier (Contributor) commented

bot merge

@ghost commented Jan 21, 2021

Waiting for commit status.

@ghost merged commit 11797c7 into master on Jan 21, 2021
@ghost deleted the bkchr-compress-pov-block branch on Jan 21, 2021 18:04
```rust
/// Compress the given [`PoV`] and return a [`CompressedPoV`].
#[cfg(not(target_os = "unknown"))]
pub fn compress(pov: &PoV) -> Result<Self, CompressedPoVError> {
	zstd::encode_all(pov.encode().as_slice(), 3).map_err(|_| CompressedPoVError::Compress).map(Self)
```
@cheme (Contributor) commented Mar 31, 2021
@bkchr, I was wondering if you tested something like this (curious whether it gives a similar ratio):

```rust
let mut dest = Vec::new();
pov.encode_to(&mut Encoder::new(&mut dest, 3));
```

Edit: I did test it in a different context; it looks identical with the encoder (checked against the zstd command-line tool, not actually against encode_all).
