Compress the PoV block before sending it over the network #2288
This PR changes the way we send PoV blocks over the network. We now compress the PoV block before it is sent over the network. This should significantly reduce the size of PoVs that contain the runtime WASM, for example.
I wonder whether we should evaluate other compression algorithms like |
Would you like to do this evaluation? |
node/network/protocol/src/lib.rs
impl CompressedPoV {
    /// Create from the given [`PoV`].
    pub fn from_pov(pov: &PoV) -> Result<Self, CompressedPoVError> {
Nit (i.e. feel free to ignore): I can't help but suggest naming those compress and uncompress.
If I understand correctly, several different compression algorithms are compatible with roughly the same gzip-like decompression algorithm, although flate2 does not present it this way, so this may not be exactly true. If it is true, parachains that wish to spend more time compressing could do so by modifying their own code.
Sure, I can do it tomorrow. Please send me an example of a typical large PoV. Do we SCALE-encode it before compressing?
I don't have one. Yes they are SCALE encoded. I compared the compression between gzip and lz4 of a runtime wasm. Gzip clearly wins here. |
https://github.com/ordian/bench-compression-algorithms
To have a full picture, I think these benches should also show decoding performance.
They do, I just omitted them in the readme because decoding is much faster than encoding for all of them. However, now that you brought it up, I'd like to raise a concern about zip bombs [1]: a carefully crafted encoded PoV can be used for a DDoS attack. I don't know if any of these are resistant to them.
Does anyone know of a compression format designed to be resistant to zip bombs? I'm not read-up on the subject, but if there's a mitigation as simple as "prohibit DEFLATE", then that might be the way to go. |
That's a good point. Since we are using a streaming decoder, we can set up a shut-off condition that stops whenever the decoded data exceeds the raw PoV limit (4 MiB as of now?). As long as the internal implementation is not susceptible to excessive allocation, we should be fine.
Agree, this might be the simplest way to mitigate the issue, but I don't know if 4 MiB is big enough. |
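The shut-off condition discussed above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code from this PR: `LimitedReader`, `read_bounded`, and `MAX_DECODED_SIZE` are hypothetical names, and the inner reader stands in for whatever streaming decompressor is used.

```rust
use std::io::{self, Read};

/// Hypothetical limit for this sketch; the thread mentions 4 MiB as the raw PoV limit.
const MAX_DECODED_SIZE: usize = 4 * 1024 * 1024;

/// Wraps any reader (for example a streaming decompressor) and fails
/// once more than `limit` bytes have been produced in total.
struct LimitedReader<R> {
    inner: R,
    read_so_far: usize,
    limit: usize,
}

impl<R: Read> LimitedReader<R> {
    fn new(inner: R, limit: usize) -> Self {
        LimitedReader { inner, read_so_far: 0, limit }
    }
}

impl<R: Read> Read for LimitedReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        // Store the running total, then compare it against the limit.
        self.read_so_far = self.read_so_far.saturating_add(n);
        if self.read_so_far > self.limit {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "decoded data exceeds the configured limit",
            ));
        }
        Ok(n)
    }
}

/// Reads everything from `source`, aborting as soon as the limit is crossed.
fn read_bounded<R: Read>(source: R, limit: usize) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    LimitedReader::new(source, limit).read_to_end(&mut out)?;
    Ok(out)
}
```

Because the check runs after each chunk is read, the reader can overshoot the limit by at most one chunk before failing. That bounds memory use only if the wrapped decoder itself does not allocate excessively, which is the caveat raised in the comment above.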
Just an aside: zstd has a cool feature of trained dictionaries. It's supposedly only for small files, but people discuss it for log files. We could not permit untrusted dictionaries, but conceivably someone might one day build a canonical WASM dictionary, which we just fix until WASM changes radically. This is definitely future work and not relevant right now. |
I created an issue to switch to the actual maximum povblock size: #2298 |
I also made the decompression/compression fail for browser nodes. I did not find any suitable zstd implementation at the moment that compiles for WASM. As all of this is going to be replaced anyway, I think this is the best solution for now.
node/network/protocol/src/lib.rs
struct InputDecoder<'a, T: std::io::BufRead>(&'a mut zstd::Decoder<T>, usize);
impl<'a, T: std::io::BufRead> parity_scale_codec::Input for InputDecoder<'a, T> {
    fn read(&mut self, into: &mut [u8]) -> Result<(), parity_scale_codec::Error> {
        if self.1.saturating_add(into.len()) > MAX_POV_BLOCK_SIZE {
Where do we modify self.1? Can we have a test for this, please?
We modify self.1 directly here? We call saturating_add in this line.
saturating_add takes self by value, not &mut self:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=da4bed5cde4b3b145bc8754b705aeb20
I'm an idiot 🤦
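To make the point of this review thread concrete, here is a small standalone sketch. `Counter`, `MAX_SIZE`, and both method names are hypothetical and not taken from the PR. It contrasts a check that only computes the sum with one that stores the new total first.

```rust
/// Hypothetical limit, for illustration only.
const MAX_SIZE: usize = 8;

/// Tracks how many bytes have been consumed so far.
struct Counter(usize);

impl Counter {
    /// Mirrors the reviewed check: `saturating_add` returns a new value
    /// and leaves `self.0` untouched, so the total never grows.
    fn check_without_update(&mut self, len: usize) -> Result<(), &'static str> {
        if self.0.saturating_add(len) > MAX_SIZE {
            return Err("limit exceeded");
        }
        Ok(())
    }

    /// Stores the new total before comparing, so repeated reads accumulate.
    fn check_with_update(&mut self, len: usize) -> Result<(), &'static str> {
        self.0 = self.0.saturating_add(len);
        if self.0 > MAX_SIZE {
            return Err("limit exceeded");
        }
        Ok(())
    }
}
```

With the first variant, two reads of 6 bytes each both pass even though 12 bytes exceed the limit of 8. With the second, the second read is rejected.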
Should we ditch the browser feature and the web-wasm check? AFAIK the light client will use https://github.com/paritytech/smoldot. cc @tomaka It's strange that zstd doesn't compile to WASM though; I found a relevant issue: gyscos/zstd-rs#48.
Co-authored-by: Andronik Ordian <[email protected]>
If we can keep the Wasm check alive without too much effort, then we should keep it alive.
The Wasm check is happy. It would just fail to decompress/compress a PoVBlock on a browser node. IMHO a browser node should not do this anyway. When the day comes that we want this, we can clearly get it to work. However, I already spent too much time on this.
Apart from the pending comments, looks good.
…adot into bkchr-compress-pov-block
bot merge |
Waiting for commit status. |
/// Compress the given [`PoV`] and returns a [`CompressedPoV`].
#[cfg(not(target_os = "unknown"))]
pub fn compress(pov: &PoV) -> Result<Self, CompressedPoVError> {
    zstd::encode_all(pov.encode().as_slice(), 3).map_err(|_| CompressedPoVError::Compress).map(Self)
@bkchr, I was wondering if you tested with something like the following (curious if it gives a similar ratio):
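let mut dest = Vec::new();
let mut encoder = Encoder::new(&mut dest, 3)?; // streaming zstd encoder at level 3
pov.encode_to(&mut encoder); // SCALE-encode straight into the encoder
encoder.finish()?; // flush the final zstd frame into dest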
Edit: I tested it in a different context; the output looks identical with the encoder (checked against the zstd command-line tool, not actually encode_all).