Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

removes repeated bincode serialization of gossip CrdsValues #3575

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

behzadnouri
Copy link

Problem

Gossip CrdsValues are deserialized when received as a gossip push message or pull response, but serialized again immediately to obtain a value hash during Crds::insert and then again serialized repeatedly every time the value is pushed to another node or returned as a response to a pull request.

Summary of Changes

In order to avoid repeated serialization of a CrdsValue during its lifetime, the commit manually implements bincode (de)serialization of CrdsValue to hold on to bincode serialized bytes of CrdsData and reuses that for serializing CrdsValue.

@behzadnouri behzadnouri force-pushed the cache-crds-ser branch 2 times, most recently from bb477ad to 87eded2 Compare November 13, 2024 13:54
@behzadnouri behzadnouri force-pushed the cache-crds-ser branch 2 times, most recently from 330df94 to 2132bae Compare November 13, 2024 18:32
@behzadnouri
Copy link
Author

Left side is this code.
process-gossip-packets-time

behzadnouri added a commit to behzadnouri/solana that referenced this pull request Nov 14, 2024
…_packet}

As part of anza-xyz#3575
we need to implement bincode (de)serialization without deriving serde
Serialize/Deserialize trait.

Packet::{from_data,populate_packet} methods also only require that the
data to be bincode serializable and do not care about serde either.

Next version of bincode provides the necessary traits to achieve this
functionality:
https://docs.rs/bincode/2.0.0-rc.3/bincode/enc/trait.Encode.html
However that is not released yet.

In order to avoid serde::Serialize trait bound on
Packet::{from_data,populate_packet}, the commit adds a new Serialize
trait which only requires implementing bincode serialization.
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Nov 14, 2024
…_packet}

As part of anza-xyz#3575
we need to implement bincode (de)serialization without deriving serde
Serialize/Deserialize trait.

Packet::{from_data,populate_packet} methods also only require that the
data to be bincode serializable and do not care about serde either.

Next version of bincode provides the necessary traits to achieve this
functionality:
https://docs.rs/bincode/2.0.0-rc.3/bincode/enc/trait.Encode.html
but that is not released yet.

In order to avoid serde::Serialize trait bound on
Packet::{from_data,populate_packet}, the commit adds a new Serialize
trait which only requires implementing bincode serialization.
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Nov 18, 2024
…_packet}

As part of anza-xyz#3575
we need to implement bincode (de)serialization without deriving serde
Serialize/Deserialize trait.

Packet::{from_data,populate_packet} methods also only require that the
data to be bincode serializable and do not care about serde either.

Next version of bincode provides the necessary traits to achieve this
functionality:
https://docs.rs/bincode/2.0.0-rc.3/bincode/enc/trait.Encode.html
but that is not released yet.

In order to avoid serde::Serialize trait bound on
Packet::{from_data,populate_packet}, the commit adds a new Serialize
trait which only requires implementing bincode serialization.
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Nov 18, 2024
…_packet}

As part of anza-xyz#3575
we need to implement bincode (de)serialization without deriving serde
Serialize/Deserialize trait.

Packet::{from_data,populate_packet} methods also only require that the
data to be bincode serializable and do not care about serde either.

Next version of bincode provides the necessary traits to achieve this
functionality:
https://docs.rs/bincode/2.0.0-rc.3/bincode/enc/trait.Encode.html
but that is not released yet.

In order to avoid serde::Serialize trait bound on
Packet::{from_data,populate_packet}, the commit adds a new Encode trait
which only requires implementing bincode serialization.
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Nov 18, 2024
…_packet}

As part of anza-xyz#3575
we need to implement bincode (de)serialization without deriving serde
Serialize/Deserialize trait.

Packet::{from_data,populate_packet} methods also only require that the
data to be bincode serializable and do not care about serde either.

Next version of bincode provides the necessary traits to achieve this
functionality:
https://docs.rs/bincode/2.0.0-rc.3/bincode/enc/trait.Encode.html
but that is not released yet.

In order to avoid serde::Serialize trait bound on
Packet::{from_data,populate_packet}, the commit adds a new Encode trait
which only requires implementing bincode serialization.
@behzadnouri behzadnouri force-pushed the cache-crds-ser branch 2 times, most recently from 58fb1e1 to 5217011 Compare November 18, 2024 18:37
behzadnouri added a commit that referenced this pull request Nov 18, 2024
…_packet} (#3636)

As part of #3575
we need to implement bincode (de)serialization without deriving serde
Serialize/Deserialize traits.

Packet::{from_data,populate_packet} methods also only require that the
data to be bincode serializable and do not care about serde either.

Next version of bincode provides the necessary traits to achieve this
functionality:
https://docs.rs/bincode/2.0.0-rc.3/bincode/enc/trait.Encode.html
but that is not released yet.

In order to avoid serde::Serialize trait bound on
Packet::{from_data,populate_packet}, the commit adds a new Encode trait
which only requires implementing bincode serialization.
@behzadnouri behzadnouri force-pushed the cache-crds-ser branch 4 times, most recently from 860442e to 0028ac6 Compare November 19, 2024 22:17
behzadnouri added a commit that referenced this pull request Nov 20, 2024
#3575
reworks CrdsValue bincode (de)serialization.

In order to maintain compatibility and correctness, the commit adds a
round trip test for Vec<CrdsValue> (de)serialization and also verifies
the serialized bytes against a hard-coded hash.
@behzadnouri behzadnouri force-pushed the cache-crds-ser branch 2 times, most recently from 610f57a to 8186746 Compare November 20, 2024 17:22
@behzadnouri behzadnouri force-pushed the cache-crds-ser branch 4 times, most recently from 06e60f8 to 945cbc7 Compare December 4, 2024 18:11
Copy link

@gregcusack gregcusack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly looks good. just a couple questions and may need to clarify some uses of "raw" numbers

bytes: &[u8],
) -> impl Iterator<Item = Result<Self, bincode::Error>> + '_ {
// Decode number of items in the slice.
let (size, mut bytes) = match convert_fixed_bytes::<[u8; 8], 8>(bytes) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

convert_fixed_bytes::<[u8; 8], 8> what exactly is the last 8 doing here? can we be more descriptive here on what these 8s represent?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

size is just the first 8 bytes of the serialized struct? and dictates how long the Vec<CrdsValue> is in terms of items (not bytes)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment to clarify.

// bincode::serialize{,_into} serializes sequence length as
// a u64 (8 bytes) with fixint encoding in little endian.

// Implements bincode::deserialize_from for CrdsValue.
pub(crate) fn bincode_deserialize(
bytes: &[u8],
allow_trailing_bytes: bool,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain trailing bytes? Not sure I understand why we would allow these?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the same thing as in bincode:
https://docs.rs/bincode/1.3.3/bincode/config/trait.Options.html#method.allow_trailing_bytes

If you are only deserializing a single CrdsValue then generally you do not want to allow trailing bytes.
But you may be deserializing a struct Foo(CrdsValue, Bar) or a Vec<CrdsValue> in which case the trailing bytes are the next value to be deserialized, so you would want to allow trailing bytes.

.reject_trailing_bytes()
.deserialize(bytes)
}
let (tag, bytes) = convert_fixed_bytes::<[u8; 4], 4>(bytes)?;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, can you elaborate on what the 4s are here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh so get the first 4 bytes of the serialized value and convert that into a "tag" which just represents the type of message received (Pull Request, Push Message, etc)? May be worth just clarifying the "raw" numbers here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment.

// bincode::serialize{,_into} serializes enum tag as a u32 (4 bytes)
// with fixint encoding in little endian.

Gossip CrdsValues are deserialized when received as a gossip push
message or pull response, but serialized again immediately to obtain a
value hash during Crds::insert and then again serialized repeatedly
every time the value is pushed to another node or returned as a response
to a pull request.

In order to avoid repeated serialization of a CrdsValue during its
lifetime, the commit manually implements bincode (de)serialization of
CrdsValue to hold on to bincode serialized bytes of CrdsData and reuses
that for serializing CrdsValue.
Copy link

@gregcusack gregcusack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how worried are we about the increased memory usage here? almost 2Xing memory cost of a CrdsValue. I don't know the memory profile of agave. But looks like we're getting ~30% decrease in process_gossip_packets_time which is great so probably worth the increase in memory

@behzadnouri
Copy link
Author

how worried are we about the increased memory usage here? almost 2Xing memory cost of a CrdsValue.

yes, that is one of the concerns with this.
I am thinking of some more changes to the gossip code.
I am holding this off until those changes are in and then re-evaluate.

@gregcusack
Copy link

how worried are we about the increased memory usage here? almost 2Xing memory cost of a CrdsValue.

yes, that is one of the concerns with this. I am thinking of some more changes to the gossip code. I am holding this off until those changes are in and then re-evaluate.

cool sounds good

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants