Repair responses always saturate the MTU limit #25818

Closed
CherryWorm opened this issue Jun 7, 2022 · 11 comments
Labels
stale [bot only] Added to stale content; results in auto-close after a week.

Comments

@CherryWorm
Contributor

CherryWorm commented Jun 7, 2022

Problem

When repair requests are serviced, shreds are pulled straight from the blockstore, have a nonce appended, and are then sent back to the requester. However, the data returned from the blockstore is padded with a bunch of 0-bytes.

Repair is a non-trivial part of Solana network traffic. Doing this more efficiently and not sending the 0-byte padding should reduce overall network overhead by quite a bit (about 3-4x in my small experiments).

Proposed Solution

Don't send the 0-padding in repair responses.

@steviez
Contributor

steviez commented Jun 7, 2022

Good catch; the relevant snippet where we form repair packets is:

pub fn repair_response_packet(
    blockstore: &Blockstore,
    slot: Slot,
    shred_index: u64,
    dest: &SocketAddr,
    nonce: Nonce,
) -> Option<Packet> {
    let shred = blockstore
        .get_data_shred(slot, shred_index)
        .expect("Blockstore could not get data shred");
    shred
        .map(|shred| repair_response_packet_from_bytes(shred, dest, nonce))
        .unwrap_or(None)
}

As you called out, the shred bytestreams are 0-padded when pulled from the blockstore. When we create the packet, we use the length of the bytestream, not the length of the shred payload.

pub fn repair_response_packet_from_bytes(
    bytes: Vec<u8>,
    dest: &SocketAddr,
    nonce: Nonce,
) -> Option<Packet> {
    let mut packet = Packet::default();
    let size = bytes.len() + SIZE_OF_NONCE;
    if size > packet.buffer_mut().len() {
        return None;
    }
    packet.meta.size = size;

So, at first glance, simply adding a blockstore retrieval function that returns unpadded streams seems like it would do the trick.
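
For concreteness, a minimal sketch of that idea. The header-parsing helper and its field offset here are hypothetical stand-ins, not the actual shred API:

// Hypothetical: trim the trailing zero padding off a shred bytestream
// before it is handed to repair_response_packet_from_bytes.
fn unpadded_shred_bytes(bytes: &[u8]) -> Option<&[u8]> {
    let len = payload_len_from_header(bytes)?;
    bytes.get(..len)
}

// Hypothetical header parse; the real field offset and width live in
// the shred layout definitions, so this stub only illustrates the shape.
fn payload_len_from_header(bytes: &[u8]) -> Option<usize> {
    let raw = bytes.get(0..2)?; // assumed: little-endian u16 size field
    Some(u16::from_le_bytes([raw[0], raw[1]]) as usize)
}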

@behzadnouri
Contributor

The zero padding is part of the signed message, so the receiving node would have to (see the sketch after this list):

  • resize the payload to MTU and move the nonce to the end of the payload.
  • zero out the bytes where the nonce was previously located.
  • verify the signature on everything except the nonce at the end of the payload.
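
A minimal sketch of that receive path, assuming an illustrative wire format of trimmed-shred-bytes || nonce and illustrative constants rather than the actual crate definitions:

const PAYLOAD_SIZE: usize = 1228; // full legacy payload incl. zero padding
const SIZE_OF_NONCE: usize = 4;

// Hypothetical reconstruction of the padded, signed payload from a
// trimmed repair response.
fn reconstruct_padded_shred(mut bytes: Vec<u8>) -> Option<(Vec<u8>, [u8; SIZE_OF_NONCE])> {
    if bytes.len() < SIZE_OF_NONCE || bytes.len() > PAYLOAD_SIZE + SIZE_OF_NONCE {
        return None;
    }
    // Move the nonce off the end of the trimmed payload.
    let nonce_start = bytes.len() - SIZE_OF_NONCE;
    let mut nonce = [0u8; SIZE_OF_NONCE];
    nonce.copy_from_slice(&bytes[nonce_start..]);
    // Zero out the bytes where the nonce sat and restore the padding;
    // the zero padding is part of the signed message.
    bytes.truncate(nonce_start);
    bytes.resize(PAYLOAD_SIZE, 0);
    // The caller then verifies the signature over everything except
    // the (now separated) nonce.
    Some((bytes, nonce))
}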

That being said, current shreds are labeled as legacy, and we are in the process of moving to the merkle variants (#25237), which have a different layout and signed message. So I wouldn't suggest putting effort into the above until the merkle variants are fully rolled out.

It is, though, a good point to consider when designing the layout for the merkle variants, so that the above would be easier.

@t-nelson
Contributor

t-nelson commented Jul 6, 2022

can someone make a call as to whether we're going to fix this or let it ride 'til the new shred format is live?

@behzadnouri
Contributor

can someone make a call as to whether we're going to fix this or let it ride 'til the new shred format is live?

I lean towards not making shred processing any more convoluted until we have fully migrated to the merkle variants.
Their binary layout is different, so this would add extra if-branches based on byte introspection.

Let's keep the issue open, though, because longer term we do want to address this if it has a significant impact.

@steviez
Contributor

steviez commented Nov 7, 2022

Just ran solana-ledger-tool analyze-storage on some recent blocks:

{
  "column": "ShredData",
  "entries": 237506364,
  "key_stats": {
    "max": 16,
    "total_bytes": 3800101824
  },
  "row_stats": {
    "max": 1156,
    "min": 104,
    "p50": 1156,
    "p90": 1156,
    "p99": 1156,
    "p999": 1156,
    "stddev": 296,
    "total_bytes": 242635907054
  },
  "val_stats": {
    "max": 1140,
    "min": 88,
    "p50": 1140,
    "p90": 1140,
    "p99": 1140,
    "p999": 1140,
    "stddev": 296,
    "total_bytes": 238835805230
  }
}

Row refers to the kv pair whereas val refers to just v; k is 16 bytes (a u64 for slot and a u64 for shred_id). So:

mean bytes_per_shred = total_bytes_from_val_stats / entries
238835805230 / 237506364 ≈ 1006 bytes / shred

And then

1006 bytes_per_shred / 1228 bytes_per_packet (accounting for the nonce) ≈ 82%, i.e. an ~18% reduction

Need to double-check this, but it seems like the significant drop in network bandwidth could warrant the additional overhead that would be incurred by the process @behzadnouri mentioned above. Not a deep dive for now, just a quick comment to get this back on the radar.

@behzadnouri
Contributor

An alternative way to address this issue is to optimize the coalescing/buffering of entries here:
https://github.com/solana-labs/solana/blob/628ac0d4e/core/src/broadcast_stage/broadcast_utils.rs#L34-L98
before the [Entry] batch is converted to shreds.
The larger the [Entry] batch is, the smaller the zero-padded buffer is per shred (i.e. it is amortized).
The trade-off is obviously the added latency between when entries are generated and when shreds are generated.
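
A back-of-the-envelope illustration of that amortization argument. It assumes only the final shred of a batch carries padding and uses the ~1140-byte figure from the stats above as the per-shred data capacity; both are simplifications, not the actual shredder logic.

fn padding_per_shred(batch_bytes: usize, shred_capacity: usize) -> f64 {
    // Only the last shred of a coalesced batch is partially filled.
    let num_shreds = batch_bytes.div_ceil(shred_capacity);
    let padding = num_shreds * shred_capacity - batch_bytes;
    padding as f64 / num_shreds as f64
}

fn main() {
    // A tiny entry batch wastes most of its single shred on padding...
    println!("{:.0}", padding_per_shred(500, 1140)); // ~640 bytes/shred
    // ...while a large coalesced batch amortizes the same tail padding.
    println!("{:.0}", padding_per_shred(50_000, 1140)); // ~4 bytes/shred
}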

@CherryWorm
Contributor Author

Friendly bump, did this get implemented yet?

@behzadnouri
Contributor

Friendly bump, did this get implemented yet?

no, once merkle shreds are rolled out we can look into the impact of removing the zero padding; it will require a slice shift at the receiving end, though. This is because Reed-Solomon erasure coding needs a contiguous buffer, which forces us to put the Merkle proof after the zero padding rather than before the data buffer.
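
A rough sketch of that constraint and the slice shift it implies; the simplified layout and helper below are illustrative, not the actual merkle-shred definitions:

// Merkle shred, simplified: [ data | zero padding | merkle proof ].
// Reed-Solomon coding needs the data+padding region to be contiguous,
// which is why the proof sits after the padding. If the wire bytes were
// sent as data || proof (padding elided), the receiver would have to
// shift the proof right and re-zero the gap before verifying.
fn restore_padding(trimmed: &[u8], data_len: usize, padded_len: usize) -> Option<Vec<u8>> {
    if data_len > trimmed.len() || data_len > padded_len {
        return None;
    }
    let proof = &trimmed[data_len..];
    let mut out = vec![0u8; padded_len + proof.len()];
    out[..data_len].copy_from_slice(&trimmed[..data_len]);
    out[padded_len..].copy_from_slice(proof); // the slice shift
    Some(out)
}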

@github-actions github-actions bot added the stale [bot only] Added to stale content; results in auto-close after a week. label Jun 10, 2024
@github-actions github-actions bot closed this as not planned Jun 17, 2024
@nyetwurk

See #25818 (comment); this should not have been closed.

@steviez
Contributor

steviez commented Jul 1, 2024

I'll reopen for now, will let @behzadnouri chime in as he is more up to date on the latest shred happenings.

EDIT: Lol, nevermind; github-actions said no

@steviez steviez reopened this Jul 1, 2024

github-actions bot commented Jul 1, 2024

This repository is no longer in use. Please re-open this issue in the agave repo: https://github.com/anza-xyz/agave

@github-actions github-actions bot closed this as completed Jul 1, 2024