NEP: 539
Title: Cross-Shard Congestion Control
Authors: Waclaw Banasik <[email protected]>, Jakob Meier <[email protected]>
Status: Final
Type: Protocol
Version: 1.0.0
Created: 2024-03-21
LastUpdated: 2024-05-15

Summary

We propose to limit the transactions and outgoing receipts included in each chunk. Limits depend on the congestion level of the receiving shard and are enforced by the protocol.

Congestion is primarily tracked in terms of the total gas of all delayed receipts. Chunk producers must ensure they stop accepting new transactions if the receiver account is on a shard with a full delayed receipts queue.

Forwarding of receipts that are not directly produced by a transaction, namely cross-contract calls and delegated receipts, is limited by the receiver's overall congestion level. This includes the gas of delayed receipts and the gas of receipts that have not yet been forwarded due to congestion control restrictions. Additionally, the memory consumption of both receipt types can also cause congestion.

This proposal extends the local congestion control rules already in place. It keeps the transaction pool size limit as is but replaces the old delayed receipt count limit with limits on gas and size of the receipts.

Motivation

We want to guarantee the Near Protocol blockchain operates stably even during congestion.

Today, when users send transactions at a higher rate than what the network can process, receipts accumulate without limits. This leads to unlimited memory consumption on chunk producers' and validators' machines. Furthermore, the delay for a transaction from when it is accepted to when it finishes execution becomes larger and larger, as the receipts need to queue behind those already in the system.

First attempts to limit the memory consumption have been added without protocol changes. This is known as "local congestion control" and can be summarized in two rules:

  • Limit the transaction pool to 100 MB. near/nearcore#9172
  • Once we have accumulated more than 20k delayed receipts in a shard, chunk-producers for that shard stop accepting more transactions. near/nearcore#9222

But these rules do not put any limits on what other shards would accept. For example, when a particular contract is popular, the contract's shard will eventually stop accepting new transactions, but all other shards will keep accepting more and more transactions that produce receipts for the popular contract. Therefore the number of delayed receipts keeps growing indefinitely.

Cross-shard congestion control addresses this issue by stopping new transactions at the source and delaying receipt forwarding when the receiving shard has reached its congestion limit.

Specification

The proposal adds fields to chunk headers, adds new trie columns, changes the transaction selection rules, and changes the chunk execution flow. After the concepts section below, the next four sections specify each of these changes in more detail.

Concepts

Below are high-level descriptions of important concepts to make the following sections a bit easier to digest.

Delayed receipts are all receipts that were ready for execution in the previous chunk but were delayed due to gas or compute limits. They are stored in the delayed receipt queue, which itself is stored in the state of the trie. There is exactly one delayed receipts queue per shard.

Outgoing receipts are all newly produced receipts as a result of executing receipts or when converting a transaction to non-local receipts. In the absence of congestion, they are all stored in the output of applying receipts, as a simple list of receipts.

The outgoing receipts buffer is a data structure added by cross-shard congestion control. Each shard has one instance of it in the state trie for every other shard. We use this buffer to store outgoing receipts temporarily when reaching receipt forwarding limits.

Receipt limits are measured in gas and size. Gas in this context refers to the maximum amount of gas that could be burnt when applying the receipt. The receipt size is how much space the receipt takes in memory, measured in bytes.

The congestion level is an indicator between 0 and 1 that determines how strongly congestion prevention measures are applied, with 1 representing maximum congestion. This value is defined separately for each shard and computed as the maximum of the following congestion types.

  • Incoming congestion increases linearly with the amount of gas in delayed receipts.
  • Outgoing congestion increases linearly with the amount of gas in any of the outgoing buffers of the shard.
  • Memory congestion increases linearly with the total size of all delayed and buffered outgoing receipts.
  • Missed chunk congestion rises linearly with the number of missed chunks since the last successful chunk, measured as a block height difference.

Chunk header changes

We change the chunk header to include congestion information, adding four indicators.

/// sum of gas of all receipts in the delayed receipts queue, at the end of chunk execution
delayed_receipts_gas: u128,
/// sum of gas of all receipts in all outgoing receipt buffers, at the end of chunk execution
buffered_receipts_gas: u128,
/// sum of all receipt sizes in the delayed receipts queue and all outgoing buffers
receipt_bytes: u64,
/// if the congestion level is 1.0, the only shard that can forward receipts to this shard in the next chunk
/// not relevant if the congestion level is < 1.0
allowed_shard: u16,

The exact header structure and the reasons for the particular integer sizes are described in more detail in the reference implementation section. Usage of these fields is described in Changes to chunk execution.

This adds 42 bytes to the chunk header (16 + 16 + 8 + 2), increasing it from 374 bytes to 416 bytes in the borsh representation (assuming no validator proposals are included). This in turn increases the block size by 42 bytes per shard, as all chunk headers are fully included in blocks.

Including all this information in the chunk header enables efficient validation. Using the previous chunk header (or alternatively, the state), combined with the list of receipts applied and forwarded, a validator can check that the congestion rules described in this NEP are fulfilled.

Changes to transaction selection

We change transaction selection to reject new transactions when the system is congested, to reduce the total workload in the system.

Today, transactions are taken from the chunk producer's pool until tx_limit is reached, where tx_limit is computed as follows.

# old
tx_limit = 500 Tgas if len(delayed_receipts) < 20_000 else 0

We replace the formula for the transaction limit to depend on the incoming_congestion variable (between 0 and 1) computed in the previous chunk of the same shard:

# new
MIN_TX_GAS = 20 Tgas
MAX_TX_GAS = 500 Tgas
tx_limit = mix(MAX_TX_GAS, MIN_TX_GAS, incoming_congestion)

This smoothly limits the acceptance of new work, to prioritize reducing the backlog of delayed receipts.

In the pseudocode above, we borrow the mix function from GLSL for linear interpolation.

mix(x, y, a)

mix performs a linear interpolation between x and y using a to weight between them. The return value is computed as $x \times (1 - a) + y \times a$.
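
For example, with incoming_congestion = 0.5, the limit lands at the midpoint of the two bounds:

# worked example (values from the formula above)
tx_limit = mix(500 Tgas, 20 Tgas, 0.5)
         = 500 * (1 - 0.5) + 20 * 0.5
         = 260 Tgas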

More importantly, we add a more targeted rule to reject all transactions targeting a shard with a congestion level above a certain threshold.

def filter_tx(tx):
  REJECT_TX_CONGESTION_THRESHOLD = 0.25
  if congestion_level(tx.receiver_shard_id) > REJECT_TX_CONGESTION_THRESHOLD:
    tx.reject()
  else:
    tx.accept()

Here, congestion_level() is the maximum of incoming, outgoing, memory, and missed chunk congestion.
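
As a compact sketch, mirroring the thresholds defined later in Changes to chunk execution (the per-shard accessors are hypothetical names; each term is capped at 1, since the congestion level is an indicator between 0 and 1):

def congestion_level(shard):
  return max(
    min(1.0, delayed_receipts_gas(shard) / MAX_CONGESTION_INCOMING_GAS),
    min(1.0, buffered_receipts_gas(shard) / MAX_CONGESTION_OUTGOING_GAS),
    min(1.0, receipt_bytes(shard) / MAX_CONGESTION_MEMORY_CONSUMPTION),
    min(1.0, missed_chunks_count(shard) / MAX_CONGESTION_MISSED_CHUNKS),
  )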

This stops (some) new incoming work at the source, when a shard is using too much memory to store unprocessed receipts, or if there is already too much work piled up for that shard.

Chunk validators must validate that the two rules above are respected in a produced chunk.

Changes to chunk execution

We change chunk execution to hold back receipt forwarding to congested shards. This has two effects.

  1. It prevents the memory consumption of the congested shard from growing, at the expense of buffering the pending receipts on the sending shards.
  2. When user demand is consistently higher than what the system can handle, this mechanism lets backpressure propagate shard-by-shard until it reaches the shards responsible for accepting too many receipts and causes transaction filtering to kick in.

To accomplish this, we add four new steps to chunk execution (enumerated as 1, 2, 3, and 7 below) and modify how outgoing receipts are treated in the transaction conversion step (4) and in the receipt execution step (5).

The new chunk execution then follows this order.

  1. (new) Read congestion information for all shards from the previous chunk headers.

    // congestion info for each shard, as computed in the last included chunk of the shard
    {
     delayed_receipts_gas: u128,
     buffered_receipts_gas: u128,
     receipt_bytes: u64,
     allowed_shard: u16,
     // extended congestion info, as computed from the latest block header
     missed_chunks_count: u64
    }
  2. (new) Compute bandwidth limits to other shards based on the congestion information. The formula is:

    for receiver in other_shards:
      MAX_CONGESTION_INCOMING_GAS = 20 Pgas
      incoming_congestion = delayed_receipts_gas[receiver] / MAX_CONGESTION_INCOMING_GAS
    
      MAX_CONGESTION_OUTGOING_GAS = 2 Pgas
      outgoing_congestion = buffered_receipts_gas[receiver] / MAX_CONGESTION_OUTGOING_GAS
    
      MAX_CONGESTION_MEMORY_CONSUMPTION = 1000 MB
      memory_congestion = receipt_bytes[receiver] / MAX_CONGESTION_MEMORY_CONSUMPTION
    
      MAX_CONGESTION_MISSED_CHUNKS = 10
      missed_chunk_congestion = missed_chunks_count[receiver] / MAX_CONGESTION_MISSED_CHUNKS
    
      congestion = max(incoming_congestion, outgoing_congestion, memory_congestion, missed_chunk_congestion)
    
      if congestion >= 1.0:
        # Maximum congestion: reduce to minimum speed
        if current_shard == allowed_shard[receiver]:
          outgoing_gas_limit[receiver] = 1 Pgas
        else:
          outgoing_gas_limit[receiver] = 0
      else:
        # Green or Amber
        # linear interpolation based on congestion level
        MIN_GAS_FORWARDING = 1 Pgas
        MAX_GAS_FORWARDING = 300 Pgas
        outgoing_gas_limit[receiver]
          = mix(MAX_GAS_FORWARDING, MIN_GAS_FORWARDING, congestion)
  3. (new) Drain receipts in the outgoing buffer from the previous round

    • Subtract receipt.gas() from outgoing_gas_limit[receipt.receiver] for each receipt drained.
    • Keep receipts in the buffer if the gas limit would be negative.
    • Subtract receipt.gas() from buffered_receipts_gas and receipt.size() from receipt_bytes of the local shard for every forwarded receipt.
    • Add the removed receipts to the outgoing receipts of the new chunk.
  4. Convert all transactions to receipts included in the chunk.

    • Local receipts, which are receipts where the sender account id is equal to the receiver id, are set aside as local receipts for step 5.
    • Non-local receipts up to outgoing_gas_limit[receipt.receiver] for the respective shard go to the outgoing receipts list of the chunk.
    • (new) Non-local receipts above outgoing_gas_limit[receipt.receiver] for the respective shard go to the outgoing receipts buffer.
    • (new) For each receipt added to the outgoing buffer, add receipt.gas() to buffered_receipts_gas and receipt.size() to receipt_bytes of the local shard.
  5. Execute receipts in the order of local, delayed, incoming, yield-resume time-out.

    • Keep executing until all receipts are executed or more than 1000 Tgas have been burnt. Burnt gas includes the gas burnt in step 4.
    • Outgoing receipts up to what is left in outgoing_gas_limit[receipt.receiver] per shard (after step 3) go to the outgoing receipts list of the chunk.
    • (new) Outgoing receipts above outgoing_gas_limit[receipt.receiver] go to the outgoing receipts buffer.
    • (new) For each executed delayed receipt, subtract receipt.gas() from delayed_receipts_gas and receipt.size() from receipt_bytes.
  6. Remaining local or incoming receipts are added to the end of the delayed receipts queue.

    • (new) For each receipt added to the delayed receipts queue, add receipt.gas() to delayed_receipts_gas and receipt.size() to receipt_bytes.
  7. (new) Write own congestion information into the result, to be included in the next chunk header.

    • If the congestion level is >= 1.0, the allowed_shard can be chosen freely by the chunk producer. Selecting its own shard means no other shard can forward receipts to it. The reference implementation uses a round-robin assignment over all other shards (see the sketch after the gas and size definitions below). Further optimization can be done without requiring protocol changes.
    • If the congestion level is < 1.0, the allowed_shard value does not affect congestion control, but chunk producers must set it to their own shard in this case.

In the formulas above, the receipt gas and the receipt size are defined as:

def gas(receipt):
  return receipt.attached_gas + receipt.exec_gas

def size(receipt):
  return len(borsh(receipt))
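
For illustration, here is a minimal sketch of the round-robin allowed_shard assignment mentioned in step 7; the rotation key (block height) and helper names are illustrative, not prescribed by the protocol:

def assign_allowed_shard(own_shard, all_shards, block_height):
  # under full congestion, rotate the single allowed sender through all other shards
  others = [s for s in all_shards if s != own_shard]
  if not others:
    return own_shard  # single-shard chain: there is nobody else to allow
  return others[block_height % len(others)]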

Changes to Trie

We store the buffered outgoing receipts in the trie, similar to delayed receipts but in their own separate columns. Instead of a single queue per shard, we add one queue for each other shard in the current sharding layout.

We add two trie columns:

  • BUFFERED_RECEIPT_INDICES: u8 = 13;
  • BUFFERED_RECEIPT: u8 = 14;

The BUFFERED_RECEIPT_INDICES column holds a single value, which stores a borsh-serialized instance of BufferedReceiptIndices, defined as follows:

pub struct BufferedReceiptIndices {
    pub shard_buffer_indices: BTreeMap<ShardId, ShardBufferedReceiptIndices>,
}

pub struct ShardBufferedReceiptIndices {
    // First inclusive index in the queue.
    pub first_index: u64,
    // Exclusive end index of the queue
    pub next_available_index: u64,
}

The BUFFERED_RECEIPT column stores receipts keyed by TrieKey::BufferedReceipt{ receiving_shard: ShardId, index: u64 }.

The BufferedReceiptIndices map defines which queues exist, which changes during resharding. For each existing queue, all receipts in the range [first_index, next_available_index) (inclusive start, exclusive end) must exist under the key with the corresponding shard.
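
To make the queue semantics concrete, below is a hedged sketch of appending to and draining one outgoing buffer. The trie accessors (set, take) and the TrieKey constructor are simplified placeholders, not the nearcore API:

def push_buffered_receipt(trie, indices, receiving_shard, receipt):
  # append at the exclusive end of the queue
  q = indices.shard_buffer_indices[receiving_shard]
  trie.set(TrieKey.BufferedReceipt(receiving_shard, q.next_available_index), receipt)
  q.next_available_index += 1

def pop_buffered_receipt(trie, indices, receiving_shard):
  # remove from the inclusive front of the queue; returns None when empty
  q = indices.shard_buffer_indices[receiving_shard]
  if q.first_index == q.next_available_index:
    return None
  receipt = trie.take(TrieKey.BufferedReceipt(receiving_shard, q.first_index))
  q.first_index += 1
  return receipt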

Notes on parameter fine-tuning

Below are the reasons why each parameter is set to the specific value given above.

For more details, a spreadsheet with the full analysis can be found here: https://docs.google.com/spreadsheets/d/1Vt_-sgMdX1ncYleikYY8uFID_aG9RaqJOqaVMLQ37tQ/edit#gid=0

Queue sizes

The parameters are chosen to strike a balance between guarantees for short queues and utilization. Since each chunk executes receipts until 1000 Tgas (1 Pgas) is burnt, and a receipt's gas is an upper bound on what it burns, 20 Pgas of delayed receipts means that incoming receipts have to wait at most 20 chunks to be applied. And it can guarantee 100% utilization as long as the ratio between burnt and attached gas in receipts is above 1 to 20.

A shorter delayed queue would result in lower delays but in our model simulations, we saw reduced utilization even in simple and balanced workloads.

The 1 GB of memory is a target value for the control algorithm to try and stay below. With receipts in the normal range of sizes seen in today's traffic, we should never even get close to 1 GB. But the protocol allows a single receipt to be multiple MBs. In those cases, a limit of 1 GB still gives us almost 100% utilization but prevents queues from growing larger than what validators can keep in memory.

Receipt forwarding limits

MIN_GAS_FORWARDING = 1 Pgas and MAX_GAS_FORWARDING = 300 Pgas give us a large range to smooth out how much should be forwarded to other shards. Depending on the burnt to attached gas ratio of the workload, it will settle at different values for each directed shard pair. This gives the algorithm adaptability to many workload patterns.

For the forwarding to work smoothly, we need a bit of an outgoing buffer queue. We found in simulations that MAX_CONGESTION_OUTGOING_GAS = 2 Pgas is enough for the forwarding limit to settle in the perfect spot before we are restricted by transaction rejection. Higher values did not yield better results but did increase delays in some cases, hence we propose 2 Pgas.

Limiting new transactions

The remaining parameters work together to adapt how much new workload we accept in the system, based on how congested the chain is already.

REJECT_TX_CONGESTION_THRESHOLD = 0.25 defines how quickly we start rejecting new workload to a shard. Combined with the 20 Pgas limit on the delayed receipts queue, we only reject new work once at least 5 Pgas of excess workload has already been sent to that shard.

This is more aggressive than other mechanisms simply because rejecting more workload to known-to-be congested shards is the most effective tool to prevent the system from accumulating more transactions. The sooner we do it, the shorter the delays experienced by users who got their transactions accepted.

MIN_TX_GAS = 20 Tgas and MAX_TX_GAS = 500 Tgas give a large range to smooth out the split between gas spent on new transactions vs. delayed receipts. This only looks at the incoming congestion of the local shard, not at the receiving shard. Depending on the workload, it will settle at different values.

Note that hitting REJECT_TX_CONGESTION_THRESHOLD, which looks at the congestion of the receiving shard, overrules this range and stops all transactions to the congested shard when it is hit.

MIN_TX_GAS = 20 Tgas guarantees that we can still accept a decent amount of new transactions to shards that are not congested, even if the local shard itself is fully congested. This gives fairness properties under certain workloads which we could not achieve with any of the other congestion control strategies we tried. It is also useful for adding transaction priority in NEP-541, as we can always auction off the available space for new transactions without altering the congestion control algorithm.

Reference Implementation

A reference implementation is available in this PR against nearcore: near/nearcore#10918

Here are the most important details which are not already described in the specification above but are defined in the reference implementation.

Efficiently computing the congestion information

The congestion information is computed based on the gas and size of the incoming queue and the outgoing buffers. A naive implementation would just iterate over all of the receipts in the queue and buffers and sum up the relevant metrics. This approach is slow and, in the context of stateless validation, would add too much to the state witness size. In order to prevent those issues we consider two alternative optimizations. Both use the same principle of caching the previously calculated metrics and updating them based on the changes to the incoming queue and outgoing buffers.

After applying a chunk, we store detailed congestion information for the shard in the chunk extra. Unlike the chunk header, this is only stored on the shard and not shared globally.

The new fields in the chunk extra are included in ChunkExtraV3.

pub struct ChunkExtraV3 {

    // ...all fields from ChunkExtraV2

    pub congestion_info: CongestionInfo,
}

pub struct CongestionInfo {
  delayed_receipts_gas: u128,
  buffered_receipts_gas: u128,
  receipt_bytes: u64,
  allowed_shard: u16,
}

This implementation allows us to efficiently update the CongestionInfo during chunk application by starting with the information of the previous chunk and applying only the changes.
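
As a sketch, the incremental bookkeeping amounts to four deltas; the event handler names are illustrative, and gas() and size() are as defined in the specification above:

# start from the previous chunk's CongestionInfo, then apply only the deltas
def on_receipt_delayed(info, receipt):             # pushed to the delayed queue
  info.delayed_receipts_gas += gas(receipt)
  info.receipt_bytes += size(receipt)

def on_delayed_receipt_executed(info, receipt):    # popped from the delayed queue
  info.delayed_receipts_gas -= gas(receipt)
  info.receipt_bytes -= size(receipt)

def on_receipt_buffered(info, receipt):            # held back in an outgoing buffer
  info.buffered_receipts_gas += gas(receipt)
  info.receipt_bytes += size(receipt)

def on_buffered_receipt_forwarded(info, receipt):  # drained from an outgoing buffer
  info.buffered_receipts_gas -= gas(receipt)
  info.receipt_bytes -= size(receipt)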

Regarding integer sizes, delayed_receipts_gas and buffered_receipts_gas use 128-bit unsigned integers because 64-bit would not always be enough. u64::MAX would only be enough to store 18_446 Pgas. This translates to roughly 5 hours of work, assuming 1 Pgas per second. While the proposed congestion control strategy should prevent congestion ever reaching such high levels, it is not possible to rule out completely.

For receipt_bytes, a u64 is more than enough; we have other problems if we need to store millions of terabytes of receipts.

For the id of the allowed shard, we chose a u16 which is large enough for 65_535 shards.

Bootstrapping

The previous section explains how the gas and bytes information of unprocessed receipts is computed based on the values of the previous chunk. But for the first chunk with this feature enabled, the information for the previous chunk is not available.

In this specific case, we detect that the previous information is not available and therefore we trigger an iteration of the existing queues to compute the correct values.
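
A sketch of this bootstrapping pass, assuming a simple iterator over the delayed receipts queue (the outgoing buffers are introduced by this NEP, so they are empty at the protocol upgrade):

def bootstrap_congestion_info(trie, own_shard):
  info = CongestionInfo(delayed_receipts_gas=0, buffered_receipts_gas=0,
                        receipt_bytes=0, allowed_shard=own_shard)
  for receipt in trie.iter_delayed_receipts():
    info.delayed_receipts_gas += gas(receipt)
    info.receipt_bytes += size(receipt)
  # outgoing buffers did not exist before this feature, so they start out empty
  return info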

This computed congestion information only applies locally. But the next value of it will be shared in the chunk header, and other shards will start using it to limit the transactions they accept and the receipts they forward.

The congestion info of other shards is assumed to be 0 for all values for the first block with the cross-shard congestion control feature enabled.

Missing Chunks

When a chunk is missing, we use the congestion information of the last available chunk header for the shard. In practical terms this simply means we take the chunk header available in the block, even if the included height is not the latest.

Additionally, we include the number of missed chunks in the congestion formula, treating a shard with 10 or more missed chunks the same way as an otherwise fully congested shard. This is to prevent sending even more receipts to a shard that already struggles to produce chunks.

Validation Changes

The following fields in the chunk header must be validated:

  • receipt_bytes: must be equal to receipt_bytes of the previous chunk, plus the bytes of all new receipts added to the delayed or buffered receipts, minus the bytes of all receipts removed from the same queues.
  • delayed_receipts_gas must be equal to delayed_receipts_gas of the previous chunk, plus the gas of receipts added to the delayed receipts queue, minus the gas of receipts removed from the delayed receipts queue.
  • buffered_receipts_gas must be equal to buffered_receipts_gas of the previous chunk, plus the gas of receipts added to any of the outgoing receipts buffers, minus the gas of all forwarded buffered receipts.
  • allowed_shard must be a valid shard id.
  • allowed_shard must be equal to the chunk's shard id if the congestion level is below 1.0.

The balance checker also needs to take into account balances stored in buffered receipts.
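
A hedged sketch of the header checks listed above, assuming the validator has per-chunk summaries of the receipts added, removed, and forwarded (the delta field names are illustrative):

def validate_congestion_info(prev, new, deltas, own_shard_id, all_shard_ids):
  assert new.receipt_bytes == prev.receipt_bytes + deltas.bytes_added - deltas.bytes_removed
  assert new.delayed_receipts_gas == (prev.delayed_receipts_gas
      + deltas.delayed_gas_added - deltas.delayed_gas_removed)
  assert new.buffered_receipts_gas == (prev.buffered_receipts_gas
      + deltas.buffered_gas_added - deltas.buffered_gas_forwarded)
  assert new.allowed_shard in all_shard_ids
  if congestion_level_from(new) < 1.0:  # derived from the header fields as above
    assert new.allowed_shard == own_shard_id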

Security Implications

With cross-shard congestion control enabled, malicious users could try to find patterns that clog up the system. This could potentially lead to cheaper denial of service attacks compared to today.

If such patterns exist, today's implementation would most likely suffer from different problems, such as validators requiring unbounded amounts of memory. Therefore, we believe this feature is a massive step forward in terms of security, all things considered.

Integration with state sync

What we described in Efficiently computing the congestion information creates a dependence on the previous block when processing a block. For a fully synced node this requirement is always fulfilled because we keep at least 3 epochs of blocks. However, in state sync we start processing from an arbitrary point in the chain, without access to the full history.

In order to integrate the congestion control and state sync features we will add extra steps in state sync to download the blocks that may be needed in order to finalize state sync.

The blocks that are needed are: the sync hash block; the previous block, where state sync creates the chunk extra in order to kick off block sync; and the block before that, which is now needed in order to process the previous block. On top of that, we may need to download further blocks to ensure that every shard has at least one new chunk in the blocks leading up to the sync hash block.

Integration with resharding

Resharding is a process wherein we change the shard layout, that is, the assignment of account ids to shards. The centerpiece of resharding is moving the trie / state records from parent shards to children shards. It's important to preserve the ability to perform resharding while adding other protocol features such as congestion control. Below is a short description of how resharding and congestion control can be integrated, in particular how to reshard the new trie columns: the outgoing buffers.

For simplicity, we'll only consider splitting a single parent shard into multiple children shards, which is currently the only supported operation.

The actual implementation of this integration will be done independently and outside of this effort.

Importantly, resharding affects both the shard that is being split and all the other shards.

Changes to the shard under resharding

The outgoing buffers of the parent shard can be split among the children by iterating all of the receipts in each buffer and inserting each into the appropriate child shard's buffer. The assignment can in theory be arbitrary, e.g. all receipts could be reassigned to a single child. In practice it would make sense to either split the receipts equally between children or assign them based on the sender account id of the receipt.

Special consideration should be given to refund receipts, where the sender account is "system", which may belong to neither the parent nor the children shards. Any assignment of such receipts is fine.

Changes to the other shards

The other shards, that is, all shards not under resharding, each have an outgoing buffer to the shard under resharding. This buffer should be split into one outgoing buffer per child shard, by iterating the receipts and reassigning each to one of the child shards based on its receiver account id and the new shard layout.
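
For illustration, a sketch of splitting one such buffer under the new layout; the layout helper names are assumptions, not the nearcore API:

def split_outgoing_buffer(buffer, children, new_layout):
  child_buffers = {child: [] for child in children}
  for receipt in buffer:
    # reassign by the receiver account id under the new shard layout
    child = new_layout.account_to_shard(receipt.receiver_id)
    child_buffers[child].append(receipt)  # preserves the original relative order
  return child_buffers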

Alternatives

A wide range of alternatives has been discussed. It would be too much to list all suggested variations of all strategies. Instead, here is a list of different directions that were explored, with a representative strategy for each of them.

  1. Use transaction fees and an open market to reduce workload added to the system.

    • Problem 1: This does not prevent unbounded memory requirements of validators, it just makes it more expensive.
    • Problem 2: In a sharded environment like Near Protocol, it is hard to implement this fairly. Because it's impossible to know where a transaction burns most of its gas, the only simple solution would require all shards to pay the price for the most congested shard.
  2. Set fixed limits for delayed receipts and drop receipts beyond that.

    • Problem 1: Today, smart contracts rely on receipts never being lost. This network-level failure mode would be completely new.
    • Problem 2: We either need to allow resuming with external inputs, or roll back changes on all shards to keep smart contract state consistent. Both solutions mean extra work while congested, inevitably reducing the throughput available for useful work exactly when demand is largest.
  3. Stop accepting transactions globally when any shard has too long of a delayed receipts queue. (See this issue.)

    • Problem 1: This gives poor utilization in many workloads, as our model simulations confirmed.
    • Problem 2: A global stop conflicts with plans to add fee based transaction priorities which should allow sending transactions even under heavy congestion.
  4. Reduce newly accepted transactions solely based on gas in delayed queues, without adding new buffers or queues to the system. Gas is tracked per shard of the transaction signer. (Zulip Discussion and idea in code.)

    • Problem 1: This requires N gas numbers in each chunk header, or N*N numbers per block, where N is the number of shards.
    • Problem 2: We did not have time to simulate it properly. But on paper, it seems each individual delayed queue can still grow indefinitely as the number of shards in the system grows.
  5. Smartly track and limit buffer space across different shards. Only accept new transactions if enough buffer space can be reserved ahead of time.

    • Problem 1: Without knowing which shards a transaction touches and how large receipts will be, we have to pessimistically reserve more space than most receipts will actually need.
    • Problem 2: If buffer space is shared globally, individual queues can still grow really large, even indefinitely if we assume the number of shards grows over time.
    • Problem 3: If buffer space is on a per-shard basis, we run into deadlocks when two shards have no more space left but both need to send to the other shard to make progress.
  6. Require users to define which shards their transactions will touch and how much gas is burnt in each. Then use this information for global scheduling such that congestion is impossible.

    • Problem 1: This requires lots of changes across the infrastructure stack. It would take too long to implement as we are already facing congestion problems today.
    • Problem 2: This would have a strong impact on usability and it is unclear if gas usage estimating services could close the gap to make it acceptable.
  7. An alternative to what is described in Efficiently computing the congestion information would be to store the total gas and total size of the incoming queue and the outgoing receipts in the state, alongside the respective queue or buffer. Those values would be updated as receipts are added to or removed from the queue.

    • Pro: In this case the CongestionInfo struct can remain small and only reflect the information needed by other shards. (3 bytes instead of 42 bytes)
    pub struct CongestionInfo {
      allowed_shard: u16,
      congestion_level: u8,
    }
    • Con: Overall, it would result in more state changes per chunk, since the congestion value needs to be read before applying receipts anyway. In light of stateless validation, this would be worse for the state witness size.

Future possibilities

While this proposal treats all requests the same, it sets the base for a proper transaction priority implementation. We co-designed this proposal with NEP-541, which adds a transaction priority fee. On a very high level, the fee is used to auction off a part of the available gas per chunk to the highest bidders.

We also expect that this proposal alone will not be the final solution for congestion control. Rather, we just want to build a solid foundation in this NEP and allow future optimization to take place on top of it.

For example, estimations on how much gas is burnt on each shard could help with better load distribution in some cases.

We also foresee that the round-robin scheduling of which shard is allowed to forward even under full congestion is not perfect. It is a key feature to make deadlocks provably impossible, since every shard is guaranteed to make minimum progress after N rounds. But it could be beneficial to allocate more bandwidth to shards that actually have something to forward, or perhaps it would be better to stop forwarding anything for a while. The current proposal allows chunk producers to experiment with this without a protocol version change.

Lastly, a future optimization could do better transaction rejection for meta transactions. Instead of looking only at the outer transaction receiver, we could also look at the receiver of the delegate action, which is most likely where most gas is going to be burnt, and use this for transaction rejection.

Consequences

Positive

  • Accepted transactions have lower latencies compared to today under congestion.
  • Storage and memory requirements on validators for storing receipts are bounded.

Neutral

  • More transactions are rejected at the chunk producer level.

Negative

  • Users need to resend transactions more often.

Backwards Compatibility

There are no observable changes on the smart contract, wallet, or API level. Thus, there are no backwards-compatibility concerns.

Unresolved Issues (Optional)

These congestion problems are out of scope for this proposal:

  • Malicious patterns can still cause queues to grow beyond the parameter limits.
  • There is no way to pay for higher priority. (NEP-541 adds it.)

Adding postponed receipts to receipt_bytes was also considered. But at this point it seems better not to include them, to avoid further complications with potential deadlocks, since postponed receipts can only be executed when incoming receipts are allowed to come in.

Following the same logic, yielded receipts are also excluded from the size limits, as they require incoming receipts to resume.

A solution that also addresses the memory footprint of postponed and yielded receipts could be added in future proposals but is not considered necessary for this first iteration of cross-shard congestion control.

Changelog

[The changelog section provides historical context for how the NEP developed over time. Initial NEP submission should start with version 1.0.0, and all subsequent NEP extensions must follow Semantic Versioning. Every version should have the benefits and concerns raised during the review. The author does not need to fill out this section for the initial draft. Instead, the assigned reviewers (Subject Matter Experts) should create the first version during the first technical review. After the final public call, the author should then finalize the last version of the decision context.]

1.0.0 - Initial Version

Placeholder for the context about when and who approved this NEP version.

Benefits

List of benefits filled by the Subject Matter Experts while reviewing this version:

  • Benefit 1
  • Benefit 2

Concerns

Template for Subject Matter Experts review for this version: Status: New | Ongoing | Resolved

# Concern Resolution Status
1
2

Copyright

Copyright and related rights waived via CC0.