Add proposal for transactions v2 and address map program #17103

Merged
merged 19 commits into from
Jun 11, 2021
186 changes: 186 additions & 0 deletions docs/src/proposals/big-transactions.md
@@ -0,0 +1,186 @@
# Big Transactions

## Problem

Messages transmitted to Solana validators must not exceed the IPv6 MTU size to
ensure fast and reliable network transmission of cluster info over UDP.
Solana's networking stack uses a conservative MTU size of 1280 bytes which,
after accounting for headers, leaves 1232 bytes for packet data like serialized
transactions.

Developers building applications on Solana must design their on-chain program
interfaces within the above transaction size limit constraint. One common
work-around is to store state temporarily on-chain and consume that state in
later transactions. This is the approach used by the BPF loader program for
deploying Solana programs.

However, this workaround doesn't work well when developers compose many on-chain
programs in a single atomic transaction. With more composition comes more
account inputs, each of which takes up 32 bytes. There is currently no available
workaround for increasing the number of accounts used in a single transaction
since each transaction must list all accounts that it needs to properly lock
accounts for parallel execution. Therefore the current cap is about 35 accounts
after accounting for signatures and other transaction metadata.
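
The arithmetic behind the 1232-byte budget and the roughly 35-account cap can be sketched as follows. This is an illustrative estimate, not code from the Solana codebase: the constant and function names are hypothetical, the header sizes are the standard IPv6 (40 bytes) and UDP (8 bytes) headers, and each length prefix is assumed to fit in one byte.

```rust
/// Conservative MTU assumed by the networking stack (the IPv6 minimum MTU).
const MTU: usize = 1280;
/// Fixed IPv6 header size in bytes.
const IPV6_HEADER: usize = 40;
/// UDP header size in bytes.
const UDP_HEADER: usize = 8;
/// Bytes left for packet data such as a serialized transaction.
const PACKET_DATA_SIZE: usize = MTU - IPV6_HEADER - UDP_HEADER;

const SIGNATURE: usize = 64;
const ADDRESS: usize = 32;
const BLOCKHASH: usize = 32;
const MESSAGE_HEADER: usize = 3;

/// Rough upper bound on the number of 32-byte account addresses that fit in a
/// transaction with `num_signatures` signatures and `instruction_bytes` bytes
/// of serialized instructions. Assumes every length prefix takes one byte.
fn max_account_addresses(num_signatures: usize, instruction_bytes: usize) -> usize {
    let fixed = 1 // signature count prefix
        + num_signatures * SIGNATURE
        + MESSAGE_HEADER
        + 1 // account count prefix
        + BLOCKHASH
        + 1 // instruction count prefix
        + instruction_bytes;
    PACKET_DATA_SIZE.saturating_sub(fixed) / ADDRESS
}
```

With a single signature and no instruction data, `max_account_addresses(1, 0)` evaluates to 35; any real instruction data lowers that bound further.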

## Proposed Solution

Introduce a new on-chain account indexing program which stores account address
mappings and add a new transaction format which supports concise account
references through the new on-chain account indexes.

### Account Indexing Program

Here we describe a contract-based solution to the problem, whereby a protocol
developer or end-user can create collections of related accounts on-chain for
concise use in a transaction's account inputs. This approach is similar to page
tables used in operating systems to succinctly map virtual addresses to physical
memory.

After addresses are stored on-chain in an index account, they may be succinctly
referenced from a transaction using an index rather than a full 32 byte address.
This will require a new transaction format to make use of these succinct indexes
as well as runtime handling for looking up and loading accounts from the
on-chain indexes.

Member

how about writing down bpf-visible changes?

  • basically there should be none, to maintain compatibility regardless of whether a given program's instruction is in a normal tx or this new tx format
  • but change is visible in the instruction sysvar?

Member

also what about rpc, explorer, cli? should we probably abstract away this and present as if the account keys are just large at the rpc and ui? CC: @CriesofCarrots @oJshua

Member Author

@ryoqun yes, no bpf visible changes should happen. The instruction sysvar uses decompiled instructions which will not know anything about account indexes.

As for client side impact, I think that RPC should support returning both compressed and decompressed transactions. I wrote up details here: https://github.com/solana-labs/solana/pull/17103/files#diff-e15f9fcbfd6f181fc2a36cf56d0b5bc1d6837fbd1da1d88a5739d12f6878d549R116-R120

#### State

Once created, index accounts may not be deleted. Stored addresses should be
append only so that once an address is stored in an index account, it may not be
removed later.

Since transactions use a `u16` offset to look up addresses, index accounts can
store up to 2^16 addresses each. Anyone may create an index account of any size
as long as it's big enough to store the necessary metadata. In addition to
stored addresses, index accounts must also track the latest count of stored
addresses and an authority which must be a present signer for all index
modifications.
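
A minimal sketch of the state rules above, assuming a simplified in-memory model rather than the real account data layout. The `IndexAccount` type, its methods, and the error names are hypothetical, and the authority check stands in for the runtime's signer verification.

```rust
/// A 32-byte account address (stand-in for the SDK's `Pubkey`).
type Address = [u8; 32];

/// Maximum number of addresses reachable with a `u16` offset.
const MAX_ADDRESSES: usize = 1 << 16;

/// Hypothetical in-memory view of an index account.
#[derive(Debug)]
struct IndexAccount {
    /// Must sign every modification.
    authority: Address,
    /// Append-only list; its length is the latest count of stored addresses.
    addresses: Vec<Address>,
}

#[derive(Debug, PartialEq)]
enum IndexError {
    MissingAuthoritySignature,
    IndexFull,
}

impl IndexAccount {
    fn new(authority: Address) -> Self {
        Self { authority, addresses: Vec::new() }
    }

    /// Appends an address and returns its offset. Existing entries are never
    /// modified or removed.
    fn append(&mut self, signer: &Address, address: Address) -> Result<u16, IndexError> {
        if signer != &self.authority {
            return Err(IndexError::MissingAuthoritySignature);
        }
        if self.addresses.len() >= MAX_ADDRESSES {
            return Err(IndexError::IndexFull);
        }
        // The length is below 2^16 here, so the cast cannot truncate.
        let offset = self.addresses.len() as u16;
        self.addresses.push(address);
        Ok(offset)
    }

    /// Looks up a stored address by offset.
    fn get(&self, offset: u16) -> Option<&Address> {
        self.addresses.get(usize::from(offset))
    }
}
```

Because the only mutation is `append`, an offset that resolves once keeps resolving to the same address afterwards.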

#### Program controlled indexes

If the authority of an index account is controlled by a program, more
sophisticated indexes could be built with governance features or price curves
for new index addresses.

### Versioned Transactions

In order to allow accounts to be referenced more succinctly, the structure of
serialized transactions must be modified. This means that the new transaction
format must be distinguished from the current transaction format.

Current transactions can fit at most 19 signatures (64 bytes each), but the
message format encodes the number of required signers as a `u8`. Since the
upper bit of the `u8` will never be set for a valid transaction, we can set
it to denote that a transaction should be decoded with the versioned format.
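
The disambiguation could look roughly like the sketch below. It assumes the flag lives in the first byte of the serialized message and that the low 7 bits carry the version; `VERSION_FLAG`, `MessageFormat`, and `detect_format` are illustrative names, not part of the SDK.

```rust
/// Upper bit of the first message byte, assumed here to flag the versioned format.
const VERSION_FLAG: u8 = 0x80;

#[derive(Debug, PartialEq)]
enum MessageFormat {
    /// Current format: the byte is the number of required signatures.
    Legacy { num_required_signatures: u8 },
    /// Versioned format: the low 7 bits carry the version (0..=127).
    Versioned { version: u8 },
}

/// Classifies a message by inspecting its first serialized byte.
fn detect_format(first_byte: u8) -> MessageFormat {
    if first_byte & VERSION_FLAG != 0 {
        MessageFormat::Versioned { version: first_byte & !VERSION_FLAG }
    } else {
        MessageFormat::Legacy { num_required_signatures: first_byte }
    }
}
```

A legacy transaction can never legitimately need 128 or more signers within the packet limit, which is why the upper bit is free to reuse.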

#### New Transaction Format

```rust
pub struct VersionedMessage {
    /// Version of encoded message.
    /// The max encoded version is 2^7 - 1 due to the ignored upper disambiguation bit
    pub version: u8,
    pub header: MessageHeader,
    /// Number of read-only account inputs specified through indexes
    pub num_readonly_indexed_accounts: u8,
    #[serde(with = "short_vec")]
    pub account_keys: Vec<Pubkey>,
    /// All the account indexes used by this transaction
    #[serde(with = "short_vec")]
    pub account_indexes: Vec<AccountIndex>,
    pub recent_blockhash: Hash,
    /// Compiled instructions stay the same, account indexes continue to be stored
    /// as a u8 which means the max number of account_indexes + account_keys is 256.
    #[serde(with = "short_vec")]
    pub instructions: Vec<CompiledInstruction>,
}

pub struct AccountIndex {
    pub account_key_offset: u8,
    // 1-3 bytes used to look up address in index account
    pub index_account_offset: CompactU16,
}
```
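
One way the runtime lookup might work is sketched below. It rests on an interpretation of the struct above: `account_key_offset` is assumed to select the index account's address within `account_keys`, and `index_account_offset` is assumed to select an address stored in that index account. The `HashMap` stands in for loaded on-chain index accounts, and `resolve_indexed_accounts` is a hypothetical helper.

```rust
use std::collections::HashMap;

/// A 32-byte account address (stand-in for the SDK's `Pubkey`).
type Address = [u8; 32];

/// Simplified `AccountIndex` with the compact offset already decoded.
struct AccountIndex {
    account_key_offset: u8,
    index_account_offset: u16,
}

/// Resolves every indexed account to its full address.
/// Returns `None` if any offset is out of range or an index account is missing.
fn resolve_indexed_accounts(
    account_keys: &[Address],
    account_indexes: &[AccountIndex],
    index_accounts: &HashMap<Address, Vec<Address>>,
) -> Option<Vec<Address>> {
    let mut resolved = Vec::with_capacity(account_indexes.len());
    for entry in account_indexes {
        // Find the index account's own address among the transaction's keys.
        let index_account_key = account_keys.get(usize::from(entry.account_key_offset))?;
        // Load the addresses stored in that index account.
        let stored = index_accounts.get(index_account_key)?;
        // Pick the referenced address out of the stored list.
        let address = stored.get(usize::from(entry.index_account_offset))?;
        resolved.push(*address);
    }
    Some(resolved)
}
```

Under this reading, compiled instructions would keep using `u8` positions into the concatenation of `account_keys` and the resolved addresses.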

#### Size changes

- Extra byte for version field
- Extra byte for number of total account index inputs
- Extra byte for number of readonly account index inputs
- Most index entries will be compact and use 2 bytes, plus the address of the index account
- Cost of each additional indexed account is ~2 bytes
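
The per-entry cost can be estimated with the sketch below. It assumes `CompactU16` uses the same 7-bits-per-byte encoding as `short_vec` lengths, that each entry is a 1-byte `account_key_offset` plus the compact offset, and that the index account's own 32-byte address is paid once. The function names are hypothetical, and the three fixed extra bytes listed above are ignored.

```rust
/// Number of bytes used by a compact-u16 encoding, which stores 7 bits per
/// byte and uses the high bit of each byte as a continuation flag.
fn compact_u16_len(value: u16) -> usize {
    match value {
        0..=0x7f => 1,
        0x80..=0x3fff => 2,
        _ => 3,
    }
}

/// Serialized size of one `AccountIndex` entry: a 1-byte `account_key_offset`
/// plus the compact `index_account_offset`.
fn account_index_entry_len(index_account_offset: u16) -> usize {
    1 + compact_u16_len(index_account_offset)
}

/// Bytes saved by referencing `offsets` through a single index account instead
/// of listing each 32-byte address. Negative when indexing costs more.
fn bytes_saved(offsets: &[u16]) -> isize {
    let full = offsets.len() * 32;
    let indexed = 32 + offsets
        .iter()
        .map(|&offset| account_index_entry_len(offset))
        .sum::<usize>();
    full as isize - indexed as isize
}
```

For example, four addresses at offsets 0 through 3 save 88 bytes under these assumptions, while a single indexed address costs 2 bytes more than listing it directly, so an index account only pays off from the second indexed address onward.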

### Limitations

- Max of 256 accounts may be specified in a transaction because `u8` is used by compiled
instructions to index into transaction message account keys.
- Indexes can hold up to 2^16 keys. Smaller indexes are ok. Each entry in an index is
then addressed with a `u16`.
- Transaction signers may not be specified using an on-chain account index; the
full address of each signer must be serialized in the transaction. This ensures
that the performance of transaction signature checks is not affected.

## Security Concerns

### Resource consumption

Enabling more account inputs in a transaction allows for more program
invocations, write-locks, and data reads / writes. Before indexes are live, we
need transaction-wide compute limits and increased costs for write locks and
data reads.

### Front running

If the addresses listed within an index account are modifiable, front running
attacks could modify which accounts are accessed from a later transaction.
For this reason, we propose that any stored address is immutable and that index
accounts themselves may not be removed.

### Denial of service

Index accounts will be read very frequently and will therefore be a
higher-profile target for denial of service attacks through write locks, similar
to sysvar accounts.

Since addresses stored inside index accounts are immutable, reads and writes
to index accounts could be parallelized as long as every referenced offset
is less than the current number of addresses stored.

Member

@ryoqun ryoqun Jun 2, 2021

Maybe account index must be grouped by program id and authorized to the program id owner?

I came up with this attack (unrealistic though): if tx uses untrusted index account...:

  1. malicious index account owner appends good account index entry at N
  2. before 1. is rooted, alice is tricked to create and sign a tx referencing the newly added entry according to commitment=processed
  3. malicious colluding leader creates new fork reverting 1.
  4. malicious index account owner appends bad account index entry at N (effectively replacement)
  5. so, alice is tricked into buying garbage or trading against bad clob. etc? (so, kind of replay attack)

Member

@ryoqun ryoqun Jun 2, 2021

also, it will be a good practice for any wallets to check the actually referenced account address via index before signing like they're doing for account keys?

Otherwise, their signer bit can be abused covertly by referencing via cpi and the compromised account index account.

Member

also thought hard about possible bad interplay regarding nonces and gossiped vote transactions but I couldn't come up with anything particularly bad.

Member Author

> Otherwise, their signer bit can be abused covertly by referencing via cpi and the compromised account index account.

Can you please elaborate a bit more here, I didn't quite understand this.

Member Author

> also, it will be a good practice for any wallets to check the actually referenced account address via index before signing like they're doing for account keys?

Hmm, we might want to add an integrity check. Perhaps we add the sha256 hash of all accounts referenced via indexes to the transaction message?

It might be overkill though. This is not really any different from a malicious actor convincing a user to use an account with some state A1 and then forking the chain to change the account to state A2. It's the client's responsibility to confirm the finality of all account state before signing.

Member Author

Actually, I think an integrity check could be done as a separate instruction. Users who desire more security can add an assertion instruction which hashes the indexed account addresses together.

Contributor

The effects of this indirection on what the transaction signatures are saying has been bothering me a bit as well. Ideally, we'd sign over the actual addresses, use the indices for broadcast, then recover the addresses before sigverify. I don't think this is practical though.

Alternatively, we could add another field of {indexed_addresses_witness_index: u8, indexed_addresses_signature: Signature}, that's checked at runtime between accounts load and transaction execution. This should be robust if we require that the address_witness has also signed the broadcast encoding. It'll cost us 65 bytes though.

> Hmm, we might want to add an integrity check. Perhaps we add the sha256 hash of all accounts referenced via indexes to the transaction message?

Theoretically, this would be vulnerable to a birthday attack unless the program had full control of the hash. We can mitigate by mixing in data outside an attacker's control, like the most recent blockhash. It would be pretty costly to the size of the index entries though.

Attaching an insertion_slot to each index and requiring a cool down period before use would likely be sufficient

> It might be overkill though.

I'm apt to agree. It seems like these attacks need sufficient control over the network as to assume it's compromised in many other ways.

Member

> > also, it will be a good practice for any wallets to check the actually referenced account address via index before signing like they're doing for account keys?
> >
> > Otherwise, their signer bit can be abused covertly by referencing via cpi and the compromised account index account.
>
> Can you please elaborate a bit more here, I didn't quite understand this.

I think my concern is no longer valid with these bits :) :

> If the same account is referenced in a transaction by address as well as through
> an index, the transaction should be rejected to avoid conflicts when determining
> if the account is a signer or writeable.

> Transaction signers may not be specified using an on-chain account index, the
> full address of each signer must be serialized in the transaction. This ensures
> that the performance of transaction signature checks is not affected.

Member Author

I added a paragraph about the attack vector where indexes are modified pre-finalization. I think we can just recommend that clients wait for finalization before using an index. They can also add their own integrity checks if needed

## Other Proposals

1) Account prefixes

Needing to pre-register accounts in an on-chain index is cumbersome because it
adds an extra step for transaction processing. Instead, Solana transactions
could use variable length address prefixes to specify accounts. These prefix
shortcuts can save on data usage without needing to set up on-chain state.

However, this model requires nodes to keep a mapping of prefixes to active account
addresses. Attackers can create accounts with the same prefix as a popular account
to disrupt transactions.

2) Transaction builder program

Solana can provide a new on-chain program which allows "Big" transactions to be
constructed on-chain by normal transactions. Once the transaction is
constructed, a final "Execute" transaction can trigger a node to process the big
transaction as a normal transaction without needing to fit it into an MTU sized
packet.

The UX of this approach is tricky. A user could in theory sign a big transaction
but it wouldn't be great if they had to use their wallet to sign multiple
transactions to build that transaction that they already signed and approved. This
could be a use-case for transaction relay services, though. A user could pay a
relayer to construct the large pre-signed transaction on-chain for them.

In order to prevent the large transaction from being reconstructed and replayed,
a nonce counter system would be necessary as well.

3) Epoch account indexes

Similarly to leader schedule calculation, validators could create a global index
of the most accessed accounts in the previous epoch and make that index
available to transactions in the following epoch.

This approach has the downside of only updating the index at epoch boundaries,
which means there would be a delay of a few days before popular new accounts
could be referenced.