\documentclass[12pt, letterpaper]{article}
\usepackage[utf8]{inputenc}
\usepackage[american]{babel}
\usepackage{csquotes}
\usepackage[
style=apa,
backend=biber,
citestyle=numeric
]{biblatex}
\addbibresource{whitepaper.bib}
\title{
PADLOCK \\
\large Protocol Allowing Decentralized Large thrOughput Coins, Kool
}
\author{Ellis Frank \\ [email protected]}
\begin{document}
\maketitle
\begin{abstract}
PADLOCK is a cryptocurrency protocol designed to be decentralized, secure,
and high throughput. It achieves this through sharded block verification, a
sharded UTXO set, a novel consensus mechanism referred to as ``Proof of
Staked Work'', UTXO commitments, and UTXO expiry. Together, these mechanisms
create a cryptocurrency able to match, and vastly exceed, Visa's transaction
capacity.
\end{abstract}
\section{Introduction}
Most blockchains suffer from the scaling trilemma: decentralization, security,
and scale; pick two. In order to remain decentralized, running a verifying node
and mining on it should be easy. Running a verifying node allows the user to
ensure consensus rules are being followed. Mining should be accessible to users
as well, in order to ensure censorship resistance and an egalitarian distribution
of newly minted coins. However, these properties trade off against scale: scaling
to a larger number of transactions puts a burden on mining and verifying nodes.
This centralization opens the network to several attacks, such as the one
described by Vitalik Buterin \cite{LimitToBlockchainScaling}, where the nodes
and major miners of a blockchain hard fork and mint new coins, or large scale
censorship. This paper attempts to demonstrate a protocol capable of scaling
well while remaining decentralized and secure.
\section{Sharded Block Verification}
Sharded verification is achieved by having each node only verify a random
portion of each block. Each block will be split into 64 sections/shards. A node
that is verifying a block will choose a portion of the sections (or one single
section) to verify. With 1000 nodes, each verifying only one section, a block
will fail to have every one of its sections verified only roughly 0.001\% of
the time.
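As a rough sanity check of that figure, a union bound (assuming each of the
1000 nodes independently picks one of the 64 sections uniformly at random)
gives:
\[
P(\textrm{some section unverified}) \le 64 \left( \frac{63}{64} \right)^{1000}
\approx 9 \times 10^{-6} \approx 0.001\%
\]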
A fraud proof consists of data in the block that is invalid. Attached to the
data must be Merkle proofs showing that the invalid data is actually part of
the block.
There is one type of fraud proof that would require verifying the entire block:
proving that the fees owed to the miner were miscalculated. To check this
figure, a node would have to add up the fee from every transaction in the
block, thus requiring verification of the entire block to prove fraud.
To address this, each block is also divided into 256 fee sections, and each fee
section commits to the sum of the fees of the transactions it contains. To
prove that the total fee is wrong, only one 256th of the block needs to be
revealed.
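As an illustration, a minimal sketch of checking such a fee fraud proof. The
data structure and helper names here are hypothetical, and the Merkle branches
tying the fee section to the block header are omitted:
\begin{verbatim}
from dataclasses import dataclass

@dataclass
class FeeSectionProof:
    """Hypothetical fraud proof for one fee section of a block."""
    committed_fee_sum: int   # fee sum the block claims for this section
    tx_fees: list[int]       # actual fees of the transactions in the section
    # A real proof would also carry Merkle branches proving that these
    # transactions and the committed sum belong to the block in question.

def fee_fraud_proven(proof: FeeSectionProof) -> bool:
    # Fraud is proven if the transactions' fees don't add up to the
    # sum committed for this fee section.
    return sum(proof.tx_fees) != proof.committed_fee_sum
\end{verbatim}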
\section{UTXO set sharding}
The UTXO set will be split into 64 different shards. Each section of the block
will change a different UTXO set shard. However, this leads to a problem: some
transactions in each section will spend inputs from UTXO set shards other than
the ones the verifying node holds.
To solve this, the UTXO set will be stored as 64 separate UTreeXO \cite{UTreeXO}
style accumulators. Each node will need to store the roots of every accumulator,
which isn't likely to be over 500 bytes per accumulator. Every transaction input
will need to contain a proof of inclusion to one of the accumulators. Nodes will
then be free to choose which of the 64 accumulators they want to store the UTXOs
from (if any).
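A minimal sketch of how a node holding only the 64 accumulator roots might
check a transaction input's inclusion proof. SHA-256 stands in for the
protocol's hash function, and a single Merkle root per shard stands in for the
UTreeXO forest:
\begin{verbatim}
import hashlib

def h(data: bytes) -> bytes:
    # Placeholder hash; the protocol's actual hash function is not specified here.
    return hashlib.sha256(data).digest()

def verify_inclusion(utxo: bytes, proof: list[tuple[bytes, bool]],
                     shard_roots: list[bytes], shard: int) -> bool:
    """Check that `utxo` is committed to by the accumulator root of `shard`.

    `proof` is a list of (sibling_hash, sibling_is_left) pairs from leaf to root;
    `shard_roots` holds the 64 roots that every node stores.
    """
    node = h(utxo)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == shard_roots[shard]
\end{verbatim}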
\subsection{Minimizing proof sizes}
Proof sizes can be greatly reduced by merging Merkle proofs, since proofs for
different leaves of the same tree overlap near the root, and the shared hashes
only need to be supplied once. Assuming 4400 transactions per second sustained
over the three month UTXO lifetime (see the UTXO Expiry section):
\[ 4400 \times 60 \times 60 \times 24 \times 30 \times 3 = 34{,}214{,}400{,}000 \textrm{ UTXOs} \]
\[ 2^{35} = 34{,}359{,}738{,}368 \quad \Rightarrow \quad \textrm{a Merkle tree of height 35 is required} \]
\[ 120 \times 4400 = 528{,}000 \textrm{ transactions in a block} \]
\[ 2^{19} = 524{,}288 \]
With roughly $2^{19}$ transactions in a block, the proof paths all pass through
the top 19 levels of the Merkle tree, where they heavily overlap, so the hashes
in those levels are shared between many proofs and only need to be stored once.
This means nearly every transaction can drop at least 19 hashes from its proof,
leaving only 16 hashes. Using 25 byte hashes\footnotemark, this adds an extra
400 bytes per transaction (assuming one input), which is very manageable. This
extra data is also in the right place: transaction data isn't an ongoing cost,
since old transaction data can be pruned without a loss in security, whereas
the UTXO set is a permanent cost and cannot be pruned.
\footnotetext{25 byte hashes can be made secure, as described in the UTreeXO
white paper \cite{UTreeXO}.}
\subsection{Verkle Tree Potential}
Once the libraries and cryptography around verkle trees are more mature, merkle
trees can be swapped for verkle trees, which would greatly reduce proof size.
\section{Data Availability Proofs}
The above described method of sharding has a vulnerability. An attacker could
simply not publish one section of the block, and then have that section's
accumulator contain a UTXO giving them unlimited coins, or remove someone else's
coins. PADLOCK's security model is based on data being available to verify, so
that a node can create a fraud proof if the data is invalid. But in this case,
the data isn't available, so there is nothing to create fraud proofs from.
To counter this, data availability proofs are used. The purpose of a data
availability proof is to prove that some data is available without having to
download all of it. The prover splits the data into chunks, then performs
erasure coding on them. A Merkle tree is then formed from these chunks, and the
Merkle root is published so that individual chunks can be checked against it.
The verifier selects random chunks to download. The prover provides those
chunks, along with the Merkle branches proving that the chunks are part of the
data.
Because erasure coding was performed, even if the prover loses the data or
refuses to provide it, the data can be reconstructed as long as the verifiers
have cumulatively downloaded roughly 64\% of it.
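A sketch of the verifier's side of this sampling, with \texttt{fetch\_chunk}
and \texttt{verify\_branch} as assumed helpers (fetching a chunk plus its
Merkle branch from the prover, and checking that branch against the committed
root). If a fraction $f$ of the chunks is withheld, $k$ independent samples
miss the withheld data with probability only about $(1-f)^k$:
\begin{verbatim}
import random

def data_looks_available(num_chunks: int, samples: int,
                         fetch_chunk, verify_branch) -> bool:
    """Sample `samples` random chunks; return False if any sampled chunk is
    missing or its Merkle branch doesn't check out against the committed root."""
    for i in random.sample(range(num_chunks), samples):
        result = fetch_chunk(i)   # expected to return (chunk, branch) or None
        if result is None:
            return False
        chunk, branch = result
        if not verify_branch(i, chunk, branch):
            return False
    return True
\end{verbatim}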
A malicious block producer could still hide an invalid transaction by performing
incorrect erasure coding. To counteract this, nodes which fully download a
section of a block can provide a fraud proof, proving that the coding was done
incorrectly. Using parity-based erasure codes, these fraud proofs only require
a few chunks of the total data. The size of the chunks can also be reduced
depending on the number of nodes, so erasure coding fraud proofs will likely
take less than 10 kilobytes.
\section{Sharded Block Production Isn't Needed}
Sharded block production won't be required until PADLOCK is experiencing large
amounts of throughput. Producing blocks isn't the bottleneck with blockchains.
Due to the need to sync, validating blocks should only take around 5--10\% of a
validator's processing power. This is partly to protect against DDoS attacks,
but mainly because of the time it takes to sync: if 100\% of processing power
is used and a node goes offline, it will never catch up with the rest of the
network, since it will always remain behind by the amount of time it was
offline.
However, for producing blocks in a system with sharded block verification, this
isn't an issue, as the verification time will be significantly lower than the
time it takes to produce blocks. Bitcoin ABC has been shown to be able to
validate 3000 transactions per second while running 3 nodes on a 4 core desktop
at 100\% CPU usage \cite{BitcoinBenchmark}.
Sharded block production would need to be implemented once throughput starts to
exceed approximately 5000 transactions per second.
\section{UTXO Expiry}
While it is possible to fully verify the blockchain with only the roots of the
accumulators, users wanting to spend their UTXOs will need to acquire the
proofs from somewhere. To facilitate this, nodes will have the option to store
proofs for one or many of the UTXO shards.
However, some UTXOs are sent to invalid addresses, or aren't going to be spent
for a long time. So that nodes don't have to store proofs for these UTXOs
indefinitely, UTXOs become unspendable after 3 months, at which point nodes no
longer have to store their proofs. An expired UTXO can be reintroduced by
providing a Merkle proof that it was part of the UTXO set before it was pruned.
\section{Fast Syncing Via UTXO Commitments}
Because the roots of the accumulators are stored in every block, a node can
start syncing from any point in the blockchain, rather than syncing from the
very beginning. This is secure assuming that the blockchain was valid up until
the point when the node started syncing.
\section{Proof of Staked Work}
The goal of Proof of Staked Work is to provide a consensus mechanism that
doesn't require constant energy expenditure, and is easy to participate in.
Proof of Work is undesirable because it requires constant energy expenditure.
Proof of Staked Work mitigates this, by requiring participants to prove they
have computing power very infrequently. In its general form, to become a block
producer you must first register by proving you have computational power. What
makes this different from normal proof of work is that block producers don't
need to prove they have computing power every time they make a block, so there
is no need for constant energy expenditure.
\subsection{Registration}
To be registered as a block producer, you must mine a valid registration block.
Registration blocks happen once every 3 blocks. A registration block doesn't
contain any actual transaction data, but rather just proofs of work.
Many block producer candidates can also pool together. They do this by having
the registration block contain a list of all of their addresses, followed by a
list of nonces, with the nonce producer's address appended to each nonce. A
registration block has capacity for 20 nonces. The difficulties of all the
nonces are summed and must reach a certain threshold. The difficulty is set to
target 20 seconds. The mining algorithm is RandomX, which is most efficiently
mined on traditional CPUs; this makes it much easier for people to participate,
as they don't need a good GPU or an ASIC.
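A sketch of how a pooled registration block might be checked. SHA-256 stands in
for RandomX (which this sketch does not implement), the exact preimage of the
proof of work is not specified, so a generic \texttt{header} stands in, and the
difficulty formula is purely illustrative:
\begin{verbatim}
import hashlib

MAX_NONCES = 20

def difficulty(pow_hash: bytes) -> int:
    # Illustrative convention: a smaller hash value represents more work.
    return (1 << 256) // (int.from_bytes(pow_hash, "big") + 1)

def registration_valid(header: bytes, entries: list[tuple[bytes, bytes]],
                       threshold: int) -> bool:
    """`entries` is a list of (address, nonce) pairs; each nonce has its
    producer's address appended before hashing. The summed difficulty of
    all nonces must reach the threshold."""
    if len(entries) > MAX_NONCES:
        return False
    total = 0
    for address, nonce in entries:
        pow_hash = hashlib.sha256(header + nonce + address).digest()
        total += difficulty(pow_hash)
    return total >= threshold
\end{verbatim}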
Every registration block, existing block producers' scores are reduced by
0.1\%. This is done to incentivise existing block producers to keep trying to
mine new registration blocks.
\subsection{Flow of Block Creation}
First, a block producer is chosen via a verifiable random function, using the
most recent registration block as entropy. This is secure against stake
grinding, as registration blocks are computationally hard to produce.
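A simplified sketch of this selection step, with a plain hash standing in for
the VRF (a real VRF would additionally let the selected producer prove the
selection was computed honestly; the block height is used here only as an
assumed way to vary the seed between blocks):
\begin{verbatim}
import hashlib

def select_producer(registration_block_hash: bytes, height: int,
                    producers: list[bytes]) -> bytes:
    """Deterministically pick the next producer from the registered set,
    seeded by the most recent registration block and the block height."""
    seed = hashlib.sha256(registration_block_hash
                          + height.to_bytes(8, "big")).digest()
    return producers[int.from_bytes(seed, "big") % len(producers)]
\end{verbatim}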
The block producer will collect transactions from the mempool, and then add them
to their block. The block producer will then have to complete a proof of work on
their block to prevent spam. The difficulty of this proof of work is adjusted
every block to target an average of 10 minutes per block. Once this is done, they
broadcast their block, and the network will then verify it. If a node detects
the block as invalid, it will broadcast a fraud proof to the network, and the
block will be dropped by all nodes and clients. This results in a 1 of N trust
model for users, compared to the N/2 of N trust model from most consensus
mechanisms.
While solving the proof of work for their block, the block producer will
broadcast any proofs of work they make that are difficult, but don't meet the
difficulty target. This is done to prove to the rest of the network that they
are online and working on their block.
\subsection{Consensus}
The valid blockchain is considered the \textit{heaviest} chain. Normal blocks
carry 100 weight units. However, in the event that a second block producer was
selected, if the first block producer does provide a block, the second block,
and the five blocks following it, will be worth 80 weight units.
\subsection{Incentives}
To incentivise block production, block producers will periodically receive a
reward based on their score. The total amount given out in each period will be
fixed (i.e. the inflation doesn't increase when there are more block
producers). This is how new supply gets put into circulation.
\subsection{Reduced Block Reward}
Because block production takes much less energy than it would under Proof of
Work, the block producer reward can be safely reduced without a drop in
security. This showcases the largest benefit of Proof of Staked Work: it
provides the same security as proof of work (security measured by the cost of
acquiring the hardware it would take to control 51\% of the hash rate), while
requiring much less energy expenditure.
\subsection{Attacks}
\subsubsection{Stake Grinding}
Stake Grinding is an attack that can be done on certain implementations of proof
of stake. In these implementations, the next block producer is chosen with a
verifiable random function (VRF) seeded with previous blockchain data. The issue
with this design is that a block producer could simply change block parameters
until the VRF produces a result that picks them as the next block producer.
This design is immune to that attack because generating a new block requires a
proof of work, so generating candidate blocks is slow. On top of this, there is
the verifiable delay function, which would slow down an attacker even more. In
the time it would take them to find block parameters that benefit them, another
block producer would have been chosen.
\subsubsection{Nothing at Stake}
The Nothing at Stake problem happens when there is a fork in a Proof of Stake
blockchain. Stakers are incentivised to stake on both chains at once, in case
one chain gets ahead of another, and they are selected as block producer on the
forked chain. To disincentivise this, block producers can include a fraud proof
consisting of a chain of block headers from a forked chain, proving that
certain block producers were extending forked chains. This results in those
block producers' scores dropping by 80\%. This incentivises all block producers
to remain on the same chain, to reduce the risk of their score going down.
\section{Privacy}
Privacy is first achieved through pseudonymity. Users can also generate new
addresses for every transaction. However, this still has some pitfalls: if
someone receives a transaction and knows who the sender is, they can see the
sender's balance. To combat this, PADLOCK uses confidential transactions.
Confidential transactions use additively homomorphic commitments to hide
transaction amounts, while still allowing the network to verify that inputs
and outputs balance to zero \cite{CT}.
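Concretely, with Pedersen commitments $C(v, r) = rG + vH$ (as in \cite{CT}),
commitments add homomorphically, so a verifier can check that a transaction
balances without learning any individual amount:
\[
\sum_{\textrm{inputs}} C(v_i, r_i) - \sum_{\textrm{outputs}} C(v_j, r_j)
= \Big( \sum_i r_i - \sum_j r_j \Big) G
\quad \textrm{when} \quad \sum_i v_i = \sum_j v_j .
\]
Range proofs, which rule out negative amounts, are omitted from this sketch.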
\section{Throughput Estimate}
There are a total of 64 shards. With a low number of nodes, in order to remain
secure, nodes will want to validate multiple, or all, of the shards. However,
with a large number of validating nodes, it becomes increasingly likely that
every shard in a block will be validated by at least one node, making it secure
to validate a lower number of shards. Ideally, nodes will only need to
validate one shard. Assuming similar performance to Bitcoin ABC (PADLOCK will
likely perform better, as ABC is based on Bitcoin Core, which doesn't have
parallelization, runs on decade-old code, and uses a bulky scripting language),
at 100\% load a typical 4 core desktop can handle at least 3000 transactions
per second. In order to keep syncing quick, no more than 10\% of the processing
power should be used to validate blocks (see the section on sharded block
production). This would allow one node to process 300 transactions per second
for its shard. With 64 shards, that's 19,200 transactions per second. However,
before that point, sharded block production should be implemented.
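The arithmetic behind the 19,200 figure, under the assumptions above:
\[ 3000 \times 0.10 = 300 \textrm{ TPS per node, per shard}; \qquad
   300 \times 64 = 19{,}200 \textrm{ TPS in total} \]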
\section{Adaptive Block Size Cap}
There will be a cap on block size in order to prevent spam attacks. However,
this cap shouldn't be a hard coded constant. Rather, it should be adaptive, so
that a potentially contentious hard fork isn't required every time the
throughput cap starts to be met. The block size cap for PADLOCK is calculated by
multiplying the median block size of the past two months of blocks by two. For
an attacker to inflate the cap to unwieldy sizes, they would have to produce
half of the blocks in that two month period.
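A minimal sketch of this rule (how the two month window of block sizes is
gathered is left unspecified):
\begin{verbatim}
from statistics import median

def block_size_cap(past_two_months_sizes: list[int]) -> int:
    """Adaptive cap: twice the median block size over the past two months."""
    return 2 * int(median(past_two_months_sizes))
\end{verbatim}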
\section{Conclusion}
In this paper, we have outlined the general structure of the PADLOCK
blockchain. It allows for very high throughput while remaining secure and
decentralized thanks to sharding, requires little energy expenditure, and
preserves users' privacy by hiding transaction amounts. The description offered
here is subject to change as more research is done and as implementation
progresses.
\printbibliography
\end{document}