The increasing digitisation of data and the widespread use of cloud services have amplified concerns surrounding the privacy and control of personal information like never before.
In response, regulators often impose requirements on removability without recognising that it is virtually impossible to make material once seen become unseen.
However, the reality is that entities operating infrastructure that underpins digital publishing have control over the technology that serves the content and can implement "removal" by denying access.
This centralised gate-keeping approach may seem appealing from a regulatory standpoint, as it can be thought of as an instrument of law enforcement. Although such interventions rarely address the breach of unrealistic privacy rights, the gate-keeper still provides a well-defined responsible party with an effective course of action: not serving the censored content.
While this gives the unwary a false sense of security, it also exemplifies the larger issue of censorship: publishing platforms possess the technological means to filter content, and centralised control makes it cheap to exert influence over what is published. What start out as sensible measures of content curation gradually transform into outright censorship. This is all the more problematic with social platforms that have gained quasi-monopolistic status as a result of network effects. Due to the high costs associated with switching platforms, innocent content creators increasingly find themselves `deplatformed'. Similarly, the ability to identify hosts and deny access to content through legal means gives the powers that be the means to infringe on freedom of speech.
In the decentralised paradigm of web 3, there is no longer a single operating entity controlling the publishing platform or the hosting infrastructure, which makes censorship prohibitively costly. And yet, the potential permanent exposure of personal data remains a significant concern for most individuals. Therefore, there is a need for novel solutions that can restore the sense of security that centralised gate-keepers purportedly provide.
\subsection{Deletion and revoking access}
First, notice that any reference to removal of information in the sense of erasing it from all physical storage devices is both unenforceable and impractical. Even the most rigorous data protection audits do not require the erasure of offending data from backup tapes. In general, the tacit assumption is that information ordered to be removed should become inaccessible through \emph{typical precedented methods of access}.
In what follows, we formulate the strongest meaningful definition of deletion applicable to decentralised storage systems and offer a construction that implements it.
Importantly, this approach is purely technical, focusing on the capabilities and costs of primary actors, rather than relying on procedural measures that impose obligations on intermediaries to respect the rights of primary actors.
The primary actors in this context are the \emph{uploaders} who wish to share content by granting read access to a number of parties, called \emph{downloaders}. Granting access is defined as providing a canonical \emph{reference} to the content, allowing the system to eventually retrieve the complete information that is intended to be disclosed. Any party that is privileged to access this information is able to store, re-code, and potentially disseminate it. This provides a myriad of ways to make the content accessible at any later point in time, bypassing any process that would qualify as deletion or removal of access.
There is, by definition, no protection possible against such adversity. As a consequence, any legally sound notion of deletion (retrospective revocation of access) is to be interpreted in a narrower sense: access counts as revoked if \emph{replaying the same access method} is no longer viable at a cost lower than the \emph{full cost} of storing the content.%
%
\footnote{That is, the total storage cost paid for the full size of the content starting from the time access was revoked until the attempted breach.}
Let us now define deletion as a scheme for uploading content with access revocation, meeting the following requirements:
\begin{labelledlist}
\item[\emph{Specialisation}] The uploader is able to choose at the time of publishing a specialised construct%
%
\footnote{From a user's perspective, content that is meant to be reliably deleted should be uploaded as such. The cost of uploading such content is allowed to be a (small integer) multiple of the cost of regular, censorship-resistant but non-deletable uploads.}
%
allowing retrospective revocation of access.
\item[\emph{Sovereignty}] The uploader is the sovereign owner of the data and is in sole possession of the means to revoke access from any party that was previously granted access.%
%
\footnote{The uploader's credentials are necessary to delete their own deletable content. Deletables must also allow access control, i.e., they are only available to a specific set of recipients.}
%
\item[\emph{Security}] After the revocation of access, a grantee is unable to access the content using the same reference or any other cue shorter than the deleted content.%
%
\footnote{In other words, if the downloader has \emph{not} stored at least as much information as the deleted information itself, they will have no way of retrieving it.}
%
\end{labelledlist}
Note that regularly uploaded content may be \emph{forgotten}: if nobody pays for storing it and the content is not accessed frequently, the chunks constituting it will be garbage collected. However, chunks with expired postage stamps cannot be regarded as reliably deleted in order to satisfy the requirements of sovereignty and security.
\subsection{Construction}
The goal of this section is to arrive at a formal construction of a DISC-based revocable access model. We will restrict our scheme to \emph{chunks}, the fundamental fixed-size storage units of Swarm's DISC model. The proposed \emph{dream} construct implements a deletable content storage and access model that fulfils the requirements of specialisation, sovereignty, and security. The use of the word `dream' alludes to the somewhat unexpected finding that such a construct is even possible in the immutable DISC model. On top of this, as customary in Swarm, it serves as a mnemonic acronym, resolving to the five \emph{dream attributes} expressing requirements of access control.
\begin{labelledlist}
\item[\textbf{D} -- \emph{deniable}] The dream key serves as a one-time pad for decryption. Since multiple content chunks (in fact any arbitrary content) can use the same dream pad, the key's association to any content is plausibly deniable.
\item[\textbf{R} -- \emph{revocable}] Access granted through dream keys is revocable. Revoking access from all parties, including oneself, is considered as deletion.
\item[\textbf{E} -- \emph{expirable}] The scheme allows for one-time use, i.e., the key can only be retrieved once.
\item[\textbf{A} -- \emph{addressable}] Access can be granted to a neighbourhood, where only clients operating a node within a particular overlay address range are able to access the content.
\item[\textbf{M} -- \emph{malleable}] The construct is resilient to churn and dynamic changes in network size; it is reusable across independent grantees and upgradeable.
\end{labelledlist}
The scheme is built on top of DISC's APIs and can be implemented entirely as a second-layer solution. Despite its rich feature set, the scheme does not use complex cryptography, but instead leverages the interplay of various component subsystems.
% The network protocols of Swarm use forwarding Kademlia meaning that requests are routed to their destination through a series of hops successively relayed by forwarding nodes with ever increasing proximity to the target address.
% The chunk delivery to storer nodes can also be seen as a request: the closest node respond with a statement of custody that is passed back via the same route as the original request (\emph{backwarding}).
% \subsubsection{Pseudo-random sequence from generator}
Assume we have a pseudo-random deterministic function that generates a longer (e.g., chunk-sized = 4K) sequence $c$ from a key-sized (32 byte) generator $g$.%
%
\footnote{For example, the Keccak sponge function used throughout Ethereum for hashing has this capability. Alternatively, block cipher encryption with initial nonce $g$ could be applied to a fixed constant chunk, such as all zeros.}
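For illustration, such a generator-to-pad expansion can be sketched in a few lines of Python; SHAKE-256 is used here purely as a stand-in extendable-output function, not as the function Swarm itself would use:
\begin{verbatim}
import hashlib

def G_4k(generator: bytes) -> bytes:
    # Expand a 32-byte generator into a 4096-byte (chunk-sized) pad.
    # SHAKE-256 is an illustrative stand-in for the Keccak sponge or a
    # block cipher in counter mode keyed with the generator.
    assert len(generator) == 32
    return hashlib.shake_256(generator).digest(4096)

# determinism: the same generator always yields the same pad
pad = G_4k(b"\x01" * 32)
assert pad == G_4k(b"\x01" * 32) and len(pad) == 4096
\end{verbatim}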
% \subsubsection{Dream: deniable, revocable, expirable, addressable, malleable}
The central construct, called a \emph{dream}, is a chunk-sized one-time pad which acts as a decryption key for the deletable content. The dream chunk is collectively created by a set of nodes following a network protocol. Each node in the sequence of participating peers receives a piece of data and combines it with data from their reserve to produce an output, which is then sent to the neighbourhood of the next node in the chain. As long as each node in the chain performs the same calculation and forwards the same result, the same one-time pad can be generated, enabling the retrieval and reading of the deletable content.
The insight here is that retrieving a deletable chunk's references involves calculations using a set of immutable chunks controlled by the uploader. The uploader's ability to change this underlying set holds the key to providing the necessary mutability required by any notion of deletion.
Let $b$ be the batch ID (a 32-byte hash) of a postage batch owned by $U$, and
let $\mathit{Chunks}(\gamma,b,p)$ stand for all chunks stamped by $b$ up to block $\gamma$ in the bucket designated by the pivot $p$. Define $\overline{\Huge\chi}(\gamma,b,p)$ as the chunk of $\mathit{Chunks}(\gamma,b,p)$ whose address is closest (has minimal XOR distance) to the pivot $p$.
\begin{eqnarray}
\overline{\Huge\chi}&:& \Gamma\times\mathit{Batches}\times \mathit{Segment} \mapsto \mathit{Chunks}\\
\overline{\Huge\chi}(\gamma,b,p)&\defeq &\underset{c\in \mathit{Chunks}(\gamma,b,p)}{\operatorname{arg\,min}}{\Large\chi}(p,\textsc{Address}(c))
\end{eqnarray}
If batch $b$ is underutilised, the bucket designated by $p$ may not contain any chunks belonging to the batch.
In that case there is no chunk closest to $p$, and the function $\overline{\Huge\chi}$ is undefined.
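The following Python sketch shows how a storer node could evaluate $\overline{\Huge\chi}$ over its local reserve; the \texttt{Chunk} record, the use of SHA3-256 in place of the BMT hash, and the in-memory reserve list are all illustrative assumptions:
\begin{verbatim}
import hashlib
from dataclasses import dataclass
from typing import Optional

@dataclass
class Chunk:             # hypothetical reserve entry
    data: bytes
    batch_id: bytes      # postage batch that stamped the chunk
    block: int           # block at which the stamp was issued

def address(chunk: Chunk) -> bytes:
    # stand-in for Swarm's BMT chunk hash
    return hashlib.sha3_256(chunk.data).digest()

def in_bucket(addr: bytes, pivot: bytes, depth: int) -> bool:
    # the chunk falls in the bucket designated by the pivot if their
    # addresses share at least `depth` leading bits
    d = int.from_bytes(addr, "big") ^ int.from_bytes(pivot, "big")
    po = 256 - d.bit_length() if d else 256
    return po >= depth

def chi_bar(reserve: list, gamma: int, b: bytes,
            p: bytes, depth: int) -> Optional[Chunk]:
    # closest chunk of batch b to pivot p, stamped no later than block
    # gamma; returns None if the designated bucket is empty (undefined)
    candidates = [c for c in reserve
                  if c.batch_id == b and c.block <= gamma
                  and in_bucket(address(c), p, depth)]
    if not candidates:
        return None
    return min(candidates,
               key=lambda c: int.from_bytes(address(c), "big")
                             ^ int.from_bytes(p, "big"))
\end{verbatim}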
Let us now define the OTP update function $\Delta(\gamma,b)$
for block $\gamma$ and batch ID $b$ as follows:
%
\begin{eqnarray}
{\Delta}&:&\Gamma\times\mathit{Batches}\mapsto\mathit{Chunks}\mapsto\mathit{Chunks}\\
{\Delta}(\gamma, b)&:&\mathit{Chunks}\mapsto\mathit{Chunks}\\
\Delta(\gamma,b)(c)&\defeq&\mathcal{G}\idx{4K}(\textsc{Address}(\overline{\Huge\chi}(\gamma,b,\textsc{Address}(c))))
\end{eqnarray}
Since the uniformity depth of batches is intended to be greater than the nodes' storage depth, it is expected that all chunks in a batch bucket designated by $p$ are present in the reserve of every node within the neighbourhood designated by $p$.
However, if Swarm grows and there are nodes whose storage depth is higher than the uniformity depth of the batch, then the neighbourhood designated by $p$ may not contain all the
chunks in the bucket, thus compromising the computability of the OTP update function.
Nevertheless, if two conditions are met, namely: 1) the bucket of batch $b$ designated by $p$ is not empty and 2) the neighbourhoods include one or more buckets, then the update function is computable by any storer node within the neighbourhood designated by $p$.
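Assuming the helpers \texttt{G\_4k}, \texttt{chi\_bar} and \texttt{address} from the sketches above, the update function can be illustrated as follows; using the input chunk's address as the pivot is an interpretation of the definition for the purposes of this sketch, not a normative implementation:
\begin{verbatim}
import hashlib

def delta(reserve, gamma: int, b: bytes, depth: int):
    # Delta(gamma, b): Chunks -> Chunks (sketch); reuses G_4k, chi_bar
    # and address from the previous sketches
    def step(c: bytes) -> bytes:
        pivot = hashlib.sha3_256(c).digest()        # Address(c) as pivot
        nearest = chi_bar(reserve, gamma, b, pivot, depth)
        if nearest is None:
            raise LookupError("empty batch bucket: update undefined")
        return G_4k(address(nearest))               # fresh chunk-sized pad
    return step
\end{verbatim}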
Since the update function can be applied to its own output, we can define the \emph{dream path} as follows:
\begin{eqnarray}
\Pi(b,g)&\defeq&c_0,\ldots, c_n\in\mathit{Chunks}\{n\}\\
c_i&\defeq&\begin{cases}
\mathcal{G}\idx{4K}(g) &\text{if }i=0\\
\Delta(b,c_{i-1}) &\text{otherwise}
\end{cases}
\end{eqnarray}
The dream path function is a pseudo-random generator that is defined by finite recursion. A \emph{dream} is a particular pairing of a generator and a one-time pad, which serve as the input and output of a dream path.
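A minimal sketch of unrolling the dream path, reusing \texttt{G\_4k} and \texttt{delta} from above:
\begin{verbatim}
def dream_path(reserve, gamma: int, b: bytes, depth: int,
               g: bytes, n: int) -> list:
    # unroll c_0, ..., c_n by finite recursion (here, iteration);
    # reuses G_4k and delta from the previous sketches
    step = delta(reserve, gamma, b, depth)
    path = [G_4k(g)]                 # c_0
    for _ in range(n):
        path.append(step(path[-1]))  # c_i = Delta(gamma, b)(c_{i-1})
    return path
\end{verbatim}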
Given a dream path of length $n$ and grantee overlay address $a$, along with a batch $b$ with a uniformity depth of $d$, the uploader needs to find a generator $g$ such that the $n$-th chunk of the dream path falls within $a$'s neighbourhood (i.e., $\mathit{PO}(a,\textsc{Address}(c_{n-1}))\geq d$).
\begin{eqnarray}
\mathit{dream}(b,n,a) &\subset& \mathit{Keys}\times\mathit{Chunks}\\
\langle g, k\rangle \in \mathit{dream}(b,n,a)
&\Longleftrightarrow&
k=\Pi(b,g)\idx{n}
\land
\mathit{PO}(a,H_{\mathit{bmt}}(k))\geq d
\end{eqnarray}
The uploader is able to construct the dream pad locally, which also allows them to calculate $k$ given $g$. Since $H_{\mathit{bmt}}(k)$ is uniformly distributed, the chance that $\mathit{PO}(a,H_{\mathit{bmt}}(k))\geq d$ is 1 in $2^{d}$, so the uploader needs to try $2^{d}$ candidate generators in expectation before finding a suitable dream.
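This search can be sketched as a simple rejection-sampling loop, reusing \texttt{dream\_path} from above; the SHA3-256 call again stands in for the BMT hash:
\begin{verbatim}
import os, hashlib

def mine_dream(reserve, gamma, b, depth, n, a: bytes):
    # find <g, k> such that the final pad's address falls in the
    # neighbourhood of overlay address a; expected number of trials
    # is 2**depth
    while True:
        g = os.urandom(32)
        k = dream_path(reserve, gamma, b, depth, g, n)[-1]
        k_addr = hashlib.sha3_256(k).digest()
        d = int.from_bytes(k_addr, "big") ^ int.from_bytes(a, "big")
        po = 256 - d.bit_length() if d else 256
        if po >= depth:
            return g, k
\end{verbatim}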
Now we can turn to the definition of a dream, which is a network protocol based access method that is deniable, revocable, expirable, addressable, and malleable. In order for downloaders to calculate the dream pad, they must rely on the network, where each recursive step is calculated by a node in the neighbourhood designated by the respective input chunk's address.
In order to guarantee the correct termination, the following criteria must be fulfilled:
\begin{itemize}
\item all buckets of batch $b$ designated by $p_0,\ldots, p_{n-1}$ must be non-empty
\item the uniformity depth of the batch must remain higher than the storage depth of nodes in the neighbourhoods designated by $p_0, \ldots , p_{n-1}$.
\item the output of each step, produced by a node in the neighbourhood designated by $p_i$, must be sent to the
neighbourhood designated by $p_{i+1}$.
\item nodes in the neighbourhoods along each hop of the dream path (designated by $p_0,\ldots, p_{n-1}$) must be
incentivised to compute the OTP update and forward the resulting output chunk to the next neighbourhood.
\end{itemize}
Let us define a network protocol called \emph{dream}. Participants in the protocol listen to single owner chunks that wrap the input.
We assume that they extract the batch identifier $b$, and the payload CAC $c$ as parameters to the OTP update function. They calculate the output pad, wrap it as a dream chunk, and send it to the network towards the address of the output chunk. This allows the destination neighbourhood
to calculate the next step. If the protocol is followed for $n$ steps, then the target node at address $a$ receives $k=c_n$.
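A storer node's side of the protocol could look roughly as follows; \texttt{push\_sync} and the dream chunk serialisation are hypothetical placeholders for the corresponding Swarm primitives, and \texttt{delta} is reused from the earlier sketch:
\begin{verbatim}
import hashlib

def wrap_as_dream_chunk(b: bytes, payload: bytes) -> bytes:
    # hypothetical serialisation: batch id followed by the payload pad
    return b + payload

def handle_dream_chunk(reserve, gamma, depth, b: bytes, c: bytes, push_sync):
    # on receiving a dream chunk wrapping batch id b and payload c,
    # compute the OTP update and forward the result towards the
    # address of the output chunk (the next neighbourhood)
    out = delta(reserve, gamma, b, depth)(c)
    target = hashlib.sha3_256(out).digest()
    push_sync(target, wrap_as_dream_chunk(b, out))
\end{verbatim}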
We now turn to the construction of dream chunks.
Uploader $U$ wishes to grant downloader $D$ (at overlay address $a$) revocable access to content chunk $C$.
$U$ chooses a postage batch $b$ it owns, ensuring that it is not completely filled, along with a step-count $n$; let $d$ denote the uniformity depth of $b$.
$U$ creates a dream $\langle g, k\rangle\in\mathit{dream}(b,n,a)$, then
calculates the ``ciphertext'' $C'=C \xor k$ and uploads it to Swarm, obtaining $r=\mathit{H}(C')$ as the parity reference. Now $U$ creates the \emph{dream reference} as $\mathit{ref}(C)=\langle r,b,g\rangle$, which can be given to the downloader grantee $D$.%
%
\footnote{So far, references to swarm content included an address (\emph{Swarm hash} for unencrypted content) or an address with a decryption key (for encrypted content). The dream reference now defines a third type of reference, serialised in four segments comprising the parity address, the batch ID, the initial generator, and a decryption key.}
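The uploader's side can be summarised in a short sketch; \texttt{swarm\_upload} is a hypothetical primitive returning the uploaded chunk's reference hash, and \texttt{mine\_dream} is reused from above:
\begin{verbatim}
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def upload_deletable(swarm_upload, reserve, gamma, b, depth, n, a, content):
    # mine a dream <g, k> for grantee address a, encrypt the content
    # chunk with the pad and upload only the parity chunk
    g, k = mine_dream(reserve, gamma, b, depth, n, a)
    parity = xor(content, k)       # C' = C xor k
    r = swarm_upload(parity)       # r = H(C')
    return (r, b, g)               # dream reference handed to the grantee
\end{verbatim}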
\subsection{Correctness, security and privacy}
\subsubsection{Retrieval}
% \label{def:retrieval}
$D$, in possession of the dream reference
$\langle r,b,g\rangle$ calculated by $U$ as $\mathit{ref}(C)$, constructs $p=\mathcal{G}\idx{4K}(g)$, wraps it as a dream chunk, and sends it to Swarm. When $D$ receives the resulting dream chunk for $b$, it extracts the payload $k$.
$D$ retrieves the parity chunk $C'$ using reference $r$ and then decodes $C=C'\xor k$.
The retrieval process is trivially correct as long as:
\begin{enumerate}
\item The parity chunk $C'$ is retrievable via normal retrieval methods.
\item The dream protocol is followed by cooperating nodes.
\item The batch bucket bottom remains unchanged for the neighbourhoods along the dream path.
\end{enumerate}
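Under the same assumptions as the earlier sketches, the downloader's side can be illustrated as follows; \texttt{send\_dream} and \texttt{swarm\_retrieve} are hypothetical primitives, the former injecting the initial dream chunk into the network and blocking until the final pad arrives in the node's neighbourhood:
\begin{verbatim}
def retrieve_deletable(swarm_retrieve, send_dream, ref):
    # reuses G_4k and xor from the earlier sketches
    r, b, g = ref
    k = send_dream(b, G_4k(g))     # the network computes the dream path
    parity = swarm_retrieve(r)     # C' fetched via normal retrieval
    return xor(parity, k)          # C = C' xor k
\end{verbatim}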
\subsubsection{Deletion}
We now turn to access revocation, which is equivalent to deleting content by revoking all access.
Uploader $U$ had granted $D$ (at address $a$) access to $C$ through the dream reference $\langle r,b,g\rangle $. $U$ revokes $D$'s access by uploading extra chunks to batch $b$ such that $\overline{\Huge\chi}$ changes.
As long as there is at least one honest neighbourhood walked by the protocol, such that the nearest chunk to pivot $p$ from batch $b$ changes, downloader $D$ will no longer be able to retrieve content $C$.
To prove this, first let us assume that there is a bucket designated by some $p_i$ on the dream path such that the $\overline{\Huge\chi}$ for the bucket has changed since the uploader constructed the dream. When the participating node in the neighbourhood designated by $p_i$ calculates the OTP update function, the result will be completely different and the dream chain will not terminate.
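One possible way for the uploader to effect such a change is to mine and upload a new chunk that lands closer to a pivot on the dream path than the batch's currently nearest chunk. A minimal sketch, reusing \texttt{dream\_path}, \texttt{chi\_bar} and \texttt{address} from above and assuming a hypothetical \texttt{swarm\_upload\_stamped} primitive, diverts the path at its very first hop:
\begin{verbatim}
import os, hashlib

def revoke(reserve, gamma, b, depth, g, n, swarm_upload_stamped):
    path = dream_path(reserve, gamma, b, depth, g, n)
    pivot = hashlib.sha3_256(path[0]).digest()   # pivot of the first hop
    current = chi_bar(reserve, gamma, b, pivot, depth)
    target = (int.from_bytes(address(current), "big")
              ^ int.from_bytes(pivot, "big"))
    while True:   # mine a chunk whose address is closer to the pivot
        candidate = os.urandom(4096)
        addr = hashlib.sha3_256(candidate).digest()
        if int.from_bytes(addr, "big") ^ int.from_bytes(pivot, "big") < target:
            swarm_upload_stamped(b, candidate)   # chi_bar changes, path diverts
            return
\end{verbatim}
The cost of this mining grows with how close the currently nearest chunk already is to the pivot; the uploader is free to target any other hop of the path instead.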
\subsubsection{Privacy}
The reference does not leak any information about the step-count or the neighbourhoods involved. Neither are the participants in the protocol aware of their position in the chain or any details about the other neighbourhoods, except for the immediate next one to which they are push-syncing their chunk.
The construction of the dream reference is deniable.
Besides our sensitive content $C$, let us consider $A$, an arbitrary uncontroversial content chunk. When producing $C'$, the owner also produces $A'=A\xor k$ and uploads it. If ever questioned about $k$, producing $A$ makes the denial of any other content, including $C$, more plausible.
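For illustration, reusing \texttt{xor} from the earlier sketch, the same pad yields parity chunks for both contents, so revealing either pairing is equally consistent with the key:
\begin{verbatim}
def deniable_parities(k: bytes, C: bytes, A: bytes):
    # both parity chunks are uploaded; presenting (A, A', k) later is
    # indistinguishable in form from presenting (C, C', k)
    return xor(C, k), xor(A, k)
\end{verbatim}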
\subsubsection{Resolution of attacks}
A powerful adversary could potentially infiltrate every neighbourhood of Swarm and archive all information that has ever been uploaded (even without being able to decipher it). They could also keep logs of what has been uploaded and in what order, which would allow them to serve up deleted content on demand. But such indiscriminate archiving of all of Swarm's content comes at a huge price, and it is the only way to reliably defeat the dream construction.
We now turn to the discussion of how to calibrate the step-count in relation to the security model. Let us assume a network-wide neighbourhood infiltration rate of $\frac{1}{2}$, meaning that half of the neighbourhoods in the network are assumed to be malicious and colluding.
When access is revoked, the owner uploads new chunks in each neighbourhood along the dream path.
Given a particular dream chain, however, if even a single neighbourhood on the dream path is honest, they will respect the newly arrived chunks and divert the dream path, preventing it from terminating with the grantee.
Thus, for a breach of access to happen, all neighbourhoods must be malicious.
In our security model, a neighbourhood is malicious with uniform and independent probability. For an overall infiltration rate of 1 in $k$, the chance of all neighbourhoods on a given random dream path being malicious is $k^{-n}$. For a security requirement of $\sigma$ ``nines'' of success rate, i.e., an error rate of less than $10^{-\sigma}$, we can formulate the requirement as
%
\begin{equation}
\frac{1}{k^n}< \frac{1}{10^{\sigma}}
\end{equation}
Now, expressing $k$ as $10^\kappa$, taking the logarithm of both sides and multiplying by $-1$, we get
%
\begin{equation}
n > \frac{\sigma}{\kappa}
\end{equation}
With one in every 10 neighbourhoods being malicious, the dream path must have as many hops as the number of nines expressing the success rate.%
%
\footnote{I.e., with malicious neighbourhood probability $0.1$, a 6-hop-long path will effectively revoke access with a certainty of $99.9999\%$.}
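A back-of-the-envelope helper implementing this calibration, assuming independent neighbourhood infiltration as above:
\begin{verbatim}
import math

def required_hops(infiltration_rate: float, nines: int) -> int:
    # smallest step-count n with breach probability rate**n at most
    # 10**-nines:  n >= nines / -log10(rate)
    kappa = -math.log10(infiltration_rate)
    return math.ceil(nines / kappa)

print(required_hops(0.1, 6))   # 6 hops at 10% infiltration for six nines
print(required_hops(0.5, 6))   # 20 hops at 50% infiltration
\end{verbatim}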