-
Notifications
You must be signed in to change notification settings - Fork 452
The design of a trustworthy ecosystem around media sharing
2024: This document is now outdated, but still good source of our methodology, thinking, and approach
In this living document, we describe the design of a trustworthy ecosystem around media sharing in Tribler. This ecosystem addresses a key problem in Tribler: free-riding on bandwidth while downloading content. The majority of users use Tribler just to fetch their media content and leech resources from the network in the process, without contributing anything back. Structural free-riding behaviour can eventually lead to a collapse of the network. The ecosystem described in this document rewards structural good behaviour while punishing free-riding behaviour. We will describe the required components, associated research questions, and implementation challenges. We also consider various attacks on our ecosystem, and show how these treats are addressed.
The following design is written for an integration in Tribler, our academic testbed. Even though we believe that the ideas in this document can be modified such that they apply to other (shared-resource) systems, we try to keep the design feasible, focussed, and self-contained while avoiding pre-mature optimizations and unnecessary system complexity. Our ecosysystem is an amalgamation of existing research ideas and components that work together to address free-riding behaviour at scale. We tried to leverage as much of the existing lab research as possible.
A note on terminology - Even though the original ticket is using the term “token economy”, I’m adopting the term “ecosystem” instead, for the following two reasons. First, it is inaccurate to summarise the whole ecosystem with the term “tokens”. Tokens are only a small part of the entire system and are utilised during the accounting of bandwidth contributions and consumptions. Individual contributions and consumptions are fed to a reputation mechanism, that outputs a (subjective) ranking of trustworthiness. Second, using the term “economy” is misleading since users might think that our system is designed around trade-based incentives (e.g., as in FileCoin). Though we will describe some ideas on making reputation tradable, our system is mostly based on trust-based incentives.
Tribler is our academic testbed that is downloaded by over 1.5 million users. Our software features an onion-routing overlay that tunnels BitTorrent traffic through relay and exit nodes to provide anonymity. Specifically, a downloader creates a circuit with zero or more relay nodes that ends with one exit node. The number of hops in a circuit refers to the total number of exit and relay nodes used. Each relay node adds an additional layer of encryption, and an exit node proxies traffic from the Internet into our darknet. Users themselves can input how many hops they wish to utilise for their download. Downloading over circuits with more hops features better anonymity but impacts download speed.
Our anonymous overlay suffers from an undersupply of exit nodes, leading to frequent network congestions and an overall degradation of download speeds for all users. This is not entirely unexpected since running an exit node is often not without legal risks. As such, we consider our exit node capacity as a scarce resource, that must be allocated amongst users in a fair manner in case of undersupply. Similarly, we require relay nodes when one wishes to include one or more relay nodes in its circuits. By default, every Tribler user operates a relay node by default since there are minimal legal risks involved with running a relay node. The full protocol description is provided in this document.
In summary, we want to reward good behaviour (providing anonymous bandwidth to the network) and punish bad behaviour (structurally leeching anonymous bandwidth from the network, e.g., free-riding).
Given our problem description, we now state three design requirements for our ecosystem.
- Incentivise structural good behaviour: Our main objective is to reward users that structurally contribute to the community by operating an exit or relay node. Users that free-ride on bandwidth, specifically, users that exclusively open Tribler to finish downloads, should be punished. This rewarding scheme should incentivise users to keep Tribler running since they build a digital identity that is potentially trusted by other users, therefore attaining higher download speeds. On the long term, this has the potential to reduce user churn.
- Minimal impact on users: The described ecosystem must have minimal impact on how Tribler behaves. Specifically, Tribler must be straightforward to use and users should not be required to spend hours tweaking different settings. In other words, our mechanism should work out-of-the-box.
- No centralised components: The design of our ecosystem must avoid any components under centralised control. This aligns with our lab's relentless focus on decentralisation. Centralised mechanisms, in general, enable abuse and manipulation by the operator which is unacceptable for Tribler.
Our ecosystem consists of four mechanisms, as given in the figure below.
The first component that we introduce is an accounting mechanism, used to securely record community contributions and consumptions, referred to as interactions, on a distributed ledger. These interactions describe the bandwidth downloaded from or uploaded to other users. A new interaction is recorded when a circuit is destroyed, e.g., due to a timeout or when the maximum traffic amount is reached (250MB).
A basic accounting mechanism is currently implemented and integrated in Tribler. Our primitive ledger, named TrustChain, is capable of tamper-proof recording of events and has superior scalability compared to state-of-the-art blockchain ledgers. Instead of maintaining a fully shared log of records, TrustChain equips each user with its own individual ledger, fully maintained by that user themself. Individual ledgers are cryptographically linked when two users interact. We specifically avoid the need for network-wide consensus, since such a mechanism is unable to scale to Internet-level communities. Instead, bilateral agreement is reached between interacting participants. At the time of writing, TrustChain has recorded over 149 million transactions, created by over 90.000 unique participants. TrustChain is currently being used as a primitive component for decentralized applications, including self-sovereign identity and secure payments between strangers.
TrustChain orients around fraud detection instead of prevention. Whereas attempt to prevent fraudulent behaviour (e.g., double spending), this comes at a significant performance loss and a requirement for transaction fees to reward the effort of miners. This makes blockchain technology unsuitable for accounting purposes. Fraud in TrustChain is optimistically detected through the exchange of random records, and by checking the validity of incoming records against known records. Each record is non-reputable since the digital signature of the creator is included. This way, an adversarial user that has secretly forked its personal ledger can be exposed with minimal computational effort and bandwidth overhead. We provide full details on our fraud detection mechanism in an upcoming article.
To this end, @grimadas and I have devised a very simple, yet effective approach to account bandwidth contributions in Tribler. The upcoming algorithm does not impose a personal ledger structure, therefore significantly simplifying the design and implementation. A compelling advantage is the storage overhead of our algorithm is kept to a minimum. At the minimum, individuals are only required to store the last transaction for every counterparty they have interacted with. Each transaction contains two signatures, one from each counterparty. Each transaction represents a single edge in the work graph and contains the weight of the edge between the interacting parties. Transactions contain an ever-growing counter, indicating the total amount of bandwidth token with a specific counterparty. Therefore, transactions can easily be merged when a state inconsistency is detected. That's all there is.
A key design decision is that transactions will be relatively small, e.g., at most 50MB. Therefore, it is not a big problem if a counterparty goes offline, or when a peer refuses to sign a specific transaction. Like TrustChain, liveness is guaranteed in the presence of network churn, and users can engage in multiple transactions at the same time.
Implementing this approach allows us to get rid of the TrustChain implementation. TrustChain will move to the AnyDex repository since it is still used to account trades in the decentralized market.
The following algorithm is an essential step towards a secure micro-economy around bandwidth transfers. It requires minimal computation and state storage overhead. We exemplify our accounting approach using a simple example, where party A payouts 20MB to party B. It consists of the following seven steps:
1) A creates T = (sig_a, EMPTY_SIG, total_taken_a_from_b = db.total_taken_a_from_b + 20)
2) A sends T to B
3) B verifies T
3a) verify truthfulness of data in T
3b) verify validity of signature sig_a
3c) compare T against the last known state of A => total_taken_a_from_b > db.total_taken_a_from_b must hold
4)
4a) If T is valid, create T’ = (sig_a, sig_b, total_taken_a_from_b) and commit to DB
4b) If T is not valid, send the last (dual-signed) T’ to A
5) send T’ it to A
6) A verifies T’
6a) verify validity of signature sig_a and sig_b
6b) check that T’ is the last state => total_taken_a_from_b > db.total_taken_a_from_b
7)
7a) if T’ is valid, commit to DB
7b) if T’ is not valid, ignore T’
We now describe how bandwidth traffic is accounted using TrustChain. In our ecosystem, each user earns bandwidth tokens by forwarding traffic as a relay or exit node. When downloading content, the user compensates the used relay and exit nodes. In the figure below, we show how a downloader pays tokens to others nodes after downloading a 50 Megabyte file over a two-hop circuit.
When the circuit is destroyed, the downloader accounts a transfer of 250MB to the first relay node using TrustChain (MB is the unit of this bandwidth token). The first relay node then transfers 150MB to the next relay node, earning 100MB. The rationale behind our payout scheme is that we reward relay and exit nodes for performing the cryptographic operations on the forwarded data. Currently, we only perform a payout when the circuit has at least transferred 1MB of data. Relays that do not forward the payout to the next hop will be blacklisted by the previous hop, lowering their opportunity to earn bandwidth tokens. Each record in a personal ledger contains the token amount transferred in that interaction, and the total bandwidth uploaded and downloaded by that user. By requesting the last record in the personal ledger of a user, we can quickly determine the total amount of data uploaded and downloaded. The continuous recording of interactions result in a global, interconnected DAG, which is called the work graph.
The second component is a reputation mechanism that estimates the trustworthiness of agents in a digital community. Trust is a cardinal requirement for sustainable management of a common good since the reputation of individual and others heavily influence the incentives of agents to show good behaviour. We envision that each user locally computes trustworthiness scores, based on the discovered records in the work graph. The work graph contains directed edges between users, specifying who did how much work. The output of the reputation mechanism is a trustworthiness ranking. The reputation mechanism requires a reputation algorithm, for example, (modifications of) PageRank or max-flow.
One might argue that a user can take into consideration the edges in the work graph as is, without needing a reputation mechanism. By computing the net contributions of each peer (total upload minus total download), a ranking can be derived. However, this basic approach is vulnerable to the Sybil Attack where a user records fake interactions with their fake identities (further described later). One solution to address the recording of fake interactions is by devising a Proof-of-Bandwidth where the occurrence of each bandwidth interaction can be proven by the involved parties. [TorCoin], for example, utilises cryptographic primitives to prove that some traffic has gone over a particular Tor circuit. However, their mechanism requires centralised assignment of circuits over relay and exit nodes. How to devise a fully decentralised Proof-of-Bandwidth is an open research question. Since we currently have no defence against the Sybil Attack, and recording fake interactions, we therefore require a reputation mechanism.
Tribler currently has no integrated reputation mechanism. In the past, we used BarterCast to prevent free-riding in our BitTorrent network, however, BarterCast has been removed. It is important to select a reputation algorithm that is (1) computationally efficient, and (2) Sybil-resistant. We require computational efficiency to minimise the impact of Tribler on user hardware. We cannot make assumptions on the computational capabilities of consumer hardware.
Our third component is an allocation mechanism, that determines who gets access to a shared resource. These local decisions are driven using trust scores computed by reputation algorithms. Allocation is at the core of commons management since improper allocation policies will still lead to a collapse of the community. Our allocation policies do not require mediation by a centralised authority; instead, community members work together to ensure fair and efficient allocation amongst all members.
We currently have an allocation mechanism implemented in Tribler. This works as follows. Each relay and exit nodes has a pre-defined number of slots and each circuit consumes a single slot. This slot-based approach biases the assignment of slots to users with a higher trustworthiness in the community. We distinguish between random slots and competitive slots. When a circuit initiation request by some user i
arrives at a relay or exit node, Tribler first checks if a random slot is available. If so, it allocates a random slot to the pending circuit. Otherwise, Tribler will compute the trustworthiness score of circuit requester i
. When this score is determined, Tribler will check if there is a free competitive slot. If so, it assigns the free competitive slot to the pending circuit. Otherwise, it compares the score against the scores of circuit initiators that currently hold a competitive slot. The score of i
is now compared to the lowest score of a circuit initiator that already has a competitive slot, say j
. In this case, the circuit of j
will be destroyed and the slot will be re-assigned to i
.
The above allocation mechanism ensures that users with higher trustworthiness scores have more chance to claim a slot when the exit nodes are congested. Furthermore, they experience higher and more stable download speeds.
The final mechanism we describe is a market mechanism. For the moment being, we consider this component optional and the key components are currently the accounting, reputation, and allocation mechanism. The market mechanism enables users to buy interactions from other users in the work graph. This provides an alternative method for users to bootstrap their trustworthiness in the network, and to recover from the negative consequences of free-riding. The details on many economic primitives, e.g., how to price these interactions, should still be determined.
This component can be fulfilled by AnyDex, our decentralized exchange which is already integrated in Tribler, but disabled by default. This market mechanism is fully decentralized and leverages TrustChain to store orders and to account all conducted trades. The order matchmaking protocol is also fully decentralized. On the market, users offer to counter-sign an interaction that transfers bandwidth from the selling user to the buying user. A single trade is broken down in multiple, smaller trades. This is called incremental settlement and lowers the value at stake. Full details on the protocol can be found here.
We now describe different attacks on our ecosystem and elaborate on possible solutions for each attack.
The Sybil Attack is one of the most challenging attacks in open decentralized systems. It involves an adversary that operates multiple, fake identities in order to subvert the network. This attack mostly happens when the cost of creating a new identity is negligible. Within Tribler, a peer can mount a Sybil Attack to leech resources from the network, possibly with many different identities. For example, Sybils can create fake edges in the work graph between themselves with the aim to boost their standing in the network.
We are well-aware of the threats that Sybils pose to our network integrity. Addressing the Sybil Attack is a cardinal research direction of our lab. Existing proposals to address Sybil Attacks at the network layer include the adoption of a Proof-of-(useful)-Work, leveraging self-sovereign identities, exploiting geographic proximity and building a latency-diverse neighbourhood in the overlay. In addition, the adopted reputation mechanism should address Sybils in some form. We require adequate protection against Sybils at different layers in our stack. This includes defence mechanisms in both the network layer and in the adopted reputation mechanisms.
A particular attack occurs when a user re-enters the network under a new identity after having leeched significant resources from the network. Similar to the Sybil Attack, this is a threat when the cost of creating a new identity is negligible. Our proposed solution is to implement a stranger policy that somehow takes into consideration the "age" of an identity. For example, an identity with 100MB uploaded and 100MB downloaded has overall performed more work for the community than an identity with 0MB uploaded and 0MB downloaded.
There is a trade-off between anonymity and accountability. Since we are accounting bandwidth traffic through our circuits, an adversary might be able to link a downloader and an exit node by careful analysis of the interaction graph. To address this issue, we plan on using aggregated, deferred payouts as introduced by Palmieri et al.
Currently, a user itself reports its token balance to an exit/relay node, by sending the latest record in the personal ledger. This approach is vulnerable to lying since a user can send any record to the exit/relay node that requested its balance. Furthermore, the exit/relay node cannot be sure that the received record is the current latest in the personal ledger of the user requesting services. The BAMI accounting mechanism is capable of preventing this attack through secure audits. With BAMI, a relay/exit node can join the community around a specific peer and request the latest state from other users.
Another attack happens when a user wishes to record many transactions within a short time period using BAMI. This spam attack can eventually deplete computer resources of the affected users, e.g., the interaction partners. BAMI ignores the transactions of users that are suspected to conduct such a spam attack.
We next describe the implementation roadmap for the discussed designs. A related roadmap, concerning improvements of the content ecosystem in Tribler, can be found here.
During the first phase, we will integrate the BAMI accounting mechanism in Tribler. BAMI provides the necessary components for our accounting mechanism. Through guaranteed eventual consistency and periodic audits by witnesses in the network, we replace TrustChain and leverage BAMI to account bandwidth transfers between users. Our aim is to integrate a first version of BAMI in the Tribler 7.6 release (planned for release this December).
A key component of BAMI is a dedicated crawler that collects and displays all transactions in the network in a convenient interface. This is critical to monitor the network health, and to analyze the robustness of the BAMI in the wild Tribler network.
During the second phase, we will integrate a Sybil-resisent, efficient reputation mechanism in Tribler. Note that work on this component can be done in parallel with the work in phase I.
TODO finish
Fair allocation of available bandwidth in a fully decentralized setting is an open and non-trivial research challenge. Existing mechanism either rely on cryptographic techniques, often sacrificing scalability, or they rely on centralized supervision. We identify the following scientific sub-challenges:
- What reputation mechanism would be most suitable to use in the described ecosystem?
- What is the effect of partial/imperfect accounting? We cannot assume that every user possesses the most recent version of the global work graph at any time. Therefore, the adopted reputation mechanism must be able to deal with partial information, e.g., a user might be not know some contributions/consumptions.