Spam protection of Swarm through modifying GC rules #757
This makes sense to me. The only other enhancement I'd suggest is that (if it can be done) retrieval requests that originate on the local node do not move chunks to the top of the queue either. On the other hand, if I use a dapp often, I would want the dapp data cached locally... In the end I think we may have to explore keeping a separate chunk queue for locally requested chunks (i.e. any chunk requested as a result of an HTTP request enters the local chunk queue and any chunk received over bzz enters the network chunk queue).
Or, as we just discussed in the standup, we can treat chunks that were requested locally through HTTP requests similarly to synced chunks. That is, we don't insert them at the top of the chunk queue but at some α (either the same α as for syncing, or a different one).
It is not clear to me why we first say
but then later:
In this context it's not clear why. Does it mean that a constant number of chunks will never be evicted, i.e. that the actual "configurable" capacity is smaller than the actual capacity? More clarity please; be aware that there are math morons like me :) So the queue would look like:
where
It sounds like the following:
isn't the usual insurance (or is it?), but rather a new concept of
@homotopycolimit I think your proposal is very good. A local cache would also solve the "binge" swarmflix issue, but it probably requires more work.
A single queue, with one constant α for synced chunks and another constant α' for uploaded chunks to set their GC priority relative to requested chunks, should be good enough. Keeping two or three queues would likely add more work to each GC run. The proposal is very good and it does make sense to have it implemented.
Yes, I agree with the latter option. Should I amend the text of the ticket?
α = 0 means the head of the queue and α = 1 means the end of the queue. Everything else means: somewhere in the middle.
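For example (my own numbers, just to illustrate): with n = 1000 chunks in the queue, α = 0 gives an insertion index of int(0 · 1000) = 0 (the head), α = 0.5 gives int(0.5 · 1000) = 500 (the middle), and α = 1 gives int(1 · 1000) = 1000 (appending at the very tail).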
No, because everything that is at the head of the queue can eventually move down as other chunks move to the front of the queue through retrieval requests. But it does mean that at any one time there are chunks (the entire head of the queue up to α·n) that cannot be evicted through new syncing/uploads.
It does mean though that both caches (up to α for RetrieveRequests and from α to the end for SyncRequests) become smaller. I wonder if it makes sense to make α configurable for the user (a user may be running a node solely for profitability).
There still is only one cache and it stays the same size. The question is just: when the cache is full, what gets garbage collected first?
Our main site at theswarm.eth got GC'd. We need to implement these "anti-spam" rules soon.
Nodes get paid through Swap only when they serve a chunk upon a retrieval request. Thus, Swap incentivizes them to store profitable chunks, i.e. ones that can be expected to be requested soon.
Therefore it is in the node's best interest, and, as argued below, in that of the network, not to treat fresh uploads and synced chunks identically to retrieval requests when deciding which chunk to garbage-collect once storage is saturated.
One common method of keeping chunks ordered by expected profitability is to keep them in a queue and move a chunk to the head of that queue whenever it is requested, while garbage collection is performed on the tail. With a suitable data structure, such as a doubly linked list, all of the aforementioned operations have O(1) complexity. Newly synced or uploaded chunks (the two are indistinguishable) can be inserted at the k = int(αn)-th position, where n is the size of the queue and α is a constant real parameter between 0 and 1. Note that while finding the k-th element of a queue is an O(k) operation, tracking the k-th element requires only O(1) work at each update of the queue.
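For illustration only, here is a minimal sketch of such a queue in Go (the names `gcQueue`, `Retrieved`, `Synced` and `Evict` are made up for this example and are not taken from the swarm codebase). It keeps a marker pointing at the int(αn)-th element; because every operation changes n or the size of the protected front segment by at most one, the marker only ever has to move a step or two, which is the O(1) tracking mentioned above:

```go
// Minimal sketch of the proposed GC queue (illustrative names, not the actual swarm code): head = most recently requested, tail = next GC victim.
package main

import "container/list"

type item struct {
	addr      string
	protected bool // true while the chunk sits strictly in front of the marker
}

type gcQueue struct {
	alpha  float64                  // 0 = head of the queue, 1 = tail
	ll     *list.List               // front = safest, back = next to be evicted
	elems  map[string]*list.Element // chunk address -> queue element
	marker *list.Element            // element currently at index int(alpha*n)
	front  int                      // number of elements strictly in front of the marker
}

func newGCQueue(alpha float64) *gcQueue {
	return &gcQueue{alpha: alpha, ll: list.New(), elems: make(map[string]*list.Element)}
}

// target is the desired size of the protected front segment: int(alpha*n).
func (q *gcQueue) target() int { return int(q.alpha * float64(q.ll.Len())) }

// rebalance nudges the marker so that front == target; every operation below
// changes n or front by at most one, so this amounts to O(1) work per update.
func (q *gcQueue) rebalance() {
	if q.marker == nil {
		return
	}
	for q.front > q.target() { // shrink the protected front segment
		q.marker = q.marker.Prev()
		q.marker.Value.(*item).protected = false
		q.front--
	}
	for q.front < q.target() && q.marker.Next() != nil { // grow it
		q.marker.Value.(*item).protected = true
		q.marker = q.marker.Next()
		q.front++
	}
}

// Retrieved moves a chunk that served a retrieval request to the head.
func (q *gcQueue) Retrieved(addr string) {
	e, ok := q.elems[addr]
	if !ok {
		return
	}
	if e == q.marker {
		if e.Prev() == nil {
			return // already at the head, nothing to do
		}
		q.marker = e.Prev() // hand the marker role to the predecessor
		q.marker.Value.(*item).protected = false
		q.front--
	}
	if it := e.Value.(*item); !it.protected {
		it.protected = true // promoted from behind the marker
		q.front++
	}
	q.ll.MoveToFront(e)
	q.rebalance()
}

// Synced inserts a freshly synced or uploaded chunk at index int(alpha*n).
func (q *gcQueue) Synced(addr string) {
	if _, ok := q.elems[addr]; ok {
		return
	}
	if q.marker == nil {
		q.marker = q.ll.PushFront(&item{addr: addr})
	} else {
		q.marker = q.ll.InsertBefore(&item{addr: addr}, q.marker)
	}
	q.elems[addr] = q.marker
	q.rebalance()
}

// Evict removes and returns the chunk address at the tail (the GC victim).
func (q *gcQueue) Evict() (string, bool) {
	e := q.ll.Back()
	if e == nil {
		return "", false
	}
	if e == q.marker {
		q.marker = e.Prev()
		if q.marker != nil {
			q.marker.Value.(*item).protected = false
			q.front--
		}
	}
	it := q.ll.Remove(e).(*item)
	delete(q.elems, it.addr)
	q.rebalance()
	return it.addr, true
}
```

In this sketch, a retrieval request for a chunk behind the marker promotes that chunk to the head and pushes one formerly protected chunk back across the boundary, so the front segment always holds the int(αn) most recently requested chunks and fresh uploads can never displace them.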
Even an arbitrarily powerful DDoS attack flooding the network with bogus chunks cannot force a guaranteed fraction (namely α) of the most popularly requested chunks out of Swarm. Yet, new uploads are not immediately garbage collected, thus a simple flooding attack won't make uploads impossible either, especially if the uploader is willing to pay the Swap price of moving their content to the front of the queue.
Of course, the latter option is available to the DDoS attacker as well, but it is not free. The larger the Swarm and the more storage space the nodes have, the more expensive it becomes to DDoS it effectively, and a large part of the cost is transferred directly from the attacker to honest nodes.