Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-Chain live edge traversal with random walks #2458

Closed
synctext opened this issue Jul 12, 2016 · 9 comments
Closed

Multi-Chain live edge traversal with random walks #2458

synctext opened this issue Jul 12, 2016 · 9 comments

Comments

@synctext
Copy link
Member

synctext commented Jul 12, 2016

Goal: every peer conducts a crawl of it's neighbors and stores a duplicate of their complete chain.
This walk is random and resilient against peer failure.

Approach: each peer conducts a keep-alive to a peer which is Multichain record is created with. Incoming introduction-responses are given to any peer to which we have a Multichain record. The result is a graph traversal across Multichain edges, live edges as only online peers are eligible.

random_walks

Each random walk starts at yourself. A random walk is conducted of, for instance, 4 steps deep. After these steps a teleport home is conducted. Each visit of a peer results in the crawling of one or more Multichain records. Shown above is the effect of repeated random walks and the resulting graph sampling.

Open questions: consider any Multichain record, or only incoming/outgoing edges as a walk candidate.

@synctext synctext added this to the Backlog milestone Jul 12, 2016
@pimveldhuisen
Copy link
Member

Design for a dispersy modification to use trust based peer selection:

Currently dispersy keeps a list of around 11 peers that are selected by random walks. Every 5 seconds a random peer is replaced. To make this trust based:

  1. For every new peer request it's last x = 100 blocks
  2. Calculate the netflow multichain score for each peer.
  3. Order all peers from lowest to highest.
  4. With probability alpha = 0.2 select the peer on the top of the list peer, otherwise go to the next peer, until a peer has been selected.
  5. Replace this peer with a new one.

Since low scoring peers have a higher chance to be replaced, this should converge the list to a list of peers with high scores. However, since there is still a chance that the best peer is replaced, ( (1-alpha)^10 * alpha) ) to avoid clogging top contributors.

The crawling incorporates all 2 hop flows from the last x blocks. Based on previous crawls the score will also incorporate multi hop flows. To incorporate more 3 hop flows, after step 1:
1.2: Get the list of active peers from the new peer.
1.3: Crawl x blocks for each of those peers.
This of course can be extended ad infinitum, but is very costly.

@pimotte
Copy link
Contributor

pimotte commented Sep 1, 2016

Johan: Specs for this component: Basic functioning hearsay mechanism. Low-risk conservative design.

@synctext
Copy link
Member Author

synctext commented Sep 3, 2016

Fault resilience and a design without any probability of cascading failure

Random graph sampling is safe, but not resilient against resource exhaustion attack.
Random fault resilience would be the main argument against making a multichain crawler directly dependent on the contents of the multichain. As proposed, start with a simple scoring function: 0=random peer, 1=connected through multichain. Deploy and learn.

@synctext
Copy link
Member Author

synctext commented Sep 7, 2016

The key challenge and danger for 2016 and 2017 is that we DDoS ourself. In scientific turns this is the problem of load balancing. The sybil attack is more medium to long-term. Hence my expressed desire for a conservative design, that just works.

@synctext
Copy link
Member Author

synctext commented Sep 8, 2016

Key piece of theory for msc thesis "Problem Description": https://www.semanticscholar.org/paper/Estimating-and-Sampling-Graphs-with-Ribeiro-Towsley/2337ba01e237a47c5f965474c1f2d4f4ee4f2643
Not that we want "Frontier Sampling", but the description, theory, and resource-constrained sampling idea is exactly the problem we have in Tribler ecosystem.

@synctext
Copy link
Member Author

synctext commented Jan 9, 2017

Simulation results show that a "Pim-Rank" biased walk on the network gives network overloads at certain nodes. Simulation with 720 nodes and 45000 edges. One or more nodes get 14000 requests to process, while the average outgoing load is making 720 requests (1 req / 5 sec for 1h)

load

@pimveldhuisen
Copy link
Member

pimveldhuisen commented Jan 23, 2017

WIP: screenshot from 2017-01-23 17-17-31

pimveldhuisen@4c07a82

@pimveldhuisen
Copy link
Member

I think this issue should be closed or reassigned to @qstokkink

@qstokkink
Copy link
Contributor

As both Tribler and IPv8 have a live edge implementation, I will close this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

4 participants