
Content popularity #3660

Merged
merged 5 commits into from
Jun 29, 2018

Conversation

xoriole
Contributor

@xoriole xoriole commented Jun 1, 2018

This pull request adds a community for sharing popular, live content with connected peers. Right now the content is torrents only, but it can be extended to channels and other types.

The sharing model is publish-subscribe. At bootstrap, a peer subscribes to a subset of its connected peers, sorted in descending order of the peers' individual trust scores.

When a peer accepts a subscription request, it sends the subscriber the most popular torrents it has checked, and then periodically sends the updated torrent-checker responses.

In case of conflict, when two peers send a torrent response for the same torrent, the decision to accept or reject a response is based on:

  • Freshness of the response: if the database record is stale, we accept the response.
  • Trust score of the sending peer: if the sending peer is trusted (trust > 1), we accept it.
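The two acceptance rules above could be sketched roughly as follows (hypothetical names and staleness threshold; the actual community code differs):

```python
import time

STALE_AFTER = 3600  # hypothetical freshness window, in seconds


def should_accept_response(db_last_check, sender_trust, now=None):
    """Decide whether to accept an incoming torrent-check response.

    db_last_check -- timestamp of our local database record for this torrent
    sender_trust  -- trust score of the sending peer
    """
    now = time.time() if now is None else now
    # Rule 1: accept if our own database record is stale
    if now - db_last_check > STALE_AFTER:
        return True
    # Rule 2: otherwise accept only if the sender is trusted (trust > 1)
    return sender_trust > 1
```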

@@ -303,6 +304,21 @@ def load_ipv8_overlays(self):

        self.ipv8.strategies.append((RandomWalk(self.market_community), 20))

        # Popular Community
        if self.session.config.get_popular_community_enabled() or True:


Should this line always evaluate to True?

Contributor Author

Thanks! Good catch.

@xoriole xoriole changed the title Content popularity WIP: Content popularity Jun 1, 2018
@xoriole xoriole force-pushed the popular_channel branch from 6511102 to e41b41a Compare June 1, 2018 15:17
    # Popular Community

    def get_popular_community_enabled(self):
        return self.config['popular_community']['enabled'] or True


This one is likely a mistake too.
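For illustration, the getter without the `or True` would simply return the configured flag. A minimal sketch (`TriblerConfig` here is a hypothetical stand-in for the real config object):

```python
class TriblerConfig:
    """Minimal hypothetical stand-in for the real session config object."""

    def __init__(self, enabled):
        self.config = {'popular_community': {'enabled': enabled}}

    def get_popular_community_enabled(self):
        # Return the configured flag as-is; appending `or True`
        # would make this method always return True.
        return self.config['popular_community']['enabled']
```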

@xoriole xoriole force-pushed the popular_channel branch 13 times, most recently from 4e56d0a to 48bdeea Compare June 8, 2018 09:19
@xoriole
Contributor Author

xoriole commented Jun 8, 2018

retest this please

@xoriole xoriole force-pushed the popular_channel branch 5 times, most recently from 21fb686 to 00659e4 Compare June 13, 2018 15:08
@xoriole xoriole force-pushed the popular_channel branch 2 times, most recently from dcc815a to 4bb446e Compare June 18, 2018 14:45
@xoriole
Contributor Author

xoriole commented Jun 22, 2018

retest this please

@@ -873,6 +889,9 @@ def early_shutdown(self):
        yield self.ipv8.unload_overlay(self.tunnel_community)
        yield self.ipv8.unload_overlay(self.triblerchain_community)

        if self.popular_community:
Contributor

Why do you unload this separately? The tunnel and triblerchain are only unloaded the way they are to enforce the unload ordering.

@xoriole xoriole force-pushed the popular_channel branch 2 times, most recently from 0c3d7f1 to bfc39e5 Compare June 26, 2018 13:47
        query = "ubuntu"
        self.nodes[0].overlay.send_torrent_search_request(query)

        yield self.deliver_messages()
Contributor

Where are the asserts?

        serialized_response = self.serializer.pack_multiple(in_response.to_pack_list())

        # Deserialize request and test it
        (deserialized_ressponse, _) = self.serializer.unpack_multiple(ContentInfoResponse.format_list,
Contributor

Too many s's

@xoriole xoriole force-pushed the popular_channel branch 2 times, most recently from bc19910 to c99e53e Compare June 28, 2018 16:15
@qstokkink
Contributor

The C0330 and R1707 Pylint errors should still be fixed.

@xoriole xoriole force-pushed the popular_channel branch 2 times, most recently from cbcc9d1 to 97960d5 Compare June 28, 2018 16:52
@xoriole xoriole changed the title WIP: Content popularity READY: Content popularity Jun 28, 2018
devos50
devos50 previously approved these changes Jun 28, 2018
@devos50
Contributor

devos50 commented Jun 28, 2018

@xoriole I guess this PR is almost ready for merge. If you still have some time, please clean up your commit history so you have a logical sequence of commits (i.e. three commits: added pubsub community, added popularity community, improved tests). @qstokkink can you also give your seal of approval soon?

I've updated the SonarQube technical debt threshold to 24 days, so it should pass when you retest this PR. This is really not the way we should manage technical debt, but I guess that @ichorid will significantly lower it once the ORM model becomes active (since we can remove all the Tribler database crap then).

qstokkink
qstokkink previously approved these changes Jun 29, 2018
Contributor

@ichorid ichorid left a comment

Please address these issues. They are easy to fix, but neglecting them would have a large impact on security.

        elif peer in self.publishers:
            self.publishers.remove(peer)

    def create_message_packet(self, message_type, payload):
Contributor

Yep, we definitely have to put this into IPv8 for everyone's convenience.

        payload = payload if isinstance(payload, list) else payload.to_pack_list()
        return self._ez_pack(self._prefix, message_type, [auth, dist, payload])

    def broadcast_message(self, packet, peer=None):
Contributor

I second this.

        """
        # Add or remove the publisher peer
        if subscribe:
            self.publishers.add(peer)
Contributor

It seems we add the publisher twice: once here (on sending the subscribe packet), and once on receiving the confirmation packet, line 178:

       if payload.subscribe:
           self.publishers.add(peer)

Is this by design, or am I missing something?

If I were designing this system, I would only add peers to my publishers list on the response packet. And to prevent abuse, I would put a nonce in each subscribe packet, remember it, and only accept response packets carrying that nonce (this would also solve the packet loss/timeout problem, by the way).
How do you handle timeouts?
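The nonce-tracking handshake suggested above could be sketched like this (hypothetical code, not the Tribler implementation):

```python
import os


class SubscriptionHandshake:
    """Sketch: remember a random nonce per subscribe request and only
    accept a confirmation that echoes a known nonce from the same peer."""

    def __init__(self):
        self.pending = {}        # nonce -> peer the subscribe request went to
        self.publishers = set()

    def send_subscribe(self, peer):
        # Random nonce: an attacker cannot guess it from earlier ones.
        nonce = int.from_bytes(os.urandom(4), 'big')
        self.pending[nonce] = peer
        return nonce  # would be carried in the subscribe packet

    def on_confirmation(self, peer, nonce):
        # Accept only a confirmation that echoes a nonce we sent to this peer.
        if self.pending.get(nonce) == peer:
            del self.pending[nonce]   # one-shot: prevents replay
            self.publishers.add(peer)
            return True
        return False
```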

        peer = self.get_peer_from_auth(auth, source_address)

        if payload.subscribe:
            self.publishers.add(peer)
Contributor

How do you prevent abuse of these packets? Do you track unanswered subscribe requests?

            self.publishers.remove(peer)

        # Create subscription packet and send it
        subscription = ContentSubscription(cache.number, subscribe)
Contributor

If cache.number is incremented sequentially, so that after seeing one value you can guess the next one, you still have a security hole here. Imagine the following attack scenario:
Alice sends a subscription message to Bob. Bob's private key is compromised by Charlie. Charlie eavesdrops on the message, extracts the cache.number from it, and generates a long sequence of cache.numbers that follow the original one. Next, Charlie creates some Sybil private keys and starts to bombard Alice's machine with ContentSubscription messages. From that moment, any of Alice's subscription requests will end up subscribing to Charlie.
You should never use a non-random nonce in crypto.
So, if cache.number is generated at random, you're good. If not, you're screwed.
Please check it.

There is another way to fix this problem. You already store the whole packet in the cache, so you can check whether the reply actually came from the peer you sent the original request to.
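That alternative check might look roughly like this (a sketch with a hypothetical cache entry, not the actual RequestCache API):

```python
class ContentRequestEntry:
    """Hypothetical cache entry that also remembers the request's target."""

    def __init__(self, number, target_peer):
        self.number = number          # identifier of the outstanding request
        self.target_peer = target_peer


def accept_reply(cache_entry, reply_peer):
    # Accept the confirmation only if it comes from the peer the
    # original subscribe request was actually sent to.
    return reply_peer == cache_entry.target_peer
```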

Contributor Author

cache.number is generated randomly.

        cache = self.request_cache.add(ContentRequest(self.request_cache, MSG_SUBSCRIBE, None))
        # Add or remove the publisher peer
        if subscribe:
            self.publishers.add(peer)
Contributor

Even though you're using a set to store the publishers, you should still only add them on receiving a successful ContentSubscription message. Otherwise the confirmation message is useless, and you become vulnerable to packet loss.

Please make publishers only be added on a successful confirmation message.
(And by the way, do subscriptions time out?)

Contributor Author

Good point! Will make the change.
Subscriptions do not time out. However, they are periodically refreshed to compensate for disconnected peers.

@qstokkink qstokkink changed the title READY: Content popularity Content popularity Jun 29, 2018
@qstokkink qstokkink merged commit 727b40d into Tribler:next Jun 29, 2018
@synctext
Member

synctext commented Sep 5, 2018

It would be great to post some measurement graphs here.

  • Create a crawler
  • Obtain an anonymized dataset
  • Conduct a performance evaluation
  • Improve performance and deploy fixes as needed

Q: quality of the popularity gossip information — freshness, coverage, usability, overlap, bandwidth cost, etc.

@xoriole xoriole deleted the popular_channel branch July 3, 2019 14:00