Explore using Libtorrent as our database, filesystem, and dissemination solution #3484

synctext · 2018-02-28T09:20:48Z

Seed, stop seeding, modify file/blob, redo hash check, seed.

Block alignment is essential. Bittorrent pieces align with fixed-sized Trustchain records? Or variable size? Filesystem for multiple chains?
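To make the alignment question concrete, here is a minimal sketch of the arithmetic for fixed-size records. The function names and the sizes used are hypothetical and only illustrate the idea; nothing here is taken from Trustchain or libtorrent.

```python
def pieces_for_record(index, record_size, piece_size):
    """Return (first_piece, last_piece) touched by the fixed-size record at `index`."""
    start = index * record_size
    end = start + record_size - 1  # offset of the record's last byte
    return start // piece_size, end // piece_size


def straddles(index, record_size, piece_size):
    """True if the record crosses a piece boundary and so needs two pieces to verify."""
    first, last = pieces_for_record(index, record_size, piece_size)
    return first != last


# Hypothetical sizes: 16 KiB pieces with 4 KiB records versus 5000-byte records.
aligned = [straddles(i, 4096, 16384) for i in range(100)]
misaligned = [straddles(i, 5000, 16384) for i in range(100)]
```

When the piece size is a multiple of the record size, no record ever straddles a boundary, so changing one record invalidates exactly one piece hash. Variable-size records would need padding or an index to get the same property.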

First: seek related work! This stuff has been beaten to death in the past 30 years. Post papers, architectures, ideas.

ichorid · 2018-02-28T09:58:20Z

We could use Linear Tape File System for this stuff.

ichorid · 2018-02-28T10:05:54Z

As everyone will have their own blockchain/filesystem, it would be reasonably fast, and one could defragment it occasionally to clean out unneeded data. To do incremental updates, one could use the torrent "update torrent" feature, adding new variable-sized blocks in the form of files. This could be described as a "streamed filesystem".
The only problem I can see with this approach is update latency: it is not well suited to usage scenarios that require sub-second updates.

qstokkink · 2018-03-01T13:39:00Z

We could use this: https://www.libtorrent.org/manual-ref.html#ssl-torrents for tying the libtorrent download to the channel owner's public key on the transport level (SSL).

Also with the optimize_alignment flag we can have libtorrent pad files automatically. We would just have to make sure to keep the separate files below the piece size.
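As a rough illustration of what piece alignment through padding means, the sketch below computes where each file would start if every file were padded up to the next piece boundary. This is only the arithmetic, with hypothetical helper names; it is not a description of what libtorrent does internally when the flag is set.

```python
PIECE_SIZE = 16 * 1024 * 1024  # 16 MiB, same as the prototype in this comment


def pad_to_piece(size, piece_size=PIECE_SIZE):
    """Number of padding bytes needed to reach the next piece boundary."""
    return (-size) % piece_size


def aligned_layout(file_sizes, piece_size=PIECE_SIZE):
    """Return a list of (start_offset, size, padding) tuples, one per file."""
    layout = []
    offset = 0
    for size in file_sizes:
        pad = pad_to_piece(size, piece_size)
        layout.append((offset, size, pad))
        offset += size + pad
    return layout


# Three hypothetical chunk files: tiny, exactly one piece, and 5 MiB.
layout = aligned_layout([10, PIECE_SIZE, 5 * 1024 * 1024])
```

With this layout every file starts on a piece boundary, so a file no larger than one piece maps to exactly one piece hash and can be replaced without touching its neighbours.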

Here is some prototype code for fitting arbitrary key-value stores into files for a torrent with 16 MB pieces:

import os

from libtorrent import (add_files, bdecode, bencode, create_torrent, create_torrent_flags_t,
                        file_storage, set_piece_hashes)


PIECE_SIZE = 16*1024*1024 # 16 MB


class Chunk(object):

    def __init__(self):
        super(Chunk, self).__init__()
        self.data = {}
        self.current_length = 0
        self.max_length = PIECE_SIZE - 2 # 16MB - len('d') len('e')

    def add(self, key, value):
        key_len = len(key)
        value_len = len(value)
        combined_len = len(str(key_len)) + len(str(value_len)) + key_len + value_len + 4

        if self.current_length + combined_len <= self.max_length:
            self.data[key] = value
            self.current_length += combined_len
            return True
        return False

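    def remove(self, key):
        # Tolerate missing keys: ChunkedTable.remove() tries every chunk.
        value = self.data.pop(key, None)
        if value is not None:
            # Give back the space that add() reserved for this pair.
            self.current_length -= len(str(len(key))) + len(str(len(value))) + len(key) + len(value) + 4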

    def serialize(self):
        return bencode(self.data)

    @classmethod
    def unserialize(cls, data):
        out = cls()
        for key, value in bdecode(data).items():
            out.add(key, value)
        return out


class ChunkedTable(object):

    def __init__(self):
        super(ChunkedTable, self).__init__()
        self.chunklist = {}

    def add(self, key, value):
        # Try the existing chunks first, then open a new one.
        for chunk in self.chunklist.values():
            if chunk.add(key, value):
                return True
        chunk = Chunk()
        if not chunk.add(key, value):
            return False # key value pair too large for any container
        self.chunklist[len(self.chunklist)] = chunk
        return True

    def remove(self, key):
        for chunk in self.chunklist.values():
            chunk.remove(key)

    def serialize(self):
        out = {}
        for i in range(len(self.chunklist)):
            out[str(i)] = self.chunklist[i].serialize()
        return out

    @classmethod
    def unserialize(cls, map):
        chunk_table = ChunkedTable()
        for i in map.keys():
            chunk_table.chunklist[int(i)] = Chunk.unserialize(map[i])
        return chunk_table

    def get_all(self):
        out = {}
        for chunk in self.chunklist.values():
            out.update(chunk.data)
        return out


class Channel(object):

    def __init__(self, name, directory=".", allow_edit=False):
        super(Channel, self).__init__()

        self.name = name
        self.channel_directory = os.path.abspath(os.path.join(directory, name))
        if not os.path.isdir(self.channel_directory):
            os.makedirs(self.channel_directory)
        self.chunked_table = ChunkedTable()

    def add_magnetlink(self, magnetlink):
        self.chunked_table.add(magnetlink, "")

    def remove_magnetlink(self, magnetlink):
        self.chunked_table.remove(magnetlink)

    def get_magnetlinks(self):
        return self.chunked_table.get_all().keys()

    def commit(self):
        # Bencoded chunks are binary data, so write them in binary mode.
        for filename, content in self.chunked_table.serialize().items():
            with open(os.path.join(self.channel_directory, filename), 'wb') as f:
                f.write(content)

    def make_torrent(self):
        fs = file_storage()
        add_files(fs, self.channel_directory)
        flags = create_torrent_flags_t.optimize | create_torrent_flags_t.calculate_file_hashes
        t = create_torrent(fs, piece_size=PIECE_SIZE, flags=flags)
        t.set_priv(False)
        # File paths are relative to the parent of the channel directory.
        set_piece_hashes(t, os.path.dirname(self.channel_directory))
        torrent_name = os.path.join(self.channel_directory, self.name + ".torrent")
        with open(torrent_name, 'wb') as f:
            f.write(bencode(t.generate()))
        return torrent_name

    def load(self):
        files = os.listdir(self.channel_directory)
        data = {}
        for filename in files:
            if filename.isdigit():
                with open(os.path.join(self.channel_directory, filename), 'rb') as f:
                    data[filename] = f.read()
        self.chunked_table = ChunkedTable.unserialize(data)

# TEST
channel = Channel('mychannel', allow_edit=True)
channel.add_magnetlink('a'*20)
channel.add_magnetlink('b'*20)
channel.remove_magnetlink('a'*20)
channel.commit()
torrent = channel.make_torrent()

discovered_channel = Channel('mychannel')
discovered_channel.load()
print(list(discovered_channel.get_magnetlinks()))
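The size bookkeeping in Chunk.add() can be checked without libtorrent. Below is a minimal sketch of the bencoding rules for a dictionary of byte strings, written from the format definition; bencode_bytes, bencode_dict and entry_size are hypothetical helpers for this check, not libtorrent functions.

```python
def bencode_bytes(value):
    """A bencoded byte string is '<decimal length>:<data>'."""
    return str(len(value)).encode("ascii") + b":" + value


def bencode_dict(mapping):
    """A bencoded dict is 'd' + key/value pairs in sorted key order + 'e'."""
    body = b"".join(bencode_bytes(k) + bencode_bytes(mapping[k]) for k in sorted(mapping))
    return b"d" + body + b"e"


def entry_size(key, value):
    """Exact number of bytes one key/value pair adds to the encoded dict."""
    return len(str(len(key))) + 1 + len(key) + len(str(len(value))) + 1 + len(value)


data = {b"a" * 20: b"", b"b" * 20: b"xyz"}
encoded = bencode_dict(data)
exact = 2 + sum(entry_size(k, v) for k, v in data.items())
```

Each pair costs two separator bytes (one colon per string), so the `+ 4` in the prototype over-reserves two bytes per entry. That errs on the safe side: a chunk can end up slightly under the piece size but never over it.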

arvidn · 2018-03-03T17:46:18Z

there's even a mutable_torrent_support flag for create_torrent(), to configure it to pad large files to be piece aligned.

qstokkink · 2018-03-03T20:28:34Z

@arvidn thanks, it seems we won't even need to make it a merkle tree torrent with our structure then.

qstokkink · 2018-03-03T21:25:04Z

Moved the development of this to my fork: https://github.com/qstokkink/tribler/blob/allchannel2/Tribler/community/allchannel2/structures.py

devos50 · 2018-03-04T18:26:02Z

Random idea: dissemination of content metadata could be rewarded with bandwidth tokens when being performed over anonymous tunnels (see ticket #3337). This would work better if we have (a set of) thumbnails attached to each content torrent. Whether this reward scheme is a good idea or not, is open for discussion. We could even provide something like 'hidden channels' where channel content metadata is only seeded over end-to-end tunnels. I'm not sure about the legal implications of this though.

qstokkink · 2018-03-04T19:48:26Z

@devos50 In principle I think this is a good idea. I also added a secret feature in #3489 to directly store metadata alongside the magnetlinks in the channels. Providing incentive to share the channels would be good.

It does strike me as overkill for most channels to actually use tunnels for infohash dissemination. Actually downloading the channel contents might lend itself more to anonymization & payout.

This brings up another question: should you be paid equally for relaying tunnel traffic, exiting tunnel traffic and sharing channels? Should it be a marketplace which can be mined?

synctext · 2018-03-04T20:52:23Z

Let's keep things as simple as possible for 2018.

qstokkink · 2018-03-05T08:55:46Z

#3489

synctext · 2018-03-13T07:46:26Z

Attacking Merkle Trees with a second preimage attack https://news.ycombinator.com/item?id=16572793

ichorid · 2019-01-13T20:06:43Z

TrustChain blocks can have arbitrary size and contents. Currently, we store the user's TrustChain in the SQLite database and spread it out using IPv8 queries. But TrustChain is, by definition, an append-only data structure. This means we can write it into a file on disk as it grows, and periodically publish it in a torrent, as we do with GigaChannels.

In fact, GigaChannel already features simple and efficient code to do just that: periodically dump binary data into file chunks, dynamically compressing the dumped data with LZ4 (and serving queries through IPv8).
GigaChannel allows us to hone our methods of managing collective access to append-only data created by users. If it proves successful, GigaChannel can eventually become one with TrustChain.
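A minimal sketch of the "dump append-only data into compressed file chunks" idea, to show the shape of the approach. It is not GigaChannel's code: the function names, the length-prefix framing and the chunk limit are assumptions, and zlib stands in for LZ4 so the example only needs the standard library.

```python
import os
import struct
import tempfile
import zlib

CHUNK_LIMIT = 1024  # hypothetical uncompressed bytes per chunk file


def _write_chunk(directory, index, framed_records):
    """Compress one batch of framed records into a numbered chunk file."""
    path = os.path.join(directory, "%06d.chunk" % index)
    with open(path, "wb") as f:
        f.write(zlib.compress(b"".join(framed_records)))
    return path


def dump_chunks(records, directory, chunk_limit=CHUNK_LIMIT):
    """Write records in order, starting a new chunk file when the limit is reached."""
    paths, pending, pending_size = [], [], 0
    for record in records:
        framed = struct.pack(">I", len(record)) + record  # 4-byte length prefix
        if pending and pending_size + len(framed) > chunk_limit:
            paths.append(_write_chunk(directory, len(paths), pending))
            pending, pending_size = [], 0
        pending.append(framed)
        pending_size += len(framed)
    if pending:
        paths.append(_write_chunk(directory, len(paths), pending))
    return paths


def load_chunks(paths):
    """Read the chunk files back in order and return the original records."""
    records = []
    for path in paths:
        with open(path, "rb") as f:
            data = zlib.decompress(f.read())
        offset = 0
        while offset < len(data):
            (length,) = struct.unpack_from(">I", data, offset)
            offset += 4
            records.append(data[offset:offset + length])
            offset += length
    return records


directory = tempfile.mkdtemp()
blocks = [("block-%03d" % i).encode("ascii") * 4 for i in range(100)]
paths = dump_chunks(blocks, directory)
```

Because the data is append-only, chunk files that are already full never change, so only the newest chunk would need rehashing before the directory is republished as a torrent.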

As a first step towards this goal, we could add sidechain support to TrustChain, so we can experiment with various forms of sidechain offloading.

synctext · 2019-01-13T20:15:48Z

Please focus on a minimal viable PR. This is outside the sprint scope. We have another ticket on using torrents for Trustchain, it seems like a smart idea. We will explore later.

ichorid · 2019-01-13T21:00:39Z

@synctext, this was exactly my point: we first finish the current GigaChannel PR and release 7.2. Then, once we have enough experience with using Libtorrent as a channel dissemination engine, we continue with this issue (TrustChain merging).

devos50 · 2019-02-28T09:41:21Z

We believe this issue has been sufficiently addressed by our recent GigaChannel efforts. We are now using libtorrent as the underlying mechanism to disseminate magnet links and metadata.

synctext added this to the Backlog milestone Feb 28, 2018

synctext added infrastructure labels Feb 28, 2018

synctext mentioned this issue Mar 5, 2018

crowdsourcing Metadata #2455

Closed

synctext mentioned this issue Apr 28, 2018

Refactoring IPv8 class structure Tribler/py-ipv8#123

Closed

2 tasks

qstokkink mentioned this issue Jun 15, 2018

Redesign of the Search/Channels feature #3615

Closed

qstokkink added the AllChannel 2.0 label Aug 13, 2018

qstokkink modified the milestones: Backlog, V7.2: Credit mining and trading Aug 13, 2018

devos50 mentioned this issue Jan 13, 2019

Serving TrustChain as a Torrent #4145

Closed

devos50 closed this as completed Feb 28, 2019
