Explore using Libtorrent as our database, filesystem, and dissemination solution #3484

Closed
synctext opened this issue Feb 28, 2018 · 15 comments


@synctext
Member

Seed, stop seeding, modify the file/blob, redo the hash check, seed again.

Block alignment is essential. Should BitTorrent pieces align with fixed-size TrustChain records, or with variable-size ones? And how do we lay out a filesystem for multiple chains?
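To illustrate why alignment matters for the seed / modify / re-hash cycle, here is a minimal sketch, assuming BitTorrent v1-style SHA-1 piece hashes, a toy piece size of 16 bytes and hypothetical 8-byte records. Modifying a fixed-size record that sits inside one piece changes only that piece's hash, while a variable-size record can straddle a piece boundary and touch two pieces. The helper names are illustrative, not libtorrent API.

```python
import hashlib

PIECE_SIZE = 16  # toy piece size in bytes; real torrents use far larger pieces
RECORD_SIZE = 8  # hypothetical fixed record size that divides PIECE_SIZE evenly


def piece_hashes(blob, piece_size=PIECE_SIZE):
    """SHA-1 of each piece, as in BitTorrent v1 (the last piece may be short)."""
    return [hashlib.sha1(blob[i:i + piece_size]).digest()
            for i in range(0, len(blob), piece_size)]


def pieces_touched(offset, length, piece_size=PIECE_SIZE):
    """Indices of the pieces overlapping the byte range [offset, offset + length)."""
    first = offset // piece_size
    last = (offset + length - 1) // piece_size
    return list(range(first, last + 1))


def changed_pieces(old, new, piece_size=PIECE_SIZE):
    """Indices whose piece hash differs between two equally long blobs."""
    return [i for i, (a, b) in enumerate(zip(piece_hashes(old, piece_size),
                                             piece_hashes(new, piece_size)))
            if a != b]


# Eight fixed-size records, two per piece.
records = [bytes([65 + i]) * RECORD_SIZE for i in range(8)]
blob = b"".join(records)

# Modify record 5 in place: only the piece containing it needs re-hashing.
records[5] = b"z" * RECORD_SIZE
modified = b"".join(records)
print(pieces_touched(5 * RECORD_SIZE, RECORD_SIZE))  # [2]
print(changed_pieces(blob, modified))                # [2]

# A variable-size record straddling a piece boundary touches two pieces.
print(pieces_touched(12, 8))                         # [0, 1]
```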

First: seek related work! This stuff has been beaten to death in the past 30 years. Post papers, architectures, ideas.

@synctext synctext added this to the Backlog milestone Feb 28, 2018
@ichorid
Contributor

ichorid commented Feb 28, 2018

We could use the Linear Tape File System (LTFS) for this.

@ichorid
Contributor

ichorid commented Feb 28, 2018

Since everyone will have their own blockchain/filesystem, it would be reasonably fast. One could also defragment it occasionally, cleaning out data that is no longer needed. For incremental updates, one could use the torrent "update torrent" feature, adding new variable-sized blocks in the form of files. This could be described as a "streamed filesystem".
The only problem I see with this approach is update latency: it is not well suited to usage scenarios that require sub-second updates.
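A minimal sketch of the "streamed filesystem" idea, assuming every update becomes a new, sequentially numbered file and that defragmentation simply drops deleted blocks and renumbers the rest. The `StreamedFS` class and its naming scheme are hypothetical; files are kept in memory rather than on disk, and publishing a new torrent version after each change is left out.

```python
class StreamedFS(object):
    """Hypothetical append-only store: every update becomes a new numbered file."""

    def __init__(self):
        self.files = {}       # file name -> block contents (in memory for brevity)
        self.deleted = set()  # names of blocks that are no longer needed
        self.next_id = 0

    def append(self, block):
        # Zero-padded names sort in append order.
        name = "%08d" % self.next_id
        self.files[name] = block
        self.next_id += 1
        return name

    def delete(self, name):
        # Only marks the block; its space is reclaimed by defrag().
        self.deleted.add(name)

    def read_all(self):
        return [self.files[n] for n in sorted(self.files) if n not in self.deleted]

    def defrag(self):
        # Rewrite the live blocks under fresh consecutive names.
        live = self.read_all()
        self.files, self.deleted, self.next_id = {}, set(), 0
        for block in live:
            self.append(block)


fs = StreamedFS()
fs.append(b"block 0")
obsolete = fs.append(b"block 1")
fs.append(b"block 2")
fs.delete(obsolete)
print(fs.read_all())     # [b'block 0', b'block 2']
fs.defrag()
print(sorted(fs.files))  # ['00000000', '00000001']
```

Appends only ever add files, so consecutive torrent versions can share all earlier files; `defrag()` renames files, so after compaction peers would likely have to fetch the whole channel again. Each update still means publishing a new torrent, which is presumably where the update latency comes from.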

@qstokkink
Contributor

qstokkink commented Mar 1, 2018

We could use SSL torrents (https://www.libtorrent.org/manual-ref.html#ssl-torrents) to tie the libtorrent download to the channel owner's public key at the transport level (SSL).

Also, with the optimize_alignment flag we can have libtorrent pad files automatically. We would just have to keep each separate file below the piece size.
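The padding arithmetic behind piece alignment, as a rough sketch: after each file, add `(-size) % piece_size` bytes of padding so the next file starts on a piece boundary. This is only the underlying calculation, not libtorrent's exact padding policy (libtorrent itself decides which files get pad files); `pad_to_piece` and `layout` are hypothetical helpers.

```python
MiB = 1024 * 1024
PIECE_SIZE = 16 * MiB


def pad_to_piece(size, piece_size=PIECE_SIZE):
    """Padding bytes needed after `size` bytes to reach the next piece boundary."""
    return (-size) % piece_size


def layout(file_sizes, piece_size=PIECE_SIZE):
    """(offset, size, pad) per file when files are stored back to back with padding.

    The last file is not padded, since nothing follows it."""
    out = []
    offset = 0
    for i, size in enumerate(file_sizes):
        pad = pad_to_piece(size, piece_size) if i < len(file_sizes) - 1 else 0
        out.append((offset, size, pad))
        offset += size + pad
    return out


files = layout([5 * MiB, 16 * MiB, 1])
print([offset // PIECE_SIZE for offset, _, _ in files])  # [0, 1, 2]
print([pad // MiB for _, _, pad in files])               # [11, 0, 0]
```

With this layout every file starts its own piece, so a file no larger than the piece size occupies exactly one piece and rewriting it invalidates only that piece's hash.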

Here is some prototype code for fitting an arbitrary key-value store into files for a torrent with 16 MB pieces:

```python
import os

from libtorrent import (add_files, bdecode, bencode, create_torrent, create_torrent_flags_t,
                        file_storage, set_piece_hashes)


PIECE_SIZE = 16*1024*1024  # 16 MB


class Chunk(object):
    """Key-value pairs whose bencoded form fits into a single piece."""

    def __init__(self):
        super(Chunk, self).__init__()
        self.data = {}
        self.current_length = 0
        self.max_length = PIECE_SIZE - 2  # 16 MB - len('d') - len('e')

    @staticmethod
    def pair_length(key, value):
        # The exact bencoded size of a pair is len(str(len(key))) + 1 + len(key)
        # plus the same for the value; '+ 4' over-counts by 2 bytes, which is safe.
        return len(str(len(key))) + len(str(len(value))) + len(key) + len(value) + 4

    def add(self, key, value):
        combined_len = self.pair_length(key, value)
        if self.current_length + combined_len <= self.max_length:
            self.data[key] = value
            self.current_length += combined_len
            return True
        return False

    def remove(self, key):
        # The key may be stored in another chunk, so a missing key is not an error.
        if key in self.data:
            value = self.data.pop(key)
            self.current_length -= self.pair_length(key, value)

    def serialize(self):
        return bencode(self.data)

    @classmethod
    def unserialize(cls, data):
        out = cls()
        for key, value in bdecode(data).items():
            out.add(key, value)
        return out


class ChunkedTable(object):
    """Spreads key-value pairs over as many Chunks as needed."""

    def __init__(self):
        super(ChunkedTable, self).__init__()
        self.chunklist = {}  # chunk index -> Chunk

    def add(self, key, value):
        for chunk in self.chunklist.values():
            if chunk.add(key, value):
                return True
        chunk = Chunk()
        if not chunk.add(key, value):
            return False  # key value pair too large for any container
        self.chunklist[len(self.chunklist)] = chunk
        return True

    def remove(self, key):
        for chunk in self.chunklist.values():
            chunk.remove(key)

    def serialize(self):
        # File name (the chunk index as a string) -> bencoded chunk.
        out = {}
        for i in range(len(self.chunklist)):
            out[str(i)] = self.chunklist[i].serialize()
        return out

    @classmethod
    def unserialize(cls, mapping):
        chunk_table = cls()
        for i in mapping.keys():
            chunk_table.chunklist[int(i)] = Chunk.unserialize(mapping[i])
        return chunk_table

    def get_all(self):
        out = {}
        for chunk in self.chunklist.values():
            out.update(chunk.data)
        return out


class Channel(object):

    def __init__(self, name, directory=".", allow_edit=False):
        super(Channel, self).__init__()

        self.name = name
        self.allow_edit = allow_edit  # not enforced yet in this prototype
        self.channel_directory = os.path.abspath(os.path.join(directory, name))
        if not os.path.isdir(self.channel_directory):
            os.makedirs(self.channel_directory)
        self.chunked_table = ChunkedTable()

    def add_magnetlink(self, magnetlink):
        self.chunked_table.add(magnetlink, "")

    def remove_magnetlink(self, magnetlink):
        self.chunked_table.remove(magnetlink)

    def get_magnetlinks(self):
        return self.chunked_table.get_all().keys()

    def commit(self):
        # One file per chunk, named after the chunk index ('0', '1', ...).
        for filename, content in self.chunked_table.serialize().items():
            with open(os.path.join(self.channel_directory, filename), 'wb') as f:
                f.write(content)

    def make_torrent(self):
        fs = file_storage()
        add_files(fs, self.channel_directory)
        flags = create_torrent_flags_t.optimize | create_torrent_flags_t.calculate_file_hashes
        t = create_torrent(fs, piece_size=PIECE_SIZE, flags=flags)
        t.set_priv(False)
        # File paths in the torrent start with the channel name, so hash relative
        # to the parent directory instead of the current working directory.
        parent_directory = os.path.dirname(self.channel_directory)
        set_piece_hashes(t, parent_directory)
        # Write the .torrent outside the channel directory, so that the next
        # add_files() call does not pick it up as channel content.
        torrent_name = os.path.join(parent_directory, self.name + ".torrent")
        with open(torrent_name, 'wb') as f:
            f.write(bencode(t.generate()))
        return torrent_name

    def load(self):
        data = {}
        for filename in os.listdir(self.channel_directory):
            if filename.isdigit():  # chunk files only
                with open(os.path.join(self.channel_directory, filename), 'rb') as f:
                    data[filename] = f.read()
        self.chunked_table = ChunkedTable.unserialize(data)

# TEST
channel = Channel('mychannel', allow_edit=True)
channel.add_magnetlink('a'*20)
channel.add_magnetlink('b'*20)
channel.remove_magnetlink('a'*20)
channel.commit()
torrent = channel.make_torrent()

discovered_channel = Channel('mychannel')
discovered_channel.load()
print(list(discovered_channel.get_magnetlinks()))
```
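To double-check the size estimate in `Chunk.add`, here is a small sketch with a hand-written bencoder for flat dicts of byte strings (standard library only, so it runs without libtorrent; `bencode_dict` and `estimated_size` are illustrative helpers, not libtorrent functions). Each bencoded string costs the digits of its length, one colon and the bytes themselves, so the `+ 4` per pair over-counts by exactly 2 bytes and a chunk can never outgrow its piece.

```python
def bencode_dict(d):
    """Bencode a flat dict of byte strings (keys sorted, as bencoding requires)."""
    out = [b"d"]
    for key in sorted(d):
        for s in (key, d[key]):
            out.append(str(len(s)).encode() + b":" + s)
    out.append(b"e")
    return b"".join(out)


def estimated_size(d):
    """The Chunk.add estimate: 'd' and 'e' plus, per pair, the digits of both
    length prefixes, both strings and 4 bytes of overhead."""
    return 2 + sum(len(str(len(k))) + len(str(len(v))) + len(k) + len(v) + 4
                   for k, v in d.items())


# Two 20-byte magnet-link stand-ins with empty values, as in the TEST section.
d = {b"a" * 20: b"", b"b" * 20: b""}
print(len(bencode_dict(d)))  # 52
print(estimated_size(d))     # 56: 2 bytes of slack per pair
```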

@arvidn

arvidn commented Mar 3, 2018

There's even a mutable_torrent_support flag for create_torrent() that configures it to pad large files so they are piece-aligned.

@qstokkink
Contributor

@arvidn thanks, it seems we won't even need to make it a Merkle tree torrent with our structure, then.


@devos50
Contributor

devos50 commented Mar 4, 2018

Random idea: dissemination of content metadata could be rewarded with bandwidth tokens when it is performed over anonymous tunnels (see ticket #3337). This would work better if we had (a set of) thumbnails attached to each content torrent. Whether this reward scheme is a good idea is open for discussion. We could even provide something like 'hidden channels', where channel content metadata is only seeded over end-to-end tunnels. I'm not sure about the legal implications of this, though.

@qstokkink
Contributor

@devos50 In principle I think this is a good idea. I also added a secret feature in #3489 to store metadata directly alongside the magnet links in the channels. Providing an incentive to share channels would be good.

It does strike me as overkill for most channels to actually use tunnels for infohash dissemination. Actually downloading the channel contents might lend itself better to anonymization and payout.

This brings up another question: should you be paid equally for relaying tunnel traffic, exiting tunnel traffic and sharing channels? Should it be a marketplace which can be mined?

@synctext
Member Author

synctext commented Mar 4, 2018

Let's keep things as simple as possible for 2018.

@qstokkink
Contributor

#3489

@synctext
Member Author

Attacking Merkle Trees with a second preimage attack https://news.ycombinator.com/item?id=16572793
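For context on the linked attack, a minimal sketch, assuming SHA-256 and a tree with a power-of-two number of leaves (the `merkle_root` helper is illustrative). If leaves and inner nodes are hashed the same way, an attacker can present the concatenated child hashes of an inner level as "leaves" and obtain the same root. Prefixing leaf inputs with 0x00 and inner-node inputs with 0x01, as RFC 6962 (Certificate Transparency) does, makes that forgery fail.

```python
import hashlib


def h(data):
    return hashlib.sha256(data).digest()


def merkle_root(leaves, leaf_prefix=b"", node_prefix=b""):
    """Root of a Merkle tree over a power-of-two number of leaves."""
    level = [h(leaf_prefix + leaf) for leaf in leaves]
    while len(level) > 1:
        level = [h(node_prefix + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


leaves = [b"a", b"b", b"c", b"d"]

# Naive tree: the second level, read as two 64-byte "leaves", has the same root.
naive_root = merkle_root(leaves)
forged = [h(leaves[0]) + h(leaves[1]), h(leaves[2]) + h(leaves[3])]
print(merkle_root(forged) == naive_root)  # True

# Domain-separated tree (RFC 6962 style): the same trick no longer works.
LEAF, NODE = b"\x00", b"\x01"
safe_root = merkle_root(leaves, LEAF, NODE)
forged = [h(LEAF + leaves[0]) + h(LEAF + leaves[1]),
          h(LEAF + leaves[2]) + h(LEAF + leaves[3])]
print(merkle_root(forged, LEAF, NODE) == safe_root)  # False
```

A verifier that knows the leaf count in advance also rejects this particular forgery, since the forged tree has fewer leaves.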

@ichorid
Contributor

ichorid commented Jan 13, 2019

TrustChain blocks can have arbitrary size and contents. Currently, we store the user's TrustChain in the SQLite database and disseminate it through IPv8 queries. But TrustChain is, by definition, an append-only data structure. This means we can write it to a file on disk as it grows, and periodically publish it in a torrent, as we do with GigaChannel.

In fact, GigaChannel already features simple and efficient code to do just that: periodically dump binary data into file chunks, dynamically compressing the dumped data with LZ4 (and serving queries through IPv8).
GigaChannel allows us to hone our methods of managing collective access to append-only data created by users. If it proves successful, GigaChannel could eventually be merged with TrustChain.
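A minimal sketch of the "dump append-only data into compressed file chunks" pattern, assuming length-prefixed records and a rollover after a fixed number of records instead of a timer. It uses zlib from the standard library as a stand-in for LZ4, and `ChunkedLog`, the file naming and the threshold are all hypothetical; this is not GigaChannel's actual code.

```python
import os
import struct
import tempfile
import zlib

MAX_CHUNK_RECORDS = 3  # hypothetical rollover threshold; a real one would be in bytes


class ChunkedLog(object):
    """Append-only records dumped into numbered, compressed chunk files."""

    def __init__(self, directory):
        self.directory = directory
        self.pending = []  # records not yet dumped to disk
        self.chunk_count = 0

    def append(self, record):
        self.pending.append(record)
        if len(self.pending) >= MAX_CHUNK_RECORDS:
            self.dump()

    def dump(self):
        if not self.pending:
            return
        # Each record is stored as a 4-byte big-endian length followed by its bytes.
        payload = b"".join(struct.pack(">I", len(r)) + r for r in self.pending)
        path = os.path.join(self.directory, "%06d.chunk" % self.chunk_count)
        with open(path, "wb") as f:
            f.write(zlib.compress(payload))  # zlib standing in for LZ4
        self.chunk_count += 1
        self.pending = []

    def read_all(self):
        records = []
        for name in sorted(os.listdir(self.directory)):
            if not name.endswith(".chunk"):
                continue
            with open(os.path.join(self.directory, name), "rb") as f:
                payload = zlib.decompress(f.read())
            pos = 0
            while pos < len(payload):
                (length,) = struct.unpack_from(">I", payload, pos)
                records.append(payload[pos + 4:pos + 4 + length])
                pos += 4 + length
        return records


with tempfile.TemporaryDirectory() as directory:
    log = ChunkedLog(directory)
    for i in range(7):
        log.append(b"block %d" % i)
    log.dump()  # flush the last, partially filled chunk
    print(log.chunk_count)      # 3
    print(len(log.read_all()))  # 7
```

Earlier chunk files are never rewritten, so each dump only adds a file, which fits the idea of periodically republishing the growing chain as a torrent.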

As a first step towards this goal, we could add sidechain support to TrustChain, so we can experiment with various forms of sidechain offloading.

@synctext
Member Author

Please focus on a minimal viable PR. This is outside the sprint scope. We have another ticket on using torrents for TrustChain; it seems like a smart idea. We will explore it later.

@ichorid
Contributor

ichorid commented Jan 13, 2019

@synctext, this was exactly my point: we first finish the current GigaChannel PR and release 7.2. Then, once we have enough experience using libtorrent as a channel dissemination engine, we continue with this issue (TrustChain merging).

@devos50
Contributor

devos50 commented Feb 28, 2019

We believe this issue has been sufficiently addressed by our recent GigaChannel efforts. We now use libtorrent as the underlying mechanism to disseminate magnet links and metadata.

@devos50 devos50 closed this as completed Feb 28, 2019
5 participants