
WIP: MetadataStore module ("Channel 2.0") #3775

Closed
wants to merge 15 commits

Conversation

ichorid
Contributor

@ichorid ichorid commented Aug 7, 2018

This module is the base of our upcoming Channel 2.0 subsystem. Currently, it provides three features:

  • PonyORM-based object/DB bindings for Torrent and Channel metadata
  • Serialization/deserialization of Torrent and Channel metadata via XDR (see the xdrlib sketch below)
  • Torrent-based dissemination of Channels (a new-style Channel can be serialized to disk in the form of a torrent)
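
A minimal sketch of what the XDR-based (de)serialization could look like, using the standard-library xdrlib module. The field names and helper functions here are illustrative only, not the actual MetadataStore schema:

import xdrlib

def serialize_torrent_md(md_dict):
    # Pack the fields in a fixed order; XDR strings carry their own length.
    packer = xdrlib.Packer()
    packer.pack_string(md_dict["public_key"])  # channel owner's serialized public key
    packer.pack_string(md_dict["infohash"])    # 20-byte torrent infohash
    packer.pack_string(md_dict["title"])       # UTF-8 encoded title
    packer.pack_uhyper(md_dict["size"])        # torrent size in bytes
    return packer.get_buffer()

def deserialize_torrent_md(blob):
    # Unpack in the same order the fields were packed.
    unpacker = xdrlib.Unpacker(blob)
    return {"public_key": unpacker.unpack_string(),
            "infohash": unpacker.unpack_string(),
            "title": unpacker.unpack_string(),
            "size": unpacker.unpack_uhyper()}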

At the moment, the module is integrated with some of the REST endpoints. This is done in a dual-stack manner (e.g. the search endpoint returns concatenated search results from both new- and old-style channels), roughly as sketched below.
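
A rough illustration of the dual-stack idea; the function and parameter names below are hypothetical, not the actual endpoint code:

def search_dual_stack(query, legacy_db, metadata_store):
    # Query both the old-style channel database and the new MetadataStore,
    # then concatenate the results into a single response.
    old_results = legacy_db.search_channels(query)
    new_results = metadata_store.search_torrents(query)
    return old_results + new_results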
By default, it is possible to load an .mdblob file containing a channel metadata entry from disk. Tribler should then automatically start downloading the corresponding torrent into the designated "channels dir" and scan that directory for binary metadata blobs when the download finishes. This is the intended usage mode for the first experimental versions: just download channels by hand.
To enable creating new-style channels, the user must tick the corresponding checkbox in the GUI "preferences" dialog. If the option is enabled, a new-style channel will be created from the "My channel" dialog.

TODO and problems

  • First, as @xoriole pointed out, we still don't know how to seed unique torrents from behind a NAT. One possible solution is hidden seeding (if it works).
  • Second, there is a LOT of stuff to be changed in the remaining REST endpoints to support new-style channel addresses (which are basically hex-encoded IPv8 public keys).
  • Third, the saved "personal channel" is not seeded by default.
  • Fourth, the current channel "torrent" form uses a "single MD entry per file" scheme. It is inefficient for very large channels, but very convenient for testing purposes.
  • Finally: add more unit tests and state checks for channel downloads, move stuff around, refactor unit tests, etc., etc.

So, the draft plan is:

  1. Fix the remaining REST endpoints (and add tests for them).
  2. Add a tracker field to the TorrentMD object and its serialized form.
  3. Get the channel torrents seeding working.
  4. Start squashing all MD entries from the same channel update into a single file.
  5. Work through TODOs and FIXMEs in the code.

Design notes

  • Some methods are "dangling" and not bound to objects, because there are still no fitting objects to attach them to. These objects should emerge eventually.
  • We have the option of making the serialization procedure completely automatic, based on the class structure (see the sketch below).
  • The class structure is designed to be general, but not too general.
  • The serialize method changes the input dict by design. This is debatable, though.
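
A rough sketch of what class-structure-driven serialization might look like; the class, field names, and type handling are purely illustrative, and the current code does not work this way yet:

import xdrlib

class TorrentMD(object):
    # Declaring the field order once would let serialization be derived
    # automatically, instead of writing a serialize method per class.
    _fields = ["infohash", "title", "size"]

    def serialize(self):
        packer = xdrlib.Packer()
        for name in self._fields:
            value = getattr(self, name)
            if isinstance(value, (int, long)):  # Python 2 integer types
                packer.pack_uhyper(value)
            else:
                packer.pack_string(value)
        return packer.get_buffer()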

@qstokkink
Contributor

Does anyone volunteer to install pony on all the testing machines? 😃

@ichorid
Contributor Author

ichorid commented Aug 8, 2018

Or, better yet, add Pony to the dependencies list and write an automatic dependency-install script...

@qstokkink
Contributor

qstokkink commented Aug 8, 2018

@ichorid #1768, #3770... our dependencies are quite a pain: some of them can be pip'd, but others are incredibly broken in their latest versions.

EDIT: Also this monstrosity: https://github.com/Tribler/tribler/blob/next/doc/development/development_on_windows.rst

@xoriole
Contributor

xoriole commented Aug 9, 2018

retest this please

@qstokkink
Contributor

qstokkink commented Aug 9, 2018

Hmm.. two segfaults in the Linux tests. @ichorid turns out it's not just your machine, yay?

Also, I'm getting the following:

Tribler.pyipv8.ipv8.util: ERROR: Exception raised on the reactor's thread TransactionError: "create_tables() cannot be called inside of db_session".
 Traceback from this thread:
  File "/home/quinten/Downloads/pycharm-community-2017.2.3/helpers/pycharm/_jb_nosetest_runner.py", line 17, in <module>
    nose.main(addplugins=[TeamcityReport()])
  File "/usr/local/lib/python2.7/dist-packages/nose/core.py", line 121, in __init__
    **extra_args)
  File "/usr/lib/python2.7/unittest/main.py", line 95, in __init__
    self.runTests()
  File "/usr/local/lib/python2.7/dist-packages/nose/core.py", line 207, in runTests
    result = self.testRunner.run(self.test)
  File "/usr/local/lib/python2.7/dist-packages/nose/core.py", line 62, in run
    test(result)
  File "/usr/local/lib/python2.7/dist-packages/nose/suite.py", line 177, in __call__
    return self.run(*arg, **kw)
  File "/usr/local/lib/python2.7/dist-packages/nose/suite.py", line 224, in run
    test(orig)
  File "/usr/local/lib/python2.7/dist-packages/nose/suite.py", line 177, in __call__
    return self.run(*arg, **kw)
  File "/usr/local/lib/python2.7/dist-packages/nose/suite.py", line 224, in run
    test(orig)
  File "/usr/local/lib/python2.7/dist-packages/nose/suite.py", line 177, in __call__
    return self.run(*arg, **kw)
  File "/usr/local/lib/python2.7/dist-packages/nose/suite.py", line 224, in run
    test(orig)
  File "/usr/local/lib/python2.7/dist-packages/nose/suite.py", line 177, in __call__
    return self.run(*arg, **kw)
  File "/usr/local/lib/python2.7/dist-packages/nose/suite.py", line 224, in run
    test(orig)
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 45, in __call__
    return self.run(*arg, **kwarg)
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 133, in run
    self.runTest(result)
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 151, in runTest
    test(result)
  File "/usr/lib/python2.7/unittest/case.py", line 393, in __call__
    return self.run(*args, **kwds)
  File "/usr/lib/python2.7/unittest/case.py", line 320, in run
    self.setUp()
  File "/home/quinten/tribler/tribler/Tribler/pyipv8/ipv8/util.py", line 14, in helper
    return blockingCallFromThread(reactor, func, *args, **kargs)
  File "/home/quinten/tribler/tribler/Tribler/pyipv8/ipv8/util.py", line 37, in blockingCallFromThread
    this_thread_tb = traceback.extract_stack()

 Traceback from the reactor's thread:
   File "/home/quinten/.local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/home/quinten/tribler/tribler/Tribler/Test/test_as_server.py", line 272, in setUp
    self.tribler_started_deferred = self.session.start()
  File "/home/quinten/tribler/tribler/Tribler/pyipv8/ipv8/util.py", line 14, in helper
    return blockingCallFromThread(reactor, func, *args, **kargs)
  File "/home/quinten/tribler/tribler/Tribler/pyipv8/ipv8/util.py", line 26, in blockingCallFromThread
    return f(*args, **kwargs)
  File "/home/quinten/tribler/tribler/Tribler/Core/Session.py", line 481, in start
    self.mds = start_orm(self.chant_db_filename, create_db=(False==os.path.isfile(self.chant_db_filename)))
  File "/home/quinten/tribler/tribler/Tribler/Core/Modules/MetadataStore/base.py", line 68, in start_orm
    db.generate_mapping(create_tables=create_db)
  File "/usr/local/lib/python2.7/dist-packages/pony-0.7.6rc1-py2.7.egg/pony/orm/core.py", line 1041, in generate_mapping
    if create_tables: database.create_tables(check_tables)
  File "<auto generated wrapper of create_tables() function>", line 2, in create_tables
  File "/usr/local/lib/python2.7/dist-packages/pony-0.7.6rc1-py2.7.egg/pony/orm/core.py", line 458, in new_func
    throw(TransactionError, '%s cannot be called inside of db_session' % func)
  File "/usr/local/lib/python2.7/dist-packages/pony-0.7.6rc1-py2.7.egg/pony/utils/utils.py", line 106, in throw
    raise exc

Help :(

Ah I see: if TestDownloadChannel.test_download_channel fails, all other tests fail as well.

@ichorid
Contributor Author

ichorid commented Aug 9, 2018

@qstokkink, you're using Pony 0.7.6rc1. It's not stable yet, so that could be the cause. Try 0.7.5.

@qstokkink
Contributor

@ichorid I updated from 0.7.5 because I was getting that error there as well.

@ichorid
Contributor Author

ichorid commented Aug 9, 2018

@qstokkink , oops...

@qstokkink
Contributor

@ichorid I'm guessing this is a cascaded failure due to TestDownloadChannel.test_download_channel: if that test fails, it does not release the db_session or something.

@qstokkink
Contributor

@ichorid That was the case. I fixed it as follows:

diff --git a/Tribler/Test/test_as_server.py b/Tribler/Test/test_as_server.py
index 13a28ec..71cd421 100644
--- a/Tribler/Test/test_as_server.py
+++ b/Tribler/Test/test_as_server.py
@@ -126,6 +126,13 @@ class AbstractServer(BaseTestCase):
                 self._logger.error(">     %s" % dc)
                 dc.cancel()
 
+        from pony.orm.core import local
+        if local.db_context_counter > 0:
+            self._logger.error("Leftover pony db sessions found!")
+            from pony.orm import db_session
+            for _ in range(local.db_context_counter):
+                db_session.__exit__()
+
         has_network_selectables = False
         for item in reactor.getReaders() + reactor.getWriters():
             if isinstance(item, HTTPChannel) or isinstance(item, Client):

This cleans up any leaking db_sessions.
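
For context, Pony tracks db_session nesting in a per-thread counter, which is what the check above inspects. A tiny sketch of the assumed behaviour (not part of the patch):

from pony.orm import db_session
from pony.orm.core import local

print(local.db_context_counter)      # 0: no session open
with db_session:
    print(local.db_context_counter)  # 1: inside a session
print(local.db_context_counter)      # 0 again after a clean exit
# Code that enters db_session without a matching exit (e.g. an interrupted
# test setUp) leaves the counter above zero, which is what the tearDown
# check above detects and cleans up.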

@ichorid
Contributor Author

ichorid commented Aug 9, 2018

@qstokkink , you're already better at Pony handling than me 😉

@qstokkink
Contributor

@ichorid also, please use this nasty, nasty hack to make the download test more consistent:

diff --git a/Tribler/Core/Modules/MetadataStore/channels.py b/Tribler/Core/Modules/MetadataStore/channels.py
index d0b8a1e..823f9b9 100644
--- a/Tribler/Core/Modules/MetadataStore/channels.py
+++ b/Tribler/Core/Modules/MetadataStore/channels.py
@@ -140,6 +140,7 @@ def download_channel(session, infohash, title):
     dcfg.set_dest_dir(session.channels_dir)
     tdef = TorrentDefNoMetainfo(infohash=str(infohash), name=title)
     download = session.start_download_from_tdef(tdef, dcfg)
+    session.lm.ltmgr.dht_ready = True
 
     def err(f):
         print "we got an exception: %s" % (f.getTraceback(),)

@ichorid
Contributor Author

ichorid commented Aug 9, 2018

+    session.lm.ltmgr.dht_ready = True

@qstokkink , do we really have to put it into the download method itself? We only want DHT to be disabled for unit tests.

@qstokkink
Contributor

@ichorid you could also generally enable that, but I don't know if it will screw with the other tests. It's safer to only do that for these tests.

@ichorid
Contributor Author

ichorid commented Aug 9, 2018

@qstokkink , the test times out if I add this line in the test itself.

@ichorid
Contributor Author

ichorid commented Aug 10, 2018

Seems like some of the tests added in this PR fail on timeout on test machines. This is clearly not the case on my development machines, though...
@qstokkink , is there some way to force tests to run sequentially for this PR, so we could see exactly which one causes the failure?

@qstokkink
Contributor

@ichorid not without making an entirely custom Jenkins job (if it were any error other than a segfault, you would be able to see this per test, though).

@qstokkink
Contributor

Hmm, it seems there is something VERY wrong with the Linux machine: INFO:OneShotProcessProtocol:[run_nosetests_for_je...] ERR: ERROR: ld.so: object '/usr/lib/libtcmalloc_minimal.so.4' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored..
My python-for-android builder is also quite dead. Something funky with ld.so.

@ichorid
Contributor Author

ichorid commented Aug 10, 2018

@xoriole , could you comment on what happens to the Linux test machine?

@synctext
Member

Solid progress.
About NATs: let's just assume for 2018 that the people creating channels are connectable. It would be great to inform users of their connectability and maximum download speed.

@ichorid
Contributor Author

ichorid commented Aug 14, 2018

retest this please

@qstokkink
Contributor

@ichorid how far along is this PR?

@ichorid
Contributor Author

ichorid commented Aug 17, 2018

@qstokkink, I'll get back to work this Monday and fix the test timeouts.

@devos50
Contributor

devos50 commented Sep 17, 2018

Steps to proceed:

  • first, polish this PR according to our reviews (my review is still pending)
  • make sure all tests pass and basic Pylint errors are fixed
  • next, I suggest adding functionality to (automatically) seed a channel that you have downloaded
  • Validation: use the dataset that @synctext suggested. Set up a single seeder that seeds this giga-channel (i.e. use our Transmission server). Manually verify whether your functionality works by importing the infohash of this giga-channel into Tribler. Expected outcome: you should see the files and be able to download them.

Feedback:

  • hide the RSS feeds/playlist buttons in the "My channel" overview when using the new type of channels.
  • why don't we import a channel using its infohash?

@ichorid ichorid changed the title ON HOLD: MetadataStore module ("Channel 2.0") WIP: MetadataStore module ("Channel 2.0") Sep 17, 2018
@ichorid ichorid force-pushed the f_channel20 branch 2 times, most recently from 81a29be to f76014a on September 19, 2018 15:16
@ichorid
Contributor Author

ichorid commented Sep 20, 2018

retest this please

@ichorid
Contributor Author

ichorid commented Sep 21, 2018

retest this please

@devos50
Contributor

devos50 commented Sep 21, 2018

Hmm, it seems that the SQLite library on Mac does not have FTS5 enabled.

@ichorid
Contributor Author

ichorid commented Sep 21, 2018

I've updated both the Homebrew sqlite package and the pip sqlite modules. FTS5 is either still not there, or a reboot/library-cache reload is required.

@devos50
Contributor

devos50 commented Sep 21, 2018

IIRC, the SQLite library is bundled with the Python 2.7 installation, and it always takes priority when importing sqlite3. Updating SQLite with pip/brew does nothing then.
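
A quick way to check which SQLite build the Python interpreter actually uses and whether it was compiled with FTS5 (a small diagnostic sketch, not part of this PR):

import sqlite3

print(sqlite3.sqlite_version)  # version of the SQLite library the sqlite3 module links against
conn = sqlite3.connect(":memory:")
try:
    conn.execute("CREATE VIRTUAL TABLE fts_probe USING fts5(content)")
    print("FTS5 is available")
except sqlite3.OperationalError:
    print("FTS5 is not available in this SQLite build")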

@ichorid
Contributor Author

ichorid commented Oct 2, 2018

Moved to #3923

@ichorid ichorid closed this Oct 2, 2018