
Tags auto generation #6718

Merged
merged 5 commits into from
Feb 2, 2022


drew2a
Contributor

@drew2a drew2a commented Jan 12, 2022

This PR is part of #6214 and introduces automatic tag extraction from torrent titles.

The following rules have been added:

  1. Tags in square brackets: title [tag1, tag2, tag3]
  2. Tags in parentheses: title (tag1, tag2, tag3)
  3. A tag in an extension: title.tag
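The three rules could be implemented with regular expressions roughly as follows. This is a minimal sketch, not the PR's actual code: the `extract_tags` name, the patterns, the accepted delimiters, and the extension rule (a letter followed by 1 to 4 alphanumerics at the very end of the title) are all assumptions.

```python
import re
from typing import Set

# Hypothetical patterns for the three rules (not the PR's actual regexes).
BRACKETS = re.compile(r"\[([^\[\]]+)\]")                # title [tag1, tag2, tag3]
PARENTHESES = re.compile(r"\(([^()]+)\)")               # title (tag1, tag2, tag3)
EXTENSION = re.compile(r"\.([A-Za-z][A-Za-z0-9]{1,4})$")  # title.tag
DELIMITERS = re.compile(r"[,;/|]")                      # assumed tag separators


def extract_tags(title: str) -> Set[str]:
    """Return the lower-cased tags found in a torrent title."""
    tags: Set[str] = set()
    # Rules 1 and 2: comma-separated lists inside brackets or parentheses.
    for pattern in (BRACKETS, PARENTHESES):
        for group in pattern.findall(title):
            for part in DELIMITERS.split(group):
                tag = part.strip().lower()
                if tag:
                    tags.add(tag)
    # Rule 3: a trailing extension; requiring a leading letter skips "20.04".
    match = EXTENSION.search(title)
    if match:
        tags.add(match.group(1).lower())
    return tags


print(sorted(extract_tags("Sintel (2010, open movie).mkv")))
# ['2010', 'mkv', 'open movie']
```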

Tags can be extracted both for new torrents and, in the background, for existing ones.

The extraction procedure for new torrents is straightforward: their titles are processed at the moment they are added to the MDS.

The background extraction procedure is a bit more complicated. It processes items in batches: every 10 seconds, a batch of 1000 items is processed. When the tag rules processor reaches the upper bound of the DB, it starts again from the beginning, but with an increased interval (20 seconds) and an increased batch size (2000 items).
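The background procedure can be pictured as a cursor walking the table in rowid order. This is an illustration under assumptions, not the PR's actual processor: `BackgroundTagProcessor`, the `fetch`/`handle` callbacks, and treating an empty batch as "reached the upper bound" are hypothetical; only the numbers (1000 items every 10 s, then 2000 items every 20 s after wrapping around) come from the description above.

```python
import asyncio
from typing import Callable, Sequence


class BackgroundTagProcessor:
    """Hypothetical batched background processor with a slower second pass."""

    def __init__(
        self,
        fetch: Callable[[int, int], Sequence[int]],
        handle: Callable[[int], None],
        batch_size: int = 1000,
        interval: float = 10.0,
        rewind_batch_size: int = 2000,
        rewind_interval: float = 20.0,
    ) -> None:
        self.fetch = fetch    # fetch(start, limit): up to `limit` ascending rowids >= start
        self.handle = handle  # handle(rowid): extract and store tags for one item
        self.batch_size = batch_size
        self.interval = interval
        self.rewind_batch_size = rewind_batch_size
        self.rewind_interval = rewind_interval
        self.cursor = 0       # next rowid to look at

    def process_batch(self) -> int:
        """Process one batch and return the number of items handled."""
        rowids = self.fetch(self.cursor, self.batch_size)
        if not rowids:
            # Upper bound of the DB reached: start again from the beginning,
            # using the larger batch size and the longer interval.
            self.cursor = 0
            self.batch_size = self.rewind_batch_size
            self.interval = self.rewind_interval
            return 0
        for rowid in rowids:
            self.handle(rowid)
        self.cursor = rowids[-1] + 1
        return len(rowids)

    async def run(self) -> None:
        """Run forever: process one batch, then sleep for the current interval."""
        while True:
            self.process_batch()
            await asyncio.sleep(self.interval)
```

In an asyncio application, `run()` would typically be started with `asyncio.create_task(processor.run())` and cancelled on shutdown; keeping `process_batch()` synchronous makes the batching logic testable without an event loop.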

Why 1000 items and 10 seconds? This load is not too heavy for the CPU, and at this rate 360k items are processed per hour (3600 s / 10 s = 360 batches of 1000 items).
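The throughput figure is easy to check. Note that the second pass (2000 items every 20 seconds) keeps exactly the same rate; it just wakes up half as often. The helper name is illustrative only.

```python
def items_per_hour(batch_size: int, interval_s: float) -> float:
    """Items processed per hour when one batch runs every `interval_s` seconds."""
    return batch_size * (3600 / interval_s)


print(items_per_hour(1000, 10))  # first pass: 360000.0
print(items_per_hour(2000, 20))  # after wrapping around: 360000.0
```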

To distinguish processed items, the tag_version column has been added to TorrentMetadata:

    class TorrentMetadata(db.MetadataNode):
        ...
        tag_version = orm.Required(int, default=0)

It can also be used in the future when we process new rule sets: we just need to increase this value by 1 for each new rule set.
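The intended use of `tag_version` can be illustrated with plain `sqlite3` instead of Pony ORM (a deliberate swap so the sketch runs without Tribler): items whose `tag_version` is below the current rule-set version are selected, tagged, and then stamped with that version. The table layout, `CURRENT_TAG_VERSION`, `create_schema`, and `process_pending` are hypothetical.

```python
import sqlite3
from typing import Callable, Iterable

# Hypothetical rule-set version. Bumping it by 1 makes every item with a
# lower tag_version eligible for (re)processing by the background job.
CURRENT_TAG_VERSION = 1


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE torrent (id INTEGER PRIMARY KEY, title TEXT NOT NULL, "
        "tag_version INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute(
        "CREATE TABLE tag (torrent_id INTEGER NOT NULL, name TEXT NOT NULL, "
        "UNIQUE (torrent_id, name))"
    )


def process_pending(
    conn: sqlite3.Connection,
    extract: Callable[[str], Iterable[str]],
    limit: int = 1000,
) -> int:
    """Extract tags for up to `limit` items whose tag_version is out of date."""
    rows = conn.execute(
        "SELECT id, title FROM torrent WHERE tag_version < ? ORDER BY id LIMIT ?",
        (CURRENT_TAG_VERSION, limit),
    ).fetchall()
    with conn:  # commit the whole batch as one transaction
        for torrent_id, title in rows:
            conn.executemany(
                "INSERT OR IGNORE INTO tag (torrent_id, name) VALUES (?, ?)",
                [(torrent_id, name) for name in extract(title)],
            )
            # Mark the item as processed by the current rule set.
            conn.execute(
                "UPDATE torrent SET tag_version = ? WHERE id = ?",
                (CURRENT_TAG_VERSION, torrent_id),
            )
    return len(rows)
```

Running `process_pending` a second time returns 0, because every row now carries the current version; only a future version bump would make the rows eligible again.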

@drew2a drew2a requested a review from kozlovsky January 12, 2022 11:36
@drew2a drew2a force-pushed the feature/tags_generation branch from adee0c3 to ba45b9d Compare January 14, 2022 13:30
@drew2a drew2a force-pushed the feature/tags_generation branch 5 times, most recently from e26b28e to a88db81 Compare January 25, 2022 11:36
@drew2a drew2a marked this pull request as ready for review January 25, 2022 11:45
@drew2a drew2a requested review from a team, kozlovsky and devos50 January 25, 2022 11:45
@drew2a drew2a requested a review from kozlovsky January 26, 2022 13:34
@drew2a drew2a force-pushed the feature/tags_generation branch from a802e7f to 0a9212c Compare January 26, 2022 17:34
Contributor

@devos50 devos50 left a comment


Overall, great work 👍. I left a few minor comments and raised some concerns regarding the tag processor.

@devos50
Contributor

devos50 commented Jan 28, 2022

Could this error be related to this PR (Windows tests)?

src.tribler-core.tribler_core.components.key.tests.test_key_component.test_get_private_key_filename (from pytest)

pyfuncitem = <Function test_get_private_key_filename>

    def pytest_pyfunc_call(pyfuncitem):  # type: ignore[no-untyped-def]
        """Run coroutines in an event loop instead of a normal function call."""
        fast = pyfuncitem.config.getoption("--aiohttp-fast")
        if asyncio.iscoroutinefunction(pyfuncitem.function):
            existing_loop = pyfuncitem.funcargs.get(
                "proactor_loop"
            ) or pyfuncitem.funcargs.get("loop", None)
            with _runtime_warning_context():
                with _passthrough_loop_context(existing_loop, fast=fast) as _loop:
                    testargs = {
                        arg: pyfuncitem.funcargs[arg]
                        for arg in pyfuncitem._fixtureinfo.argnames
                    }
>                   _loop.run_until_complete(pyfuncitem.obj(**testargs))

c:\users\tribler\appdata\local\programs\python\python38\lib\site-packages\aiohttp\pytest_plugin.py:186: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
c:\users\tribler\appdata\local\programs\python\python38\lib\contextlib.py:120: in __exit__
    next(self.gen)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    @contextlib.contextmanager
    def _runtime_warning_context():  # type: ignore[no-untyped-def]
        """Context manager which checks for RuntimeWarnings.
    
        This exists specifically to
        avoid "coroutine 'X' was never awaited" warnings being missed.
    
        If RuntimeWarnings occur in the context a RuntimeError is raised.
        """
        with warnings.catch_warnings(record=True) as _warnings:
            yield
            rw = [
                "{w.filename}:{w.lineno}:{w.message}".format(w=w)
                for w in _warnings
                if w.category == RuntimeWarning
            ]
            if rw:
>               raise RuntimeError(
                    "{} Runtime Warning{},\n{}".format(
                        len(rw), "" if len(rw) == 1 else "s", "\n".join(rw)
                    )
                )
E               RuntimeError: 1 Runtime Warning,
E               c:\users\tribler\appdata\local\programs\python\python38\lib\site-packages\aiohttp\test_utils.py:560:coroutine 'interval_runner' was never awaited

c:\users\tribler\appdata\local\programs\python\python38\lib\site-packages\aiohttp\pytest_plugin.py:143: RuntimeError
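For context on the failure above: the aiohttp pytest plugin records warnings and raises a RuntimeError if any RuntimeWarning occurred, and "coroutine ... was never awaited" is the RuntimeWarning CPython emits when a coroutine object is destroyed without ever being awaited. A minimal reproduction with a hypothetical stand-in coroutine (not the actual `interval_runner`), which also shows that explicitly closing an unstarted coroutine avoids the warning:

```python
import asyncio
import gc
import warnings
from typing import List


async def interval_runner_stub() -> None:
    """Hypothetical stand-in for a periodic-task coroutine."""
    await asyncio.sleep(0)


def unawaited_warnings(close_first: bool) -> List[str]:
    """Create a coroutine, never await it, and collect any RuntimeWarnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        coro = interval_runner_stub()  # coroutine object created, never scheduled
        if close_first:
            coro.close()               # closing an unstarted coroutine suppresses the warning
        del coro
        gc.collect()                   # make sure the object has been finalized
    return [str(w.message) for w in caught if issubclass(w.category, RuntimeWarning)]


print(unawaited_warnings(close_first=False))
print(unawaited_warnings(close_first=True))
```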

@drew2a drew2a marked this pull request as draft January 30, 2022 09:19
@drew2a
Contributor Author

drew2a commented Jan 31, 2022

@devos50

Could this error be related to this PR (Windows tests)?

I guess it could, but I'm not sure.
Anyway, I've rewritten how the TagProcessor's tests are executed (the possible cause of the error), so we will see.

@drew2a drew2a force-pushed the feature/tags_generation branch from 0b3b4e1 to 48753d6 Compare January 31, 2022 02:49
@drew2a drew2a marked this pull request as ready for review January 31, 2022 02:59
@drew2a drew2a requested review from devos50 and kozlovsky January 31, 2022 02:59
@drew2a drew2a changed the title Tags auto generation WIP Tags auto generation Jan 31, 2022
@drew2a drew2a force-pushed the feature/tags_generation branch from 48753d6 to 15a545d Compare February 1, 2022 08:56
@drew2a drew2a force-pushed the feature/tags_generation branch 2 times, most recently from a70e233 to 0a5ea31 Compare February 1, 2022 10:21
@drew2a drew2a changed the title WIP Tags auto generation Tags auto generation Feb 1, 2022
@drew2a drew2a force-pushed the feature/tags_generation branch from 0a5ea31 to 599caa6 Compare February 1, 2022 12:49
@sonarqubecloud

sonarqubecloud bot commented Feb 1, 2022

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

No Coverage information
No Duplication information

@devos50
Contributor

devos50 commented Feb 1, 2022

🤔 maybe wait with merging until the PR pipeline can complete?

@kozlovsky
Contributor

retest this please

@kozlovsky
Contributor

retest this please

@drew2a
Contributor Author

drew2a commented Feb 1, 2022

retest this please

@drew2a
Contributor Author

drew2a commented Feb 2, 2022

retest this please

Labels
None yet

3 participants