Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change tag search condition from or to and #6752

Merged
merged 1 commit into from
Feb 2, 2022

Conversation

drew2a
Copy link
Contributor

@drew2a drew2a commented Feb 2, 2022

This PR is related to #6708 and introduces a bit different logic for tag search than it was before.

As @kozlovsky suggested earlier, we should return 'intersection' of search results instead of 'union' in the case that more than one tag is specified in the search query.

This test that should clarify what I mean:

    @db_session
    async def test_get_infohashes(self):
        self.add_operation_set(
            self.db,
            {
                b'infohash1': [
                    Tag(name='tag1', count=SHOW_THRESHOLD),
                    Tag(name='tag2', count=SHOW_THRESHOLD)
                ],
                b'infohash2': [
                    Tag(name='tag1', count=SHOW_THRESHOLD)
                ],
                b'infohash3': [
                    Tag(name='tag2', count=SHOW_THRESHOLD)
                ]
            }
        )

        assert self.db.get_infohashes({'tag1', 'tag2'}) == [b'infohash1']

@drew2a drew2a force-pushed the feature/change_tag_search branch 2 times, most recently from 1642098 to 52a6aff Compare February 2, 2022 07:35
@drew2a drew2a requested a review from kozlovsky February 2, 2022 07:36
@drew2a drew2a marked this pull request as ready for review February 2, 2022 07:36
@drew2a drew2a requested a review from a team February 2, 2022 07:36
return select(tt.torrent.infohash for tt in self.instance.TorrentTag
if self._show_condition(tt) and tt.tag.name in tags).fetch()

# first, get all torrents that contains any tag from `tags`
Copy link
Contributor

@kozlovsky kozlovsky Feb 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: The result of a query that returns torrents with any tags may be several orders larger than the result of a query that returns torrents with all tags. Retrieving all these objects into memory may be very slow.

suggestion: It is better to select torrents with all tags directly in the SQL query.

SQL does not have a direct way to express the FORALL universal quantifier. To express the necessary condition, it is necessary to use a double negation: "select all torrents for which does not exist a tag from the list of required tags that is not associated with this torrent". It will be something like:

# torrents that contains all necessary tags
query_results = select(
    torrent for torrent in self.instance.Torrent
    if not exists(
        tag for tag in self.instance.Tag
        if tag.name in tags and not exists(
            torrent_tag for torrent_tag in self.instance.TorrentTag
            if torrent_tag.torrent == torrent
            and torrent_tag.tag == tag
            and self._show_condition(torrent_tag)
        )
    )
)

Queries with double not exists may be hard to read and understand, but they are the proper way to express the FORALL quantifier in SQL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kozlovsky wow, looks like black magic 👏
Thanks. Implemented.

@drew2a drew2a force-pushed the feature/change_tag_search branch from 52a6aff to fe45ea8 Compare February 2, 2022 10:39
@sonarqubecloud
Copy link

sonarqubecloud bot commented Feb 2, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

@kozlovsky
Copy link
Contributor

retest this please

1 similar comment
@kozlovsky
Copy link
Contributor

retest this please

@kozlovsky kozlovsky self-requested a review February 2, 2022 11:30
@drew2a drew2a merged commit e58b5b6 into Tribler:main Feb 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants