[YouPorn] Make title regex more specific. #18748

Closed
wants to merge 2 commits into from

Conversation

oddstr13
Contributor

@oddstr13 oddstr13 commented Jan 5, 2019

Currently the title regex matches too many blocks and finds "Recommended Categories For You" instead of the actual title.

Safe to drop _search_regex? _og_search_title also finds the correct title.
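
To illustrate the problem, here is a minimal standalone sketch using Python's `re` module. The HTML fragment is invented for illustration and is not the real page markup. The two patterns are the bare `<h1>` pattern and the more specific `video-title` pattern from this PR's diff, and `first_title` is a hypothetical helper, not part of youtube-dl.

```python
import re

# Hypothetical page fragment, invented for illustration only.
WEBPAGE = '''
<h1 class="heading2">Recommended Categories For You</h1>
<div class="video-title">
  <h1 class="heading2">Example video title</h1>
</div>
'''

# Broad pattern: matches any <h1> with a "heading" class.
BROAD = r'<h1[^>]+class=["\']heading\d?["\'][^>]*>(?P<title>[^<]+)<'

# More specific pattern: the <h1> must directly follow an element whose
# attribute value is video-title / video_title / video.title(s).
SPECIFIC = (
    r'[=:]\s*(["\'])video[\._-]titles?\1[^>]*>\s*<\s*h1[^>]+'
    r'class=["\']heading\d?["\'][^>]*>(?P<title>[^<]+)<'
)


def first_title(pattern, html):
    """Return the 'title' group of the first match, or None (hypothetical helper)."""
    m = re.search(pattern, html)
    return m.group('title').strip() if m else None


broad_title = first_title(BROAD, WEBPAGE)
specific_title = first_title(SPECIFIC, WEBPAGE)
print(broad_title)
print(specific_title)
```

On this made-up fragment the broad pattern stops at the first heading on the page, while the specific pattern only accepts the heading inside the `video-title` container.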


I dedicate any and all copyright interest in this software to the
public domain. I make this dedication for the benefit of the public at
large and to the detriment of my heirs and successors. I intend this
dedication to be an overt act of relinquishment in perpetuity of all
present and future rights to this software under copyright law.

Safe to drop _search_regex? _og_search_title also finds the correct title.
#r'<h1[^>]+class=["\']heading\d?["\'][^>]*>(?P<title>[^<]+)<',
], webpage, 'title', group='title', default=None) \
or self._og_search_title(webpage, default=None) \
or self._html_search_meta('title', webpage, fatal=True)
Collaborator

Don't touch code formatting.

title = self._search_regex([
r'[=:]\s*(["\'])video[\._-]titles?\1[^>]*>\s*<\s*h1[^>]+class=["\']heading\d?["\'][^>]*>(?P<title>[^<]+)<',
r'(?:video_titles|videoTitle)\s*[:=]\s*(["\'])(?P<title>(?:(?!\1).)+)\1',
#r'<h1[^>]+class=["\']heading\d?["\'][^>]*>(?P<title>[^<]+)<',
Collaborator

Remove all garbage.

default=None) or self._og_search_title(
webpage, default=None) or self._html_search_meta(
'title', webpage, fatal=True)
title = self._og_search_title(webpage, default=None) or self._html_search_meta('title', webpage, fatal=True)
Collaborator

Do not remove _search_regex part.
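
A rough sketch of why the regex step is worth keeping as the first link in the fallback chain: each step returns `None` when it finds nothing, so `or` falls through to the next source, and only the last step fails hard. The functions below are simplified stand-ins written for this example, not youtube-dl's actual `_search_regex` or `_og_search_title` implementations. The pattern is the `videoTitle` pattern from the diff, and the HTML strings in the usage lines are invented.

```python
import re

# Pattern taken from the diff; the list could hold several alternatives.
PATTERNS = [
    r'(?:video_titles|videoTitle)\s*[:=]\s*(["\'])(?P<title>(?:(?!\1).)+)\1',
]


def search_regex(patterns, html, group):
    """Return the named group of the first pattern that matches, else None."""
    for pattern in patterns:
        m = re.search(pattern, html)
        if m:
            return m.group(group).strip()
    return None


def og_title(html):
    """Simplified og:title lookup; returns None when the tag is absent."""
    m = re.search(
        r'<meta[^>]+property=["\']og:title["\'][^>]+content=["\']([^"\']+)',
        html)
    return m.group(1) if m else None


def extract_title(html, patterns=PATTERNS):
    """Try the page-specific regexes first, then og:title, then give up."""
    title = search_regex(patterns, html, 'title') or og_title(html)
    if title is None:
        raise ValueError('unable to extract title')
    return title


both = extract_title(
    '<script>videoTitle: "Inline title"</script>'
    '<meta property="og:title" content="OG title">')
og_only = extract_title('<meta property="og:title" content="OG title">')
```

With both sources present the page-specific regex wins; when it finds nothing, the `og:title` value is used instead, so keeping the regex costs nothing on pages where it no longer matches.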

@dstftw dstftw closed this in 6089ff4 Jan 8, 2019