New regex for extracting series title #21834
Closes #21833
youtube_dl/extractor/tvn24.py
title = self._og_search_title(webpage)
title = self._html_search_regex(
    r'<span[^>]+class="standardHeader1[^"]*"[^>]*>\s*(.+?)\s*</span>',
    webpage, 'title', default=None)
Title is mandatory. Read coding conventions.
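To illustrate why `default=None` is a problem for a mandatory field, here is a minimal standard-library sketch. `search_regex` is a hypothetical stand-in written for this example, not youtube-dl's own helper, and the page markup is made up; it only mimics the idea that a lookup with a default fails silently while a lookup without one reports the failure.

```python
import re

_NO_DEFAULT = object()  # sentinel meaning "caller supplied no default"


def search_regex(pattern, string, name, default=_NO_DEFAULT):
    """Hypothetical stand-in for an extractor's regex helper."""
    match = re.search(pattern, string)
    if match:
        return match.group(1)
    if default is not _NO_DEFAULT:
        # A default was given, so the miss is swallowed.
        return default
    raise ValueError('Unable to extract %s' % name)


# Made-up markup in which the expected header span is absent.
PAGE = '<div>layout changed, no header span here</div>'
PATTERN = r'<span[^>]+class="standardHeader1[^"]*"[^>]*>\s*(.+?)\s*</span>'

# With default=None a missing title goes unnoticed and propagates as None.
silent_title = search_regex(PATTERN, PAGE, 'title', default=None)

# Without a default the failure surfaces at the point of extraction.
try:
    search_regex(PATTERN, PAGE, 'title')
    raised = False
except ValueError:
    raised = True
```

In this sketch the first call hands back `None` for a field that downstream code expects to be present, while the second call fails at the point of extraction.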
youtube_dl/extractor/tvn24.py
@@ -39,7 +39,10 @@ def _real_extract(self, url):

webpage = self._download_webpage(url, video_id)

title = self._og_search_title(webpage)
title = self._html_search_regex(
    r'<span[^>]+class="standardHeader1[^"]*"[^>]*>\s*(.+?)\s*</span>',
- It's not a title. You must extract the title of the video, not of the series.
- The class=standardHeader1 pattern is too ambiguous and not future-proof, since it occurs multiple times on the page, though not in a span currently.
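The ambiguity concern can be sketched with the standard `re` module alone. The markup below is invented for illustration and assumes a future page where a second span picks up the same class prefix; it is not taken from the actual site.

```python
import re

# Made-up markup: two spans share the standardHeader1 class prefix.
PAGE = '''
<span class="standardHeader1 sidebar">Related series</span>
<span class="standardHeader1 headerPadding">Actual headline</span>
'''

# The pattern from the diff, which only anchors on the class prefix.
LOOSE = r'<span[^>]+class="standardHeader1[^"]*"[^>]*>\s*(.+?)\s*</span>'

# Every span with the prefix matches, so the pattern is not specific.
all_matches = re.findall(LOOSE, PAGE)

# A single search returns whichever matching span comes first in the page.
first_match = re.search(LOOSE, PAGE).group(1)
```

Under these assumptions `first_match` is the sidebar text rather than the headline, because the result depends only on element order in the page.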
youtube_dl/extractor/tvn24.py
title = self._html_search_regex(
    r'<span[^>]+class="standardHeader1[^"]*"[^>]*>\s*(.+?)\s*</span>',
    webpage, 'title', default=None)
Remove all meaningless changes.
title = self._og_search_title(webpage)
title = self._html_search_regex(
    r'<span[^>]+class="standardHeader1 headerPadding header-span headerMargin"[^>]*>\s*(.+?)\s*</span>',
    webpage, 'title')
Have you read review comments at all?
title = self._og_search_title(webpage)
title = self._html_search_regex(
    r'<span[^>]+class="standardHeader1 headerPadding header-span headerMargin"[^>]*>\s*(.+?)\s*</span>',
    webpage, 'title')
This also breaks currently working tests.
This reverts commit 70066c1.
Closes #21833
In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:
What is the purpose of your pull request?
Description of your pull request and other information
Fix for changed site HTML #21833
There appears to be a big refactoring of this module in https://github.com/ytdl-org/youtube-dl/pull/20698/files, but this is a quick fix that restores it to a working state; the other one seems to have been in development for a long time.