Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New regex for extracting series title #21834

Closed
wants to merge 3 commits into from
Closed

Conversation

gjedeer
Copy link
Contributor

@gjedeer gjedeer commented Jul 19, 2019

Closes #21833

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • [x ] I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • [x ] Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

Fix for changed site HTML #21833

Apparently there's a big refactoring of this module in https://github.com/ytdl-org/youtube-dl/pull/20698/files apparently, but this is a quick fix which restores it to working state, and the other one seems to be in development for a long time.

title = self._og_search_title(webpage)
title = self._html_search_regex(
r'<span[^>]+class="standardHeader1[^"]*"[^>]*>\s*(.+?)\s*</span>',
webpage, 'title', default=None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Title is mandatory. Read coding conventions.

@@ -39,7 +39,10 @@ def _real_extract(self, url):

webpage = self._download_webpage(url, video_id)

title = self._og_search_title(webpage)
title = self._html_search_regex(
r'<span[^>]+class="standardHeader1[^"]*"[^>]*>\s*(.+?)\s*</span>',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. It's not a title. You must extract title of the video not series.
  2. class=standardHeader1 pattern is too ambiguous and not future-proof since it occurs multiple times on the page though not in span currently.

title = self._html_search_regex(
r'<span[^>]+class="standardHeader1[^"]*"[^>]*>\s*(.+?)\s*</span>',
webpage, 'title', default=None)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove all meaningless changes.

title = self._og_search_title(webpage)
title = self._html_search_regex(
r'<span[^>]+class="standardHeader1 headerPadding header-span headerMargin"[^>]*>\s*(.+?)\s*</span>',
webpage, 'title')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you read review comments at all?

title = self._og_search_title(webpage)
title = self._html_search_regex(
r'<span[^>]+class="standardHeader1 headerPadding header-span headerMargin"[^>]*>\s*(.+?)\s*</span>',
webpage, 'title')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also breaks currently working tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[TVN24] HTML changed
3 participants