New regex for extracting series title #21834

gjedeer · 2019-07-19T14:53:09Z

Closes #21833

[x] At least skimmed through adding new extractor tutorial and youtube-dl coding conventions sections
[x] Searched the bugtracker for similar pull requests
[x] Checked the code with flake8

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

[x] I am the original author of this code and I am willing to release it under Unlicense
[ ] I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

[x] Bug fix
[ ] Improvement
[ ] New extractor
[ ] New feature

Description of your pull request and other information

Fix for changed site HTML #21833

There appears to be a big refactoring of this module in https://github.com/ytdl-org/youtube-dl/pull/20698/files, but that PR seems to have been in development for a long time. This is a quick fix that restores the extractor to a working state.

dstftw · 2019-07-20T16:18:42Z

youtube_dl/extractor/tvn24.py

-        title = self._og_search_title(webpage)
+        title = self._html_search_regex(
+            r'<span[^>]+class="standardHeader1[^"]*"[^>]*>\s*(.+?)\s*</span>',
+            webpage, 'title', default=None)


Title is mandatory. Read coding conventions.
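
To illustrate the point about mandatory fields, here is a minimal sketch using only the standard library. `search_regex`, `RegexNotFound` and `_NO_DEFAULT` are hypothetical names invented for this example; they are a simplified stand-in, not youtube-dl's implementation. The sketch assumes that a supplied default is returned when the pattern does not match, which is why passing `default=None` for a mandatory field lets a missing title slip through silently.

```python
import re

_NO_DEFAULT = object()  # hypothetical sentinel meaning "no default supplied"


class RegexNotFound(Exception):
    """Hypothetical stand-in for a 'pattern not found' extraction error."""


def search_regex(pattern, string, name, default=_NO_DEFAULT):
    """Return group 1 of the first match.

    If nothing matches, return the default when one was supplied,
    otherwise raise so the failure is visible.
    """
    m = re.search(pattern, string)
    if m:
        return m.group(1)
    if default is not _NO_DEFAULT:
        return default
    raise RegexNotFound('Unable to extract %s' % name)


page = '<p>no heading here</p>'  # invented markup without a title element

# With default=None the miss is swallowed and the title becomes None.
title = search_regex(r'<h1>(.+?)</h1>', page, 'title', default=None)

# Without a default the miss raises, which suits a mandatory field.
try:
    search_regex(r'<h1>(.+?)</h1>', page, 'title')
    raised = False
except RegexNotFound:
    raised = True
```

Under these assumptions, dropping `default=None` from the call in the diff would make a failed title match an error rather than an empty title.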

dstftw · 2019-07-20T16:22:07Z

youtube_dl/extractor/tvn24.py

@@ -39,7 +39,10 @@ def _real_extract(self, url):

        webpage = self._download_webpage(url, video_id)

-        title = self._og_search_title(webpage)
+        title = self._html_search_regex(
+            r'<span[^>]+class="standardHeader1[^"]*"[^>]*>\s*(.+?)\s*</span>',


This is not the title. You must extract the title of the video, not of the series.

The class=standardHeader1 pattern is too ambiguous and not future-proof: the class occurs multiple times on the page, though currently not on a span.
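
The ambiguity can be sketched with the standard `re` module. The markup below is hypothetical, invented to show what could happen if a second `span` ever carried the same class; it is not copied from the site. `first_match` is likewise a helper made up for this example.

```python
import re

# Pattern taken from the diff under review.
TITLE_RE = r'<span[^>]+class="standardHeader1[^"]*"[^>]*>\s*(.+?)\s*</span>'

# Hypothetical page fragment: the class appears on several elements,
# and on more than one span.
SAMPLE_HTML = '''
<span class="standardHeader1 sidebar">Related videos</span>
<h1 class="standardHeader1">Series name</h1>
<span class="standardHeader1 headerPadding">Episode title</span>
'''


def first_match(pattern, html):
    """Return group 1 of the first match, or None if nothing matches."""
    m = re.search(pattern, html)
    return m.group(1) if m else None


# The search stops at the first span whose class starts with
# standardHeader1, whatever that span happens to contain.
result = first_match(TITLE_RE, SAMPLE_HTML)
print(result)
```

With this invented fragment the pattern picks up the sidebar heading rather than the episode title, which is the kind of breakage a class-prefix match invites once the page layout changes.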

dstftw · 2019-07-20T16:22:41Z

youtube_dl/extractor/tvn24.py

+        title = self._html_search_regex(
+            r'<span[^>]+class="standardHeader1[^"]*"[^>]*>\s*(.+?)\s*</span>',
+            webpage, 'title', default=None)
+


Remove all meaningless changes.

dstftw · 2019-07-30T19:10:54Z

youtube_dl/extractor/tvn24.py

-        title = self._og_search_title(webpage)
+        title = self._html_search_regex(
+            r'<span[^>]+class="standardHeader1 headerPadding header-span headerMargin"[^>]*>\s*(.+?)\s*</span>',
+            webpage, 'title')


Have you read review comments at all?

dstftw · 2019-07-30T19:11:28Z

youtube_dl/extractor/tvn24.py

-        title = self._og_search_title(webpage)
+        title = self._html_search_regex(
+            r'<span[^>]+class="standardHeader1 headerPadding header-span headerMargin"[^>]*>\s*(.+?)\s*</span>',
+            webpage, 'title')


This also breaks currently working tests.

New regex for extracting series title

c7d0ab0

Closes #21833

dstftw requested changes Jul 20, 2019

dstftw added the pending-fixes label Jul 20, 2019

ag-gh added 2 commits July 22, 2019 19:12

tvn24.py extractor: clean up

ae19aa9

Make the title extraction regex more specific

d0e9650

dstftw reviewed Jul 30, 2019

dstftw added the do-not-merge label Jul 30, 2019

dstftw reviewed Jul 30, 2019

dstftw closed this in 7279163 Jul 30, 2019

kylepw referenced this pull request in kylepw/youtube-dl Aug 1, 2019

[tvn24] Fix metadata extraction (closes #21833, closes #21834)

706a20f

Lamieur referenced this pull request in Lamieur/youtube-dl Aug 3, 2019

[tvn24] Fix metadata extraction (closes #21833, closes #21834)

70066c1

meunierd referenced this pull request in meunierd/youtube-dl Feb 13, 2020

[tvn24] Fix metadata extraction (closes #21833, closes #21834)

8417e76

Lamieur referenced this pull request in Lamieur/youtube-dl Apr 20, 2020

Revert "[tvn24] Fix metadata extraction (closes #21833, closes #21834)"

c3c0b7a

This reverts commit 70066c1.

Lamieur referenced this pull request in Lamieur/youtube-dl Apr 20, 2020

Revert "[tvn24] Fix metadata extraction (closes #21833, closes #21834)"

1d74f51

This reverts commit 70066c1.

pareronia referenced this pull request in pareronia/youtube-dl Jun 22, 2020

[tvn24] Fix metadata extraction (closes #21833, closes #21834)

0f6846d