[tf1] fix wat id extraction (closes ytdl-org#21365) #21372

froiss · 2019-06-12T12:11:10Z

Please follow the guide below

You will be asked some questions, please read them carefully and answer honestly
Put an x into all the boxes [ ] relevant to your pull request (like this [x])
Use the Preview tab to see how your pull request will actually look

Before submitting a pull request make sure you have:

At least skimmed through adding new extractor tutorial and youtube-dl coding conventions sections
Searched the bugtracker for similar pull requests
Checked the code with flake8

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

I am the original author of this code and I am willing to release it under Unlicense
I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Bug fix
Improvement
New extractor
New feature

Description of your pull request and other information

Explanation of your pull request in arbitrary form goes here. Please make sure the description explains the purpose and effect of your pull request and is worded well enough to be understood. Provide as much context and examples as possible.

dstftw · 2019-06-12T12:14:12Z

youtube_dl/extractor/tf1.py

    }]

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)
        wat_id = self._html_search_regex(
-            r'(["\'])(?:https?:)?//www\.wat\.tv/embedframe/.*?(?P<id>\d{8})\1',
+            r'"streamId":"(?P<id>\d{8})"',


Do not remove the old pattern.

Relax regex.

What should I do with the old pattern? Comment it out?

No. As already said you must keep the old pattern along with the new.
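For reference: youtube-dl's `_search_regex` and `_html_search_regex` accept a tuple of patterns and use the first one that matches, so the old and new patterns can live side by side. The standalone sketch below mimics that fallback with plain `re`. `search_first` is a hypothetical helper, and the relaxed `streamId` pattern (optional quotes, optional whitespace around the colon, any number of digits) is an assumption about what "relax" means here, not necessarily the pattern that was merged.

```python
import re

# Tried in order; the first pattern that matches wins.
WAT_ID_PATTERNS = (
    # Old pattern, kept as-is from the original diff: wat.tv embed URL.
    r'(["\'])(?:https?:)?//www\.wat\.tv/embedframe/.*?(?P<id>\d{8})\1',
    # Assumed relaxed new pattern: quotes and whitespace optional,
    # and any run of digits instead of exactly 8.
    r'(["\']?)streamId\1\s*:\s*(["\']?)(?P<id>\d+)\2',
)


def search_first(patterns, string, group='id'):
    """Hypothetical helper: return `group` from the first matching pattern."""
    for pattern in patterns:
        mobj = re.search(pattern, string)
        if mobj:
            return mobj.group(group)
    return None
```

Because the old pattern comes first, pages that still embed the wat.tv iframe keep resolving exactly as before; the new pattern only kicks in when the old one finds nothing.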

dstftw · 2019-06-13T21:18:29Z

youtube_dl/extractor/tf1.py

    }]

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)
+        slug = self._search_regex(
+            r'(?<=/)(?P<slug>[^/]+)(?=\.html$)',
+            url, 'slug', group='slug', default='')


It's already extracted as video_id.
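`InfoExtractor._match_id` returns the named group `id` of the extractor's `_VALID_URL`, so when `_VALID_URL` already captures the last path component, the extra `slug` search only duplicates it. A sketch in plain `re`, assuming a simplified, hypothetical `VALID_URL` (not TF1IE's real one) and a hypothetical example URL:

```python
import re

# Hypothetical, simplified stand-in for _VALID_URL: the last path
# component, up to the first '.', is captured as the named group "id".
VALID_URL = r'https?://(?:www\.)?tf1\.fr/(?:[^/]+/)*(?P<id>[^/?#.]+)'


def match_id(url):
    """Mimics _match_id: return the "id" group of VALID_URL."""
    mobj = re.match(VALID_URL, url)
    if not mobj:
        raise ValueError('URL does not match VALID_URL: %s' % url)
    return mobj.group('id')


def slug_from_pr_regex(url):
    """The PR's extra search, for comparison (pattern copied from the diff)."""
    mobj = re.search(r'(?<=/)(?P<slug>[^/]+)(?=\.html$)', url)
    return mobj.group('slug') if mobj else ''
```

For a URL such as `https://www.tf1.fr/tf1/koh-lanta/videos/some-episode-slug.html`, both functions return `some-episode-slug`, so `video_id` can be compared against the `slug` field directly.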

rs · 2019-06-16T03:34:07Z

Can we have this merged?

dstftw · 2019-06-18T16:31:03Z

youtube_dl/extractor/tf1.py

-            r'(["\'])(?:https?:)?//www\.wat\.tv/embedframe/.*?(?P<id>\d{8})\1',
-            webpage, 'wat id', group='id')
+        vids_data_string = self._html_search_regex(
+            r'<script>\s*window\.__APOLLO_STATE__\s*=\s*(?P<vids_data_string>\{.*?\})\s*;?\s*</script>',


Remove script tags, it's unique enough without them.

Do not use a named group when there is only one group.

Curly braces don't need escaping.

Do not capture an empty dict.
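A sketch of the `__APOLLO_STATE__` regex with the four points above applied, in plain `re`. It assumes the assignment is terminated by `;` once the `</script>` anchor is dropped; `find_apollo_state` and the sample pages are hypothetical.

```python
import re

# Review points applied to the PR's regex:
#  - no <script>/</script> anchors: the variable name is unique enough,
#  - one unnamed group instead of (?P<vids_data_string>...),
#  - literal { and } left unescaped (Python's re treats them as literals here),
#  - .+? instead of .*? so an empty {} can never be captured.
APOLLO_STATE_RE = r'__APOLLO_STATE__\s*=\s*({.+?})\s*;'


def find_apollo_state(webpage):
    """Return the raw JSON text assigned to __APOLLO_STATE__, or None."""
    mobj = re.search(APOLLO_STATE_RE, webpage)
    return mobj.group(1) if mobj else None
```

The lazy `.+?` stops at the first `}` that is followed by optional whitespace and `;`, so nested objects are kept whole, while `= {};` yields no match at all rather than a useless empty dict.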

dstftw · 2019-06-18T16:31:34Z

youtube_dl/extractor/tf1.py

+        if vids_data_string is not None:
+            vids_data = self._parse_json(
+                vids_data_string, video_id,
+                transform_source=js_to_json)


Must not be fatal.
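youtube-dl's `_parse_json` takes a `fatal` argument (default `True`); with `fatal=False` it reports a warning and returns `None` instead of raising, which lets the extractor fall back to the regex path on a malformed blob. The standalone mimic below shows only that control flow: it calls `json.loads` directly, omits `js_to_json`, and raises a plain `ValueError` where the real method raises `ExtractorError`.

```python
import json


def parse_json(json_string, video_id, fatal=True):
    """Illustrative mimic of _parse_json's fatal/non-fatal behaviour."""
    try:
        return json.loads(json_string)
    except ValueError as e:  # json.JSONDecodeError subclasses ValueError
        errmsg = '%s: Failed to parse JSON: %s' % (video_id, e)
        if fatal:
            raise ValueError(errmsg)
        # Non-fatal: warn and hand None back so the caller can fall back.
        print('WARNING: ' + errmsg)
        return None
```

With `fatal=False`, a broken `__APOLLO_STATE__` blob no longer aborts extraction; the caller just checks for `None` and moves on to the regex patterns.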

dstftw · 2019-06-18T16:34:21Z

youtube_dl/extractor/tf1.py

+            vids_data = self._parse_json(
+                vids_data_string, video_id,
+                transform_source=js_to_json)
+            video_data = [v for v in vids_data.values()


video_data is totally useless. Write directly to the id variable when found.

dstftw · 2019-06-18T16:34:26Z

youtube_dl/extractor/tf1.py

+                vids_data_string, video_id,
+                transform_source=js_to_json)
+            video_data = [v for v in vids_data.values()
+                          if 'slug' in v and v['slug'] == video_id]


v may not be a dict.

Use v.get('slug').
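The last two comments combined: skip non-dict values, compare with `v.get('slug')`, and assign the id as soon as the matching entry is found instead of building `video_data`. A sketch, assuming (hypothetically) that the matching entry carries the wat id under `streamId`, the key the first version of this PR matched; `find_wat_id` and the sample state are illustrative.

```python
def find_wat_id(vids_data, video_id):
    """Return the streamId of the entry whose slug equals video_id, or None.

    Assumption: video entries are dicts holding both 'slug' and 'streamId';
    any other values in the Apollo state are skipped.
    """
    # A non-fatal parse may have produced None (or something unexpected).
    if not isinstance(vids_data, dict):
        return None
    wat_id = None
    for v in vids_data.values():
        # Values may be lists, strings, numbers, ... ("v may not be a dict").
        if not isinstance(v, dict):
            continue
        # v.get() replaces the separate "'slug' in v" membership test.
        if v.get('slug') == video_id:
            # Write straight to wat_id; no intermediate list is built.
            wat_id = v.get('streamId')
            break
    return wat_id
```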

rs · 2019-06-20T23:26:18Z

What's missing to get this merged?

[tf1] fix wat id extraction(closes #21365)

8f738c2

dstftw requested changes Jun 12, 2019

dstftw added the pending-fixes label Jun 12, 2019

Emmanuel Froissart added 2 commits June 12, 2019 14:42

[tf1] relax wat id regex

6e6151a

[tf1] reintroduce old wat id pattern

77d6e33

dstftw requested changes Jun 12, 2019

youtube_dl/extractor/tf1.py (resolved)

dstftw requested changes Jun 12, 2019

youtube_dl/extractor/tf1.py (outdated, resolved)

[tf1] proper multipattern and relaxed regex

ae4df7d

dstftw requested changes Jun 12, 2019

youtube_dl/extractor/tf1.py (outdated, resolved)

youtube_dl/extractor/tf1.py (outdated, resolved)

youtube_dl/extractor/tf1.py (outdated, resolved)

Emmanuel Froissart added 3 commits June 12, 2019 23:23

remove unused import

eb5d7a4

replaced long strings with md5

c0319bc

disambiguated id patterns using the slug

acda141

dstftw requested changes Jun 13, 2019

extract wat_id by parsing JSON

c381d37

improved regex to extract videos data

c352f74

dstftw requested changes Jun 18, 2019

dstftw closed this in 1c11204 Jun 21, 2019

meunierd referenced this pull request in meunierd/youtube-dl Dec 27, 2019

[tf1] Improve extraction and fix issues (closes #21372)

a64cf85

meunierd referenced this pull request in meunierd/youtube-dl Feb 13, 2020

[tf1] Improve extraction and fix issues (closes #21372)

df965bf
