Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tf1] fix wat id extraction (closes ytdl-org#21365) #21372

Closed
wants to merge 9 commits into from
Closed

[tf1] fix wat id extraction (closes ytdl-org#21365) #21372

wants to merge 9 commits into from

Conversation

froiss
Copy link

@froiss froiss commented Jun 12, 2019

Please follow the guide below

  • You will be asked some questions, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your pull request (like that [x])
  • Use Preview tab to see how your pull request will actually look like

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

Explanation of your pull request in arbitrary form goes here. Please make sure the description explains the purpose and effect of your pull request and is worded well enough to be understood. Provide as much context and examples as possible.

}]

def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
wat_id = self._html_search_regex(
r'(["\'])(?:https?:)?//www\.wat\.tv/embedframe/.*?(?P<id>\d{8})\1',
r'"streamId":"(?P<id>\d{8})"',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Do not remove the old pattern.
  2. Relax regex.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should I do with the old pattern? Comment it out?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. As already said you must keep the old pattern along with the new.

youtube_dl/extractor/tf1.py Show resolved Hide resolved
youtube_dl/extractor/tf1.py Outdated Show resolved Hide resolved
youtube_dl/extractor/tf1.py Outdated Show resolved Hide resolved
youtube_dl/extractor/tf1.py Outdated Show resolved Hide resolved
youtube_dl/extractor/tf1.py Outdated Show resolved Hide resolved
}]

def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
slug = self._search_regex(
r'(?<=/)(?P<slug>[^/]+)(?=\.html$)',
url, 'slug', group='slug', default='')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already extracted as video_id.

youtube_dl/extractor/tf1.py Outdated Show resolved Hide resolved
@rs
Copy link

rs commented Jun 16, 2019

Can we have this merged?

r'(["\'])(?:https?:)?//www\.wat\.tv/embedframe/.*?(?P<id>\d{8})\1',
webpage, 'wat id', group='id')
vids_data_string = self._html_search_regex(
r'<script>\s*window\.__APOLLO_STATE__\s*=\s*(?P<vids_data_string>\{.*?\})\s*;?\s*</script>',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Remove script tags, it's unique enough without them.
  2. Do not use named group when there is only one group.
  3. Curly braces don't need escaping.
  4. Do not capture empty dict.

if vids_data_string is not None:
vids_data = self._parse_json(
vids_data_string, video_id,
transform_source=js_to_json)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must not be fatal.

vids_data = self._parse_json(
vids_data_string, video_id,
transform_source=js_to_json)
video_data = [v for v in vids_data.values()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

video_data is totally useless. Write directly to id variable when found.

vids_data_string, video_id,
transform_source=js_to_json)
video_data = [v for v in vids_data.values()
if 'slug' in v and v['slug'] == video_id]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. v may not be dict.
  2. v.get('slug').

@rs
Copy link

rs commented Jun 20, 2019

What's missing to get this merged?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants