[daum] Fix clipID-based downloads #15015

Closed
wants to merge 1 commit into from

Namnamseo
Contributor

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl, each piece of code must be in the public domain or released under the Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under the Unlicense
  • I am not the original author of this code but it is in the public domain or released under the Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

Some daum.net URLs have an alphanumeric video ID called vid, which looks like v2b9eCxzw1EzE6swwSS4Okz. An example URL is http://tvpot.daum.net/v/v2b9eCxzw1EzE6swwSS4Okz

Some other URLs have a numeric clip ID called clipid, which looks like 66008403. An example URL is http://tvpot.daum.net/v/66008403
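To make the two ID formats concrete, here is a small standalone Python 3 sketch that tells them apart. The helper name `classify_tvpot_id` and its URL pattern are hypothetical, not taken from daum.py; it assumes clip IDs are always all digits and vids never are.

```python
import re

# Hypothetical pattern: the ID follows /v/ on tvpot.daum.net or m.tvpot.daum.net.
_TVPOT_URL_RE = re.compile(
    r'https?://(?:m\.)?tvpot\.daum\.net/v/(?P<id>[0-9A-Za-z]+)')


def classify_tvpot_id(url):
    """Return ('clipid', id) for numeric IDs and ('vid', id) for alphanumeric ones."""
    match = _TVPOT_URL_RE.match(url)
    if match is None:
        raise ValueError('not a tvpot.daum.net/v/ URL: %s' % url)
    video_id = match.group('id')
    # Assumption: clip IDs are purely numeric, while vids mix letters and digits.
    kind = 'clipid' if video_id.isdigit() else 'vid'
    return kind, video_id


print(classify_tvpot_id('http://tvpot.daum.net/v/66008403'))
# ('clipid', '66008403')
print(classify_tvpot_id('http://tvpot.daum.net/v/v2b9eCxzw1EzE6swwSS4Okz'))
# ('vid', 'v2b9eCxzw1EzE6swwSS4Okz')
```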

Currently, downloading URLs with clip IDs is broken. This commit fixes it and also updates the expected test data.

The fix works by redirecting to the vid-based URL, which can be found in the page's canonical link tag (explained later). I'm not sure the vid is always present this way on clip ID-based pages, but all 10 of the 10 pages I checked had it, and I can't think of a reason other pages wouldn't.
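A rough sketch of that redirect decision in standalone Python 3. `url_to_extract` and the `fetch_page` callable (a stand-in for the extractor's page download) are hypothetical, and the canonical-link pattern assumes the tag format described in this PR. Vid URLs pass through without an extra request; clip-ID URLs cost one page download, and a missing tag is reported as an error rather than silently ignored, since the 10-of-10 observation is not a guarantee.

```python
import re

_ID_RE = re.compile(r'https?://(?:m\.)?tvpot\.daum\.net/v/(?P<id>[0-9A-Za-z]+)')
# Assumed tag shape: <link rel="canonical" href="http://tvpot.daum.net/v/<vid>">
_CANONICAL_RE = re.compile(
    r'<link\s+rel="canonical"\s+href="(?P<url>https?://tvpot\.daum\.net/v/[0-9A-Za-z]+)"')


def url_to_extract(url, fetch_page):
    """Return the vid-based URL that should actually be extracted."""
    match = _ID_RE.match(url)
    if match is None:
        raise ValueError('unsupported URL: %s' % url)
    if not match.group('id').isdigit():
        return url  # already a vid URL; no extra request needed
    webpage = fetch_page(url)  # only clip-ID URLs pay for this download
    canonical = _CANONICAL_RE.search(webpage or '')
    if canonical is None:
        # The tag was seen on every page checked, but that is not guaranteed.
        raise LookupError('no canonical vid link found for %s' % url)
    return canonical.group('url')
```

For example, with a fake `fetch_page` returning a page whose canonical link points at http://tvpot.daum.net/v/v4e99Kd61HUKxI18xR87xRb (the vid the updated test data expects for clip 65139429), `url_to_extract('http://m.tvpot.daum.net/v/65139429', fetch_page)` returns that vid URL.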

TODO: Some other features of daum.net, such as playlists, are still broken. Those should be fixed later.
In fact, TVPot has disabled almost all of its features, so playlists may not work anymore. Some old API endpoints are still alive, and the current extractor uses them, as in daum.py#L85.

Explanation of the canonical link tag

<link rel="canonical" href="http://example.com">
This kind of tag, placed in <head>, is meant to help search engines correctly identify and group URLs. The same page can often be reached via several different URLs; this tag tells the search engine to treat the page as the one given in the href attribute, which makes search results faster and more accurate.
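For illustration, a consumer of this tag does not need to rely on exact formatting. The sketch below uses Python 3's standard `html.parser.HTMLParser` to pick up the first `<link rel="canonical">` regardless of attribute order, letter case, or the self-closing `/>` form. The class and function names are hypothetical, and this is not how the extractor in this PR reads the tag.

```python
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Record the href of the first <link rel="canonical"> tag, in any attribute order."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attribute values keep their case.
        if tag != 'link' or self.canonical is not None:
            return
        attributes = dict(attrs)
        rels = (attributes.get('rel') or '').lower().split()
        if 'canonical' in rels:
            self.canonical = attributes.get('href')
    # A self-closing <link ... /> goes to handle_startendtag, whose default
    # implementation calls handle_starttag, so it is covered as well.


def canonical_url(html):
    parser = CanonicalLinkParser()
    parser.feed(html)
    parser.close()
    return parser.canonical


print(canonical_url('<head><link href="http://example.com" rel="canonical"></head>'))
# http://example.com
```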

Downloads for clips with URLs of the form tvpot.daum.net/v/<number> were broken. This commit fixes them and updates the expected test data.

Some other features, such as playlists, are still broken. Those should be fixed later.
@@ -40,11 +40,11 @@ class DaumIE(InfoExtractor):
     }, {
         'url': 'http://m.tvpot.daum.net/v/65139429',
         'info_dict': {
-            'id': '65139429',
+            'id': 'v4e99Kd61HUKxI18xR87xRb',

Collaborator

It's better to keep the original video ID or --download-archive will be broken for affected videos. smuggle_url and unsmuggle_url can help.
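A minimal sketch of the reviewer's suggestion, assuming the clip ID is carried through the redirect and reused as the reported ID. The two helpers below are simplified standalone stand-ins modeled on `smuggle_url`/`unsmuggle_url` from `youtube_dl.utils` (inside youtube-dl you would import those instead of redefining them), and the `clip_id` key is hypothetical. The pairing of clip 65139429 with vid v4e99Kd61HUKxI18xR87xRb comes from the test data in the diff.

```python
import json
from urllib.parse import parse_qs, urlencode

_SMUGGLE_KEY = '__youtubedl_smuggle'


def smuggle_url(url, data):
    """Simplified stand-in: attach a JSON payload to the URL fragment."""
    return url + '#' + urlencode({_SMUGGLE_KEY: json.dumps(data)})


def unsmuggle_url(smug_url, default=None):
    """Simplified stand-in: split a smuggled URL back into (url, data)."""
    if '#' + _SMUGGLE_KEY not in smug_url:
        return smug_url, default
    url, _, fragment = smug_url.rpartition('#')
    return url, json.loads(parse_qs(fragment)[_SMUGGLE_KEY][0])


# Clip-ID branch: redirect to the vid URL, carrying the original clip ID along.
target = smuggle_url('http://tvpot.daum.net/v/v4e99Kd61HUKxI18xR87xRb',
                     {'clip_id': '65139429'})

# Vid branch: recover the clip ID and keep it as the reported 'id', so that
# --download-archive entries recorded under the old ID still match.
vid_url, smuggled = unsmuggle_url(target, {})
video_id = smuggled.get('clip_id') or vid_url.rsplit('/', 1)[-1]
print(vid_url, video_id)
# http://tvpot.daum.net/v/v4e99Kd61HUKxI18xR87xRb 65139429
```

With this approach the test entry for http://m.tvpot.daum.net/v/65139429 could keep `'id': '65139429'`, while direct vid URLs fall back to the vid.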

webpage = self._download_webpage(url, video_id, 'Requesting webpage', fatal=False)
if webpage:
    canonical = self._html_search_regex(
        r'<link rel="canonical" href="(http://tvpot\.daum\.net/v/[a-zA-Z0-9]+)">', webpage, 'Canonical link', fatal=False)

Collaborator

Avoid spaces in regular expressions. See rutube.py for an example.
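One reading of this comment (an assumption; rutube.py is not reproduced here) is that hard-coded single spaces make the pattern brittle. The standalone Python 3 sketch below compares the pattern under review with a hypothetical tolerant variant that uses `\s+` for required whitespace, accepts either quote style, and allows an optional self-closing slash, on the same tag reformatted across two lines.

```python
import re

# The pattern under review: literal single spaces between tokens.
STRICT = re.compile(
    r'<link rel="canonical" href="(http://tvpot\.daum\.net/v/[a-zA-Z0-9]+)">')

# Hypothetical tolerant variant: \s+ between attributes, matching quote styles
# via backreferences, optional whitespace and "/" before the closing ">".
TOLERANT = re.compile(
    r'<link\s+rel=(["\'])canonical\1\s+href=(["\'])'
    r'(?P<url>https?://tvpot\.daum\.net/v/[a-zA-Z0-9]+)\2\s*/?>')

reformatted = ('<link rel="canonical"\n'
               '      href="http://tvpot.daum.net/v/v2b9eCxzw1EzE6swwSS4Okz" />')

print(STRICT.search(reformatted))  # None
print(TOLERANT.search(reformatted).group('url'))
# http://tvpot.daum.net/v/v2b9eCxzw1EzE6swwSS4Okz
```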

3 participants