[daum] Fix clipID-based downloads #15015

Namnamseo · 2017-12-17T10:35:47Z

Before submitting a pull request make sure you have:

At least skimmed through adding new extractor tutorial and youtube-dl coding conventions sections
Searched the bugtracker for similar pull requests
Checked the code with flake8

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

I am the original author of this code and I am willing to release it under Unlicense
I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Bug fix
Improvement
New extractor
New feature

Description of your pull request and other information

Some daum.net URLs have an alphanumeric video ID called vid, which looks like v2b9eCxzw1EzE6swwSS4Okz. An example URL is http://tvpot.daum.net/v/v2b9eCxzw1EzE6swwSS4Okz

Some other URLs have a numeric clip ID called clipid, which looks like 66008403. An example URL is http://tvpot.daum.net/v/66008403
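To make the distinction between the two ID forms concrete, here is a minimal sketch. The helper name `classify_daum_id` and the URL pattern are assumptions for illustration only and are not taken from daum.py; the sketch simply treats an all-digit ID as a clipid and anything else as a vid.

```python
import re

# Hypothetical pattern for illustration; the real extractor's _VALID_URL may differ.
_ID_RE = re.compile(r'https?://(?:m\.)?tvpot\.daum\.net/v/(?P<id>[^/?#&]+)')


def classify_daum_id(url):
    """Return ('clipid', id) for all-digit IDs and ('vid', id) otherwise."""
    mobj = _ID_RE.match(url)
    if mobj is None:
        raise ValueError('not a tvpot.daum.net/v/ URL: %s' % url)
    video_id = mobj.group('id')
    kind = 'clipid' if video_id.isdigit() else 'vid'
    return kind, video_id
```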

Currently, downloading URLs with clip IDs is broken. This commit fixes that and updates the expected test data.

The fix redirects a clip ID-based URL to its vid-based URL, which is taken from the page's canonical link tag (explained below). I'm not sure whether the vid is always present this way on clip ID-based pages, but all 10 pages I checked had it, and I see no reason other pages would lack it.

TODO: Some other daum.net features, such as playlists, are still broken and should be fixed later.
In fact, TVPot has disabled almost all of its features, so playlists might not work anymore. Some old API endpoints are still alive and the current extractor uses them, as in daum.py#L85.

Explanation of the canonical `link` tag

<link rel="canonical" href="http://example.com">
This kind of tag, placed in <head>, is meant to help search engines discern and group URLs correctly. The same page can often be reached through several different URLs; the tag tells the search engine to treat the current page as the one given in the href attribute, making the search results faster and more accurate.
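As a rough illustration of reading that tag, the sketch below pulls the canonical href out of a page using only the Python standard library. The names `CanonicalLinkParser` and `find_canonical_url` are hypothetical and this is not how the patch itself does it (the patch uses a regular expression inside the extractor).

```python
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != 'link' or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel is a space-separated token list; attribute values may be None.
        rel_tokens = (attr_map.get('rel') or '').lower().split()
        if 'canonical' in rel_tokens:
            self.canonical = attr_map.get('href')


def find_canonical_url(html):
    """Return the canonical URL declared in html, or None if there is none."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical
```

For a clip ID-based page, the returned URL would be the vid-based one that the download is redirected to.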

Downloads for clips with URLs of the form tvpot.daum.net/v/<number> were broken. This commit fixes that and updates the expected test data. Some other features, such as playlists, are still broken and should be fixed later.

yan12125 · 2018-01-16T06:07:04Z

youtube_dl/extractor/daum.py

@@ -40,11 +40,11 @@ class DaumIE(InfoExtractor):
    }, {
        'url': 'http://m.tvpot.daum.net/v/65139429',
        'info_dict': {
-            'id': '65139429',
+            'id': 'v4e99Kd61HUKxI18xR87xRb',


It's better to keep the original video ID; otherwise --download-archive will break for affected videos. smuggle_url and unsmuggle_url can help.
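One way to read this suggestion: carry the original clip ID along with the redirect URL, so the vid-based extraction can still report the numeric ID. The sketch below is a simplified, standalone stand-in for the idea and is not youtube-dl's own code; the function names, the `_SMUGGLE_KEY` marker and the `original_id` key are all assumptions made for illustration.

```python
import json
from urllib.parse import parse_qs, urlencode

_SMUGGLE_KEY = '__sketch_smuggle'  # hypothetical marker, not youtube-dl's


def smuggle_url_sketch(url, data):
    """Append JSON-encoded data to the URL fragment."""
    return url + '#' + urlencode({_SMUGGLE_KEY: json.dumps(data)})


def unsmuggle_url_sketch(smug_url, default=None):
    """Split a smuggled URL back into (plain_url, data)."""
    if '#' + _SMUGGLE_KEY not in smug_url:
        return smug_url, default
    url, _, sdata = smug_url.rpartition('#')
    data = json.loads(parse_qs(sdata)[_SMUGGLE_KEY][0])
    return url, data


# The clip ID page attaches its numeric ID before redirecting to the vid URL.
redirect = smuggle_url_sketch(
    'http://tvpot.daum.net/v/v4e99Kd61HUKxI18xR87xRb',
    {'original_id': '65139429'})

# The vid-based extraction recovers it and can keep reporting the old ID.
plain_url, smuggled = unsmuggle_url_sketch(redirect, {})
reported_id = smuggled.get('original_id')
```

With something along these lines, the test's expected 'id' could stay '65139429', so existing --download-archive entries would continue to match.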

yan12125 · 2018-01-16T06:07:41Z

youtube_dl/extractor/daum.py

+        webpage = self._download_webpage(url, video_id, 'Requesting webpage', fatal=False)
+        if webpage:
+            canonical = self._html_search_regex(
+                r'<link rel="canonical" href="(http://tvpot\.daum\.net/v/[a-zA-Z0-9]+)">', webpage, 'Canonical link', fatal=False)


Avoid spaces in regular expressions. See rutube.py for an example.
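A small sketch of why literal spaces are fragile: the pattern from the diff matches only when the attributes are separated by exactly one space, while a `\s+` variant tolerates extra spaces and line breaks. The variable names and the sample markup are made up for illustration, and the relaxed pattern is one possible rewrite rather than the one the reviewer necessarily had in mind.

```python
import re

# Pattern as written in the diff: literal single spaces between attributes.
LITERAL_RE = r'<link rel="canonical" href="(http://tvpot\.daum\.net/v/[a-zA-Z0-9]+)">'

# Relaxed variant: any run of whitespace between attributes, no assumption
# about what follows the closing quote.
TOLERANT_RE = r'<link\s+rel="canonical"\s+href="(?P<url>https?://tvpot\.daum\.net/v/[a-zA-Z0-9]+)"'

# Hypothetical markup with irregular whitespace and a self-closing tag.
sample = '<link  rel="canonical"\n      href="http://tvpot.daum.net/v/v4e99Kd61HUKxI18xR87xRb" />'

literal_match = re.search(LITERAL_RE, sample)
tolerant_match = re.search(TOLERANT_RE, sample)
```

Here `literal_match` is None, while `tolerant_match.group('url')` yields the vid-based URL.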

[daum] Fix numbered-clip downloads

f31b87e

Downloads for clips with URLs of the form tvpot.daum.net/v/<number> were broken. This commit fixes that and updates the expected test data. Some other features, such as playlists, are still broken and should be fixed later.

yan12125 requested changes Jan 16, 2018

dstftw added the pending-fixes label Jan 21, 2018

dstftw force-pushed the master branch from 37318e1 to 65220c3 Compare January 27, 2018 22:49

dstftw force-pushed the master branch from d99bab0 to e118a87 Compare January 23, 2019 18:41

remitamine closed this in d439989 Nov 1, 2019

meunierd referenced this pull request in meunierd/youtube-dl Feb 13, 2020

[daum] fix VOD and Clip extracton(closes #15015)

add96fd

pareronia referenced this pull request in pareronia/youtube-dl Jun 22, 2020

[daum] fix VOD and Clip extracton(closes #15015)

0baa427
