[daum] Fix clipID-based downloads #15015
Closed
Description of your pull request and other information
Some daum.net URLs have an alphanumeric video ID called `vid`, which looks like `v2b9eCxzw1EzE6swwSS4Okz`. An example URL is http://tvpot.daum.net/v/v2b9eCxzw1EzE6swwSS4Okz
Some other URLs have a numeric clip ID called `clipid`, which looks like `66008403`. An example URL is http://tvpot.daum.net/v/66008403
Currently, download for URLs with clip IDs is broken. This commit fixes it, and also updates the expected test data.
The fix redirects each clip ID-based URL to the corresponding `vid`-based URL, which can be found in the page's canonical `link` tag (explained later). I'm not sure whether the `vid` is always present this way on clip ID-based pages, but all 10 pages I checked had it, and I see no reason why other pages would lack it.
TODO: Some other features of `daum.net`, such as playlists, are still broken. Those should be fixed later.
In fact, TVPot has disabled almost all of its features, so playlists might not work from now on. Some old API endpoints are still alive and the current extractor is using them, like in daum.py#L85.
Explanation of the canonical `link` tag
`<link rel="canonical" href="http://example.com">`
This kind of tag, placed in `<head>`, was originally meant to help search engines correctly discern and group URLs. The same page can often be reached through several different URLs. The tag tells the search engine that it can treat this page as the page given in the `href` attribute, making the search results faster and more accurate.
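As an illustration of how a canonical URL could be read from a page, here is a minimal standard-library sketch for Python 3. The names `CanonicalLinkParser` and `find_canonical_url` and the `SAMPLE_PAGE` markup are made up for this example. The actual change in this pull request may locate the tag differently.

```python
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical_url = None

    def handle_starttag(self, tag, attrs):
        if tag != 'link' or self.canonical_url is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens, so split before testing.
        rel_tokens = (attr_map.get('rel') or '').lower().split()
        if 'canonical' in rel_tokens:
            self.canonical_url = attr_map.get('href')


def find_canonical_url(html):
    """Return the canonical URL declared in the page, or None if there is none."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical_url


# Made-up sample markup; a real clip ID page contains much more than this.
SAMPLE_PAGE = '''<html><head>
<link rel="canonical" href="http://tvpot.daum.net/v/v2b9eCxzw1EzE6swwSS4Okz">
</head><body></body></html>'''

canonical = find_canonical_url(SAMPLE_PAGE)
```

If a clip ID page has no canonical tag, the function returns `None`, so a caller would need a fallback for that case.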