[youtube:playlist] Fetch all the videos in a mix (fixes #3837)
Since there doesn't seem to be any indication of the mix's total length, it stops when the webpage yields no new videos.
jaimeMF committed Apr 17, 2016
1 parent 7bab22a commit 1b6182d
Showing 2 changed files with 21 additions and 9 deletions.
2 changes: 1 addition & 1 deletion test/test_youtube_lists.py
@@ -44,7 +44,7 @@ def test_youtube_mix(self):
         ie = YoutubePlaylistIE(dl)
         result = ie.extract('https://www.youtube.com/watch?v=W01L70IGBgE&index=2&list=RDOQpdSVF_k_w')
         entries = result['entries']
-        self.assertTrue(len(entries) >= 20)
+        self.assertTrue(len(entries) >= 50)
         original_video = entries[0]
         self.assertEqual(original_video['id'], 'OQpdSVF_k_w')

28 changes: 20 additions & 8 deletions youtube_dl/extractor/youtube.py
@@ -1818,20 +1818,32 @@ def _real_initialize(self):
     def _extract_mix(self, playlist_id):
         # The mixes are generated from a single video
         # the id of the playlist is just 'RD' + video_id
-        url = 'https://youtube.com/watch?v=%s&list=%s' % (playlist_id[-11:], playlist_id)
-        webpage = self._download_webpage(
-            url, playlist_id, 'Downloading Youtube mix')
+        ids = []
+        last_id = playlist_id[-11:]
+        for n in itertools.count(1):
+            url = 'https://youtube.com/watch?v=%s&list=%s' % (last_id, playlist_id)
+            webpage = self._download_webpage(
+                url, playlist_id, 'Downloading page {0} of Youtube mix'.format(n))
+            new_ids = orderedSet(re.findall(
+                r'''(?xs)data-video-username=".*?".*?
+                           href="/watch\?v=([0-9A-Za-z_-]{11})&[^"]*?list=%s''' % re.escape(playlist_id),
+                webpage))
+            # Fetch new pages until all the videos are repeated, it seems that
+            # there are always 51 unique videos.
+            new_ids = [_id for _id in new_ids if _id not in ids]
+            if not new_ids:
+                break
+            ids.extend(new_ids)
+            last_id = ids[-1]
+
+        url_results = self._ids_to_results(ids)
 
         search_title = lambda class_name: get_element_by_attribute('class', class_name, webpage)
         title_span = (
             search_title('playlist-title') or
             search_title('title long-title') or
             search_title('title'))
         title = clean_html(title_span)
-        ids = orderedSet(re.findall(
-            r'''(?xs)data-video-username=".*?".*?
-                       href="/watch\?v=([0-9A-Za-z_-]{11})&[^"]*?list=%s''' % re.escape(playlist_id),
-            webpage))
-        url_results = self._ids_to_results(ids)
 
         return self.playlist_result(url_results, playlist_id, title)
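The stopping condition in the new loop is worth seeing in isolation. Below is a minimal, self-contained sketch of the same technique — keep requesting pages, keep only unseen ids in order, and stop as soon as a page contributes nothing new. `collect_until_repeats` and `fake_fetch` are hypothetical stand-ins for the extractor's download-and-regex step, not part of youtube-dl.

```python
import itertools

def collect_until_repeats(fetch_page, first_id):
    """Collect video ids page by page, stopping when a page adds nothing new.

    fetch_page(last_id, page_number) -> list of video ids, standing in for
    the webpage download + regex extraction in the commit.
    """
    ids = []
    last_id = first_id
    for n in itertools.count(1):
        page_ids = fetch_page(last_id, n)
        # Keep only ids we have not seen yet, preserving order (like orderedSet).
        new_ids = [v for v in page_ids if v not in ids]
        if not new_ids:
            break  # the mix has started repeating; we have the full list
        ids.extend(new_ids)
        last_id = ids[-1]  # the next request continues from the last collected video
    return ids

# Simulated mix of 5 videos served in overlapping pages of 3,
# wrapping around like the observed 51-video YouTube mixes.
mix = ['a', 'b', 'c', 'd', 'e']

def fake_fetch(last_id, n):
    start = mix.index(last_id)
    return [mix[(start + i) % len(mix)] for i in range(3)]

print(collect_until_repeats(fake_fetch, 'a'))
# → ['a', 'b', 'c', 'd', 'e']
```

Note that, as in the extractor, the page starting at the last collected id includes that id again; the dedup filter is what makes the overlap harmless.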


2 comments on commit 1b6182d

@dstftw (Collaborator) commented on 1b6182d, Apr 17, 2016:


It still does not download all the videos. I'm getting completely different results in the browser with any mix.

@jaimeMF (Collaborator, Author) commented:


Yeah, I got that behaviour on Firefox, but then I tried on Safari: when it reached https://www.youtube.com/watch?v=Sb5aq5HcS1A&index=51&list=RDSBjQ9tuuTJQ it started again with the first video. I'm pretty sure I tested again on Firefox and saw the same behaviour, so I assumed it was an error on YouTube's side. But now, both on Firefox and Chrome, the list doesn't seem to contain duplicates and keeps going (in Safari it's still limited to 51 videos). To be honest, I don't know why that is. In the case of youtube-dl, adding the appropriate &index to the URL may help (although we'd then need to handle an infinite playlist).
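The suggestion above — sending the &index parameter and guarding against a playlist that never repeats — could be sketched roughly as follows. `mix_page_url` and `MAX_PAGES` are hypothetical names for illustration, not part of the extractor.

```python
import itertools

MAX_PAGES = 10  # hypothetical safety cap in case the mix never starts repeating

def mix_page_url(video_id, playlist_id, index):
    # Same URL shape the extractor already builds, plus the suggested &index.
    return ('https://youtube.com/watch?v=%s&list=%s&index=%d'
            % (video_id, playlist_id, index))

# The paging loop would then be bounded instead of open-ended:
for n in itertools.islice(itertools.count(1), MAX_PAGES):
    pass  # download mix_page_url(last_id, playlist_id, n) and extract ids here

print(mix_page_url('Sb5aq5HcS1A', 'RDSBjQ9tuuTJQ', 51))
# → https://youtube.com/watch?v=Sb5aq5HcS1A&list=RDSBjQ9tuuTJQ&index=51
```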
