[sproutvideo] Add new extractor (closes #7935, replaces #21962) #27685

Open
wants to merge 6 commits into
base: master

@TheZ3ro TheZ3ro commented Jan 5, 2021

Please follow the guide below

  • You will be asked some questions, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your pull request (like that [x])
  • Use Preview tab to see how your pull request will actually look like

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

This PR adds support for SproutVideo and every website/service that uses it as a video provider.
This PR closes #7935, #16994, #16996 and #21333.
This PR also replaces #21962, since I can no longer push to that PR (due to the DMCA takedown).

Now the extractor adds the correct Accept, Origin and Referer HTTP headers to avoid 403 errors.

For the full PR description, refer to #21962. Thanks.
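
As a rough illustration of the header change described here, the three headers can be derived from the embed URL alone. This is a minimal sketch, not the extractor's actual code: the helper name `build_hls_headers` is hypothetical, and the header values are taken from the request dump posted later in this thread. In the PR, a dictionary like this is passed as `headers=custom_headers` to `_extract_m3u8_formats`, as the traceback in this thread shows.

```python
from urllib.parse import urlsplit


def build_hls_headers(embed_url):
    """Build the Accept/Origin/Referer headers for the m3u8 request.

    Hypothetical helper for illustration only; it is not part of youtube-dl.
    """
    parts = urlsplit(embed_url)
    # Origin is the scheme and host of the embed page, without a path.
    origin = "%s://%s" % (parts.scheme, parts.netloc)
    return {
        "Accept": "*/*",
        "Origin": origin,
        # Referer is the full embed page URL.
        "Referer": embed_url,
    }


if __name__ == "__main__":
    url = "https://videos.sproutvideo.com/embed/4c9dddb01910e3c9c4/0fc24387c4f24ee3"
    for name, value in build_hls_headers(url).items():
        print("%s: %s" % (name, value))
```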

@gianpaj

gianpaj commented Jan 8, 2021

I pulled this, ran make, and got the same error as in the other branch.

  • How are you testing this?
./youtube-dl "http://videos.sproutvideo.com/embed/e89bddb01f1be3cf60/0d7fb4d67f328c8b"
[SproutVideo] e89bddb01f1be3cf60: Downloading webpage
[SproutVideo] e89bddb01f1be3cf60: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 403: Forbidden
ERROR: No video formats found; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
$ git log | cat |head -n5
Alias tip: g log | \cat |head -n5
commit 372d712dac1e3aaf76197665b1435a40964022fb
Author: thezero <[email protected]>
Date:   Tue Jan 5 18:40:36 2021 +0100

    [sproutvideo] add Accept, Origin and Referer headers to avoid 403
Verbose output:

./youtube-dl "http://videos.sproutvideo.com/embed/e89bddb01f1be3cf60/0d7fb4d67f328c8b" -v
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'http://videos.sproutvideo.com/embed/e89bddb01f1be3cf60/0d7fb4d67f328c8b', u'-v']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.01.03
[debug] Python version 2.7.16 (CPython) - Darwin-19.6.0-x86_64-i386-64bit
[debug] exe versions: ffmpeg 4.3.1, ffprobe 4.3.1, rtmpdump 2.4
[debug] Proxy map: {}
[SproutVideo] e89bddb01f1be3cf60: Downloading webpage
[SproutVideo] e89bddb01f1be3cf60: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 403: Forbidden
ERROR: No video formats found; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "./youtube-dl/youtube_dl/YoutubeDL.py", line 803, in wrapper
    return func(self, *args, **kwargs)
  File "./youtube-dl/youtube_dl/YoutubeDL.py", line 824, in __extract_info
    ie_result = ie.extract(url)
  File "./youtube-dl/youtube_dl/extractor/common.py", line 532, in extract
    ie_result = self._real_extract(url)
  File "./youtube-dl/youtube_dl/extractor/sproutvideo.py", line 63, in _real_extract
    self._sort_formats(formats)
  File "./youtube-dl/youtube_dl/extractor/common.py", line 1367, in _sort_formats
    raise ExtractorError('No video formats found')
ExtractorError: No video formats found; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

@TheZ3ro
Author

TheZ3ro commented Jan 8, 2021

I usually test with python test/test_download.py TestDownload.test_SproutVideo.

I will look into it, even though I'm pretty much out of ideas now.

@gianpaj

gianpaj commented Jan 10, 2021

The second request is getting a 403.

If you set debuglevel = 1 on https://github.com/TheZ3ro/youtube-dl-1/blob/sproutvideo/youtube_dl/YoutubeDL.py#L2354

you'll see:

[SproutVideo] 4c9dddb01910e3c9c4: Downloading m3u8 information
send: u'GET /49baec34e9983ed24492919e974bd436/86f37744278646b0dc7c2f60483e8382/video/index.m3u8?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9obHMyLnZpZGVvcy5zcHJvdXR2aWRlby5jb20vNDliYWVjMzRlOTk4M2VkMjQ0OTI5MTllOTc0YmQ0MzYvODZmMzc3NDQyNzg2NDZiMGRjN2MyZjYwNDgzZTgzODIvKi5tM3U4P3Nlc3Npb25JRD0xZmFlOGY0MC0yNzJiLTRlMjQtYjgwNS00ZTlkNTBiYzliNDYiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE2MTAzNDA4MDF9fX1dfQ__&sessionID=1fae8f40-272b-4e24-b805-4e9d50bc9b46&Signature=qIo4NQw-hMPyMFsJ2RLvuEzC92PPPX%7E0iWBL7BnfzPsgQ0%7EoQR--pfa4162wk5IZ-2gMgc3mMC57jMF2fYJwfstFzBCLooF3JFakiWifs%7Exn3dukag381CQquBdaSpObHf8baZsv1Vzgf8zF%7EeAmpzE4W4m9QojVAuDp212Gfqp9lVN6P0kQRe%7EJXkdfsCBkaFaKD7-x7MXM1huVz8K9r9qPgIK9KH8%7EQxwSbRknMeYFU7RkHxtgZ6ISrvVHHlWJ5Orgg71g6-WeHRgmBJ-xG4wyPsnaxdSLvawVqzPZpGjB8R0uyCxMj0j-7UpgBFupHmqbEQQOUMYRabgvO1H3WA__&Key-Pair-Id=APKAIB5DGCGAQJ4GGIUQ HTTP/1.1\r\nOrigin: https://videos.sproutvideo.com\r\nAccept-Language: en-us,en;q=0.5\r\nAccept-Encoding: gzip, deflate\r\nHost: hls2.videos.sproutvideo.com\r\nAccept: */*\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3532.7 Safari/537.36\r\nAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\nConnection: close\r\nCookie: svid=c13f97e9-6320-4676-921c-471827ea0e05\r\nReferer: https://videos.sproutvideo.com/embed/4c9dddb01910e3c9c4/0fc24387c4f24ee3\r\n\r\n'
reply: 'HTTP/1.1 403 Forbidden\r\n'
header: Server: CloudFront
header: Date: Sun, 10 Jan 2021 22:53:20 GMT
header: Content-Type: text/html
header: Content-Length: 919
header: Connection: close
header: X-Cache: Error from cloudfront
header: Via: 1.1 56eff4217adb539e7a42fbab3eee2d4d.cloudfront.net (CloudFront)
header: X-Amz-Cf-Pop: MAD51-C2
header: X-Amz-Cf-Id: LeV99Qe3awkY47TGjk1EkCvEwFnQ5arksen-TCT6CXz2zRg1bScwOQ==
Full output:

$ python test/test_download.py TestDownload.test_SproutVideo
[SproutVideo] 4c9dddb01910e3c9c4: Downloading webpage
send: u'GET /embed/4c9dddb01910e3c9c4/0fc24387c4f24ee3 HTTP/1.1\r\nAccept-Language: en-us,en;q=0.5\r\nAccept-Encoding: gzip, deflate\r\nConnection: close\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3532.7 Safari/537.36\r\nAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\nHost: videos.sproutvideo.com\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Access-Control-Allow-Headers: Origin, X-Requested-With, Content-Type, Accept
header: Access-Control-Allow-Methods: GET
header: Access-Control-Allow-Origin: *
header: Content-Encoding: gzip
header: Content-Type: text/html; charset=utf-8
header: Date: Sun, 10 Jan 2021 22:53:20 GMT
header: ETag: "-1947696748"
header: p3p: CP="NOI CURa ADMa DEVa TAIa OUR BUS IND UNI COM NAV INT"
header: Referrer-Policy: no-referrer-when-downgrade
header: Set-Cookie: svid=c13f97e9-6320-4676-921c-471827ea0e05; max-age=31556952000; path=/; SameSite=None; Secure
header: Vary: Accept-Encoding
header: X-Powered-By: Express
header: X-XSS-Protection: 0
header: transfer-encoding: chunked
header: Connection: Close
[SproutVideo] 4c9dddb01910e3c9c4: Downloading m3u8 information
send: u'GET /49baec34e9983ed24492919e974bd436/86f37744278646b0dc7c2f60483e8382/video/index.m3u8?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9obHMyLnZpZGVvcy5zcHJvdXR2aWRlby5jb20vNDliYWVjMzRlOTk4M2VkMjQ0OTI5MTllOTc0YmQ0MzYvODZmMzc3NDQyNzg2NDZiMGRjN2MyZjYwNDgzZTgzODIvKi5tM3U4P3Nlc3Npb25JRD0xZmFlOGY0MC0yNzJiLTRlMjQtYjgwNS00ZTlkNTBiYzliNDYiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE2MTAzNDA4MDF9fX1dfQ__&sessionID=1fae8f40-272b-4e24-b805-4e9d50bc9b46&Signature=qIo4NQw-hMPyMFsJ2RLvuEzC92PPPX%7E0iWBL7BnfzPsgQ0%7EoQR--pfa4162wk5IZ-2gMgc3mMC57jMF2fYJwfstFzBCLooF3JFakiWifs%7Exn3dukag381CQquBdaSpObHf8baZsv1Vzgf8zF%7EeAmpzE4W4m9QojVAuDp212Gfqp9lVN6P0kQRe%7EJXkdfsCBkaFaKD7-x7MXM1huVz8K9r9qPgIK9KH8%7EQxwSbRknMeYFU7RkHxtgZ6ISrvVHHlWJ5Orgg71g6-WeHRgmBJ-xG4wyPsnaxdSLvawVqzPZpGjB8R0uyCxMj0j-7UpgBFupHmqbEQQOUMYRabgvO1H3WA__&Key-Pair-Id=APKAIB5DGCGAQJ4GGIUQ HTTP/1.1\r\nOrigin: https://videos.sproutvideo.com\r\nAccept-Language: en-us,en;q=0.5\r\nAccept-Encoding: gzip, deflate\r\nHost: hls2.videos.sproutvideo.com\r\nAccept: */*\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3532.7 Safari/537.36\r\nAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\nConnection: close\r\nCookie: svid=c13f97e9-6320-4676-921c-471827ea0e05\r\nReferer: https://videos.sproutvideo.com/embed/4c9dddb01910e3c9c4/0fc24387c4f24ee3\r\n\r\n'
reply: 'HTTP/1.1 403 Forbidden\r\n'
header: Server: CloudFront
header: Date: Sun, 10 Jan 2021 22:53:20 GMT
header: Content-Type: text/html
header: Content-Length: 919
header: Connection: close
header: X-Cache: Error from cloudfront
header: Via: 1.1 56eff4217adb539e7a42fbab3eee2d4d.cloudfront.net (CloudFront)
header: X-Amz-Cf-Pop: MAD51-C2
header: X-Amz-Cf-Id: LeV99Qe3awkY47TGjk1EkCvEwFnQ5arksen-TCT6CXz2zRg1bScwOQ==
ERROR: Failed to download m3u8 information: HTTP Error 403: Forbidden; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "/Users/gpalumbo/temp/youtube-dl/youtube_dl/extractor/common.py", line 632, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/Users/gpalumbo/temp/youtube-dl/youtube_dl/YoutubeDL.py", line 2248, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 473, in error
    return self._call_chain(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 556, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 403: Forbidden
Traceback (most recent call last):
  File "/Users/gpalumbo/temp/youtube-dl/youtube_dl/YoutubeDL.py", line 803, in wrapper
    return func(self, *args, **kwargs)
  File "/Users/gpalumbo/temp/youtube-dl/youtube_dl/YoutubeDL.py", line 824, in __extract_info
    ie_result = ie.extract(url)
  File "/Users/gpalumbo/temp/youtube-dl/youtube_dl/extractor/common.py", line 532, in extract
    ie_result = self._real_extract(url)
  File "/Users/gpalumbo/temp/youtube-dl/youtube_dl/extractor/sproutvideo.py", line 62, in _real_extract
    headers=custom_headers)
  File "/Users/gpalumbo/temp/youtube-dl/youtube_dl/extractor/common.py", line 1636, in _extract_m3u8_formats
    fatal=fatal, data=data, headers=headers, query=query)
  File "/Users/gpalumbo/temp/youtube-dl/youtube_dl/extractor/common.py", line 665, in _download_webpage_handle
    urlh = self._request_webpage(url_or_request, video_id, note, errnote, fatal, data=data, headers=headers, query=query, expected_status=expected_status)
  File "/Users/gpalumbo/temp/youtube-dl/youtube_dl/extractor/common.py", line 652, in _request_webpage
    self._downloader.report_warning(errmsg)
  File "/Users/gpalumbo/temp/youtube-dl/test/helper.py", line 271, in _report_warning
    real_warning(w)
  File "test/test_download.py", line 52, in report_warning
    raise ExtractorError(message)
ExtractorError: Failed to download m3u8 information: HTTP Error 403: Forbidden; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

E
======================================================================
ERROR: test_SproutVideo (__main__.TestDownload):
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/test_download.py", line 159, in test_template
    force_generic_extractor=params.get('force_generic_extractor', False))
  File "/Users/gpalumbo/temp/youtube-dl/youtube_dl/YoutubeDL.py", line 796, in extract_info
    return self.__extract_info(url, ie, download, extra_info, process)
  File "/Users/gpalumbo/temp/youtube-dl/youtube_dl/YoutubeDL.py", line 812, in wrapper
    self.report_error(compat_str(e), e.format_traceback())
  File "/Users/gpalumbo/temp/youtube-dl/youtube_dl/YoutubeDL.py", line 625, in report_error
    self.trouble(error_message, tb)
  File "/Users/gpalumbo/temp/youtube-dl/youtube_dl/YoutubeDL.py", line 595, in trouble
    raise DownloadError(message, exc_info)
DownloadError: ERROR: Failed to download m3u8 information: HTTP Error 403: Forbidden; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

----------------------------------------------------------------------
Ran 1 test in 1.353s

FAILED (errors=1)
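
The `send:` / `reply:` / `header:` lines in the dump above are what Python's HTTP layer prints when its debug level is raised. Outside youtube-dl, the same wire logging can be had from the standard library. The following is a minimal Python 3 sketch, assuming plain `urllib` is enough for the experiment; the function name `build_debug_opener` is hypothetical and this is not how youtube-dl wires up its own opener.

```python
import urllib.request


def build_debug_opener(debuglevel=1):
    """Return an opener that prints request and response headers to stdout.

    Hypothetical helper for illustration; youtube-dl builds its own opener.
    """
    return urllib.request.build_opener(
        # Both handlers take debuglevel and pass it on to http.client.
        urllib.request.HTTPHandler(debuglevel=debuglevel),
        urllib.request.HTTPSHandler(debuglevel=debuglevel),
    )


if __name__ == "__main__":
    opener = build_debug_opener()
    # Any URL can be used here; example.com is only a placeholder.
    with opener.open("https://example.com/", timeout=10) as response:
        print(response.status)
```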

For some reason, the order of the query parameters is important. If you move the sessionID to the end it works 🤷‍♀️

This doesn't work:

curl -I -X GET 'https://hls2.videos.sproutvideo.com/49baec34e9983ed24492919e974bd436/86f37744278646b0dc7c2f60483e8382/video/index.m3u8?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9obHMyLnZpZGVvcy5zcHJvdXR2aWRlby5jb20vNDliYWVjMzRlOTk4M2VkMjQ0OTI5MTllOTc0YmQ0MzYvODZmMzc3NDQyNzg2NDZiMGRjN2MyZjYwNDgzZTgzODIvKi5tM3U4P3Nlc3Npb25JRD0xZmFlOGY0MC0yNzJiLTRlMjQtYjgwNS00ZTlkNTBiYzliNDYiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE2MTAzNDA4MDF9fX1dfQ__&sessionID=1fae8f40-272b-4e24-b805-4e9d50bc9b46&Signature=qIo4NQw-hMPyMFsJ2RLvuEzC92PPPX%7E0iWBL7BnfzPsgQ0%7EoQR--pfa4162wk5IZ-2gMgc3mMC57jMF2fYJwfstFzBCLooF3JFakiWifs%7Exn3dukag381CQquBdaSpObHf8baZsv1Vzgf8zF%7EeAmpzE4W4m9QojVAuDp212Gfqp9lVN6P0kQRe%7EJXkdfsCBkaFaKD7-x7MXM1huVz8K9r9qPgIK9KH8%7EQxwSbRknMeYFU7RkHxtgZ6ISrvVHHlWJ5Orgg71g6-WeHRgmBJ-xG4wyPsnaxdSLvawVqzPZpGjB8R0uyCxMj0j-7UpgBFupHmqbEQQOUMYRabgvO1H3WA__&Key-Pair-Id=APKAIB5DGCGAQJ4GGIUQ'
HTTP/2 403
server: CloudFront
date: Sun, 10 Jan 2021 23:03:31 GMT
content-type: text/html
content-length: 919
x-cache: Error from cloudfront
via: 1.1 4ddf42f206fdf10afe67b89baac28c46.cloudfront.net (CloudFront)
x-amz-cf-pop: MAD51-C2
x-amz-cf-id: zvjmR3FzJZcUlubgLaKVlHHdGnWvOAEfaX35LKwb_XDTiAFibzUFxg==

This does:

curl -I -X GET 'https://hls2.videos.sproutvideo.com/49baec34e9983ed24492919e974bd436/86f37744278646b0dc7c2f60483e8382/video/index.m3u8?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9obHMyLnZpZGVvcy5zcHJvdXR2aWRlby5jb20vNDliYWVjMzRlOTk4M2VkMjQ0OTI5MTllOTc0YmQ0MzYvODZmMzc3NDQyNzg2NDZiMGRjN2MyZjYwNDgzZTgzODIvKi5tM3U4P3Nlc3Npb25JRD0xZmFlOGY0MC0yNzJiLTRlMjQtYjgwNS00ZTlkNTBiYzliNDYiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE2MTAzNDA4MDF9fX1dfQ__&Signature=qIo4NQw-hMPyMFsJ2RLvuEzC92PPPX%7E0iWBL7BnfzPsgQ0%7EoQR--pfa4162wk5IZ-2gMgc3mMC57jMF2fYJwfstFzBCLooF3JFakiWifs%7Exn3dukag381CQquBdaSpObHf8baZsv1Vzgf8zF%7EeAmpzE4W4m9QojVAuDp212Gfqp9lVN6P0kQRe%7EJXkdfsCBkaFaKD7-x7MXM1huVz8K9r9qPgIK9KH8%7EQxwSbRknMeYFU7RkHxtgZ6ISrvVHHlWJ5Orgg71g6-WeHRgmBJ-xG4wyPsnaxdSLvawVqzPZpGjB8R0uyCxMj0j-7UpgBFupHmqbEQQOUMYRabgvO1H3WA__&Key-Pair-Id=APKAIB5DGCGAQJ4GGIUQ&sessionID=1fae8f40-272b-4e24-b805-4e9d50bc9b46'
HTTP/2 200
content-type: application/x-mpegURL
content-length: 554
date: Sun, 10 Jan 2021 22:55:30 GMT
last-modified: Wed, 05 Jun 2019 18:07:14 GMT
etag: "34862399178b7948dde82c0f6a769312"
cache-control: max-age=31536000
accept-ranges: bytes
server: AmazonS3
x-cache: Hit from cloudfront
via: 1.1 3a040ac81c3e03a31883d4bf85a17866.cloudfront.net (CloudFront)
x-amz-cf-pop: MAD51-C2
x-amz-cf-id: aBpd0SYdO4SyzGe13RaNKajyZXI7cTTaH-xqypOItoI4ZVxwYMFh6g==
age: 494
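
The reordering used in the working curl command can be reproduced programmatically. This is a minimal sketch, assuming the workaround is simply to move `sessionID` behind the signing parameters; the function name `move_param_last` is hypothetical and is not part of youtube-dl or of this PR. It splits the raw query string instead of parsing and re-encoding it, so percent-escapes such as `%7E` in `Signature` are left exactly as they were.

```python
from urllib.parse import urlsplit, urlunsplit


def move_param_last(url, name="sessionID"):
    """Return url with the query parameter `name` moved to the end.

    Hypothetical helper for illustration. Values are never decoded or
    re-encoded, so signed parameters stay unchanged.
    """
    parts = urlsplit(url)
    pairs = parts.query.split("&") if parts.query else []
    # Partition the raw "key=value" pairs by key, keeping their order.
    kept = [p for p in pairs if p.split("=", 1)[0] != name]
    moved = [p for p in pairs if p.split("=", 1)[0] == name]
    return urlunsplit(parts._replace(query="&".join(kept + moved)))


if __name__ == "__main__":
    # Shortened placeholder URL, not a real signed URL.
    url = ("https://example.com/video/index.m3u8"
           "?Policy=abc__&sessionID=1234&Signature=x%7Ey&Key-Pair-Id=K")
    print(move_param_last(url))
```

Whether this addresses the root cause or only happens to produce a URL that CloudFront accepts is not established in this thread.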

@TheZ3ro
Author

TheZ3ro commented Jan 11, 2021

It seems more like a CloudFront caching problem.

In the request that doesn't work, the response carries an x-cache: Error from cloudfront header, meaning that there is no element in the cache for that request.
For the other request, you get an x-cache: Hit from cloudfront header, meaning that the response was already fetched earlier, is in the cache, and can be served directly.
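
One way to gather more evidence for either explanation is to look at what the signed URL actually covers by decoding its `Policy` parameter offline. This is a minimal sketch, assuming the value is a CloudFront custom policy in CloudFront's URL-safe base64 variant (`+` replaced by `-`, `=` by `_`, `/` by `~`). Both helper names are hypothetical, and the sample policy in the usage section is made up rather than taken from the URLs in this thread.

```python
import base64
import json


def decode_cloudfront_policy(policy_param):
    """Decode a URL-safe CloudFront Policy value into a Python dict.

    Hypothetical helper; assumes the '-', '_', '~' substitution scheme.
    """
    b64 = policy_param.replace("-", "+").replace("_", "=").replace("~", "/")
    return json.loads(base64.b64decode(b64).decode("utf-8"))


def encode_cloudfront_policy(policy):
    """Inverse of decode_cloudfront_policy, used here only to build a sample."""
    raw = json.dumps(policy, separators=(",", ":")).encode("utf-8")
    b64 = base64.b64encode(raw).decode("ascii")
    return b64.replace("+", "-").replace("=", "_").replace("/", "~")


if __name__ == "__main__":
    # Made-up policy for demonstration purposes.
    sample = {
        "Statement": [{
            "Resource": "https://example.com/*.m3u8?sessionID=abc",
            "Condition": {"DateLessThan": {"AWS:EpochTime": 1610340801}},
        }]
    }
    encoded = encode_cloudfront_policy(sample)
    print(json.dumps(decode_cloudfront_policy(encoded), indent=2))
```

Comparing the decoded `Resource` pattern and expiry time against the URL that is actually requested may show whether the 403 comes from a policy mismatch rather than from caching.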

@sideloading

Also getting 403, but the test with python test/test_download.py TestDownload.test_SproutVideo works - not sure how to get past it. @gianpaj any luck?

@frans77frans

Commenting to support this request. I'd really like a new extractor for this site.

J0s3f added a commit to J0s3f/yt-dlp that referenced this pull request Apr 24, 2023
Successfully merging this pull request may close these issues.

Site Support request: sproutvideo.com