youtube embeds on recent (2019?) squarespace sites fail to download #21294

Closed
5 tasks done
galgeek opened this issue Jun 4, 2019 · 1 comment
Labels
site-support-request Add extractor(s) for a new domain

Comments

@galgeek
Contributor

galgeek commented Jun 4, 2019

Checklist

  • I'm reporting a new site support request
  • I've verified that I'm running youtube-dl version 2019.07.16
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that none of provided URLs violate any copyrights
  • I've searched the bugtracker for similar site support requests including closed ones

Description

youtube-dl fails to extract YouTube embeds on some recent Squarespace sites, including:
http://www.ootboxford.com/
https://www.harvardballetcompany.org/past-productions
http://www.immediategratification.org/

youtube-dl --write-pages http://www.ootboxford.com
produces a dump showing the site's youtube embed like so:

<div class="sqs-block video-block sqs-block-video" data-block-json="&#123;&quot;layout&quot;:&quot;caption-hidden&quot;,&quot;overlay&quot;:false,&quot;description&quot;:&#123;&quot;html&quot;:&quot;&lt;p&gt;Long-standing supporters of Helen &amp;amp; Douglas House, Out of the Blue performed at the last Childish Things - here they are, talking about the show, and their involvement with the charity.&lt;/p&gt;&quot;,&quot;source&quot;:&quot;&lt;p&gt;Long-standing supporters of Helen &amp;amp; Douglas House, Out of the Blue performed at the last Childish Things - here they are, talking about the show, and their involvement with the charity.&lt;/p&gt;&quot;&#125;,&quot;hSize&quot;:null,&quot;floatDir&quot;:null,&quot;html&quot;:&quot;&lt;iframe src=\&quot;//www.youtube.com/embed/Tc7b_JGdZfw?wmode=opaque&amp;amp;enablejsapi=1\&quot; height=\&quot;480\&quot; width=\&quot;854\&quot; scrolling=\&quot;no\&quot; frameborder=\&quot;0\&quot; allowfullscreen=\&quot;\&quot;&gt;\n&lt;/iframe&gt;&quot;,&quot;url&quot;:&quot;https://www.youtube.com/watch?v=Tc7b_JGdZfw&quot;,&quot;width&quot;:854,&quot;height&quot;:480,&quot;providerName&quot;:&quot;YouTube&quot;,&quot;thumbnailUrl&quot;:&quot;https://i.ytimg.com/vi/Tc7b_JGdZfw/hqdefault.jpg&quot;,&quot;resolvedBy&quot;:&quot;youtube&quot;&#125;" data-block-type="32" id="block-yui_3_17_2_22_1450272268012_7317">
youtube-dl -v http://www.ootboxford.com
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', 'http://www.ootboxford.com']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2019.07.16
[debug] Python version 3.5.6 (CPython) - Darwin-16.7.0-x86_64-i386-64bit
[debug] exe versions: none
[debug] Proxy map: {}
[generic] www.ootboxford: Requesting header
WARNING: Falling back on generic information extractor.
[generic] www.ootboxford: Downloading webpage
[generic] www.ootboxford: Extracting information
ERROR: Unsupported URL: http://www.ootboxford.com
Traceback (most recent call last):
  File "/1/broz-venv/lib/python3.5/site-packages/youtube_dl/YoutubeDL.py", line 796, in extract_info
    ie_result = ie.extract(url)
  File "/1/broz-venv/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 530, in extract
    ie_result = self._real_extract(url)
  File "/1/broz-venv/lib/python3.5/site-packages/youtube_dl/extractor/generic.py", line 3333, in _real_extract
    raise UnsupportedError(url)
youtube_dl.utils.UnsupportedError: Unsupported URL: http://www.ootboxford.com
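
For illustration, the dump above suggests why the generic extractor finds nothing: the YouTube iframe and watch URL appear only inside the HTML-escaped JSON of the `data-block-json` attribute, not as a plain `<iframe>` tag in the page. Below is a minimal Python sketch of one way to recover the URL, assuming the attribute value is HTML-escaped JSON with a `url` key as in the dump. The function name `squarespace_video_urls` and the shortened `SAMPLE` string are hypothetical, and this is not necessarily how the eventual fix in youtube-dl works.

```python
import html
import json
import re

# Shortened stand-in for the dumped <div>; only a few of the original
# JSON fields are kept (assumption: the real attribute has the same shape).
SAMPLE = (
    '<div class="sqs-block video-block sqs-block-video" '
    'data-block-json="&#123;&quot;html&quot;:&quot;&lt;iframe '
    'src=\\&quot;//www.youtube.com/embed/Tc7b_JGdZfw?wmode=opaque'
    '&amp;amp;enablejsapi=1\\&quot;&gt;&lt;/iframe&gt;&quot;,'
    '&quot;url&quot;:&quot;https://www.youtube.com/watch?v=Tc7b_JGdZfw&quot;,'
    '&quot;providerName&quot;:&quot;YouTube&quot;&#125;" '
    'data-block-type="32">'
)


def squarespace_video_urls(webpage):
    """Return the 'url' value of every data-block-json attribute (hypothetical helper)."""
    urls = []
    # The attribute value contains no literal double quotes because they
    # are all escaped as &quot;, so [^"]+ captures the whole value.
    for match in re.finditer(r'data-block-json="([^"]+)"', webpage):
        try:
            # Undo the HTML escaping once, then parse the result as JSON.
            block = json.loads(html.unescape(match.group(1)))
        except ValueError:
            continue  # not valid JSON after unescaping; skip this block
        if not isinstance(block, dict):
            continue
        url = block.get('url')
        if isinstance(url, str) and url:
            urls.append(url)
    return urls


if __name__ == '__main__':
    print(squarespace_video_urls(SAMPLE))
```

A URL recovered this way could then be passed to youtube-dl directly as a workaround, e.g. `youtube-dl "https://www.youtube.com/watch?v=Tc7b_JGdZfw"`.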
@galgeek galgeek added the site-support-request Add extractor(s) for a new domain label Jun 4, 2019
@galgeek galgeek changed the title youtube embeds on 2019 squarespace sites fail to download youtube embeds on 2019 squarespace sites and at least one wordpress site fail to download Jun 21, 2019
@galgeek galgeek changed the title youtube embeds on 2019 squarespace sites and at least one wordpress site fail to download youtube embeds on recent (2019?) squarespace sites fail to download Jul 16, 2019
@galgeek
Contributor Author

galgeek commented Aug 21, 2019

This issue is still present in youtube-dl version 2019.08.13.

$ youtube-dl -v http://ootboxford.com
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', 'http://ootboxford.com']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2019.08.13
[debug] Python version 3.5.6 (CPython) - Darwin-16.7.0-x86_64-i386-64bit
[debug] exe versions: none
[debug] Proxy map: {}
[generic] ootboxford: Requesting header
[redirect] Following redirect to http://www.ootboxford.com/
[generic] www.ootboxford: Requesting header
WARNING: Falling back on generic information extractor.
[generic] www.ootboxford: Downloading webpage
[generic] www.ootboxford: Extracting information
ERROR: Unsupported URL: http://www.ootboxford.com/
Traceback (most recent call last):
  File "/1/broz-venv/lib/python3.5/site-packages/youtube_dl/YoutubeDL.py", line 796, in extract_info
    ie_result = ie.extract(url)
  File "/1/broz-venv/lib/python3.5/site-packages/youtube_dl/extractor/common.py", line 530, in extract
    ie_result = self._real_extract(url)
  File "/1/broz-venv/lib/python3.5/site-packages/youtube_dl/extractor/generic.py", line 3333, in _real_extract
    raise UnsupportedError(url)
youtube_dl.utils.UnsupportedError: Unsupported URL: http://www.ootboxford.com/

@dstftw dstftw closed this as completed in d78657f Aug 31, 2019
meunierd referenced this issue in meunierd/youtube-dl Feb 13, 2020
pareronia referenced this issue in pareronia/youtube-dl Jun 22, 2020
coletdjnz added a commit to yt-dlp/yt-dlp that referenced this issue Oct 14, 2022
Extracts embeds from escaped HTML within `data-html` attribute.
Related: ytdl-org/youtube-dl#21294, #5121

Authored by: coletdjnz
Co-authored-by: pukkandan <[email protected]>