[Gelbooru] some pic URLs mistakenly modified as "webm" by gallery-dl when downloading search page #2188

Closed
mo-han opened this issue Jan 12, 2022 · 3 comments

Comments

@mo-han
Contributor

mo-han commented Jan 12, 2022

I was trying to download a tag:
https://gelbooru.com/index.php?page=post&s=list&tags=yana_(nekoarashi)

One of the posts involved is:
https://gelbooru.com/index.php?page=post&s=view&id=6603204
which contains the JPEG 75a2ea77d7f487ff91322fe4044e28ab.jpg at:
https://img3.gelbooru.com/images/75/a2/75a2ea77d7f487ff91322fe4044e28ab.jpg
but gallery-dl tries to download it as a WebM, 75a2ea77d7f487ff91322fe4044e28ab.webm (which of course does not exist), from:
https://img3.gelbooru.com/images/75/a2/75a2ea77d7f487ff91322fe4044e28ab.webm

However, this bug does not occur when downloading that single post page.
To repeat: the bug is only triggered when downloading a search result that contains that post.
Here is the CLI output:

>gallery-dl "https://gelbooru.com/index.php?page=post&s=view&id=6603204"
* .\gallery-dl\gelbooru\gelbooru_6603204_75a2ea77d7f487ff91322fe4044e28ab.jpg

>gallery-dl "https://gelbooru.com/index.php?page=post&s=list&tags=yana_%28nekoarashi%29" -vv
[gallery-dl][debug] Version 1.20.1
[gallery-dl][debug] Python 3.6.8 - Windows-10-10.0.14393-SP0
[gallery-dl][debug] requests 2.26.0 - urllib3 1.22
[gallery-dl][debug] Starting DownloadJob for 'https://gelbooru.com/index.php?page=post&s=list&tags=yana_%28nekoarashi%29'
[gelbooru][debug] Using GelbooruTagExtractor for 'https://gelbooru.com/index.php?page=post&s=list&tags=yana_%28nekoarashi%29'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): gelbooru.com
[urllib3.connectionpool][debug] https://gelbooru.com:443 "GET /index.php?page=dapi&s=post&q=index&json=1&tags=yana_%28nekoarashi%29&pid=0&limit=100 HTTP/1.1" 200 None
# .\gallery-dl\gelbooru\yana_(nekoarashi)\gelbooru_6826256_24b03cafd2b6f5f34517952bb2a619e3.jpg
# .\gallery-dl\gelbooru\yana_(nekoarashi)\gelbooru_6814413_93f4a7e0f87805a065eeffe360c4ff84.jpg
# .\gallery-dl\gelbooru\yana_(nekoarashi)\gelbooru_6796897_a23e3206418b052a0c312962d18db462.png
# .\gallery-dl\gelbooru\yana_(nekoarashi)\gelbooru_6796895_21fccc1b4ecb9f3ce99ec86aee8932d2.png
# .\gallery-dl\gelbooru\yana_(nekoarashi)\gelbooru_6785846_559824707ff472f3ad9d8917c33e9438.png
# .\gallery-dl\gelbooru\yana_(nekoarashi)\gelbooru_6769794_8a5d352e213ec8660a7a5a3401cd7d15.jpg
# .\gallery-dl\gelbooru\yana_(nekoarashi)\gelbooru_6684534_d1b2dabccd647e265eb0d7e7d802b365.gif
# .\gallery-dl\gelbooru\yana_(nekoarashi)\gelbooru_6684528_fb8f7a2e349f68dc9bb94ad3b7223571.jpg
# .\gallery-dl\gelbooru\yana_(nekoarashi)\gelbooru_6683470_a84052e16a7d8c42cb03fa6854c316a3.jpg
# .\gallery-dl\gelbooru\yana_(nekoarashi)\gelbooru_6682789_dc472bba3b5ff7ab8e63bdd3c512db59.jpg
# .\gallery-dl\gelbooru\yana_(nekoarashi)\gelbooru_6682763_18e15536a37043c16e94b4b52e70bbfc.png
# .\gallery-dl\gelbooru\yana_(nekoarashi)\gelbooru_6682760_2ca457610b12960255a890bb525fd3bf.gif
# .\gallery-dl\gelbooru\yana_(nekoarashi)\gelbooru_6682759_646069953724476623a8226bf4c303b7.webm
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): img3.gelbooru.com
[urllib3.connectionpool][debug] https://img3.gelbooru.com:443 "GET /images/75/a2/75a2ea77d7f487ff91322fe4044e28ab.webm HTTP/1.1" 404 None
[downloader.http][warning] '404 Not Found' for 'https://img3.gelbooru.com/images/75/a2/75a2ea77d7f487ff91322fe4044e28ab.webm'
[download][info] Trying fallback URL #1
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): img2.gelbooru.com
[downloader.http][warning] HTTPSConnectionPool(host='img2.gelbooru.com', port=443): Max retries exceeded with url: /images/75/a2/75a2ea77d7f487ff91322fe4044e28ab.webm (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:852)'),)) (1/5)
[urllib3.connectionpool][debug] Starting new HTTPS connection (2): img2.gelbooru.com
[downloader.http][warning] HTTPSConnectionPool(host='img2.gelbooru.com', port=443): Max retries exceeded with url: /images/75/a2/75a2ea77d7f487ff91322fe4044e28ab.webm (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:852)'),)) (2/5)
@mikf
Owner

mikf commented Jan 13, 2022

For some reason, Gelbooru's /index.php?page=dapi&s=post&q=index API returns https://video-cdn3.gelbooru.com/ as the domain for recently uploaded images, and gallery-dl was rewriting those URLs to .webm.
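A minimal Python sketch of how a host-based rewrite rule turns an image URL into the nonexistent .webm URL from the log, and how also checking the file extension avoids it. This is not gallery-dl's actual extractor code: the function names, the `VIDEO_EXTENSIONS` set, and the exact shape of the API's `file_url` (assumed to use the same `/images/<aa>/<bb>/<md5>` path as the working URL, but on the `video-cdn3` host) are assumptions. The `img3.gelbooru.com` target and the md5-based path are taken from the URLs in the report.

```python
import posixpath
from urllib.parse import urlsplit

# Assumption: only these extensions should ever be rewritten to .webm
VIDEO_EXTENSIONS = {".mp4", ".webm"}


def image_path(md5):
    """Build the /images/<aa>/<bb>/<md5> path seen in the report's URLs."""
    return "/images/{}/{}/{}".format(md5[0:2], md5[2:4], md5)


def on_video_cdn(url):
    """True if the URL's host looks like video-cdnN.gelbooru.com."""
    host = urlsplit(url).hostname or ""
    return host.startswith("video-cdn")


def rewrite_naive(url, md5):
    """Hypothetical buggy rule: any URL on a video-cdn host is treated as a video."""
    if on_video_cdn(url):
        return "https://img3.gelbooru.com" + image_path(md5) + ".webm"
    return url


def rewrite_checked(url, md5):
    """Hypothetical safer rule: also require a video file extension."""
    ext = posixpath.splitext(urlsplit(url).path)[1].lower()
    if on_video_cdn(url) and ext in VIDEO_EXTENSIONS:
        return "https://img3.gelbooru.com" + image_path(md5) + ".webm"
    return url


md5 = "75a2ea77d7f487ff91322fe4044e28ab"
# Assumed shape of the API's file_url for a recently uploaded JPEG
api_url = "https://video-cdn3.gelbooru.com" + image_path(md5) + ".jpg"
print(rewrite_naive(api_url, md5))    # the .webm URL that returned 404 in the log
print(rewrite_checked(api_url, md5))  # the .jpg URL is left unchanged
```

The point of the sketch is that the host name alone stopped being a reliable signal once Gelbooru started serving ordinary images from `video-cdn3`; the file extension still distinguishes images from videos.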

@mo-han
Contributor Author

mo-han commented Jan 14, 2022

> rewriting those to .webm.

Interesting. So MP4 files are also rewritten to WebM and can still be downloaded successfully?

@mikf
Owner

mikf commented Jan 17, 2022

Yes, that works even for .mp4 files. There was an issue where Gelbooru switched all video URLs to .mp4, and older videos that hadn't been converted yet just returned a 404, but the .webm version still worked.
(#1048, 7a0ba37)
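The fallback behaviour described in this thread (visible in the log, where the img3 .webm URL is followed by "Trying fallback URL #1" on img2) can be sketched as an ordered list of candidate URLs that are tried until one is available. This is a hypothetical illustration, not gallery-dl's downloader: `candidate_urls`, `first_available`, the injected `exists` check (standing in for an HTTP request), and placing the API's original URL last are all assumptions.

```python
from typing import Callable, Iterable, Iterator, Optional


def image_path(md5: str) -> str:
    """Build the /images/<aa>/<bb>/<md5> path seen in the report's URLs."""
    return "/images/{}/{}/{}".format(md5[0:2], md5[2:4], md5)


def candidate_urls(md5: str, original_url: str) -> Iterator[str]:
    """Hypothetical order: .webm on img3, .webm on img2, then the original URL."""
    path = image_path(md5) + ".webm"
    yield "https://img3.gelbooru.com" + path  # primary URL, as in the log
    yield "https://img2.gelbooru.com" + path  # "fallback URL #1" in the log
    yield original_url                        # assumption: last resort


def first_available(urls: Iterable[str],
                    exists: Callable[[str], bool]) -> Optional[str]:
    """Return the first URL for which exists(url) is true, else None."""
    for url in urls:
        if exists(url):
            return url
    return None


# Simulated old, unconverted video: the .mp4 URL would 404,
# but a .webm copy is still served (hypothetical md5 and availability).
md5 = "0123456789abcdef0123456789abcdef"
mp4_url = "https://video-cdn3.gelbooru.com" + image_path(md5) + ".mp4"
available = {"https://img2.gelbooru.com" + image_path(md5) + ".webm"}
print(first_available(candidate_urls(md5, mp4_url), available.__contains__))
```

Injecting `exists` keeps the sketch testable without network access; a real downloader would issue the request itself and move to the next candidate on a 404 or connection error.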
