Twitter: 404 not found on some tweets #1145

Closed
musjj opened this issue Nov 30, 2020 · 6 comments

@musjj

musjj commented Nov 30, 2020

I have a problem with this tweet:
https://twitter.com/i/web/status/1331186716795781122
You can view the image just fine on the web page, but gallery-dl fails to download it.

gallery-dl --ignore-config --verbose https://twitter.com/i/web/status/1331186716795781122

[gallery-dl][debug] Version 1.16.0-dev
[gallery-dl][debug] Python 3.8.1 - Windows-10-10.0.18362-SP0
[gallery-dl][debug] requests 2.24.0 - urllib3 1.25.9
[gallery-dl][debug] Starting DownloadJob for 'https://twitter.com/i/web/status/1331186716795781122'
[twitter][debug] Using TwitterTweetExtractor for 'https://twitter.com/i/web/status/1331186716795781122'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): api.twitter.com:443
[urllib3.connectionpool][debug] https://api.twitter.com:443 "GET /2/timeline/conversation/1331186716795781122.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_composer_source=true&include_ext_alt_text=true&include_reply_count=1&tweet_mode=extended&include_entities=true&include_user_entities=true&include_ext_media_color=true&include_ext_media_availability=true&send_error_codes=true&simple_quoted_tweet=true&count=100&ext=mediaStats%2ChighlightedLabel%2CcameraMoment&include_quote_count=true HTTP/1.1" 200 3446
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): pbs.twimg.com:443
[urllib3.connectionpool][debug] https://pbs.twimg.com:443 "GET /media/EnlTOuPVkAEVFcG.jpg:orig HTTP/1.1" 404 0
[downloader.http][warning] '404 Not Found' for 'https://pbs.twimg.com/media/EnlTOuPVkAEVFcG.jpg:orig'
[download][info] Trying fallback URL #1
[urllib3.connectionpool][debug] Starting new HTTPS connection (2): pbs.twimg.com:443
[urllib3.connectionpool][debug] https://pbs.twimg.com:443 "GET /media/EnlTOuPVkAEVFcG.jpg:large HTTP/1.1" 404 0
[downloader.http][warning] '404 Not Found' for 'https://pbs.twimg.com/media/EnlTOuPVkAEVFcG.jpg:large'
[download][info] Trying fallback URL #2
[urllib3.connectionpool][debug] Starting new HTTPS connection (3): pbs.twimg.com:443
[urllib3.connectionpool][debug] https://pbs.twimg.com:443 "GET /media/EnlTOuPVkAEVFcG.jpg:medium HTTP/1.1" 404 0
[downloader.http][warning] '404 Not Found' for 'https://pbs.twimg.com/media/EnlTOuPVkAEVFcG.jpg:medium'
[download][info] Trying fallback URL #3
[urllib3.connectionpool][debug] Starting new HTTPS connection (4): pbs.twimg.com:443
[urllib3.connectionpool][debug] https://pbs.twimg.com:443 "GET /media/EnlTOuPVkAEVFcG.jpg:small HTTP/1.1" 404 0
[downloader.http][warning] '404 Not Found' for 'https://pbs.twimg.com/media/EnlTOuPVkAEVFcG.jpg:small'
[download][error] Failed to download 1331186716795781122_1.jpg
@kattjevfel
Contributor

For some reason, https://pbs.twimg.com/media/EnlTOuPVkAEVFcG?format=jpg&name=medium exists, but https://pbs.twimg.com/media/EnlTOuPVkAEVFcG.jpg:medium doesn't. I'm not sure which form is the "best" here, but Image Max URL uses the former.
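The two forms carry the same information; only the encoding of the extension and size differs. A minimal sketch of converting one into the other, assuming the URL path ends in `<name>.<ext>` with an optional `:<size>` suffix as in the URLs quoted here (`to_query_style` is a hypothetical helper, not part of gallery-dl):

```python
from urllib.parse import urlencode, urlsplit


def to_query_style(url: str, default_size: str = "orig") -> str:
    """Rewrite '.../<name>.<ext>:<size>' as '.../<name>?format=<ext>&name=<size>'.

    Hypothetical helper for illustration. It assumes the path ends in
    '<name>.<ext>' with an optional ':<size>' suffix; when the suffix is
    missing, `default_size` is used (this default is the sketch's choice).
    """
    parts = urlsplit(url)
    # Split off the ':<size>' suffix first, then the '.<ext>' extension.
    path, _, size = parts.path.partition(":")
    base, _, ext = path.rpartition(".")
    query = urlencode({"format": ext, "name": size or default_size})
    return f"{parts.scheme}://{parts.netloc}{base}?{query}"


print(to_query_style("https://pbs.twimg.com/media/EnlTOuPVkAEVFcG.jpg:medium"))
# https://pbs.twimg.com/media/EnlTOuPVkAEVFcG?format=jpg&name=medium
```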

@musjj
Author

musjj commented Nov 30, 2020

Yeah the web URL points to the former one too.
Maybe the *.ext:size URL format is getting slowly deprecated?

@Hrxn
Contributor

Hrxn commented Nov 30, 2020

Yeah the web URL points to the former one too.
Maybe the *.ext:size URL format is getting slowly deprecated?

Yes, that is what I would assume.

@qsniyg

qsniyg commented Nov 30, 2020

What's strange is that Twitter's web API actually returns the .jpg version for media_url. But yeah, things like cards don't work with .jpg and require &format=jpg instead (and have been that way for quite a while), so I wouldn't be surprised if this is just the direction Twitter is going (though I fail to understand the reasoning).

mikf added a commit that referenced this issue Dec 3, 2020
use '/<name>?format=<fmt>&name=<size>' instead of the potentially deprecated '/<name>.<fmt>:<size>'

but keep all of them as fallback URLs

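The candidate order this commit describes can be read off the log in mikf's next comment: for each size from `orig` down to `small`, the query-string form is tried first and the old suffix form second. A sketch of building that list, reconstructed from the log output rather than from gallery-dl's source (`candidate_urls` is a hypothetical name):

```python
from typing import List, Sequence


def candidate_urls(base: str, ext: str,
                   sizes: Sequence[str] = ("orig", "large", "medium", "small")) -> List[str]:
    """List download candidates: best size first, query-string form before suffix form.

    Hypothetical helper; the ordering mirrors the attempts logged in this thread.
    """
    urls = []
    for size in sizes:
        urls.append(f"{base}?format={ext}&name={size}")  # newer form, tried first
        urls.append(f"{base}.{ext}:{size}")               # older form, kept as fallback
    return urls


for url in candidate_urls("https://pbs.twimg.com/media/EnlTOuPVkAEVFcG", "jpg"):
    print(url)
```

Interleaving the two forms per size (rather than trying every new-style URL before any old-style one) means a large image in the old format is still preferred over a smaller one in the new format.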
@mikf
Owner

mikf commented Dec 3, 2020

Updated the image URL format for Twitter in 63e61a0 (and kept the old ones as fallback URLs), but it seems like the image from https://twitter.com/i/web/status/1331186716795781122 is now gone for good. All that's left is an empty preview and an "An error occurred loading this image" message when enlarging it.

Trying to download it with new/old URLs in gallery-dl doesn't work anymore either:

$ gallery-dl https://twitter.com/i/web/status/1331186716795781122
[downloader.http][warning] '404 Not Found' for 'https://pbs.twimg.com/media/EnlTOuPVkAEVFcG?format=jpg&name=orig'
[download][info] Trying fallback URL #1
[downloader.http][warning] '404 Not Found' for 'https://pbs.twimg.com/media/EnlTOuPVkAEVFcG.jpg:orig'
[download][info] Trying fallback URL #2
[downloader.http][warning] '404 Not Found' for 'https://pbs.twimg.com/media/EnlTOuPVkAEVFcG?format=jpg&name=large'
[download][info] Trying fallback URL #3
[downloader.http][warning] '404 Not Found' for 'https://pbs.twimg.com/media/EnlTOuPVkAEVFcG.jpg:large'
[download][info] Trying fallback URL #4
[downloader.http][warning] '404 Not Found' for 'https://pbs.twimg.com/media/EnlTOuPVkAEVFcG?format=jpg&name=medium'
[download][info] Trying fallback URL #5
[downloader.http][warning] '404 Not Found' for 'https://pbs.twimg.com/media/EnlTOuPVkAEVFcG.jpg:medium'
[download][info] Trying fallback URL #6
[downloader.http][warning] '404 Not Found' for 'https://pbs.twimg.com/media/EnlTOuPVkAEVFcG?format=jpg&name=small'
[download][info] Trying fallback URL #7
[downloader.http][warning] '404 Not Found' for 'https://pbs.twimg.com/media/EnlTOuPVkAEVFcG.jpg:small'
[download][error] Failed to download 1331186716795781122_1.jpg
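The log above follows a simple pattern: request each candidate in turn, stop at the first success, and give up once the list is exhausted. A sketch of that loop, where `status_of` is a hypothetical stand-in for an HTTP request returning a status code (injected so the example runs offline); this is not gallery-dl's downloader code:

```python
from typing import Callable, Iterable, Optional


def first_available(urls: Iterable[str],
                    status_of: Callable[[str], int]) -> Optional[str]:
    """Return the first URL that answers 200, or None once every candidate fails.

    `status_of` is a hypothetical callable standing in for a real request.
    """
    for attempt, url in enumerate(urls):
        if attempt:
            print(f"Trying fallback URL #{attempt}")
        status = status_of(url)
        if status == 200:
            return url
        print(f"'{status}' for '{url}'")
    return None


# Every candidate 404s, as in the log above, so the result is None.
urls = ["https://pbs.twimg.com/media/EnlTOuPVkAEVFcG?format=jpg&name=orig",
        "https://pbs.twimg.com/media/EnlTOuPVkAEVFcG.jpg:orig"]
print(first_available(urls, lambda url: 404))
```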

The internal data also shows that something went wrong for this image:

...
    "ext": {
        "mediaStats": {
            "r": "Missing",
            "ttl": -1
        }
    },
    "ext_media_availability": {
        "reason": "deleted",
        "status": "unavailable"
    },
...
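Since the API response already flags the media as deleted, a downloader could in principle check these fields before trying any URL. A sketch of such a check, using only the field names from the excerpt above; treating `"status": "unavailable"` as a reason to skip the download is this sketch's assumption, not documented Twitter or gallery-dl behavior, and `unavailable_reason` is a hypothetical helper:

```python
import json
from typing import Optional


def unavailable_reason(media: dict) -> Optional[str]:
    """Return why a media entry is unavailable, or None if it is not flagged.

    Field names come from the API excerpt in this thread; what to do with a
    flagged entry (skip, warn, or still try the URLs) is left to the caller.
    """
    info = media.get("ext_media_availability") or {}
    if info.get("status") == "unavailable":
        return info.get("reason", "unknown")
    return None


entry = json.loads("""{
    "ext": {"mediaStats": {"r": "Missing", "ttl": -1}},
    "ext_media_availability": {"reason": "deleted", "status": "unavailable"}
}""")
print(unavailable_reason(entry))  # deleted
```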

@musjj
Author

musjj commented Dec 3, 2020

Thanks for the fix!
Didn't expect the image to disappear for good so suddenly; should've saved it while I had the chance.
This kind of error actually happens a lot with really old tweets, but this is the first time I've seen it happening on a tweet that is barely over a week old.


5 participants