Twitter is now broken #806

Closed
electricduck opened this issue Jun 2, 2020 · 42 comments

@electricduck

electricduck commented Jun 2, 2020

For every Twitter post, I am now receiving this output. It was happening every so often a few hours ago, and now it happens all the time. Using gallery-dl 1.14.0.

Input
$ gallery-dl -j https://twitter.com/MotocrossNews/status/1267608884325216256

Output

[
  [
    "ValueError",
    "substring not found"
  ]
]

Verbose

[gallery-dl][debug] Version 1.14.0
[gallery-dl][debug] Python 3.5.2 - Linux-4.4.0-179-generic-x86_64-with-Ubuntu-16.04-xenial
[gallery-dl][debug] requests 2.23.0 - urllib3 1.25.9
[gallery-dl][debug] Starting DownloadJob for 'https://twitter.com/MotocrossNews/status/1267608884325216256'
[twitter][debug] Using TwitterTweetExtractor for 'https://twitter.com/MotocrossNews/status/1267608884325216256'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): twitter.com:443
[urllib3.connectionpool][debug] https://twitter.com:443 "GET /i/web/status/1267608884325216256 HTTP/1.1" 200 None
[twitter][error] An unexpected error occurred: ValueError - substring not found. Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .
[twitter][debug] 
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/gallery_dl/job.py", line 61, in run
    for msg in self.extractor:
  File "/usr/local/lib/python3.5/dist-packages/gallery_dl/extractor/twitter.py", line 50, in items
    for tweet in self.tweets():
  File "/usr/local/lib/python3.5/dist-packages/gallery_dl/extractor/twitter.py", line 416, in tweets
    end = page.index('class="js-tweet-stats-container')
ValueError: substring not found
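
The traceback above shows `page.index(...)` raising because the expected marker is no longer in the page. As a hedged illustration only (this is not the extractor's code, and `extract_before` is a hypothetical helper), `str.find` returns -1 instead of raising, which lets a scraper detect a layout change explicitly:

```python
def extract_before(page, marker):
    """Return the part of `page` before `marker`, or None if the marker is absent."""
    end = page.find(marker)  # -1 when absent; str.index would raise ValueError here
    if end < 0:
        return None
    return page[:end]


MARKER = 'class="js-tweet-stats-container'

legacy_page = '<div class="tweet">hi</div><div class="js-tweet-stats-container">'
new_page = "<html>We've detected that JavaScript is disabled in your browser.</html>"

print(extract_before(legacy_page, MARKER))  # text before the marker
print(extract_before(new_page, MARKER))     # None: the marker is gone
```
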
@walramb

walramb commented Jun 2, 2020

I'm suddenly seeing the same thing on the same version of gallery-dl. I think this is on Twitter's end; I assume they just rolled out some tweak to how the HTML works.

@PHR16384

PHR16384 commented Jun 2, 2020

Does TwitterTweetExtractor still expect the "legacy" version of Twitter (i.e. using the AdaptiveMedia container etc.)? Because that thing's definitely gone now; they said they'd shut it down today, and they did.

GoodTwitter browser addon doesn't work anymore either, and trying to use cmd-line tools such as curl returns an html response with no actual page content except "We've detected that JavaScript is disabled in your browser."

EDIT: Other workarounds are still functional, such as the "legacy" image server (pbs.twitter.com/media/HASH.ext:size) and nitter.net

@Hrxn
Contributor

Hrxn commented Jun 2, 2020

Yup. Site change.
soimort/you-get@81ba2bc

@rivke41levp656

URLs of the form: twitter.com/username/media
still work as of a few hours ago. I haven't tried it, but I think the method for circumventing the 3200-tweet limit is now broken. For the time being, gallery-dl should still work if you want to get fewer than 3200 tweets from a user.

@electricduck
Author

@rivke41levp656 How do I get that URL from a Twitter post, may I ask?

@rivke41levp656

That form would be for downloading all of a user's posts. If you just want one particular tweet I would use something like Image Max URL instead of gallery-dl. For example the image you gave: https://pbs.twimg.com/media/EZdzj-gVAAAPTup.jpg?name=orig
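
The original-size URL above can be derived from other variants of the same image URL. A minimal sketch, assuming the image server keeps accepting the `name=orig` query parameter shown in the example and the legacy `:size` suffix mentioned earlier in the thread (`orig_image_url` is a hypothetical helper, not part of gallery-dl):

```python
from urllib.parse import urlsplit, urlunsplit


def orig_image_url(url):
    """Rewrite a pbs.twimg.com media URL so it requests the original size."""
    parts = urlsplit(url)
    # Drop a legacy ":size" suffix such as ":large" from the path, if present.
    path = parts.path.partition(":")[0]
    # Replace any existing query (e.g. name=small) with name=orig.
    return urlunsplit((parts.scheme, parts.netloc, path, "name=orig", ""))


print(orig_image_url("https://pbs.twimg.com/media/EZdzj-gVAAAPTup.jpg?name=small"))
print(orig_image_url("https://pbs.twimg.com/media/EZdzj-gVAAAPTup.jpg:large"))
```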

mikf added a commit that referenced this issue Jun 3, 2020
Everything except logging in with username & password and TwitPic
embeds should be working again.

Metadata per Tweet is massively different than before (mostly raw API
responses - might need some cleaning up) and the default 'archive_fmt'
changed.
@mikf
Owner

mikf commented Jun 3, 2020

Everything Twitter related got rewritten and now uses the new interface: a10f31d

There actually was a warning 3 weeks ago that Twitter was shutting down its legacy interface (#740), but I didn't get this done before releasing 1.14.0.

As the commit message says, most things should be working again except logging in, and there are most likely a few bugs here and there, so please test this and let me know.

(And if anyone knows where/how to get the authenticity_token value for the login form, I'd be eternally grateful. I could use the nojs login form for mobile devices, but I don't know if that would allow using the internal API endpoints ...)

@biznizz

biznizz commented Jun 3, 2020

I've updated to the latest dev build and ran it. It downloaded a few tweets, then had an error. I know you're still working on it, so I'm sure it'll be worked out.

However, I did see something. I had my username and password for twitter in my conf, but the cmd window said just now

"Logging in with username and password is currently not possible. Use cookies from your browser session instead."

Which cookies should I use? Do I just copy them from Inspect Element into the conf, or do I have to export them with cookies.txt like deviantart?

@Defrost4528

Thank you for the prompt update as always! I once again can't stress enough how much I appreciate this application and your hard work. Anyways, I'll list down some of the issues I've found with my limited testing.

This is the tweet I used to test. It looks like image and video downloading in general work well, but details like file naming and the JSON metadata have some kinks to them.

For file naming, the following do not work (they are replaced with "None"):

  • date. Looking at the JSON, it seems to be now saved in a human readable format, but I'd prefer to still follow the old format for compatibility of older files and syntax sorting, and I'm sure other people will agree too.
    This is what it looks like in the downloaded tweet: "Fri May 29 12:52:04 +0000 2020"
  • tweet_id. It seems to be named as id_str in the JSON now.
  • author[name]. It technically works, but grabs the display name instead of the username.
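
The raw date string quoted in the list above can be converted back into a sortable form with the standard library. A minimal sketch, assuming the string always follows the layout of that single example and that English day and month names are in effect (`parse_raw_date` is a hypothetical helper, and the output layout is just one sortable choice, not necessarily the old gallery-dl format):

```python
from datetime import datetime

# Layout inferred from the single example in this thread.
RAW_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_raw_date(value):
    """Parse a raw date string into a timezone-aware datetime."""
    return datetime.strptime(value, RAW_FORMAT)


dt = parse_raw_date("Fri May 29 12:52:04 +0000 2020")
print(dt.strftime("%Y-%m-%d %H:%M:%S"))  # 2020-05-29 12:52:04
```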

Some other things I noticed:

  • The JSON file has tons of extra data compared to before, lots of it for website formatting and user info. While the more data the merrier, I think the color palettes, focus_rects sizes, and stuff of that nature are definitely unnecessary. Removing these manually reduced the file size to about a third of the original.
  • The "content" object, now named "full_text", has the t.co link trailing at the end again. iirc you opted to remove this due to redundancy, so just mentioning it.
  • there is an "author" and "user" object. I'm not sure in what situations they are different, but in this testing case they have identical data. There are also more places where data is duped but in different context (like height and width), but I think these are insignificant enough to not mind for now.

@electricduck electricduck changed the title from "Twitter completely broken?" to "Twitter is now broken" Jun 4, 2020
@mikf
Owner

mikf commented Jun 4, 2020

Short status update:

  • logging in should be working again (bd0f214)
  • TwitPic support restored (2132e54)
  • It doesn't crash on unavailable Tweets anymore (655c98c)
  • Removed some metadata entries (655c98c) (more to come)

Thanks for testing and giving some feedback!

@biznizz
Login should be fixed now, but yes, you'd have to do the same as with DeviantArt and export all Twitter cookies. The important one is auth_token I think.

And what was the error you got? Something like KeyError: <tweet-id>?

@Cak3lies
The current metadata is just the raw API response. I've now removed a few things and re-added date (655c98c), but there is a lot more to do in that regard. For example I could rename id_str to the previous tweet_id, or even restore all/most metadata to how it was before, but I'd rather keep the new format and clean it up a bit.

author[name]. It technically works, but grabs the display name instead of the username.

This is now author[screen_name]

there is an "author" and "user" object

These were there before, but not as packed with (useless) information. The main reason to have them is retweets and quoted tweets: "user" is the timeline owner, and "author" is the original creator of a tweet. These are pretty much redundant most of the time, so if you (or anyone else) have a better solution, I'm all ears. (Maybe only include the "author" field if "author" and "user" differ?)
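
The idea floated in the parenthesis (keep "author" only when it differs from "user") could look roughly like this. It is a sketch under assumed field shapes, not what gallery-dl does; `compact_metadata` and the sample dictionaries are hypothetical:

```python
def compact_metadata(tweet):
    """Return a copy of `tweet` without "author" when it merely repeats "user"."""
    result = dict(tweet)
    if result.get("author") == result.get("user"):
        result.pop("author", None)
    return result


own_tweet = {
    "tweet_id": 1,
    "user": {"screen_name": "alice"},
    "author": {"screen_name": "alice"},
}
retweet = {
    "tweet_id": 2,
    "user": {"screen_name": "alice"},   # timeline owner
    "author": {"screen_name": "bob"},   # original creator
}

print(compact_metadata(own_tweet))  # "author" dropped
print(compact_metadata(retweet))    # "author" kept
```

One trade-off of this approach: format strings that reference author fields would then need a fallback to "user" for non-retweets.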

@biznizz

biznizz commented Jun 4, 2020

@mikf

So, get latest dev build, export twitter cookies with cookies.txt extension, and have the twitter config look like this?

"twitter":
        {
            "retweets": true,
            "videos": false,
            "cookies": "C:\\Users\\USER\\cookies.txt",
            "cookies-update": true
        },

And yes, I think that was the error I got. I didn't screenshot it, but that sounds right.

@pxssy

pxssy commented Jun 5, 2020

Just to shill my twitter related issue, can gallery-dl compare tweet_id/id_str with existing ones inside --archive files before attempting to query twitter for it? I'm not sure how the archive works but it seems to be doing pretty much nothing except logging.

@KaMyKaSii

I believe it is a related error?

[gallery-dl][debug] Version 1.14.1-dev
[gallery-dl][debug] Python 3.8.3 - Linux-4.4.141-perf+-aarch64-with-libc
[gallery-dl][debug] requests 2.23.0 - urllib3 1.25.9
[gallery-dl][debug] Starting DownloadJob for 'https://twitter.com/censored/media'
[twitter][debug] Using TwitterMediaExtractor for 'https://twitter.com/censored/media'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): api.twitter.com:443
[urllib3.connectionpool][debug] https://api.twitter.com:443 "GET /graphql/-xfUfZsnR_zqjFd-IfrN5A/UserByScreenName?variables=%7B%22screen_name%22%3A%22censored%22%2C%22withHighlightedLabel%22%3Atrue%7D HTTP/1.1" 200 1079
[urllib3.connectionpool][debug] https://api.twitter.com:443 "GET /2/timeline/media/1222337124801884160.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_composer_source=true&include_ext_alt_text=true&include_reply_count=1&tweet_mode=extended&include_entities=true&include_user_entities=true&include_ext_media_color=true&include_ext_media_availability=true&send_error_codes=true&simple_quoted_tweet=true&count=100&ext=mediaStats%252ChighlightedLabel%252CcameraMoment&include_quote_count=true HTTP/1.1" 200 25141
/sdcard/gallery-dl/twitter/censored/1269931234358149120_1.jpg
/sdcard/gallery-dl/twitter/censored/1269931234358149120_2.jpg
/sdcard/gallery-dl/twitter/censored/1269869247733420032_1.mp4
/sdcard/gallery-dl/twitter/censored/1269812285968658432_1.jpg
/sdcard/gallery-dl/twitter/censored/1269811544629575681_1.mp4
/sdcard/gallery-dl/twitter/censored/1269810357347942406_1.mp4
/sdcard/gallery-dl/twitter/censored/1269810117123375104_1.mp4
/sdcard/gallery-dl/twitter/censored/1269807196113637387_1.jpg
/sdcard/gallery-dl/twitter/censored/1269807195266433025_1.jpg
/sdcard/gallery-dl/twitter/censored/1269807195044151296_1.jpg
/sdcard/gallery-dl/twitter/censored/1269806395341316102_1.jpg
/sdcard/gallery-dl/twitter/censored/1269805516265529344_1.mp4
/sdcard/gallery-dl/twitter/censored/1269805506660532224_1.jpg
/sdcard/gallery-dl/twitter/censored/1269538845328015360_1.mp4
[twitter][error] An unexpected error occurred: KeyError - '1269528076326756356'. Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .
[twitter][debug]
Traceback (most recent call last):
  File "/data/data/com.termux/files/usr/lib/python3.8/site-packages/gallery_dl/job.py", line 61, in run
    for msg in self.extractor:
  File "/data/data/com.termux/files/usr/lib/python3.8/site-packages/gallery_dl/extractor/twitter.py", line 41, in items
    for tweet in self.tweets():
  File "/data/data/com.termux/files/usr/lib/python3.8/site-packages/gallery_dl/extractor/twitter.py", line 421, in _pagination
    quoted = tweets[tweet["quoted_status_id_str"]]
KeyError: '1269528076326756356'
$

@kattjevfel
Contributor

Can reproduce the above error (since your censorship was weak af):

[gallery-dl][debug] Version 1.14.1-dev
[gallery-dl][debug] Python 3.8.3 - Linux-5.7.1-zen1-1-zen-x86_64-with-glibc2.2.5
[gallery-dl][debug] requests 2.23.0 - urllib3 1.25.9
[gallery-dl][debug] Starting DownloadJob for 'https://twitter.com/petitegarceg'
[twitter][debug] Using TwitterTimelineExtractor for 'https://twitter.com/petitegarceg'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): api.twitter.com:443
[urllib3.connectionpool][debug] https://api.twitter.com:443 "GET /graphql/-xfUfZsnR_zqjFd-IfrN5A/UserByScreenName?variables=%7B%22screen_name%22%3A%22petitegarceg%22%2C%22withHighlightedLabel%22%3Atrue%7D HTTP/1.1" 200 1005
[urllib3.connectionpool][debug] https://api.twitter.com:443 "GET /2/timeline/profile/1222337124801884160.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_composer_source=true&include_ext_alt_text=true&include_reply_count=1&tweet_mode=extended&include_entities=true&include_user_entities=true&include_ext_media_color=true&include_ext_media_availability=true&send_error_codes=true&simple_quoted_tweet=true&count=100&ext=mediaStats%252ChighlightedLabel%252CcameraMoment&include_quote_count=true HTTP/1.1" 200 50946
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): pbs.twimg.com:443
...
[twitter][error] An unexpected error occurred: KeyError - '1269528076326756356'. Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .
[twitter][debug] 
Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/gallery_dl/job.py", line 61, in run
    for msg in self.extractor:
  File "/usr/lib/python3.8/site-packages/gallery_dl/extractor/twitter.py", line 41, in items
    for tweet in self.tweets():
  File "/usr/lib/python3.8/site-packages/gallery_dl/extractor/twitter.py", line 421, in _pagination
    quoted = tweets[tweet["quoted_status_id_str"]]
KeyError: '1269528076326756356'

Full output: https://gist.github.com/a801327d2c4a9d61a1d80e4a8f7838a1
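
Both tracebacks fail on the same lookup: the tweet references a quoted tweet whose ID is not in the `tweets` mapping returned by the API. A hedged sketch of a tolerant lookup follows; it is not the actual fix, and `attach_quoted` and the `"quoted"` key are hypothetical names:

```python
def attach_quoted(tweet, tweets):
    """Return `tweet` with its quoted tweet attached, if that tweet is available."""
    quoted_id = tweet.get("quoted_status_id_str")
    if quoted_id is None:
        return tweet              # not a quote tweet
    quoted = tweets.get(quoted_id)
    if quoted is None:
        return tweet              # quoted tweet missing from the response: skip it
    merged = dict(tweet)
    merged["quoted"] = quoted
    return merged


tweets = {"10": {"id_str": "10", "full_text": "original"}}
print(attach_quoted({"id_str": "11", "quoted_status_id_str": "10"}, tweets))
print(attach_quoted({"id_str": "12", "quoted_status_id_str": "1269528076326756356"}, tweets))
```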

@kattjevfel
Contributor

Tried d769bb4 and it works just like it should now!

@mikf
Owner

mikf commented Jun 8, 2020

2 new updates:

  • 5bc1097 renames and cleans up a lot of metadata entries. Everything should mostly be like it was before (tweet_id, content, etc) except there are now quite a few more, albeit useful, entries.
  • d769bb4 fixes an endless loop on the end of a timeline as well as pagination over search results

@biznizz
Yes, like this. Use the absolute path of your exported cookies.txt file as value for cookies
(There is now also a section about cookies in the README: https://github.com/mikf/gallery-dl#cookies)
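
Before pointing the `cookies` option at an exported file, it can help to confirm that the file actually contains the `auth_token` cookie mentioned earlier. A minimal sketch using only the standard library (gallery-dl has its own cookie handling; `has_auth_token` is a hypothetical helper, and it assumes the export is in Netscape cookies.txt format):

```python
import http.cookiejar


def has_auth_token(path, domain="twitter.com"):
    """Return True if the cookies.txt file at `path` holds an auth_token for `domain`."""
    jar = http.cookiejar.MozillaCookieJar()
    # Keep session and expired cookies so the check reflects the file's contents.
    jar.load(path, ignore_discard=True, ignore_expires=True)
    return any(
        cookie.name == "auth_token" and cookie.domain.lstrip(".").endswith(domain)
        for cookie in jar
    )


# Usage (the path is a placeholder):
#   has_auth_token(r"C:\Users\USER\cookies.txt")
```

`load` raises `http.cookiejar.LoadError` when the file is not in the expected format, which is itself a useful hint that the export went wrong.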

@pxssy
The new Twitter implementation doesn't need to send an extra API request for each video, so your issues in #784 shouldn't be happening anymore when using it.
An archive is currently not used to speed anything up. It's just a way to have files be recognized as "already downloaded" without having the actual files on disk.
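
To illustrate that description: an archive of this kind is essentially a persistent set of keys that is consulted per file, after the metadata has already been fetched, so it saves downloads but not API requests. The sketch below is a simplified stand-in, not gallery-dl's implementation; the `Archive` class, the table layout, and the key format are assumptions:

```python
import sqlite3


class Archive:
    """A persistent set of 'already downloaded' keys backed by SQLite."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS archive (entry TEXT PRIMARY KEY)")

    def check(self, key):
        """Return True if `key` was recorded earlier."""
        row = self.conn.execute(
            "SELECT 1 FROM archive WHERE entry = ? LIMIT 1", (key,)).fetchone()
        return row is not None

    def add(self, key):
        """Record `key`; adding it twice is harmless."""
        self.conn.execute(
            "INSERT OR IGNORE INTO archive (entry) VALUES (?)", (key,))
        self.conn.commit()


archive = Archive()
key = "twitter1269931234358149120_1"   # hypothetical key format
if not archive.check(key):
    # ... the download would happen here ...
    archive.add(key)
print(archive.check(key))  # True
```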

@KaMyKaSii
should be fixed in 5bc1097

@ntqr

ntqr commented Jun 11, 2020

only include 'author' if it would differ from 'user'

What would cause them to be different and why?

@iamleot
Contributor

iamleot commented Jun 11, 2020

@anonieee if it's a retweet, the user is the account doing the retweeting, while the author is the original author of the tweet being retweeted.

@KirbyFan102

KirbyFan102 commented Jun 19, 2020

Is it still impossible to download from twitter? I'm getting WinError 10061 every time I try downloading an account from there.

@KaMyKaSii

Is it still impossible to download from twitter? I'm getting WinError 10061 every time I try downloading an account from there.

Did you try to login or use cookies file? Twitter downloads are working here. I just would like the developer to add support for Twitter fleets, but I believe they are available still only here in Brazil

@KirbyFan102

KirbyFan102 commented Jun 19, 2020

Did you try to login or use cookies file? Twitter downloads are working here. I just would like the developer to add support for Twitter fleets, but I believe they are available still only here in Brazil

I did both of these things, and I'm using the latest dev build. (You're Brazilian, too? What a coincidence.)

@KaMyKaSii

Did you try to login or use cookies file? Twitter downloads are working here. I just would like the developer to add support for Twitter fleets, but I believe they are available still only here in Brazil

I did both of these things, and I'm using the latest dev build. (You're Brazilian, too? What a coincidence.)

Do you mean you did it individually or both at the same time? If it is the second option, it is not advisable, choose only one. I use the cookie file method and it works well here, also always running the latest code. And yep, I'm Brazilian :)

@biznizz

biznizz commented Jun 19, 2020

I can confirm that, after setting my twitter extractor to my cookies.txt, I'm able to download pics from tweets. As mentioned above, I wouldn't use both and would say that the cookies.txt option is the better one.

@KirbyFan102

Did you get a WinError 10061 message when you used both?

@KirbyFan102

I can confirm that I am using only cookies.txt file and still getting this error.

@biznizz

biznizz commented Jun 19, 2020

And you have the latest version since the last release on the 12th? Hm....

Can you post your twitter extractor info from your config? Also (sorry to ask), but you do have a twitter account and were logged into it before exporting your cookies? Just trying to eliminate variables.

Also, I didn't use both at any time. When mikf said he altered it to work with cookies.txt, I changed the USERNAME/PASSWORD entries in my extractor config to use cookies instead, so I never got any error. It's just that using two different methods at once would probably cause issues.

@KirbyFan102

"twitter": { "content": true, "retweets": true, "twitpic": true, "videos": true, "cookies": "%HOMEPATH%/Google Drive/cookies-twitter-com.txt", "filename": "{filename}.{extension}" }

And yes, I have a twitter account.

@biznizz

biznizz commented Jun 19, 2020

Hmm... try this configuration:

"twitter":
        {
            "filename": "{filename}.{extension}",
            "retweets": true,
            "videos": true,
            "cookies": "%HOMEPATH%/Google Drive/cookies-twitter-com.txt",
            "cookies-update": true
        },

I haven't seen anything regarding content in the configuration.rst, and I think twitpic is dead (site says it's inactive/archived state since 2017).

Does your cookies.txt have any other cookies from other sites in it? Mine has both my DA and twitter cookies with no spaces between any of the lines, and they work properly with both sites.

@KirbyFan102

Well, obviously since the file is named cookies-twitter-com.txt, it's specific to twitter. I had no idea until now that it was possible to store cookies for multiple sites in a single file. And, unless someone corrects me, twitpic images are still present in the accounts that used that site to post their images, and I don't want to ignore them.

How come your configuration has the cookies settings so much farther than the other settings above? Is there some purpose to doing that?

@KirbyFan102

So I tested it with --ignore-config, and it still didn't work.

@biznizz

biznizz commented Jun 19, 2020

Well, specifying/using different cookies.txt files should work. I just have two: the native one exported from the browser with EVERY cookie I have, and a more manageable one I use with gallery-dl, into which I copy only the cookies for the sites I want.

Dunno why my cookie creds are that far off when I copypasta'd it here, just formatting I guess. It still parses properly.

Question: can you use gallery-dl on any site other than twitter? Does it work with DeviantArt at the moment? Because looking it up, WinError 10061 with Python means that there's a connection issue.

@KirbyFan102

KirbyFan102 commented Jun 19, 2020

Yes, I tested with dA and Tumblr. Both work fine.

@biznizz

biznizz commented Jun 19, 2020

Well, it works for me here in the US. Do you have a VPN? You could try to run gallery-dl with it on a non-Brazilian server and see if that clears up the issue.

@KirbyFan102

I do... I'll try using it tomorrow. It may also be a good idea to try clearing my cache.

@KirbyFan102

Whatever the problem was before, clearing the cache fixed it. Didn't need to use a VPN.

@biznizz

biznizz commented Jun 20, 2020

Well, glad to hear it. Since it was a connection issue, I figured it might have been a regional issue; dunno how the cache would have caused that, but whatev things are working for you now. :)

@KirbyFan102

Is anything being done about Issue #798? I can't test it, as I don't follow any locked accounts.

@ntqr

ntqr commented Jun 23, 2020

I follow a few private accounts and they currently all work. Either the problem is actually unrelated to it being private or something is wrong with his setup.

@KirbyFan102

I sure hope someone tries to help him; I'm at my wits' end at this point.

@ntqr

ntqr commented Jun 23, 2020

I'll try to give the guy a follow request and test it out if he accepts it. At least that way we can know for sure whether or not it's an actual bug.

@mikf
Owner

mikf commented Jun 27, 2020

I think most bugs have been fixed and every feature has been restored to how it was before or even improved. General changes:

  • video downloads don't require youtube-dl anymore
  • more metadata fields
  • better handling of quoted tweets
  • the content option got removed and is now always enabled
  • rate limits (ca 180 API calls every 15min for anonymous users, a bit more for logged in users)
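
For anyone scripting around the rate limit in the last bullet, a client-side sliding-window limiter is one way to stay under it. This is a sketch only: the 180-calls-per-15-minutes figure comes from the list above, while `WindowLimiter` is hypothetical and says nothing about how gallery-dl itself reacts to rate limits.

```python
import time
from collections import deque


class WindowLimiter:
    """Allow at most `limit` calls within any `window` seconds."""

    def __init__(self, limit=180, window=900.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent calls, oldest first

    def _expire(self, now):
        # Forget calls that have left the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self.clock()
        self._expire(now)
        if len(self.calls) >= self.limit:
            # Sleep until the oldest recorded call leaves the window.
            self.sleep(self.window - (now - self.calls[0]))
            now = self.clock()
            self._expire(now)
        self.calls.append(now)
```

Calling `limiter.wait()` before each API request keeps the request rate within the window; the `clock` and `sleep` parameters exist so the behaviour can be checked without real delays.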

@mikf mikf closed this as completed Jun 27, 2020
@mikf
Owner

mikf commented Jun 27, 2020

@KaMyKaSii it's rather hard to implement a feature without having access to it. Maybe open another issue for this specific feature and post a few timelines where I can find examples of these temporary tweets.
