[FR] Twitter API support #980
Comments
It does, but only from the regular timeline (
The official API also has limits (~3000 tweets per timeline).
That would be the only reason for using the official API, but they've only just changed their web API after it had been "stable" for several years, and I doubt they are going to do that again in the near future.
You can do that with the current implementation. You just have to log in or use exported cookies.
That can theoretically be done with the web API as well.
Are you saying the official API is limited to ~3000 most recent tweets, or that it will only return ~3000 tweets per call? Because you can do API calls to retrieve tweets before a certain ID to essentially chain them together, unless it literally won't retrieve any tweet past the most recent 3000 even with that parameter.
This would definitely help.
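To make the "chain them together" idea concrete, here is a minimal sketch of paging backwards through a timeline by tweet ID. It does not call Twitter at all: `fetch_page` is a hypothetical callable standing in for whatever request returns one page of tweets, and the sketch assumes the "before a certain ID" parameter behaves like an inclusive `max_id`. Whether the server keeps returning pages past the newest ~3000 tweets is exactly the open question here, and this code cannot answer it.

```python
from typing import Callable, Iterator, Optional


def iter_timeline(
    fetch_page: Callable[[Optional[int]], list],
    limit: Optional[int] = None,
) -> Iterator[dict]:
    """Yield tweets newest-first by chaining calls with a decreasing max_id.

    fetch_page(max_id) is a hypothetical stand-in for one API request. It is
    assumed to return a list of dicts with an integer "id", newest first,
    containing only tweets with id <= max_id (or the newest page if None).
    """
    max_id = None
    seen = 0
    while True:
        page = fetch_page(max_id)
        if not page:
            return  # the server has nothing older to offer
        for tweet in page:
            yield tweet
            seen += 1
            if limit is not None and seen >= limit:
                return
        # Ask for tweets strictly older than the oldest one just received.
        max_id = min(t["id"] for t in page) - 1


# Fake backend for illustration: 10 tweets, served 4 per page.
FAKE_TWEETS = [{"id": i} for i in range(10, 0, -1)]


def fake_fetch(max_id, count=4):
    eligible = [t for t in FAKE_TWEETS if max_id is None or t["id"] <= max_id]
    return eligible[:count]


all_ids = [t["id"] for t in iter_timeline(fake_fetch)]
first_five = [t["id"] for t in iter_timeline(fake_fetch, limit=5)]
```

If the API refuses to go past a fixed number of recent tweets, the loop simply ends early when `fetch_page` returns an empty page.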
I noticed that some retweets indeed don't hold "media" in them.
retweet is the one that holds both media and full text, so it should be used instead.
Pretty sure the official API doesn't allow you to retrieve any tweets older than the newest 3200 for each timeline. There have been other issues asking about or discussing how to get all tweets from a user's timeline, but nobody found a definite answer: #186 #544. If getting all tweets were as easy as using the official API, it would have been done a long time ago.
by using 'id:…' as their screen name, i.e. https://www.twitter.com/id:2976459548/media instead of https://twitter.com/supernaturepics/media. The user ID can, for example, be obtained from the output of $ gallery-dl -j --range 1 https://twitter.com/<screen-name>
@mikf I found a use case for the API (unfortunately). For some reason, even with include:nativeretweets, the search function misses a lot of retweets. While you can just scrape the account normally, you lose the ability to start from a specific tweet (i.e. since_id). When you use since_id via the API search, it does retrieve those retweets. Right now my strategy is to build up a user's archive via search queries until the present day, and then simply start searching from the last tweet ID my script recorded, but doing that via the web search misses lots of recent retweets. The next best thing is to set it to abort once a certain number of files are skipped, but if a user happens to have a lot of retweets from an account I already have a lot of images from (since I sort them into directories by ID), it'll reach the limit erroneously... so right now I'm just setting the limit to a particularly high number, like 100. Twitter is such a wonderful platform.
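A toy comparison of the two stopping strategies described above may help show why the skip limit trips early. This is only a sketch under simplifying assumptions: tweets are bare integer IDs listed newest first, `archive` is the set of IDs whose files already exist on disk, and the skip counter is assumed to count consecutive skips. Neither function is gallery-dl's actual logic; both names are made up for illustration.

```python
def fetch_by_skip_limit(timeline, archive, max_skips):
    """Walk newest-first and stop after max_skips consecutive existing files."""
    fetched, skips = [], 0
    for tweet_id in timeline:
        if tweet_id in archive:
            skips += 1
            if skips >= max_skips:
                break  # assumes everything older is already archived
        else:
            skips = 0
            fetched.append(tweet_id)
    return fetched


def fetch_since(timeline, archive, since_id):
    """Take everything newer than the last recorded ID, minus existing files."""
    return [t for t in timeline if t > since_id and t not in archive]


# 104 is the last tweet recorded for this user. 106-108 are retweets whose
# files were already saved while scraping a different account.
timeline = [109, 108, 107, 106, 105, 104, 103]
archive = {108, 107, 106, 104, 103}

by_skips = fetch_by_skip_limit(timeline, archive, max_skips=3)
by_since = fetch_since(timeline, archive, since_id=104)
```

With these made-up IDs, the skip-limit run stops at the block of already-saved retweets and never reaches 105, while the since_id run picks up both 109 and 105.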
I realize I'm not explaining this in the most coherent way. Let me describe my process. Step 1 is accumulating a user's tweets via searches from their join date in 2-week intervals. This works fine, but it doesn't retrieve all retweets. Step 2 is running gallery-dl on their user page to grab all the missed retweets from their most recent ~3200 tweets. (I could probably switch Step 1 and Step 2 in some manner, so that it starts searching backward from the last tweet it could grab, but this just seems more foolproof.) Step 3 is, from that point forward for that user, using a separate config set to abort once it hits 40 'file already exists' skips. I can't afford to set that limit too low, because if it runs into retweets I already have from scraping a different user's account, it'll abort prematurely, before I've grabbed all the new content. What I would like is for my Step 3 to switch to the API and have my script simply start grabbing from the last recorded tweet ID in the directory. This would not miss the retweets that the web search's
Twitter has made it increasingly difficult to scrape tweets through their web API, and has been adding more and more checks to try to verify that the user is running a browser. Not to mention, their web design changes frequently, usually invisibly, but sometimes with full-on revamps, and certainly more often than most services do.
Right now, gallery-dl has no way of downloading media from retweets, regardless of whether or not the option for it is enabled. Additionally, the /media URL will retrieve files the standard user URL misses, and there still might be files it isn't grabbing.
Because you can retrieve a user's entire timeline with just a few API calls, all it takes is parsing the JSON response for all media URLs and downloading them directly, and this method would not break any time soon.
Additionally, it would be able to retrieve tweets that can only be seen by 'approved followers', assuming the user has approved the account associated with the API key.
Lastly, and perhaps most importantly, this would allow the retrieval of users by ID rather than their current screen name, meaning that even if the user you want to download media from changes their screen name in the future, so long as you're retrieving their newest stuff by ID, gallery-dl would find it.
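As a rough illustration of the "parse the JSON response for media URLs" step, here is a minimal sketch. It assumes tweet objects shaped like the v1.1 API's JSON, with media listed under `extended_entities.media[].media_url_https` and the original tweet of a retweet nested under `retweeted_status`. The sample data and example.com URLs are made up, videos are not handled, and nothing here is gallery-dl code.

```python
def media_urls(tweet):
    """Return the image URLs attached to one tweet dict.

    For a retweet, read the nested original tweet instead, since the
    outer object may lack the media entries (assumed v1.1-style layout).
    """
    source = tweet.get("retweeted_status", tweet)
    media = source.get("extended_entities", {}).get("media", [])
    return [item["media_url_https"] for item in media if "media_url_https" in item]


# Made-up sample of a parsed timeline response: a plain tweet with one
# image, a retweet whose media lives on the nested tweet, and a text tweet.
sample_timeline = [
    {
        "id": 30,
        "extended_entities": {
            "media": [{"media_url_https": "https://example.com/media/a.jpg"}]
        },
    },
    {
        "id": 20,
        "retweeted_status": {
            "id": 10,
            "extended_entities": {
                "media": [
                    {"media_url_https": "https://example.com/media/b.jpg"},
                    {"media_url_https": "https://example.com/media/c.jpg"},
                ]
            },
        },
    },
    {"id": 5, "text": "no media here"},
]

all_urls = [url for tweet in sample_timeline for url in media_urls(tweet)]
```

Downloading would then just be a loop over `all_urls`. Looking up the timeline by numeric user ID instead of screen name would be a detail of the request itself and is not shown.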