Instagram stopped working #1149

Closed
mikaljan opened this issue Dec 1, 2020 · 30 comments

@mikaljan

mikaljan commented Dec 1, 2020

gallery-dl stopped working on Instagram today; I'm getting the following error:

E:\gallery-dl>gallery-dl https://www.instagram.com/migichen_/
[instagram][warning] Unable to fetch data from 'https://www.instagram.com/p/CIP9dLAhkn3/': JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[instagram][warning] Unable to fetch data from 'https://www.instagram.com/p/CIN3Hhwhtne/': JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[instagram][warning] Unable to fetch data from 'https://www.instagram.com/p/CIDVKJshBuM/': JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[instagram][warning] Unable to fetch data from 'https://www.instagram.com/p/CH-cjDIh0Tz/': JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[instagram][warning] Unable to fetch data from 'https://www.instagram.com/p/CH4-mdcBlAP/': JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[instagram][warning] Unable to fetch data from 'https://www.instagram.com/p/CH2itYohHD8/': JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[instagram][warning] Unable to fetch data from 'https://www.instagram.com/p/CH0I8u5BWVQ/': JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[instagram][warning] Unable to fetch data from 'https://www.instagram.com/p/CHxLcqxBqfe/': JSONDecodeError: Expecting value: line 1 column 1 (char 0)
.
.
.
.

@iamleot
Contributor

iamleot commented Dec 1, 2020

Hello @mikaljan!
Unfortunately I think this is similar to #1113 (i.e. Instagram has started being more aggressive with users who request several images).

(I've tried downloading the profile here, without authenticating, and it seems to be downloading, but I'm pretty sure I will be blocked soon.)

@iamleot
Contributor

iamleot commented Dec 1, 2020

...and indeed after ~2 minutes or so:

% gallery-dl -v 'https://www.instagram.com/migichen_/'
[gallery-dl][debug] Version 1.15.4
[gallery-dl][debug] Python 3.8.6 - NetBSD-9.99.75-amd64-x86_64-64bit-ELF
[gallery-dl][debug] requests 2.24.0 - urllib3 1.25.11
[gallery-dl][debug] Starting DownloadJob for 'https://www.instagram.com/migichen_/'
[instagram][debug] Using InstagramUserExtractor for 'https://www.instagram.com/migichen_/'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): www.instagram.com:443
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /migichen_/ HTTP/1.1" 200 49249
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /p/CIP9dLAhkn3/?__a=1 HTTP/1.1" 302 0
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /accounts/login/ HTTP/1.1" 200 12619
[instagram][warning] Unable to fetch data from 'https://www.instagram.com/p/CIP9dLAhkn3/':  JSONDecodeError: Expecting value: line 1 column 1 (char 0)
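
The debug log above suggests where the error comes from: the `?__a=1` request is answered with a 302 redirect to `/accounts/login/`, so the body handed to the JSON parser is an HTML page rather than JSON. A minimal Python sketch of that failure mode, using only the standard library; the `parse_post_data` helper and both sample bodies are illustrative assumptions, not gallery-dl code.

```python
import json


def parse_post_data(body):
    """Return the decoded JSON, or None if the body is not JSON (e.g. a login page)."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        # An HTML document starts with '<', which is not a valid JSON value,
        # so decoding fails at the very first character.
        print(f"JSONDecodeError: {exc}")
        return None


# Hypothetical response bodies, for illustration only.
login_page = "<!DOCTYPE html><html><body>Login</body></html>"
json_body = '{"graphql": {"shortcode_media": {"id": "1"}}}'

print(parse_post_data(login_page))
print(parse_post_data(json_body)["graphql"]["shortcode_media"]["id"])
```

Checking the response status or final URL before decoding would make it possible to report "redirected to login page" instead of a decode error.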

@aeriessy

aeriessy commented Dec 2, 2020

I'm also having the same issue. I used two accounts and one of them is now banned. I used a 10 second delay for sleep and sleep-request which got through maybe 50 files or something before it gave me the error. Before the account was banned, I was able to download in batches of 50 until it gave me the "something is wrong with your account, change your password" or phone verification. After doing that maybe 3 times, that account was banned outright.

Probably going to put this off until this is fixed or figured out.

@UnforeseenOcean

UnforeseenOcean commented Dec 2, 2020

I'm having better luck with setting the sleep time to 15, but that could change at any moment. I did get the Your Account Has Been Temporarily Locked message on my phone after using the cookie and forgetting to set the delay.

Note: If you get this error, your account might be locked. Unlock it and set the delay to something about 10 or 15 seconds longer. Oh and I'd recommend not using your primary account for this!
[Screenshot: IMG1606922048]

@reallyuniquename

I think the Instagram extractor needs a slight rewrite. It's inefficient, and with the new Instagram rate limits you get stuck after the first few hundred images at best. I explained that here: #1113 (comment).

Either that, or gallery-dl has to detect IP bans and support proxy lists for quick address rotation.

@iamleot
Contributor

iamleot commented Dec 3, 2020 via email

mikf added a commit that referenced this issue Dec 3, 2020
(#1113, #1122, #1128, #1130, #1149)

Rely on the results of GraphQL queries instead of requesting data
for each post separately via '/p/<shortcode>/?__a=1'.

This might result in some missing metadata, and there might be some
issues for '/channel/' and '/saved/' URLs, but at least downloading
from the regular post listings should work without issues and without
getting users blocked/banned.

TODO: reimplement support for stories
@reallyuniquename

needs to scroll all the timeline

Correct but scrolling is querying graphql endpoint and that's like only 80 queries per 1000 images. Besides you could dump whole timeline once and keep reusing it until you download every picture.

What Instagram really doesn't like is when you start hammering /p/ABCDEFG123 pages. When rate limits hit, gallery-dl has to either switch proxy or start scraping from the last downloaded image on the next run. Neither is properly supported by the extractor; --range and --download-archive do not work with Instagram the way you would expect. gallery-dl starts from the beginning of the timeline every time.

Also when I look at the log it seems that extractor just skips images it fails to download, no retries or pause. That's... not good.
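
To illustrate what "retries or pause" could look like, here is a minimal Python sketch of retrying a failed request with exponentially growing pauses. The `fetch_with_retries` helper, its parameters, and the `fetch` callable are all hypothetical; this is not how the extractor is implemented.

```python
import time


def fetch_with_retries(fetch, url, retries=3, base_delay=5.0, sleep=time.sleep):
    """Call fetch(url); on failure wait base_delay * 2**attempt seconds and retry."""
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except Exception as exc:  # a real tool would catch narrower error types
            if attempt == retries:
                raise  # out of retries: surface the error instead of skipping silently
            delay = base_delay * (2 ** attempt)
            print(f"{url}: {exc!r}, retrying in {delay:.0f}s")
            sleep(delay)
```

Passing `sleep` as a parameter keeps the pause logic testable without actually waiting.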

@mikf
Owner

mikf commented Dec 3, 2020

Should be fixed with 447488f.

Querying /p/<shortcode>/?__a=1 for each post is what gets one blocked/banned, and I would strongly advise against using gallery-dl versions before 1.16.0 for Instagram, or any other Instagram downloaders that do this (which are pretty much all of them from what I can tell).

The rewrite is still lacking support for stories, and post listings other than the regular one (e.g. instagram.com/instagram) might not work as before, but at least it won't get you banned anymore.

@dsblack

dsblack commented Dec 3, 2020

I've been having this problem for weeks, so I'm very happy to see it being addressed.

Right now, this commit isn't in a full release, so I don't get the update yet using the pip install --upgrade method. Do you know when it will be in an official release?

Also, I was afraid instagram might be taking measures to block scripts like this. But even if adding a delay (as some people have tried) helps, their next step might be to detect scripts that hit at repeating intervals -- e.g., every 10 seconds. If it's too exact, I could see them detecting that and blocking you anyway.

One thing I wrote into a homespun crawler (which checks prices for items on a web site) several years ago was an option to randomize the delay. You give it a low bound and high bound (in seconds) -- e.g., 1 to 8, or 3 to 15 -- and each request uses a new random delay within those bounds. That way, you look much more like a human clicking through at random intervals, pausing longer at some images than others. For something like this, maybe you'd even want to have a different (longer) range for videos than for images.
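
The randomized delay described above can be sketched in a few lines of Python. The function names and default bounds here are made up for illustration and are not gallery-dl options.

```python
import random
import time


def random_delay(low, high, rng=random):
    """Pick a delay in seconds, uniformly distributed between low and high."""
    if low > high:
        raise ValueError("low must not exceed high")
    return rng.uniform(low, high)


def polite_iter(items, low=3.0, high=15.0, sleep=time.sleep):
    """Yield items, pausing for a fresh random delay between consecutive items."""
    for index, item in enumerate(items):
        if index:  # no pause before the first item
            sleep(random_delay(low, high))
        yield item
```

A caller would wrap its list of URLs, e.g. `for url in polite_iter(urls, 1, 8): ...`, and could use a second, longer pair of bounds for videos.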

What do you think, would that be a worthwhile option to add?

If you really wanted to make it easier, you could even bundle some of these options together into a "typical" group of settings under a single parameter, maybe -human. I'd definitely still allow for the individual settings, but that could make it easier to get it running successfully.

I'd be tempted to try contributing to the project myself, but I don't really know python.

@kattjevfel
Contributor

@dsblack

Right now, this commit isn't in a full release, so I don't get the update yet using the pip install --upgrade method. Do you know when it will be in an official release?

As listed in the readme you can do python3 -m pip install --upgrade https://github.com/mikf/gallery-dl/archive/master.tar.gz to get the latest dev version.

@UnforeseenOcean

UnforeseenOcean commented Dec 4, 2020

I can say for certain Instagram is looking for this kind of activity because my account got suspended (but only for the /p/ action):
[Screenshot: IMG1607079259]

I will try the new version after the ban is lifted. Can't risk getting banned again!

@xibr

xibr commented Dec 6, 2020

[gallery-dl][error] No suitable extractor found for 'https://www.instagram.com/stories/et2k/2457611747557737659/'

latest dev 1.16.0-dev

@phanirithvij

@xibr #1149 (comment) says

The rewrite is still lacking support for stories, and post listings other than the regular one (e.g. instagram.com/instagram) might not work as before, but at least it won't get you banned anymore.

@mikf
Owner

mikf commented Dec 7, 2020

@xibr 2b93515

@xibr

xibr commented Dec 7, 2020

Now it works well with stories. Thanks

@xibr

xibr commented Dec 8, 2020

A question: when trying to download a single Instagram story, all stories are downloaded, not just that one. Is this expected?

@TestPolygon

TestPolygon commented Dec 8, 2020

Well, is it possible to download part of the images and save the position, so that the next launch continues from it?
For example, I have downloaded 1000 of 2000. Is it possible to continue from 1001 on the next launch? Currently the program repeats the requests for the first 1000 images that were already downloaded. These requests are performed one after another, without the pauses that downloading would introduce, which leads to the login page (rechecking 1000 posts requires 84 requests in a short time).

@rivke41levp656

@mikf The fullname filename field returns None on 1.16 for all users as far as I can tell.

@reallyuniquename

@TestPolygon

is it possible to download a part of images and save the position to continue from it on the next launch?

You couldn't with the old extractor, and I don't think you can with the new one, but I haven't checked that yet.

Try it yourself; the options you are looking for are -v --range 1000- and -v --download-archive history.sqlite.

@TestPolygon

The SQLite DB stores only node IDs, so it can only be used to check (if --range is given) whether the node with a certain ID was already downloaded. By default, the program checks the location where files would be downloaded and compares the expected filename with the names of the files in that directory.
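
A small Python sketch of an ID-only archive shows why it cannot serve as a resume point: it can say whether an ID was seen before, but it records nothing about the position in the listing. The table name, column name, and `DownloadArchive` class are assumptions for illustration and do not reflect gallery-dl's actual schema.

```python
import sqlite3


class DownloadArchive:
    """Remember the IDs of downloaded items in a one-column SQLite table."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS archive (entry TEXT PRIMARY KEY)")

    def contains(self, entry):
        """Return True if this ID was recorded earlier."""
        row = self.conn.execute(
            "SELECT 1 FROM archive WHERE entry = ? LIMIT 1", (entry,)).fetchone()
        return row is not None

    def add(self, entry):
        """Record an ID; adding the same ID twice is harmless."""
        self.conn.execute(
            "INSERT OR IGNORE INTO archive (entry) VALUES (?)", (entry,))
        self.conn.commit()
```

With such an archive the listing still has to be walked from the start on every run, because each ID must first be fetched before it can be looked up.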

--verbose was useful to debug. I can say that it is possible to do.

It would require adding, for example, a --session flag.

With this flag, the program should store (in a system file) the current parameters required for requesting the next "list page", together with the associated URL. For example: [{url1: [param1, param2]}, {url2: [param1, param2]}]. If they are present in this file, it should use them to continue downloading from a certain position, whether the user interrupted the download via Ctrl+C (for this case the params for requesting the current "list page" need to be stored too) or hit the API limit ("login page") when requesting the next "list page".

A more complicated format example:
[{ url1: { current: { params: [], fullyDownloaded: false }, next: { params: [] }, date: 1607609281 } }]

For Instagram these are: tracking_token, query_hash and id.
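
A rough Python sketch of the proposed state file, using a JSON layout similar to the more complicated example above. Everything here is hypothetical: neither a --session flag nor these helper functions exist in gallery-dl, and the parameter values in the usage note are placeholders.

```python
import json
import os
import time


def load_state(path):
    """Read the whole state file; a missing file means no saved positions."""
    try:
        with open(path, encoding="utf-8") as fp:
            return json.load(fp)
    except FileNotFoundError:
        return {}


def save_position(path, url, params, fully_downloaded=False):
    """Store the parameters needed to request the current list page for url."""
    state = load_state(path)
    state[url] = {
        "current": {"params": params, "fullyDownloaded": fully_downloaded},
        "date": int(time.time()),
    }
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fp:
        json.dump(state, fp)
    # Write to a temp file and rename, so an interrupted write
    # does not corrupt the previously saved state.
    os.replace(tmp, path)


def resume_params(path, url):
    """Return the saved list-page parameters for url, or None if there are none."""
    entry = load_state(path).get(url)
    return entry["current"]["params"] if entry else None
```

For example, `save_position("session.json", url, {"query_hash": "...", "id": "...", "tracking_token": "..."})` after each list page, and `resume_params("session.json", url)` at startup.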

@mikf ?

@mikaljan
Author

mikaljan commented Dec 10, 2020

Hi @mikf,

I tried the latest 1.16.0-dev version, and I would get some successful downloads in the beginning, and after a minute or so everything returns a warning, please check the TXT file I've attached:

instagram_log.txt

mikf added a commit that referenced this issue Dec 11, 2020
To enable at least 'some' way to continue downloading from the middle
of a user profile listing.
@mikf
Owner

mikf commented Dec 11, 2020

@mikaljan This output isn't from the latest dev version. The Unable to fetch data from ... logging message was removed in the rewrite (447488f). Check gallery-dl --version to make sure you are actually using 1.16.0-dev.
I'll release a new version with the fix this weekend. You could just wait until then.

@TestPolygon b88c97b adds a way to at least manually input a cursor value and continue downloading from the current position. The cursor tokens get outputted as debug logging messages or when getting redirected to the login page.

This commit also increases the number of posts requested per GraphQL query from 12 to 50 (the maximum possible). Since the redirect to the login page for users who are not logged in always happens after ~120 requests, regardless of how many posts get fetched or how long the wait time in between is, this should allow more posts to be downloaded.
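
The effect of the larger page size can be checked with some quick arithmetic, sketched here in Python. The budget of 120 is the approximate figure from the comment above, and the result is an upper bound, since it assumes every request in the budget is a listing query.

```python
import math

REQUEST_BUDGET = 120  # approximate number of requests before the login redirect


def requests_needed(posts, page_size):
    """Number of listing queries needed to enumerate `posts` posts."""
    return math.ceil(posts / page_size)


def posts_reachable(page_size, budget=REQUEST_BUDGET):
    """Upper bound on posts listed before the request budget runs out."""
    return page_size * budget


for page_size in (12, 50):
    print(page_size, requests_needed(1000, page_size), posts_reachable(page_size))
```

With 12 posts per query, 1000 posts take 84 queries and the budget covers at most 1440 posts; with 50 per query, 1000 posts take 20 queries and the budget covers up to 6000.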

@TestPolygon

TestPolygon commented Dec 11, 2020

Hm, I used pip install --no-cache-dir --upgrade https://github.com/mikf/gallery-dl/archive/master.tar.gz, but I still get the old behavior ("first":+12 and no prompt "Use '-o cursor=%s' to continue downloading " on the login page event).


Upd: use pip uninstall gallery-dl

@syntopikon

I was experiencing this error previously as well, but after upgrading to 1.16.0, I've yet to encounter it (working across several 2k+ mixed albums).

@mikf
Owner

mikf commented Dec 13, 2020

As omnicr0n said, v1.16.0 is out, which should at least somewhat mitigate any rate limit problems with Instagram.

@xibr this is expected and worked like that even before the rewrite. If you want to limit the download to only a specific story ID, use --filter "media_id == 'STORY ID'"
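
The --filter option takes a Python expression. As a rough illustration of how such an expression could be evaluated against each file's metadata, here is a minimal sketch; the `build_filter` helper is hypothetical and is not gallery-dl's implementation, and the IDs are placeholders.

```python
def build_filter(expression):
    """Compile a Python expression into a predicate over a metadata dict."""
    code = compile(expression, "<filter>", "eval")

    def predicate(metadata):
        # Metadata keys become the names visible to the expression.
        # Note: eval() is not safe for untrusted expressions.
        return bool(eval(code, {"__builtins__": {}}, metadata))

    return predicate


keep = build_filter("media_id == '123'")
print(keep({"media_id": "123"}))
print(keep({"media_id": "456"}))
```

Only items for which the predicate is true would be downloaded, which is how a single story can be selected out of all of a user's stories.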

@rivke41levp656 Instagram removed those from all owner fields, it seems. This has nothing directly to do with the rewrite from 447488f. The fullname info was still available a month ago, but now the embedded data in user profile pages like https://www.instagram.com/instagram/ only has
"owner":{"id":"25025320","username":"instagram"}

@mikf mikf closed this as completed Dec 13, 2020
@mikf
Owner

mikf commented Dec 13, 2020

@TestPolygon

$ pip install -U -I --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz

should work without needing to uninstall.
(I've updated the instructions in the README accordingly)

@xibr

xibr commented Dec 13, 2020

@xibr this is expected and worked like that even before the rewrite. If you want to limit the download to only a specific story ID, use --filter "media_id == 'STORY ID'"

got it, thanks.

@left1000

So, instagram works, again, yeah! (at least on public follows).

Unfortunately it doesn't work for private accounts (that my account has access to), even having provided instagram with my username/password in the conf file... and I'm fairly sure I did it right because, well, it used to work just fine.

@Hrxn
Contributor

Hrxn commented Dec 17, 2020

Does it work if you remove username/password authentication and try it with the exported cookies instead?

@mikf mikf unpinned this issue Dec 17, 2020
@mikf
Owner

mikf commented Dec 17, 2020

Forcing a re-login by clearing your cache with gallery-dl --clear-cache and then trying to download from Instagram again might also work.
