Deviantart Scraps Downloader 403's randomly #655

Closed
sledgehammer93 opened this issue Mar 24, 2020 · 17 comments

@sledgehammer93

sledgehammer93 commented Mar 24, 2020

Hello again,
I appear to be having issues with the scraps downloader of gallery-dl this time. It now crashes at random on different links in my URL file and then returns 403 errors for the rest of my links. For example, the downloader works perfectly fine for the first few URLs, only to show the following error (verbose output):

[deviantart][debug] 
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/gallery_dl-1.13.3.dev0-py3.7.egg/gallery_dl/job.py", line 49, in run
    for msg in self.extractor:
  File "/usr/local/lib/python3.7/dist-packages/gallery_dl-1.13.3.dev0-py3.7.egg/gallery_dl/extractor/deviantart.py", line 657, in items
    "journal" if deviation["isJournal"] else "art",
  File "/usr/local/lib/python3.7/dist-packages/gallery_dl-1.13.3.dev0-py3.7.egg/gallery_dl/extractor/deviantart.py", line 1047, in deviation_extended_fetch
    return response.json()
  File "/usr/lib/python3/dist-packages/requests/models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

It then continues to attempt to download from the various links set in my URL file, except it will 403 on every last one. I have my cookies file properly configured. Here is my gallery-dl.conf file:

{
    "extractor":
    {
        "base-directory": "/mnt/7F1332F211233659/Important/gallery-dl/",
        "postprocessors": null,
        "proxy": null,
        "skip": true,
        "sleep": 0,
        "user-agent": "Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0",
        "path-restrict": "\\\\|/<>:\"?*",
        
        "postprocessors": [
            {
                "name": "metadata",
                "mode": "json"
            }
        ],
       
        "deviantart":
        {
            "scraps":
            {
                "cookies": "/mnt/7F1332F211233659/Important/gallery-dl/cookies.txt",
                "cookies-update": true
            },

            "extra": true,
            "flat": true,
            "folders": false,
            "journals": "html",
            "mature": true,
            "metadata": true,
            "original": true,
            "quality": 100,
            "wait-min": 0,
            "client-id": "#####",
            "client-secret": "#####",
            "refresh-token": "#####"
        }
    },

    "downloader":
    {
        "part": true,
        "part-directory": null,

        "http":
        {
            "adjust-extensions": true,
            "mtime": true,
            "rate": "256k",
            "retries": 4,
            "timeout": 30.0,
            "verify": true
        },

        "ytdl":
        {
            "format": "bestvideo+bestaudio/best",
            "forward-cookies": true,
            "mtime": true,
            "rate": null,
            "retries": 4,
            "timeout": 30.0,
            "verify": true
        }
    },

    "output":
    {
        "mode": "auto",
        "progress": true,
        "shorten": true,
        "log": "[{name}][{levelname}] {message}",
        "logfile": null,
        "unsupportedfile": null
    },

    "netrc": false
}

Thanks again.

@biznizz

biznizz commented Mar 24, 2020

I'm gonna piggyback on this topic instead of making my own since it's also related to the topic about scraps in config. I'm still not really good at writing JSON, so I was hoping for some assistance in being able to add scraps extractor to both my deviantart and furaffinity extractors in my config.

Here's what I have currently

 "deviantart":
 {
     "refresh-token": "secret",
     "client-id": "secret",
     "client-secret": "secret",
     "flat": true,
     "folders": false,
     "journals": "html",
     "mature": true,
     "metadata": false,
     "original": true,
     "quality": 100,
     "extra": true,
     "wait-min": 0,
     "cookies": "C:\\Users\\blank\\cookies.txt",
     "cookies-update": true
 },
 "furaffinity":
 {
     "filename": "{filename}.{extension}",
     "cookies": {
         "a": "secret",
         "b": "secret"
     }
 },

mikf added a commit that referenced this issue Mar 24, 2020
This isn't going to solve the underlying problem, but it should at
least provide the server response when those errors happen.
@mikf
Owner

mikf commented Mar 24, 2020

@sledgehammer93 I suspect there is a hidden rate limit for the /extended_fetch endpoint that's used for fetching scraps, and you get 403 responses whenever that's reached. 1b82d36 makes it so it doesn't crash when a JSONDecodeError happens, and it'll print the server response as debug output. Could you try this out and post the error response here? (might be a complete HTML document)
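
For illustration, the "don't crash, report the server response instead" behaviour could look roughly like the sketch below. It is not the code from 1b82d36; `parse_json_or_error` is a hypothetical helper, and wrapping the raw body under an `"error"` key is an assumption made for this example.

```python
import json


def parse_json_or_error(body):
    """Parse a response body as JSON, or wrap the raw text if it is not JSON.

    Hypothetical helper for illustration; not gallery-dl's actual code.
    """
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        # A blocked request typically answers with an HTML error page,
        # which is what makes a plain .json() call raise.
        return {"error": body}


print(parse_json_or_error('{"deviation": {"isJournal": false}}'))
print(parse_json_or_error("<HTML><H1>403 ERROR</H1></HTML>"))
```

With a fallback like this, the caller can log the wrapped body at debug level and move on to the next Deviation instead of aborting the whole run.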

@biznizz If you want to download scraps as well as the regular gallery when using user profile links as input, you can add "include": "gallery,scraps" to both your deviantart and furaffinity blocks
(see deviantart.include):

"deviantart":
{
    "include": "gallery,scraps",
    "...": "..."
},
"furaffinity":
{
    "include": "gallery,scraps",
    "...": "..."
},

or you just use two URLs per user, e.g. https://www.deviantart.com/USER/gallery/ and https://www.deviantart.com/USER/gallery/scraps, instead of https://www.deviantart.com/USER

@sledgehammer93
Author

sledgehammer93 commented Mar 24, 2020

This is the output that I have just received (some sensitive data has been redacted; if that matters, just let me know). It occurs on random URLs for me:

[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /_napi/da-browse/shared_api/deviation/extended_fetch?deviationid=#####&username=#####&type=art&include_session=false HTTP/1.1" 403 919
[deviantart][warning] Unable to fetch deviation ID #####
[deviantart][debug] Server response: {'error': '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">\n<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">\n<TITLE>ERROR: The request could not be satisfied</TITLE>\n</HEAD><BODY>\n<H1>403 ERROR</H1>\n<H2>The request could not be satisfied.</H2>\n<HR noshade size="1px">\nRequest blocked.\nWe can\'t connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.\n<BR clear="all">\nIf you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.\n<BR clear="all">\n<HR noshade size="1px">\n<PRE>\nGenerated by cloudfront (CloudFront)\nRequest ID: CcCB70v9vtQFw3LdV7V9WspzbdUKUwm-9EKxbwpjn2MerR8wChThqw==\n</PRE>\n<ADDRESS>\n</ADDRESS>\n</BODY></HTML>'}
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /_napi/da-browse/shared_api/deviation/extended_fetch?deviationid=#####&username=#####&type=art&include_session=false HTTP/1.1" 200 None
/mnt/7F1332F211233659/Important/gallery-dl/deviantart/#####/Scraps/deviantart_185446477_may's milk 3.jpg
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /_napi/da-browse/shared_api/deviation/extended_fetch?deviationid=#####&username=#####&type=art&include_session=false HTTP/1.1" 403 919
[deviantart][warning] Unable to fetch deviation ID #####
[deviantart][debug] Server response: {'error': '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">\n<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">\n<TITLE>ERROR: The request could not be satisfied</TITLE>\n</HEAD><BODY>\n<H1>403 ERROR</H1>\n<H2>The request could not be satisfied.</H2>\n<HR noshade size="1px">\nRequest blocked.\nWe can\'t connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.\n<BR clear="all">\nIf you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.\n<BR clear="all">\n<HR noshade size="1px">\n<PRE>\nGenerated by cloudfront (CloudFront)\nRequest ID: knIXynQ80m3AxLqKPMF3-th-67GGyD_7xFmrSZwLmIgCOja2kz_CXg==\n</PRE>\n<ADDRESS>\n</ADDRESS>\n</BODY></HTML>'}
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /_napi/da-browse/shared_api/deviation/extended_fetch?deviationid=#####&username=#####&type=art&include_session=false HTTP/1.1" 403 919
[deviantart][warning] Unable to fetch deviation ID #####
[deviantart][debug] Server response: {'error': '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">\n<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">\n<TITLE>ERROR: The request could not be satisfied</TITLE>\n</HEAD><BODY>\n<H1>403 ERROR</H1>\n<H2>The request could not be satisfied.</H2>\n<HR noshade size="1px">\nRequest blocked.\nWe can\'t connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.\n<BR clear="all">\nIf you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.\n<BR clear="all">\n<HR noshade size="1px">\n<PRE>\nGenerated by cloudfront (CloudFront)\nRequest ID: zid18sZ_HbqSyQN1xl_2Eerni5ARsR9CGkyeCIXPgrF9c1dWP_e13Q==\n</PRE>\n<ADDRESS>\n</ADDRESS>\n</BODY></HTML>'}
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /_napi/da-browse/shared_api/deviation/extended_fetch?deviationid=#####&username=#####&type=art&include_session=false HTTP/1.1" 403 919
[deviantart][warning] Unable to fetch deviation ID #####
[deviantart][debug] Server response: {'error': '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">\n<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">\n<TITLE>ERROR: The request could not be satisfied</TITLE>\n</HEAD><BODY>\n<H1>403 ERROR</H1>\n<H2>The request could not be satisfied.</H2>\n<HR noshade size="1px">\nRequest blocked.\nWe can\'t connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.\n<BR clear="all">\nIf you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.\n<BR clear="all">\n<HR noshade size="1px">\n<PRE>\nGenerated by cloudfront (CloudFront)\nRequest ID: hHoPqjAV2tMorQ02mwL95tlOkSEOsoP3Nf2aAfdi42HY3mEcoOf25g==\n</PRE>\n<ADDRESS>\n</ADDRESS>\n</BODY></HTML>'}
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /_napi/da-browse/shared_api/deviation/extended_fetch?deviationid=#####&username=#####&type=art&include_session=false HTTP/1.1" 403 919
[deviantart][warning] Unable to fetch deviation ID #####
[deviantart][debug] Server response: {'error': '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">\n<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">\n<TITLE>ERROR: The request could not be satisfied</TITLE>\n</HEAD><BODY>\n<H1>403 ERROR</H1>\n<H2>The request could not be satisfied.</H2>\n<HR noshade size="1px">\nRequest blocked.\nWe can\'t connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.\n<BR clear="all">\nIf you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.\n<BR clear="all">\n<HR noshade size="1px">\n<PRE>\nGenerated by cloudfront (CloudFront)\nRequest ID: yeaytakhmuoZbkqO59upwoveuRPXii4MHnh-oNcjlUDs0QQVmF39Dg==\n</PRE>\n<ADDRESS>\n</ADDRESS>\n</BODY></HTML>'}
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /_napi/da-browse/shared_api/deviation/extended_fetch?deviationid=#####&username=#####&type=art&include_session=false HTTP/1.1" 403 919
[deviantart][warning] Unable to fetch deviation ID #####
[deviantart][debug] Server response: {'error': '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">\n<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">\n<TITLE>ERROR: The request could not be satisfied</TITLE>\n</HEAD><BODY>\n<H1>403 ERROR</H1>\n<H2>The request could not be satisfied.</H2>\n<HR noshade size="1px">\nRequest blocked.\nWe can\'t connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.\n<BR clear="all">\nIf you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.\n<BR clear="all">\n<HR noshade size="1px">\n<PRE>\nGenerated by cloudfront (CloudFront)\nRequest ID: D4LY_Z8ZmQLonPXgykQqzXlx2JSB2tVFQ86q4DbHu7POoELzrlPMuA==\n</PRE>\n<ADDRESS>\n</ADDRESS>\n</BODY></HTML>'}
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /_napi/da-browse/shared_api/deviation/extended_fetch?deviationid=#####&username=#####&type=art&include_session=false HTTP/1.1" 403 919
[deviantart][warning] Unable to fetch deviation ID #####
[deviantart][debug] Server response: {'error': '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">\n<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">\n<TITLE>ERROR: The request could not be satisfied</TITLE>\n</HEAD><BODY>\n<H1>403 ERROR</H1>\n<H2>The request could not be satisfied.</H2>\n<HR noshade size="1px">\nRequest blocked.\nWe can\'t connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.\n<BR clear="all">\nIf you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.\n<BR clear="all">\n<HR noshade size="1px">\n<PRE>\nGenerated by cloudfront (CloudFront)\nRequest ID: Y4l4epB5b6xsqqdk7Ooky68G3UxhcYxJ2J5KSh1pnFujdwpQvtvNOA==\n</PRE>\n<ADDRESS>\n</ADDRESS>\n</BODY></HTML>'}
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /_napi/da-browse/shared_api/deviation/extended_fetch?deviationid=#####&username=#####&type=art&include_session=false HTTP/1.1" 403 919
[deviantart][warning] Unable to fetch deviation ID #####
[deviantart][debug] Server response: {'error': '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">\n<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">\n<TITLE>ERROR: The request could not be satisfied</TITLE>\n</HEAD><BODY>\n<H1>403 ERROR</H1>\n<H2>The request could not be satisfied.</H2>\n<HR noshade size="1px">\nRequest blocked.\nWe can\'t connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.\n<BR clear="all">\nIf you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.\n<BR clear="all">\n<HR noshade size="1px">\n<PRE>\nGenerated by cloudfront (CloudFront)\nRequest ID: 7zZZbwufUjM0ntLixmw5KxxP0RuUXD7zkOiqMxVVyqdkapuWfS12ZQ==\n</PRE>\n<ADDRESS>\n</ADDRESS>\n</BODY></HTML>'}
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /_napi/da-browse/shared_api/deviation/extended_fetch?deviationid=#####&username=#####&type=art&include_session=false HTTP/1.1" 403 919
[deviantart][warning] Unable to fetch deviation ID #####
[deviantart][debug] Server response: {'error': '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">\n<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">\n<TITLE>ERROR: The request could not be satisfied</TITLE>\n</HEAD><BODY>\n<H1>403 ERROR</H1>\n<H2>The request could not be satisfied.</H2>\n<HR noshade size="1px">\nRequest blocked.\nWe can\'t connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.\n<BR clear="all">\nIf you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.\n<BR clear="all">\n<HR noshade size="1px">\n<PRE>\nGenerated by cloudfront (CloudFront)\nRequest ID: 3YfN2s_zc5HtmrMTpMo2DnLspjvXiVeeSL2CnECoGyl2SGh0fAddXA==\n</PRE>\n<ADDRESS>\n</ADDRESS>\n</BODY></HTML>'}
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /_napi/da-browse/shared_api/deviation/extended_fetch?deviationid=#####&username=#####&type=art&include_session=false HTTP/1.1" 403 919
[deviantart][warning] Unable to fetch deviation ID #####
[deviantart][debug] Server response: {'error': '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">\n<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">\n<TITLE>ERROR: The request could not be satisfied</TITLE>\n</HEAD><BODY>\n<H1>403 ERROR</H1>\n<H2>The request could not be satisfied.</H2>\n<HR noshade size="1px">\nRequest blocked.\nWe can\'t connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.\n<BR clear="all">\nIf you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.\n<BR clear="all">\n<HR noshade size="1px">\n<PRE>\nGenerated by cloudfront (CloudFront)\nRequest ID: 9ScvRCFyu6yVuVCeDykTEkopIZGPzDrUmXYhV0bCPEL4fn2kTMjULw==\n</PRE>\n<ADDRESS>\n</ADDRESS>\n</BODY></HTML>'}
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /_napi/da-browse/shared_api/deviation/extended_fetch?deviationid=#####&username=#####&type=art&include_session=false HTTP/1.1" 403 919
[deviantart][warning] Unable to fetch deviation ID #####
[deviantart][debug] Server response: {'error': '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">\n<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">\n<TITLE>ERROR: The request could not be satisfied</TITLE>\n</HEAD><BODY>\n<H1>403 ERROR</H1>\n<H2>The request could not be satisfied.</H2>\n<HR noshade size="1px">\nRequest blocked.\nWe can\'t connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.\n<BR clear="all">\nIf you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.\n<BR clear="all">\n<HR noshade size="1px">\n<PRE>\nGenerated by cloudfront (CloudFront)\nRequest ID: A6jTLziOMfYlkLE9RoRnn3ha1yENWJ_6EdPGuf6DtItK84hyI3Vk7A==\n</PRE>\n<ADDRESS>\n</ADDRESS>\n</BODY></HTML>'}
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /_napi/da-browse/shared_api/deviation/extended_fetch?deviationid=#####&username=#####&type=art&include_session=false HTTP/1.1" 403 919
[deviantart][warning] Unable to fetch deviation ID #####
[deviantart][debug] Server response: {'error': '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">\n<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">\n<TITLE>ERROR: The request could not be satisfied</TITLE>\n</HEAD><BODY>\n<H1>403 ERROR</H1>\n<H2>The request could not be satisfied.</H2>\n<HR noshade size="1px">\nRequest blocked.\nWe can\'t connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.\n<BR clear="all">\nIf you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.\n<BR clear="all">\n<HR noshade size="1px">\n<PRE>\nGenerated by cloudfront (CloudFront)\nRequest ID: kcVgljjLzIN36yMSOp1gH2jK2u8YmAdL5rnEzo7KxBV1T0ha-1aCwg==\n</PRE>\n<ADDRESS>\n</ADDRESS>\n</BODY></HTML>'}
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /_napi/da-browse/shared_api/deviation/extended_fetch?deviationid=#####&username=#####&type=art&include_session=false HTTP/1.1" 403 919
[deviantart][warning] Unable to fetch deviation ID #####
[deviantart][debug] Server response: {'error': '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">\n<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">\n<TITLE>ERROR: The request could not be satisfied</TITLE>\n</HEAD><BODY>\n<H1>403 ERROR</H1>\n<H2>The request could not be satisfied.</H2>\n<HR noshade size="1px">\nRequest blocked.\nWe can\'t connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.\n<BR clear="all">\nIf you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.\n<BR clear="all">\n<HR noshade size="1px">\n<PRE>\nGenerated by cloudfront (CloudFront)\nRequest ID: E_btLUWrlvAF34GE6PFlN_-mxYJkV-DVZ6KrGifDTRq6aspC9dnmCw==\n</PRE>\n<ADDRESS>\n</ADDRESS>\n</BODY></HTML>'}

Edit: Added some more info that my computer apparently didn't select.

@biznizz

biznizz commented Mar 24, 2020

That worked, now I can rip main and scraps galleries with only one command run, thank you!

By-the-by, what is this extended_fetch deal anyways? Is it a process to download everything (gallery, scraps, favs, journals, etc) from a user gallery?

@Hrxn
Contributor

Hrxn commented Mar 25, 2020

If I'm not mistaken, extended_fetch is part of DeviantArt's API, which is a bit, well, weird.

@mikf
Owner

mikf commented Mar 26, 2020

This error is apparently from DeviantArt's CDN CloudFront, which will block all requests to deviantart.com after getting too much traffic from the same address. I've set up an endless loop to fetch scraps from the same artist over and over again, and after some time I got the same 403 error. Even visiting the website in a normal browser showed me the following:
[image: deviantart]

Going slower through all Deviations by using a delay (sleep), and stopping when there is nothing more to do ("skip": "abort:3") should help here, I hope.

And /extended_fetch is an API endpoint of DeviantArt's Eclipse interface that fetches all information for a single Deviation. The current strategy for scraps is to fetch a list of all numeric Deviation IDs from another API endpoint and then call /extended_fetch for each one of them, which is a bit wasteful.
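
As a rough illustration of that two-step strategy combined with a delay, here is a minimal sketch. The callables `list_ids` and `extended_fetch` are hypothetical stand-ins for the two API endpoints, and the 2-second default delay is an arbitrary assumption; none of this is gallery-dl's actual code.

```python
import time
from typing import Callable, Iterable, Iterator


def fetch_scraps(
    list_ids: Callable[[], Iterable[int]],
    extended_fetch: Callable[[int], dict],
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield full Deviation info, spending one extra request per ID."""
    for index, deviation_id in enumerate(list_ids()):
        if index:
            # Pause between the per-Deviation requests to stay under
            # whatever rate limit the server applies.
            sleep(delay)
        yield extended_fetch(deviation_id)


# Demo with fake endpoints and a no-op sleep, so nothing hits the network.
for info in fetch_scraps(
    list_ids=lambda: [101, 102, 103],
    extended_fetch=lambda deviation_id: {"deviationid": deviation_id},
    sleep=lambda seconds: None,
):
    print(info)
```

The sketch makes the cost visible: N scraps mean N + 1 requests to deviantart.com before a single file is downloaded.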

@sledgehammer93
Author

sledgehammer93 commented Mar 26, 2020

@mikf, Just did a test run with the settings you recommended, and it still managed to 403 after a while. I also wasn't able to access any of DeviantArt in a normal browser. Here's the biggest change I made to my config file:

"scraps":
           {
               "cookies": "/mnt/7F1332F211233659/Important/gallery-dl/cookies.txt",
               "cookies-update": true,
               "sleep": 4,
               "skip": "abort:5"
           },

@Hrxn
Contributor

Hrxn commented Mar 26, 2020

@sledgehammer93 Did you try these new settings with a new client IP address on your side first?

@sledgehammer93
Author

sledgehammer93 commented Mar 26, 2020

@Hrxn , I did not. I will attempt it again in a little bit to see if that helps.
@Hrxn , Just attempted it again with a new client side IP. It still managed to do the same exact thing.

@biznizz

biznizz commented Mar 27, 2020

I can report that I just had a 403 error as well that interrupted a rip of a DA gallery and gave me the error page in my browser for a little while. After using a VPN to change my IP address, I was able to finish downloading, and when I turned off the VPN, I was able to get on the site again normally.

Seems that ripping a large amount of images from them now triggers a kind of DDoS prevention routine.

downloader.http: '403 Forbidden' for 'https://www.deviantart.com/download/27509209/dgdm8p-b2c5fd5e-ebed-42d7-a5c5-385e329dcee6.jpg?token=8858120ded275b3cb7d73c73ab2790c6ba14db79&ts=1585307556'

deviantart: An unexpected error occurred: JSONDecodeError - Expecting value: line 1 column 1 (char 0). Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .

@Twi-Hard

Twi-Hard commented Mar 27, 2020

I would get banned after ~300 images each time I tried to download the artist I was trying to download. I used a different IP each time. The artist has ~585 images total and only 5 of them are scraps. I gave up and haven't tried downloading since. This was on March 17th btw. I was never able to finish downloading them.

The ban only affects your current IP and not your API key.

@biznizz

biznizz commented Mar 27, 2020

> I would get banned after ~300 images each time I tried to download the artist I was trying to download. I used a different IP each time. The artist has ~585 images total and only 5 of them are scraps. I gave up and haven't tried downloading since. This was on March 17th btw. I was never able to finish downloading them.
>
> The ban only affects your current IP and not your API key.

It'd still kick you even if you had already downloaded the first 300 images? Like, even though you'd already have them, it'd stop in the same position every time?

Odd, you'd think that, since images you've already downloaded are automatically skipped (unless your config is set to redownload them every time), it'd skip each image you have until image 301 and continue to rip normally afterwards.

@mikf
Owner

mikf commented Mar 27, 2020

DeviantArt's developers did something on March 17th, it seems. There is even an entry in the API changelog: https://www.deviantart.com/developers/changelog and, of course, they only removed stuff as their first update in 4½ years.

@sledgehammer93 try "skip": "abort:1" instead of :5, so it only does 2 HTTP requests instead of 6 before stopping when there is nothing new, and have a long pause (>10 seconds maybe) between each artist.
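
One way to get a long pause between artists is to call gallery-dl once per URL from a small wrapper instead of passing the whole URL file at once. The following is only a sketch under assumptions: `run_per_artist` is a hypothetical helper, the 15-second pause is arbitrary, and the `"skip": "abort:1"` setting is expected to live in the config file rather than in this script.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_per_artist(
    urls: Sequence[str],
    pause: float = 15.0,
    run: Callable[[list], object] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run gallery-dl once per URL, pausing between consecutive runs."""
    for index, url in enumerate(urls):
        if index:
            sleep(pause)  # long pause before moving on to the next artist
        run(["gallery-dl", url])
```

Calling `run_per_artist(["https://www.deviantart.com/USER/gallery/scraps"])` with a real list of URLs would then process them one at a time; `run` and `sleep` are parameters only so the loop can be exercised without starting any downloads.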

@Twi-Hard use --range "300-". For DeviantArt, this will immediately jump to the 300th image without having to skip over your already downloaded files.

@biznizz It's not downloading images that's causing the ban - most of them are hosted on images-wixmp-....wixmp.com and not on DeviantArt's servers anyway - it's the HTTP requests in the background to gather download URLs, metadata, etc.

@sledgehammer93
Author

sledgehammer93 commented Mar 27, 2020

That appears to work reasonably well, @mikf . At the moment, I just have "skip": "abort:1" set in my gallery-dl.conf file, and it appears to run well enough for one or two runs.

However, subsequent runs after that appear to still 403 after a while. I will run the tests again later when I have a bit more time with "sleep": 10 set as well, to see if that helps any.

Update: Tried it again with "sleep": 10 and "wait-min": 10, and it still 403'ed after a while.

@biznizz

biznizz commented Mar 28, 2020

I have "skip": "exit:50", "sleep": 0, and "wait-min": 0 currently in my config.

Is there any benefit to having abort over exit in skip? They largely sound the same according to the configuration doc, as in ending the process after x amount of skips.
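
As far as the configuration documentation describes it, the difference only shows up with more than one input URL: abort stops the current extractor run and moves on to the next URL, while exit ends the whole program. The sketch below models that reading; `run_queue` and its arguments are hypothetical names for illustration, not gallery-dl internals.

```python
def run_queue(urls_to_files, have, mode, limit):
    """Return the URLs that were started before processing stopped.

    mode is "abort" (stop only the current URL) or "exit" (stop everything)
    once `limit` consecutive files were skipped as already downloaded.
    """
    started = []
    for url, files in urls_to_files.items():
        started.append(url)
        skipped = 0
        stop_everything = False
        for name in files:
            if name not in have:
                skipped = 0  # a new file resets the streak (download happens here)
                continue
            skipped += 1
            if skipped >= limit:
                stop_everything = mode == "exit"
                break
        if stop_everything:
            break
    return started


queue = {"artist1": ["a.jpg", "b.jpg"], "artist2": ["c.jpg"]}
have = {"a.jpg", "b.jpg"}
print(run_queue(queue, have, "abort", 2))
print(run_queue(queue, have, "exit", 2))
```

Under this model, a URL file with several artists is still worked through to the end with abort, whereas exit would stop at the first artist that has nothing new.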

mikf added a commit that referenced this issue Apr 3, 2020
'/extended_fetch' as well as Deviation webpages now again contain
Deviation UUIDs needed to grab Deviation info through the OAuth API,
meaning cookies are no longer necessary to grab original files.

The only instance where cookies are still needed is scraps marked as
"mature", since those entries are hidden for public users.

(#655, #657, #660)
mikf added a commit that referenced this issue Apr 3, 2020
- add a 2 second wait time between requests to deviantart.com
- catch 403 "Request blocked" errors and wait for 3 minutes until
  retrying
@mikf
Owner

mikf commented Apr 3, 2020

I've been trying to figure out under what conditions these "Request blocked" errors occur by writing a little script that continuously sends HTTP requests to https://www.deviantart.com/ and waits a certain time in between.

When waiting <1 seconds, I'd get the error after ca. 250 requests, and after roughly 10min you could send another batch of 250 requests until it happened again. Waiting for 5 seconds didn't result in any errors at all, even after 1500 requests.
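
A probe along those lines might look like the sketch below. This is a reconstruction under assumptions, not the script that was actually used: `http_status` and `requests_until_blocked` are hypothetical names, and treating any 403 as "blocked" is a simplification.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_status(url: str) -> int:
    """Return the HTTP status code for a GET request to `url`."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # 4xx/5xx responses are raised as HTTPError


def requests_until_blocked(
    url: str,
    wait: float,
    max_requests: int,
    status: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Return the number of the first request answered with 403, or None."""
    for number in range(1, max_requests + 1):
        if status(url) == 403:
            return number
        sleep(wait)
    return None
```

Running `requests_until_blocked("https://www.deviantart.com/", wait=0.5, max_requests=1500)` would reproduce the experiment for one wait time; the result is `None` if the block never triggers within the request budget.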

ff7c0b7 and f9a590f now add a mandatory 2 second wait time between all regular non-OAuth requests, and, should this error still happen, will wait 3 minutes until trying again, hoping that the internal rate limiting is gone. (It'll continue to wait until the block is actually gone)
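
In outline, the wait-and-retry behaviour described here can be sketched as follows. This is an approximation for illustration and not the code from ff7c0b7 or f9a590f: `throttled_request`, `is_blocked`, and the `send` callable are hypothetical, and whether the real delay happens before or after each request is an assumption.

```python
import time
from typing import Callable, Tuple

REQUEST_DELAY = 2.0   # seconds between regular requests
BLOCK_WAIT = 180.0    # 3 minutes before retrying after a block


def is_blocked(status: int, body: str) -> bool:
    """Detect the 403 'Request blocked' error page."""
    return status == 403 and "Request blocked" in body


def throttled_request(
    send: Callable[[], Tuple[int, str]],
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Send a request, retrying for as long as the block persists."""
    while True:
        sleep(REQUEST_DELAY)
        status, body = send()
        if not is_blocked(status, body):
            return body
        # Blocked: back off for a few minutes, then try the same request again.
        sleep(BLOCK_WAIT)
```

Here `send` stands for whatever performs the actual HTTP request and returns a `(status, body)` pair; injecting it keeps the retry logic separate from the networking code.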

@Twi-Hard Somewhere in between I've also fixed --range not doing a "quick jump" for scraps. Sorry for recommending something that didn't properly work.

@sledgehammer93
Author

@mikf , That appears to have done the trick. Currently downloading a massive number of scraps at the moment, and it has yet to 403 once. Thanks again!

mikf added a commit that referenced this issue Mar 23, 2024
This was already done for non-OAuth requests (#655)
but CF is now blocking OAuth API requests as well.
JackTildeD added a commit to JackTildeD/gallery-dl-forked that referenced this issue Apr 24, 2024
* save cookies to tempfile, then rename

avoids wiping the cookies file if the disk is full

* [deviantart:stash] fix 'index' metadata (mikf#5335)

* [deviantart:stash] recognize 'deviantart.com/stash/…' URLs

* [gofile] fix extraction

* [kemonoparty] add 'revision_count' metadata field (mikf#5334)

* [kemonoparty] add 'order-revisions' option (mikf#5334)

* Fix imagefap extrcator

* [twitter] add 'birdwatch' metadata field (mikf#5317)

should probably get a better name,
but this is what it's called internally by Twitter

* [hiperdex] update URL patterns & fix 'manga' metadata (mikf#5340)

* [flickr] add 'contexts' option (mikf#5324)

* [tests] show full path for nested values

'user.name' instead of just 'name' when testing for
"user": { … , "name": "…", … }

* [bluesky] add 'instance' metadata field (mikf#4438)

* [vipergirls] add 'like' option (mikf#4166)

* [vipergirls] add 'domain' option (mikf#4166)

* [gelbooru] detect returned favorites order (mikf#5220)

* [gelbooru] add 'date_favorited' metadata field

* Update fapello.py

get the full-size image instead of the resized one

* fapello.py Fullsize image

by removing ".md" and ".th" from the image URL, it will download the full-size images

* [formatter] fix local DST datetime offsets for ':O'

'O' would get the *current* local UTC offset and apply it to all
'datetime' objects it gets applied to.
This would result in a wrong offset if the current offset includes
DST and the target 'datetime' does not or vice-versa.

'O' now determines the correct local UTC offset while respecting DST for
each individual 'datetime'.

* [subscribestar] fix 'date' metadata

* [idolcomplex] support new pool URLs

* [idolcomplex] fix metadata extraction

- replace legacy 'id' values with alphanumeric ones, since the former are
  no longer available
- approximate 'vote_average', since the real value is no longer
  available
- fix 'vote_count'

* [bunkr] remove 'description' metadata

album descriptions are no longer available on album pages
and the previous code erroneously returned just '0'

* [deviantart] improve 'index' extraction for stash files (mikf#5335)

* [kemonoparty] fix exception for '/revision/' URLs

caused by 03a9ce9

* [steamgriddb] raise proper exception for deleted assets

* [tests] update extractor results

* [pornhub:gif] extract 'viewkey' and 'timestamp' metadata (mikf#4463)

mikf#4463 (comment)

* [tests] use 'datetime.timezone.utc' instead of 'datetime.UTC'

'datetime.UTC' was added in Python 3.11
and is not defined in older versions.

* [gelbooru] add 'order-posts' option for favorites (mikf#5220)

* [deviantart] handle CloudFront blocks in general (mikf#5363)

This was already done for non-OAuth requests (mikf#655)
but CF is now blocking OAuth API requests as well.

* release version 1.26.9

* [kemonoparty] fix KeyError for empty files (mikf#5368)

* [twitter] fix pattern for single tweet (mikf#5371)

- Add optional slash
- Update tests to include some non-standard tweet URLs

* [kemonoparty:favorite] support 'sort' and 'order' query params (mikf#5375)

* [kemonoparty] add 'announcements' option (mikf#5262)

mikf#5262 (comment)

* [wikimedia] suppress exception for entries without 'imageinfo' (mikf#5384)

* [docs] update defaults of 'sleep-request', 'browser', 'tls12'

* [docs] complete Authentication info in supportedsites.md

* [twitter] prevent crash when extracting 'birdwatch' metadata (mikf#5403)

* [workflows] build complete docs Pages only on gdl-org/docs

deploy only docs/oauth-redirect.html on mikf.github.io/gallery-dl

* [docs] document 'actions' (mikf#4543)

or at least attempt to

* store 'match' and 'groups' in Extractor objects

* [foolfuuka] improve 'board' pattern & support pages (mikf#5408)

* [reddit] support comment embeds (mikf#5366)

* [build] add minimal pyproject.toml

* [build] generate sdist and wheel packages using 'build' module

* [build] include only the latest CHANGELOG entries

The CHANGELOG is now at a size where it takes up roughly 50kB or 10% of
an sdist or wheel package.

* [oauth] use Extractor.request() for HTTP requests (mikf#5433)

Enables using proxies and general network options.

* [kemonoparty] fix crash on posts with missing datetime info (mikf#5422)

* restore LD_LIBRARY_PATH for PyInstaller builds (mikf#5421)

* remove 'contextlib' imports

* [pp:ugoira] log errors for general exceptions

* [twitter] match '/photo/' Tweet URLs (mikf#5443)

fixes regression introduced in 40c0553

* [pp:mtime] do not overwrite '_mtime' for None values (mikf#5439)

* [wikimedia] fix exception for files with empty 'metadata'

* [wikimedia] support wiki.gg wikis

* [pixiv:novel] add 'covers' option (mikf#5373)

* [tapas] add 'creator' extractor (mikf#5306)

* [twitter] implement 'relogin' option (mikf#5445)

* [docs] update docs/configuration links (mikf#5059, mikf#5369, mikf#5423)

* [docs] replace AnchorJS with custom script

use it in rendered .rst documents as well as in .md ones

* [text] catch general Exceptions

* compute tempfile path only once

* Add warnings flag

This commit adds a warnings flag

It can be combined with -q / --quiet to display warnings.
The intent is to provide a silent option that still surfaces
warning and error messages so that they are visible in logs.

* re-order verbose and warning options

* [gelbooru] improve pagination logic for meta tags (mikf#5478)

similar to 494acab

* [common] add Extractor.input() method

* [twitter] improve username & password login procedure (mikf#5445)

- handle more subtasks
- support 2FA
- support email verification codes

* [common] update Extractor.wait() message format

* [common] simplify 'status_code' check in Extractor.request()

* [common] add 'sleep-429' option (mikf#5160)

* [common] fix NameError in Extractor.request()

… when accessing 'code' after a requests exception was raised.

Caused by the changes in 566472f

* [common] show full URL in Extractor.request() error messages

* [hotleak] download files with 404 status code (mikf#5395)

* [pixiv] change 'sanity_level' debug message to a warning (mikf#5180)

* [twitter] handle missing 'expanded_url' fields (mikf#5463, mikf#5490)

* [tests] allow filtering extractor result tests by URL or comment

python test_results.py twitter:+/i/web/
python test_results.py twitter:~twitpic

* [exhentai] detect CAPTCHAs during login (mikf#5492)

* [output] extend 'output.colors' (mikf#2566)

allow specifying ANSI colors for all loglevels
(debug, info, warning, error)

* [output] enable colors by default

* add '--no-colors' command-line option

---------

Co-authored-by: Luc Ritchie
Co-authored-by: Mike Fährmann
Co-authored-by: Herp
Co-authored-by: wankio
Co-authored-by: fireattack
Co-authored-by: Aidan Harris