[Feature Request] --abort ignoring type of extractor being used #1399
Comments
It doesn't completely reset, i.e. the number for skipped reddit posts is still at 4 and the next one will trigger it.
Exactly. It will stop for the current site, e.g. imgur, but will continue with reddit regardless. So you'd need a "global" abort option.
That was my thought. I'm not sure how you would implement that; would you add a new "global" variant?
I forgot to mention, my main reason for requesting this was related to the imgur rate-limit issues I previously asked about (#1386). By far, imgur has the worst rate-limiting out of all the sites I've seen (1,250 requests per hour; 12,500 per day; if the daily rate is hit 5 times in a month, your IP gets blocked for the rest of the month). I've found that when scraping a subreddit or reddit user page that has mostly imgur links, the cap is hit fairly quickly; even when files are already downloaded, the requests made to check them still count toward the limit.
@mikf I've managed to mitigate my imgur-rate issues with a shoddy workaround (manually identifying and setting aside subreddits and users that were imgur-post heavy). I still have scrape speed issues when it comes to gfycat/redgifs; some subreddits almost exclusively use media from those sites, so they essentially never abort and have to parse the whole ~1,000 posts available before the next URL. Any idea on when this type of feature might be implemented?
This issue also comes up, for example, with behance when using a profile (which contains multiple projects) as input: the skip counter resets on every project, as they are handled as different jobs.
@sourmilk01 I think 7ab8374 combined with c693db5 and dfe1e09 solves your problem.
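For context, gallery-dl's `extractor.skip` option accepts a `"terminate"` action in addition to `"abort"`; unlike `"abort"`, it also ends the parent extractor's run. A config sketch (the extractor name and skip count here are just illustrative; check the gallery-dl configuration docs for the exact semantics in your version):

```json
{
    "extractor": {
        "reddit": {
            "skip": "terminate:5"
        }
    }
}
```

With this, hitting 5 consecutive skips inside a child job (e.g. an imgur album) should stop the enclosing reddit job as well, rather than only the current child.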
Works great, thank you @mikf!
@sourmilk01 Is there any specific reason for not using the archive file option here? |
@Hrxn the problem here isn't detecting an already downloaded file, but gallery-dl's action when finding one in combination with parent and child extractors, e.g. Reddit and Imgur. Any skipped download on one Imgur URL didn't propagate to its parent or other children and didn't count towards the overall "skip limit". Hitting said "skip limit" on an Imgur URL also wasn't able to halt the download for its Reddit parent, only itself.
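The counter-reset behavior described above can be illustrated with a toy model. This is not gallery-dl's actual code; the `Job` class and `skip` method are made up purely to show why per-job counters never reach the limit while a shared counter does:

```python
# Toy model (not gallery-dl's implementation): a parent extractor (reddit)
# spawns child jobs (imgur), and each job counts its own skipped downloads.

class Job:
    def __init__(self, name, limit, shared=None):
        self.name = name
        self.limit = limit
        # With a shared counter dict, skips from all jobs accumulate toward
        # one global limit; without it, each job counts separately.
        self.counter = shared if shared is not None else {"skips": 0}

    def skip(self):
        """Record one skipped download; return True once the limit is hit."""
        self.counter["skips"] += 1
        return self.counter["skips"] >= self.limit

# Per-job counters: 4 reddit skips plus 1 imgur skip never reach 5,
# because the imgur job starts counting from zero.
reddit = Job("reddit", limit=5)
imgur = Job("imgur", limit=5)
hit = any([reddit.skip() for _ in range(4)] + [imgur.skip()])

# Shared counter: the same 5 skips do trigger the limit.
shared = {"skips": 0}
reddit2 = Job("reddit", 5, shared)
imgur2 = Job("imgur", 5, shared)
hit_shared = any([reddit2.skip() for _ in range(4)] + [imgur2.skip()])

print(hit, hit_shared)  # → False True
```

The fix discussed in this thread amounts to the second variant: letting a skip limit (and the action it triggers) apply across parent and child jobs instead of per extractor.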
I've noticed that using `--abort` with a site that uses different image hosts (such as reddit, with reddit, imgur, gfycat, and redgifs content posted) will cause the `--abort` feature to get interrupted before it hits `n` if it switches to a different extractor (e.g. with `--abort 5`, 4 repeated reddit posts are skipped, but then a repeated imgur post gets skipped and the counter resets).

I haven't tested it yet, but I suspect that even if 5 posts of the same type that isn't the parent extractor (like imgur posts on a subreddit URL) are skipped, `--abort` won't apply because they aren't reddit-extracted posts.

Is there a way to have `--abort` ignore the type of extractor being used, and if not, could that feature be added?