Questions, Feedback, and Suggestions #4 #5262
Comments
For most sites I'm able to sort files into year/month folders like this:
However for redgifs it doesn't look like there's a date keyword available for |
There's a typo in
|
There's also another typo in |
Can you grab all the media from quoted tweets? Example. |
#5262 (comment) It's implemented as a search for 'quoted_tweet_id:…' on Twitter.
#5262 (comment) This one was on the same line as the previous one ... (9fd851c)
Regarding typos, thanks for pointing them out. @biggestsonicfan |
EDIT: Actually, I think there's just something wrong with that URL. I had it saved for a long time and searching that tag normally gives a different URL ( |
You could use |
Is there support to remove metadata like this?
Post-processor: "filter-metadata":
{
"name": "metadata",
"mode": "delete",
"event": "prepare",
"fields": ["preview[images][0][resolutions]"]
}
I've tried a few variations but no dice:
"fields": ["preview[images][][resolutions]"]
"fields": ["preview[images][N][resolutions]"]
"fields": ["preview['images'][0]['resolutions']"] |
Hello, I left a comment in #4168 . Does the |
@taskhawk
def remove_resolutions(metadata):
    for image in metadata["preview"]["images"]:
        del image["resolutions"]
(untested, might need some check whether @YuanGYao |
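Regarding the `remove_resolutions` suggestion for @taskhawk: the reply notes it is untested and may need a check. A minimal sketch of a more defensive variant follows, assuming the metadata layout implied by the "fields" path in the question (a "preview" dict holding an "images" list). The checks are my assumption about what could be missing, not something confirmed in this thread.

```python
def remove_resolutions(metadata):
    """Drop the "resolutions" entry from every preview image, if present.

    Assumes the layout metadata["preview"]["images"][i]["resolutions"];
    posts without a preview are returned unchanged.
    """
    preview = metadata.get("preview")
    if not isinstance(preview, dict):
        return metadata
    for image in preview.get("images") or []:
        if isinstance(image, dict):
            # pop() with a default avoids a KeyError when the key is absent
            image.pop("resolutions", None)
    return metadata
```

The function modifies the dict in place and also returns it, so it can be used either way.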
@mikf |
Not sure if I'm missing something, but are directory specific configurations exclusive to running gallery-dl via the executable? Basically, I have a directory for regular tags, and a directory for artist tags. For regular tags I use So right now the only way I know to get this per-directory configuration to work, is to copy the gallery-dl executable everywhere I want to use a master configuration override. Am I missing something? It feels like there should be a better way. |
Huh? No, the configuration works always in the same way. You're simply using different configuration files? |
From the readme:
I want to override my master configuration |
You can load additional configuration files from the console with:
You just need to specify the path to the file and any options there will overwrite your main configuration file. Edit: From my understanding, yeah, automatic loading of local config files in each directory is only possible having the standalone executable in each directory. Are different directory options the only thing you need? |
Thanks, that's exactly what I was looking for! Guess I didn't read the documentation thoroughly enough. For now the only thing I'd want to override is the directory structure for artist tags. I don't think it's possible to determine from the metadata alone if a given tag is the name of an artist or not, so I thought the best way to go about it is to just have a separate directory for artists, and use a configuration override. So yeah, loading that override with the -c flag works great for that purpose, thanks again! |
You kinda can, but you need to enable
"gelbooru": {
    "directory": {
        "search_tags in tags_artists": ["{category}", "{search_tags[0]!u}", "{search_tags}", "{date:%Y}", "{date:%m}"],
        ""                           : ["{category}", "{search_tags}", "{date:%Y}", "{date:%m}"]
    },
    "tags": true
},
Set
Of course, this depends on the artists being correctly tagged. Not sure if it happens on Gelbooru, but at least in other boorus and booru-like sites I've come across posts with the artist tagged as a general tag instead of an artist tag. Another limitation is that your search tag can only include one artist at a time, doing more will require a more complex expression to check all tags are present in
What I do instead is that I inject a keyword to influence where it will be saved, like this:
And in my config I have
"gelbooru": {
    "directory": ["boorus", "{search_tags_type}", "{search_tags}"]
},
You can have:
"gelbooru": {
    "directory": {
        "search_tags_type == 'artists'": ["{category}", "{search_tags[0]!u}", "{search_tags}", "{date:%Y}", "{date:%m}"],
        ""                             : ["{category}", "{search_tags}", "{date:%Y}", "{date:%m}"]
    }
},
You can do this for other tag types, like general, copyright, characters, etc. Because it's a chore to type that option every time I made a wrapper script, so I just call it like this because artists is my default:
For other tag types I can do:
|
Thanks for pointing out there's a tags option available for the gelbooru extractor. I already used it in the kemono extractor to get the name of the artist, but it didn't occur to me that gelbooru might also have such an option (and just accepted that the tags aren't categorized). For artists I store all the URLs in their respective gelbooru.txt, rule34.txt, etc. files like so:
And then just run |
When I'm making an extractor, what do I do if the site doesn't have different URL patterns for different page types? Every single page is just a numerical ID that could be a forum post, image, blog post, or something completely different. |
@Wiiplay123 You handle everything with a single extractor and decide what type of result to return on the fly. The |
Hi, what options should I use in my config file to change the format of dates in metadata files? I would like to use And would it also be possible to do this for json files that ytdl creates? I downloaded some videos with gallery-dl but the dates got saved as |
@mikf |
@mikf |
Guys, I need some help. I tried to look for this but I couldn't find anything similar. Maybe there is and I'm just a noob, so I apologize in advance. Here's my situation: Let's say I downloaded some artworks from "random website A" a while ago. tldr: I would like to know if there is a way to skip files based on what's inside the folder instead of the archive. Just like yt-dlp for instance. Or is wiping my archive the only way? Thanks for reading through. UPDATE: I got this. So I basically edited "archive.sqlite3" with sqlitebrowser. Deleted all the entries related to the artist. Next time I entered the command line, it basically recognized what was already in the folder and downloaded what was missing. Maybe there is an easier way? I don't know, but it worked like a charm. |
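The manual sqlitebrowser cleanup described above could also be scripted. This is a minimal sketch, assuming the archive file contains a table named `archive` with a single text column `entry`, and that the entries to remove share a recognizable prefix. Both are assumptions about the file layout, so inspect the database and back it up before running anything like this.

```python
import sqlite3
from contextlib import closing


def delete_archive_entries(db_path, pattern):
    """Delete archive rows whose entry matches an SQL LIKE pattern.

    Returns the number of deleted rows. The table name "archive" and the
    column name "entry" are assumptions about the archive file layout.
    """
    with closing(sqlite3.connect(db_path)) as con:
        with con:  # commits on success, rolls back on error
            cur = con.execute(
                "DELETE FROM archive WHERE entry LIKE ?", (pattern,)
            )
            return cur.rowcount
```

For example, `delete_archive_entries("archive.sqlite3", "pixiv%")` would remove every entry starting with `pixiv`. Note that `_` and `%` are wildcards in LIKE patterns, so a pattern containing a literal underscore can match more than intended.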
Yeah, that's what I meant. My bad haha. I will edit the original post. |
One question, why was the following line: self.out.skip(pathfmt.path) moved after the I'm manipulating the output of gallery-dl to give me a bit more information and after upgrading to a later version it broke my output and I'm wondering if it will cause any issue if I just move it back in my local copy. I think what's happening is that the code in the |
is it possible to specify multiple cookies in the config such that g-dl will cycle through them as needed? eg: i'm also curious if it would be possible to randomly cycle through a cookie list to help prevent account bans eg when downloading instagram. |
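Whether the config itself supports a cookie list is not answered here. As a workaround, a wrapper script could rotate cookie files across invocations. The sketch below only builds the command lines; it assumes one cookies.txt file per account and that gallery-dl is called with its `--cookies` option. The function name and file names are hypothetical.

```python
import itertools
import random


def assign_cookies(urls, cookie_files, shuffle=False, seed=None):
    """Pair each URL with a cookie file, cycling through the list.

    Returns one argument list per URL, suitable for subprocess.run().
    """
    files = list(cookie_files)
    if not files:
        raise ValueError("at least one cookie file is required")
    if shuffle:
        # Randomize the starting order; the rotation afterwards is still even
        random.Random(seed).shuffle(files)
    return [
        ["gallery-dl", "--cookies", cookie, url]
        for url, cookie in zip(urls, itertools.cycle(files))
    ]
```

Each returned list can be passed to `subprocess.run(cmd)`. This rotates per URL rather than "as needed", so it does not detect an expired or banned account by itself.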
Heavily related to my previous post, I've now encountered a new patron who edits the text, image attachment, and file attachment of a single post to update rewards from month to month. Since they do change the title of the post, my filename schema shouldn't match and it might be redownloaded as a new post. I won't know until next month, but I guess I'll cross that bridge when I get there. How expensive would it be, computationally, to check specific fields within json dumps to determine if an enumerate file should be downloaded or not? |
is there a way to make there are times when my IG cookies get expired but it doesn't show me any errors, so it just keeps downloading files, lol. is there a way to stop it when cookies get expired? even if it doesn't show any errors? this only works when there are explicit errors: "error:NotFoundError|AuthorizationError|HttpError|HTTP redirect to login page": "exit 0" |
I am getting the error I see many mentions of this error: but I read through many of them trying to understand what to do, and I cannot figure it out. Will someone please tell me how to fix this. Also, just to vent, I had no idea how long this had been happening, or if any of my attempts to download pixiv profiles prior had been subject to this. I can't retroactively check any logs, since I think I used to have logs, but it would cause redownloading profiles to skip media it already downloaded, which annoyed me. I didn't know if I could disable that specifically, so I just gave up on having logs. So, I potentially am missing media when I intended to get everything. I am a bit sad about it. Also, the "logs" I am describing might actually be something entirely different, and might not have told me of this error anyway. I don't know. I barely manage to get gallery-dl working for myself, so it working at all is essentially where my knowledge on the program ends. |
I've come across this too often. Regular auditing of your archives sucks but is almost a necessary thing to do if you want to make sure you have it all. I'd recommend polishing up on some Python skills, and while you don't have to work with gallery-dl's code itself necessarily, you can write your own little audit scripts as needed. I wish we all were at a point where we could say a program is bulletproof, but not knowing everyone's scenarios and every gallery type out there throws curveballs and exceptions into the mix. |
@biggestsonicfan part of the problem was I delayed updating to Windows 10 for a very long time, so the cmd window allowing seemingly an infinite amount of text (or at least enough that it dwarfs Windows 7's not even allowing a |
Trying to download this: using: produces this error: until it hits 5/5 then fails. It happens for all misskey.gg links. In contrast, misskey.io links work without even needing to preface the link with "misskey:". For example: Is there anything I can do to make misskey.gg links work? |
Trying to download resources from Imgur: |
Is there a "correct" way to convert large deviantart "*.gif" files to webm? I suppose this is doable with "exec" post-processor, but this seems quite tricky, especially given this functionality "almost" exists for ugoira:
So, maybe I'm missing the correct way to do that? And if there is none, maybe it makes sense to add an ugoira-like filter specifically for that to gallery-dl? |
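For the "exec" route mentioned above, the conversion itself can be delegated to ffmpeg. A minimal sketch of building such a command follows, assuming ffmpeg is installed with the libvpx-vp9 encoder. The quality value is an arbitrary assumption, and how the command is wired into a post-processor is left out because it is not covered in this thread.

```python
from pathlib import Path


def gif_to_webm_command(gif_path, crf=30):
    """Build an ffmpeg argument list converting a GIF to a VP9 WebM.

    The output file sits next to the input with a .webm suffix.
    """
    src = Path(gif_path)
    dst = src.with_suffix(".webm")
    return [
        "ffmpeg",
        "-i", str(src),
        "-c:v", "libvpx-vp9",  # VP9 video encoder
        "-b:v", "0",           # let -crf control quality
        "-crf", str(crf),
        "-an",                 # GIFs carry no audio
        str(dst),
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)` from a small script that is then called per downloaded file.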
I'm finding case-differences in my twitter directory ("Username" vs "UserName"). It's a btrfs partition under linux so it can handle that, but what's the best way to find out what twitter currently considers the case of the username? Should I take a current tweet, convert it to |
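Finding the affected directories can be automated. The sketch below is a small audit helper, not part of gallery-dl, that groups directory names differing only by case. It does not tell you which spelling Twitter currently considers canonical; it only lists the candidates to merge. The function names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def case_collisions(names):
    """Group names that are equal when compared case-insensitively."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


def scan_directory(root):
    """Report case collisions among the immediate subdirectories of root."""
    return case_collisions(
        entry.name for entry in Path(root).iterdir() if entry.is_dir()
    )
```

Calling `scan_directory("twitter")` on a case-sensitive filesystem returns a dict keyed by the lowercased name, with every spelling found on disk as its value.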
I realize I am double posting here, but I think I have a solution for at least fanbox posts for this. Fanbox metadata has |
@biggestsonicfan
{
"extractor": {
"fanbox": {
"postprocessors": [
{
"name": "metadata",
"event": "post",
"filename": "{id}.json",
"directory": ["metadata", "supported"],
"filter": "locals().get('isSupported')"
},
{
"name": "metadata",
"event": "post",
"filename": "{id}.json",
"directory": ["metadata", "unsupported"],
"filter": "not locals().get('isSupported')"
}
]
}
}
}
|
Holy Christ, I've only recently started using filters with gallery-dl and I didn't realize it had potential like this. |
I would like to amend this to say that misskey.gg links actually do successfully download, but only sometimes? Specific links appear to seemingly always fail in the way I explained in this reply, but others will succeed no problem. It appears that I simply didn't let the command run long enough to reach media it would successfully download. Maybe I need to be logged in to download everything? I haven't tried that yet, but also I don't think I will, since the profile I tried to scrape hosts their media elsewhere, so I have no incentive to make an account just to test this. |
@mikf Looks like adding both those filters as a fanbox postprocessor is throwing everything in "unsupported":
Also not entirely sure this would work out well anyway anymore. As higher tier metadata that I don't support also set I'll play with the filter system a bit to see if I can fine tune it. |
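To experiment with the two filter expressions outside of gallery-dl, they can be evaluated against sample metadata. This sketch assumes filters are plain Python expressions evaluated with the metadata dict as local variables; that is my assumption about the mechanism, and the helper name is hypothetical. It shows one possible explanation for the result above, not a diagnosis: when "isSupported" is absent from the metadata seen by the post-processor, the second filter matches every post.

```python
def evaluate_filter(expression, metadata):
    """Evaluate a filter expression with metadata keys as local names."""
    code = compile(expression, "<filter>", "eval")
    # An empty globals dict still gets __builtins__, so locals() resolves
    return bool(eval(code, {}, dict(metadata)))


supported = "locals().get('isSupported')"
unsupported = "not locals().get('isSupported')"

# Key present and true: only the "supported" filter matches
print(evaluate_filter(supported, {"id": 1, "isSupported": True}))
print(evaluate_filter(unsupported, {"id": 1, "isSupported": True}))

# Key missing: .get() returns None, so "unsupported" matches
print(evaluate_filter(supported, {"id": 2}))
print(evaluate_filter(unsupported, {"id": 2}))
```

Dumping the metadata for one post and checking whether "isSupported" appears at the top level would confirm or rule this out.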
When installing from source using
I understand it's inaccurate but I can't figure out why. There is no |
@fireattack I personally use this command on Windows |
Now it says 1.26.9.dev0, even though the built wheel clearly says |
I feel like something has gone awry for sure. Try creating a fresh venv and installing in that, just in case? |
Ah thanks, I figured it out. Apparently I have billions of And doing So, I have to run I suspect this is caused by
Maybe we shouldn't let the users use it unless really needed, @mikf ? (Or change to --force-reinstall instead.) Log if interested
|
Is there a way to download specifically the revisions on an artist's page on kemono.su? For example, one artist has had many of their posts updated with a revision that removed the content, while the original revision retains them. There are hundreds of posts on their page like that, so I was wondering if there was a way to set it to download the original revisions for all of them automatically. |
I just noticed that the latest gallery-dl release made this "just werk":
I still don't know whether it was possible to download such artwork using gallery-dl before (I thought it was, so I was just asking for someone to explain to me in simple terms how to do it), but, again, it "just werks" now, so, much appreciated. |
So Seiga is now region-locked. Can I proxy/wireguard just that extractor? EDIT: I've managed to get Wireguard locally to proxy via a port using wireproxy, but I just need a post(pre)processor to launch it as a daemon and close it when it's done. EDIT2: Figured it out:
|
I hate posting so frequently here but I hate making new issues more. This is once again an issue for me. I've just supported a user that has a preview image and download urls in their post. I normally parse the json files with a python script, however this preview image had been downloaded previously and I don't overwrite json data anymore. So I will re-run the user with skip set to EDIT: I also don't get how the metadata archive works either. Will the metadata entry be the same as the one for the extractor? |
Continuation of the previous issue as a central place for any sort of question or suggestion not deserving their own separate issue.
Links to older issues: #11, #74, #146.