
Questions, Feedback, and Suggestions #4 #5262

Open
mikf opened this issue Mar 1, 2024 · 220 comments

Comments

@mikf
Owner

mikf commented Mar 1, 2024

Continuation of the previous issue as a central place for any sort of question or suggestion not deserving their own separate issue.

Links to older issues: #11, #74, #146.

@BakedCookie

For most sites I'm able to sort files into year/month folders like this:

"directory": ["{category}", "{search_tags}", "{date:%Y}", "{date:%m}"]

However for redgifs it doesn't look like there's a date keyword available for directory. There's only a date keyword available for filename. Is this an oversight?

@mikf
Owner Author

mikf commented Mar 2, 2024

Yep, that's a mistake that happened when adding support for galleries in 5a6fd80.
Will be fixed with the next git push.

edit: 82c73c7

@taskhawk

taskhawk commented Mar 6, 2024

There's a typo in extractor.reddit.client-id & .user-agent:

"I'm not a rebot"

@the-blank-x
Contributor

There's also another typo in extractor.reddit.client-id & .user-agent, "reCATCHA"

@biggestsonicfan

Can you grab all the media from quoted tweets? Example.

mikf added a commit that referenced this issue Mar 7, 2024
#5262 (comment)

It's implemented as a search for 'quoted_tweet_id:…' on Twitter.
mikf added a commit that referenced this issue Mar 7, 2024
#5262 (comment)

This one was on the same line as the previous one ... (9fd851c)
@mikf
Owner Author

mikf commented Mar 7, 2024

Regarding typos, thanks for pointing them out.
I would be surprised if there aren't at least 10 more somewhere in this file.

@biggestsonicfan
This is implemented as a search for quoted_tweet_id:…- on Twitter's end.
I've added an extractor for it similar to the hashtags one (40c0553), but it only does said search under the hood.

@BakedCookie

BakedCookie commented Mar 7, 2024

Normally %-encoded characters in the URL get converted nicely when running gallery-dl, eg.

https://gelbooru.com/index.php?page=post&s=list&tags=nighthawk_%28circle%29
gives me a nighthawk_(circle) folder

but for this url:
https://gelbooru.com/index.php?page=post&s=list&tags=shin%26%23039%3Bya_%28shin%26%23039%3Byanchi%29

I'm getting a shin&#039;ya_(shin&#039;yanchi) folder. Shouldn't I be getting a shin'ya_(shin'yanchi) folder instead?

EDIT: Actually, I think there's just something wrong with that URL. I had it saved for a long time and searching that tag normally gives a different URL (https://gelbooru.com/index.php?page=post&s=list&tags=shin%27ya_%28shin%27yanchi%29). I still got valid posts from the weird URL so I didn't think much of it.

@mikf
Owner Author

mikf commented Mar 7, 2024

%28 and so on are URL escaped values, which do get resolved.
&#039; is the HTML-escaped value for '.

You could use {search_tags!U} to convert them.
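The two layers of escaping can be reproduced with Python's standard library alone. `%26%23039%3B` URL-decodes to the HTML entity `&#039;`, and only an HTML unescape turns that into `'`. This sketch just illustrates the decoding; it is not gallery-dl's own code.

```python
import html
from urllib.parse import unquote

encoded = "shin%26%23039%3Bya_%28shin%26%23039%3Byanchi%29"
url_decoded = unquote(encoded)      # %-escapes resolved: %26%23039%3B -> &#039;
print(url_decoded)                  # shin&#039;ya_(shin&#039;yanchi)
print(html.unescape(url_decoded))   # shin'ya_(shin'yanchi)
```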

@taskhawk

taskhawk commented Mar 8, 2024

Is there support to remove metadata like this?

gallery-dl -K https://www.reddit.com/r/carporn/comments/axo236/mean_ctsv/

...
preview['images'][N]['resolutions'][N]['height']
  144
preview['images'][N]['resolutions'][N]['url']
  https://preview.redd.it/mcerovafack21.jpg?width=108&crop=smart&auto=webp&s=f8516c60ad7fa17c84143d549c070738b8bcc989
preview['images'][N]['resolutions'][N]['width']
  108
...

Post-processor:

"filter-metadata":
    {
      "name": "metadata",
      "mode": "delete",
      "event": "prepare",
      "fields": ["preview[images][0][resolutions]"]
    }

I've tried a few variations but no dice.

"fields": ["preview[images][][resolutions]"]
"fields": ["preview[images][N][resolutions]"]
"fields": ["preview['images'][0]['resolutions']"]

@YuanGYao

YuanGYao commented Mar 8, 2024

Hello, I left a comment in #4168. Does the _pagination method of the WeiboExtractor class in weibo.py return when data["list"] is an empty list?
When I used gallery-dl to batch-download Weibo album pages, the downloads were also incomplete.
Testing on the web page, I found that Weibo's getImageWall API sometimes returns an empty list when the images are not completely loaded. I think this may be what causes gallery-dl to stop the download early.

@mikf
Owner Author

mikf commented Mar 8, 2024

@taskhawk
fields selectors are quite limited and can't really handle lists.
You might want to use a python post processor (example) and write some code that does this.

def remove_resolutions(metadata):
    for image in metadata["preview"]["images"]:
        del image["resolutions"]

(untested, might need some check whether preview and/or images exists)
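A slightly more defensive variant of the function above, which also handles posts without `preview` or `images`. This is an untested sketch: it assumes the metadata layout shown in the `-K` output and that the function is hooked up through the `python` post processor's `function` option.

```python
def remove_resolutions(metadata):
    """Delete preview["images"][*]["resolutions"] in place, if present."""
    preview = metadata.get("preview")
    if not isinstance(preview, dict):
        return  # post has no preview data at all
    for image in preview.get("images") or ():
        if isinstance(image, dict):
            # pop() with a default avoids a KeyError when the key is already gone
            image.pop("resolutions", None)
```

Posts without a preview are simply left untouched instead of raising an exception.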

@YuanGYao
Yes, the code currently stops when Weibo's API returns no more results (empty list).
This is probably not ideal, as I've hinted at in #4168 (comment)

@YuanGYao

YuanGYao commented Mar 9, 2024

@mikf
Well, I think for Weibo's album page, since_id should be used to determine whether the image is fully loaded.
I updated my comment in #4168(comment) and attached the response returned by Weibo's getImageWall API.
I think this should help solve this problem.
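Here is a sketch of the pagination rule proposed above. An empty `list` alone does not end the loop; only a missing `since_id` does. `fetch_page` is a hypothetical stand-in for the getImageWall request, and treating an empty or absent `since_id` as the last page is an assumption about the API.

```python
def paginate(fetch_page, max_pages=1000):
    """Yield items page by page until a response carries no since_id.

    fetch_page(since_id) is a hypothetical callable returning a dict like
    {"list": [...], "since_id": "..."}; the field semantics are assumed.
    """
    since_id = ""
    for _ in range(max_pages):  # hard cap as a safety net against endless loops
        data = fetch_page(since_id)
        # An empty "list" is not treated as the end by itself
        yield from data.get("list") or ()
        since_id = data.get("since_id")
        if not since_id:
            return
```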

@BakedCookie

Not sure if I'm missing something, but are directory-specific configurations exclusive to running gallery-dl via the executable?

Basically, I have a directory for regular tags, and a directory for artist tags. For regular tags I use "directory": ["{category}", "{search_tags}", "{date:%Y}", "{date:%m}"] since the tag number is manageable. For artist tags though, there's way more of them so this "directory": ["{category}", "{search_tags[0]!u}", "{search_tags}", "{date:%Y}", "{date:%m}"] makes more sense.

So right now the only way I know to get this per-directory configuration to work, is to copy the gallery-dl executable everywhere I want to use a master configuration override. Am I missing something? It feels like there should be a better way.

@Hrxn
Contributor

Hrxn commented Mar 11, 2024

Huh? No, the configuration works always in the same way. You're simply using different configuration files?

@BakedCookie

@Hrxn

From the readme:

When run as executable, gallery-dl will also look for a gallery-dl.conf file in the same directory as said executable.

It is possible to use more than one configuration file at a time. In this case, any values from files after the first will get merged into the already loaded settings and potentially override previous ones.

I want to override my master configuration %APPDATA%\gallery-dl\config.json in specific directories with a local gallery-dl.conf but it seems like that's only possible with the standalone executable.

@taskhawk

taskhawk commented Mar 11, 2024

You can load additional configuration files from the console with:

-c, --config FILE           Additional configuration files

You just need to specify the path to the file and any options there will overwrite your main configuration file.
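As a rough model of the README wording quoted above, a later file is merged key by key into the settings that are already loaded, so only the keys it sets are overridden. This sketch assumes a plain recursive dict merge; it is not gallery-dl's actual implementation, and the two config dicts are illustrative.

```python
import copy

def merge(base, override):
    """Return a new dict in which values from override win, merging nested dicts."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)  # descend into nested sections
        else:
            result[key] = copy.deepcopy(value)  # scalars and lists are replaced whole
    return result

master = {"extractor": {"gelbooru": {
    "directory": ["{category}", "{search_tags}", "{date:%Y}", "{date:%m}"],
    "tags": True}}}
local = {"extractor": {"gelbooru": {
    "directory": ["{category}", "{search_tags[0]!u}", "{search_tags}", "{date:%Y}", "{date:%m}"]}}}
merged = merge(master, local)
print(merged["extractor"]["gelbooru"])
```

The `directory` list from the local file replaces the master one, while `tags` from the master config survives.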

Edit: From my understanding, yeah, automatic loading of local config files in each directory is only possible having the standalone executable in each directory. Are different directory options the only thing you need?

@BakedCookie

@taskhawk

Thanks, that's exactly what I was looking for! Guess I didn't read the documentation thoroughly enough.

For now the only thing I'd want to override is the directory structure for artist tags. I don't think it's possible to determine from the metadata alone if a given tag is the name of an artist or not, so I thought the best way to go about it is to just have a separate directory for artists, and use a configuration override. So yeah, loading that override with the -c flag works great for that purpose, thanks again!

@taskhawk

taskhawk commented Mar 11, 2024

You kinda can, but you need to enable tags for Gelbooru in your configuration to get them, which will require an additional request:

    "gelbooru": {
      "directory": {
        "search_tags in tags_artists": ["{category}", "{search_tags[0]!u}", "{search_tags}", "{date:%Y}", "{date:%m}"],
        ""                           : ["{category}", "{search_tags}", "{date:%Y}", "{date:%m}"]
      },
      "tags": true
    },

Set "tags": true in your config and run a test with gallery-dl -K "https://gelbooru.com/index.php?page=post&s=list&tags=TAG" so you can see the tags_* keywords.

Of course, this depends on the artists being correctly tagged. Not sure if it happens on Gelbooru, but at least in other boorus and booru-like sites I've come across posts with the artist tagged as a general tag instead of an artist tag. Another limitation is that your search tag can only include one artist at a time, doing more will require a more complex expression to check all tags are present in tags_artists.
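The logic of a multi-artist check can be sketched in plain Python, assuming both `search_tags` and `tags_artists` are space-separated strings. This only illustrates the comparison and has not been verified as a directory condition in a config file. Comparing whole tags also avoids the substring match that a plain `search_tags in tags_artists` performs when both values are strings.

```python
def all_artists(search_tags, tags_artists):
    """True if every space-separated search tag is also an artist tag.

    Assumes both arguments are space-separated tag strings.
    """
    artists = set(tags_artists.split())
    wanted = search_tags.split()
    # An empty search never counts as an artist search
    return bool(wanted) and all(tag in artists for tag in wanted)
```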

What I do instead is that I inject a keyword to influence where it will be saved, like this:

gallery-dl -o keywords='{"search_tags_type":"artists"}' "https://gelbooru.com/index.php?page=post&s=list&tags=ARTIST"

And in my config I have

    "gelbooru": {
      "directory": ["boorus", "{search_tags_type}", "{search_tags}"]
    },

You can have:

    "gelbooru": {
      "directory": {
        "search_tags_type == 'artists'": ["{category}", "{search_tags[0]!u}", "{search_tags}", "{date:%Y}", "{date:%m}"],
        ""                             : ["{category}", "{search_tags}", "{date:%Y}", "{date:%m}"]
      }
    },

You can do this for other tag types, like general, copyright, characters, etc.

Because it's a chore to type that option every time, I made a wrapper script. Since artists is my default, I just call it like this:

~/script.sh "TAG"

For other tag types I can do:

~/script.sh --copyright "TAG"
~/script.sh --characters "TAG"
~/script.sh --general "TAG"
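The wrapper script itself isn't shown. As a hypothetical Python equivalent, it only needs to build the same `-o keywords=...` command as above, with `artists` as the default tag type. The script name, its option parsing and the URL quoting are assumptions.

```python
import json
import subprocess
import sys
from urllib.parse import quote

def build_command(tag, tag_type="artists"):
    """Build a gallery-dl command that injects a search_tags_type keyword."""
    keywords = json.dumps({"search_tags_type": tag_type})
    url = "https://gelbooru.com/index.php?page=post&s=list&tags=" + quote(tag)
    return ["gallery-dl", "-o", "keywords=" + keywords, url]

def main(argv):
    tag_type = "artists"  # default, like calling the script with just "TAG"
    if argv and argv[0].startswith("--"):
        tag_type = argv[0][2:]  # e.g. --copyright -> "copyright"
        argv = argv[1:]
    if len(argv) != 1:
        print("usage: script.py [--copyright|--characters|--general] TAG")
        return 2
    return subprocess.run(build_command(argv[0], tag_type)).returncode

if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Passing the command as a list sidesteps shell quoting of the JSON value.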

@BakedCookie

Thanks for pointing out there's a tags option available for the gelbooru extractor. I already used it in the kemono extractor to get the name of the artist, but it didn't occur to me that gelbooru might also have such an option (and just accepted that the tags aren't categorized).

For artists I store all the URLs in their respective gelbooru.txt, rule34.txt, etc. files like so:

https://gelbooru.com/index.php?page=post&s=list&tags=john_doe
https://gelbooru.com/index.php?page=post&s=list&tags=blue-senpai
https://gelbooru.com/index.php?page=post&s=list&tags=kaneru
.
.
.

And then just run gallery-dl -c gallery-dl.conf -i gelbooru.txt. Since the search_tags ends up being the artist anyway, getting tags_artists is probably not worth the extra request. Same for general tags, and copyright tags, in their respective directories. With this workflow I can't immediately see where I'd be able to utilize keyword injection, but it's definitely a useful feature that I'll keep in mind.

@Wiiplay123
Contributor

When I'm making an extractor, what do I do if the site doesn't have different URL patterns for different page types? Every single page is just a numerical ID that could be a forum post, image, blog post, or something completely different.

@mikf
Owner Author

mikf commented Mar 19, 2024

@Wiiplay123 You handle everything with a single extractor and decide what type of result to return on the fly. The gofile code is a good example for this I think, or aryion.
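The idea of deciding on the fly can be sketched independently of any real site. `page` is a hypothetical dict parsed from the HTML, and the `kind` field and the action names are placeholders, not gallery-dl's extractor API. The gofile and aryion extractors show how this is actually done inside gallery-dl.

```python
def classify(page):
    """Turn one parsed page into a list of (action, url) results.

    `page` is a hypothetical dict with a "kind" field detected from the HTML;
    the action names are placeholders, not gallery-dl message types.
    """
    kind = page.get("kind")
    if kind == "image":
        return [("download", page["file_url"])]
    if kind in ("forum_post", "blog_post"):
        # Hand linked pages back for another round of extraction
        return [("follow", url) for url in page.get("links", ())]
    return []  # unknown page type: produce nothing
```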

@I-seah

I-seah commented Mar 20, 2024

Hi, what options should I use in my config file to change the format of dates in metadata files? I would like to use "%Y-%m-%dT%H:%M:%S%z" for the values of "date" and "published" (from coomer/kemono downloads).

And would it also be possible to do this for json files that ytdl creates? I downloaded some videos with gallery-dl but the dates got saved as "upload_date": "20230910" and "timestamp": 1694344011, so I think it might be better to convert the timestamp to a date to get a more precise upload time, but I'm not sure if it's possible to do that either.
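The `timestamp` value is a Unix epoch time, so it carries the precise upload time that `upload_date` lacks. A plain Python illustration of turning it into the requested format, assuming UTC is the wanted timezone:

```python
from datetime import datetime, timezone

timestamp = 1694344011  # "timestamp" value from the ytdl info JSON
date = datetime.fromtimestamp(timestamp, tz=timezone.utc)
print(date.strftime("%Y-%m-%dT%H:%M:%S%z"))  # 2023-09-10T11:06:51+0000
```

The date part agrees with the `"upload_date": "20230910"` value from the same file.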

@biggestsonicfan

@mikf
Oh, that'll work a treat! Still though, if I do end up subbing to someone on a platform where free/preview metadata was downloaded, I will need to edit the post-processor or remove existing metadata. I was thinking maybe a size comparison of the metadata downloaded by gallery-dl vs what's on disk could do the trick, and theoretically the "bigger" size would contain paywalled data (text/links/whatever) but if new fields or something were added, that could possibly push the size of a free/locked post metadata over previously downloaded paid metadata and overwrite it. It's a tricky scenario that I'll keep in the back of my mind to figure out a best solution, but for now manually making sure I have paid data is fine. Thanks!

@Coro365

Coro365 commented Sep 11, 2024

@mikf
Thanks for the reply.
I had overlooked that.
Thanks kindly, I'll give it a try.

@raz3x

raz3x commented Sep 12, 2024

Guys, I need some help. I tried to look for this but I couldn't find anything similar. Maybe there is and I'm just a noob, so I apologize in advance.

Here's my situation:

Let's say I downloaded some artworks from "random website A" a while ago.
But as I got to know gallery-dl more, the way I named my files got better organized.
So I basically use "-o skip=false" now in order to redownload some stuff with the new settings.
Everything is fine until a few things go wrong here and there, and a few files aren't downloaded due to connection issues or something.
This could be easily solved just by re-entering the same command line to download what is missing.
But the thing is that I'm using "-o skip=false" because of my already existing archive, and it ends up downloading everything again.
And if I remove "-o skip=false", gallery-dl will take the archive in consideration and skip everything.

tldr: I would like to know if there is a way to skip files based on what's inside the folder instead of the archive. Just like yt-dlp for instance. Or is wiping my archive the only way?

Thanks for reading through.

UPDATE: I got this. So I basically edited "archive.sqlite3" with sqlitebrowser. Deleted all the entries related to the artist. Next time I entered the command line, it basically recognized what was already in the folder and downloaded what was missing. Maybe there is an easier way? I don't know, but it worked like a charm.
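The manual sqlitebrowser step can also be scripted. This sketch deletes a given list of entries and assumes the archive stores them in a table named `archive` with a column named `entry`. Check that against your own file and keep a backup before running anything like it. How you collect the entries for one artist is left open.

```python
import sqlite3
from contextlib import closing

def delete_archive_entries(db_path, entries):
    """Delete the given entries from an archive database; return rows removed.

    Assumes a table "archive" with a column "entry"; verify the schema first.
    """
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:  # commits on success, rolls back on error
            cur = conn.executemany(
                "DELETE FROM archive WHERE entry = ?",
                ((entry,) for entry in entries),
            )
        return cur.rowcount  # summed over all executed statements
```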

@Hrxn
Contributor

Hrxn commented Sep 12, 2024

@raz3x What cache?
Do you mean the archive option?

This is similar to the --download-archive option of yt-dlp, except it's more advanced (uses a database) and much more flexible.

@raz3x

raz3x commented Sep 12, 2024

@Hrxn

Yeah, that's what I meant. My bad haha. I will edit the original post.

@taskhawk

One question, why was the following line:

self.out.skip(pathfmt.path)

moved after the if-block in the commit 3595721#diff-805418c86a6e54601f79e880a0a58749fbc92607592a0d4f73d1e0bc2c8e56f1 ?

I'm manipulating the output of gallery-dl to give me a bit more information and after upgrading to a later version it broke my output and I'm wondering if it will cause any issue if I just move it back in my local copy. I think what's happening is that the code in the if-block is printing a newline for each file in the output somewhere in there.

@docholllidae

docholllidae commented Sep 18, 2024

is it possible to specify multiple cookies in the config such that g-dl will cycle through them as needed?

eg:
when downloading a list of twitter users with cookie[0], if the profile is private and c[0] doesn't have access, then g-dl will try with c[1], then c[2], etc

i'm also curious if it would be possible to randomly cycle through a cookie list to help prevent account bans eg when downloading instagram.
feed an array of cookies into the config, and when downloading from a list of url's it will randomly choose a cookie each time it starts a new extraction/input url
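Outside of gallery-dl, the random-per-URL part can be approximated with a small wrapper that starts one process per input URL, each with a randomly chosen file passed to `--cookies`. This is a hypothetical workaround. It does not implement the fallback-on-private-profile behavior, and whether it actually helps avoid bans is not established.

```python
import random
import subprocess

def build_commands(urls, cookie_files, rng=random):
    """Pair every input URL with a randomly picked cookies file."""
    return [["gallery-dl", "--cookies", rng.choice(cookie_files), url] for url in urls]

def run_all(urls, cookie_files):
    for cmd in build_commands(urls, cookie_files):
        subprocess.run(cmd)  # one gallery-dl process per URL
```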

@biggestsonicfan

Heavily related to my previous post, I've now encountered a new patron who edits the text, image attachment, and file attachment of a single post to update rewards from month to month. Since they do change the title of the post, my filename schema shouldn't match and it might be redownloaded as a new post. I won't know until next month, but I guess I'll cross that bridge when I get there.

How expensive would it be, computational-wise, to check specific fields within json dumps to determine if an enumerate file should be downloaded or not?
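Such a check mostly costs one small file read and JSON parse per post, which is typically cheap next to the HTTP requests a download already involves. A sketch of the comparison follows. The function name and the idea of passing the field names explicitly are assumptions, and a missing dump is treated as changed.

```python
import json

def fields_changed(json_path, new_metadata, fields):
    """Return True if any selected field differs from the saved JSON dump.

    A missing dump counts as changed, so the post would be downloaded.
    """
    try:
        with open(json_path, encoding="utf-8") as fp:
            old = json.load(fp)
    except FileNotFoundError:
        return True
    return any(old.get(f) != new_metadata.get(f) for f in fields)
```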

@throwaway242685

throwaway242685 commented Sep 23, 2024

is there a way to make gallery-dl stop/exit when cookies get expired? even when there aren't any errors.

there are times when my IG cookies expire but it doesn't show me any errors, so it just keeps downloading files, lol.

is there a way to stop it when cookies get expired? even if it doesn't show any errors?

this only works when there are explicit errors:

"error:NotFoundError|AuthorizationError|HttpError|HTTP redirect to login page": "exit 0"

@topchaser

I am getting the error pixiv: Unable to download work 59915441 ('sanity_level' warning) when I try to download this link (NSFW, but you cannot see it unless logged in):
https://www.pixiv.net/en/artworks/59915441

I see many mentions of this error:
https://github.com/mikf/gallery-dl/issues?q=sanity+level+warning

but I read through many of them trying to understand what to do, and I cannot figure it out. Will someone please tell me how to fix this.

Also, just to vent, I had no idea how long this had been happening, or if any of my attempts to download pixiv profiles prior had been subject to this. I can't retroactively check any logs, since I think I used to have logs, but it would cause redownloading profiles to skip media it already downloaded, which annoyed me. I didn't know if I could disable that specifically, so I just gave up on having logs. So, I potentially am missing media when I intended to get everything. I am a bit sad about it. Also, the "logs" I am describing might actually be something entirely different, and might not have told me of this error anyway. I don't know. I barely manage to get gallery-dl working for myself, so it working at all is essentially where my knowledge on the program ends.

@biggestsonicfan

Also, just to vent, I had no idea how long this had been happening...

I've come across this too often. Regular auditing of your archives sucks but is almost a necessary thing to do if you want to make sure you have it all. I'd recommend polishing up on some Python skills, and while you don't have to work with gallery-dl's code itself necessarily, you can write your own little audit scripts as needed. I wish we all were at a point where we could say a program is bulletproof, but not knowing everyone's scenarios and every gallery type out there throws curveballs and exceptions into the mix.

@topchaser

@biggestsonicfan Part of the problem was that I delayed updating to Windows 10 for a very long time. The cmd window there allows a seemingly infinite amount of text (or at least enough that it dwarfs Windows 7's, which couldn't even necessarily display the full output of a gallery-dl -K command), so this is all very new to me relative to the years I've been using gallery-dl. Now it's no issue to just scroll up in the command window before I close it, but before, I had to babysit it in real time and check on it before it scrolled too far, since I didn't (and still don't) know whether I can keep a log of everything I've downloaded to check for errors afterwards if I ever chose to. I don't know if I have it in me to commit to much, especially considering that I gave up on reading the existing issues about my immediate problem after trying to make sense of them for maybe half an hour at most. But it would be in my best interest to do so, of course. For now I will just scroll up in my cmd windows before I close them, I guess. It's so easy to do; I should've been doing it since I updated to Windows 10.

@topchaser

Trying to download this:
https://misskey.gg/notes/9yp3zt35c3

using:
gallery-dl misskey:https://misskey.gg/notes/9yp3zt35c3

produces this error:
[downloader.http][warning] ('Connection broken: IncompleteRead(0 bytes read, 58762 more expected)', IncompleteRead(0 bytes read, 58762 more expected)) (1/5)

until it hits 5/5 then fails. It happens for all misskey.gg links. In contrast, misskey.io links work without even needing to preface the link with "misskey:". For example:
https://misskey.io/notes/9ru7yqi5u4j6070a

Is there anything I can do to make misskey.gg links work?

@HilalSoorty

Trying to download resources from Imgur:
URL: https://imgur.com/search/score?q=code
(CMD: gallery-dl --range 1-10 https://imgur.com/search/score?q=code)
But the --range flag isn't working properly: even with it set, gallery-dl keeps downloading an unlimited number of files.

@hunter-gatherer8
Contributor

Is there a "correct" way to convert large deviantart "*.gif" files to webm?

I suppose this is doable with "exec" post-processor, but this seems quite tricky, especially given this functionality "almost" exists for ugoira:

  1. You need to check extension somehow, to convert only gifs.
  2. Makes sense to check size, to only convert gifs that are actually 50+MB videos, not pictures.
  3. You have to manually add "rm", "{_path}" to the command, I guess, which seems hacky, and gallery-dl won't "know" anything about it.

So, maybe I'm missing the correct way to do that? And if there is none, maybe it makes sense to add an ugoira-like filter specifically for that to gallery-dl?
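Points 1 to 3 can be sketched as a standalone helper, which you could run over downloaded files or call from an exec post processor. The 50 MiB threshold, the VP9 settings and the file naming are assumptions. As in point 3, gallery-dl itself would not know about the new file.

```python
import os
import subprocess

MIN_BYTES = 50 * 1024 * 1024  # "50+MB" threshold from the question (assumed MiB)

def should_convert(path):
    """Only large .gif files are treated as videos worth converting."""
    return path.lower().endswith(".gif") and os.path.getsize(path) >= MIN_BYTES

def ffmpeg_command(path):
    """Build a VP9 WebM encode next to the original; -n never overwrites output."""
    out = os.path.splitext(path)[0] + ".webm"
    return ["ffmpeg", "-n", "-i", path, "-c:v", "libvpx-vp9",
            "-b:v", "0", "-crf", "32", "-an", out]

def convert(path):
    if not should_convert(path):
        return False
    if subprocess.run(ffmpeg_command(path)).returncode != 0:
        return False  # keep the GIF if encoding failed
    os.remove(path)  # remove the GIF only after a successful encode
    return True
```

Deleting the GIF only after ffmpeg reports success avoids the blind `rm` step from point 3.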

@biggestsonicfan

I'm finding case-differences in my twitter directory ("Username" vs "UserName"). It's a btrfs partition under linux so it can handle that, but what's the best way to find out what twitter currently considers the case of the username? Should I take a current tweet, convert it to i/web/status and dump json info?

@biggestsonicfan

I realize I am double posting here, but I think I have a solution for at least fanbox posts for this. Fanbox metadata has isSupported and I think the supported plan fee amount. Perhaps a new argument to metadata-skip should be supported in which metadata will not be overwritten unless currently supported for that tier/plan?

@mikf
Owner Author

mikf commented Oct 16, 2024

@biggestsonicfan
When there is a metadata field like this, you can use a filter statement to control if a post processor should run and potentially overwrite a previous file. You could even use it to put metadata into different supported/not supported directories.

{
    "extractor": {
        "fanbox": {
            "postprocessors": [
                {
                    "name": "metadata",
                    "event": "post",
                    "filename": "{id}.json",
                    "directory": ["metadata", "supported"],
                    "filter": "locals().get('isSupported')"
                },
                {
                    "name": "metadata",
                    "event": "post",
                    "filename": "{id}.json",
                    "directory": ["metadata", "unsupported"],
                    "filter": "not locals().get('isSupported')"
                }
            ]
        }
    }
}

The metadata post processor also supports archive functionality, by the way.

@biggestsonicfan

Holy Christ, I've only recently started using filters with gallery-dl and I didn't realize it had potential like this.

@topchaser

I would like to amend my earlier comment about misskey.gg links (e.g. https://misskey.gg/notes/9yp3zt35c3 failing with IncompleteRead errors): they actually do download successfully, but only sometimes. Specific links seem to always fail in the way I described, while others succeed without problems. It appears I simply didn't let the command run long enough to reach media it could download. Maybe I need to be logged in to download everything? I haven't tried that yet, and I don't think I will, since the profile I tried to scrape hosts their media elsewhere, so I have no incentive to make an account just to test this.

@biggestsonicfan

@mikf Looks like adding both those filters as a fanbox postprocessor is throwing everything in "unsupported":

            "postprocessors": [
                {
                    "name": "metadata",
                    "event": "prepare",
                    "mode": "json",
                    "directory": ["json", "supported"],
                    "extension-format": "json",
                    "filter": "locals().get('isSupported')"
                },
                {
                    "name": "metadata",
                    "event": "prepare",
                    "mode": "json",
                    "directory": ["json", "unsupported"],
                    "extension-format": "json",
                    "filter": "not locals().get('isSupported')"
                },
                {
                    "name": "mtime",
                    "event": "file,post"
                }
            ]

Also, I'm not entirely sure this would work out well anymore, since higher-tier posts that I don't support also have isSupported set to True. I feel like the filter for supported/unsupported should check against my currently pledged tier, and the output should go into a directory like ["json", "supported", "{feeRequired}"].

I'll play with the filter system a bit to see if I can fine tune it.
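A tier-aware pair of filter expressions might look like the two strings below. `MY_PLAN_FEE` is a hypothetical value, and the field name `feeRequired` comes from the comment above. `run_filter` is only a rough model of how a filter evaluates against a post's metadata using `eval`; it is not gallery-dl's implementation.

```python
MY_PLAN_FEE = 500  # hypothetical: fee of the plan you currently pledge to

def run_filter(expression, kwdict):
    """Simplified model: evaluate the expression with the metadata as locals."""
    return bool(eval(expression, {}, kwdict))

SUPPORTED = f"locals().get('isSupported') and locals().get('feeRequired', 0) <= {MY_PLAN_FEE}"
UNSUPPORTED = f"not ({SUPPORTED})"
```

With these, posts from tiers above your own would land in the "unsupported" directory even when `isSupported` is true.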

@fireattack
Contributor

fireattack commented Oct 20, 2024

When installing from source using python3 -m pip install -U -I --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz, why does it always say "Successfully installed gallery_dl-1.26.7.dev0" or "Successfully installed gallery_dl-1.26.8" (yes, it somehow even changes after I uninstall and try again!) even though we've been on 1.27.x for a while?

D:\>python -m pip install -U -I --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz
Collecting https://github.com/mikf/gallery-dl/archive/master.tar.gz
  Downloading https://github.com/mikf/gallery-dl/archive/master.tar.gz
     | 731.8 kB 4.8 MB/s 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: gallery_dl
  Building wheel for gallery_dl (pyproject.toml) ... done
  Created wheel for gallery_dl: filename=gallery_dl-1.27.7.dev0-py3-none-any.whl size=568515 sha256=3d320818d2c31ee30f0b1225cc3504bb7654c740640bf33ef53048f0bd8cce09
  Stored in directory: C:\Users\ikena\AppData\Local\Temp\pip-ephem-wheel-cache-em7n5c60\wheels\2a\d6\3c\0feca23fdac400e011e04ee81cac7ad83f629ec3e2cc8dc1ed
Successfully built gallery_dl
Installing collected packages: gallery_dl
Successfully installed gallery_dl-1.26.7.dev0

D:\>pip uninstall gallery-dl
Found existing installation: gallery-dl 1.26.7.dev0
Uninstalling gallery-dl-1.26.7.dev0:
  Would remove:
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl-1.26.7.dev0.dist-info\*
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\*
    c:\users\ikena\appdata\local\programs\python\python311\scripts\gallery-dl.exe
  Would not remove (might be manually added):
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\archive.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\2ch.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\agnph.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\ao3.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\bluesky.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\boosty.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\cien.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\civitai.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\cohost.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\hentainexus.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\koharu.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\wikimedia.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\postprocessor\hash.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\postprocessor\rename.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\update.py
Proceed (Y/n)? y
  Successfully uninstalled gallery-dl-1.26.7.dev0

D:\>python -m pip install -U -I --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz
Collecting https://github.com/mikf/gallery-dl/archive/master.tar.gz
  Downloading https://github.com/mikf/gallery-dl/archive/master.tar.gz
     | 731.8 kB 4.8 MB/s 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: gallery_dl
  Building wheel for gallery_dl (pyproject.toml) ... done
  Created wheel for gallery_dl: filename=gallery_dl-1.27.7.dev0-py3-none-any.whl size=568515 sha256=1558b598d4f62ed90cd0e8a60ff03659fc55991b3678cc90bf289ab37e6d2677
  Stored in directory: C:\Users\ikena\AppData\Local\Temp\pip-ephem-wheel-cache-p2jedlq2\wheels\2a\d6\3c\0feca23fdac400e011e04ee81cac7ad83f629ec3e2cc8dc1ed
Successfully built gallery_dl
Installing collected packages: gallery_dl
Successfully installed gallery_dl-1.26.8

D:\>

I understand it's inaccurate but I can't figure out why. There is no 1.26.7 or 1.26.8 string left in this repo from what I can tell:


@biggestsonicfan

biggestsonicfan commented Oct 20, 2024

@fireattack I personally use this command on Windows python -m pip install --force-reinstall --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz with the --force-reinstall flag. Might help.

@fireattack
Contributor

fireattack commented Oct 20, 2024

>python3 -m pip install -U -I --force-reinstall --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz
Collecting https://github.com/mikf/gallery-dl/archive/master.tar.gz
  Downloading https://github.com/mikf/gallery-dl/archive/master.tar.gz
     - 731.8 kB 643.2 kB/s 0:00:01
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: gallery_dl
  Building wheel for gallery_dl (pyproject.toml) ... done
  Created wheel for gallery_dl: filename=gallery_dl-1.27.7.dev0-py3-none-any.whl size=568515 sha256=9fb25b7eefd00dcd729bc26fa937ad531a0741f26d9c7fc7fbdb86bc578611e5
  Stored in directory: C:\Users\ikena\AppData\Local\Temp\pip-ephem-wheel-cache-be3ub4hf\wheels\2a\d6\3c\0feca23fdac400e011e04ee81cac7ad83f629ec3e2cc8dc1ed
Successfully built gallery_dl
Installing collected packages: gallery_dl
Successfully installed gallery_dl-1.26.9.dev0

Now it says 1.26.9.dev0, even though the built wheel is clearly named gallery_dl-1.27.7.dev0-py3-none-any.whl. Did pip just make up these version numbers on its own?

@biggestsonicfan

I feel like something has gone awry for sure. Try creating a fresh venv and installing in that, just in case?

@fireattack
Contributor

fireattack commented Oct 20, 2024

Ah, thanks, I figured it out. Apparently I have a ton of gallery_dl installations on my system (some are even called gallery-dl).

Running pip uninstall gallery_dl only uninstalls one of them at a time; the others happily continue to exist (and pip list only lists one of them, too).

So, I have to run pip uninstall gallery_dl multiple times until pip list reports none and then re-install.

I suspect this is caused by the -I argument in the command given in the README:

-I, --ignore-installed
Ignore the installed packages, overwriting them. This can break your system if the existing package is of a different version or was installed with a different package manager!

(environment variable: PIP_IGNORE_INSTALLED)

Maybe we shouldn't tell users to use it unless it's really needed, @mikf? (Or change it to --force-reinstall instead.)
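Because pip uninstall only removed one copy per run and pip list only showed one at a time, it may help to see every copy that Python itself can find before reinstalling. Below is a minimal sketch using the standard library's importlib.metadata (Python 3.8+). normalize and find_installs are hypothetical helper names, and the sketch assumes it is run with the same interpreter that pip installs into (e.g. python -m pip ... and python this_script.py).

```python
import re
from importlib.metadata import distributions


def normalize(name):
    """Normalize a project name as in PEP 503: runs of -, _ and . become one -."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_installs(project):
    """List (name, version, location) for every installed copy of `project`."""
    target = normalize(project)
    found = []
    # distributions() yields one entry per metadata directory
    # (*.dist-info / *.egg-info) found on sys.path, duplicates included
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name and normalize(name) == target:
            # locate_file("") points at the directory holding the metadata
            found.append((name, dist.version, str(dist.locate_file(""))))
    return found


if __name__ == "__main__":
    installs = find_installs("gallery-dl")  # also matches gallery_dl
    for name, version, location in installs:
        print(f"{name} {version}  ({location})")
    if len(installs) > 1:
        print(f"{len(installs)} copies found; pip uninstall removes one per run")
```

Rerunning it after each pip uninstall gallery_dl should show when the last copy is gone.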

Log if interested
Microsoft Windows [Version 10.0.19045.5011]
(c) Microsoft Corporation. All rights reserved.

D:\3>pip uninstall gallery-dl
Found existing installation: gallery-dl 1.26.9.dev0
Uninstalling gallery-dl-1.26.9.dev0:
  Would remove:
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl-1.26.9.dev0.dist-info\*
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\*
    c:\users\ikena\appdata\local\programs\python\python311\scripts\gallery-dl.exe
  Would not remove (might be manually added):
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\archive.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\agnph.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\ao3.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\boosty.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\cien.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\civitai.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\cohost.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\hentainexus.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\koharu.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\postprocessor\hash.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\postprocessor\rename.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\update.py
Proceed (Y/n)? y
  Successfully uninstalled gallery-dl-1.26.9.dev0

D:\3>pip list | findstr gallery
gallery_dl                     1.27.0.dev0

D:\3>pip uninstall gallery_dl
Found existing installation: gallery_dl 1.27.0.dev0
Uninstalling gallery_dl-1.27.0.dev0:
  Would remove:
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl-1.27.0.dev0.dist-info\*
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\archive.py
Proceed (Y/n)? y
  Successfully uninstalled gallery_dl-1.27.0.dev0

D:\3>python3 -m pip install -U -I --force-reinstall --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/m
Collecting https://github.com/mikf/gallery-dl/archive/m
  ERROR: HTTP error 404 while getting https://github.com/mikf/gallery-dl/archive/m
ERROR: Could not install requirement https://github.com/mikf/gallery-dl/archive/m because of HTTP error 404 Client Error: Not Found for url: https://github.com/mikf/gallery-dl/archive/m for URL https://github.com/mikf/gallery-dl/archive/m

D:\3>python -m pip install -U -I --force-reinstall --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz
Collecting https://github.com/mikf/gallery-dl/archive/master.tar.gz
  Downloading https://github.com/mikf/gallery-dl/archive/master.tar.gz (731 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 731.8/731.8 kB 2.0 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: gallery_dl
  Building wheel for gallery_dl (pyproject.toml) ... done
  Created wheel for gallery_dl: filename=gallery_dl-1.27.7.dev0-py3-none-any.whl size=568515 sha256=9695c9ea1c21c83ed4dfa7d9c9ad91a1692c7adb7936fe7ea17dbbbdf28a1485
  Stored in directory: C:\Users\ikena\AppData\Local\Temp\pip-ephem-wheel-cache-_uiz8zay\wheels\2a\d6\3c\0feca23fdac400e011e04ee81cac7ad83f629ec3e2cc8dc1ed
Successfully built gallery_dl
Installing collected packages: gallery_dl
Successfully installed gallery_dl-1.27.2.dev0

D:\3>pip list | findstr gallery
gallery_dl                     1.27.2.dev0

D:\3>pip uninstall gallery-dl gallery_dl
Found existing installation: gallery_dl 1.27.2.dev0
Uninstalling gallery_dl-1.27.2.dev0:
  Would remove:
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl-1.27.2.dev0.dist-info\*
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\*
    c:\users\ikena\appdata\local\programs\python\python311\scripts\gallery-dl.exe
  Would not remove (might be manually added):
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\ao3.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\boosty.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\civitai.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\cohost.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\postprocessor\hash.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\postprocessor\rename.py
Proceed (Y/n)? y
  Successfully uninstalled gallery_dl-1.27.2.dev0

D:\3>pip list | findstr gallery
gallery_dl                     1.27.2

D:\3>pip uninstall gallery_dl
Found existing installation: gallery_dl 1.27.2
Uninstalling gallery_dl-1.27.2:
  Would remove:
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl-1.27.2.dist-info\*
Proceed (Y/n)? y
  Successfully uninstalled gallery_dl-1.27.2

D:\3>pip uninstall gallery_dl
Found existing installation: gallery_dl 1.27.7.dev0
Uninstalling gallery_dl-1.27.7.dev0:
  Would remove:
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl-1.27.7.dev0.dist-info\*
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\ao3.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\boosty.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\civitai.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\cohost.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\postprocessor\hash.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\postprocessor\rename.py
Proceed (Y/n)? n

D:\3>pip list | findstr gallery
gallery_dl                     1.27.7.dev0

D:\3>pip uninstall gallery_dl
Found existing installation: gallery_dl 1.27.7.dev0
Uninstalling gallery_dl-1.27.7.dev0:
  Would remove:
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl-1.27.7.dev0.dist-info\*
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\ao3.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\boosty.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\civitai.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\extractor\cohost.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\postprocessor\hash.py
    c:\users\ikena\appdata\local\programs\python\python311\lib\site-packages\gallery_dl\postprocessor\rename.py
Proceed (Y/n)? y
  Successfully uninstalled gallery_dl-1.27.7.dev0

D:\3>pip list | findstr gallery

D:\3>python -m pip install -U --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz
Collecting https://github.com/mikf/gallery-dl/archive/master.tar.gz
  Downloading https://github.com/mikf/gallery-dl/archive/master.tar.gz (731 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 731.8/731.8 kB 6.0 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: gallery_dl
  Building wheel for gallery_dl (pyproject.toml) ... done
  Created wheel for gallery_dl: filename=gallery_dl-1.27.7.dev0-py3-none-any.whl size=568515 sha256=8c3fef44c47c3c0f5b123a50a76a60798804bdfc81509d242dc745c3b561e186
  Stored in directory: C:\Users\ikena\AppData\Local\Temp\pip-ephem-wheel-cache-b8nlpxh6\wheels\2a\d6\3c\0feca23fdac400e011e04ee81cac7ad83f629ec3e2cc8dc1ed
Successfully built gallery_dl
Installing collected packages: gallery_dl
Successfully installed gallery_dl-1.27.7.dev0

@501stRookie

Is there a way to specifically download the revisions of posts on an artist's page on kemono.su? For example, one artist has updated many of their posts with a revision that removed the content, while the original revisions still have it. There are hundreds of posts like that on their page, so I was wondering if there was a way to have it download the original revisions for all of them automatically.

@topchaser

topchaser commented Oct 27, 2024

I am getting the error pixiv: Unable to download work 59915441 ('sanity_level' warning) when I try to download this link (NSFW, but you cannot see it unless logged in): https://www.pixiv.net/en/artworks/59915441

I see many mentions of this error: https://github.com/mikf/gallery-dl/issues?q=sanity+level+warning

I read through many of them trying to understand what to do, but I can't figure it out. Could someone please tell me how to fix this?

Also, just to vent: I had no idea how long this had been happening, or whether any of my earlier attempts to download pixiv profiles had been affected by it. I can't retroactively check any logs. I think I used to have logs, but they caused re-downloads of a profile to skip media that had already been downloaded, which annoyed me. I didn't know whether I could disable just that behavior, so I gave up on having logs. So I may be missing media that I intended to get, which makes me a bit sad. Also, the "logs" I'm describing might actually be something entirely different and might not have told me about this error anyway. I don't know. I barely manage to get gallery-dl working for myself, so getting it to work at all is essentially where my knowledge of the program ends.

I just noticed that the latest gallery-dl release made this "just werk":
https://github.com/mikf/gallery-dl/releases/tag/v1.27.7

Improvements
[pixiv] implement sanity_level workaround for user artworks results (#4327, #5435, #6339)

I still don't know whether it was possible to download such artwork using gallery-dl before (I thought it was, so I was just asking for someone to explain to me in simple terms how to do it), but, again, it "just werks" now, so, much appreciated.

@biggestsonicfan

biggestsonicfan commented Oct 30, 2024

So Seiga is now region-locked. Can I proxy/wireguard just that extractor?

EDIT: I've managed to get WireGuard running locally as a proxy on a port using wireproxy, but I still need a post- (or pre-)processor to launch it as a daemon and close it when it's done.

EDIT2: Figured it out:

            "actions": {
                "*": "exec wireproxy -c ~/.config/wireproxy/wp-config.conf -d"
            },
            "postprocessors": [
                "json_metadata",
                {
                    "name": "exec",
                    "command": "pkill wireproxy",
                    "event": "finalize"
                }
            ]
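One caveat with the exec/pkill pair: pkill wireproxy stops every process whose name matches wireproxy, not only the one started for this run. A different approach (a wrapper script instead of gallery-dl's own actions and postprocessors) is sketched below under these assumptions: wireproxy is on PATH, the config path from above is reused, and gallery-dl's own config already routes Seiga requests through wireproxy's local port. background_process and run_with_wireproxy are hypothetical names, not gallery-dl features. wireproxy is started without -d so the script keeps a handle on exactly that process.

```python
import os
import subprocess
import sys
import time
from contextlib import contextmanager


@contextmanager
def background_process(cmd, timeout=10):
    """Start cmd in the background and make sure it is stopped afterwards."""
    proc = subprocess.Popen(cmd)
    try:
        yield proc
    finally:
        proc.terminate()  # SIGTERM on POSIX, TerminateProcess() on Windows
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()  # escalate if it ignores the polite request
            proc.wait()


def run_with_wireproxy(gallery_dl_args):
    """Hypothetical wrapper: run gallery-dl while a foreground wireproxy is up."""
    config = os.path.expanduser("~/.config/wireproxy/wp-config.conf")
    # No -d: wireproxy stays a child of this script, so only it gets stopped.
    with background_process(["wireproxy", "-c", config]):
        time.sleep(2)  # crude: give the tunnel a moment to come up
        cmd = [sys.executable, "-m", "gallery_dl", *gallery_dl_args]
        return subprocess.run(cmd).returncode
```

Usage would be something like run_with_wireproxy(sys.argv[1:]) from a small script, passing the Seiga URL(s) as arguments. The fixed sleep is a crude stand-in for actually checking that the proxy port accepts connections.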

@biggestsonicfan

biggestsonicfan commented Nov 1, 2024

I hate posting so frequently here, but I hate opening new issues even more. This is once again an issue for me.

I've just supported a user whose post has a preview image and download URLs. I normally parse the JSON files with a Python script; however, this preview image had been downloaded previously, and I don't overwrite JSON data anymore. So I'll re-run the user with skip set to false, but I really need a way to keep data separate depending on whether the user was supported or not, and at which support tier.

EDIT: I also don't understand how the metadata archive works. Will its entries be the same as the ones in the extractor's archive?
