[patreon] Patreon post embedded images (type == 'content') always have the same filename #1954
Comments
I used https://www.patreon.com/posts/19987002 as an example when implementing embedded images and they do not have filenames other than Fetching the original filename is already done for
Just checked and Would it be possible to implement some kind of file-unique identifier as an alternative to Most URLs seem to have some kind of unique identifier for files that could populate this (or be fed into a hash function of some kind). Take the above-mentioned embedded images, for example:
Or, for a clearer example, attachment URLs:
These could probably only be guaranteed to be unique within a specific extractor, but that should be good enough. Having a unique file identifier would be nice for ensuring all unique assets are downloaded, even in cases where posts/galleries are updated in a manner that would throw off a simple index like
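As a rough sketch of the "feed the URL into a hash function" idea above: the helper below is hypothetical and not part of gallery-dl. It assumes that the host and path of a file URL are stable and that the query string only carries volatile data such as tokens, which may not hold for every site.

```python
import hashlib
from urllib.parse import urlsplit


def file_uid(extractor: str, url: str) -> str:
    """Return a short, stable ID for a file, scoped to one extractor.

    Hypothetical helper: assumes host + path identify the file and
    that the query string and fragment can be ignored.
    """
    parts = urlsplit(url)
    # Prefix with the extractor name so IDs only need to be unique
    # within a single extractor.
    stable = f"{extractor}:{parts.netloc}{parts.path}"
    # Truncate the SHA-1 hex digest to keep the ID short.
    return hashlib.sha1(stable.encode("utf-8")).hexdigest()[:16]
```

With this, two URLs that differ only in their query string map to the same ID, while a different path or a different extractor name yields a different one.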
There is
Ah okay, I haven't seen/noticed it show up with the extractors I'm using, so I'm guessing it hasn't been implemented by many. I'll have to tinker with it some more. Closing issue.
Oh, you meant a unique identifier for all sites?
Ah, I see for Patreon it's Though this is all dependent on whether or not it's possible to get a unique id for every kind of file on Patreon.
Also, is it possible to reference
Not possible. You'll have to copy (and possibly adjust) the format string replacement fields yourself.
Awesome, I'll have to tinker with that.
It would probably be a good idea to add some kind of version to the database, so you can make changes like this while properly handling older versions. Or maybe just include the format string used to create the id for each row in the database, assuming it wouldn't be too computationally intensive to check against that.
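The second suggestion, storing the format string alongside each entry, could look roughly like this. It is only a sketch using Python's `sqlite3` module: the table layout and the function names `open_archive`, `record`, and `seen` are made up for illustration and do not reflect gallery-dl's actual archive schema.

```python
import sqlite3


def open_archive(path=":memory:"):
    """Open (or create) a hypothetical download archive."""
    con = sqlite3.connect(path)
    # Each row keeps the rendered ID together with the format string
    # that produced it, so entries from older formats stay identifiable.
    con.execute(
        "CREATE TABLE IF NOT EXISTS archive ("
        "entry TEXT NOT NULL, fmt TEXT NOT NULL, "
        "PRIMARY KEY (entry, fmt))"
    )
    return con


def record(con, fmt, metadata):
    """Store the ID rendered from `fmt` for one downloaded file."""
    entry = fmt.format_map(metadata)
    con.execute(
        "INSERT OR IGNORE INTO archive (entry, fmt) VALUES (?, ?)",
        (entry, fmt),
    )
    con.commit()


def seen(con, fmt, metadata):
    """Check whether this file was recorded under the same format."""
    entry = fmt.format_map(metadata)
    row = con.execute(
        "SELECT 1 FROM archive WHERE entry = ? AND fmt = ?",
        (entry, fmt),
    ).fetchone()
    return row is not None
```

The lookup is a single indexed query per file, so the extra column should add little overhead. The trade-off is that changing the format string makes every earlier entry look new unless old rows are migrated or checked against their own stored format.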
Currently, I'm using this string for filename:
Whenever gallery-dl encounters a post with multiple inline images, it seems to report the filename as "1" for every single image, resulting in only the first image being downloaded with the above string. The images do appear to have filenames, or at least manually downloading them with a web browser results in unique names.
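To illustrate the collision described above (this is not the format string from the report, which is not reproduced here): the snippet renders a filename-based and an index-based pattern with Python's `str.format`. The field names `id`, `num`, `filename`, and `extension`, and the sample values, are assumptions chosen for the example.

```python
def render_names(fmt, files):
    """Render one output name per file from a format string."""
    return [fmt.format(**f) for f in files]


# Three inline images from one hypothetical post, all reported with
# the same filename "1" but with distinct positions in `num`.
files = [
    {"id": 100, "num": 1, "filename": "1", "extension": "jpg"},
    {"id": 100, "num": 2, "filename": "1", "extension": "jpg"},
    {"id": 100, "num": 3, "filename": "1", "extension": "jpg"},
]

# Relying on `filename` gives every image the same output name, so
# only the first download survives when existing files are skipped.
by_filename = render_names("{id}_{filename}.{extension}", files)

# Including `num` keeps the names distinct.
by_num = render_names("{id}_{num}.{extension}", files)
```

Here `by_filename` contains one name repeated three times, while `by_num` contains three different names.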
If this is intentional behavior (if extracting the filename isn't possible), it should probably report num as the filename.
To work around this, I've revised my config: