Question: Reddit - Downloading Imgur/Gfycat/Redgif hosted posts to the specific subreddit or user directory #1364
And with regard to scraping user page posts: would it be possible to have the posts on the user page, regardless of subreddit, saved to the same queued user directory? |
You should try to use the For user specific extraction, use the subcategories of the Reddit extractor, in this case (Adapted from the example config file in "reddit":
{
    "subreddit": {
        "directory": ["{category}", "MySubreddits", "{subreddit}"],
        "filename": "{id}{num:? //>02} {title[:220]}.{extension}"
    },
    "user": {
        "directory": ["{category}", "MyUserfollows", "{subreddit}"],
        "filename": "{id}{num:? //>02} {title[:220]}.{extension}"
    },
    "comments": 0,
    "morecomments": false,
    "date-min": 0,
    "date-max": 253402210800,
    "date-format": "%Y-%m-%dT%H:%M:%S",
    "id-min": "0",
    "id-max": "zik0zj",
    "recursion": 0,
    "videos": true,
    "user-agent": "Python:gallery-dl:0.8.4 (by /u/mikf1)",
    "category-transfer": true
}, Filename settings are left like in the example, and I'm not entirely sure if |
That probably won't work here. Try enabling |
How would I set up Just in case anyone is familiar, I am trying to set up the scraper so it performs similarly to the Rip-Me application (https://github.com/RipMeApp/ripme), where a queued subreddit or user gallery pulls all of the images or mp4s into a single folder. |
Yep, just like
You might have to change the |
I believe it is working as intended now. I think one of my issues was having a directory for subreddit and user in the same config; right now I have a separate config for each that I'll switch out when needed.
My last issue or question is if there is any way to force the imgur/gfycat/redgif posts on a subreddit or user gallery to use the FileName convention of their host reddit post. Right now it is spitting the files out with the default "imgur_23aj435_title" filename, but I would prefer if it would retain the information from the reddit post, like with reddit-hosted images. |
Not sure if switching out configs is really necessary, but if it works for you, why not. The |
Is there a way to set up the config so the directory is conditional depending on whether it's a user gallery or a subreddit? Regarding the filename, for the time being I have Imgur/Gfycat/Redgif using my Reddit filename convention, just for conformity's sake. The issue is, like you said, that the metadata for those sites is separate and different from reddit's, so the filename will have pieces missing when it uses keywords from reddit. I guess what I'm asking is whether it is possible, for reddit posts whose media is hosted on Imgur/Gfycat/Redgif, to have the scraper pull metadata from the reddit post itself rather than from the host website. For example, this post, whose media is hosted on imgur, would have the metadata (and thus keywords for filenames) scraped from the reddit post rather than the actual imgur link where the image is (https://i.imgur.com/ZSTidlZ.jpg). |
@mikf , I'm guessing setting up the scraper to name the files how I described is currently not possible. Could you tag this thread with "feature-request"? |
experimental, might not work as expected, etc.
@sourmilk01 could you try the edit: there are open issues with a somewhat similar problem as this one, by the way: #637, #827 |
@mikf, thank you for the commit, I didn't see it. I've tested the Thanks again for this great tool and for updating it so often. You rock! I was going to ask if there was a way we could donate or 'tip' in any way, but I saw #347 . Otherwise, I'd offer to help in any other way but I have no background in coding :/ |
Oh, in case anyone else is reading this and trying to set up |
@mikf , I may have found a potential 'gap' in posts being scraped using The option does not appear to apply to reddit posts whose original gfycat host is now redirected and hosted on redgifs using the gifdeliverynetwork domain. Example subreddit and post I was scraping (NSFW): I already have a reddit-metadata FileName set up in the config for gfycat and redgif, and I even tried setting up a section for "gifdeliverynetwork" using the same format, but files hosted this way are being downloaded with empty metadata fields ("None"). The posts are still being saved to the directory, so Again, very minor and it only appears to apply to very few posts, but I thought I'd let you know. |
Allow forwarding metadata from the top-level extractor to all children if 'parent-directory' is enabled for all extractors along the way. For example 'reddit' -> 'gfycat' -> 'redgifs'
@sourmilk01 should be fixed in 2364174 as long as you enable
$ gallery-dl -o filename="{subreddit}_{author}_{title}_{id}_{num}_{filename}_{date}.{extension}" -o directory= -o parent-directory=1 -o parent-metadata=1 https://www.reddit.com/r/hopelesssofrantic/comments/drbgeg/the_yoga_abs_are_coming_in/
/tmp/hopelesssofrantic_hopelesssofrant…in_drbgeg_0_None_2019-11-04 02:59:45.mp4
Also, it seems that gfycat/redgifs doesn't provide a
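For anyone who would rather keep this in the config file than pass `-o` flags, the command above might translate to something like the fragment below. This is an untested sketch, not a confirmed setup. It assumes that options placed directly under "extractor" apply to every extractor (which is how the global `-o` flags behave), that an empty list is the config-file equivalent of `-o directory=`, and that the per-site sections are named "imgur", "gfycat" and "redgifs". The shortened filename reuses only keywords that appear in the command.

```json
{
    "extractor": {
        "parent-directory": true,
        "parent-metadata": true,

        "imgur": {
            "directory": [],
            "filename": "{subreddit}_{author}_{title}_{id}_{num}.{extension}"
        },
        "gfycat": {
            "directory": [],
            "filename": "{subreddit}_{author}_{title}_{id}_{num}.{extension}"
        },
        "redgifs": {
            "directory": [],
            "filename": "{subreddit}_{author}_{title}_{id}_{num}.{extension}"
        }
    }
}
```

The intent is that files from the child extractors land directly in whatever directory the reddit extractor is using, with the reddit post's metadata filling in the filename keywords.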
I usually don't immediately push commits to GitHub. In this case it only got pushed a minute or so before I left that comment, so there was no way for you to see it before that.
I mentioned them only in case something posted in them would be helpful here, and because the |
I'm having trouble setting up my config to save Reddit posts as I'd like: having all posts within a subreddit or user page save to the proper directory for the subreddit or user folder. This works fine for reddit-hosted images, but the extractor pulls imgur, gfycat, and redgif hosted images and mp4s out to their own separate directories outside of the reddit directory.
What I'm trying to figure out is how to have all the files associated with the queued subreddit or user save to the respective folder inside the reddit directory, regardless of whether the host is reddit/imgur/redgif/etc., and for the file names to use the FileName convention I have set up for the reddit Extractor. Can anyone help me with setting this up in the config?