implements basic support for #407 'gonewilder' functionality. #408
…ssing the -A flag to a url matching the 'reddit.com/r' pattern, it will rip the submission authors' content rather than the provided subreddit url. Content will be saved in the same format as calling a reddit.com/u url directly
I like this in general, but in addition to the -A option, I'd prefer if we could expose this config in the rip.properties config file as well as a checkbox in the UI, so that this is fully discoverable and usable without requiring use of the CLI.
// the -A option is limited to just reddit.com/r urls for the time being
// if the -A does not match a reddit.com/r regex, then ripme should fail out.
if(cl.hasOption('A')) {
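For illustration, the check described in those comments could look roughly like the sketch below. It is not the code from this PR: the class name `SubredditUrlCheck`, both method names, and the exact regex are assumptions made up for the example, and the real ripper may match URLs differently.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the PR: validates the URL passed together with -A.
final class SubredditUrlCheck {
    // Assumed pattern: optional scheme and subdomain, then reddit.com/r/<name>,
    // optionally followed by a path, query string, or fragment.
    private static final Pattern SUBREDDIT_URL = Pattern.compile(
            "^(?:https?://)?(?:[a-zA-Z0-9-]+\\.)?reddit\\.com/r/([a-zA-Z0-9_]+)(?:[/?#].*)?$");

    static boolean isSubredditUrl(String url) {
        return url != null && SUBREDDIT_URL.matcher(url).matches();
    }

    // Fails out when -A (ripAuthors) is combined with a URL that is not a subreddit URL.
    static void requireSubredditUrl(boolean ripAuthors, String url) {
        if (ripAuthors && !isSubredditUrl(url)) {
            throw new IllegalArgumentException(
                    "-A only supports reddit.com/r URLs, got: " + url);
        }
    }
}
```

Calling `requireSubredditUrl(cl.hasOption('A'), url)` before starting the rip would give the "fail out" behaviour without touching any shared config.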
Good. I like using an option for this instead of a URL hack.
// the value is preserved for this session and will not
// persist when getAndParseAndReturnNext spawns additional
// AbstractRipper processes
Utils.setConfigBoolean("download.rip_authors", false);
Couldn't this configBoolean also come from the rip.properties file? Maybe we should have options that aren't tied to rip.properties, so we don't need to hack around it and clear it, even when the user has specified it in the config so that it is always used.
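One way to read this suggestion, sketched under assumptions: resolve the effective setting once from the CLI flag and the file config, and never write back to the config. The class `RipAuthorsSetting` and its `resolve` method are hypothetical; only the key `download.rip_authors` comes from the diff above, and plain `java.util.Properties` stands in for however rip.properties is actually loaded.

```java
import java.util.Properties;

// Hypothetical sketch: computes the effective setting without mutating the config.
final class RipAuthorsSetting {
    static final String KEY = "download.rip_authors";

    // cliFlag is null when -A was not passed on the command line.
    // An explicit CLI value wins; otherwise the file value (default false) is used.
    static boolean resolve(Boolean cliFlag, Properties fileConfig) {
        if (cliFlag != null) {
            return cliFlag;
        }
        return Boolean.parseBoolean(fileConfig.getProperty(KEY, "false"));
    }
}
```

Because nothing is cleared or overwritten, a user who sets the key in rip.properties keeps that value across sessions, and the CLI flag only affects the current run.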
Seems to work fairly well, but I got a bunch of error logs in the beginning about not being able to rip some URL (the url had …). I would like to get that error situation figured out and add the UI discoverability stuff I mentioned before merging this in.
Is there any heuristic in place for number of pages to go back before stopping, score threshold, etc? I think if you keep going you'll just get all posters from the past month, with the lowest-scoring posters last.
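To make the question concrete, a stopping heuristic of this kind might look like the sketch below. Everything in it is hypothetical: the `Post` model, `collectAuthors`, and the `maxPosts` and `minScore` parameters are illustrative names and are not options that exist in this PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a bounded author scan over a subreddit listing.
final class AuthorCollector {
    // Minimal stand-in for a submission: only the fields the heuristic needs.
    static final class Post {
        final String author;
        final int score;

        Post(String author, int score) {
            this.author = author;
            this.score = score;
        }
    }

    // Scans at most maxPosts posts in listing order, skips posts scoring below
    // minScore, and returns unique authors in order of first appearance.
    static List<String> collectAuthors(List<Post> posts, int maxPosts, int minScore) {
        Set<String> authors = new LinkedHashSet<>();
        int seen = 0;
        for (Post p : posts) {
            if (seen++ >= maxPosts) {
                break; // stop instead of walking the whole listing
            }
            if (p.score < minScore) {
                continue;
            }
            authors.add(p.author);
        }
        return new ArrayList<>(authors);
    }
}
```

A `LinkedHashSet` keeps the first-appearance order while dropping repeat posters, so each author would be ripped once and in the order the listing surfaced them.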
It uses the same logic as if you were to rip https://reddit.com/user/foo directly. IIRC the default sorting is by New posts. This means it will process user submissions in ascending order by date. It would be trivial to add a property to override this such as
Can you give me an example? I suspect this is in the ripping engine itself and not with this PR. Either way I can fix it.
I'll get back to you on this. Don't have time to try it now.
Should have specified: I meant how far back in the subreddit does it go looking for new usernames? First of all, it didn't seem to rip the usernames in order of first appearance under subreddit/top sorted by month, and it also didn't seem to know when to stop. I was trying to rip reddit.com/r/AsiansGoneWild, and this is the list of folders I got (in increasing chronological order of last modified) before I killed it:
The first 50 posts under top monthly included only some of those names, and should have also included:
When passing the -A flag to a url matching the 'reddit.com/r' pattern, it will rip the submission authors' content rather than the provided subreddit url. Content will be saved in the same format as calling a reddit.com/u url directly.