implements basic support for #407 'gonewilder' functionality. #408

ghost · 2017-01-01T02:06:24Z

When the -A flag is passed with a URL matching the 'reddit.com/r' pattern, it will rip the submission authors' content rather than the provided subreddit URL. Content will be saved in the same format as when calling a reddit.com/u URL directly.

metaprime

I like this in general, but in addition to the -A option, I'd prefer if we could expose this config in the rip.properties config file as well as a checkbox in the UI, so that this is fully-discoverable and usable without requiring use of the CLI.

metaprime · 2017-01-01T20:12:31Z

src/main/java/com/rarchives/ripme/App.java

+
+            // the -A option is limited to just reddit.com/r urls for the time being
+            // if the -A does not match a reddit.com/r regex, then ripme should fail out.
+            if(cl.hasOption('A')) {


Good. I like using an option for this instead of a URL hack.
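For illustration, the "fail out unless the URL is a reddit.com/r URL" check described in the diff comment could look roughly like the sketch below. The class name, method names, and the regex are assumptions made for this example; the PR's actual pattern is not shown in this excerpt.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the PR: guards the -A option.
final class AuthorRipOption {
    // Assumed pattern: http(s)://[subdomain.]reddit.com/r/<name>, optionally
    // followed by a path, query string, or fragment.
    private static final Pattern SUBREDDIT_URL = Pattern.compile(
            "^https?://(?:[a-z0-9-]+\\.)?reddit\\.com/r/[A-Za-z0-9_]+(?:[/?#].*)?$",
            Pattern.CASE_INSENSITIVE);

    // Returns true only for URLs that point at a subreddit.
    static boolean isSubredditUrl(String url) {
        return url != null && SUBREDDIT_URL.matcher(url.trim()).matches();
    }

    // Throws so the caller can abort the run when -A is used with any other URL.
    static void requireSubredditUrl(String url) {
        if (!isSubredditUrl(url)) {
            throw new IllegalArgumentException(
                    "-A only supports reddit.com/r URLs, got: " + url);
        }
    }
}
```

Throwing from a single guard keeps the option handling in App.java to one call, and the same check could back a UI checkbox later.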

metaprime · 2017-01-01T20:18:54Z

src/main/java/com/rarchives/ripme/ripper/rippers/RedditRipper.java

+        // the value is preserved for this session and will not
+        // persist when getAndParseAndReturnNext spawns additional
+        // AbstractRipper processes
+        Utils.setConfigBoolean("download.rip_authors", false);


Couldn't this configBoolean also come from the rip.properties file? Maybe we should have session-only options that are kept separate from rip.properties, so we don't need to hack around it by clearing the value. Clearing it also overrides the setting when the user has specified it in the config to always be used.
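One way to picture the separation suggested here is a small session layer that sits on top of the persisted properties and never writes back to them. This is only a sketch: `SessionOptions` and `childWithout` are hypothetical names, and a plain `java.util.Properties` object stands in for whatever rip.properties is loaded into.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical session-scoped option layer; not RipMe's actual Utils API.
final class SessionOptions {
    private final Properties persisted;  // stands in for rip.properties values
    private final Map<String, Boolean> overrides = new HashMap<>();  // this run only

    SessionOptions(Properties persisted) {
        this.persisted = persisted;
    }

    // Records a CLI/UI choice without touching the persisted config.
    void override(String key, boolean value) {
        overrides.put(key, value);
    }

    // Session override wins, then the persisted value, then the default.
    boolean getBoolean(String key, boolean defaultValue) {
        Boolean override = overrides.get(key);
        if (override != null) {
            return override;
        }
        String raw = persisted.getProperty(key);
        return raw == null ? defaultValue : Boolean.parseBoolean(raw.trim());
    }

    // Options for a spawned ripper: same settings, but with one flag forced off,
    // so the parent's view and the persisted config stay unchanged.
    SessionOptions childWithout(String key) {
        SessionOptions child = new SessionOptions(persisted);
        child.overrides.putAll(overrides);
        child.overrides.put(key, false);
        return child;
    }
}
```

With this shape, a user who sets download.rip_authors in the config keeps that setting across runs, while each spawned per-user ripper simply receives a child option set with the flag turned off.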

metaprime · 2017-01-01T23:42:54Z

Seems to work fairly well enough but I got a bunch of error logs in the beginning about not being able to rip some URL (the url had &amp; in it so maybe that was part of the problem). I was about to cancel it but when I came back to the window it had started downloading the users so I guess things look good.

Would like to get that error situation figured out and add the UI discoverability stuff I mentioned before merging this in.
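If the failures do turn out to be caused by HTML-escaped ampersands in the URLs (unconfirmed at this point in the thread), a minimal cleanup step might look like the following. `RedditUrlCleaner` is a hypothetical name for illustration and is not part of the PR.

```java
// Hypothetical helper: undoes HTML escaping of '&' in a URL before it is fetched.
final class RedditUrlCleaner {
    // Replaces every "&amp;" with "&"; other text is left as-is.
    static String unescapeAmpersands(String url) {
        return url == null ? null : url.replace("&amp;", "&");
    }
}
```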

metaprime · 2017-01-02T00:20:16Z

Is there any heuristic in place for number of pages to go back before stopping, score threshold, etc? I think if you keep going you'll just get all posters from the past month, with the lowest-scoring posters last.

ghost · 2017-01-02T02:46:49Z

> Is there any heuristic in place for number of pages to go back before stopping, score threshold, etc? I think if you keep going you'll just get all posters from the past month, with the lowest-scoring posters last.

It uses the same logic as if you were to rip https://reddit.com/user/foo directly. IIRC the default sorting is by New posts. This means it will process user submissions in ascending order by date. It would be trivial to add a property to override this such as download.rip_author_sort with accepted values of (new|hot|top|controversial).
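A rough sketch of how such a property could be validated is shown below. Only the property name download.rip_author_sort and its four accepted values come from the comment above; the class, the fallback to "new", and the exact listing URL shape are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical handling for a download.rip_author_sort property.
final class AuthorSort {
    static final List<String> ACCEPTED = Arrays.asList("new", "hot", "top", "controversial");
    static final String DEFAULT = "new";  // assumed fallback, matching the stated default

    // Lower-cases and trims the configured value; unknown or missing values
    // fall back to the default instead of failing the rip.
    static String normalize(String configured) {
        if (configured == null) {
            return DEFAULT;
        }
        String value = configured.trim().toLowerCase(Locale.ROOT);
        return ACCEPTED.contains(value) ? value : DEFAULT;
    }

    // Assumed URL shape for a user's submissions listing with a sort parameter.
    static String userListingUrl(String username, String configuredSort) {
        return "https://www.reddit.com/user/" + username
                + "/submitted.json?sort=" + normalize(configuredSort);
    }
}
```

Falling back silently is a design choice; rejecting an unknown value with an error would be equally reasonable if typos in the config should be surfaced.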

ghost · 2017-01-02T02:47:51Z

> Seems to work fairly well enough but I got a bunch of error logs in the beginning about not being able to rip some URL (the url had & in it so maybe that was part of the problem). I was about to cancel it but when I came back to the window it had started downloading the users so I guess things look good.

Can you give me an example? I suspect this is in the ripping engine itself and not in this PR. Either way, I can fix it.

metaprime · 2017-01-02T02:56:44Z

> Can you give me an example?

I'll get back to you on this. Don't have time to try it now.

metaprime · 2017-01-02T03:02:50Z

> It uses the same logic as if you were to rip https://reddit.com/user/foo directly.

Should have specified, I meant how far back in the subreddit to go looking for new usernames?

First of all, it didn't seem to rip the usernames in order of first appearance in the subreddit's top-by-month listing, and it also didn't seem to know when to stop.

I was trying to rip reddit.com/r/AsiansGoneWild, and this is the list of folders I got (in increasing chronological order of last modified) before I killed it:

reddit_sub_asiansgonewild
reddit_user_Charmerer
reddit_user_virtualgeisha
reddit_user_Dollywinks
reddit_user_agirlnamedfred
reddit_user_juiciebootie
reddit_user_Ammieow
reddit_user_Zann89
reddit_user_iimaginati0n
reddit_user_trandinhh
reddit_user_mikayla_xxx
reddit_user_xxxpensivetastes
reddit_user_1rrationality
reddit_user_itsmydistraction
reddit_user_Hadaka-sachiko
reddit_user_20and4hours
reddit_user_milehighcowboy
reddit_user_teacuptoy
reddit_user_thaigamergirl
reddit_user_japanese_miya225
reddit_user_secretdownunder
reddit_user_dffg13
reddit_user_seijoubrat

The first 50 posts under top monthly included only some of those names, and should have also included:

koreankarma
berrynoms
ttean
milky_teaa
cutepillow
teamavocado
asiankittilover
lilcreamycat
fdaugirl
AsianExpress87
zann89
bbypocahontas
MaidTiffany
fun-sized-asian
anonimoose_
wastelandwench
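The behaviour asked for in this comment (authors taken in order of first appearance, with a defined stopping point) could be sketched as below. Everything here is hypothetical: the `Post` type, the three limits, and the choice to treat usernames case-insensitively are assumptions for illustration, not something the PR implements.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical author collection with explicit stop conditions.
final class AuthorCollector {
    static final class Post {
        final String author;
        final int score;

        Post(String author, int score) {
            this.author = author;
            this.score = score;
        }
    }

    // Walks listing pages in order and returns authors in order of first
    // appearance. Stops after maxPages pages or once maxAuthors are found;
    // posts scoring below minScore are skipped.
    static List<String> collect(List<List<Post>> pages, int maxPages,
                                int minScore, int maxAuthors) {
        // Lower-cased name -> first-seen spelling; insertion order is preserved.
        Map<String, String> seen = new LinkedHashMap<>();
        int pageCount = 0;
        for (List<Post> page : pages) {
            if (pageCount++ >= maxPages) {
                break;
            }
            for (Post post : page) {
                if (post.score < minScore) {
                    continue;
                }
                seen.putIfAbsent(post.author.toLowerCase(Locale.ROOT), post.author);
                if (seen.size() >= maxAuthors) {
                    return new ArrayList<>(seen.values());
                }
            }
        }
        return new ArrayList<>(seen.values());
    }
}
```

Keying on the lower-cased name means that spellings such as Zann89 and zann89 are counted once, and the insertion-ordered map keeps the output aligned with the order of the listing.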

metaprime requested changes Jan 1, 2017

metaprime added this to the on-deck milestone Jan 1, 2017

metaprime modified the milestones: On-deck for 1.4.x, On-deck for 1.5.x Jun 25, 2017
