
implements basic support for #407 'gonewilder' functionality. #408

Open · wants to merge 1 commit into master

Conversation


@ghost ghost commented Jan 1, 2017

When passing the -A flag with a URL matching the 'reddit.com/r' pattern, it will rip the submission authors' content rather than the provided subreddit URL itself. Content will be saved in the same format as calling a reddit.com/u URL directly.
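The idea can be sketched roughly as follows (hypothetical names, not the PR's actual code): collect each submission's author once, in first-seen order, then rip each one as if a reddit.com/u URL had been passed directly.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class AuthorCollector {
    // Hypothetical sketch: dedupe the authors seen across a subreddit's
    // submissions, preserving first-seen order, so each user is ripped once.
    static Set<String> distinctAuthors(List<String> submissionAuthors) {
        return new LinkedHashSet<>(submissionAuthors);
    }

    public static void main(String[] args) {
        // Each distinct author is then ripped via the existing user-rip path.
        for (String author : distinctAuthors(List.of("foo", "bar", "foo"))) {
            System.out.println("https://www.reddit.com/user/" + author);
        }
    }
}
```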

Collaborator

@metaprime metaprime left a comment


I like this in general, but in addition to the -A option, I'd prefer if we could expose this setting in the rip.properties config file as well as a checkbox in the UI, so that it is fully discoverable and usable without requiring use of the CLI.


// the -A option is limited to just reddit.com/r URLs for the time being;
// if the -A value does not match a reddit.com/r regex, then ripme should fail out
if (cl.hasOption('A')) {
Collaborator


Good. I like using an option for this instead of a URL hack.
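The reddit.com/r guard that the diff comment describes could look roughly like this — a standalone sketch with a hypothetical regex; the PR's actual pattern may differ:

```java
import java.util.regex.Pattern;

public class SubredditUrlCheck {
    // Hypothetical approximation of the check described in the comment:
    // -A is only honored for subreddit (reddit.com/r) URLs; anything else
    // should cause ripme to fail out.
    private static final Pattern SUBREDDIT_URL =
            Pattern.compile("^https?://(www\\.)?reddit\\.com/r/[a-zA-Z0-9_]+/?.*$");

    static boolean isSubredditUrl(String url) {
        return SUBREDDIT_URL.matcher(url).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSubredditUrl("https://www.reddit.com/r/pics"));   // true
        System.out.println(isSubredditUrl("https://www.reddit.com/user/foo")); // false
    }
}
```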

// the value is preserved for this session and will not
// persist when getAndParseAndReturnNext spawns additional
// AbstractRipper processes
Utils.setConfigBoolean("download.rip_authors", false);
Collaborator


Couldn't this configBoolean also come from the rip.properties file? Maybe we should have session-only options that don't coordinate with rip.properties, so we don't need to hack around it and clear it here; clearing it would break a user who specified it in the config wanting it always used.
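One way to get both behaviors — a persistent rip.properties default plus a session-only CLI override — is a read-time precedence lookup instead of writing to and then clearing the shared config. A sketch with hypothetical names, not ripme's actual Utils API:

```java
import java.util.Optional;
import java.util.Properties;

public class ConfigPrecedence {
    // Session-only CLI override wins; otherwise fall back to rip.properties;
    // otherwise a hard-coded default. Nothing is ever written back, so the
    // persisted config never needs to be "cleared" after the run.
    static boolean ripAuthors(Optional<Boolean> cliOverride, Properties ripProperties) {
        if (cliOverride.isPresent()) {
            return cliOverride.get();
        }
        return Boolean.parseBoolean(
                ripProperties.getProperty("download.rip_authors", "false"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("download.rip_authors", "true");
        System.out.println(ripAuthors(Optional.empty(), props));   // true: config value used
        System.out.println(ripAuthors(Optional.of(false), props)); // false: CLI wins this session
    }
}
```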

@metaprime metaprime added this to the on-deck milestone Jan 1, 2017
@metaprime
Collaborator

Seems to work well enough, but I got a bunch of error logs at the beginning about not being able to rip some URL (the URL had & in it, so maybe that was part of the problem). I was about to cancel it, but when I came back to the window it had started downloading the users, so I guess things look good.

Would like to get that error situation figured out and add the UI discoverability stuff I mentioned before merging this in.

@metaprime
Collaborator

Is there any heuristic in place for number of pages to go back before stopping, score threshold, etc? I think if you keep going you'll just get all posters from the past month, with the lowest-scoring posters last.

@ghost
Author

ghost commented Jan 2, 2017

Is there any heuristic in place for number of pages to go back before stopping, score threshold, etc? I think if you keep going you'll just get all posters from the past month, with the lowest-scoring posters last.

It uses the same logic as if you were to rip https://reddit.com/user/foo directly. IIRC the default sorting is by New posts. This means it will process user submissions in ascending order by date. It would be trivial to add a property to override this such as download.rip_author_sort with accepted values of (new|hot|top|controversial).
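The suggested property could be validated along these lines — a sketch only, since download.rip_author_sort is a proposed name that does not exist in ripme:

```java
import java.util.Arrays;
import java.util.List;

public class AuthorSortOption {
    // Accepted values suggested in the discussion; anything unrecognized
    // falls back to the current default of "new".
    private static final List<String> ACCEPTED =
            Arrays.asList("new", "hot", "top", "controversial");

    static String normalizeSort(String configured) {
        String value = configured == null ? "" : configured.trim().toLowerCase();
        return ACCEPTED.contains(value) ? value : "new";
    }

    public static void main(String[] args) {
        System.out.println(normalizeSort("Top"));   // top
        System.out.println(normalizeSort("bogus")); // new
    }
}
```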

@ghost
Author

ghost commented Jan 2, 2017

Seems to work well enough, but I got a bunch of error logs at the beginning about not being able to rip some URL (the URL had & in it, so maybe that was part of the problem). I was about to cancel it, but when I came back to the window it had started downloading the users, so I guess things look good.

Can you give me an example? I suspect this is in the ripping engine itself and not with this PR. Either way, I can fix it.

@metaprime
Collaborator

Can you give me an example?

I'll get back to you on this. Don't have time to try it now.

@metaprime
Collaborator

metaprime commented Jan 2, 2017

It uses the same logic as if you were to rip https://reddit.com/user/foo directly.

Should have specified: I meant, how far back in the subreddit does it go looking for new usernames?

First of all, it seems like it didn't actually rip the usernames in their order of first appearance under the subreddit's top/monthly sorting, and also, it didn't seem to know when to stop.

I was trying to rip reddit.com/r/AsiansGoneWild and the list of folders I got (in increasing chronological order of last modified), before I killed it:

reddit_sub_asiansgonewild
reddit_user_Charmerer
reddit_user_virtualgeisha
reddit_user_Dollywinks
reddit_user_agirlnamedfred
reddit_user_juiciebootie
reddit_user_Ammieow
reddit_user_Zann89
reddit_user_iimaginati0n
reddit_user_trandinhh
reddit_user_mikayla_xxx
reddit_user_xxxpensivetastes
reddit_user_1rrationality
reddit_user_itsmydistraction
reddit_user_Hadaka-sachiko
reddit_user_20and4hours
reddit_user_milehighcowboy
reddit_user_teacuptoy
reddit_user_thaigamergirl
reddit_user_japanese_miya225
reddit_user_secretdownunder
reddit_user_dffg13
reddit_user_seijoubrat

The first 50 posts under top monthly included only some of those names, and should have also included:

koreankarma
berrynoms
ttean
milky_teaa
cutepillow
teamavocado
asiankittilover
lilcreamycat
fdaugirl
AsianExpress87
zann89
bbypocahontas
MaidTiffany
fun-sized-asian
anonimoose_
wastelandwench
