
Automatic bulk scrape from list of values (eg MediaID's) #21

Open
yarnball opened this issue Sep 27, 2016 · 1 comment

yarnball commented Sep 27, 2016

Hi,

Great work on this. I've got a .txt file with a list of media IDs whose comments I'd like to scrape.

However, I can only do them one at a time with your script.

I don't know CoffeeScript. Is this possible with your repo?

I tried with a Bash script, but the comment scrapes often return 404 errors. It usually works if I re-attempt the scrape. Is there a "proper" way to do this?

Here's a copy of my Bash file

# read one media ID per line and scrape its comments into <id>.json
while read -r mediaid; do
  instagram-screen-scrape comments --post "$mediaid" > "$mediaid.json"
done < list.txt

@yarnball yarnball changed the title Automated batch scrape from list of data Automat bulk scrape from list of values (eg MediaID's) Sep 27, 2016
@yarnball yarnball changed the title Automat bulk scrape from list of values (eg MediaID's) Automatic bulk scrape from list of values (eg MediaID's) Sep 27, 2016
notslang (Owner) commented

How many IDs are you looking to scrape? If it's just a few hundred, then I'd use the exit code of the instagram-screen-scrape comments command to retry, up to a maximum number of times.
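For example, a minimal retry sketch in Bash, assuming the same list.txt format as the loop above (the MAX_RETRIES name and limit of 5 are just placeholders):

# retry each scrape up to MAX_RETRIES times, driven by the command's exit code
MAX_RETRIES=5
while read -r mediaid; do
  for attempt in $(seq 1 "$MAX_RETRIES"); do
    if instagram-screen-scrape comments --post "$mediaid" > "$mediaid.json"; then
      break  # success, move on to the next ID
    fi
    echo "attempt $attempt of $MAX_RETRIES failed for $mediaid" >&2
  done
done < list.txt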

If it's millions, then I'd put together a set of workers in JS and use RabbitMQ for task distribution / retries. The command line isn't that efficient for scraping (it requires starting up a new Node process for each scrape and can't reuse the HTTP connection between them). The CLI is just there because it's quick to set up for little tasks.
