Stackoverflow Crawler

Hey there! This is a Node.js-based recursive question crawler. It harvests questions on Stack Overflow, counts how often each one is encountered, and stores the results in a MySQL database as well as in a CSV file.

Demo

YouTube link: https://www.youtube.com/watch?v=H6zzndSSEQM

Features

Concurrency limit on API requests, configurable by the user.
User-configurable page limit for the seed URLs taken from the Stack Overflow homepage.
Scrapes the total number of upvotes and the total number of answers for every question.
Delay between API requests, simulating human behavior so that the IP address is less likely to get blocked.
Tracks the total reference count for every encountered URL.
Dumps the data to a CSV file when the user kills the script.
Saves the data to the MySQL database when the user kills the script.
Modular code that follows consistent naming conventions.
Clean, readable, easy-to-follow code.
Uses cheerio for HTML parsing.
The solution is asynchronous.
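
The concurrency limit, request delay, and reference counting listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: `crawl`, `sleep`, `fetchLinks`, and the option names `concurrency`, `delayMs`, and `maxPages` are all hypothetical, and the delay here is fixed rather than randomized.

```javascript
// Hypothetical sketch; names and options are assumptions, not the repository's API.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

/**
 * Crawl from `seeds` with at most `concurrency` fetches in flight.
 * `fetchLinks(url)` is an assumed callback resolving to the URLs found on that page.
 * Resolves to a Map of url -> number of times the URL was encountered.
 */
async function crawl(seeds, fetchLinks, { concurrency = 5, delayMs = 0, maxPages = Infinity } = {}) {
  const counts = new Map(); // reference count per URL
  const queue = [];         // URLs waiting to be fetched
  let started = 0;          // pages fetched or currently being fetched
  let active = 0;           // fetches currently in flight

  const enqueue = (url) => {
    const seen = counts.get(url) || 0;
    counts.set(url, seen + 1);
    if (seen === 0) queue.push(url); // fetch each URL only once
  };
  seeds.forEach(enqueue);

  async function worker() {
    while (true) {
      if (queue.length > 0 && started < maxPages) {
        const url = queue.shift();
        started += 1;
        active += 1;
        try {
          if (delayMs > 0) await sleep(delayMs); // pause before each request
          const links = await fetchLinks(url);
          links.forEach(enqueue);
        } catch (err) {
          console.error(`Failed to fetch ${url}: ${err.message}`);
        } finally {
          active -= 1;
        }
      } else if (active > 0 && started < maxPages) {
        await sleep(10); // another worker may still discover new URLs
      } else {
        return; // nothing queued and nothing in flight, or page budget used up
      }
    }
  }

  // Each worker handles one request at a time, so `concurrency` workers cap the in-flight requests.
  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  return counts;
}
```

A call such as `crawl(seedUrls, fetchLinks, { concurrency: 5, delayMs: 500 })` would then return the reference counts once the queue is drained. Using a fixed pool of workers keeps the limit easy to reason about; a library-based limiter would work equally well.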

TechStack

JavaScript
Node.js
MySQL

Note

If you are not able to connect to your local MySQL database, comment out the line shown below. The script will then save the data in the CSV file only.

Stackoverflow-Crawler/index.js, line 44 in bda4c53:

    await saveData(data); // comment this line if you don't want to save data to database or having trouble connecting to database
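
As an alternative to commenting the line out, the save-on-kill behavior could tolerate a database failure and still keep the CSV. The sketch below is only an assumption about how that might look: it treats "killing the script" as a SIGINT (Ctrl+C), and `registerShutdown`, `getRows`, `saveToDb`, `toCsv`, `toCsvField`, and the column names are hypothetical rather than taken from index.js.

```javascript
// Hypothetical sketch; function names and CSV columns are assumptions.
const fs = require('fs');

// Quote a CSV field only when it contains a comma, quote, or line break.
function toCsvField(value) {
  const text = value == null ? '' : String(value);
  return /[",\n\r]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build CSV text with a header row from an array of plain objects.
function toCsv(rows, columns) {
  const header = columns.join(',');
  const lines = rows.map((row) => columns.map((c) => toCsvField(row[c])).join(','));
  return [header, ...lines].join('\n') + '\n';
}

// On Ctrl+C: always write the CSV first, then try the database and report any failure.
function registerShutdown(getRows, saveToDb, csvPath = 'data.csv') {
  process.once('SIGINT', async () => {
    const rows = getRows();
    fs.writeFileSync(csvPath, toCsv(rows, ['url', 'references', 'upvotes', 'answers']));
    try {
      await saveToDb(rows); // e.g. a wrapper around saveData(data)
    } catch (err) {
      console.error(`Database save failed, CSV was still written: ${err.message}`);
    }
    process.exit(0);
  });
}
```

Writing the CSV synchronously before touching the database means a connection error cannot cost the collected data.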

WorkFlow

Installation

npm install

Execution

node index.js
