Hey there! This is a Node.js-based recursive question crawler that harvests all questions on Stack Overflow, together with how often each one was encountered, and stores them in a MySQL database as well as in a CSV file.
YouTube link: https://www.youtube.com/watch?v=H6zzndSSEQM
- Implemented a concurrency limit on API requests.
- The concurrency limit for API requests can be changed.
- The user can choose the page limit for the seed URLs taken from the Stack Overflow homepage.
- Scrapes the total number of upvotes and the total number of answers for every question.
- API requests can be delayed to simulate human behavior, so that the IP address does not get blocked.
- Tracks the total reference count for every encountered URL.
- Implemented a trigger to dump the data in a CSV file when the user kills the script.
- Implemented a trigger to save the data in the MySQL database when the user kills the script.
- Kept the code modular and easy to understand, following standard naming conventions.
- Clean, readable, easy-to-follow code.
- Used cheerio for HTML parsing.
- Solution is asynchronous in nature.
- JavaScript
- Node.js
- MySQL
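The concurrency limit and request delay listed above can be pictured with a small sketch. This is not the code from index.js: the names `sleep`, `runWithLimit`, `limit`, and `delayMs` are hypothetical and chosen only for illustration. The idea is a fixed number of "lanes" that each pull the next pending item, wait for the delay, and then run the request.

```javascript
// Illustrative sketch only; helper names are assumptions, not the project's API.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

/**
 * Run `worker` over `items` with at most `limit` calls in flight,
 * pausing `delayMs` milliseconds before each call.
 */
async function runWithLimit(items, worker, limit = 5, delayMs = 0) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  async function lane() {
    while (next < items.length) {
      const i = next++; // claimed synchronously, so lanes never share an item
      if (delayMs > 0) await sleep(delayMs);
      results[i] = await worker(items[i]);
    }
  }

  // Start `limit` lanes (or fewer if there are fewer items) and wait for all.
  const lanes = Array.from({ length: Math.min(limit, items.length) }, lane);
  await Promise.all(lanes);
  return results;
}
```

A call such as `runWithLimit(urls, fetchPage, 3, 500)` would, under these assumptions, keep at most three requests running at once and wait half a second before each; `fetchPage` stands in for whatever function performs the actual request.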
If you are not able to connect to your local MySQL database, please comment out line 44 of Stackoverflow-Crawler/index.js (as of commit bda4c53); the script will then save the data in the CSV file only.
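For readers curious how the CSV-only path might look, here is a minimal sketch of counting URL references and dumping them to a CSV file when the script is killed with Ctrl+C. It is an assumption-laden illustration, not the project's code: `referenceCounts`, `recordUrl`, `toCsv`, `installExitDump`, and the `url,references` column names are all hypothetical.

```javascript
const fs = require("fs");

// Hypothetical in-memory store: URL -> number of times it was encountered.
const referenceCounts = new Map();

function recordUrl(url) {
  referenceCounts.set(url, (referenceCounts.get(url) || 0) + 1);
}

// Quote a CSV field when it contains a comma, double quote, or line break.
function csvField(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build the CSV text; the header names are an assumption.
function toCsv(counts) {
  const rows = ["url,references"];
  for (const [url, count] of counts) {
    rows.push(`${csvField(url)},${count}`);
  }
  return rows.join("\n") + "\n";
}

// On SIGINT (Ctrl+C), write the CSV synchronously and then exit.
function installExitDump(counts, filePath) {
  process.once("SIGINT", () => {
    fs.writeFileSync(filePath, toCsv(counts));
    process.exit(0);
  });
}
```

Calling `installExitDump(referenceCounts, "questions.csv")` once at startup would register the handler; the file name here is only an example. The write is synchronous because the process exits immediately afterwards, so an asynchronous write might not finish.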
npm install
node index.js