Drupal Web Scraper with Puppeteer

A web scraping tool for Drupal websites. It uses Puppeteer, a Node.js library for controlling headless browsers, to visit every page of a site and search each one for a specific word or pattern. The results are saved to a text file for reference and analysis.

For a step-by-step guide, see the article: Web Scraping Drupal Websites with Node.js and Puppeteer.

If you find this repository useful, please consider giving it a star ⭐ on GitHub. This helps promote the project and shows your support.

Features

Comprehensive Scraping: Scans all pages of a Drupal website to extract links and content.
Word Search: Searches for a specified word or pattern within the HTML content of each webpage.
Text File Output: Saves search results, including webpage URLs and word occurrences, to a text file.
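
To make the Word Search feature concrete, here is a minimal sketch of counting occurrences of a word or pattern in a page's HTML. The helper names `escapeRegExp` and `countOccurrences` are hypothetical and are not taken from the repository; the project's own wordSearch function may work differently.

```javascript
// Hypothetical helpers for illustration; not the repository's code.

// Escape regex metacharacters so a plain word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Count matches of a word (string, case-insensitive) or a pattern (RegExp)
// in raw HTML. Markup is searched too, so tag and attribute names can match.
function countOccurrences(html, wordOrPattern) {
  let pattern;
  if (wordOrPattern instanceof RegExp) {
    // Make sure the global flag is set so every match is counted.
    const flags = wordOrPattern.flags.includes('g')
      ? wordOrPattern.flags
      : wordOrPattern.flags + 'g';
    pattern = new RegExp(wordOrPattern.source, flags);
  } else {
    pattern = new RegExp(escapeRegExp(wordOrPattern), 'gi');
  }
  const matches = html.match(pattern);
  return matches ? matches.length : 0;
}

const sampleHtml = '<title>Drupal</title><p>Built with drupal 10.</p>';
console.log(countOccurrences(sampleHtml, 'drupal')); // literal word
console.log(countOccurrences(sampleHtml, /\d+/));    // regular expression
```

Because the search runs over raw HTML, a term such as `title` or `div` would also match tag names; searching the page's visible text instead would be a different design choice.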

Requirements

Node.js (Install from nodejs.org)
Puppeteer (Automatically installed via npm)
dotenv (Automatically installed via npm)

Installation

Clone the repository:

git clone https://github.com/Ej1seven/Drupal_Web_Scraper.git

Navigate to the project directory (git clone names it after the repository):
```
cd Drupal_Web_Scraper
```
Install dependencies:
```
npm install dotenv puppeteer
```

Usage

Customize the search parameters:

Open index.js in a text editor.
Update the URL in the retrieveLinks function.
Update the second argument in wordSearch function with the word or pattern you want to search for.
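
As a rough guide to what these two edits touch, the sketch below shows one way the call sites could look. Only the names retrieveLinks and wordSearch come from this README; the constants, the helpers `sameOriginLinks` and `withPage`, the function bodies, and the assumption that wordSearch takes the list of links as its first argument are all hypothetical. For brevity it collects links from the start page only rather than crawling the whole site.

```javascript
// Sketch only: the real index.js may be organized differently.
const START_URL = 'https://www.example.com/'; // placeholder: site to scan
const SEARCH_TERM = 'drupal';                 // placeholder: word to find

// Keep links on the start URL's origin, drop #fragments, and de-duplicate.
function sameOriginLinks(hrefs, baseUrl) {
  const origin = new URL(baseUrl).origin;
  const seen = new Set();
  for (const href of hrefs) {
    let url;
    try {
      url = new URL(href, baseUrl);
    } catch {
      continue; // skip values that are not valid URLs
    }
    if (url.origin !== origin) continue;
    url.hash = '';
    seen.add(url.href);
  }
  return [...seen];
}

// Launch a headless browser, run `fn` with a fresh page, then clean up.
async function withPage(fn) {
  const puppeteer = require('puppeteer'); // loaded lazily, only when scanning
  const browser = await puppeteer.launch();
  try {
    return await fn(await browser.newPage());
  } finally {
    await browser.close();
  }
}

// Collect the same-site links found on the page at `url`.
function retrieveLinks(url) {
  return withPage(async (page) => {
    await page.goto(url, { waitUntil: 'networkidle2' });
    const hrefs = await page.$$eval('a[href]', (anchors) =>
      anchors.map((a) => a.href));
    return sameOriginLinks(hrefs, url);
  });
}

// Count literal, case-sensitive occurrences of `term` in each page's HTML.
function wordSearch(links, term) {
  return withPage(async (page) => {
    const results = [];
    for (const url of links) {
      await page.goto(url, { waitUntil: 'networkidle2' });
      const html = await page.content();
      results.push({ url, count: html.split(term).length - 1 });
    }
    return results;
  });
}

async function main() {
  const links = await retrieveLinks(START_URL);  // first value to customize
  console.log(await wordSearch(links, SEARCH_TERM)); // second value
}

// Runs only on request, e.g. `RUN_SCAN=1 node sketch.js` (hypothetical file).
if (process.env.RUN_SCAN === '1') {
  main().catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
}
```

In this layout, changing `START_URL` corresponds to updating the URL passed to retrieveLinks, and changing `SEARCH_TERM` corresponds to updating the second argument of wordSearch.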

Run the scraper:
```
node index.js
```
View results:

The results will be saved to webpage_scan_results.txt in the project directory.
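
For orientation, here is a minimal sketch of how per-page results could be written to that file with Node's built-in fs module. The line layout (URL, a tab, then the count) and the helper names `formatResults` and `saveResults` are assumptions made for illustration; the file the scraper actually produces may be formatted differently.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical format: one line per page that contains the term.
function formatResults(results, term) {
  const lines = results
    .filter((result) => result.count > 0)
    .map((result) => `${result.url}\t${result.count} occurrence(s) of "${term}"`);
  return lines.length ? lines.join('\n') + '\n' : '';
}

// Write the report and return its absolute path. The default file name
// matches the one mentioned in this README.
function saveResults(results, term, file = 'webpage_scan_results.txt') {
  fs.writeFileSync(file, formatResults(results, term), 'utf8');
  return path.resolve(file);
}
```

Calling `saveResults([{ url: 'https://www.example.com/', count: 3 }], 'drupal')` would overwrite any existing webpage_scan_results.txt in the current directory, so each run starts from a clean report.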

Contributing

Contributions are welcome! If you encounter issues, have ideas for improvements, or want to add new features, please open an issue or submit a pull request.
