TRL-Scraper

This Python script uses Selenium to scrape chapters from a website (it is written for https://novelfull.net) and save them as text files.

Requirements

Python 3
Selenium
Chrome WebDriver, also called ChromeDriver (make sure it is on your PATH, or specify its location when initializing the driver)

Install Python: If you don't have Python 3 installed, download and install it from https://www.python.org/downloads/
Install Selenium: pip install selenium
Download ChromeDriver:
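- Download the ChromeDriver that matches your Chrome browser version from https://chromedriver.chromium.org/downloads (for Chrome 115 and newer, drivers are listed at https://googlechromelabs.github.io/chrome-for-testing/ instead).
- Extract the chromedriver executable and place it in a directory on your system's PATH, or provide its path when creating the Service object in the code.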
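
If chromedriver is not on your PATH, its location can be passed through the Service object. The following is a minimal sketch assuming Selenium 4; build_driver is a hypothetical helper used only for illustration and is not taken from scraper.py.

```python
def build_driver(chromedriver_path=None):
    """Return a Chrome WebDriver, optionally using an explicit chromedriver path.

    build_driver is a hypothetical helper, not a function from scraper.py.
    """
    # Imported here so that loading this module does not require Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    if chromedriver_path:
        # Point Selenium at a specific chromedriver executable.
        service = Service(executable_path=chromedriver_path)
    else:
        # No path given: let Selenium locate chromedriver itself
        # (on PATH, or via Selenium Manager in Selenium 4.6 and newer).
        service = Service()
    return webdriver.Chrome(service=service)
```

For example, build_driver("/path/to/chromedriver") would use the executable at that placeholder path, while build_driver() relies on the default lookup.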

Replace 'your_website_url':
- In the scrape_paragraphs() function call at the bottom of the script, replace 'your_website_url' with the actual URL of the first chapter you want to scrape from https://novelfull.net.
Run the script: python your_script_name.py (replace your_script_name.py with the actual name of your Python file, which is scraper.py in this repository).

Initialization:
- Imports necessary libraries.
- Defines the save_chapter() function to save the extracted content to a text file.
- Defines the scrape_paragraphs() function to handle the scraping process.
Scraping:
- Sets up a Chrome WebDriver using Selenium.
- Navigates to the provided URL.
- Enters a loop to iterate through chapters.
- Locates and extracts the chapter title and paragraphs.
- Calls save_chapter() to save the content.
- Finds the "Next Chapter" button and clicks it.
- Breaks out of the loop when the "Next Chapter" button is disabled or cannot be found.
Error Handling:
- Includes a try...except block to catch potential errors during the process and print them to the console.
Cleanup:
- Closes the WebDriver after the scraping is complete.
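
The flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of scraper.py: only the 'next_chap' ID comes from this README, while the title class name, the paragraph tag, the file-naming scheme, the out_dir parameter, and the way a disabled button is detected are all guesses that need to be checked against the real page and script.

```python
import re
from pathlib import Path

# Locator values. NEXT_BUTTON_ID is stated in this README; the other two are
# assumptions and must be verified against the actual page markup.
NEXT_BUTTON_ID = "next_chap"
TITLE_CLASS = "chapter-title"  # assumption
PARAGRAPH_TAG = "p"            # assumption


def save_chapter(title, paragraphs, out_dir="."):
    """Write one chapter to '<out_dir>/<sanitized title>.txt' and return the path."""
    # Replace characters that are not allowed in file names on common systems.
    safe_name = re.sub(r'[\\/:*?"<>|]', "_", title).strip() or "chapter"
    path = Path(out_dir) / f"{safe_name}.txt"
    path.write_text(title + "\n\n" + "\n\n".join(paragraphs) + "\n", encoding="utf-8")
    return path


def scrape_paragraphs(url, out_dir="."):
    """Start at `url` and save every chapter until there is no next chapter."""
    # Imported here so save_chapter can be used without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        while True:
            title = driver.find_element(By.CLASS_NAME, TITLE_CLASS).text
            paragraphs = [
                p.text
                for p in driver.find_elements(By.TAG_NAME, PARAGRAPH_TAG)
                if p.text.strip()
            ]
            save_chapter(title, paragraphs, out_dir)

            try:
                next_button = driver.find_element(By.ID, NEXT_BUTTON_ID)
            except NoSuchElementException:
                break  # no "Next Chapter" button on this page
            # Assumption: a disabled button carries a 'disabled' attribute.
            if next_button.get_attribute("disabled") is not None:
                break
            next_button.click()
            # Wait until the old page is gone before reading the new one.
            WebDriverWait(driver, 10).until(EC.staleness_of(next_button))
    except Exception as exc:
        print(f"Scraping stopped with an error: {exc}")
    finally:
        driver.quit()  # always close the browser
```

With this sketch, the call at the bottom of the script would look like scrape_paragraphs('your_website_url'), with the placeholder replaced by the first chapter's URL.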

This script is tailored to a specific website structure. To use it for a different website, you will probably need to adjust the element locators (the By.CLASS_NAME, By.ID, and By.TAG_NAME lookups) to match that site's markup.
The script assumes the presence of a "Next Chapter" button with the ID 'next_chap'.
Be mindful of websites' terms of service and robots.txt files to avoid causing issues or overloading servers.
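
One way to act on the robots.txt advice is to check a URL against the site's rules before scraping it. The sketch below uses Python's standard urllib.robotparser module; the is_allowed helper and the sample rules are hypothetical and do not reflect the actual robots.txt of any site.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules for illustration only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if `url` may be fetched according to the robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(EXAMPLE_ROBOTS_TXT, "https://example.com/chapter-1.html"))
print(is_allowed(EXAMPLE_ROBOTS_TXT, "https://example.com/private/page"))
```

To check a live site, RobotFileParser can also download the file itself via set_url() followed by read(). A short time.sleep() between page loads in the scraping loop is a simple way to keep the request rate low.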
