This repository contains code for scraping, cleaning, and summarizing text from websites. It covers the full workflow, from extracting page content to condensing it into a short summary.
To use this code, follow these steps:
- Clone this repository:
git clone https://github.com/chaymabh/TextMining-ScrapCleanSummarize.git
cd TextMining-ScrapCleanSummarize
- Install the required dependencies:
pip install -r requirements.txt
- Web Scraping: Run Data_collecter_and_cleaner.py to scrape and clean web pages. It uses Beautiful Soup to extract the data.
- Text Summarization: Run Text_summarizer.py to generate concise summaries of the collected text. It uses the NLTK library.
- Customization: The code is written for web scraping and text summarization, but you can adapt it to other websites and data sources.
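
The scrape, clean, and summarize steps above can be sketched roughly as follows. This is a minimal illustration, not the code in Data_collecter_and_cleaner.py or Text_summarizer.py: the function names extract_text and summarize, the small stopword list, and the scoring rule are all assumptions made for the example. It uses Beautiful Soup for extraction, but swaps plain regular expressions and collections.Counter in for NLTK's tokenizers and frequency counts so that it runs without downloading any NLTK data.

```python
import re
from collections import Counter

from bs4 import BeautifulSoup

# Tiny illustrative stopword list (an assumption for this sketch);
# a real run would likely use a fuller list such as NLTK's.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "it",
             "of", "on", "the", "to", "with"}


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document with whitespace collapsed."""
    soup = BeautifulSoup(html, "html.parser")
    # Drop elements whose contents are never shown to the reader.
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()
    text = soup.get_text(separator=" ")
    return re.sub(r"\s+", " ", text).strip()


def summarize(text: str, max_sentences: int = 2) -> str:
    """Keep the highest-scoring sentences, returned in their original order."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence: str) -> int:
        # A sentence scores the summed document frequency of its words.
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens if t not in STOPWORDS)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)
```

Calling summarize(extract_text(html)) on a page's HTML string returns its top-scoring sentences. The HTML can come from any HTTP client or from a saved file. Scoring by summed word frequency favors longer sentences, so dividing the score by sentence length is a common adjustment.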
This project is licensed under the MIT License. See the LICENSE file for details.