
Telegram Bot to scrap webpages using Requests, html5lib and Beautifulsoup


uppy19d0/WebScrapper

 
 


A new PyPI package for scraping images, text, audio, video, and metadata from the web is also available. Check it here.

WebScrapperRoBot

A simple web scraper bot that scrapes webpages using Requests, html5lib, and Beautiful Soup.

Setting Up a Project and Configuring Environment Variables

To set up the project and configure environment variables, follow these steps:

1. Clone the Repository

Clone the project's repository from its hosting platform (e.g., GitHub) to your local machine using Git.

2. Virtual Environment (Optional)

It's a good practice to create a virtual environment for the project. You can use virtualenv or venv for this purpose.

python -m venv venv
source venv/bin/activate  # On Unix/Linux systems

3. Install Dependencies
Use pip to install the project's dependencies from the requirements.txt file.

pip install -r requirements.txt

4. Create the .env File
Create a .env file in the project's root directory. This file will contain the necessary environment variables. You can either copy a sample file or create it manually.

5. Configure Environment Variables
Open the .env file and set the required environment variables in the format VARIABLE_NAME=value. For example:

BOT_TOKEN=your_bot_token_here
API_ID=your_api_id_here
API_HASH=your_api_hash_here
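
As a rough illustration of how a file in this `VARIABLE_NAME=value` format can be read, the sketch below parses it with the standard library only. The helper names `parse_env_text` and `load_env_file` are hypothetical and not part of this project; whether the bot uses its own loader or a package such as python-dotenv is not stated here, so treat this as one possible approach.

```python
import os
from pathlib import Path


def parse_env_text(text: str) -> dict:
    """Parse VARIABLE_NAME=value lines, skipping blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[name.strip()] = value.strip().strip("'\"")
    return values


def load_env_file(path: str = ".env") -> dict:
    """Read the file if it exists and copy its values into os.environ."""
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    values = parse_env_text(env_path.read_text(encoding="utf-8"))
    for name, value in values.items():
        # Values already set in the real environment take precedence.
        os.environ.setdefault(name, value)
    return values
```

Calling `load_env_file()` once at startup would make `BOT_TOKEN`, `API_ID`, and `API_HASH` available through `os.environ` without overriding values exported in the shell.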

6. Run the Project
Run the project with the appropriate command (e.g., python my_project.py). The code reads the environment variables to retrieve its configuration.
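
One way the code might retrieve its configuration is sketched below. The `BotConfig` class and `load_config` function are hypothetical names used for illustration, not this project's actual code. The sketch assumes the three variables from step 5 and that `API_ID` is numeric, as Telegram issues it.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class BotConfig:
    bot_token: str
    api_id: int
    api_hash: str


def load_config(env: Mapping[str, str] = os.environ) -> BotConfig:
    """Build a BotConfig, failing early if a variable is missing or malformed."""
    required = ("BOT_TOKEN", "API_ID", "API_HASH")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    try:
        api_id = int(env["API_ID"])
    except ValueError:
        raise RuntimeError("API_ID must be an integer") from None
    return BotConfig(env["BOT_TOKEN"], api_id, env["API_HASH"])
```

Failing at startup with a message that names the missing variables tends to be easier to diagnose than an authentication error raised later by the Telegram client.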

7. Consider Secret Management (Optional)

If you deploy your project on a cloud server, consider using a secrets manager like AWS Secrets Manager, Google Secret Manager, or a similar service. This will help you securely store your configurations in a production environment.

<br><b>Mark your Star ⭐⭐</b>

## What is Web Scraping?
  Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites. The web scraping software may directly access the World Wide Web using the Hypertext Transfer Protocol or a web browser.
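
To make the idea concrete, here is a minimal sketch that downloads a page over HTTP and extracts every link from it. It deliberately uses only the Python standard library (`urllib` and `html.parser`) so it runs without extra installs; the bot itself is described as using Requests, html5lib, and Beautiful Soup, so this is a stand-in technique and not the bot's implementation. The names `LinkCollector`, `extract_links`, and `fetch_html`, and the User-Agent string, are illustrative assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of every <a> tag."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list:
    """Return all link targets found in the given HTML text."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page over HTTP(S) and decode it as text."""
    request = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

A typical call would be `extract_links(fetch_html(url), url)` for some page `url`. The parsing step works on any HTML string, so it can be tried offline on a saved page.
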
## Is web scraping Legal?
  Web scraping itself is not illegal. In fact, web scraping, or web crawling, was historically associated with well-known search engines like Google or Bing. These search engines crawl sites and index the web. ... A good example of when web scraping can be illegal is when you try to scrape nonpublic data.
## Why is Web Scraping Done?
  Web scraping is used in a variety of digital businesses that rely on data harvesting. Legitimate use cases include: Search engine bots crawling a site, analyzing its content and then ranking it. ... Market research companies using scrapers to pull data from forums and social media (e.g., for sentiment analysis).
## Where can I use web scraping?
  Web scraping software can be used for lead generation for marketing, price comparison and competition monitoring, e-commerce, real estate, data analysis, academic research, training and testing data for machine learning projects, and sports betting odds analysis.
## Are there any Limitations?
  There is a learning curve: even the easiest scraping tool takes time to master. The structure of websites changes frequently, and scraped data is arranged according to that structure. Complex websites are not easy to handle, extracting data at a large scale is much harder, and a web scraping tool is not omnipotent.
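
Because page structure changes often, a scraper usually benefits from fallbacks instead of assuming one fixed layout. The sketch below is a hypothetical helper (`page_title`, built on the standard library's `html.parser`, not taken from this project): it looks for a page's `<title>`, falls back to the first `<h1>`, and returns `None` when neither is present so the caller can handle the gap.

```python
from html.parser import HTMLParser
from typing import Optional


class TitleFinder(HTMLParser):
    """Record the text of the first <title> and first <h1> encountered."""

    def __init__(self):
        super().__init__()
        self._current = None  # tag whose text is currently being captured
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1") and tag not in self.found:
            self._current = tag
            self.found[tag] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.found[self._current] += data

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None


def page_title(html: str) -> Optional[str]:
    """Prefer <title>, fall back to the first <h1>, else return None."""
    finder = TitleFinder()
    finder.feed(html)
    finder.close()
    for tag in ("title", "h1"):
        text = finder.found.get(tag, "").strip()
        if text:
            return text
    return None
```

Returning `None` instead of raising lets a bot reply with a clear "nothing found" message when a site's layout no longer matches what the scraper expects.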

[Take a Demo Here](https://t.me/WebScrapperRoBot)


# Credits
[Pyrogram](https://docs.pyrogram.org)<br><br>
## Contributors

![GitHub Contributors Image](https://contrib.rocks/image?repo=bughunter0/WebScrapperRoBot)
