Ihor1654/wikipedia_scraper

Wikipedia Scraper

🏢 Description

This project is a scraper that builds a JSON file of the political leaders of each country, retrieved from an API. For each leader, the file includes the first paragraph of their Wikipedia page. The scraper uses the asyncio module to process links asynchronously, BeautifulSoup to extract data from HTML pages, and regular expressions to clean the extracted text.
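
As a rough illustration of how asynchronous link processing and regex cleaning can fit together, here is a minimal sketch using only the standard library. It is not the code from `src/scraper.py`: the names `clean_paragraph`, `fetch_first_paragraph`, and `process_links` are hypothetical, the fetch step is a stub standing in for a real HTTP request plus BeautifulSoup parsing, and the cleaning rules are examples rather than the project's actual patterns.

```python
import asyncio
import re


def clean_paragraph(text: str) -> str:
    """Strip common Wikipedia residue from a paragraph (illustrative rules only)."""
    text = re.sub(r"\[[^\]]*\]", "", text)      # bracketed markers such as [1] or [note 2]
    text = re.sub(r"\s+", " ", text)            # collapse newlines and repeated spaces
    text = re.sub(r"\s+([,.;:])", r"\1", text)  # drop spaces left before punctuation
    return text.strip()


async def fetch_first_paragraph(url: str) -> str:
    """Stub: a real version would download the page and parse it."""
    await asyncio.sleep(0.01)  # simulates network latency
    return f"Leader at {url} [1] was born\n in 1950 ."


async def process_links(urls: list[str]) -> dict[str, str]:
    # gather() runs all fetches concurrently and returns results in input order
    paragraphs = await asyncio.gather(*(fetch_first_paragraph(u) for u in urls))
    return {u: clean_paragraph(p) for u, p in zip(urls, paragraphs)}


if __name__ == "__main__":
    links = ["https://example.org/a", "https://example.org/b"]
    print(asyncio.run(process_links(links)))
```

Because the fetches are awaited together, the total wait is close to that of the slowest single request rather than the sum of all requests. Note that the first pattern removes any bracketed text, which may be too aggressive for some pages.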

📦 Repo structure

.
├── src/
│   └── scraper.py
├── .gitignore
├── main.py
├── leaders_data.json
└── README.md

🛎️ Usage

  1. Clone the repository to your local machine.

  2. To run the script, execute the main.py file from your command line:

```
python main.py
```
  3. The script creates an instance of the WikipediaScraper class and builds a dictionary that maps each country code to the first paragraphs of the Wikipedia biographies of that country's leaders. The resulting dictionary is saved to a "leaders_data.json" file in the repository root.
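```
# Import path assumed from the repo structure (src/scraper.py)
from src.scraper import WikipediaScraper

scraper = WikipediaScraper()
scraper.get_final_dict()
scraper.to_json_file('leaders_data.json')
```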
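
For readers curious about the saving step, the sketch below shows one way a dictionary of this shape could be written to and read back from a JSON file. It is an assumption, not the actual body of `to_json_file`: the helper names `save_leaders` and `load_leaders` and the sample entry are hypothetical placeholders.

```python
import json
from pathlib import Path


def save_leaders(data: dict, filepath: str) -> None:
    """Write the dictionary to a UTF-8 encoded JSON file."""
    with Path(filepath).open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps accented names readable instead of \uXXXX escapes
        json.dump(data, f, ensure_ascii=False, indent=2)


def load_leaders(filepath: str) -> dict:
    """Read the dictionary back from the JSON file."""
    with Path(filepath).open(encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    # Placeholder data: country code -> list of first paragraphs
    sample = {"xx": ["Placeholder first paragraph for a leader named Zoë."]}
    save_leaders(sample, "leaders_sample.json")
    print(load_leaders("leaders_sample.json"))
```

Opening the file with an explicit `encoding="utf-8"` matters here because leader names often contain non-ASCII characters, and the platform default encoding is not always UTF-8.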

⏱️ Timeline

This project took two days to complete.

📌 Personal Situation

This project was done as part of the AI Bootcamp at BeCode.org.

Connect with me on LinkedIn.
