Wikipedia Scraper


🏢 Description

This project is a scraper that builds a JSON file of the political leaders of each country, retrieved from this API. For each leader, the file includes the first paragraph of their Wikipedia page. The scraper uses the asyncio module to process links asynchronously, BeautifulSoup to extract data from HTML pages, and regular expressions to clean the extracted text.
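
As a rough illustration of how asynchronous link processing and regex-based cleaning can fit together, here is a minimal sketch. It is not the code in `src/scraper.py`: the names `FAKE_PAGES`, `clean_paragraph`, `fetch_first_paragraph`, and `fetch_all` are hypothetical, the pages are in-memory strings rather than real HTTP responses, and the BeautifulSoup extraction step is left out so the example runs with the standard library only.

```python
import asyncio
import re

# Hypothetical stand-in for downloaded pages; the real project fetches Wikipedia URLs.
FAKE_PAGES = {
    "https://example.org/a": "Alice Example[1] (born 1 January 1970)  is a politician.[2]\n",
    "https://example.org/b": "Bob Sample[a] is the  head of state.[citation needed]",
}


def clean_paragraph(text: str) -> str:
    """Strip bracketed footnote markers and collapse whitespace."""
    text = re.sub(r"\[[^\]]*\]", "", text)  # drop [1], [a], [citation needed]
    text = re.sub(r"\s+", " ", text)        # collapse runs of whitespace
    return text.strip()


async def fetch_first_paragraph(url: str) -> str:
    """Hypothetical fetcher: simulates network latency, then returns cleaned text."""
    await asyncio.sleep(0.01)  # stands in for an HTTP request
    return clean_paragraph(FAKE_PAGES[url])


async def fetch_all(urls: list[str]) -> dict[str, str]:
    """Process all URLs concurrently and map each URL to its cleaned paragraph."""
    paragraphs = await asyncio.gather(*(fetch_first_paragraph(u) for u in urls))
    return dict(zip(urls, paragraphs))


results = asyncio.run(fetch_all(list(FAKE_PAGES)))
for url, paragraph in results.items():
    print(url, "->", paragraph)
```

Because `asyncio.gather` schedules all the coroutines at once, the simulated waits overlap instead of running one after another, which is the main benefit of asynchronous processing when many leader pages have to be requested.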


📦 Repo structure

.
├── src/
│   └── scraper.py
├── .gitignore
├── main.py
├── leaders_data.json
└── README.md

🛎️ Usage

  1. Clone the repository to your local machine.

  2. To run the script, execute the main.py file from your command line:

```
python main.py
```

  3. The script creates an instance of the WikipediaScraper class and builds a dictionary that maps each country code to the first paragraph of the Wikipedia biography of each of that country's leaders. The resulting dictionary is saved to a "leaders_data.json" file in the repository's root directory.

```
scraper = WikipediaScraper()
scraper.get_final_dict()
scraper.to_json_file('leaders_data.json')
```
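
To make the saving step more concrete, here is a small standalone sketch of writing such a dictionary to JSON. It is an assumption about what a method like `to_json_file` might do, not the project's actual implementation: the helper `save_leaders`, the sample data, and the exact shape of the dictionary (country code mapped to a list of paragraphs) are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_leaders(data: dict, filepath: str) -> None:
    """Hypothetical helper: write the leaders dictionary to a UTF-8 JSON file."""
    with open(filepath, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps accented names readable in the output file
        json.dump(data, f, ensure_ascii=False, indent=2)


# Assumed shape: country code -> first paragraphs of that country's leaders.
sample = {
    "be": ["Placeholder paragraph about a Belgian leader."],
    "fr": ["Placeholder paragraph about a French leader named Émile."],
}

# Write to a temporary directory so the example does not touch the repository.
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "leaders_data.json"
    save_leaders(sample, str(out_path))
    loaded = json.loads(out_path.read_text(encoding="utf-8"))

print(loaded == sample)
```

Opening the file with an explicit `encoding="utf-8"` matters here because leader names and biographies often contain non-ASCII characters.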

⏱️ Timeline

This project took two days to complete.

📌 Personal Situation

This project was done as part of the AI Bootcamp at BeCode.org.

Connect with me on LinkedIn.