Wikipedia Scraper

🏢 Description

In this project I built a scraper that creates a JSON file with the political leaders of each country retrieved from this API, including the first paragraph of each leader's Wikipedia page. I used the asyncio module to process links asynchronously, BeautifulSoup to scrape data from the HTML pages, and regular expressions to clean the data.
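A minimal sketch of that pipeline, assuming the usual Wikipedia layout where the bio opens in the first non-empty `<p>` element. To stay dependency-free and runnable offline, it swaps BeautifulSoup for the standard library's `html.parser`, and `fetch_html` is a hypothetical stub standing in for a real non-blocking HTTP request. The citation regex is an illustrative assumption, not the project's actual cleaning rules.

```python
import asyncio
import re
from html.parser import HTMLParser
from typing import List, Optional

# Illustrative cleaning patterns (assumptions, not the project's own rules).
CITATION = re.compile(r"\[\d+\]")  # footnote markers such as [1] or [23]
WHITESPACE = re.compile(r"\s+")    # runs of spaces, tabs and newlines


class FirstParagraphParser(HTMLParser):
    """Collects the text of the first non-empty <p> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_p = False
        self._chunks: List[str] = []
        self.result: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Start collecting only if no paragraph has been captured yet.
        if tag == "p" and self.result is None:
            self._in_p = True
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._in_p = False
            text = "".join(self._chunks).strip()
            if text:  # skip empty placeholder paragraphs
                self.result = text

    def handle_data(self, data):
        # Text inside nested tags (<b>, <a>, <sup>) is collected too.
        if self._in_p:
            self._chunks.append(data)


def clean_paragraph(text: str) -> str:
    """Remove footnote markers and collapse whitespace."""
    text = CITATION.sub("", text)
    return WHITESPACE.sub(" ", text).strip()


def first_paragraph(html: str) -> str:
    parser = FirstParagraphParser()
    parser.feed(html)
    parser.close()
    return clean_paragraph(parser.result or "")


async def fetch_html(url: str) -> str:
    """Hypothetical stub: a real scraper would await an HTTP client here."""
    await asyncio.sleep(0)  # yield to the event loop like real I/O would
    return f"<p></p><p><b>{url}</b> is a leader.<sup>[1]</sup></p>"


async def process_links(urls: List[str]) -> List[str]:
    # gather() runs the fetches concurrently and keeps the input order.
    pages = await asyncio.gather(*(fetch_html(u) for u in urls))
    return [first_paragraph(page) for page in pages]


if __name__ == "__main__":
    print(asyncio.run(process_links(["Leader A", "Leader B"])))
    # ['Leader A is a leader.', 'Leader B is a leader.']
```

`asyncio.gather` returns results in the order of the input coroutines, so each cleaned paragraph lines up with its URL even though the fetches overlap.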

📦 Repo structure

```
.
├── src/
│   └── scraper.py
├── .gitignore
├── main.py
├── leaders_data.json
└── README.md
```

🛎️ Usage

1. Clone the repository to your local machine.

2. To run the script, execute main.py from your command line:

```
python main.py
```

The script creates an instance of the WikipediaScraper class and builds a dictionary with each country code as a key and, as the value, the first paragraph of the Wikipedia biography of each of that country's leaders. The resulting dictionary is saved to "leaders_data.json" in the current working directory, which is the project root when you run the command from there.

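```
from src.scraper import WikipediaScraper  # assumed location, based on the repo structure above

scraper = WikipediaScraper()
scraper.get_final_dict()
scraper.to_json_file('leaders_data.json')
```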
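The final save step could be sketched as follows. It assumes the dictionary maps each country code to a list of paragraphs, one per leader; `save_leaders_json` and `load_leaders_json` are hypothetical helpers written for illustration, not the project's `to_json_file` method.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List

# Assumed shape: country code -> first paragraph of each leader's bio.
Leaders = Dict[str, List[str]]


def save_leaders_json(data: Leaders, filepath: str) -> None:
    """Write the dictionary as indented UTF-8 JSON (hypothetical helper)."""
    with open(filepath, "w", encoding="utf-8") as f:
        # ensure_ascii=False writes accented names as-is instead of \uXXXX escapes
        json.dump(data, f, ensure_ascii=False, indent=2)


def load_leaders_json(filepath: str) -> Leaders:
    """Read the file back, e.g. to verify the round trip."""
    with open(filepath, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    # Placeholder data; real values come from the scraper.
    sample = {"xx": ["First paragraph of a placeholder bio, with an accent: é."]}
    path = Path(tempfile.mkdtemp()) / "leaders_data.json"
    save_leaders_json(sample, str(path))
    print(load_leaders_json(str(path)) == sample)  # True
```

Passing an explicit `encoding="utf-8"` matters together with `ensure_ascii=False`: without it, Python falls back to the platform's default encoding (for example cp1252 on some Windows setups), which may fail on non-Latin names.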

⏱️ Timeline

This project took two days to complete.

📌 Personal Situation

This project was done as part of the AI Bootcamp at BeCode.org.

Connect with me on LinkedIn.
