This project is a scraper that builds a JSON file with the political leaders of each country returned by this API. The file also includes the first paragraph of each leader's Wikipedia page. The scraper uses the asyncio module for asynchronous link processing, BeautifulSoup for scraping data from HTML pages, and regular expressions for data cleaning.
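As a rough illustration of the regular-expression cleaning step, here is a minimal sketch using only the standard library. The function name `clean_paragraph`, the patterns, and the sample text are assumptions made for this example; the actual rules in `src/scraper.py` may differ.

```python
import re


def clean_paragraph(text: str) -> str:
    """Strip common Wikipedia residue from a paragraph of plain text.

    Hypothetical helper: the patterns below are illustrative assumptions,
    not the project's actual cleaning rules.
    """
    # Remove footnote markers such as [1], [12] or [a]
    text = re.sub(r"\[\s*(?:\d+|[a-z])\s*\]", "", text)
    # Collapse runs of whitespace (including newlines) into a single space
    text = re.sub(r"\s+", " ", text)
    # Drop any space left dangling before punctuation
    text = re.sub(r"\s+([.,;:])", r"\1", text)
    return text.strip()


# Made-up sample text, not taken from a real Wikipedia page
raw = "Jane Doe[1]  (born 1 January 1970)[a] is a\npolitician.[2]"
print(clean_paragraph(raw))
```

Cleaning the extracted text rather than the raw HTML keeps the patterns short, since the tags have already been stripped by that point.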
.
├── src/
│   └── scraper.py
├── .gitignore
├── main.py
├── leaders_data.json
└── README.md
- Clone the repository to your local machine.
- To run the script, execute the main.py file from your command line:
```
python main.py
```
- The script creates an instance of the WikipediaScraper class and builds a dictionary that maps each country code to the first paragraph of the Wikipedia bio of every leader of that country. The resulting dictionary is saved to a "leaders_data.json" file in the project's root directory.
from src.scraper import WikipediaScraper  # import path assumed from the directory tree
scraper = WikipediaScraper()
scraper.get_final_dict()
scraper.to_json_file('leaders_data.json')
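The README does not show how the scraper works internally, so the following is only a sketch of the general pattern: process many links concurrently with asyncio, group the results by country code, and save them as JSON. The network request is simulated with `asyncio.sleep`, and `fetch_first_paragraph`, `build_leaders_dict`, `save_json`, the country codes, and the example.org URLs are all hypothetical names chosen for this example.

```python
import asyncio
import json
import os
import tempfile


async def fetch_first_paragraph(url: str) -> str:
    """Stand-in for a real HTTP request plus HTML parsing (hypothetical)."""
    await asyncio.sleep(0.01)  # simulates network latency
    return f"First paragraph of {url}"


async def build_leaders_dict(pages: dict[str, list[str]]) -> dict[str, list[str]]:
    """Fetch every leader page concurrently and group results by country code."""

    async def fetch_country(urls: list[str]) -> list[str]:
        # gather() runs the coroutines concurrently and preserves input order
        return await asyncio.gather(*(fetch_first_paragraph(u) for u in urls))

    countries = list(pages)
    paragraphs = await asyncio.gather(*(fetch_country(pages[c]) for c in countries))
    return dict(zip(countries, paragraphs))


def save_json(data: dict, filepath: str) -> None:
    """Write the dictionary to disk as UTF-8 JSON, keeping non-ASCII names readable."""
    with open(filepath, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)


# Made-up input: country codes mapped to placeholder page URLs
sample_pages = {
    "xx": ["https://example.org/wiki/Leader_A", "https://example.org/wiki/Leader_B"],
    "yy": ["https://example.org/wiki/Leader_C"],
}

leaders = asyncio.run(build_leaders_dict(sample_pages))

# Written to a temporary directory so the sketch never overwrites real output
out_path = os.path.join(tempfile.mkdtemp(), "leaders_sample.json")
save_json(leaders, out_path)
```

Because all pages are awaited together, total run time is close to that of the slowest single request rather than the sum of all requests.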
This project took two days to complete.
This project was done as part of the AI Bootcamp at BeCode.org.
Connect with me on LinkedIn.