Wikipedia Scraper

🏢 Description

This project is a scraper that builds a JSON file listing the political leaders of each country, as returned by this API. For each leader, the file also includes the first paragraph of their Wikipedia page. The scraper uses the asyncio module for asynchronous link processing, BeautifulSoup for scraping data from HTML pages, and regular expressions for data cleaning.
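
As a rough illustration of the asynchronous link processing and regex-based cleaning mentioned above, here is a minimal sketch that uses only the standard library. It is not the code in src/scraper.py: `fetch_first_paragraph` is a hypothetical stand-in that returns canned text instead of making an HTTP request, the example.org URLs are placeholders, and the cleaning patterns are assumptions about what typically needs removing from Wikipedia text.

```python
import asyncio
import re

# Hypothetical sample data standing in for paragraphs scraped from Wikipedia.
SAMPLE_PARAGRAPHS = {
    "https://example.org/wiki/Leader_A": "Leader A[1] (born 1950) is a politician.[2]\n",
    "https://example.org/wiki/Leader_B": "Leader B[a]   served as president.[3][4]",
}


def clean_paragraph(text: str) -> str:
    """Strip footnote markers such as [1] or [a] and collapse whitespace."""
    text = re.sub(r"\[[0-9a-zA-Z]+\]", "", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()


async def fetch_first_paragraph(url: str) -> str:
    """Stand-in for a network request; a real scraper would download and parse HTML here."""
    await asyncio.sleep(0)  # yield control, as an awaited network call would
    return SAMPLE_PARAGRAPHS[url]


async def process_links(urls: list[str]) -> dict[str, str]:
    """Fetch all URLs concurrently and return cleaned paragraphs keyed by URL."""
    raw = await asyncio.gather(*(fetch_first_paragraph(u) for u in urls))
    return {url: clean_paragraph(text) for url, text in zip(urls, raw)}


if __name__ == "__main__":
    result = asyncio.run(process_links(list(SAMPLE_PARAGRAPHS)))
    for url, paragraph in result.items():
        print(url, "->", paragraph)
```

`asyncio.gather` returns results in the same order as its arguments, which is why zipping them back onto the URL list is safe here.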

📦 Repo structure

.
├── src/
│   └── scraper.py
├── .gitignore
├── main.py
├── leaders_data.json
└── README.md

🛎️ Usage

1. Clone the repository to your local machine.

2. To run the script, execute the main.py file from your command line:

```
python main.py
```

The script creates an instance of the WikipediaScraper class and builds a dictionary that maps each country code to the first paragraphs of the Wikipedia biographies of that country's leaders. The resulting dictionary is saved to a leaders_data.json file in the project's root directory.

scraper = WikipediaScraper()
scraper.get_final_dict()
scraper.to_json_file('leaders_data.json')
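
The class internals are not shown in this README, so the following is only a sketch of the kind of output described above: a dictionary keyed by country code, written to a JSON file. The helpers `save_leaders` and `load_leaders`, the sample entries, and the choice of a list of paragraphs per country are all hypothetical and may differ from what `to_json_file` actually produces.

```python
import json
import os
import tempfile

# Hypothetical sample: country code -> list of first paragraphs, one per leader.
sample_leaders = {
    "be": ["First paragraph for a Belgian leader."],
    "fr": ["First paragraph for a French leader.", "Another French leader."],
}


def save_leaders(data: dict, filepath: str) -> None:
    """Write the dictionary to a UTF-8 JSON file, keeping non-ASCII characters readable."""
    with open(filepath, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)


def load_leaders(filepath: str) -> dict:
    """Read the JSON file back into a dictionary."""
    with open(filepath, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    # Write to a temporary location so the example does not overwrite real data.
    path = os.path.join(tempfile.gettempdir(), "leaders_data_example.json")
    save_leaders(sample_leaders, path)
    print(load_leaders(path)["fr"])
```

Passing `ensure_ascii=False` keeps accented names legible in the file rather than writing them as `\u` escape sequences.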

⏱️ Timeline

This project took two days to complete.

📌 Personal Situation

This project was done as part of the AI Bootcamp at BeCode.org.

Connect with me on LinkedIn.
