immo-eliza-scraping

🏢 Description

In this project, I build a scraper that produces a CSV file containing data for 10,000 houses from every region of Belgium. The data is scraped from Immoweb. The code follows an OOP approach, uses the asyncio module to process links asynchronously, and uses BeautifulSoup to extract data from HTML pages.
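
As a rough illustration of the asynchronous link processing mentioned above, here is a minimal sketch using only the standard library. It is not the code from this repository: `fetch_listing`, `process_links`, and the `max_concurrent` parameter are hypothetical names, and the `asyncio.sleep` call stands in for the real HTTP request and BeautifulSoup parsing.

```python
import asyncio


async def fetch_listing(url: str) -> dict:
    """Hypothetical stand-in for downloading and parsing one listing page."""
    await asyncio.sleep(0.01)  # simulates network latency; no real request is made
    return {"url": url, "ok": True}


async def process_links(urls, max_concurrent=5):
    """Process all URLs concurrently, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(url):
        # The semaphore caps how many fetches run at the same time.
        async with semaphore:
            return await fetch_listing(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))


if __name__ == "__main__":
    links = [f"https://example.com/listing/{i}" for i in range(10)]
    results = asyncio.run(process_links(links))
    print(len(results), "listings processed")
```

Capping concurrency with a semaphore is one common way to avoid sending too many requests to a site at once; whether the actual scraper does this is an assumption.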

📦 Repo structure

.
├── data/
│   ├── links/
│   │   └── houselinks_for_postcode.json
│   ├── raw_data_houses.json
│   └── houselinks_for_postcode.json
├── scraper/
│   └── scraper.py
├── src/
│   ├── link_creator.py
│   └── pipeline.py
├── .gitignore
├── main.py
├── maintest.csv
├── postal-codes.json
└── README.md

🛎️ Usage

1. Clone the repository to your local machine.

2. To run the script, execute the main.py file from your command line:

```
python main.py
```
3. The script creates an instance of the Pipeline class and builds a CSV file with data for 10,000 houses in Belgium. The resulting file is saved under the file name you choose, in the repository root directory. One run of the script takes approximately 15 minutes.

```
import time

from src.pipeline import Pipeline

# Print the start time of the run.
start_time = time.localtime()
start = time.strftime("%H:%M:%S", start_time)
print(start)

# Build the CSV file under the name entered by the user.
pipeline = Pipeline()
pipeline.run(input('Enter the name of the CSV file that you want to save: '))

# Print the finish time of the run.
finish_time = time.localtime()
finish = time.strftime("%H:%M:%S", finish_time)
print(finish)
```
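
The script above prints the start and finish times separately. If you would rather see the elapsed duration directly, a small helper along these lines could wrap the call. This is only a sketch: `run_timed` and `format_duration` are hypothetical names that do not exist in this repository, and `sum` stands in for `pipeline.run` in the example.

```python
import time


def run_timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def format_duration(seconds: float) -> str:
    """Format a duration as MM:SS, dropping fractional seconds."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    # sum is a placeholder for the real work, e.g. pipeline.run(filename).
    _, elapsed = run_timed(sum, range(1_000_000))
    print(f"Finished in {format_duration(elapsed)}")
```

`time.perf_counter` is a monotonic clock, so the measured duration is not affected by system clock adjustments during a long run.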

⏱️ Timeline

This project took three days to complete.

Sources

I used the "Postal codes - Belgium" dataset from the Opendatasoft website.