In this project, I create a scraper that builds a CSV file containing data for 10,000 houses from every region of Belgium, scraped from Immoweb. The project uses an OOP approach, the asyncio module for asynchronous link processing, and BeautifulSoup for extracting data from the HTML pages.
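To illustrate the asynchronous approach, here is a minimal sketch of how links can be processed concurrently with asyncio. The `fetch_house` stub stands in for a real HTTP request plus BeautifulSoup parsing; the function names and the concurrency limit are illustrative assumptions, not the project's actual code.

```python
import asyncio

async def fetch_house(url: str) -> dict:
    # Placeholder for a real HTTP request and BeautifulSoup parsing;
    # here we only simulate the I/O wait.
    await asyncio.sleep(0)
    return {"url": url, "price": None}

async def scrape_all(urls: list[str], limit: int = 10) -> list[dict]:
    # A semaphore caps the number of concurrent requests so the
    # target site is not overloaded.
    sem = asyncio.Semaphore(limit)

    async def bounded(url: str) -> dict:
        async with sem:
            return await fetch_house(url)

    # gather preserves input order in its result list
    return await asyncio.gather(*(bounded(u) for u in urls))

urls = [f"https://example.com/house/{i}" for i in range(5)]
results = asyncio.run(scrape_all(urls))
print(len(results))
```

In the real scraper, `fetch_house` would download a listing page and pull fields out of the parsed HTML.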
```
.
├── data/
│   ├── links/
│   │   └── houselinks_for_postcode.json
│   └── raw_data_houses.json
├── scraper/
│   └── scraper.py
├── src/
│   ├── link_creator.py
│   └── pipeline.py
├── .gitignore
├── main.py
├── maintest.csv
├── postal-codes.json
└── README.md
```
1. Clone the repository to your local machine.
2. Run the script by executing `main.py` from your command line:
```
python main.py
```
- The script creates an instance of the `Pipeline` class and builds a CSV file with data for 10,000 houses in Belgium. The resulting file is saved to the file path you choose in the project root. One run of the script takes approximately 15 minutes.
The `main.py` entry point looks like this:

```python
from src.pipeline import Pipeline
import time

# Record and print the start time
start = time.strftime("%H:%M:%S", time.localtime())
print(start)

# Run the pipeline; the user supplies the name of the output CSV file
pipeline = Pipeline()
pipeline.run(input("Enter the name of the CSV file you want to save: "))

# Record and print the finish time
finish = time.strftime("%H:%M:%S", time.localtime())
print(finish)
```
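The pipeline's final step writes the scraped records to a CSV file. A hedged sketch of what that step might look like is below; the column names and the use of `csv.DictWriter` are illustrative assumptions, not the project's actual code.

```python
import csv

# Hypothetical rows as the pipeline might collect them; the real
# column names come from the Immoweb data and may differ.
houses = [
    {"postcode": "1000", "price": 350000, "bedrooms": 3},
    {"postcode": "9000", "price": 275000, "bedrooms": 2},
]

# DictWriter maps each dict to a CSV row using the given field names
with open("houses.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=houses[0].keys())
    writer.writeheader()
    writer.writerows(houses)
```

`newline=""` is the documented way to open CSV files for writing on all platforms, and an explicit `encoding` avoids locale-dependent output.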
This project took three days to complete.
I used the "Postal codes - Belgium" dataset from the Opendatasoft website.
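The postal-codes dataset presumably feeds the link-creation step, which produces `houselinks_for_postcode.json`. A hedged sketch of that idea is below; the sample postcodes, the search-URL format, and the file layout are all assumptions for illustration, not the project's actual code.

```python
import json

# Sample Belgian postcodes; the real list would come from postal-codes.json,
# whose exact schema is not shown here.
postcodes = ["1000", "2000", "9000"]

# One bucket of search links per postcode; the URL pattern is hypothetical.
links_for_postcode = {
    pc: [f"https://www.example-search.be/houses?postcode={pc}"]
    for pc in postcodes
}

with open("houselinks_for_postcode.json", "w", encoding="utf-8") as f:
    json.dump(links_for_postcode, f, indent=2)

print(sorted(links_for_postcode))
```

Keeping links grouped by postcode makes it easy to resume or re-scrape a single region without redoing the whole country.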