The ImmoEliza project aims to develop a machine learning model that predicts real estate prices in Belgium using data from Immoweb. The dataset will include over 10,000 real estate listings with at least 18 features per listing, including location, price, property type, and condition. The model is intended to provide accurate property price predictions based on available data.
- Scrape real estate data from Immoweb.
- Clean and preprocess the collected data.
- Build a machine learning model to predict real estate prices.
- Deploy the model as a user-friendly tool for real estate pricing insights.
- Data Collection: October 1st - October 10th, 2024
- Model Development: October 11th - October 15th, 2024
- Deployment and Testing: October 16th - October 20th, 2024
The project is divided into multiple phases, each of which focuses on a specific part of the AI pipeline:
- Web Scraping: Collect real estate listings and detailed features using Selenium and BeautifulSoup.
- Data Processing: Clean, transform, and preprocess the data using Python libraries like `pandas`, `numpy`, and `regex`.
- Modeling: Build a price prediction model using machine learning algorithms such as linear regression, random forests, or gradient boosting.
- Deployment: Deploy the model as a web-based tool using `Flask` or `Streamlit`.
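As a rough illustration of the modeling phase, the sketch below compares the three algorithm families named above with scikit-learn. It runs on synthetic data, not real Immoweb listings; the two features (`living_area`, `bedrooms`), the price formula, and all hyperparameters are assumptions chosen only to make the example runnable.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned dataset (NOT real Immoweb data).
rng = np.random.default_rng(42)
n_samples = 500
living_area = rng.uniform(40, 300, n_samples)  # square meters (assumed feature)
bedrooms = rng.integers(1, 6, n_samples)       # 1 to 5 bedrooms (assumed feature)
X = np.column_stack([living_area, bedrooms])
# Made-up linear price relationship plus noise, in EUR.
y = 2500 * living_area + 15000 * bedrooms + rng.normal(0, 20000, n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

candidates = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=42),
    "gradient_boosting": GradientBoostingRegressor(random_state=42),
}

# Fit each candidate and record its mean absolute error on the held-out split.
mae_by_model = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    mae_by_model[name] = mean_absolute_error(y_test, model.predict(X_test))

# Print the candidates from lowest to highest error.
for name, mae in sorted(mae_by_model.items(), key=lambda item: item[1]):
    print(f"{name}: MAE = {mae:,.0f} EUR")
```

On real listings the ranking may differ considerably, so the held-out error should guide the choice rather than this toy comparison.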
- Scraping Immoweb Data: Using Selenium to automate the extraction of real estate listing information from Immoweb.
- Processing Collected Data: Cleaning data, handling missing values, and creating new features (e.g., price per square meter).
- Building ML Model: Creating a machine learning pipeline to train, validate, and test the real estate price prediction model.
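The processing step could look roughly like the following pandas sketch, which drops listings without a price, imputes missing living areas, and derives a price per square meter. The column names (`price`, `living_area`, `property_type`) and the median imputation strategy are assumptions for illustration, not the project's confirmed schema or method.

```python
import numpy as np
import pandas as pd

# Hypothetical raw listings; column names and values are illustrative only.
raw = pd.DataFrame(
    {
        "price": [250000, 320000, np.nan, 410000, 180000],
        "living_area": [100, np.nan, 85, 0, 60],
        "property_type": ["house", "apartment", "apartment", None, "house"],
    }
)


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Clean the listings and add a price_per_sqm feature."""
    out = df.copy()
    # Rows without a price cannot serve as training targets.
    out = out.dropna(subset=["price"])
    # Treat a zero area as missing, then fill gaps with the median area.
    out["living_area"] = out["living_area"].replace(0, np.nan)
    out["living_area"] = out["living_area"].fillna(out["living_area"].median())
    # Keep listings with an unknown type under an explicit label.
    out["property_type"] = out["property_type"].fillna("unknown")
    # New feature: price per square meter.
    out["price_per_sqm"] = out["price"] / out["living_area"]
    return out.reset_index(drop=True)


clean = preprocess(raw)
print(clean)
```

Median imputation is only one option; dropping rows or imputing per property type may suit the real data better.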
- Web Scraping: Automate data collection from Immoweb using `Selenium` and `BeautifulSoup`.
- Data Cleaning: Preprocess raw scraped data to ensure accuracy and consistency.
- Feature Engineering: Create new features based on existing data to improve model accuracy.
- Modeling: Implement machine learning models for price prediction.
- Deployment: Build an accessible user interface to query the model for predictions.
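To make the scraping and cleaning steps above more concrete, here is a minimal sketch of parsing one listing with BeautifulSoup and normalizing its numeric fields with a regular expression. The HTML snippet and its class names (`listing-price`, `listing-area`, and so on) are invented for this example; the real Immoweb markup will differ. In a Selenium workflow, the HTML string would typically come from `driver.page_source` instead of a constant.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup; the real Immoweb page structure will differ.
SAMPLE_HTML = """
<article class="listing">
  <h2 class="listing-title">Apartment for sale</h2>
  <p class="listing-price">€ 325,000</p>
  <span class="listing-locality">1000 Brussels</span>
  <span class="listing-area">85 m²</span>
</article>
"""


def first_int(text):
    """Return the first whole number in text, ignoring thousands separators.

    Assumes values without a decimal part; returns None if no digit is found.
    """
    match = re.search(r"\d[\d.,\s]*", text)
    if match is None:
        return None
    return int(re.sub(r"\D", "", match.group()))


def parse_listing(html):
    """Extract a few fields from one listing into a plain dict."""
    soup = BeautifulSoup(html, "html.parser")

    def text_of(selector):
        node = soup.select_one(selector)
        return node.get_text(strip=True) if node else ""

    return {
        "title": text_of(".listing-title"),
        "price": first_int(text_of(".listing-price")),
        "locality": text_of(".listing-locality"),
        "living_area": first_int(text_of(".listing-area")),
    }


listing = parse_listing(SAMPLE_HTML)
print(listing)
```

Returning `None` for missing numbers keeps gaps visible, so the cleaning stage can decide how to handle them.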
- Python: The core language for data collection, cleaning, and modeling.
- Selenium & BeautifulSoup: Tools for web scraping and automating data collection.
- pandas & numpy: Libraries for data manipulation and preprocessing.
- Scikit-learn: For machine learning model implementation.
- Flask/Streamlit: For building a web application to interact with the predictive model.
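For the Flask option, a prediction endpoint might be sketched as below. The `/predict` route, the JSON field names, and the `predict_price` placeholder (with made-up coefficients standing in for a trained model) are all assumptions for illustration; the actual application may be structured differently.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_price(living_area, bedrooms):
    """Placeholder for the trained model; the coefficients are made up."""
    return 2500.0 * living_area + 15000.0 * bedrooms


@app.route("/predict", methods=["POST"])
def predict():
    # Accept a JSON body; fall back to an empty dict if it is missing or invalid.
    payload = request.get_json(silent=True) or {}
    try:
        living_area = float(payload["living_area"])
        bedrooms = int(payload["bedrooms"])
    except (KeyError, TypeError, ValueError):
        return jsonify(error="living_area and bedrooms must be numbers"), 400
    return jsonify(predicted_price=predict_price(living_area, bedrooms))
```

Assuming the code is saved as `app.py`, it can be served locally with `flask --app app run` and queried by sending a POST request with a JSON body such as `{"living_area": 100, "bedrooms": 2}`.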
Phase | Description | Deadline |
---|---|---|
Data Collection | Web scraping real estate data | October 10, 2024 |
Data Cleaning | Clean and preprocess the data | October 11, 2024 |
Model Development | Build machine learning models | October 15, 2024 |
Model Deployment | Deploy the model as a web tool | October 20, 2024 |
For questions or help, feel free to reach out:
- Lead Developer: [Korostelova Anastasiia]
- Email: [email protected]