The Boston housing market is highly competitive, and I want to be the best real estate agent in the area. To compete with my peers, I decide to leverage a few basic machine learning concepts to help my clients and me find the best selling price for their homes. Luckily, I've come across the Boston Housing dataset, which contains aggregated data on various features for houses in Greater Boston communities, including the median value of homes in each of those areas. My task is to build an optimal model based on a statistical analysis with the tools available. This model will then be used to estimate the best selling price for my clients' homes.
In this project, I will apply basic machine learning concepts to data collected on housing prices in the Boston, Massachusetts area to predict the selling price of a new home. I will first explore the data to obtain important features and descriptive statistics about the dataset. Next, I will split the data into training and testing subsets and determine a suitable performance metric for this problem. I will then analyze performance graphs for a learning algorithm with varying parameters and training set sizes. This will enable me to pick the model that generalizes best to unseen data. Finally, I will test this model on a new sample and compare the predicted selling price to the descriptive statistics computed earlier.
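The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it uses synthetic stand-in data rather than the Boston Housing dataset, and it assumes a decision tree regressor as the learning algorithm, `max_depth` as the varying parameter, and R^2 as the performance metric, none of which are fixed by the description above.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (NOT the Boston Housing dataset):
# one hypothetical feature and a noisy, roughly linear "price" target.
rng = np.random.default_rng(0)
X = rng.uniform(3.0, 9.0, size=(200, 1))
y = 50_000.0 * X[:, 0] + rng.normal(0.0, 20_000.0, size=200)

# Step 1: descriptive statistics of the target.
stats = {
    "min": float(y.min()),
    "max": float(y.max()),
    "mean": float(y.mean()),
    "median": float(np.median(y)),
    "std": float(y.std()),
}

# Step 2: shuffle and split into training (80%) and testing (20%) subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Step 3: fit models of varying complexity and score each one with R^2
# (assumed metric) on both subsets to see how well it generalizes.
scores = {}
for depth in (1, 3, 10):
    model = DecisionTreeRegressor(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    scores[depth] = (
        r2_score(y_train, model.predict(X_train)),
        r2_score(y_test, model.predict(X_test)),
    )

for depth, (train_r2, test_r2) in scores.items():
    print(f"max_depth={depth}: train R^2={train_r2:.3f}, test R^2={test_r2:.3f}")
```

Comparing the training and testing scores across depths is one way to spot a model that fits the training data well but generalizes poorly: a large gap between the two scores suggests overfitting.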
This project is designed to get us acquainted with working with datasets in Python and applying basic machine learning techniques using NumPy and Scikit-Learn. Before being expected to use many of the available algorithms in the sklearn library, it will be helpful to first practice analyzing and interpreting the performance of our model.
This project requires Python and the following Python libraries installed:
You will also need software installed to run a Jupyter Notebook.
If you do not have Python installed yet, it is highly recommended that you install the Anaconda distribution of Python, which already has the above packages and more included.
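To confirm that the environment is ready, a quick check like the one below may help. It is only a sketch: the `REQUIRED` mapping is a hypothetical list containing the two libraries named in this README (NumPy and Scikit-Learn), so extend it with whatever else your setup needs.

```python
import importlib.util

# Hypothetical list for illustration: import name -> display name.
# Only NumPy and scikit-learn are named in this README.
REQUIRED = {"numpy": "NumPy", "sklearn": "scikit-learn"}


def missing_packages(required):
    """Return the display names of packages that cannot be found."""
    return [
        label
        for module, label in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```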