Course project for DS-GA 1001: Introduction to Data Science
In this project we built a supervised learning model to discover the relationship between neighborhood features and high-potential areas in the New York real estate market. We used a linear SVM as the baseline model, then improved on it by comparing the precision of the highest-investment-potential class for Decision Tree and SVM models with different parameters. The results show that East Queens and the North Bronx have had high investment potential over the years on record. For more detail, please refer to the report.
The data come from 2 different sources
1. StreetEasy
Our primary source for house prices and inventory is StreetEasy, and the NYU Furman Center provided neighborhood-level information on housing markets, home affordability, land use, demographics, and neighborhood conditions. However, the former covers 130 neighborhoods, while the latter covers only 55. The challenge was to reconcile this difference in the number of neighborhoods.
We define 3 levels of investment potential, from low to high, by sorting the investment index and splitting it by percentile in a 3:4:3 ratio. A supervised multi-class classification algorithm can then be applied.
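The 3:4:3 percentile split can be sketched as follows. This is a minimal illustration rather than the project's actual code: the function name `label_by_percentile`, its parameters, and the sample index values are all hypothetical, and it assumes the bottom 30% of neighborhoods by investment index become class 1, the middle 40% class 2, and the top 30% class 3.

```python
def label_by_percentile(index_values, low_share=0.3, high_share=0.3):
    """Assign class 1 (low), 2 (medium), or 3 (high) by rank of the investment index.

    Assumption: the 3:4:3 split means the bottom 30% of values get class 1,
    the top 30% get class 3, and the remaining 40% get class 2.
    """
    n = len(index_values)
    # Positions of the values, ordered from the lowest index value to the highest.
    order = sorted(range(n), key=lambda i: index_values[i])
    low_count = round(n * low_share)
    high_count = round(n * high_share)

    labels = [2] * n  # default to the middle class
    for position, i in enumerate(order):
        if position < low_count:
            labels[i] = 1
        elif position >= n - high_count:
            labels[i] = 3
    return labels


# Hypothetical investment index values for 10 neighborhoods.
investment_index = [5.0, 1.0, 9.0, 3.0, 7.0, 2.0, 8.0, 4.0, 10.0, 6.0]
labels = label_by_percentile(investment_index)
print(labels)
```

The resulting labels can then serve as the target variable for a multi-class classifier.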
The Decision Tree gave the most promising result. The overall accuracy is not high, but given our business goal, we only care about the precision for real estate with high investment potential (class 3 in this case). From the confusion matrix we see that 23 samples are correctly identified as class 3, while only 3 class 2 samples are misclassified as class 3.
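To make the metric concrete, the sketch below computes the precision of one class from a confusion matrix. Only two cells come from the text above: the 23 correctly identified class 3 samples and the 3 class 2 samples predicted as class 3. Every other cell is a made-up placeholder, including the assumed 0 for class 1 samples predicted as class 3, so the resulting number is illustrative and not a reported result. The helper name `precision_for_class` is hypothetical.

```python
def precision_for_class(confusion, class_index):
    """Precision = correct predictions of the class / all predictions of the class.

    confusion[i][j] counts samples whose true class is i and predicted class is j.
    """
    predicted_as_class = sum(row[class_index] for row in confusion)
    if predicted_as_class == 0:
        return 0.0
    return confusion[class_index][class_index] / predicted_as_class


# Rows are true classes 1..3; columns are predicted classes 1..3.
# Only the 3 (class 2 predicted as class 3) and the 23 (class 3 correct)
# are taken from the text; all other cells are placeholders.
example_confusion = [
    [20, 8, 0],
    [9, 25, 3],
    [4, 10, 23],
]

class3_precision = precision_for_class(example_confusion, 2)
print(f"Class 3 precision: {class3_precision:.3f}")
```

Focusing on the class 3 column reflects the business goal: a neighborhood wrongly flagged as high potential is costlier than one that is missed.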
The map is based on Zip Code areas in NYC, and the color shows the level of investment potential. The scale ranges from 1 to 3. Deeper colors represent higher rental-to-sales ratios, which we treat as indicating higher investment potential in the area. All neighborhoods have been matched to their correct Zip Code areas.
- Investment potential has increased but edged down recently. In the existing data sample shown on the graph, the number of deeply filled areas in NYC grew from 2010 to 2014. In 2016, however, the number of deeply filled areas dropped, which means relative rental prices had an increasing trend.
- Areas outside the center have higher potential. Most deeply filled areas appear in boroughs other than Manhattan, which indicates that the outlying areas of NYC have higher rental-to-sales ratios than the center.
- Queens and the Bronx have the most deeply filled areas. In East Queens and the North Bronx, the ratios have remained high for several years on record.