Real-time census data has the potential to give urban policymakers timely insights, helping them track important urban issues such as population displacement and neighborhood change. Building on the 2015 paper "Studying user income through language, behavior and affect in social media" by Preotiuc-Pietro et al., this study shows how Twitter data can be used to predict a user's income level from the top 20 features selected by a random forest. We trained three models for prediction: a Gaussian Process, a Support Vector Machine, and a Random Forest. The best-performing model scored 0.42 on 10-class income level prediction and 0.88 on 3-class income level prediction. These results show that Twitter user income level can be predicted from relatively few features, and they offer a road map for policymakers who want to use Twitter data to generate real-time insights. [Keywords: Twitter, natural language processing, income prediction]
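The two preprocessing steps around the models, keeping the top 20 features and turning income into 10 or 3 classes, can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the study's code: it assumes importance scores have already been computed by a fitted random forest and are passed in as a dictionary, and it assumes income classes are equal-frequency (quantile) bins, since the actual class boundaries are not given here. The helper names `top_k_features` and `income_to_classes` are hypothetical.

```python
from typing import Dict, List, Sequence


def top_k_features(importances: Dict[str, float], k: int = 20) -> List[str]:
    """Return the k feature names with the largest importance scores.

    Assumption: `importances` comes from an already-fitted random forest
    (e.g. impurity-based importances). Ties are broken by feature name so
    the selection is deterministic.
    """
    ranked = sorted(importances.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


def income_to_classes(incomes: Sequence[float], n_classes: int) -> List[int]:
    """Map each income to a class label 0..n_classes-1.

    Assumption for illustration: classes are equal-frequency bins, so each
    class holds roughly len(incomes) / n_classes users, lowest income first.
    """
    if n_classes < 1:
        raise ValueError("n_classes must be >= 1")
    # Indices of users sorted from lowest to highest income.
    order = sorted(range(len(incomes)), key=lambda i: incomes[i])
    labels = [0] * len(incomes)
    for rank, idx in enumerate(order):
        # Integer scaling of the rank keeps every label below n_classes.
        labels[idx] = rank * n_classes // len(incomes)
    return labels


if __name__ == "__main__":
    scores = {"f%d" % i: float(i) for i in range(50)}
    print(top_k_features(scores, k=3))  # ['f49', 'f48', 'f47']
    incomes = [30_000, 90_000, 55_000, 20_000, 120_000, 70_000]
    print(income_to_classes(incomes, 3))  # [0, 2, 1, 0, 2, 1]
```

The selected feature names would then pick the input columns for each of the three classifiers, and the same income list can be labeled with `n_classes=10` or `n_classes=3` to set up the two prediction tasks.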