My work for the KPMG challenge on bank customer segmentation, based on its annual banking industry survey. The dataset has 40,000 rows x 150 columns.
After data cleanup, I created and selected some specific features of interest. Then I ran K-means to generate 7 clusters and used principal component analysis to run some visual checks.
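The clustering and projection steps can be illustrated with a minimal, dependency-free sketch. This is not the code used for the submission: the function names `kmeans` and `pca_project`, the synthetic 5-feature data, and all parameter values other than the 7 clusters are assumptions made for illustration. It runs plain Lloyd's K-means, then projects the points onto the top two principal components (found by power iteration) so the clusters could be plotted in 2D. On data of the real size (40,000 x 150), a numerical library would be a more practical choice than pure Python.

```python
import random
from collections import Counter

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:  # assignments stable, so the algorithm has converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def pca_project(points, n_components=2, n_iter=500, seed=0):
    """Project centered points onto the top principal components (power iteration)."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    means = [sum(col) / n for col in zip(*points)]
    centered = [[x - m for x, m in zip(p, means)] for p in points]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _component in range(n_components):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _step in range(n_iter):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        eigval = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: remove the component just found before searching for the next.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(x * c for x, c in zip(r, comp)) for comp in components]
            for r in centered]

# Synthetic stand-in for the cleaned feature matrix (assumption: 7 blobs, 5 features).
rng = random.Random(42)
centers = [[rng.uniform(-10, 10) for _ in range(5)] for _ in range(7)]
data = [[c + rng.gauss(0, 1) for c in center] for center in centers for _ in range(30)]

labels, centroids = kmeans(data, k=7)
coords = pca_project(data)  # 2D coordinates for a scatter plot colored by label
print(sorted(Counter(labels).items()))  # cluster sizes
```

Plotting `coords` colored by `labels` gives the kind of visual check described above: well-separated groups in the 2D projection suggest the clusters are meaningful, though overlap in two components does not by itself prove the clusters overlap in the full feature space.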
My methodology is explained in the "Updated_Submission.pdf" file. This file also explains the results obtained from the cluster analysis (the customer personas) and suggests ways the analysis could be improved.
The raw dataset and the encoding can be found in the "Data Science Bootcamp Data_2.0.xlsx" file.