## Allstate Purchase Prediction Challenge

- Python 2.7.5 with scikit-learn 0.14a1, NumPy 1.8, pandas 0.12
- Windows 8, Intel i5-3230M @ 2.60 GHz, 16 GB RAM
- Developed on an HP Envy 17 j100tx laptop

To run it, type "python majorityvote_modelselection.py" at a command prompt, or simply double-click the script on Windows. Keep an eye on memory usage: with the default settings the script should not exceed 8 GB, but this is not guaranteed.

With the default settings, the script fits the models and creates a submission that scores 0.53705 on the private LB. Combined with Breakfast Pirate's ABCDEF combination, this setting scored 0.53715 on the private LB and 0.54535 on the public LB. On the system described above it takes approximately 3 hours. If you're impatient, set N=10 and NS=7: it will score 0.53710 in just 30 minutes! If that is still too slow, try N=8, NS=6 and params=[(30,5,23)]: it runs even faster and scores 0.53705, the same as my best submission, though lower on the public LB. If it is still slow, get a better computer!
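
For reference, the faster settings might look like this. The sketch assumes that N, NS and params are plain module-level variables in majorityvote_modelselection.py. Judging from the steps below, N is the number of Random Forests fitted and NS is the number kept for the final vote. The meaning of each value in the params tuple is defined by the script and is not reproduced here.

```python
# Assumed to be module-level settings in majorityvote_modelselection.py.
# Faster run: about 30 minutes on the laptop above, 0.53710 on the private LB.
N = 10   # number of Random Forests fitted
NS = 7   # number of forests selected for the final majority vote

# Even faster alternative (uncomment to use); the meaning of the values in
# the params tuple is defined by the script itself.
# N = 8
# NS = 6
# params = [(30, 5, 23)]
```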

The script performs the following steps:
- Prepare the data (load the files, transform and clean them, and create the engineered features)
- Fit the Random Forests
- Predict product G
- Select the NS best Random Forests based on train set accuracy
- Do a majority vote over all N models and print the score on the cross-validation set
- Do a majority vote over the NS selected models and print the score on the cross-validation set
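
The model selection and voting steps above can be sketched in plain Python. This is a minimal illustration, not the script's own code. The helper names (`accuracy`, `select_models`, `majority_vote`) and the toy data are hypothetical. In the real script, the per-model predictions would come from the fitted scikit-learn Random Forests. Breaking ties in favour of the earliest model is also an assumption.

```python
from __future__ import print_function

from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of rows where the prediction equals the true value."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / float(len(y_true))


def select_models(train_preds, y_train, ns):
    """Indices of the ns models with the highest train set accuracy.

    sorted() is stable, so ties keep the original model order.
    """
    scores = [accuracy(y_train, p) for p in train_preds]
    ranked = sorted(range(len(train_preds)), key=lambda i: scores[i], reverse=True)
    return ranked[:ns]


def majority_vote(preds):
    """Row-by-row majority vote over several models' predictions.

    preds is a list of per-model prediction lists of equal length. On a tie,
    the value predicted by the earliest model in the list wins.
    """
    voted = []
    for row in zip(*preds):
        counts = Counter(row)
        best = max(counts.values())
        voted.append(next(v for v in row if counts[v] == best))
    return voted


# Toy predictions of product G from N=3 hypothetical models.
y_train = [1, 2, 3, 4]
train_preds = [[1, 2, 3, 4], [1, 2, 3, 1], [2, 2, 2, 2]]
selected = select_models(train_preds, y_train, ns=2)  # NS=2 -> [0, 1]

y_cv = [1, 3, 4]
cv_preds = [[1, 3, 2], [1, 3, 4], [4, 2, 4]]
vote_all = majority_vote(cv_preds)  # -> [1, 3, 4]
vote_selected = majority_vote([cv_preds[i] for i in selected])  # -> [1, 3, 2]
print("CV accuracy, all N models:", accuracy(y_cv, vote_all))
print("CV accuracy, NS selected models:", accuracy(y_cv, vote_selected))
```

Printing the cross-validation score for both votes lets you check whether selecting models by train set accuracy actually helps compared with voting over every fitted forest.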

Then, depending on the submit setting:

a. If submit is set to False, the script records the k-fold performance and continues the loop.

b. If submit is set to True, it exits the loop, predicts on the test set with a majority vote of the selected models, fixes the product according to the state rule and creates the submission file.
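
The "state rule" fix in step b overrides predictions that are not valid for a customer's state. The sketch below is a hedged illustration. It assumes the rule can be written as a per-state mapping from a disallowed product value to a replacement value. The function name `apply_state_rule`, the state codes and the mapping are hypothetical placeholders, and the script's actual rule is not reproduced here.

```python
def apply_state_rule(states, preds, rule):
    """Replace predictions that a customer's state does not allow.

    rule maps a state code to {disallowed value: replacement value};
    states without an entry, and allowed values, are left unchanged.
    """
    return [rule.get(state, {}).get(p, p) for state, p in zip(states, preds)]


# Purely illustrative placeholder rule: in state "XX", value 1 is not offered
# and is replaced by 2. This is NOT the rule used by the actual script.
HYPOTHETICAL_RULE = {"XX": {1: 2}}

states = ["XX", "YY", "XX"]
voted_g = [1, 1, 3]  # e.g. the majority-vote predictions for product G
fixed_g = apply_state_rule(states, voted_g, HYPOTHETICAL_RULE)
print(fixed_g)  # -> [2, 1, 3]
```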

Please refer to the LICENSE.txt file.