This code was written by Paul Duan ([email protected]) and Benjamin Solecki ([email protected]). It provides our winning solution to the Amazon Employee Access Challenge. Our two codebases have not yet been merged: Benjamin's code is in the BSMan/ folder and must be run separately.
[python] classifier.py [-h] [-d] [-i ITER] [-f OUTPUTFILE] [-g] [-m] [-n] [-s] [-v] [-w]
Parameters for the script.
optional arguments:
-h, --help show this help message and exit
-d, --diagnostics Compute diagnostics.
-i ITER, --iter ITER Number of iterations for averaging.
-f OUTPUTFILE, --outputfile OUTPUTFILE
Name of the file where predictions are saved.
-g, --grid-search Use grid search to find best parameters.
-m, --model-selection
Use model selection.
-n, --no-cache Do not use the cache.
-s, --stack Use stacking.
-v, --verbose Show computation steps.
-w, --fwls Use metafeatures.
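For illustration, the options listed above could be declared with argparse roughly as follows. This is only a sketch reconstructed from the help text, not the actual contents of classifier.py: the default values, the `dest` names, and the reading of `-n` as "disable the cache" are all assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (defaults are assumed)."""
    parser = argparse.ArgumentParser(
        prog="classifier.py", description="Parameters for the script.")
    parser.add_argument("-d", "--diagnostics", action="store_true",
                        help="Compute diagnostics.")
    # Assumed default of 1 iteration; the real default is not documented.
    parser.add_argument("-i", "--iter", type=int, default=1,
                        help="Number of iterations for averaging.")
    # Assumed default file name, for illustration only.
    parser.add_argument("-f", "--outputfile", default="predictions.csv",
                        help="Name of the file where predictions are saved.")
    parser.add_argument("-g", "--grid-search", action="store_true",
                        help="Use grid search to find best parameters.")
    parser.add_argument("-m", "--model-selection", action="store_true",
                        help="Use model selection.")
    # Assumption: passing -n turns caching off, so the flag stores False.
    parser.add_argument("-n", "--no-cache", action="store_false", dest="cache",
                        help="Do not use the cache.")
    parser.add_argument("-s", "--stack", action="store_true",
                        help="Use stacking.")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Show computation steps.")
    parser.add_argument("-w", "--fwls", action="store_true",
                        help="Use metafeatures.")
    return parser


# Short options accept an attached value, as in "-i0 -fsubmission.csv".
args = build_parser().parse_args(["-i0", "-fsubmission.csv"])
print(args.iter, args.outputfile, args.cache)
```

With this layout, short flags that take no value can also be combined, e.g. `-sv` for stacking with verbose output.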
To directly generate predictions on the test set without computing CV metrics, simply run:
python classifier.py -i0 -f[output_filename]
This script will launch Paul's model, which incorporates some of Benjamin's features. Benjamin's model is in the BSMan folder and can be run this way:
(in BSMan/)
[python] logistic.py log 75
[python] ensemble.py
The outputs of our models are then combined by standardizing each model's predictions and taking a weighted average, with weights of 2/3 for Paul's model and 1/3 for Benjamin's.
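The combination step might look like the sketch below. It assumes that "standardization" means rescaling each model's predictions to zero mean and unit variance, and that both prediction vectors are in the same row order; the function names and sample values are hypothetical and not taken from the repository.

```python
import numpy as np


def standardize(pred):
    """Rescale predictions to zero mean and unit variance (assumed meaning)."""
    pred = np.asarray(pred, dtype=float)
    return (pred - pred.mean()) / pred.std()


def blend(paul_pred, benjamin_pred, w_paul=2.0 / 3, w_benjamin=1.0 / 3):
    """Weighted average of the two standardized prediction vectors."""
    return (w_paul * standardize(paul_pred)
            + w_benjamin * standardize(benjamin_pred))


# Made-up predictions for three test rows, in the same order for both models.
paul = [0.10, 0.50, 0.90]
benjamin = [0.20, 0.40, 0.90]
blended = blend(paul, benjamin)
print(blended)
```

The blended values are scores rather than probabilities, so they are only meaningful for ranking the test rows.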
This code requires Python, numpy/scipy, and scikit-learn, plus pandas
for some of the external code (this dependency will be removed in the
future).
It has been tested under Mac OS X with Python 2.7.x,
scikit-learn 0.13, numpy 1.7, and pandas 0.11.
This content is released under the MIT Licence.