This repo provide the experimental code for the KDD paper: Towards Automated Neural Interaction Discovery for Click-Through Rate Prediction. The paper introduces an automated interaction architecture discovering framework for CTR prediction named AutoCTR with the help of neural architecture search techniques.
The repo has been tested with the python3.6 environment. Currently, only GPU running is supported.
Our general setting of the search space and model training are written with thrift files. If you made any changes, please install fb thrift for python first, codes are migrated from: https://github.com/facebook/fbthrift/tree/master/thrift/lib/py
cd py
python3.6 setup.py install
Add "--user" if required during installation
Running the following codes can help you generate the corresponding python files for the thrift files. If you modify the thrift files in /if
folder, please rerun the following codes to regenerate the python codes:
thrift -r --gen py if/config.thrift
Install all the python packages required in the repo.
python3.6 -m pip install -r requirements.txt
You should download and unzip the dataset (criteo/avazu/kdd2012) first, and preprocess the data with script scripts/preprocess.py
. For example, if you want to preprocess criteo dataset, you can use the shell-scripts/preprocess_criteo.sh
file. Please make sure the raw data file path and save data file path, and dataset name is correct before doing the preprocessing. Also, checkout the scripts/preprocess.py
file to see more arguments.
sh shell-scripts/preprocess_criteo.sh
You can follow, modify, and run the shell-scripts/subsample_criteo.sh
file to subsampe criteo data. (Avazu, and KDD2012 dataset are similar.)
sh shell-scripts/subsample_criteo.sh
You may need to change some arguments such as "--total-gpus" to the number of total gpus on your machine. The description of all the arguments can be found in utils/search_utils.py
.
sh shell-scripts/random_search_criteo.sh
You may need to change some arguments such as "--total-gpus" to the number of total gpus on your machine. The description of all the arguments can be found in utils/search_utils.py
.
sh shell-scripts/evo_search_criteo.sh
You can also run our proposed evolutionary search with first 20 arch warm-started by existing random search results if existed to save running time and keep fair comparison. The warm start file could be changed to other folders containing some search results. Also, you may need to change some arguments such as "--total-gpus" to the number of total gpus on your machine. The description of all the arguments can be found in utils/search_utils.py
.
sh shell-scripts/evo_search_criteo_warm_start.sh
You can use the jupyter notebooks in the notebook
folder to display the search results (plot.ipynb
), and the check the name and structure of the best architecutre (graph.ipynb
). We provide two search results as example in the results
folder.
To do the final fit of the best model on the full dataset, you can use the notebook/graph.ipynb
to find out the name and json file of the best architecture. Then change the --model-file
in the random_final_fit_criteo.sh
. Please also change the --data-file
to the dataset you wanna use. Also, change the gpu configurations such as "--total-gpus" to the number of total gpus on your machine.
Note: For other arguments of the search.py
and final_fit.py
scripts, please checkout utils/search_utils.py
for more information.
Qingquan Song, Dehua Cheng, Hanning Zhou, Jiyan Yang, Yuandong Tian, and Xia Hu. "Towards Automated Neural Interaction Discovery for Click-Through Rate Prediction." Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 2020. (Download)
Biblatex entry:
@inproceedings{song2020towards,
title={Towards automated neural interaction discovery for click-through rate prediction},
author={Song, Qingquan and Cheng, Dehua and Zhou, Hanning and Yang, Jiyan and Tian, Yuandong and Hu, Xia},
booktitle={Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
pages={945--955},
year={2020}
}
See the CONTRIBUTING file for how to help out.
AutoCTR is Creative Commons Attribution-NonCommercial 4.0 International licensed, as found in the LICENSE file.