Tabular data is still an unconquered area for deep learning: research such as that of Shwartz-Ziv and Armon shows that tree-based ensemble models still dominate prediction tasks on tabular data (Shwartz-Ziv & Armon, 2022). Deep learning models such as TabNet try to improve on this standard. Other studies, however, report widely varying results, with some finding XGBoost and others a deep learning algorithm to be the best-performing model. Moreover, in tabular employee retention prediction, Human Resources (HR) professionals usually have only a limited amount of data, a setting where there is room to improve predictive performance with deep learning models. Therefore, this study applies Random Forest, LightGBM, XGBoost and TabNet to three small, publicly available, imbalanced employee retention datasets and compares their weighted F1-scores (RQ1) (Davin & Wijaya, 2020; Möbius, 2021; Pavan & Subhash, 2017). Alongside these models, the study applies Explainable Artificial Intelligence (XAI) methods, namely Permutation Feature Importance (PFI, RQ2) and Partial Dependence Plots (PDP, RQ3), which further enhance the global and local interpretability of the models. The results suggest that no single model clearly dominates.
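The three quantities behind the research questions can be illustrated with a minimal pure-Python sketch. It is not the code from the Colab notebooks; the toy data, the feature names and the threshold "model" are hypothetical, and using weighted F1 as the score for permutation importance is an assumption made here for consistency with RQ1. `weighted_f1` averages the per-class F1-scores, weighting each class by its support (its number of true instances); `permutation_importance` measures how much that score drops when one feature column is shuffled; `partial_dependence` averages the model's output while one feature is fixed to each value of a grid.

```python
import random
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1-scores averaged with weights equal to each class's support
    (its count in y_true)."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_cls in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n_cls - tp
        denom = 2 * tp + fp + fn
        score += (2 * tp / denom if denom else 0.0) * n_cls / total
    return score


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in weighted F1 when one feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = weighted_f1(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace feature j with the shuffled values, keeping other features intact
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - weighted_f1(y, [predict(r) for r in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


def partial_dependence(predict, X, j, grid):
    """Average prediction when feature j is set to each grid value in every row."""
    return [sum(predict(row[:j] + [v] + row[j + 1:]) for row in X) / len(X)
            for v in grid]


# Hypothetical toy data: [overtime_hours, tenure_years] -> left (1) or stayed (0)
X = [[2, 5], [15, 1], [3, 8], [12, 2], [1, 3], [20, 1], [4, 6], [0, 9]]
y = [0, 1, 0, 1, 0, 1, 0, 0]


def overtime_model(row):
    """Hypothetical classifier: predicts leaving when overtime exceeds 10 hours."""
    return int(row[0] > 10)


print(weighted_f1(y, [overtime_model(r) for r in X]))  # 1.0
print(permutation_importance(overtime_model, X, y))  # tenure (index 1) gets 0.0
print(partial_dependence(overtime_model, X, 0, [5, 15]))  # [0.0, 1.0]
```

Because the toy model ignores tenure, shuffling that column never changes a prediction, so its importance is exactly zero. With real data, scikit-learn offers `f1_score(..., average="weighted")`, `sklearn.inspection.permutation_importance` and `sklearn.inspection.partial_dependence` for the same computations.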
- Google Colab files:
  - EDA
  - Main analysis workflow
- Excel tables:
  - RQ2 tables
The full text of the thesis will be published openly on the website of Tilburg University Library at a later date, at this link: "Coming soon..."
Feel free to run the Google Colab files. If you use this work, please cite it as follows:
@online{gonczy2022tabular,
author = {Balázs Gönczy},
title = {{TABULAR EMPLOYEE RETENTION PREDICTION}},
year = {2022},
url = {https://github.com/balazsgonczy/tiu_dss_msc_thesis},
}
Links to the code sources used in the Colab files are given above the relevant code cells, where applicable.
If you are interested in future cooperation, I am open to economics-related data science projects!