Code repository of paper: 13th International Symposium on Applied Computational Intelligence and Informatics (SACI), IEEE pp. 143-147, 2019.
- Balabit - Mouse Dynamics Challenge (10 users)
- Chaoshen - Chao Shen's data set (28 users)
- DFL - Our DFL data set (21 users)
Raw data were segmented into mouse actions, and 39 features were extracted from each action. For details see
Performances are reported using ROC Area Under Curve (AUC).
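As a reminder of what the reported metric measures, the following minimal sketch computes ROC AUC as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The function name `roc_auc` is hypothetical and is not part of this repository; it is a pure-Python illustration, not the evaluation code used for the paper.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 1 (genuine user) or 0 (other user)
    scores: classifier scores, higher means "more likely genuine"
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

With scikit-learn installed, `sklearn.metrics.roc_auc_score(labels, scores)` computes the same quantity far more efficiently.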
A binary classifier (random forest, 100 trees) was trained for each user using positive and negative data. For the positive data, the chronologically first 2/3 was used for training and the remaining 1/3 for evaluation. Negative data were selected from the other users so that the number of negative samples equals the number of positive samples.
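The data split described above can be sketched as follows. This is an illustration under stated assumptions, not the repository's implementation: the helper names (`chronological_split`, `sample_negatives`, `build_training_set`) are hypothetical, each user's samples are assumed to be stored in chronological order, and negatives are assumed to be drawn uniformly at random from all other users.

```python
import random


def chronological_split(samples):
    """Split time-ordered samples: first 2/3 for training, last 1/3 for evaluation."""
    cut = 2 * len(samples) // 3
    return samples[:cut], samples[cut:]


def sample_negatives(samples_by_user, target_user, count, seed=0):
    """Draw `count` samples at random from every user except `target_user`.

    Assumption: uniform sampling without replacement over the pooled data.
    """
    pool = [
        sample
        for user, rows in samples_by_user.items()
        if user != target_user
        for sample in rows
    ]
    return random.Random(seed).sample(pool, count)


def build_training_set(samples_by_user, target_user, seed=0):
    """Return (X, y) with equally many positive (1) and negative (0) rows."""
    train_pos, _ = chronological_split(samples_by_user[target_user])
    train_neg = sample_negatives(samples_by_user, target_user, len(train_pos), seed)
    X = train_pos + train_neg
    y = [1] * len(train_pos) + [0] * len(train_neg)
    return X, y
```

The resulting `X` and `y` could then be passed to scikit-learn, for example `RandomForestClassifier(n_estimators=100).fit(X, y)`, with one such model trained per user.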
- First Scenario: user identity predictions using a single mouse action
- Second Scenario: user identity predictions using a sequence of consecutive actions (sliding window with 90% overlap between consecutive windows).
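The second scenario can be illustrated with a short sketch that combines per-action classifier scores over a sliding window. How the scores within a window are combined is not stated here, so averaging is an assumption; the function name `window_scores` and its parameters are likewise hypothetical and not taken from the repository.

```python
def window_scores(action_scores, num_actions, overlap=0.9):
    """Average per-action scores over sliding windows of `num_actions` actions.

    overlap: fraction of actions shared by two consecutive windows
             (0.9 means each window starts 10% of a window later).
    Assumption: the window score is the mean of its per-action scores.
    """
    if num_actions < 1:
        raise ValueError("num_actions must be at least 1")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    # The step between window starts is at least one action.
    step = max(1, round(num_actions * (1 - overlap)))
    last_start = len(action_scores) - num_actions
    return [
        sum(action_scores[i:i + num_actions]) / num_actions
        for i in range(0, last_start + 1, step)
    ]


if __name__ == "__main__":
    scores = [0.0, 1.0, 1.0, 0.0]
    print(window_scores(scores, num_actions=2, overlap=0.5))
```

With 10 actions per window and 90% overlap, consecutive windows are shifted by a single action, so a sequence of 12 actions yields 3 window-level scores.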
Software: Python 3, scikit-learn 0.19.1
Example: Evaluate the Balabit data set using the first 500 actions/user and 10 actions for user identity predictions.
- Please set the following in util/settings.py:
  - CURRENT_DATASET = DATASET.BALABIT
  - DATASET_USAGE = DATASET_AMOUNT.FIRST1000
  - NUM_TRAINING_SAMPLES = 500
  - NUM_ACTIONS = 10
- Run the evaluation:
  - python main.py