This is the official implementation of the following paper:
DGA-GNN: Dynamic Grouping Aggregation GNN for Fraud Detection
Mingjiang Duan, Tongya Zheng, Yang Gao, Gang Wang, Zunlei Feng, Xinyu Wang
AAAI 2024 Main Track
Fraud detection has increasingly become a prominent research field due to the dramatically increased incidents of fraud. The complex connections involving thousands, or even millions of nodes, present challenges for fraud detection tasks. Many researchers have developed various graph-based methods to detect fraud from these intricate graphs. However, those methods neglect two distinct characteristics of the fraud graph: the non-additivity of certain attributes and the distinguishability of grouped messages from neighbor nodes. This paper introduces the Dynamic Grouping Aggregation Graph neural network (DGA-GNN) for fraud detection, which addresses these two characteristics by dynamically grouping attribute value ranges and neighbor nodes. In DGA-GNN, we initially propose the decision tree binning encoding to transform non-additive node attributes into bin vectors. This approach aligns well with the GNN’s aggregation operation and avoids nonsensical feature generation. Furthermore, we devise a feedback dynamic grouping strategy to classify graph nodes into two distinct groups and then employ a hierarchical aggregation. This method extracts more discriminative features for fraud detection tasks. Extensive experiments on five datasets suggest that our proposed method achieves a 3%~16% improvement over existing SOTA methods.
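To illustrate why bin vectors fit a GNN's sum aggregation better than raw non-additive attributes, here is a minimal standard-library sketch. It is not the code from this repository. The `thresholds` list, the age-like attribute, and the helper names `bin_encode` and `sum_aggregate` are assumptions made for illustration. In the paper the bin boundaries come from decision tree binning, whereas this sketch simply takes them as given.

```python
from bisect import bisect_right


def bin_encode(value, thresholds):
    """Return a one-hot vector marking the interval of `thresholds` that contains `value`."""
    vec = [0] * (len(thresholds) + 1)
    vec[bisect_right(thresholds, value)] = 1
    return vec


def sum_aggregate(vectors):
    """Element-wise sum, standing in for a GNN's sum aggregation over neighbors."""
    return [sum(column) for column in zip(*vectors)]


# Hypothetical bin boundaries for an age-like attribute (assumed, not learned here).
thresholds = [25, 40, 60]
neighbor_values = [19, 22, 71]

# Summing raw values yields a number with no meaning as an "age".
raw_sum = sum(neighbor_values)

# Summing bin vectors yields a per-bin neighbor count, which stays interpretable.
bin_counts = sum_aggregate([bin_encode(v, thresholds) for v in neighbor_values])
print(raw_sum, bin_counts)
```

With these inputs the aggregated bin vector counts two neighbors in the lowest bin and one in the highest, while the raw sum mixes them into a single value.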
- Download the dataset from this link.
- Place the downloaded fraud_graph_rawdata.7z file in the data directory.
- Decompress the dataset by executing:
7z x fraud_graph_rawdata.7z
- Change the directory to the code folder:
cd code
- Start the data preprocessing by running:
python data_handle.py
- For the Elliptic dataset, run:
python train.py --config-name elliptic_of_amnet
- For the T-Finance dataset, run:
python train.py --config-name tfinancet
- For the T-Social dataset, run:
python train.py --config-name tsocialt
- For the YelpChi dataset, run:
python train.py --config-name yelpchit
- For the Amazon dataset, run:
python train.py --config-name amazont
To log runs with wandb (Weights & Biases), set nowandb=False in the config.
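The five training runs can also be launched in sequence from a small driver script. The sketch below is not part of the repository. The helper names `build_command` and `run_all` are hypothetical, and the default `cwd="code"` assumes the script is started from the repository root. Passing `nowandb=False` as a command-line override assumes that train.py is a Hydra entry point and that `nowandb` is a top-level config key. If either assumption does not hold, edit the config file instead.

```python
import subprocess
import sys

# Config names as listed in the run instructions above.
CONFIG_NAMES = {
    "Elliptic": "elliptic_of_amnet",
    "T-Finance": "tfinancet",
    "T-Social": "tsocialt",
    "YelpChi": "yelpchit",
    "Amazon": "amazont",
}


def build_command(config_name, overrides=()):
    """Build the argument list for one training run, with optional key=value overrides."""
    return [sys.executable, "train.py", "--config-name", config_name, *overrides]


def run_all(overrides=(), cwd="code"):
    """Train on every dataset in turn, stopping at the first failing run."""
    for dataset, config_name in CONFIG_NAMES.items():
        print(f"Training on {dataset} ...")
        subprocess.run(build_command(config_name, overrides), cwd=cwd, check=True)
```

Calling `run_all()` runs all five configs with their defaults, and `run_all(["nowandb=False"])` would additionally request wandb logging under the assumptions stated above.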
- torch==1.13.1
- dgl==1.1.2
- toad==0.1.1
- pandas==1.3.5
- numpy==1.21.5
- scikit-learn==1.0.2
- pytorch-lightning==1.9.4
- wandb==0.13.10
- hydra-core==1.3.2
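Before training, it may help to confirm that the installed packages match the pinned versions above. The following is a minimal sketch using only the standard library. The helper names `installed_version` and `find_mismatches` are hypothetical. It assumes Python 3.8 or newer (for `importlib.metadata`), and it ignores local build suffixes such as `+cu117` that some PyTorch builds append to their version string.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the requirements list above.
PINS = {
    "torch": "1.13.1",
    "dgl": "1.1.2",
    "toad": "0.1.1",
    "pandas": "1.3.5",
    "numpy": "1.21.5",
    "scikit-learn": "1.0.2",
    "pytorch-lightning": "1.9.4",
    "wandb": "0.13.10",
    "hydra-core": "1.3.2",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each package whose installed version differs to (expected, found)."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        # Drop a local build suffix such as "+cu117" before comparing.
        base = found.split("+")[0] if found is not None else None
        if base != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

`find_mismatches(PINS)` returns an empty dictionary when the environment matches the list. Otherwise each entry shows the expected version and what was found, with `None` for a missing package.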
If you use this package and find it useful, please cite our paper using the following BibTeX. Thanks! :)
@inproceedings{duan2024dgagnn,
title={DGA-GNN: Dynamic Grouping Aggregation GNN for Fraud Detection},
author={Duan, Mingjiang and Zheng, Tongya and Gao, Yang and Wang, Gang and Feng, Zunlei and Wang, Xinyu},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2024}
}