DRealHammer/HClustering


HClustering

This project is a code base for creating high-quality graph clusterings with the help of machine learning. The goal is to use the local structure around an edge in a graph to compute a score for how likely the edge is to lie inside a cluster. Graphs are assumed to be in the METIS graph format.
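For readers unfamiliar with the METIS graph format, the following is a minimal sketch of a reader for the unweighted case. It assumes the usual layout: a header line with the number of vertices and edges, then one line per vertex listing its 1-based neighbours, with lines starting with % treated as comments. The function name parse_metis is hypothetical and is not part of this project.

```python
def parse_metis(text):
    """Parse an unweighted METIS graph given as a string.

    Returns (n, m, adjacency), where adjacency[v] lists the 0-based
    neighbours of vertex v. Illustrative sketch only.
    """
    # Drop comment lines; keep blank lines, since a vertex without
    # neighbours is represented by an empty line.
    lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith("%")]
    header = lines[0].split()
    n, m = int(header[0]), int(header[1])
    # An optional third header field signals vertex/edge weights.
    if len(header) > 2 and int(header[2]) != 0:
        raise ValueError("weighted graphs are not handled in this sketch")
    # Convert the 1-based neighbour ids of each vertex line to 0-based ids.
    adjacency = [[int(tok) - 1 for tok in ln.split()] for ln in lines[1:n + 1]]
    # Pad in case trailing isolated vertices have no line at all.
    adjacency += [[] for _ in range(n - len(adjacency))]
    # Every undirected edge must appear once in each endpoint's list.
    if sum(len(nbrs) for nbrs in adjacency) != 2 * m:
        raise ValueError("edge count does not match the header")
    return n, m, adjacency


# A triangle on three vertices.
triangle = "% example graph\n3 3\n2 3\n1 3\n1 2\n"
n, m, adjacency = parse_metis(triangle)
```

In this example each of the three vertices lists the other two as neighbours, so the three undirected edges appear six times in total across the adjacency lines.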

Installation

Downloading

The project depends on several libraries that are included as git submodules. To fetch them automatically while cloning, pass --recurse-submodules:

git clone https://github.com/DRealHammer/HClustering.git --recurse-submodules

Compilation

First, create a build directory:

mkdir build

Then configure CMake and build the project:

cmake -S . -B build
cmake --build build

The compiled executables are now in the build directory (ml_create_data, ml_train, ml_cluster).

Usage

The features to use must be defined beforehand in a file. The file features in the repository is an example.

Preprocessing

The data of a graph (features, and labels if available) must be extracted first:

./build/ml_create_data data/3-cluster.metis --features features --communities data/3-cluster.cluster

This command specifies the graph and the features to use. The --communities option is only needed for graphs with a known clustering; the training labels are generated from it. If no clustering is known, or it should not be used, omit this option.
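One plausible way to derive edge labels from a known clustering is sketched below. It assumes that an edge is labelled 1 when both endpoints belong to the same community and 0 otherwise, which matches the stated goal of scoring how likely an edge is to lie inside a cluster. The exact rule used by ml_create_data and the layout of the .cluster file are not documented here, so the function name edge_labels and the in-memory inputs are hypothetical.

```python
def edge_labels(edges, community):
    """Label each edge 1 if both endpoints share a community, else 0.

    edges: list of (u, v) pairs with 0-based vertex ids.
    community: list where community[v] is the community id of vertex v.
    Assumed labelling rule; the tool itself may differ.
    """
    return [1 if community[u] == community[v] else 0 for u, v in edges]


# A path 0-1-2-3 split into the communities {0, 1} and {2, 3}.
edges = [(0, 1), (1, 2), (2, 3)]
community = [0, 0, 1, 1]
labels = edge_labels(edges, community)
```

Here the middle edge (1, 2) crosses the two communities and is the only one labelled 0.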

Training

A model needs to be fitted before clustering.

./build/ml_train data/3-cluster.metis-data --model booster.json --iterations 10

This uses the edge data file created in the preprocessing step. The --model option sets the filename of the model (always a .json file). The --iterations option is the number of trees created to fit the model (see the XGBoost documentation for details).

Clustering

The model can now score an edge given its features. First create the folder for the clusterings:

mkdir clustering

Then create a data file containing the edge features for the target graph, as in the preprocessing step (the --communities option is not needed).

Now use the fitted model:

./build/ml_cluster data/3-cluster.metis --data data/3-cluster.metis-data --model booster.json

This command creates the clusterings. For every distinct score that appears among the edges, one clustering is written to the folder clustering. A clustering is named threshold-clustering-X, where X is a float. Its clusters are obtained by contracting all edges with a score greater than or equal to X.
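The thresholding rule described above can be illustrated with a small union-find sketch. This is not the implementation used by ml_cluster; the function names and the in-memory edge list are hypothetical, and the sketch only shows how contracting all edges with a score of at least X yields one clustering per distinct score.

```python
def find(parent, v):
    """Return the representative of v, shortening the path on the way."""
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v


def contract(n, scored_edges, threshold):
    """Merge the endpoints of every edge with score >= threshold.

    Returns a list of cluster ids per vertex, numbered by first appearance.
    """
    parent = list(range(n))
    for u, v, score in scored_edges:
        if score >= threshold:
            parent[find(parent, u)] = find(parent, v)
    ids = {}
    return [ids.setdefault(find(parent, v), len(ids)) for v in range(n)]


def threshold_clusterings(n, scored_edges):
    """Build one clustering for each distinct edge score."""
    thresholds = sorted({score for _, _, score in scored_edges})
    return {x: contract(n, scored_edges, x) for x in thresholds}


# A path 0-1-2-3 whose middle edge has a low score.
scored_edges = [(0, 1, 0.9), (1, 2, 0.2), (2, 3, 0.8)]
clusterings = threshold_clusterings(4, scored_edges)
```

With these scores, the threshold 0.2 merges all four vertices into one cluster, 0.8 gives the two clusters {0, 1} and {2, 3}, and 0.9 leaves only vertices 0 and 1 merged.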

Licence

The program is licensed under the MIT licence. If you publish results using our algorithms, please acknowledge our work by linking to this page.

Acknowledgements

  • Code base and graph data structure: KaHIP
  • Machine learning library: XGBoost
  • Graphlet extraction: PGD. Nesreen K. Ahmed, Jennifer Neville, Ryan A. Rossi, Nick Duffield, Efficient Graphlet Counting for Large Networks, IEEE International Conference on Data Mining (ICDM), pages 10, 2015.
  • Conversion between different graph formats: https://github.com/guowentian/SubgraphMatchGPU
