Palette diagram

A visualization tool for graph clustering

Description

Palette diagrams allow you to assess the group assignment distributions obtained as results of graph clustering algorithms.

Prerequisites

numpy 1.14.5
matplotlib 2.2.2
scikit-learn 0.19.1
seaborn 0.8.1
easydict 1.9
python-igraph 0.7.1

Usage

This code consists of two steps:

Inference of group assignments
Construction of the palette diagrams

To draw a palette diagram, you first perform graph clustering to obtain the group assignment distributions. The MCMC algorithm [1] (which we implemented in python) can be replaced by another clustering algorithm.

1. Inference of group assignments (MCMC)

Input
An edgelist file in the GML format (0-based indexing). Its location must be specified by edgelist_filename.
The detail of parameters for MCMC
- K - Initial number of groups, which also indicates maximum number of groups.
- relaxation_time - Until MCMC sweep count reaches this number, MCMC sampling is not performed.
- sample_interval - Interval MCMC sampling is performed.
- n_samples - Number of times MCMC sampling is performed.
- gml_filename - Relative path to the GML file and the filename.
Output
A text file in which the group assingment distributions are written. The first raw represents the number of vertices in the given network. The rest of the raws represent n_samples group assingment distributions, so that the number of raws of the outputs file is M = n_samples * N + 1, where N is the number of vertices. The output file is saved as ./newman2016/outputs/grp_assign.txt.

The mcmc algorithm here is for a demonstration purpose. We recommend you to replace it with the original C++ implementation [1] when you analyze large datasets.

2. Palette diagrams

This step consists of two substeps.

2.1 kmeans & t-SNE

The objective of this step is to determine the number of groups and the color assignments of those groups. To this end, a clustering algorithm (kmeans) and a manifold learning algorithm (t-SNE) are performed. This step is to be repeated manually until an appropriate parameters is found.

Input
A text file written in the same format as the output file of MCMC. Its location must be specified by grp_assing_filename.
The detail of parameters
- perplexity - Parameter for t-SNE. This is related to the variance of probability distribution of the data. Higher variance and larger data usually require a larger perplexity. The detail of perplexity is explained at here or the original paper of t-SNE [2].
- n_clusters - Number of clusters identified by the kmeans.
- divergenve_option - Option to determine the similarity measure to calculate the similarity matrix.
  - 1 - Total variation distance
  - 2 - L2 norm
  - 3 - KL divergence
  - 4 - JS divergence
  - 5 - Pearson Chi-square distance
  - 6 - Neyman Chi square
  - 7 - Rukhin distance
  - 8 - Triangular discrimination
  - 9 - Squared Hellinger distance
  - 10 - Matsusita distance
  - 11 - Piuri and Vinche divergence
  - 12 - Arimoto distance
- is_savefig - Save figure or not.
- save_path - Relative path where the output figure is to be saved.
- fig_filename - Filename of the output figure.
- grp_assing_filename - Relative path to the input file and the filename (This parameter is needed only when this second step is performed independently).
Output
A scatter plot in two-dimensional space, shown in the jupyter notebook, and cluster labels given by the kmeans, stored in the python local variable labels.

2.2 Palette diagram

This step draws the one-dimensional and two-dimensional palette diagrams.

Input
The cluster labels given by the kmeans, which is output of previous cell.
The detail of parameters
- n_neighbors - Number of neighbors to calculate an approximate geodesic distance for the Isomap.
- figsize - Size of the output figure.
- fig_filename - Filename of the figure to be save.
Output
The palette diagram is saved in save_path as a PDF.

Reference

[1] M. E. J. Newman and G. Reinert, Phys. Rev. Lett. 117, 078301 (2016). The original code (implemented in C++) is available at here.
[2] L. J. P. van der Maaten and G.E. Hinton, Journal of Machine Learning Research 9:2579-2605 (2008).

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
datasets		datasets
figures		figures
newman2016		newman2016
palette_diagram		palette_diagram
.gitignore		.gitignore
README.md		README.md
palette_diagram.ipynb		palette_diagram.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Palette diagram

Description

Prerequisites

Usage

1. Inference of group assignments (MCMC)

2. Palette diagrams

2.1 kmeans & t-SNE

2.2 Palette diagram

Reference

About

Releases

Packages

Languages

chnoguchi/palette_diagram_mcmc

Folders and files

Latest commit

History

Repository files navigation

Palette diagram

Description

Prerequisites

Usage

1. Inference of group assignments (MCMC)

2. Palette diagrams

2.1 kmeans & t-SNE

2.2 Palette diagram

Reference

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages