Author: Joshua D. Loyal
This package provides an interface for learning and inference in latent space models for dynamic networks. Inference is performed using blocked Metropolis-Hastings within Gibbs sampling.
The primary method implemented in this package is the hierarchical Dirichlet process latent position cluster model (HDP-LPCM) described in "A Bayesian nonparametric latent space approach to modeling evolving communities in dynamic networks".
DynetLSM requires:
- Python (>= 3.7)
and the requirements highlighted in requirements.txt.
You need a working installation of numpy and scipy to install DynetLSM. If you have a working installation of numpy and scipy, the easiest way to install DynetLSM is using pip
:
pip install -U dynetlsm
If you prefer, you can clone the repository and run the setup.py file. Use the following commands to get the copy from GitHub and install all the dependencies:
git clone https://github.com/joshloyal/dynetlsm.git
cd dynetlsm
pip install .
Or install using pip and GitHub:
pip install -U git+https://github.com/joshloyal/dynetlsm.git
Latent space models (LSMs) are a powerful approach to modeling network data. One is often interested in inferring properties of nodes in a network based on their connectivity patterns. Originally proposed by Hoff et al. (2002)[1], LSMs learn a latent embedding for each node that captures the similarity between them. This package focuses on embeddings within a Euclidean space so that the log-odds of forming an edge between two nodes is inversely proportional to the distance between their latent positions. In other words, nodes that are close together in the latent space are more likely to form a connection in the observed network. The generative model is as follows:
- For each node, sample a node's latent position from a Gaussian distribution:
- For each edge, sample a connection from a Bernoulli distribution:
For dynamic (time-varying) networks, one is also interested in determining how properties of the nodes change over time. LSMs can also accomplish this task. Sarkar and Moore (2005)[2] and Sewell and Chen (2015)[3] proposed propagating the latent positions through time with a Gaussian random-walk Markovian process. Based on these latent positions, the edges in the network form in the same way as the static case. The generative process is as follows:
- For
t = 1
, sample a node's initial latent position from a Gaussian distribution:
- For
t = 2, ..., T
, a node's latent position follows a Gaussian random walk:
- For each edge, sample a connection from a Bernoulli distribution:
Determining the high-level community structure of a network is another important task in network analysis. Community structure was incorporated into LSMs by Handcock et al. (2007)[4] with their latent position cluster model (LPCM). Intuitively, the LPCM posits that communities are the result of clustering within the latent space. This clustering is incorporated in the LSM framework by assuming the latent positions are drawn from a Gaussian mixture model, i.e,
The LPCM relates the latent positions to the probability of forming an edge in the same way as the original LSM. In practice, one interprets nodes that share the same mixture component as belonging to the same community.
Inferring a network's community structure is especially difficult for dynamic networks because the number of communities may change over time. If one assumes that the number of communities is fixed, then the model of Sewell and Chen (2017)[5] is able to infer a dynamic network's community structure by propagating each node's mixture assignment through time with a autoregressive hidden Markov model (AR-HMM). However, the assumption of a static number of communities is at odds with many real-world dynamic networks. It is often the case that the number of communities evolves over time.
To solve the problem of inferring evolving community structures in dynamic networks, Loyal and Chen (2020)[6] proposed using a sticky hierarchical Dirichlet process hidden Markov model (HDP-HMM) with time-inhomogeneous transition probabilities in conjunction with the LPCM . For this reason, the model is called the hierarchical Dirichlet process latent position cluster model (HDP-LPCM). Under the HDP-LPCM, a node's latent community label propagate through time according to iid HDP-HMMs. Unlike previous models, this allows the HDP-LPCM to create and delete communities over-time as well as infer the number of the communities from the data. The generative model is as follows:
- Draw the time-varying transition probabilities from a sticky-HDP:
- For
t = 1, ..., T
, propagate a node's latent community label through time according to an HMM:
- For
t = 1
, sample a node's initial latent position from its assigned Gaussian mixture component:
- For
t = 2, ..., T
, sample a node's latent position as a mixture between its previous position and its assigned Gaussian mixture component:
- For each edge, sample a connection from a Bernoulli distribution :
DynetLSM exposes two classes for working with latent space models for dynamic networks:
DynamicNetworkLSM
: Interface for learning the LSM in Sewell and Chen (2015)[3],DynamicNetworkHDPLPCM
: Interface for learning the HDP-LPCM in Loyal and Chen (2020)[6].
To understand the merits of both approaches, we provide an example using a synthetic dynamic network which contains two communities at t = 1
and four communities at t = 2
. We can generate the data as follows:
from dynetlsm.datasets import simple_splitting_dynamic_network
# Y : ndarray, shape (2, 50, 50)
# The adjacency matrices at each time point
# labels : ndarray, shape (2, 50)
# The true community labels of the nodes at each time point.
Y, labels = simple_splitting_dynamic_network(n_nodes=50, n_time_steps=2)
To fit a dynamic LSM with a 2-dimensional latent space, we initialize the sampler and call fit
:
from dynetlsm import DynamicNetworkLSM
lsm = DynamicNetworkLSM(n_iter=5000, burn=2500, tune=2500,
n_features=2, random_state=42)
lsm.fit(Y)
To assess the convergence of the algorithm, we visualize the traces:
from dynetlsm.plots import plot_traces
plot_traces(lsm)
We can then visualize the latent space embeddings:
import matplotlib.pyplot as plt
from dynetlsm.plots import plot_latent_space
axes = plt.subplots(ncols=2, nrows=1, figsize=(10, 6))
for t, ax in enumerate(axes.flat):
plot_latent_space(lsm, t=t, connectionstyle=None, number_nodes=False,
linewidth=0.1, node_size=200, border=0.2, ax=ax)
Although the LSM's embedding places nodes that share many connections close together, the true community structure of the network is not apparent. This is easily remedied by applying the HDP-LPCM. As before, we initialize the model and call fit
:
from dynetlsm import DynamicNetworkHDPLPCM
lpcm = DynamicNetworkHDPLPCM(n_iter=5000, burn=2500, tune=2500,
n_features=2, n_components=10, random_state=42)
lpcm.fit(Y)
Once again, we assess the convergence of the algorithm by visualizing the traces:
from dynetlsm.plots import plot_traces
plot_traces(lpcm)
We can then visualize the latent space embeddings as well as the components of the inferred Gaussian mixture:
import matplotlib.pyplot as plt
from dynetlsm.plots import plot_latent_space
fig, axes = plt.subplots(ncols=2, nrows=1, figsize=(10, 6))
for t, ax in enumerate(axes.flat):
plot_latent_space(lpcm, t=t, connectionstyle=None,
number_nodes=False, border=1.2, linewidth=0.2,
center_size=100, node_size=100, ax=ax)
from dynetlsm.plots import alluvial_plot
alluvial_plot(lpcm.z_)
From this diagram, one can see that group 1 primarily splits off into group 3, while group 2 primarily splits off into group 4.
This package includes the simulation studies and real-data applications found in Loyal and Chen (2021):
- A synthetic dynamic network with a time-homogeneous community structure (HDP-LPCM + LPCM): (here).
- A synthetic dynamic network with a time-homogeneous community structure (SBM): (here).
- A synthetic dynamic network with a time-homogeneous community structure (DynSBM): (here).
- A synthetic dynamic network with a time-inhomogeneous community structure (HDP-LPCM): (here).
- A synthetic dynamic network with a time-inhomogeneous community structure (SBM): (here).
- Quantify the HDP-LPCM's sensitivity to group seperation and stability: (here).
- A synthetic dynamic network with a slowly merging group structure: (here)
- Sampson's monastery network: (here).
- A dynamic network constructed from international military alliances during the first three decades of the Cold War (1950 - 1979): (here).
- A dynamic network constructed from character interactions in the first four seasons of the Game of Thrones television series: (here).
We also provide a few jupyter notebooks that demonstrate the use of this package.
[1]: Hoff, P. D., Raftery, A. E., and Handcock, M. S. (2002). Latent space approaches to social network analysis. Journal of the American Statistical Association, 97(460):1090-1098.
[2]: Sarkar, P. and Moore, A. W. (2006). Dynamic social network analysis using latent space models. pages 1145-1152.
[3]: Sewell, D. K. and Chen, Y. (2015). Latent space models for dynamic networks. Journal of the American Statistical Association, 110(512):1646-1657.
[4]: Handcock, M. S., Raftery, A. E., and Tantrum, J. M. (2007). Model-based clustering of social networks. Journal of the Royal Statistical Society A, 170(2):301-354.
[5]: Sewell, D. K. and Chen, Y. (2017). Latent space approaches to community detection in dynamic networks. Bayesian Analysis, 12(2):351-377.