spa networks

SPA stands for Structural Preferential Attachment


SPA in a nutshell

SPA is a discrete growth process for networks featuring scale-free and overlapping community structure.

It models two phenomena simultaneously: the growth of community structure and the growth of the internal structure of communities.

In the growth of community structure, a vertex joins a community by creating an edge with an existing member.

The vertex (community) is a new one with probability p (q), and otherwise chosen within the existing set of vertices (communities) preferentially.

In so doing, SPA generates a network whose building blocks are overlapping groups of scale-free distributed communities.

The internal structure of communities is determined by an intuitive densification process.

A link is added between two vertices members of a same community, at a rate which is controlled by a third parameter, r.

Higher values leads to denser (more connected) communities

.

Structural Properties

Our simple process gives rise to a broad spectrum of synthetic networks.

The structural properties below only start to illustrate the richness of the SPA process.

< 1> 100

Average degree

The average number of edges, and thus of neighbours, per vertex.

Further Reading
0> 0.8

Average clustering coefficient

Average probability that two neighbours of a given vertex are themselves connected.

Further Reading
< -0.2> 0.8

Degree assortativity coefficient

Correlation coefficient between the degree of a given vertex and its neighbours' degrees.

Further Reading
1> 50

Maximal coreness

Maximum average degree within the most interconnected subset of the network.

Further Reading
< 5> 1000

Largest community

The maximum number of vertices that are found within a community.

Further Reading
< 5> 400

Highest memberships

The maximum number of communities in which a vertex appears.

Further Reading
< 0.2> 0.8

Partition density

The average fraction of possible edges that actually appear within communities.

Further Reading
0> 0.005

Average overlap

Similarity of the set of vertices of two randomly selected communities.

Further Reading

Structural Classes

Generating a graph for every combination of parameters is time consuming

The running time explodes for extremal values of p, q, and r.

Since the structural properties vary continuously with the parameters, similar parameters generate similar networks.

Hence, a small subset of qualitatively different networks can give us a good idea of what's going on.

Running time


< 5 sec> 1000 sec

We identified nine representative sets of parameters by

partitioning the space of networks properties.

Here is a summary

Parameters Selected properties
p q r Density Clustering Degree assortativity Maximal coreness Largest community Highest membership Partition density Average overlap
0.100.051 +++ +++ --- +++ +++ +++ +++ +++
0.300.151 +++ ++ -- +++ +++ +++ +++ +++
0.850.851 + + ++ + + + + +
0.450.601 ++ ++ ++ ++ ++ ++ ++ ++
0.750.551 + + + + ++ + + ++
0.050.8510 +++ +++ +++ +++ + +++ +++ +
0.850.2010 ++ + + + ++ + + +
0.750.1010 ++ ++ - ++ +++ ++ ++ ++
0.700.8010 + +++ +++ ++ + ++ ++ +++

Overlapping Community Detection

The SPA process generates realistic synthetic networks with a known overlapping community structure.

Many algorithms aim to uncover such communities in empirical datasets, and telling them apart calls for good benchmarking tools.

SPA networks are a perfect fit for the job.

It's a straightforward process.
One simply needs to produce an SPA network, use a detection algorithm to recover its community structure, and compare the results with the known communities (using the Normalized Mutual Information, for instance).

Below, we applied state-of-the-art community detection algorithms to the nine representative subsets of networks.
And we found pretty interesting results.

Infomap

Infomap infers the community structure from the flow of information.

Further Reading

BigCLAM

BigCLAM fits a generative model to empirical datasets using a coordinate descent.

Further Reading

COPRA

COPRA uses the steady state of a label propagation dynamics to determine the assignment of nodes to communities.

Further Reading

OSLOM

OSLOM optimizes a fitness function that expresses the statistical significance of communities with respect to random fluctuations.

Further Reading

Of course, this list is far from complete, so

get in touch

and we'll add your algorithm here.

Or you can simply download the software below.


Software Download

A detailed benchmarking paper is in the work.

In the meantime, you can check out the stochastic process itself

To get a notification when the benchmarking paper is out, enter your email address here.

In the meantime, you may wish to check our other papers on SPA and its hierarchical variant..


Papers

Built by Jean-Gabriel Young and Louis Roy,

with the help of Edward Laurence and Laurent Hébert-Dufresne

.

Code licensed under MIT, documentation under CC BY 3.0.