Community Detection¶
Find clusters and communities within graphs.
Louvain Algorithm¶
Fast modularity-based community detection.
import grafeo
db = grafeo.GrafeoDB()
algs = db.algorithms()
communities = algs.louvain()
for community_id, members in communities.items():
print(f"Community {community_id}: {len(members)} members")
Label Propagation¶
Semi-supervised community detection.
Connected Components¶
Find disconnected subgraphs.
algs = db.algorithms()
components = algs.connected_components()
print(f"Found {len(components)} components")
for i, comp in enumerate(components):
print(f"Component {i}: {len(comp)} nodes")
Strongly Connected Components¶
For directed graphs.
Weakly Connected Components¶
For directed graphs, ignoring edge direction.
Triangle Count¶
Count triangles for clustering analysis.
Stochastic Block Partition¶
Bayesian community detection using the degree-corrected stochastic block model. Minimizes the description length (MDL) of the graph under the generative model, making it more principled than modularity-based methods for graphs with heterogeneous degree distributions.
algs = db.algorithms()
# Auto-detect optimal number of blocks
result = algs.stochastic_block_partition()
print(f"Found {result['num_blocks']} blocks (DL: {result['description_length']:.2f})")
# Target a specific number of blocks
result = algs.stochastic_block_partition(num_blocks=4)
# Incremental update after adding edges (warm start from prior partition)
result2 = algs.stochastic_block_partition(prior_partition=result['partition'])
Part of the GraphChallenge streaming benchmark suite.
Partition Quality Metrics¶
Compare two community assignments (e.g., predicted vs ground truth):
from grafeo.algorithms import metrics
ri = metrics.rand_index(partition_a, partition_b)
ari = metrics.adjusted_rand_index(partition_a, partition_b)
nmi = metrics.normalized_mutual_information(partition_a, partition_b)
prec = metrics.pairwise_precision(predicted, truth)
rec = metrics.pairwise_recall(predicted, truth)
| Metric | Range | 1.0 means |
|---|---|---|
| Rand Index | [0, 1] | Identical partitions |
| Adjusted Rand Index | [-1, 1] | Identical (0 = random) |
| Normalized Mutual Information | [0, 1] | Identical partitions |
| Pairwise Precision | [0, 1] | Every predicted co-cluster is true |
| Pairwise Recall | [0, 1] | Every true co-cluster is predicted |
Use Cases¶
| Algorithm | Best For |
|---|---|
| Louvain | Large graphs, quality clusters |
| Label Propagation | Fast, scalable |
| Stochastic Block Partition | Heterogeneous graphs, Bayesian inference |
| Connected Components | Graph structure analysis |
| Triangle Count | Clustering coefficient |