org.knowceans.mcl
Class MarkovClustering

java.lang.Object
  extended by org.knowceans.mcl.MarkovClustering

public class MarkovClustering
extends java.lang.Object

MarkovClustering implements the Markov clustering (MCL) algorithm for graphs, using a HashMap-based sparse representation of a Markov matrix, i.e., an adjacency matrix m that is normalised to one. Elements in a column / node can be interpreted as decision probabilities of a random walker being at that node. Note: whereas we explain the algorithms with columns, the actual implementation works with rows because the sparse matrix used has row-major format.

The basic idea underlying the MCL algorithm is that dense regions in sparse graphs correspond with regions in which the number of k-length paths is relatively large, for small k in N, which corresponds to multiplying probabilities in the matrix appropriately. Random walks of length k have higher probability (product) for paths with beginning and ending in the same dense region than for other paths.

The algorithm starts by creating a Markov matrix from the graph, for which first the adjacency matrix is added diagonal elements to include self-loops for all nodes, i.e., probabilities that the random walker stays at a particular node. After this initialisation, the algorithm works by alternating two operations, expansion and inflation, iteratively recomputing the set of transition probabilities. The expansion step corresponds to matrix multiplication (on stochastic matrices), the inflation step corresponds with a parametrized inflation operator Gamma_r, which acts column-wise on (column) stochastic matrices (here, we use row-wise operation, which is analogous).

The inflation operator transforms a stochastic matrix into another one by raising each element to a positive power p and re-normalising columns to keep the matrix stochastic. The effect is that larger probabilities in each column are emphasised and smaller ones deemphasised. On the other side, the matrix multiplication in the expansion step creates new non-zero elements, i.e., edges. The algorithm converges very fast, and the result is an idempotent Markov matrix, M = M * M, which represents a hard clustering of the graph into components.

Expansion and inflation have two opposing effects: While expansion flattens the stochastic distributions in the columns and thus causes paths of a random walker to become more evenly spread, inflation contracts them to favoured paths.

Description is based on the introduction of Stijn van Dongen's thesis Graph Clustering by Flow Simulation (2000); for a mathematical treatment of the algorithm and the associated MCL process, see there.

Author:
gregor :: arbylon . net

Constructor Summary
MarkovClustering()
           
 
Method Summary
private  void addLoops(SparseMatrix a, double loopGain)
          add loops with specific energy, which corresponds to adding loopGain to the diagonal elements.
 SparseMatrix expand(SparseMatrix m)
          expand stochastic quadratic matrix by sqaring it with itself: result = m * m.
 double inflate(SparseMatrix m, double p, double zeromax)
          inflate stochastic matrix by Hadamard (elementwise) exponentiation, pruning and normalisation : result = Gamma ( m, p ) = normalise ( prune ( m .^ p ) ).
static void main(java.lang.String[] args)
          test the MCL algorithm with the matrix loaded from the file in the argument.
private static void print(SparseMatrix a, java.lang.String label)
           
 SparseMatrix run(SparseMatrix a, double maxResidual, double pGamma, double loopGain, double maxZero)
          run the MCL process.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MarkovClustering

public MarkovClustering()
Method Detail

main

public static void main(java.lang.String[] args)
test the MCL algorithm with the matrix loaded from the file in the argument. If no argument is given, the default file for the matrix is T(G3 + I) from van Dongen (2000), page 50.

Parameters:
args -

run

public SparseMatrix run(SparseMatrix a,
                        double maxResidual,
                        double pGamma,
                        double loopGain,
                        double maxZero)
run the MCL process.

Parameters:
a - matrix
maxResidual - maximum difference between row elements and row square sum (measure of idempotence)
pGamma - inflation exponent for Gamma operator
loopGain - values for cycles
maxZero - maximum value considered zero for pruning operations
Returns:
the resulting matrix

print

private static void print(SparseMatrix a,
                          java.lang.String label)

inflate

public double inflate(SparseMatrix m,
                      double p,
                      double zeromax)
inflate stochastic matrix by Hadamard (elementwise) exponentiation, pruning and normalisation :

result = Gamma ( m, p ) = normalise ( prune ( m .^ p ) ).

By convention, normalisation is done along rows (SparseMatrix has row-major representation)

Parameters:
m - matrix (mutable)
p - exponent as a double
zeromax - below which elements are pruned from the sparse matrix
Returns:
residuum value, m is modified.

expand

public SparseMatrix expand(SparseMatrix m)
expand stochastic quadratic matrix by sqaring it with itself: result = m * m. Here normalisation is rowwise.

Parameters:
matrix -
Returns:
new matrix (pointer != argument)

addLoops

private void addLoops(SparseMatrix a,
                      double loopGain)
add loops with specific energy, which corresponds to adding loopGain to the diagonal elements.

Parameters:
a -
loopGain -