org.knowceans.util
Class RandomSamplers

java.lang.Object
  extended by org.knowceans.util.RandomSamplers

public class RandomSamplers
extends java.lang.Object

Instance-based samplers with diverse sampling methods, including beta, gamma, multinomial, and Dirichlet distributions as well as Dirichlet processes, using Sethurahman's stick-breaking construction and Chinese restaurant process. The random generator used is provided in the constructor.

Author:
heinrich (partly adapted from Yee Whye Teh's npbayes Matlab / C code)

Nested Class Summary
 class RandomSamplers.CrpData
          data structure for a Chinese restaurant process CrpData
 
Field Summary
 double lastRand
           
 
Constructor Summary
RandomSamplers()
          init random sampler using Mersenne twister with default seed.
RandomSamplers(java.util.Random rand)
          init random sampler with random number generator provided.
 
Method Summary
 int binarySearch(double[] a, double p)
          perform a binary search and return the first index i at which a[i] >= p.
 double enumClass(double alpha, int numdata)
          enumclass(alpha,numdata) The expected number of tables in a CRP with concentration parameter alpha and numdata items.
 java.util.Random getRand()
           
static void main(java.lang.String[] args)
           
 int randBernoulli(double p)
          draw a Bernoulli sample.
 double[] randBeta(double[] aa, double[] bb)
          randbeta(aa, bb) Generates beta samples, one for each element in aa/bb, and scale 1.
 double randBeta(double aa, double bb)
          beta as two-dimensional Dirichlet
 int randBinom(double N, double p)
          draw a binomial sample (by counting Bernoulli samples).
 double randConParam(double alpha, int numgroup, int[] numdata, int[] numtable, double alphaa, double alphab, int numiter)
          randconparam(alpha,numdata,numclass,aa,bb,numiter) Generates a sample from a concentration parameter of a HDP with gamma(aa,bb) prior, and number of classes and data items given in numdata, numclass (has to be row vectors).
 double randConParam(double alpha, int numdata, int numtopic, double alphaa, double alphab, int numiter)
          Sample the Dirichlet process concetration parameter given the topic and data counts and gamma hyperparameters alphaa and alphab.
 RandomSamplers.CrpData randCrp(double[] alpha, int numdata)
          [cc numclass] = randcrp(alpha,numdata) Generates a partition of numdata items with concentration parameter alpha, which can be an array, in which case the Chinese restaurant process has "two new tables to chose for each new customer".
 RandomSamplers.CrpData randCrp(double alpha, int numdata)
           
 double[] randDir(double[] aa)
          randdir(aa) generates one Dirichlet sample vector according to the parameters alpha.
 double[][] randDir(double[][] aa, int direction)
          Generate as many Dirichlet column samples as there are columns (direction = 1; randdir(A, 1)) or row samples as there are rows (direction = 2, randdir(A, 2)) in aa (aa[][]), taking the respective parameters.
 double[] randDir(double[] mean, double precision)
          randdir(aa) generates one Dirichlet sample vector according to the parameters alpha.
 double[][] randDir(double[] aa, int repetitions)
          Generate n Dirichlet samples taking parameters aa.
 double[] randDir(double a, int dimension)
          symmetric Dirichlet sample.
 double[] randDmm(double[] probs, double[][] mean, double[] precision)
          DMM sampling
 double[] randDmm(double[] probs, double[][] mean, double[] precision, int[] component)
          DMM sampling
 double[][] randDmm(int n, double[] probs, double[][] means, double[] precisions)
          DMM sampling
 double[][] randDmm(int n, double[] probs, double[][] means, double[] precisions, int[] components)
          DMM sampling
 double randGamma(double rr)
          self-contained gamma generator.
 double[] randGamma(double[] aa)
          randgamma(aa) Generates gamma samples, one for each element in aa.
 double randGamma(double shape, double scale)
          sample from gamma distribution with defined shape a and scale b:
 double randGmm(double[] probs, double[] mean, double[] sigma)
          GMM sampling
 double randGmm(double[] probs, double[] mean, double[] sigma, int[] component)
          GMM sampling
 double[] randGmm(int n, double[] probs, double[] mean, double[] sigma)
          GMM sampling
 double[] randGmm(int n, double[] probs, double[] mean, double[] sigma, int[] components)
          GMM sampling
 int randMult(double[] pp)
          Creates one multinomial sample given the parameter vector pp.
 int[] randMult(double[] pp, int repetitions)
          Multiply sample a multinomial distribution and return a vector with all samples.
 int randMultDirect(double[] pp)
          Creates one multinomial sample given the parameter vector pp.
 int randMultDirect(double[] pp, double randnum)
          Like randMultDirect, but the random number is given as argument.
 int[] randMultFreqs(double[] pp, int repetitions)
          Multiply sample a multinomial distribution and return a vector with category frequencies.
 int randMultSimple(double[] pp)
          old version of the randMult method
 double randNorm(double mu, double sigma)
          uses same approach as java.util.Random()
 int randNumTable(double alpha, int numdata)
          randnumtable(weights,maxtable) For each entry in weights and maxtables, generates the number of tables given concentration parameter (weights) and number of data items (maxtable).
 int[] randPerm(int size)
          Random permutation of size elements (symbols '0'..
 int[] randPerm(int[] set)
          Random permutation of existing set of integers.
 int[][] randPerm(int size, int parts)
          Hierarchical random permutation.
 double[] randStick(double[] alpha, int numclass)
          randstick(alpha,numclass) Generates stick-breaking weights with concentration parameter for numclass "sticks".
 java.lang.String randString(int length, byte[] alphabet)
          create a random string of length alphanumeric characters.
 double randUniform(double numvalue)
           
 int randUniform(int numvalue)
           
 void setRand(java.util.Random rand)
           
 double[] stirling(int nn)
          [ss lmss] = stirling(nn) Gives unsigned Stirling numbers of the first kind s(nn,*) in ss.
 void testMult()
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

lastRand

public double lastRand
Constructor Detail

RandomSamplers

public RandomSamplers()
init random sampler using Mersenne twister with default seed.


RandomSamplers

public RandomSamplers(java.util.Random rand)
init random sampler with random number generator provided.

Parameters:
rand -
Method Detail

main

public static void main(java.lang.String[] args)

randNorm

public double randNorm(double mu,
                       double sigma)
uses same approach as java.util.Random()

Parameters:
mu -
sigma -
Returns:

randGmm

public double randGmm(double[] probs,
                      double[] mean,
                      double[] sigma)
GMM sampling

Parameters:
probs - mixture responsibilities
mean - mean vector
sigma - stddev vector
Returns:

randGmm

public double randGmm(double[] probs,
                      double[] mean,
                      double[] sigma,
                      int[] component)
GMM sampling

Parameters:
probs - mixture responsibilities
mean - mean vector
sigma - stddev vector
component - [out] component[0] is filled with the sampled component index, but can be null if not needed.
Returns:

randGmm

public double[] randGmm(int n,
                        double[] probs,
                        double[] mean,
                        double[] sigma)
GMM sampling

Parameters:
n - number of samples to take (this saves the calculation of the cumulative probabilities for successive trials)
probs - mixture responsibilities
mean - mean vector
sigma - stddev vector
Returns:

randGmm

public double[] randGmm(int n,
                        double[] probs,
                        double[] mean,
                        double[] sigma,
                        int[] components)
GMM sampling

Parameters:
n - number of samples to take (this saves the calculation of the cumulative probabilities for successive trials)
probs - mixture responsibilities
mean - mean vector
sigma - stddev vector
components - [out] n-vector is filled with the sampled component indices (ignored if null)
Returns:

randDmm

public double[] randDmm(double[] probs,
                        double[][] mean,
                        double[] precision)
DMM sampling

Parameters:
probs - mixture responsibilities
mean - mean vector of vectors
precision - precision vector
Returns:

randDmm

public double[] randDmm(double[] probs,
                        double[][] mean,
                        double[] precision,
                        int[] component)
DMM sampling

Parameters:
probs - mixture responsibilities
mean - mean vector of vectors
precision - precision vector
component - [out] sampled component of the mixture (or ignored if null)
Returns:

randDmm

public double[][] randDmm(int n,
                          double[] probs,
                          double[][] means,
                          double[] precisions)
DMM sampling

Parameters:
probs - mixture responsibilities
means - mean vector of vectors
precisions - precision vector
Returns:

randDmm

public double[][] randDmm(int n,
                          double[] probs,
                          double[][] means,
                          double[] precisions,
                          int[] components)
DMM sampling

Parameters:
n - number of trials
probs - mixture responsibilities
means - mean vector of vectors
precisions - precision vector
components - n-vector is filled with the sampled component indices (ignored if null)
Returns:

randBeta

public double randBeta(double aa,
                       double bb)
beta as two-dimensional Dirichlet

Parameters:
aa -
bb -
Returns:

randBeta

public double[] randBeta(double[] aa,
                         double[] bb)
randbeta(aa, bb) Generates beta samples, one for each element in aa/bb, and scale 1.

Parameters:
aa -

randGamma

public double randGamma(double rr)
self-contained gamma generator. Multiply result with scale parameter (or divide by rate parameter). After Teh (npbayes).

Parameters:
rr - shape parameter
Returns:

randGamma

public double[] randGamma(double[] aa)
randgamma(aa) Generates gamma samples, one for each element in aa.

Parameters:
aa -

randGamma

public double randGamma(double shape,
                        double scale)
sample from gamma distribution with defined shape a and scale b:

x ~ x^(a-1) * exp(-x/b) / ( gamma(a) * b^a )

E(x) = ab, V(x) = (ab)^2. Note that instead of the scale parameter b, often a rate parameter r = 1/b is used: E(x) = a/r, V(x) = (a/r)^2. For sampling, the following are equivalent: Gamma(a,1)*b <=> Gamma(a,b), with shape parametrisation; Gamma(a,1)/r <=> Gamma(a,r) with rate parametrisation.

Parameters:
shape -
scale -
Returns:

randPerm

public int[] randPerm(int size)
Random permutation of size elements (symbols '0'.. '[size-1]').

Parameters:
size -
Returns:

randPerm

public int[] randPerm(int[] set)
Random permutation of existing set of integers.

Parameters:
set -
Returns:

randPerm

public int[][] randPerm(int size,
                        int parts)
Hierarchical random permutation. Permutes set of items (indexed from 0 to size-1) and partitions it into a set of parts elements with random handling of the modulus size/parts. The concatenation of rows from randPerm(size, parts) is identical to randPerm(size).

Parameters:
size -
parts -
Returns:

randDir

public double[] randDir(double a,
                        int dimension)
symmetric Dirichlet sample.

Parameters:
aa -
Returns:

randDir

public double[] randDir(double[] aa)
randdir(aa) generates one Dirichlet sample vector according to the parameters alpha. ORIG: Generates Dirichlet samples, with weights given in aa. The output sums to 1 along normdim, and each such sum corresponds to one Dirichlet sample.

Parameters:
aa -
normdim -
Returns:

randDir

public double[] randDir(double[] mean,
                        double precision)
randdir(aa) generates one Dirichlet sample vector according to the parameters alpha.

Parameters:
mean - (mean_i = alpha_i / sum_j alpha_j)
precision - (precision = alpha_i / mean_i)
Returns:

randDir

public double[][] randDir(double[][] aa,
                          int direction)
Generate as many Dirichlet column samples as there are columns (direction = 1; randdir(A, 1)) or row samples as there are rows (direction = 2, randdir(A, 2)) in aa (aa[][]), taking the respective parameters. After Teh (npbayes).

Parameters:
aa -
direction - -- 2 is more efficient (row-major Java matrix structure)
Returns:

randDir

public double[][] randDir(double[] aa,
                          int repetitions)
Generate n Dirichlet samples taking parameters aa.

Parameters:
aa -
Returns:

randMultFreqs

public int[] randMultFreqs(double[] pp,
                           int repetitions)
Multiply sample a multinomial distribution and return a vector with category frequencies.

Parameters:
pp -
repetitions -
Returns:
vector of frequencies of the categories

randMult

public int[] randMult(double[] pp,
                      int repetitions)
Multiply sample a multinomial distribution and return a vector with all samples.

Parameters:
pp -
repetitions -
Returns:
vector of all samples.

randMultSimple

public int randMultSimple(double[] pp)
old version of the randMult method

Parameters:
pp -
Returns:

testMult

public void testMult()

randMult

public int randMult(double[] pp)
Creates one multinomial sample given the parameter vector pp. Each category is named after the index (0-based!) of the respective element of pp; Sometimes called categorical distribution (e.g., in BUGS). This version uses a binary search algorithm and does not require normalisation.


randMultDirect

public int randMultDirect(double[] pp)
Creates one multinomial sample given the parameter vector pp. Each category is named after the index (0-based!) of the respective element of pp; Sometimes called categorical distribution (e.g., in BUGS). This version uses a binary search algorithm and does not require normalisation. Note that the parameters used directly changed because the multinomial is cumulated to save memory and copying time.


randMultDirect

public int randMultDirect(double[] pp,
                          double randnum)
Like randMultDirect, but the random number is given as argument.


binarySearch

public int binarySearch(double[] a,
                        double p)
perform a binary search and return the first index i at which a[i] >= p. Adapted from java.util.Arrays.binarySearch.

Parameters:
a -
p -
Returns:

randBinom

public int randBinom(double N,
                     double p)
draw a binomial sample (by counting Bernoulli samples).

Parameters:
N -
p -

randBernoulli

public int randBernoulli(double p)
draw a Bernoulli sample.

Parameters:
p - success probability
Returns:
1 if sucessful, 0 otherwise

randConParam

public double randConParam(double alpha,
                           int numgroup,
                           int[] numdata,
                           int[] numtable,
                           double alphaa,
                           double alphab,
                           int numiter)
randconparam(alpha,numdata,numclass,aa,bb,numiter) Generates a sample from a concentration parameter of a HDP with gamma(aa,bb) prior, and number of classes and data items given in numdata, numclass (has to be row vectors). Uses auxiliary variable method, for numiter iterations.

Modification of Escobar and West. Works for multiple groups of data. numdata, numclass are row vectors, one element per group. After Teh (npbayes).

Parameters:
alpha - alpha
numgroup - number of components ??
numdata - number of data items per class
numtable - number of per DP
alphaa - hyperparameter (gamma shape)
alphab - hyperparameter (gamma scale)
numiter - number of iterations
Returns:

randConParam

public double randConParam(double alpha,
                           int numdata,
                           int numtopic,
                           double alphaa,
                           double alphab,
                           int numiter)
Sample the Dirichlet process concetration parameter given the topic and data counts and gamma hyperparameters alphaa and alphab. After Escobar and West (1995).

Parameters:
alpha -
numdata -
numtopic -
alphaa -
alphab -
numiter -
Returns:

randCrp

public RandomSamplers.CrpData randCrp(double alpha,
                                      int numdata)

randCrp

public RandomSamplers.CrpData randCrp(double[] alpha,
                                      int numdata)
[cc numclass] = randcrp(alpha,numdata) Generates a partition of numdata items with concentration parameter alpha, which can be an array, in which case the Chinese restaurant process has "two new tables to chose for each new customer". cc is sequence of indicator variables denoting which class each data item is in ("on which table each customer sits"), and numclass is the generated number of classes. After Teh (npbayes).

Parameters:
alpha -
numdata -
Returns:

randNumTable

public int randNumTable(double alpha,
                        int numdata)
randnumtable(weights,maxtable) For each entry in weights and maxtables, generates the number of tables given concentration parameter (weights) and number of data items (maxtable). From npbayes-2.1. enumClass seems to be the expected value of randNumTable. After Teh (npbayes).

Parameters:
weights -
maxtable -
Returns:

randStick

public double[] randStick(double[] alpha,
                          int numclass)
randstick(alpha,numclass) Generates stick-breaking weights with concentration parameter for numclass "sticks". XXX: untested. After Teh (npbayes).

Parameters:
alpha -
numclass -
Returns:

enumClass

public double enumClass(double alpha,
                        int numdata)
enumclass(alpha,numdata) The expected number of tables in a CRP with concentration parameter alpha and numdata items. After Teh (npbayes).

Parameters:
alpha -
numdata -
Returns:

randString

public java.lang.String randString(int length,
                                   byte[] alphabet)
create a random string of length alphanumeric characters.

Parameters:
length - of output
alphabet - alphabet to be used or null
Returns:

stirling

public double[] stirling(int nn)
[ss lmss] = stirling(nn) Gives unsigned Stirling numbers of the first kind s(nn,*) in ss. ss(i) = s(nn,i-1). ss is normalized so that maximum value is 1, and the log of normalization is given in lmss ( variable). After Teh (npbayes).

Parameters:
nn -
Returns:

randUniform

public double randUniform(double numvalue)
Parameters:
numclass -
Returns:

randUniform

public int randUniform(int numvalue)
Parameters:
numclass -
Returns:

getRand

public final java.util.Random getRand()

setRand

public final void setRand(java.util.Random rand)