org.knowceans.sandbox.ilda
Class IldaGibbsSampler

java.lang.Object
  extended by org.knowceans.dirichlet.lda.LdaGibbsSampler
      extended by org.knowceans.sandbox.ilda.IldaGibbsSampler
All Implemented Interfaces:
java.io.Serializable

public class IldaGibbsSampler
extends LdaGibbsSampler
implements java.io.Serializable

Gibbs sampler for estimating the best assignments of topics for words and documents in a corpus. The algorithm is based on "Parameter estimation for text analysis" (2005), http://www.arbylon.net/publications/text-est_iv.pdf, which gives a more detailed derivation of a Gibbs sampler for the LDA model than Tom Griffiths' white paper "Gibbs sampling in the generative model of Latent Dirichlet Allocation" (2002). It extends that sampler to the infinite limit on K, following Neal's paper "Monte-Carlo sampling methods for the Dirichlet process".

Author:
heinrich
See Also:
Serialized Form

Field Summary
private  int growstep
          array grow step
private  int[] ndunrep
          number of unrepresented words in a document
private  int nwsumunrep
          total number of unrepresented words
private  int[] nwunrep
          number of times a term is unrepresented by current topics
private static long serialVersionUID
           
 
Fields inherited from class org.knowceans.dirichlet.lda.LdaGibbsSampler
backupIteration, conf, dispcol, numstats, phisum, rand, state, thetasum
 
Constructor Summary
IldaGibbsSampler(int[][] documents, int V)
          Initialise the Gibbs sampler with data.
 
Method Summary
private  void addComponent()
          handle size of componentwise structures.
private  void gibbs()
          Main method for Gibbs sampling
static void main(java.lang.String[] args)
           
private  void removeComponent(int j)
          removes one component from the model
(package private)  double sampleAlpha()
          sample alpha from a Gam(1,1) distribution using Escobar & West's method
protected  int sampleLdaFullConditional(int m, int n)
          Sample a topic z_i from the full conditional distribution: p(z_i = j | z_-i, w) = (n_-i,j(w_i) + beta)/(n_-i,j(.) + W * beta) * (n_-i,j(d_i) + alpha)/(n_-i,.(d_i) + K * alpha)
 
Methods inherited from class org.knowceans.dirichlet.lda.LdaGibbsSampler
getPhi, getState, getTheta, gibbs, gibbsHeap, gibbsHeap, initialState, load, output, run, sampleCorpus, sampleLdaFullConditional, save, saveState, updateParams, updatePhi, updateTheta, writeParameters
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

serialVersionUID

private static final long serialVersionUID
See Also:
Constant Field Values

growstep

private int growstep
array grow step


nwunrep

private int[] nwunrep
number of times a term is unrepresented by current topics


ndunrep

private int[] ndunrep
number of unrepresented words in a document


nwsumunrep

private int nwsumunrep
total number of unrepresented words

Constructor Detail

IldaGibbsSampler

public IldaGibbsSampler(int[][] documents,
                        int V)
Initialise the Gibbs sampler with data.

Parameters:
documents - document data, one array of term indices per document
V - vocabulary size
Method Detail

addComponent

private void addComponent()
handle size of componentwise structures.

Note: We use arrays for components, for more readable syntax and to avoid a possible speed loss from the cast operations needed when accessing a Vector. Because the arrays may be larger than the number of components in use, all loops over components should explicitly use k, not, e.g., mu.length. The drawback of this approach is that it is hard to remove unoccupied classes.
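
The array-growth strategy described in this note might look like the following sketch. The class `ComponentCounts`, its fields other than `growstep`, and its method names are assumptions made for illustration; they are not taken from the IldaGibbsSampler source. Only the idea of enlarging arrays by a fixed grow step and looping up to an explicit component count k comes from the documentation above.

```java
import java.util.Arrays;

/** Hypothetical per-component count store; not the actual IldaGibbsSampler internals. */
class ComponentCounts {
    private final int growstep; // slots added whenever the backing array is full
    private int[] nwsum;        // per-component totals; length may exceed k
    private int k;              // number of components currently in use

    ComponentCounts(int initialComponents, int growstep) {
        if (growstep < 1) {
            throw new IllegalArgumentException("growstep must be at least 1");
        }
        this.growstep = growstep;
        this.k = initialComponents;
        this.nwsum = new int[initialComponents];
    }

    /** Adds one empty component and returns its index, growing the array if needed. */
    int addComponent() {
        if (k == nwsum.length) {
            nwsum = Arrays.copyOf(nwsum, nwsum.length + growstep);
        }
        nwsum[k] = 0;
        return k++;
    }

    /** Sums the counts of the active components; loops to k, not nwsum.length. */
    int total() {
        int sum = 0;
        for (int j = 0; j < k; j++) {
            sum += nwsum[j];
        }
        return sum;
    }

    void increment(int j) { nwsum[j]++; }
    int size() { return k; }
    int capacity() { return nwsum.length; }
}
```

Growing in steps keeps reallocation rare, at the cost of spare slots beyond index k - 1, which is why loops must stop at k.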


removeComponent

private void removeComponent(int j)
removes one component from the model

Parameters:
j - index of the component to remove
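
As an illustration of what removing a component from an array-backed structure involves, here is a minimal sketch. The helper class `ComponentArrays` and its signature are hypothetical; the documentation does not say how IldaGibbsSampler itself reorganises its arrays.

```java
/** Hypothetical helper; illustrates removal from an array with spare capacity. */
class ComponentArrays {
    /**
     * Removes entry j from the first k entries of a by shifting the tail left.
     * Returns the new number of active entries, k - 1.
     */
    static int removeComponent(int[] a, int j, int k) {
        if (j < 0 || j >= k || k > a.length) {
            throw new IndexOutOfBoundsException("j=" + j + ", k=" + k);
        }
        // Move entries j+1 .. k-1 one position to the left.
        System.arraycopy(a, j + 1, a, j, k - j - 1);
        a[k - 1] = 0; // clear the vacated slot
        return k - 1;
    }
}
```

In a sampler that stores topic assignments as indices, every stored assignment greater than j would also need to be decremented after such a shift; that bookkeeping is omitted here.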

gibbs

private void gibbs()
Main method for Gibbs sampling

Overrides:
gibbs in class LdaGibbsSampler

sampleLdaFullConditional

protected int sampleLdaFullConditional(int m,
                                       int n)
Sample a topic z_i from the full conditional distribution: p(z_i = j | z_-i, w) = (n_-i,j(w_i) + beta)/(n_-i,j(.) + W * beta) * (n_-i,j(d_i) + alpha)/(n_-i,.(d_i) + K * alpha)

Parameters:
m - document
n - word
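
The full conditional above can be evaluated and sampled as in the following sketch. It covers only the K represented topics exactly as written in the formula; how this class handles an unrepresented topic in the infinite limit is not documented here and is not shown. The class name `FullConditionalSketch`, the method signature and the array names are assumptions, and the counts are assumed to already exclude the current token (the "-i" in the formula).

```java
import java.util.Random;

/** Hypothetical sketch of drawing z_i from the finite-K full conditional. */
class FullConditionalSketch {
    /**
     * nwTerm[j]  count of term w_i in topic j      (n_-i,j(w_i))
     * nwsum[j]   total word count of topic j       (n_-i,j(.))
     * ndDoc[j]   count of topic j in document d_i  (n_-i,j(d_i))
     * ndsumDoc   total word count of document d_i  (n_-i,.(d_i))
     */
    static int sampleTopic(int[] nwTerm, int[] nwsum, int[] ndDoc, int ndsumDoc,
                           int K, int W, double alpha, double beta, Random rand) {
        double[] cumulative = new double[K];
        double total = 0.0;
        for (int j = 0; j < K; j++) {
            double termPart = (nwTerm[j] + beta) / (nwsum[j] + W * beta);
            double docPart = (ndDoc[j] + alpha) / (ndsumDoc + K * alpha);
            total += termPart * docPart;
            cumulative[j] = total; // unnormalised cumulative distribution
        }
        // Scale a uniform draw to the unnormalised total and find its bin.
        double u = rand.nextDouble() * total;
        for (int j = 0; j < K; j++) {
            if (u < cumulative[j]) {
                return j;
            }
        }
        return K - 1; // guard against floating-point rounding
    }
}
```

Sampling from the unnormalised cumulative sums avoids a separate normalisation pass over the K probabilities.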

sampleAlpha

double sampleAlpha()
sample alpha from a Gam(1,1) distribution using Escobar & West's method

Returns:
the sampled value of alpha
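
For orientation, one step of Escobar & West's auxiliary-variable update for a Dirichlet process concentration parameter can be sketched as follows. This is a generic version under a Gam(a, b) prior (shape a, rate b), not the code of this class: the class name `AlphaSamplerSketch`, the parameter names, and the way k (number of occupied components) and n (number of observations) would be supplied are all assumptions. The gamma sampler uses the Marsaglia-Tsang method and is restricted to shape >= 1, which suffices when a >= 1 and k >= 1.

```java
import java.util.Random;

/** Hypothetical sketch of Escobar & West's update for alpha. */
class AlphaSamplerSketch {
    /** Draws from Gamma(shape, rate 1) via Marsaglia-Tsang; requires shape >= 1. */
    static double sampleGamma(double shape, Random rand) {
        if (shape < 1.0) {
            throw new IllegalArgumentException("shape must be >= 1");
        }
        double d = shape - 1.0 / 3.0;
        double c = 1.0 / Math.sqrt(9.0 * d);
        while (true) {
            double x = rand.nextGaussian();
            double v = 1.0 + c * x;
            if (v <= 0.0) {
                continue; // proposal outside the support, retry
            }
            v = v * v * v;
            double u = rand.nextDouble();
            if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) {
                return d * v;
            }
        }
    }

    /**
     * One update of alpha given k occupied components and n observations,
     * under a Gamma(a, b) prior with a >= 1, b > 0, k >= 1, n >= 1.
     */
    static double sampleAlpha(double alpha, int k, int n, double a, double b, Random rand) {
        // Auxiliary variable eta ~ Beta(alpha + 1, n), drawn as a ratio of gammas.
        double x = sampleGamma(alpha + 1.0, rand);
        double y = sampleGamma(n, rand);
        double eta = x / (x + y);

        // Mixture of two gammas sharing the rate b - log(eta).
        double rate = b - Math.log(eta);
        double odds = (a + k - 1.0) / (n * rate);
        double pi = odds / (1.0 + odds);
        double shape = rand.nextDouble() < pi ? a + k : a + k - 1.0;
        return sampleGamma(shape, rand) / rate;
    }
}
```

With a = b = 1 this corresponds to the Gam(1,1) prior named in the method description; each call takes the current alpha and returns a new draw, so it would typically be invoked once per sampling iteration.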

main

public static void main(java.lang.String[] args)