java.lang.Object
  org.knowceans.dirichlet.lda.LdaGibbsSampler
public class LdaGibbsSampler
Gibbs sampler for estimating the best assignments of topics for words and documents in a corpus. The algorithm is introduced in Tom Griffiths' paper "Gibbs sampling in the generative model of Latent Dirichlet Allocation" (2002).
TODO: clean up constructor mess, so invalid inits become more difficult...
| Field Summary | |
|---|---|
| int | backupIteration - iteration in the last backup |
| protected ExtLdaConfiguration | conf - Configuration object with the current parameters. |
| int | dispcol |
| protected int | numstats - size of statistics |
| protected double[][] | phisum - cumulative statistics of phi |
| protected java.util.Random | rand - Random generator |
| private static long | serialVersionUID |
| protected LdaMarkovState | state - State variables of the Lda gibbs sampler. |
| protected double[][] | thetasum - cumulative statistics of theta |
| Constructor Summary | |
|---|---|
| protected | LdaGibbsSampler() - For subclasses that know what they are doing. |
| | LdaGibbsSampler(int[][] documents, int V, double alpha, double beta, int K, int iterations) - Initialise the Gibbs sampler with data and standard values. |
| | LdaGibbsSampler(int[][] documents, int V, ExtLdaConfiguration conf) - Initialise the sampler with the documents and the configuration. |
| | LdaGibbsSampler(int[][] documents, int V, ExtLdaConfiguration conf, java.util.Random rand) - Initialise the sampler with the documents and the configuration. |
| | LdaGibbsSampler(ITermCorpus corpus, ExtLdaConfiguration conf) - Initialise the sampler with the corpus and the configuration. |
| | LdaGibbsSampler(ITermCorpus corpus, ExtLdaConfiguration conf, java.util.Random rand) - Initialise the sampler with the corpus and the configuration. |
| | LdaGibbsSampler(ITermCorpus corpus, LdaMarkovState state, ExtLdaConfiguration conf, java.util.Random rand) - Initialise the sampler with an existing state. |
| protected | LdaGibbsSampler(LdaMarkovState state, ExtLdaConfiguration conf, java.util.Random rand) - Initialise the sampler with an existing state. |
| Method Summary | |
|---|---|
| double[][] | getPhi() - Retrieve estimated topic-word associations. |
| LdaMarkovState | getState() - Get the current state of the Markov chain. |
| double[][] | getTheta() - Retrieve estimated document-topic associations. |
| protected void | gibbs() - Main method: Select initial state |
| int[][] | gibbs(int[][] w, int V, int[][] z, int K, double alpha, double beta, int iter) - Native implementation of the Gibbs sampling procedure. |
| int[][] | gibbsHeap(int[][] w, int[][] z, int[][] nw, int[] nwsum, int[][] nd, int[] ndsum, double alpha, double beta, int iter) - Native Gibbs sampling on the JVM heap. |
| void | gibbsHeap(LdaMarkovState s, ExtLdaConfiguration c) - Native Gibbs sampling on the JVM heap. |
| protected void | initialState() - Initialisation: Random assignments with equal probabilities. |
| static java.lang.Object | load(java.lang.String filename) - Read an object from the stream. |
| static void | main(java.lang.String[] args) |
| void | output(int i) - Handle output during sampling. |
| void | run() - Run the sampler after initialisation. |
| protected void | sampleCorpus(LdaMarkovState s) - Sample once through the corpus and update the corresponding state. |
| protected int | sampleLdaFullConditional(LdaMarkovState s, int m, int n) - Sample a topic z_i from the full conditional distribution: p(z_i = j \| z_-i, w) = (n_-i,j(w_i) + beta)/(n_-i,j(.) + W * beta) * (n_-i,j(d_i) + alpha)/(n_-i,.(d_i) + K * alpha), with W the vocabulary size and K the topic count. |
| void | save(java.lang.String filename) - Object stream only for testing. |
| void | saveState(java.lang.String file) - Saves the current state of the Markov chain and the parameters to a file. |
| protected void | updateParams() - Add to the statistics the values of theta and phi for the current state. |
| protected void | updatePhi() - Update the topic-term association. |
| protected void | updateTheta() - Update the document-topic associations. |
| protected void | writeParameters(java.lang.String file, org.knowceans.util.Arguments a, ITermCorpus corpus) - Write statistics of the current run to a text file for later review. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
private static final long serialVersionUID
protected ExtLdaConfiguration conf
protected LdaMarkovState state
protected double[][] thetasum
protected double[][] phisum
public int backupIteration
protected int numstats
public int dispcol
protected java.util.Random rand
| Constructor Detail |
|---|
protected LdaGibbsSampler()
public LdaGibbsSampler(int[][] documents,
int V,
double alpha,
double beta,
int K,
int iterations)
Parameters:
documents -
V -
alpha -
beta -
K -
iterations -
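
The documents argument is an array of per-document integer arrays and V is the vocabulary size. A minimal sketch of how such input might be prepared, assuming that documents[m][n] holds the zero-based vocabulary index of the n-th token of document m (this page does not state the encoding explicitly). The class CorpusEncoder and its methods are hypothetical helpers and not part of this package:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of org.knowceans): maps tokens to term indices. */
final class CorpusEncoder {
    // term -> index, in order of first appearance
    private final Map<String, Integer> vocabulary = new LinkedHashMap<>();

    /** Convert tokenised documents to int[][], growing the vocabulary as needed. */
    int[][] encode(List<List<String>> tokenised) {
        int[][] documents = new int[tokenised.size()][];
        for (int m = 0; m < tokenised.size(); m++) {
            List<String> tokens = tokenised.get(m);
            documents[m] = new int[tokens.size()];
            for (int n = 0; n < tokens.size(); n++) {
                String token = tokens.get(n);
                Integer index = vocabulary.get(token);
                if (index == null) {
                    // unseen term: assign the next free index
                    index = vocabulary.size();
                    vocabulary.put(token, index);
                }
                documents[m][n] = index;
            }
        }
        return documents;
    }

    /** Vocabulary size V: number of distinct terms seen so far. */
    int vocabularySize() {
        return vocabulary.size();
    }
}
```

Under that assumption, the array returned by encode and the value of vocabularySize() could be passed as documents and V to the constructors listed here.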
public LdaGibbsSampler(int[][] documents,
int V,
ExtLdaConfiguration conf)
Parameters:
documents -
V -
conf -
public LdaGibbsSampler(int[][] documents,
int V,
ExtLdaConfiguration conf,
java.util.Random rand)
Parameters:
documents -
V -
conf -
rand -
public LdaGibbsSampler(ITermCorpus corpus,
ExtLdaConfiguration conf)
Parameters:
corpus -
conf -
public LdaGibbsSampler(ITermCorpus corpus,
ExtLdaConfiguration conf,
java.util.Random rand)
Parameters:
corpus -
conf -
rand -
protected LdaGibbsSampler(LdaMarkovState state,
ExtLdaConfiguration conf,
java.util.Random rand)
Parameters:
state -
conf -
rand -
public LdaGibbsSampler(ITermCorpus corpus,
LdaMarkovState state,
ExtLdaConfiguration conf,
java.util.Random rand)
Parameters:
corpus -
state -
conf -
rand -

| Method Detail |
|---|
protected void initialState()
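
The method summary describes initialState() as random assignments with equal probabilities. A minimal sketch of that idea, assuming count arrays named as in the gibbsHeap parameters (nw, nwsum, nd, ndsum). The index order nw[term][topic] and nd[document][topic], as well as the class name InitialStateSketch, are assumptions and are not taken from the library source:

```java
import java.util.Random;

/** Hypothetical sketch of a uniform random initial state for LDA Gibbs sampling. */
final class InitialStateSketch {
    final int[][] z;     // z[m][n]: topic assigned to token n of document m
    final int[][] nw;    // nw[t][k]: number of times term t is assigned to topic k
    final int[] nwsum;   // nwsum[k]: total number of tokens assigned to topic k
    final int[][] nd;    // nd[m][k]: number of tokens in document m assigned to topic k
    final int[] ndsum;   // ndsum[m]: length of document m

    InitialStateSketch(int[][] w, int V, int K, Random rand) {
        int M = w.length;
        z = new int[M][];
        nw = new int[V][K];
        nwsum = new int[K];
        nd = new int[M][K];
        ndsum = new int[M];
        for (int m = 0; m < M; m++) {
            z[m] = new int[w[m].length];
            for (int n = 0; n < w[m].length; n++) {
                // every topic is equally likely for the initial assignment
                int topic = rand.nextInt(K);
                z[m][n] = topic;
                nw[w[m][n]][topic]++;
                nwsum[topic]++;
                nd[m][topic]++;
            }
            ndsum[m] = w[m].length;
        }
    }
}
```

Keeping the counts in step with z from the start means later sampling passes only need to decrement and increment single entries.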
public void run()
protected void gibbs()
public int[][] gibbs(int[][] w,
int V,
int[][] z,
int K,
double alpha,
double beta,
int iter)
Parameters:
w - words
V - vocabulary size
z - topic associations
K - topic count
alpha -
beta -
iter - number of iterations
public int[][] gibbsHeap(int[][] w,
int[][] z,
int[][] nw,
int[] nwsum,
int[][] nd,
int[] ndsum,
double alpha,
double beta,
int iter)
Parameters:
w - [in] words
z - [in/out] topic associations
nw - [in/out] topic-word counts
nwsum - [in/out] summed topic-word counts (total words per topic)
nd - [in/out] document-topic counts (total words per document)
ndsum - [in] document lengths
alpha -
beta -
iter -
public void gibbsHeap(LdaMarkovState s,
ExtLdaConfiguration c)
Parameters:
s - [in/out] state
c - [in] configuration
public void output(int i)
Parameters:
i -
public void saveState(java.lang.String file)
Parameters:
file -
protected void sampleCorpus(LdaMarkovState s)
Parameters:
s -
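
One pass through the corpus can be pictured as visiting every token, removing its current assignment from the counts, drawing a new topic from the full conditional, and adding the new assignment back. The following is a minimal sketch of that procedure, not the library implementation. It assumes the count layout nw[term][topic] and nd[document][topic], and the class name SweepSketch is hypothetical. The denominator of the document factor is the same for every topic, so the sketch leaves it out of the unnormalised weights:

```java
import java.util.Random;

/** Hypothetical sketch of one collapsed Gibbs sweep for LDA. */
final class SweepSketch {

    /** Resample the topic of token n in document m and update the counts in place. */
    static int sampleFullConditional(int[][] w, int[][] z, int[][] nw, int[] nwsum,
            int[][] nd, int m, int n, double alpha, double beta, Random rand) {
        int V = nw.length;      // vocabulary size
        int K = nwsum.length;   // topic count
        int t = w[m][n];
        int old = z[m][n];
        // remove the token from the counts (the "-i" in the formula)
        nw[t][old]--;
        nwsum[old]--;
        nd[m][old]--;
        // cumulative unnormalised weights over topics
        double[] p = new double[K];
        double sum = 0.0;
        for (int k = 0; k < K; k++) {
            sum += (nw[t][k] + beta) / (nwsum[k] + V * beta) * (nd[m][k] + alpha);
            p[k] = sum;
        }
        // pick the first topic whose cumulative weight exceeds the scaled draw
        double u = rand.nextDouble() * sum;
        int topic = 0;
        while (topic < K - 1 && u >= p[topic]) {
            topic++;
        }
        // add the token back with its new topic
        nw[t][topic]++;
        nwsum[topic]++;
        nd[m][topic]++;
        z[m][n] = topic;
        return topic;
    }

    /** Visit every token of every document once. */
    static void sweep(int[][] w, int[][] z, int[][] nw, int[] nwsum, int[][] nd,
            double alpha, double beta, Random rand) {
        for (int m = 0; m < w.length; m++) {
            for (int n = 0; n < w[m].length; n++) {
                sampleFullConditional(w, z, nw, nwsum, nd, m, n, alpha, beta, rand);
            }
        }
    }
}
```

The document lengths (ndsum in the gibbsHeap parameter list) do not change during a sweep, which is why the sketch never touches them.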
protected int sampleLdaFullConditional(LdaMarkovState s,
int m,
int n)
Parameters:
m - document
n - word
protected void updateParams()
protected void updateTheta()
protected void updatePhi()
public double[][] getTheta()
public double[][] getPhi()
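
getTheta() and getPhi() return the estimated document-topic and topic-word associations. A common way to obtain such estimates from a single sample is to smooth the count matrices with alpha and beta and normalise each row. The sketch below assumes that approach and the count layout nw[term][topic], nd[document][topic]. The field names thetasum, phisum and numstats suggest that the class may average over several samples, but this sketch does not model that, and the class name EstimateSketch is hypothetical:

```java
/** Hypothetical sketch: point estimates of theta and phi from one set of counts. */
final class EstimateSketch {

    /** theta[m][k]: estimated share of topic k in document m. */
    static double[][] theta(int[][] nd, int[] ndsum, double alpha) {
        double[][] theta = new double[nd.length][];
        for (int m = 0; m < nd.length; m++) {
            int K = nd[m].length;
            theta[m] = new double[K];
            for (int k = 0; k < K; k++) {
                theta[m][k] = (nd[m][k] + alpha) / (ndsum[m] + K * alpha);
            }
        }
        return theta;
    }

    /** phi[k][t]: estimated probability of term t under topic k. */
    static double[][] phi(int[][] nw, int[] nwsum, double beta) {
        int V = nw.length;
        int K = nwsum.length;
        double[][] phi = new double[K][V];
        for (int k = 0; k < K; k++) {
            for (int t = 0; t < V; t++) {
                phi[k][t] = (nw[t][k] + beta) / (nwsum[k] + V * beta);
            }
        }
        return phi;
    }
}
```

With this normalisation every row of theta and every row of phi sums to one, which is a convenient sanity check on the counts.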
public static void main(java.lang.String[] args)
public void save(java.lang.String filename)
public static java.lang.Object load(java.lang.String filename)
Parameters:
filename -
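
save and load are described as working with an object stream. A minimal sketch of that mechanism using the standard java.io.ObjectOutputStream and java.io.ObjectInputStream classes follows. The class ObjectStreamSketch is hypothetical, and how the library methods handle exceptions, buffering or compression is not stated on this page, so those details are assumptions:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

/** Hypothetical sketch of saving and loading a Serializable object. */
final class ObjectStreamSketch {

    /** Write a Serializable object to the given file. */
    static void save(Object obj, String filename) {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(filename)))) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Read back the object stored in the given file; the caller casts the result. */
    static Object load(String filename) {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(filename)))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("class of stored object not found", e);
        }
    }
}
```

Because load returns java.lang.Object, as the signature above does, the caller has to cast the result to the expected type.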
protected void writeParameters(java.lang.String file,
org.knowceans.util.Arguments a,
ITermCorpus corpus)
Parameters:
file -
a - Arguments object
public LdaMarkovState getState()