java.lang.Object
  org.knowceans.dirichlet.lda.LdaGibbsSampler
public class LdaGibbsSampler
Gibbs sampler for estimating the best assignments of topics for words and documents in a corpus. The algorithm is introduced in Tom Griffiths' paper "Gibbs sampling in the generative model of Latent Dirichlet Allocation" (2002).
TODO: clean up constructor mess, so invalid inits become more difficult...
Field Summary | |
---|---|
int | backupIteration - Iteration of the last backup. |
protected ExtLdaConfiguration | conf - Configuration object with the current parameters. |
int | dispcol |
protected int | numstats - Size of statistics. |
protected double[][] | phisum - Cumulative statistics of phi. |
protected java.util.Random | rand - Random generator. |
private static long | serialVersionUID |
protected LdaMarkovState | state - State variables of the LDA Gibbs sampler. |
protected double[][] | thetasum - Cumulative statistics of theta. |
Constructor Summary | |
---|---|
protected | LdaGibbsSampler() - For subclasses that know what they are doing. |
public | LdaGibbsSampler(int[][] documents, int V, double alpha, double beta, int K, int iterations) - Initialise the Gibbs sampler with data and standard values. |
public | LdaGibbsSampler(int[][] documents, int V, ExtLdaConfiguration conf) - Initialise the sampler with the documents and the configuration. |
public | LdaGibbsSampler(int[][] documents, int V, ExtLdaConfiguration conf, java.util.Random rand) - Initialise the sampler with the documents and the configuration. |
public | LdaGibbsSampler(ITermCorpus corpus, ExtLdaConfiguration conf) - Initialise the sampler with the corpus and the configuration. |
public | LdaGibbsSampler(ITermCorpus corpus, ExtLdaConfiguration conf, java.util.Random rand) - Initialise the sampler with the corpus and the configuration. |
public | LdaGibbsSampler(ITermCorpus corpus, LdaMarkovState state, ExtLdaConfiguration conf, java.util.Random rand) - Initialise the sampler with an existing state. |
protected | LdaGibbsSampler(LdaMarkovState state, ExtLdaConfiguration conf, java.util.Random rand) - Initialise the sampler with an existing state. |
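The array-based constructors above take the corpus as `int[][] documents` together with a vocabulary size `V`. This page does not spell out the encoding, so the following is only a sketch under the assumption that each row is one document and each entry is a term index in the range 0 to V - 1. `CorpusEncoder` is a hypothetical helper, not part of org.knowceans.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of org.knowceans): maps tokens to term indices. */
final class CorpusEncoder {
    /** Term string -> index, assigned in order of first appearance. */
    final Map<String, Integer> vocabulary = new LinkedHashMap<>();

    /** Encodes whitespace-tokenised texts so that documents[m][n] is a term index. */
    int[][] encode(List<String> texts) {
        int[][] documents = new int[texts.size()][];
        for (int m = 0; m < texts.size(); m++) {
            String trimmed = texts.get(m).trim();
            // "".split(...) would yield one empty token, so handle empty texts explicitly.
            String[] tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
            documents[m] = new int[tokens.length];
            for (int n = 0; n < tokens.length; n++) {
                Integer id = vocabulary.get(tokens[n]);
                if (id == null) {
                    id = vocabulary.size();          // next free term index
                    vocabulary.put(tokens[n], id);
                }
                documents[m][n] = id;
            }
        }
        return documents;
    }

    /** Vocabulary size V: the number of distinct terms seen so far. */
    int vocabularySize() {
        return vocabulary.size();
    }
}
```

If the assumption about the encoding holds, the array returned by `encode` and the value of `vocabularySize()` would be the `documents` and `V` arguments of the constructors listed above.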
Method Summary | |
---|---|
double[][] | getPhi() - Retrieve estimated topic-word associations. |
LdaMarkovState | getState() - Get the current state of the Markov chain. |
double[][] | getTheta() - Retrieve estimated document-topic associations. |
protected void | gibbs() - Main method: Select initial state ? |
int[][] | gibbs(int[][] w, int V, int[][] z, int K, double alpha, double beta, int iter) - Native implementation of the Gibbs sampling procedure. |
int[][] | gibbsHeap(int[][] w, int[][] z, int[][] nw, int[] nwsum, int[][] nd, int[] ndsum, double alpha, double beta, int iter) - Native Gibbs sampling on the JVM heap. |
void | gibbsHeap(LdaMarkovState s, ExtLdaConfiguration c) - Native Gibbs sampling on the JVM heap. |
protected void | initialState() - Initialisation: Random assignments with equal probabilities. |
static java.lang.Object | load(java.lang.String filename) - Read object from the stream. |
static void | main(java.lang.String[] args) |
void | output(int i) - Handle output during sampling. |
void | run() - Run the sampler after initialisation. |
protected void | sampleCorpus(LdaMarkovState s) - Sample once through the corpus and update the corresponding state. |
protected int | sampleLdaFullConditional(LdaMarkovState s, int m, int n) - Sample a topic z_i from the full conditional distribution: p(z_i = j \| z_-i, w) = (n_-i,j(w_i) + beta)/(n_-i,j(.) + W * beta) * (n_-i,j(d_i) + alpha)/(n_-i,.(d_i) + K * alpha) |
void | save(java.lang.String filename) - Object stream only for testing. |
void | saveState(java.lang.String file) - Saves the current state of the Markov chain and the parameters to a file. |
protected void | updateParams() - Add to the statistics the values of theta and phi for the current state. |
protected void | updatePhi() - Update the topic-term association. |
protected void | updateTheta() - Update the document-topic associations. |
protected void | writeParameters(java.lang.String file, org.knowceans.util.Arguments a, ITermCorpus corpus) - Write statistics of the current run to a text file for later review. |
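The full conditional named in the sampleLdaFullConditional entry above can be sketched as a single resampling step. The weight of topic j is the term factor (n_-i,j(w_i) + beta)/(n_-i,j(.) + W * beta) times a document factor; the denominator of the document factor does not depend on j, so this sketch drops it and normalises by the total weight instead. The class name `FullConditionalSketch` and the array layout (`nw[term][topic]`, `nd[doc][topic]`, `nwsum[topic]`, `ndsum[doc]`) are assumptions for illustration and may differ from what the library does internally.

```java
import java.util.Random;

/** Hypothetical sketch (not the library's code): one draw from the LDA full conditional. */
final class FullConditionalSketch {
    /**
     * Resamples the topic of word n in document m and updates the counts in place.
     * Assumed layout: nw[term][topic], nd[doc][topic], nwsum[topic], ndsum[doc].
     */
    static int sample(int[][] w, int[][] z, int[][] nw, int[] nwsum, int[][] nd,
                      int[] ndsum, double alpha, double beta, int m, int n, Random rand) {
        int K = nwsum.length;   // number of topics
        int V = nw.length;      // vocabulary size (W in the formula)
        int term = w[m][n];
        int old = z[m][n];

        // Remove the current assignment so the counts become the "-i" counts.
        nw[term][old]--;
        nd[m][old]--;
        nwsum[old]--;
        ndsum[m]--;

        // Unnormalised weights; the document-side denominator is constant in j.
        double[] p = new double[K];
        double total = 0.0;
        for (int j = 0; j < K; j++) {
            p[j] = (nw[term][j] + beta) / (nwsum[j] + V * beta) * (nd[m][j] + alpha);
            total += p[j];
        }

        // Inverse-CDF draw from the discrete distribution proportional to p.
        double u = rand.nextDouble() * total;
        int topic = 0;
        double cumulative = p[0];
        while (topic < K - 1 && u >= cumulative) {
            topic++;
            cumulative += p[topic];
        }

        // Record the new assignment.
        nw[term][topic]++;
        nd[m][topic]++;
        nwsum[topic]++;
        ndsum[m]++;
        z[m][n] = topic;
        return topic;
    }
}
```

Calling `sample` once for every word position of every document would correspond to one sweep through the corpus, which is what the sampleCorpus entry describes.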
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
private static final long serialVersionUID
protected ExtLdaConfiguration conf
protected LdaMarkovState state
protected double[][] thetasum
protected double[][] phisum
public int backupIteration
protected int numstats
public int dispcol
protected java.util.Random rand
Constructor Detail |
---|
protected LdaGibbsSampler()
public LdaGibbsSampler(int[][] documents, int V, double alpha, double beta, int K, int iterations)
documents -
V -
alpha -
beta -
K -
iterations -
public LdaGibbsSampler(int[][] documents, int V, ExtLdaConfiguration conf)
documents -
V -
conf -
public LdaGibbsSampler(int[][] documents, int V, ExtLdaConfiguration conf, java.util.Random rand)
documents -
V -
conf -
rand -
public LdaGibbsSampler(ITermCorpus corpus, ExtLdaConfiguration conf)
corpus -
conf -
public LdaGibbsSampler(ITermCorpus corpus, ExtLdaConfiguration conf, java.util.Random rand)
corpus -
conf -
rand -
protected LdaGibbsSampler(LdaMarkovState state, ExtLdaConfiguration conf, java.util.Random rand)
state -
conf -
rand -
public LdaGibbsSampler(ITermCorpus corpus, LdaMarkovState state, ExtLdaConfiguration conf, java.util.Random rand)
corpus -
state -
conf -
rand -
Method Detail |
---|
protected void initialState()
public void run()
protected void gibbs()
public int[][] gibbs(int[][] w, int V, int[][] z, int K, double alpha, double beta, int iter)
w - words
V - vocabulary size
z - topic associations
K - topic count
alpha -
beta -
iter - number of iterations
public int[][] gibbsHeap(int[][] w, int[][] z, int[][] nw, int[] nwsum, int[][] nd, int[] ndsum, double alpha, double beta, int iter)
w - [in] words
z - [in/out] topic associations
nw - [in/out] topic-word counts
nwsum - [in/out] summed topic-word counts (total words per topic)
nd - [in/out] document-topic counts (total words per document)
ndsum - [in] document lengths
alpha -
beta -
iter -
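The count arrays passed to gibbsHeap have to agree with the topic associations in z. A minimal sketch of one way to build them, starting from random assignments with equal probabilities as the initialState summary describes, is given here. `CountInitSketch` is a hypothetical class, and the index order (`nw[term][topic]`, `nd[doc][topic]`) is an assumption that this page does not confirm.

```java
import java.util.Random;

/** Hypothetical sketch: random initial topics plus the matching count arrays. */
final class CountInitSketch {
    final int[][] z;      // z[m][n]: topic of word n in document m
    final int[][] nw;     // nw[t][k]: times term t is assigned to topic k
    final int[] nwsum;    // nwsum[k]: total words assigned to topic k
    final int[][] nd;     // nd[m][k]: words in document m assigned to topic k
    final int[] ndsum;    // ndsum[m]: length of document m

    CountInitSketch(int[][] w, int V, int K, Random rand) {
        int M = w.length;
        z = new int[M][];
        nw = new int[V][K];
        nwsum = new int[K];
        nd = new int[M][K];
        ndsum = new int[M];
        for (int m = 0; m < M; m++) {
            z[m] = new int[w[m].length];
            for (int n = 0; n < w[m].length; n++) {
                int topic = rand.nextInt(K);   // uniform over the K topics
                z[m][n] = topic;
                nw[w[m][n]][topic]++;
                nd[m][topic]++;
                nwsum[topic]++;
            }
            ndsum[m] = w[m].length;
        }
    }
}
```

Two invariants are useful as sanity checks after every sweep: each row of `nd` sums to the matching entry of `ndsum`, and each topic column of `nw` sums to the matching entry of `nwsum`.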
public void gibbsHeap(LdaMarkovState s, ExtLdaConfiguration c)
s - [in/out] state
c - [in] configuration
public void output(int i)
i -
public void saveState(java.lang.String file)
file -
protected void sampleCorpus(LdaMarkovState s)
s -
protected int sampleLdaFullConditional(LdaMarkovState s, int m, int n)
m - document
n - word
protected void updateParams()
protected void updateTheta()
protected void updatePhi()
public double[][] getTheta()
public double[][] getPhi()
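The fields thetasum, phisum and numstats suggest that theta and phi are accumulated over several sampled states and averaged when retrieved. The sketch below shows that pattern with the usual smoothed point estimates, theta[m][k] = (nd[m][k] + alpha)/(ndsum[m] + K * alpha) and phi[k][t] = (nw[t][k] + beta)/(nwsum[k] + V * beta). Whether the library computes its estimates exactly this way is an assumption; `ParameterStatsSketch` and its method names are hypothetical.

```java
/** Hypothetical sketch: accumulating and averaging theta and phi estimates. */
final class ParameterStatsSketch {
    final double[][] thetasum;  // running sum of theta[m][k] over recorded samples
    final double[][] phisum;    // running sum of phi[k][t] over recorded samples
    int numstats = 0;           // number of recorded samples

    ParameterStatsSketch(int M, int K, int V) {
        thetasum = new double[M][K];
        phisum = new double[K][V];
    }

    /** Adds the point estimates for the current counts to the running sums. */
    void update(int[][] nw, int[] nwsum, int[][] nd, int[] ndsum, double alpha, double beta) {
        int K = nwsum.length;
        int V = nw.length;
        for (int m = 0; m < nd.length; m++) {
            for (int k = 0; k < K; k++) {
                thetasum[m][k] += (nd[m][k] + alpha) / (ndsum[m] + K * alpha);
            }
        }
        for (int k = 0; k < K; k++) {
            for (int t = 0; t < V; t++) {
                phisum[k][t] += (nw[t][k] + beta) / (nwsum[k] + V * beta);
            }
        }
        numstats++;
    }

    /** Averaged document-topic distributions; each row sums to 1. */
    double[][] theta() {
        return average(thetasum);
    }

    /** Averaged topic-term distributions; each row sums to 1. */
    double[][] phi() {
        return average(phisum);
    }

    private double[][] average(double[][] sums) {
        if (numstats == 0) {
            throw new IllegalStateException("no samples recorded");
        }
        double[][] out = new double[sums.length][];
        for (int i = 0; i < sums.length; i++) {
            out[i] = new double[sums[i].length];
            for (int j = 0; j < sums[i].length; j++) {
                out[i][j] = sums[i][j] / numstats;
            }
        }
        return out;
    }
}
```

Averaging over several states rather than reading the estimates from a single state reduces the noise of any one sample, at the cost of keeping the two sum arrays in memory.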
public static void main(java.lang.String[] args)
public void save(java.lang.String filename)
public static java.lang.Object load(java.lang.String filename)
filename -
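save and load are described as working on an object stream, and load returns a plain java.lang.Object that the caller has to cast. The round trip can be sketched with standard Java serialization as follows. `ObjectStreamSketch` is a hypothetical stand-in, and nothing here is taken from the library's actual implementation.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical sketch of an object-stream save/load pair. */
final class ObjectStreamSketch {
    /** Writes one serializable object to the given file, replacing any old content. */
    static void save(Serializable obj, String filename) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(Files.newOutputStream(Paths.get(filename)))) {
            out.writeObject(obj);
        }
    }

    /** Reads one object back; the caller casts it to the expected type. */
    static Object load(String filename) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(Files.newInputStream(Paths.get(filename)))) {
            return in.readObject();
        }
    }
}
```

Java serialization ties the file to the class layout and its serialVersionUID, which fits the summary's remark that the object stream is only meant for testing.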
protected void writeParameters(java.lang.String file, org.knowceans.util.Arguments a, ITermCorpus corpus)
file -
a - Arguments object
public LdaMarkovState getState()