java.lang.Object
  org.knowceans.dirichlet.lda.LdaGibbsSampler
    org.knowceans.sandbox.ilda.IldaGibbsSampler
public class IldaGibbsSampler
extends LdaGibbsSampler
Gibbs sampler for estimating the best assignments of topics for words and documents in a corpus. The algorithm is based on "Parameter estimation for text analysis" (2005), http://www.arbylon.net/publications/text-est_iv.pdf, which gives a more detailed derivation of a Gibbs sampler for the LDA model than Tom Griffiths' white paper "Gibbs sampling in the generative model of Latent Dirichlet Allocation" (2002). This class extends that sampler to the infinite limit on K, following Neal's paper "Monte-Carlo sampling methods for the Dirichlet process".
Field Summary | |
---|---|
private int | growstep: array grow step |
private int[] | ndunrep: number of unrepresented words in a document |
private int | nwsumunrep: total number of unrepresented words |
private int[] | nwunrep: number of times a term is unrepresented by current topics |
private static long | serialVersionUID |
Fields inherited from class org.knowceans.dirichlet.lda.LdaGibbsSampler |
---|
backupIteration, conf, dispcol, numstats, phisum, rand, state, thetasum |
Constructor Summary | |
---|---|
IldaGibbsSampler(int[][] documents, int V): Initialise the Gibbs sampler with data. |
Method Summary | |
---|---|
private void | addComponent(): handle size of componentwise structures. |
private void | gibbs(): Main method for Gibbs sampling. |
static void | main(java.lang.String[] args) |
private void | removeComponent(int j): removes one component from the model. |
(package private) double | sampleAlpha(): sample alpha from a Gam(1,1) distribution using Escobar & West's method. |
protected int | sampleLdaFullConditional(int m, int n): Sample a topic z_i from the full conditional distribution: p(z_i = j | z_-i, w) = (n_-i,j(w_i) + beta)/(n_-i,j(.) + W * beta) * (n_-i,j(d_i) + alpha)/(n_-i,. |
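The full conditional quoted for sampleLdaFullConditional can be illustrated with a minimal sketch for a fixed number of topics. This is not the class's own code: the class name, the method sampleTopic, and the count arrays nw, nwsum and nd are hypothetical names chosen for the example. The trailing denominator, which is truncated in the summary above, is left out on the assumption that, as in the standard LDA sampler, it does not depend on j and so cancels when the weights are normalised. The extra weight that an infinite model gives to a new, unrepresented topic is not shown.

```java
import java.util.Random;

class FullConditionalSketch {
    /**
     * Draws a topic for one word of term id w in document m, with weights
     * proportional to (nw[w][j] + beta) / (nwsum[j] + W * beta) * (nd[m][j] + alpha).
     * All arrays are hypothetical; the counts must already exclude the
     * word that is being resampled (the "-i" in the formula).
     */
    static int sampleTopic(int w, int m, int k, int W,
                           int[][] nw, int[] nwsum, int[][] nd,
                           double alpha, double beta, Random rand) {
        double[] cumulative = new double[k];
        double total = 0.0;
        for (int j = 0; j < k; j++) {
            total += (nw[w][j] + beta) / (nwsum[j] + W * beta) * (nd[m][j] + alpha);
            cumulative[j] = total; // running sum of unnormalised weights
        }
        // Scale a uniform draw by the total and find the first bucket that contains it.
        double u = rand.nextDouble() * total;
        for (int j = 0; j < k; j++) {
            if (u < cumulative[j]) {
                return j;
            }
        }
        return k - 1; // guard against rounding at the upper edge
    }
}
```

In a full sweep, the caller would decrement the counts for the word's current topic before this call and increment them for the returned topic afterwards.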
Methods inherited from class org.knowceans.dirichlet.lda.LdaGibbsSampler |
---|
getPhi, getState, getTheta, gibbs, gibbsHeap, gibbsHeap, initialState, load, output, run, sampleCorpus, sampleLdaFullConditional, save, saveState, updateParams, updatePhi, updateTheta, writeParameters |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
private static final long serialVersionUID
private int growstep
private int[] nwunrep
private int[] ndunrep
private int nwsumunrep
Constructor Detail |
---|
public IldaGibbsSampler(int[][] documents, int V)
Parameters:
V - vocabulary size
data -
Method Detail |
---|
private void addComponent()
Note: We use arrays for components because the syntax is more readable and because accessing a Vector may lose speed to the cast operations involved. Therefore all loops over components should explicitly use k, not, e.g., mu.length. The problem with this approach is that it is hard to remove unoccupied classes.
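The note above can be made concrete with a small sketch of arrays that grow in steps while a separate counter tracks the active components. The class ComponentArraysSketch and the array nwsum are hypothetical; only the names growstep and k are taken from this page, and the actual class may manage several arrays and differ in detail. The sketch assumes growstep is at least 1.

```java
import java.util.Arrays;

class ComponentArraysSketch {
    int k = 0;            // number of active components
    final int growstep;   // slots added whenever the array is full
    int[] nwsum;          // hypothetical per-component counts; may be longer than k

    ComponentArraysSketch(int growstep) {
        this.growstep = growstep;
        this.nwsum = new int[growstep];
    }

    /** Activates one more component, enlarging the array only when it is full. */
    void addComponent() {
        if (k == nwsum.length) {
            nwsum = Arrays.copyOf(nwsum, nwsum.length + growstep);
        }
        nwsum[k] = 0;
        k++;
    }

    /** Removes component j by shifting the later components down one slot. */
    void removeComponent(int j) {
        System.arraycopy(nwsum, j + 1, nwsum, j, k - j - 1);
        k--;
        nwsum[k] = 0; // clear the slot that is no longer active
    }

    /** Loops must stop at k, because nwsum.length also counts unused slots. */
    int total() {
        int sum = 0;
        for (int j = 0; j < k; j++) {
            sum += nwsum[j];
        }
        return sum;
    }
}
```

Removal costs a shift of every later component, and any topic indices stored elsewhere would have to be renumbered as well, which is one way to read the remark that unoccupied classes are hard to remove.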
private void removeComponent(int j)
Parameters:
j -
private void gibbs()
Overrides:
gibbs in class LdaGibbsSampler
protected int sampleLdaFullConditional(int m, int n)
Parameters:
m - document
n - word
double sampleAlpha()
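As a rough illustration of the auxiliary-variable update usually attributed to Escobar and West (1995), the sketch below resamples a Dirichlet process concentration parameter under a Gamma(a, b) prior, given k occupied components and n observations. It is not taken from this class: the class name, the helper gamma, and the parameter list are hypothetical, and how this class maps its own counts onto k and n is not documented here. The helper uses the Marsaglia and Tsang method and assumes a shape of at least 1, which holds for a = 1 and k >= 1.

```java
import java.util.Random;

class AlphaSamplerSketch {
    /** Gamma(shape, 1) draw for shape >= 1, using the Marsaglia and Tsang method. */
    static double gamma(double shape, Random rand) {
        double d = shape - 1.0 / 3.0;
        double c = 1.0 / Math.sqrt(9.0 * d);
        while (true) {
            double x = rand.nextGaussian();
            double v = 1.0 + c * x;
            if (v <= 0) {
                continue; // proposal outside the support, try again
            }
            v = v * v * v;
            double u = rand.nextDouble();
            if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) {
                return d * v;
            }
        }
    }

    /**
     * One update of alpha under a Gamma(a, b) prior (shape a, rate b),
     * given k occupied components and n observations.
     */
    static double sampleAlpha(double alpha, int k, int n,
                              double a, double b, Random rand) {
        // Auxiliary variable eta ~ Beta(alpha + 1, n), built from two gamma draws.
        double g1 = gamma(alpha + 1.0, rand);
        double g2 = gamma(n, rand);
        double eta = g1 / (g1 + g2);
        double rate = b - Math.log(eta);
        // Mixture weight pi with odds (a + k - 1) / (n * (b - log eta)).
        double odds = (a + k - 1.0) / (n * rate);
        double pi = odds / (1.0 + odds);
        double shape = rand.nextDouble() < pi ? a + k : a + k - 1.0;
        return gamma(shape, rand) / rate; // divide by the rate to rescale
    }
}
```

With the Gam(1,1) prior named in the method summary, a call would pass a = 1.0 and b = 1.0 and feed the returned value back in as alpha on the next iteration.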
public static void main(java.lang.String[] args)